HTR Models - Update

Since our last post about our HTR models, we have continued working to make the models for the district court records more accurate. We have transcribed over 2000 more images as ground truth (GT), broadened the GT to include images from the main records, and trained nine experimental models and one final model.

Our final model is trained on approximately 2700 images, or 1 226 202 words, of training data. To put that in perspective, it equals the word count of the entire Harry Potter book series, with enough words left over to write The Philosopher’s Stone and a quarter of The Chamber of Secrets again.

Using Transkribus’ compare samples tool, which estimates the interval in which a model’s character error rate (CER) is likely to fall for the collection a sample is drawn from, we calculated that the final model’s CER on the district court records falls between 5.1% and 8.4% (compared to the 9.3%-13.8% range of the model discussed in the previous blog post). The final…
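For readers unfamiliar with CER, here is a minimal Python sketch of how a character error rate and a rough interval around it could be computed from a sample of transcribed lines. It is not how Transkribus computes its interval; the function names (`levenshtein`, `sample_cer`, `cer_interval`) and the normal-approximation interval are our own assumptions, chosen only for illustration.

```python
import math


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn hyp into ref."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sample_cer(pairs):
    """Pooled CER over (ground truth, model output) line pairs:
    total edit distance divided by total ground-truth characters."""
    chars = sum(len(ref) for ref, _ in pairs)
    if chars == 0:
        raise ValueError("the sample contains no ground-truth characters")
    errors = sum(levenshtein(ref, hyp) for ref, hyp in pairs)
    return errors / chars


def cer_interval(pairs, z=1.96):
    """Rough 95% interval for the CER (assumption: each ground-truth
    character is treated as an independent trial, normal approximation)."""
    n = sum(len(ref) for ref, _ in pairs)
    p = min(sample_cer(pairs), 1.0)  # CER can exceed 1; clamp for the formula
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

Calling `cer_interval` on a list of (ground truth, model output) line pairs returns a lower and an upper bound. The larger the sample, the narrower the interval, which is why a reported range such as 5.1% to 8.4% depends on how many lines were sampled.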

Turku Book Fair

Keyword spotting – an effective search tool

READ project’s final meeting and the Transkribus seminar in Helsinki

Kustaa Vaasa Seminar: Court Book Search Engines and outlooks on digital history