Since our last post about our HTR models, we have continued working to make the HTR models for the district court records better and more accurate. We have transcribed over 2,000 additional images of ground truth (GT), broadened the GT to also include images from the main records, and trained nine more experimental models and one final model.

The final model is trained on approximately 2,700 images, or 1,226,202 words, of training data. To put that word count into perspective, it equals the word count of the entire Harry Potter book series, with enough words left over to write The Philosopher’s Stone again, plus a quarter of The Chamber of Secrets.

Using Transkribus’ compare samples tool, which estimates the likely interval of a model’s CER on the collection the sample is drawn from, we calculated that the final model’s CER on the district court records falls between 5.1% and 8.4% (compared to the 9.3% to 13.8% range of the model discussed in the previous blog post).
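For readers unfamiliar with the metric: CER (character error rate) is the edit distance between the model's transcription and the ground truth, divided by the length of the ground truth. The sketch below is an illustrative implementation, not Transkribus' own code; the example strings are invented.

```python
# Illustrative sketch: character error rate (CER) as edit distance
# divided by reference length. Not Transkribus' actual implementation.
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance
    # (insertions, deletions, substitutions all cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    # CER = edit distance / number of characters in the ground truth.
    return levenshtein(reference, hypothesis) / len(reference)

# Hypothetical example: one dropped letter and one misread letter
# in a 14-character reference give a CER of 2/14.
print(round(cer("district court", "distrct covrt"), 3))  # → 0.143
```

A CER interval of 5.1% to 8.4% therefore means roughly one character error per 12 to 20 characters of transcribed text.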