One of our main goals at the National Archives of Finland (NAF) as a participant in the READ project is to develop an HTR (Handwritten Text Recognition) model that can read 19th-century Renovated District Court Records (in Swedish: renoverade domböckerna; in Finnish: renovoidut tuomiokirjat). They are written mostly in Swedish, but some parts are in Finnish. Currently, the HTR models trained on the most data have a CER (Character Error Rate) of around 3 %, that is, roughly three wrong characters in every hundred, so we have almost reached our goal. However, we still need to improve our HTR model’s ability to read court records from the early 19th century.
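For readers unfamiliar with the metric, CER is commonly defined as the edit distance between the model's output and a human-made reference transcription, divided by the length of the reference. The snippet below is a minimal sketch of that common definition, not necessarily the exact way the tools used in the READ project calculate it; the function names `levenshtein` and `cer` are made up for this illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the hypothesis
                curr[j - 1] + 1,     # extra character in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)
```

With this definition, `cer("abcd", "abcf")` is 0.25, because one of the four reference characters was read incorrectly. A CER of 3 % on a page of 2,000 characters would correspond to about 60 character-level mistakes.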
In this blog post, I will introduce the Renovated Court Records to our readers. How have they come into existence? What do they contain? Why did we choose them to be our pilot material in the READ project?
Volumes of Renovated Court Records from the rural district of Heinola
Court Records from the geographical area of Finland form a continuous archival series starting from the year 1623, when the Turku Court of Appeal was founded. Renovated Court Records were made because the lower courts (lawman courts, rural district courts and city courts) were obligated to send a copy of their court records to the court of appeal for review. The original records remained in the archives of the lower courts, and only some of them have survived. Nevertheless, Renovated Court Records form a largely continuous series from year to year, in some districts up to the year 1970. However, large parts of the renovated records from the rural district courts of Åland, Savonia and Sääksmäki were lost in the 1827 fire of Turku. Despite these losses, owing to their geographical scale and long time span, Renovated District Court Records are one of the largest collections in NAF.
As mentioned earlier, Renovated Court Records are fair copies, which were sent to the higher court instances, the courts of appeal. This makes them excellent source material for our READ pilot project, since the writing follows a set form and has already been “transcribed” once. Renovated Court Records are easier to read than the originals, which have a lot of strikethroughs and additions in the margins and between the lines. Additionally, 19th-century court records are in good physical condition and the handwriting is usually easy to read. However, what makes them challenging material for AI-based learning is that they were written by many different hands. Our time period is 100 years, and the geographical area is larger than Finland’s current borders, so there are a lot of different handwriting styles, and unfortunately some of them are really difficult to read.
A page from the renovated registration records. Rural district of Åland, year 1862.
The registration records, the other series under renovated court records, include deeds, mortgages and guardianship registrations. Registration records formed their own series from the 18th century onward, and they continued to be copied into the renovated court records in full even after the main records started to be copied in an increasingly abridged form. Registration records can be used, for example, to track the ownership of property. The digitized collection of Renovated District Court Records’ registration records in NAF includes over 800,000 images.
Due to their contents, main records and registration records are somewhat different from each other. The structure of registration records is more strictly defined and their contents are repetitive. This means that registration records are easier material for AI-based learning, and hence NAF is going to transcribe them first and process the main records afterwards.
At the moment we are focusing on Renovated Court Records, since we want to give the customers of NAF concrete results that show the benefits of HTR technology. However, the Renovated Court Records are only a foretaste of things to come. During the course of the READ project, NAF has tried HTR technology with several different source materials, and we have had good results with many of them. In the future, we hope that HTR can be used for different materials so that all the customers of NAF can benefit from it.
Ville-Pekka Kääriäinen