As mentioned in previous blog posts, one of our main goals at the moment is to develop an HTR (Handwritten Text Recognition) model that can fluently read 19th-century Court Records. To achieve this we must produce a lot of “teaching material”, which we call GT (Ground Truth). In practice this means transcribing lots of documents, so my co-workers and I currently spend most of our working hours transcribing 19th-century Court Records. As explained in the blog post “Renovated District Court Records”, most of the Court Records are written in Swedish (and sometimes in Finnish). In this blog post I will go through the resources that have helped me while transcribing Old Swedish. Most of my tips apply only to Swedish. However, I hope this post can also give ideas to readers who don’t work with Swedish.
About transcribing and making GT
Transcribing text that is difficult to read is a time-consuming job. When you are doing research, it isn’t always necessary to be able to read 100% of the old handwriting. You don’t have to understand all the scrawl and scribble; you can focus on the general idea of the text. However, when you are making GT (Ground Truth) material, which the AI uses to learn to read handwriting, it is important that you recognize words and letters as accurately as humanly possible. Otherwise, you might do more harm than good to the AI’s ability to read handwritten text.
When I face words that I do not recognize, I think of myself as a detective. Usually there are several leads that help to identify the word. Most of the time you can work out the word from the context even if you cannot recognize it at first. Sometimes it’s hard to tell whether you have read the word correctly because it looks like gibberish to you. Then you must find out whether you have made a mistake, the original writer has made a mistake, or it’s a word that has gone out of use and that you don’t know yet. Sometimes the situation is even worse and you can only read a few letters here and there. For example, you might recognize only the beginning and the end of the word. However, if you are a good detective, this can be enough to solve the mystery.
Transkribus platform itself
I think that the cumulative nature of the Transkribus platform itself makes it a great tool for researchers who struggle to read difficult handwriting. The desktop version of the Transkribus platform has a “Search for” function, which lets you search the already transcribed documents that you have access to.
Let’s take a made-up example. Whilst transcribing I come across a difficult word and I think it might be “förmyndare” (in English: guardian), but I’m not sure about it. I can search for “förmyndare” with the Transkribus search function and it gives me all the instances where the same word is found. The result of the search is shown below. Since we have already created a lot of GT material in the project, there are 136 results. As shown in the screen capture, you can also see a picture of the word in the original document for every result. By comparing the results with your word, you can use this technique to recognize words quite easily!
But as I mentioned earlier, sometimes you only recognize a few letters or syllables, so you can’t use the method described above. However, with the help of “wildcard characters” you don’t even have to know what you are looking for. If you are not familiar with wildcard characters, you can find more information online. According to Wikipedia: “… a wildcard character is a kind of placeholder represented by a single character, such as an asterisk (*), which can be interpreted as a number of literal characters or an empty string.” In Transkribus, in addition to the asterisk (*), you can use the question mark (?) and quotation marks (“ ”). The question mark works like the asterisk, but instead of any number of characters it replaces exactly one character. Quotation marks are used to search for phrases. This is useful since words often appear in a fixed order in the same context.
Let’s have some examples again. Let’s say I come across the word below:
It’s quite easy to recognize certain letters. It definitely begins with “p”, and the next letter might be “e”. Towards the end there is also the syllable “lig”, which is quite common in Swedish. We can use this information in the Transkribus search by searching for “p*lig*”.
There were 54 results and many of them are “personligen” (personally). After comparing the results with my word, I’m convinced that our word is “personligen”.
Let’s have another example:
In this example the part “stä” is quite easy to read. You can also tell that there are two letters before it and that one of them is “i”. Now let’s use this information. We can search for “??stä*”.
Again we get a lot of results, but from them we can quickly see that “inställde” is our word.
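The two wildcard searches above can be imitated outside Transkribus. The following is only a minimal Python sketch of what the asterisk and the question mark mean, not a description of how Transkribus implements its search. It uses the fnmatch module from Python's standard library, and both the word list and the function name wildcard_search are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical word list standing in for already transcribed text.
words = ["personligen", "förmyndare", "inställde", "påföljd", "pligt"]


def wildcard_search(pattern, candidates):
    """Return the candidates that match a wildcard pattern.

    '*' stands for any number of characters (including none),
    '?' stands for exactly one character.
    """
    return [word for word in candidates if fnmatchcase(word, pattern)]


# Begins with "p" and contains the syllable "lig" somewhere later.
print(wildcard_search("p*lig*", words))

# Two unknown letters, then "stä", then anything.
print(wildcard_search("??stä*", words))
```

With this made-up list the first pattern finds both “personligen” and “pligt”, which shows why you still have to compare the hits with the word in the original document.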
SAOB
SAOB, Svenska Akademiens Ordbok (the Swedish Academy Dictionary), is definitely the most useful tool when translating Old Swedish. It’s a historical dictionary that describes the written Swedish language from King Gustav Vasa's time (16th century) to the present day. In practice, it’s the Oxford English Dictionary for Swedish. SAOB is great since it includes words that have gone out of use or changed into something different. The best part is that it’s completely free. What also makes it great for transcribing is that it supports wildcard characters.
You can make two kinds of searches in SAOB: “Quick search” and “Full-text search”. Quick search looks only at the headwords listed in the dictionary, not at their definitions. Whilst searching you have to keep in mind that most words are listed in a fairly modern Swedish spelling. For example, over time Swedish words have become shorter: “hafva” (to have) has become “hava”. In addition, some letters have been replaced with others. For example, “f” is usually “v” in modern Swedish, so “bref” (letter) has become “brev”. These are just a few examples and there are a lot of different changes that you must be aware of. The rule of thumb is that the older the handwriting is, the more the spelling has changed.
Luckily, since SAOB is a historical dictionary, it also records how words were written in the past. Let’s look, for example, at the previously mentioned “hava” (to have). Among other things (etymology, definition, etc.), the entry lists how the word “hava” has been written in the past. You can also see the source of the information and the year when the word was written in that way.
This information can be searched with SAOB’s full-text search (Fritextsök). For example, let’s search for the word “hafva”, which doesn’t give any results in the quick search. From the full-text search, we get over 1000 results.
With these functions, SAOB can be a lifesaver for a transcriber. I think it’s the best (and, as far as I know, the only) place where you can search written Old Swedish.
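If the quick search finds nothing for an old spelling, it can help to try a modernized form first. The sketch below is a small Python illustration of that idea, assuming only the two changes mentioned above (“hafva” to “hava” and “bref” to “brev”). The rule list and the function name modern_candidates are made up for this example and are far from complete, so the output should be treated as a list of spellings to try, not as a correct modernization.

```python
import re

# Deliberately incomplete rewrite rules, based only on the two examples
# in the text. Real spelling changes are much more varied.
RULES = [
    (re.compile(r"fv"), "v"),  # hafva -> hava
    (re.compile(r"f$"), "v"),  # bref  -> brev (word-final f)
]


def modern_candidates(old_word):
    """Return the original word followed by spellings produced by the rules."""
    candidates = [old_word]
    combined = old_word
    for pattern, replacement in RULES:
        single = pattern.sub(replacement, old_word)    # this rule alone
        combined = pattern.sub(replacement, combined)  # all rules so far
        for form in (single, combined):
            if form not in candidates:
                candidates.append(form)
    return candidates


print(modern_candidates("hafva"))
print(modern_candidates("bref"))
```

Each candidate can then be tried in the quick search, and the full-text search remains the fallback for the original spelling.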
Dictionaries
Dictionaries can be very useful tools for understanding old handwriting. This is especially the case if you are not a native speaker of the language yourself. The Internet is full of dictionaries. Some of them are free and some are behind a paywall. Free ones can be good, but at least for Swedish-Finnish the paid ones are usually better. I think the problem with free dictionaries is that their vocabulary is small and they usually include only modern words.
MOT Dictionaries
I have found that MOT Dictionaries are the best for my needs. At least for Swedish they have an extensive vocabulary that also includes some old words that aren’t commonly used anymore. However, the main reason I use them is that you can use wildcard characters while searching.
Project Runeberg
Nonetheless, the problem with MOT Dictionaries is that the vocabulary is quite modern. That’s why it’s good to have other options too. When I was on a paleography course at the university, our teacher recommended that all students go to a second-hand bookshop or an antique shop and buy the oldest Swedish-Finnish dictionary they could find. I think it is great advice, but you can also find such dictionaries online. For example, Project Runeberg has published some old Swedish, Danish, Norwegian and Finnish dictionaries. For my needs (Swedish-Finnish) I have found that the best available dictionary is Svenskt-Finskt Lexikon (Ruotsalais-Suomalainen Sanakirja). It was published by the Finnish Literature Society (SKS) in 1930 and edited by Knut Cannelin. It’s not the oldest one available, but I think its vocabulary is definitely the biggest. Unfortunately, there isn’t a search engine for it, but the people behind Project Runeberg have made a great table of contents for it with hyperlinks, so it is definitely much faster to use than a physical copy.
Names, place names, titles
Transcribing names and place names can be tricky. This is especially the case if you are not familiar with the local area. Luckily, genealogy is a common hobby in Finland and genealogists have made many different tools to help find information in old documents. Historians benefit a lot from these too.
HisKi
The Genealogical Society of Finland (Suomen Sukututkimusseura) has collected a database called HisKi, which includes lists of christenings, marriages, burials and moves. There is also a search program for HisKi where you can make searches with wildcard characters.
Let’s have a made-up example again. I’m transcribing the court records of Karijoki (Bötöm) parish and I come across a farm name that is shown below:
I’m not sure about it, but it begins with “N” and there is a “d” at the end. We can search for “N*d”.
From the results, we can conclude that our place was Norrgård. This is just one example of how we can use HisKi. We can also search for first names, last names and patronymics. However, you should always be aware that the spelling in HisKi can differ from the spelling in your source material. It was common to have many different ways to spell the same name.
FFHA Court Record databases
FFHA, Finland's Family History Association (SSHY, Suomen Sukuhistoriallinen Yhdistys), has also created databases of the Court Records. There are three different databases: the Savonia, Western Finland and Kexholm province court record databases. The databases include social standing, first name, family name or home village. Their search engine also accepts wildcard characters.
Glossaries
The Internet is full of great glossaries. I have found these the most useful whilst reading old documents in Swedish:
Finland's Family History Association: Phrases
Includes the most common phrases in Court Records. Also includes common crime names and names of the laws.
Finland's Family History Association: Vocabulary of Court Records and Estate Inventory documents
Includes basic vocabulary from Court Records and Estate Inventory documents.
The Society of Swedish Literature in Finland (SLS): Administrative History Dictionary
Includes administrative history terminology used in government documentation.
Karjala Database Foundation (Karjala-tietokantasäätiö): Glossary
Includes glossary of Dates/Time, Occupation, Miscellaneous, Social Relations and Diseases.
Abbreviations
Old administrative documents are full of different abbreviations. It can be quite difficult to find the right meaning for an abbreviation. Clerks used them so much that some have developed into symbols of their own, so it’s hard to tell which letters the abbreviation consists of. Luckily there are some resources that help with this too.
Genealogical Society of Finland: List of abbreviations
Juuret.org: List of abbreviations
The first one is easier and faster to use, since all abbreviations are on the same page. These lists are especially useful with Poll Tax Records, Validation Poll Tax Records and Church Registers. Unfortunately, I have found that they do not include many abbreviations from Court Records.
Historismi.net: Currency and measurement
This website doesn’t include any list of abbreviations, but it explains the currency system and measurements used in Finland in different time periods. It is useful since currency and measurements are usually abbreviated.
Old Handwriting and Documents, Volume 1. The National Archives of Finland (1977)
Old Handwriting and Documents, Volume 2. The National Archives of Finland (1977)
There are many great paleography manuals that can help you learn old handwriting and Old Swedish. The National Archives has published its manual online and it’s open to everyone. Page 16 of Volume 1 includes a list of the most common abbreviations and it’s really useful.
Anything else?
These were my tips that can help you to understand old handwriting. Now I would like to ask our readers: do you know any other useful resources? Please leave a comment below if you do. It doesn’t have to be for Swedish!
Ville-Pekka Kääriäinen