Kustaa Vaasa Seminar: Court Book Search Engines and outlooks on digital history

Greetings from Jyväskylä, where we took part in the Kustaa Vaasa seminar on 12-13 June 2019. The annual seminar is organised by the Department of History and Ethnology of the University of Jyväskylä. This year’s theme was “New and old approaches to history research and other humanities in the digital age”. The presentations covered many different eras, methods, and sources, ranging from author recognition in Ancient Rome to computational analysis of recent Brexit debates in the UK Parliament. In this blog post we discuss our own seminar presentation and some common themes that were most relevant to our READ project.
 
Our presentation was about the upcoming user interface for the text-recognised material. Right now we actually have two interfaces. The first is an indexed search engine developed by Universitat Politècnica de València, with about 130,000 images of the Renovated District Court Records. It has been used to test the search engine, and the search results are already good. However, as the user interface was not designed specifically to display these kinds of results, there is still work to be done to make the search results more user-friendly. The second user interface, designed mainly by Berthold Ulreich, will address this issue and will become the basis for the National Archives’ customer interface. The interfaces are not yet in public use, but the engines were already seen as promising, and the seminar participants were eager to see and use the upcoming interface.

Search engine by Universitat Politècnica de València. A search example, filtered to cover the Ilmajoki jurisdiction, year 1864. In this case, there are 68 pages that include the word “förmyndare” (guardian) at least once.


Search engine by Berthold Ulreich. A search example of “Nilsdotter” (Patronymic, “daughter of Nils”). The search engine has found 426 pages where Nilsdotter is mentioned at least once. Results can be filtered and sorted in different ways. In this example the user interface language is Finnish. The final search engine will also be available in English and Swedish.


Search engine by Berthold Ulreich. A search example of “syytinki” (traditional life-annuity) in the Janakkala jurisdiction, year 1868. In this case the search engine highlights the results on the page and you can also compare the text with the transcription.
 

A couple of themes came up in many of the seminar presentations and are especially relevant to our READ project. Firstly, a notion shared across the presentations was that digital history projects, if properly planned, may offer good opportunities for interdisciplinary cooperation. Besides cooperation between, for example, historians and data scientists, cooperation between academia and the GLAM (galleries, libraries, archives, museums) sector is important.

In digital humanities, new skills are required and new tools are used. This does not mean that traditional history research skills are no longer needed, as the machines and data scientists cannot produce useful results alone. Most of the project work is still based on source criticism and on contextualising the material in different ways, and in this process the skills of the historian are most valuable. This has been very relevant in the READ project, where the National Archives has the most knowledge about the source material and the Finnish historical context. As pointed out, for example, in the earlier blog post, the resources may be digital but the skills needed are traditional humanities methods. Of course, we would not have got anywhere without the technology and data science skills of our collaborators around Europe.

A second common theme was that harmonising the data takes a lot of work in digital humanities projects. Digital tools do not mean that the results will arrive by simply pushing a button. This is also true in our READ project, where a considerable number of human work hours are needed to produce and check the text transcriptions in order to get a useful HTR (handwritten text recognition) model.

The final presentation of the seminar was about new materials for learning old handwriting. Teaching old handwriting is a challenge for many history departments. If students no longer want to learn old handwriting, or are no longer able to, research may drift towards other kinds of primary sources.

Is handwritten text recognition technology then a possibility or a threat? One could think that if a text recognition tool like Transkribus works “too well”, nobody will bother to learn old handwriting anymore, but we think it’s unlikely that the technology will become so good that humans with knowledge of old handwriting are no longer needed. In fact, we think that a platform such as Transkribus is a useful tool for learning old handwriting. The digital tools help in the transcription process, but it still takes a lot of human brain work. It is also an important motivating factor to know that your transcription is used in teaching the machine and will be part of a real corpus.

Overall, the READ project was seen as a promising initiative by many participants, and was even considered a “New Hope” in the field of historical subject modelling, where researchers have previously been forced to process only digitised printed sources. We’ll try to continue to bring hope to this galaxy. One of the next steps is to release the court book search engines described above to the general public.

Sampo Viiri & Ville-Pekka Kääriäinen
