Greetings from Jyväskylä, where we took part in the Kustaa Vaasa seminar on 12-13 June
2019. The annual seminar is organised by the Department of History and
Ethnology of the University of Jyväskylä. This year’s theme was “New and old
approaches to history research and other humanities in the digital age”. The
interesting presentations covered many different eras, methods, and sources,
ranging from author recognition in Ancient Rome to computational analysis of
recent Brexit debates in the UK Parliament. In this blog post we discuss our
seminar presentation and also some common themes that were most relevant to our
READ project.
Our presentation was about the upcoming user interface for the text-recognised
material. Right now we actually have two interfaces. The first is an indexed
search engine developed by Universitat Politècnica de València, covering about
130,000 images of the Renovated District Court Records. This has been used to
test the search engine, and the search results are already good. However, as the
user interface was not designed specifically to display this kind of result,
there is still work to be done to make the search results more user-friendly.
The second user
interface, designed mainly by Berthold Ulreich, will address this issue and
will become the base for the National Archives’ customer interface. The
interfaces are not yet in public use but the engines were already seen as
promising, and the seminar participants were eager to see and use the upcoming
interface.
A couple of themes came up in many of the seminar presentations and are also
especially relevant to our READ project. Firstly, a notion shared across the
presentations was that, if properly planned, digital history projects may offer
good opportunities for interdisciplinary cooperation. Besides cooperation
between, for example, historians and data scientists, cooperation between
academia and the GLAM (galleries, libraries, archives, museums) sector is
important.
In digital humanities, new skills are required and new tools are used. This does not mean
that traditional history research skills are no longer needed, as the machines
and data scientists cannot produce useful results alone. Most of the project work
is still based on source criticism and contextualising the material in
different ways, and in this process the skills of the historian are most
valuable. This has been very relevant in the READ project, where the National
Archives has the most knowledge about the source material and the Finnish
historical context. As pointed out, for example, in the earlier blog post, the
resources may be digital but
the skills needed are traditional humanities methods. However, of course, we
would not have got anywhere without the technology and data science skills of
our collaborators around Europe.
The second common theme was that harmonising the data takes a lot of work
in digital humanities projects. Digital tools do not mean that the results will
arrive by simply pushing a button. This is also true in our READ project, where
a considerable number of human work hours are needed to produce and check the
text transcriptions in order to get a useful HTR model.
The final presentation of the seminar was about new materials for learning old
handwriting. Teaching old handwriting is a challenge for many history
departments. If students no longer want to learn old handwriting, or no longer
can, research may drift towards other kinds of primary sources.
Is handwritten text recognition technology then an opportunity or a threat? One
could think that if a text recognition tool like Transkribus works “too well”,
nobody will bother to learn old handwriting anymore, but we think it is unlikely
that the technology will become so good that humans with knowledge of old
handwriting are no longer needed. In fact, we think that the Transkribus
platform, for example, is a useful tool for learning old handwriting. The
digital tools help in the transcription process, but it still takes a lot of
human brain work. Also, it is an important motivating factor to know that your
transcription is used in teaching the machine and will be part of a real corpus.
Overall, the READ project was seen as a promising initiative by many
participants, and even considered a “New Hope” in the field of historical
subject modelling, where researchers have previously been limited to processing
digitised printed sources. We’ll try to continue to bring hope to this galaxy.
One of the next steps is to release the court book search engines mentioned
above to the general public.
Sampo Viiri & Ville-Pekka Kääriäinen