Can Machines Read Old Manuscripts? Bilingual Cyrillic Texts as a Case Study

On Wednesday, 4 March, our colleague Constanța Burlacu will give a lecture titled “Can Machines Read Old Manuscripts? Bilingual Cyrillic Texts as a Case Study”.

The lecture will begin at 17:30 as part of the seminar “Digital Humanities in Medieval Studies” at the Centre for Medieval Studies (Centrum medievistických studií) in Prague (Jilská 1, 1st floor).

Abstract: This paper examines the application of Handwritten Text Recognition (HTR) technologies to bilingual Cyrillic manuscripts from the Romanian territories, focusing on sixteenth-century materials written in Church Slavonic and Old Romanian. These sources present a particularly complex test case for machine-assisted reading: they combine multiple languages within a shared script, exhibit unstable orthographic conventions, and reflect layered practices of translation, revision, and scribal adaptation. Historically approached through diplomatic transcription and interpretative editing, such materials raise fundamental questions about the relationship between script, language, and textual authority, questions that become newly urgent in the context of automated processing.

Drawing on the development of custom Transkribus models trained on early Romanian and Slavonic sources, the paper discusses the methodological choices involved in preparing, segmenting, and encoding these texts for HTR workflows. Particular attention is paid to the challenges posed by bilingual switching, superscript letters, abbreviation systems, and graphic variation across scribal hands. The results demonstrate that while current models can achieve promising Character Error Rates, accuracy alone does not equate to understanding. The machine “reads” graphemic patterns, but philological interpretation (including questions of redaction, translation technique, and linguistic stratification) remains a human task.

Rather than framing HTR as a replacement for traditional scholarship, this study argues that digital tools can reshape the epistemology of philology.