Balancing the diverse page layouts of medieval psalm manuscripts with the strict logical structure and clean brackets of an XML document is a recurring challenge in my work. Many questions and problems only arise once I begin to transcribe a particular manuscript for the Media History of the Psalms project.
Diversity in appearance
Anyone who has worked with manuscripts from the late medieval period will know how diverse they are in appearance. They differ greatly in layout (one or two columns per page), book decoration (initials, rubrics or underlining), the degree of care taken (corrections and deletions), writing conventions (virgules and frequency of abbreviations) and, last but not least, language (the language of writing and its diction). During transcription, these characteristics are encoded as well, using XML (Extensible Markup Language) as the grammar and TEI (Text Encoding Initiative) as the dictionary of elements.
An XML document is built from elements nested in layers. These elements can be filled with content, in this case the text of the manuscripts, and nesting can go as many levels deep as required. The function or meaning of an element is defined by its tag: the element name is written in angle brackets as a start tag (for example <ab>) and an end tag (</ab>), and the content sits between them. It is these bracketed tags that carry the encoding, or markup.
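To make the idea of start tags, content, end tags and nesting levels concrete, here is a minimal sketch using Python's standard-library xml.etree.ElementTree. The sample document is a toy built from TEI element names, not project data, and the depth function is a hypothetical helper written for illustration.

```python
import xml.etree.ElementTree as ET

# Toy document (not project data): each element has a start tag, content
# and an end tag, and elements may be nested inside one another.
SAMPLE = "<text><body><ab>Outer text <hi>nested text</hi></ab></body></text>"

def max_depth(elem, depth=1):
    """Return the number of nesting levels from elem down to its deepest child."""
    return max([depth] + [max_depth(child, depth + 1) for child in elem])

root = ET.fromstring(SAMPLE)
for elem in root.iter():
    # Show each element as start tag ... end tag, with its leading text content.
    print(f"<{elem.tag}> ... </{elem.tag}>  content: {(elem.text or '').strip()!r}")
print("nesting depth:", max_depth(root))  # text > body > ab > hi
```

Each additional layer of features in a transcription adds to this depth, which is why the manuscript markup discussed below grows so much deeper than the modern edition.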
Psalm 18.5 from the Luther Bible (Stuttgart, German Bible Society, 1999) can be used to demonstrate what TEI-compliant encoding of a modern edition of the psalms looks like:
If this psalm from the Luther Bible were encoded as part of the Media History of the Psalms project, verse 5 would look like this:
The nesting described above can be seen here. <ab> serves as the root element, a container holding the elements that follow. Because our project examines and compares the structures of psalms, typographic and structural features are marked up with attributes: the superscript verse number 5 (attribute: rend="sup") and the paragraph mark in bold print (attribute: rend="bold"). In this case, the extent of nesting remains manageable.
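The following sketch illustrates how features recorded in rend attributes can be read back out of such a verse. It assumes that the attributes sit on TEI <hi> elements inside <ab>; the bracketed text is a placeholder, and none of this reproduces the project's actual markup, which is shown only in the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical shape only: bracketed text is a placeholder, and carrying
# @rend on TEI <hi> is an assumption, not the project's published markup.
VERSE = (
    '<ab><hi rend="sup">5</hi> '
    '<hi rend="bold">[paragraph mark]</hi> [verse text]</ab>'
)

def rendered_features(root):
    """List (rend value, text) pairs for every element carrying @rend."""
    return [(e.get("rend"), e.text) for e in root.iter() if e.get("rend")]

root = ET.fromstring(VERSE)
print(root.tag)                 # the outer container element: ab
print(rendered_features(root))  # one pair per marked-up typographic feature
```

Keeping typography in attributes rather than in element names means the same few elements can describe many editions, and the features stay comparable across them.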
The markup of Psalm 18.5 from the manuscript shown below (Wolfenbüttel, Herzog August Bibliothek, Cod. Guelf. 81.10 Aug. 2°) produces a rather different result.
This manuscript is a bilingual 15th-century psalter. Each psalm is preceded by a commentary in Middle Low German, and every psalm verse is given both in Latin and in a Middle Low German translation or interpretation. The scribe and rubricator structured the text using the methods usual at the time: large red Lombard initials mark the beginning of a new section (here, a new commentary), rubricated capitals subdivide the continuous text (here, the psalm verses), and certain passages are underlined in red for emphasis (here, the Latin verse). The beginning of verse 18.5 (in Latin) looks like this in the manuscript:
By contrast, the markup of Psalm 18.5 from this manuscript looks like this:
Besides the length, the depth and extent of the nesting are striking. This results both from the structure predetermined by the manuscript and from the rules of XML itself. Elements must be nested consistently: every element opened inside another element must also be closed inside it. Documents in which elements conflict or overlap are not well-formed and cannot be processed further.
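The nesting rule can be checked mechanically: a standard XML parser rejects any document in which one element is closed while a child opened inside it is still open. This minimal sketch uses Python's xml.etree.ElementTree; the two snippets are invented for illustration, and is_well_formed is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Properly nested: <hi> is opened and closed inside <ab>.
nested = '<ab>before <hi rend="underline">underlined <lb/>text</hi> after</ab>'

# Overlapping: <hi> is opened inside <seg> but only closed after </seg>.
overlapping = '<ab><seg>verse one <hi rend="underline">underlined</seg> text</hi></ab>'

print(is_well_formed(nested))       # True
print(is_well_formed(overlapping))  # False
```

Because the parser stops at the first mismatched end tag, a single overlap anywhere in a long transcription makes the whole file unusable for further processing.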
Encoding abbreviations correctly presents a particular challenge. The very first word of the text has four characteristics that need to be expressed in markup: the rubricated capital letter, the hyphenation at the end of the line, the continuous red underlining and the abbreviations. However, once you have understood the fundamental principle of an XML document (an element opened inside a parent element must also be closed inside that parent), the more detailed structure becomes much easier to follow.
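The sketch below layers four such features on one word, each nested strictly inside the next. The word is a hypothetical example (the common abbreviation "dñe" for "domine"), not the first word of the manuscript, and the rend values are assumptions. It uses the TEI elements <choice>, <abbr>, <expan>, <ex> and <lb break="no"/> and shows how either the diplomatic or the expanded reading can be extracted from the same markup; the reading function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical word "dñe" = "domine"; rend values are assumed, not the project's.
WORD = (
    '<w>'
    '<hi rend="underline red">'   # continuous red underlining
    '<choice>'                    # abbreviation paired with its expansion
    '<abbr><hi rend="rubricated">D</hi>ñ<lb break="no"/>e</abbr>'
    '<expan><hi rend="rubricated">D</hi><ex>omi</ex>n<lb break="no"/>e</expan>'
    '</choice>'
    '</hi>'
    '</w>'
)

def reading(elem, keep):
    """Return elem's text, keeping only the <abbr> or <expan> branch of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "choice":
            for option in child:
                if option.tag == keep:
                    parts.append(reading(option, keep))
        else:
            parts.append(reading(child, keep))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

root = ET.fromstring(WORD)
print(reading(root, "abbr"))   # diplomatic reading: Dñe
print(reading(root, "expan"))  # expanded reading: Domine
```

The line break sits inside both branches of the choice, so it is preserved whichever reading is chosen; placing it outside <choice> would split the abbreviation and break the nesting.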
This example clearly demonstrates how difficult it can be to transfer medieval script into well-formed XML structures. The situation becomes even more complicated when additions that are structurally complex, or that conflict with existing structures, have to be accounted for. In medieval manuscripts, for example, comments supplementing the content are often found in the margins. These may run over several lines, are often in a different hand, and relate to passages of text that extend over more than one column or page.
Hierarchical nesting is pushed to its limits here; however, the TEI Guidelines propose solutions for issues such as overlapping structures (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/NH.html). Whilst these guidelines do not specifically address the phenomena of medieval texts or their encoding, the different structural phenomena of late medieval manuscripts can be marked up using existing elements and attributes. This does mean, however, that the available markup options have to be analysed continually and that elements and attributes sometimes need to be applied flexibly.
Through the transcription and encoding of different late medieval manuscripts, guidelines have now been developed for the Media History of the Psalms project that set out how, and by which rules, markup is to be carried out. At the end of the project, these guidelines, which document encoding solutions for numerous phenomena, together with an efficient markup system for transcription that is still being developed, will provide a well-founded tool for projects with a similar focus on capturing and handling text. Many encoding issues have already arisen, and more are sure to come up. Future research projects will not have to resolve these issues from scratch; they will be able to adopt, apply and build on our findings.
Hanne Grießmann (M.A.) is a research assistant on the ‘Media History of the Psalms’ project led by associate professor Dr. Ursula Kundert. It is a sub-project of the Wolfenbüttel Text and Frame research project. Grießmann is entrusted with the transcription and encoding of Middle Low German, Early New High German and Latin psalms.