Owing to its status as the official language of the Sasanian Empire (3rd–7th centuries CE), Middle Persian played a prominent transcultural role from late antiquity to the early Islamic period. Besides serving as a medium of communication between the linguistically and culturally diverse areas of Western and Eastern Iran, it was employed as a literary language by several religious traditions.
The Middle Persian corpus consists of Zoroastrian, Manichaean, narrative, administrative and epigraphic sub-corpora. The Zoroastrian Middle Persian corpus is by far the largest of these and has so far been only partially examined. Our project therefore aims to create a digital corpus of Zoroastrian Middle Persian texts in Pahlavi script (in short: Pahlavi texts) and to develop a comprehensive dictionary covering that corpus.
The corpus consists of roughly 50 texts, containing some 705,000 tokens. As its point of departure, the project relies on the ca. 20 oldest Pahlavi codices (from the 13th–17th centuries CE). The texts contained in those codices will be supplied with several layers of annotation:
Within the overall structure of the project, the corpus and dictionary function as closely interrelated analytical tools even though their focal points differ (syntax vs. semantics). Their technical integration also plays an essential role in the general workflow of the project. For this purpose, the project employs a web-based working environment that facilitates collaborative work on the corpus and dictionary. The same environment will also provide a public user interface for enquiries into those resources that have already been processed at a given point in time.
The project intends to create a platform that may in the future also be used for the remaining Middle Persian sub-corpora, as well as for the corpora of other Middle Iranian languages. It is thus conducted with a view to the eventual creation of a complete dictionary of Middle Persian (incorporating all of its sub-corpora) as well as to an expansion of the corpus into the domains of other Middle Iranian languages.
The project is a cooperation between three universities: Ruhr-University Bochum, Freie Universität Berlin and the University of Cologne. It is funded by the Deutsche Forschungsgemeinschaft (DFG) and designed as a “Langfristvorhaben” (Long-Term Project, 2021–2030).