Forensic Linguistics in Armenian Studies?
Yesterday I took to Twitter to ask for leads on variables that can be used to identify the authors of anonymous texts. I had in mind some works attributed to Yeghishe, so I was interested in informative linguistic (rather than chirographic) variables that have been used successfully in previous quantitative studies.
Thanks to some helpful responses, I learned that there has been over a century of quantitative research on the problem of determining authorship. These studies now often fall under the labels “authorship identification”, “authorship attribution” or “stylometry”. Other keywords include “authorship detection”, “authorship verification”, “author profiling” and “computational stylistics”.
The physicist Thomas Mendenhall first took a statistical approach to this problem in his 1887 article, “The Characteristic Curves of Composition”. Early variables of interest included word and sentence lengths, word frequencies, type-token ratios and n-grams. Much research has followed since then, and in recent years machine learning methods have increasingly been applied to authorship identification problems (more on this below).
These approaches may also help clarify long-standing mysteries about the authorship of some Classical Armenian texts, including works of disputed authorship, works by anonymous translators, and suspicious passages in histories by known authors that are thought to have been tampered with. In a future post, I will list some outstanding questions, together with the texts.
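To make the early variables mentioned above more concrete, here is a minimal Python sketch that computes a few of them for a single text: mean word and sentence length, a Mendenhall-style distribution of word lengths, word frequencies, the type-token ratio and word n-grams. The function names (`tokenize`, `split_sentences`, `stylometric_features`) are my own illustrative choices and do not come from any of the resources listed in this post. The sentence delimiters, which include the Armenian full stop (U+0589), are also an assumption; real Classical Armenian texts would likely need more careful tokenization and normalization.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its words (runs of Unicode letters)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def split_sentences(text):
    """Naive sentence splitter. The delimiter set is an assumption:
    '.', '!', '?' and the Armenian full stop (U+0589)."""
    parts = re.split(r"[.!?\u0589]+", text)
    # Keep only the parts that contain at least one word.
    return [p for p in parts if tokenize(p)]


def stylometric_features(text, n=2):
    """Compute a few classic stylometric variables for one text."""
    words = tokenize(text)
    if not words:
        raise ValueError("text contains no words")
    sentences = split_sentences(text)

    # Share of words having each length (a 'characteristic curve').
    length_counts = Counter(len(w) for w in words)
    word_length_curve = {
        length: count / len(words)
        for length, count in sorted(length_counts.items())
    }

    # Word n-grams as tuples of n consecutive words.
    word_ngrams = Counter(zip(*(words[i:] for i in range(n))))

    return {
        "mean_word_length": sum(len(w) for w in words) / len(words),
        "mean_sentence_length": len(words) / len(sentences),
        "word_length_curve": word_length_curve,
        "word_frequencies": Counter(words),
        "type_token_ratio": len(set(words)) / len(words),
        "word_ngrams": word_ngrams,
    }


# Usage on a toy English sample (hypothetical data, not a real corpus).
features = stylometric_features("The cat sat. The dog ran!")
print(features["type_token_ratio"])
print(features["word_frequencies"].most_common(3))
```

In an actual attribution study, features like these would be computed for texts of known authorship and compared against the disputed text; the resources below walk through ways of doing that comparison.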
Here, I would like to share a selection of hands-on introductory resources on authorship identification. I hope these will inspire those who have both a computational background and an interest in history, linguistics and/or Armenian Studies to apply such methods to Classical Armenian texts.
—
The following is a selection of practical introductory resources for authorship identification:
[Text] Revisiting the Disputed Federalist Papers: Historical Forensics with the Chaos Game Representation and AI (Mathematica). See also: Authorship Attribution Using the Chaos Game Representation [arXiv], with source code.
[Text] Introduction to Stylometry with Python (Programming Historian).
[Text] Authorship Detection with Machine Learning Lab (US Naval Academy).
[Text] Author Identification Lab (US Naval Academy).
[Text] Authorship Attribution using Machine Learning (GitHub).
[Text] Attributing Authorship with Stylometry (from Real-World Python).
[Text] Authorship Identification Using Neural Networks (Mathematica).
[Text] Automated Authorship Verification (Mathematica).
[Video] Computational Approaches to Authorship Attribution in a Corpus of 12th century Latin texts (Mathematica).