Nan Z. Da’s recent article in Critical Inquiry is the latest salvo in the endless digital humanities culture wars, a sub-section of the humanities in which we use computers and see how long we can sustain the same conversation we’ve been having since 1985. There’s a lot in Da’s piece that’s true. I have my own list of problems with what Da refers to as computational literary studies (CLS), and while a lot of it corresponds with Da’s, enough of it is sufficiently different that I felt motivated to write this far from exhaustive response. I’ll leave the exhaustive one to the people Da goes after.
So the notions that the field is plagued by insufficient bootstrapping and by ahistoricism, the shortcomings inherent to the analysis of hundreds of thousands of novels (speaking for myself, I’m totally uninterested in reading literary criticism written by someone who has not read the novels they’re quantifying), the suspect nature of network visualisations, topic modelling and reductive notions of influence, which are definitely pervasive within the field, and the dubiousness of mark-up are all well considered and, I think, broadly correct. I think the fact that so much ink is spilled about digital humanities within higher education publications has much to do with the messianic tone in which stylometrists such as Moretti presented their work in the past. This is a problem the field has to account for, and if we were in need of another reason to stop reading Moretti, the lack of robustness in his quantitative work (if it could even be called that) would definitely be one. As Da writes in the piece’s opening paragraph, CLS can often seem tautological. The absence of an intermediary scale between the macro at which the text is analysed and the micro at which it is read, through which we might bring the two into a meaningful relation, seems to me quite real; I’m sure a lot of people reading this are familiar with the bathetic tone of many stylometric publications’ ‘Results and Discussion’ sections. The only autonomously interesting thing I’ve ever turned up in my own analyses is that the chapters in which Joyce introduces a woman narrator cluster with the very early sections of A Portrait of the Artist as a Young Man, suggesting that Joyce writes his women characters in much the same way as he writes young children.
Beyond that, a medical humanities study run partially out of UCD used supervised topic modelling to analyse a large corpus of British medical journals, in a bid to provide some historical provenance to the anti-vax scare, and discovered that the language associated with disease is heavily inflected by race, and that the primary terms in which disease was conceived of were racial. (Yes, we probably didn’t need an algorithm to tell us that the British are racist or that Joyce’s representations of women are an embarrassment, but I thought these were interesting.)
The first thing I’d dissent from Da on outright is her failure to consider the gap between authorship attribution and stylometry, which is the difference between forensic attribution studies and more exploratory approaches. This might seem like a hedge (‘I don’t have to be rigorous if I say I don’t have to be’), but it has been the most straightforward means through which I’ve gotten away from the tyranny of replication and towards a literary criticism without a narrowly conceived utility. There is also a notable lack of consideration of the implications of Burrows’ Delta method; the way Da describes it, it would seem as though the fixation on vocabulary was a totally arbitrary decision, but in fact it was adopted because of how robust it proved as a measure of authorship. I’ll put some of the articles in the works cited below, which will give an indication of just how successful Delta was; in short, it demonstrated that the relative frequencies of even a small sample of the most common words in a corpus tend in a highly significant direction from author to author, and that authorial style is absolutely rooted in the relative frequencies with which these words are deployed. This might have helped focus Da’s consideration of the value of word frequencies in Shakespeare’s plays, for instance. I absolutely agree that on their own terms they’re insufficient; they need to be paired with historicisation, context, the state of the art and, most critically, sensitive close readings, and many literary critics might think there are more direct routes, which I completely understand.
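For readers who haven’t encountered it, Delta is simple to sketch: take the most frequent words across a corpus, convert each text’s relative frequencies for those words into z-scores, and score a disputed text by the mean absolute difference between its z-scores and each candidate author’s. Here is a minimal Python illustration; the toy corpus, the disputed sentence and the five-word vocabulary are all invented for the example, and real studies use hundreds of most-frequent words over much larger samples:

```python
from collections import Counter
import math

# Invented toy corpus for illustration only.
corpus = {
    "author_A": "the cat sat on the mat and the dog sat on the rug",
    "author_B": "a storm rose over a sea and a ship sank in a gale",
}
disputed = "the bird sat on the fence and the cat sat on a wall"

def rel_freqs(text, vocab):
    """Relative frequency of each vocabulary word in the text."""
    words = text.split()
    counts = Counter(words)
    return [counts[w] / len(words) for w in vocab]

# 1. Vocabulary: the most frequent words of the whole corpus.
vocab = [w for w, _ in Counter(" ".join(corpus.values()).split()).most_common(5)]

# 2. Per-author frequency profiles, then mean and standard deviation per word.
profiles = {a: rel_freqs(t, vocab) for a, t in corpus.items()}
means = [sum(p[i] for p in profiles.values()) / len(profiles)
         for i in range(len(vocab))]
stds = [math.sqrt(sum((p[i] - means[i]) ** 2 for p in profiles.values()) / len(profiles))
        for i in range(len(vocab))]

def zscores(freqs):
    """Standardise frequencies against the corpus-wide mean and deviation."""
    return [(f - m) / s if s else 0.0 for f, m, s in zip(freqs, means, stds)]

# 3. Delta: mean absolute difference between the disputed text's z-scores
#    and each candidate author's. Lower means stylistically closer.
test_z = zscores(rel_freqs(disputed, vocab))
deltas = {a: sum(abs(x - y) for x, y in zip(zscores(p), test_z)) / len(vocab)
          for a, p in profiles.items()}
print(min(deltas, key=deltas.get))  # prints "author_A"
```

The z-scoring is the important design choice: it stops the single most frequent word (almost always ‘the’ in English) from swamping the distance, which is part of why the measure proved so robust across corpora.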
Replicability is the point at which I think more structural concerns need to be introduced. Just from my own anecdotal experience across a handful of Irish institutions, conversations at conferences, etc., I really do have to say the notion of stylometry as some kind of cash cow is vastly overstated. A lot of the articles cited to this effect are the products of individual and intra-institutional score-settling. I’d like to see some actual statistics demonstrating that the p-values (lol) really do come in under the 5% threshold, and that extravagantly funded literary labs are cropping up at anything like a rate which could be considered statistically significant.
I am a bit more worried, for example, about the use of text analysis within political science. There’s no shortage of publications out there attempting to collapse the distinction between Nicolás Maduro and Donald Trump as political actors, and again, just in my opinion, computational literary criticism gets a disproportionate amount of flak considering how successful the anti-democratic project of promoting ‘populism’ as a category has been in Natural Language Processing. This is why the cheap rhetorical connections of quantitative literary criticism with the NSA or Amazon are so irritating: where is the awareness of the material facts of the economics into which our universities are locked? How many graduate students on your campus are being funded by private companies to demonstrate the health benefits of a noodle brand or a skin cream? (These are real examples.) Does your university have investments in weapons manufacturing, or in cigarette companies? Ethical critiques are all well and good, but I think we need to start somewhere more fundamental than ‘you are reproducing hegemonic forms of knowledge-production’. Just in my opinion, a lot of the talk of the digital humanities as vanguard of neoliberalisation represents a means for the humanities to wash their hands of all responsibility in the contemporary decimation of the university qua educational institution.
Finally, the closing arguments about how people just invent means of measuring things and then talk about them in roundabout ways don’t, for me, meaningfully differentiate stylometry from the rest of academia.
Works on Delta
Argamon, S. “Interpreting Burrows’s Delta: Geometric and Probabilistic Foundations.” Literary and Linguistic Computing 23.2 (2007): 131–147. Web.
Burrows, J. “All the Way Through: Testing for Authorship in Different Frequency Strata.” Literary and Linguistic Computing 22.1 (2007): 27–47. Web.
—. “Questions of Authorship: Attribution and Beyond: a Lecture Delivered on the Occasion of the Roberto Busa Award ACH-ALLC 2001, New York.” Computers and the Humanities 37.1 (2003): 5–32. Print.
—. “‘Delta’: a Measure of Stylistic Difference and a Guide to Likely Authorship.” Literary and Linguistic Computing 17.3 (2002): 267–287. Print.
Eder, Maciej. “Visualization in Stylometry: Cluster Analysis Using Networks.” Digital Scholarship in the Humanities 32.1 (2017): 50–64. Web.
—, and Jan Rybicki. “Do Birds of a Feather Really Flock Together, or How to Choose Training Samples for Authorship Attribution.” Literary and Linguistic Computing 28.2 (2013): 229–236. Web.
Elliott, Jack. “Whole Genre Sequencing.” Digital Scholarship in the Humanities 32.1 (2017): 65–79. Web.
Evert, Stefan et al. “Understanding and Explaining Delta Measures for Authorship Attribution.” Digital Scholarship in the Humanities 32.suppl_2 (2017): ii4–ii16. Web.
Hoover, David L. “Quantitative Analysis and Literary Studies.” A Companion to Digital Literary Studies. Ed. Ray Siemens and Susan Schreibman. Oxford: Blackwell, 2008. 1–12. Print.
—. “The Microanalysis of Style Variation.” Digital Scholarship in the Humanities 32.suppl_2 (2017): ii17–ii30. Web.
Ilsemann, Hartmut. “Forensic Stylometry.” Digital Scholarship in the Humanities 17.3 (2018): 267–15. Web.
Jannidis, Fotis et al. “Improving Burrows’ Delta — an Empirical Evaluation of Text Distance Measures.” 2015. Print.
Rybicki, Jan. “Vive La Différence: Tracing the (Authorial) Gender Signal by Multivariate Analysis of Word Frequencies.” Digital Scholarship in the Humanities 31.4 (2016): 746–761. Web.
—, and Maciej Eder. “Deeper Delta Across Genres and Languages: Do We Really Need the Most Frequent Words?.” Literary and Linguistic Computing 26.3 (2011): 315–321. Web.
Smith, Peter W H, and W Aldridge. “Improving Authorship Attribution: Optimizing Burrows’ Delta Method.” Journal of Quantitative Linguistics 18.1 (2011): 63–88. Web.