Tag Archives: Digital Humanities

Marx, Agamben, literary history and Underwood

I think Ted Underwood’s Distant Horizons is probably the first book about computational literary studies or stylometry that I would feel confident recommending to anyone interested in literature, and not just to other grad students who analyse it via stats. The findings are interesting on their own terms, not just because the book advances promising lines of flight for the discipline, though it does that too.

The pace of change in detective versus sci-fi writing

Some of Underwood’s most interesting findings include i) the notion that the novel is a generic excrescence from biography, with a preference for physical description and sense-perception at the expense of more intricate and conceptual language, ii) that detective fiction is a far more coherent genre across time than science fiction is and iii) that segmented gender roles have become increasingly difficult to identify in fiction over the past two hundred years, and that instances in which these divisions are maintained are primarily within texts written by men. By rendering these findings in the form of headlines I omit the clarifiers to which they are subject and the methods through which they were devised, and I thereby do a disservice to Underwood’s work. I can only recommend that you take the time to read it yourself, as this post is more invested in taking up the notion of periodisation in his work by way of his previous book, Why Literary Periods Mattered.

The words which correlate with fiction (left) and those which correlate with biography (right)

If Underwood’s two books share a common ground, it is their challenge to the terms by which literary criticism, and literary history specifically, operates. His first book provides a short history of the various institutional, historical and national interests served by the survey course, the modular approach in which literary history is rationalised and doled out into discrete eras or temporally bound discourses such as Elizabethan, Victorian, modernist etc.

The bifurcation of fiction + biography

Underwood relates the institutionalisation of these periods in university curricula to the concerns of nineteenth-century historiography, and shows how they sustained themselves via the pre-requisites of what Gramsci would have referred to as the cultural imperatives of Fordism. Underwood does so by way of Gellner, arguing that they formed an apparatus within the nation state’s cultural legitimation: a heterogeneous but shared literary history underlain by an essential deep structure provides the foundation on which a collective identity may be founded.

Periodisation is therefore enormously productive, not least because it allows us to render literary history intelligible to undergraduates within increasingly industrialised universities, but it undermines our research in a number of ways. The account Underwood presents of modernist scholars nuancing what it is that makes their subject area unique versus that of their Victorian colleagues (the birth of the individual, the shifting modalities of industrialisation, the growing gulf between the rural and the urban) rings true, and it is in this spirit that Underwood proposes his own longue durée model of literary history, where these changes are subsumed within much broader histories otherwise imperceptible to scholars used to focusing on quite narrow sections of the literary timeline. This brings us to one of the more engaging findings in the book that I mentioned earlier: Underwood’s analysis does not identify the classic rise of the novel in the eighteenth century, the achievement of its classical apex in the nineteenth and its explosion in the early years of the twentieth, but rather a differentiation from non-fiction and biography along a far longer time frame. None of this necessarily invalidates the models which have been erected upon this schema; as Underwood notes, close reading needs to be a part of any literary history and far exceeds the capacity of quantitative methods at a more fine-grained textual level.

Despite the fact that I also agree with Owen Hatherley’s contention that gradualist theories of cultural production, as against ones of breach and fissure, are quite boring, I am with Underwood on this, as any new re-conceptualisation of literary history does need to contend with the fact that discrete phases of time quite rarely represent decisive shifts, so much as what Lee Oser refers to as different stages in the digestion of the same metaphysic. Since coming around to stylometry myself, I’ve become more and more drawn to this notion of literary history. I nevertheless contend it leaves us with a number of other problems, which I will sketch out by way of Agamben.

I was struck, when reading Agamben over the past few weeks, by just how distinct his notion of temporality is when compared to that of literary criticism. It seems philosophy has far more elastic temporal boundaries as a discipline. I wouldn’t be the first to criticise Agamben for speaking in broad and idealistic terms about particular intellectual trends which were fashionable or, insofar as we can tell, dominant in a particular conjuncture, and thereby taking them as representative. So we move from Aristotle to Kant to Hegel, Arendt, Heidegger, Benjamin, early modern painters and Foucault, with a tendency to subsume the whole of history, from the earliest of these thinkers to the last, into an Entire History in which germs of the same apparatuses and laws of capture are operating effectively in perpetuity. Agamben does allow some history to emerge here and there with regard to civil liberties since 9/11 and the securitisation of the neoliberal subject (the concentration of domestic policing is one development which is often missed within a contemporary account of economic forces), but these moments are few and far between; his notion begins with Foucault’s notion of governmentality and the instrumentalisation of the ‘mass’.

I could go on to wonder about the highly Western nature of these accounts and why the concentration camp became, for Agamben, the paradigmatic mode of being (if we accept this, and I’m not sure we should, surely we should be starting with imperialism, the training ground for any given security state, or Prussia, rather than the civic/political divide instantiated in Aristotle’s philosophy), or about how Agamben reminds me of Arrighi’s critique of Gramsci, which said that between coercion and consent Gramsci never contends with the kind of real social power the capitalist class has access to by virtue of its control over the means of payment, and that this state of exception is in fact a disguised norm. I sort of already have, but I won’t for much longer. Suffice it to say that Agamben is fairly dead set on rendering this as a logical rather than an historical argument, which is annoying. As opposed to three diagrams of two circles overlapping to varying extents, you could just tell me about the British Empire.

There’s something in Agamben’s approach that touches off some of Underwood’s institutional history of comparative literature (which I hope to see him return to; there’s the germ here of a far more substantial and lengthy study of English studies pedagogy and material interest that I would really relish reading), namely his tendency towards perennialisation, which runs parallel, I think, to critiques of new modernist studies such as the one authored by Gayle Rogers. The earliest I’ve yet heard the emergence of modernist form put back to is now the fourteenth century, whereas previously I would have understood it as a pre-war phenomenon, with some fragmentary bits of proto-modernism floating around Paris in the 1850s. As Rogers argues, modernism has accumulated some degree of cachet over the past number of years, and the identification of previously overlooked authors or movements as modernist, especially when they were quite avowedly not, has become sort of necessary for a new generation of grad students. To put it in more straightforward terms, there is a significant amount of pressure on researchers, in the pursuit of ever-shrinking amounts of grant money and well-paid positions, to re-invent the literary-historical wheel every time they write a book. This is why Underwood’s repudiation of Marxian historiography within this overall critique of periodisation, in the same clause in which Saint-Simon and Spengler (!!) are referred to, is so mystifying, as if Marx was of a kind with these two in erecting two historical moments on either side of a crevasse and never the twain shall meet.
It’s a particular bugbear of mine that people read Marx’s early and polemical writings as all about the grand and powerful dialectic bestriding the planet, when anyone who has taken the time to read his major works knows that if you are interested in how the old contains within it the seed of the new and the new the remnants of a transformed old, there is no one but no one you should be reading more attentively than Marx. It’s a bit of a shame that Underwood, who places machine learning at the centre of a new departure for quantitative literary studies precisely because of its capacity to open up computational logic to greater degrees of fuzziness and exploration, is more indebted to a notion of Marxism as stagist than to Marxism as a really good way of grounding and articulating the relationship between a concept and the material evidence available for it within particular contexts.

I author this more than slightly hectoring paragraph because Underwood is very aware that the most significant issues within literary criticism are structural, which is why his anecdote about a Google founder recommending teaching literary history in computer science departments rather than trying to develop the latter within English departments is so chilling; this is a properly convincing solution. If periodisation represents a problem, though, there needs to be some serious thought about how a posited solution doesn’t create further problems or pass over them as if they were not there. For one, in order for us and our various regimes of knowledge production to make it to the other end of the century, history is going to have to change quite quickly, everywhere. Even if gradualism is a more accurate approach, and machine learning is less messianic than many of the claims which were made for DH in its early days (which has definitely been, and remains, a problem), we should be wary of shedding some sense of the revolutionary altogether.

Some thoughts on Nan Z. Da’s ‘The Computational Case against Computational Literary Studies’ or, ‘Against Articles Beginning with the Word ‘Against’’

Nan Z. Da’s article in Critical Inquiry is the latest salvo in the endless digital humanities culture wars, a sub-section of the humanities in which we use computers and see how long we can sustain the same conversation we’ve been having since 1985. There’s a lot that Da writes that’s true. I have my own list of problems with what Da refers to as computational literary studies (CLS), and a lot of it corresponds with Da’s, but enough of it is sufficiently different that I felt motivated to write this far from exhaustive response. I’ll leave the exhaustive one to the people Da goes after.

So the notion that the field is plagued by insufficient amounts of bootstrapping, ahistoricism, the shortcomings inherent to the analysis of hundreds of thousands of novels (speaking for myself, I’m totally uninterested in reading literary criticism written by someone who has not read the novels they’re quantifying), the suspect nature of network visualisations and topic modelling, reductive notions of influence which are definitely pervasive within the field, and the dubiousness of mark-up: these points are all well considered and, I think, broadly correct. I think the fact that so much ink is spilled about digital humanities within higher education publications has much to do with the messianic tone in which stylometrists such as Moretti presented their work in the past. I think this is a problem which the field has to account for, and if we were in need of another reason to not read Moretti anymore, the lack of robustness in his quantitative work (if it could even be called that) would definitely be one. As Da writes in the piece’s opening paragraph, CLS can often seem tautological. The absence of an intermediary scale between the macro at which the text is analysed and the micro at which it is read, through which we might bring these two into a meaningful relation, seems to me to be a quite accurate charge; I’m sure a lot of people reading this are familiar with the bathetic tone of many stylometric publications’ ‘Results and Discussion’ sections. The only autonomously interesting thing I’ve ever turned up from my own analyses is that the chapters in which Joyce introduces a woman narrator cluster with the very early sections of A Portrait of the Artist as a Young Man, suggesting that Joyce writes his women characters in much the same way as he writes young children.
Otherwise, a medical humanities study being run partially out of UCD used supervised topic modelling to analyse a large corpus of British medical journals in a bid to provide some historical provenance to the anti-vax scare, and discovered that the language associated with disease is heavily inflected by race and that the primary means through which disease was conceived of was in racial terms. (Yes, we probably didn’t need an algorithm to tell us the British are racist or Joyce’s representations of women are an embarrassment, but I thought these were interesting.)

The first thing I’d dissent from Da outright on is the failure to consider the gap between authorship attribution and stylometry, this being the difference between forensic attribution studies and more exploratory approaches. Though this might seem like a hedge (‘I don’t have to be rigorous if I say I don’t have to be’), this has been the most straightforward means through which I’ve gotten away from the tyranny of replication and moved towards a literary criticism without a narrowly conceived utility. There is a notable lack of consideration of the implications of Burrows’ Delta method; the way Da describes it, it would seem as though the fixation on vocabulary was a totally arbitrary decision, but in fact it was adopted because of how robust it proved as a measure of authorship. I’ll put some articles in the works cited which will all give an indication of just how successful Delta was, but in short, it demonstrated that a relatively small sample of the words in a corpus all tend in a highly significant direction from author to author, and that authorial style is absolutely rooted in the relative frequencies with which these words are deployed. This might have helped focus Da’s consideration of the value of word frequencies in Shakespeare’s plays, for instance. I absolutely agree that on their own terms they’re insufficient; they need to be paired with historicisation, context, the state of the art and, most critically, sensitive close readings, and many literary critics might think there are more direct routes, which I completely understand.

Replicability is the point at which I think more structural concerns need to be introduced. Just from my own anecdotal experience between a handful of Irish institutions, conversations at conferences, etc. I really do have to say the notion of stylometry as some kind of cash cow is vastly overstated. A lot of the articles which are cited to this effect are the products of individual and intra-institutional score-settling. I’d like to see some actual statistics to demonstrate that the p-values (lol) are actually of an order of less than 5% chance and that extravagantly funded literary labs are cropping up at anything like a rate which could be considered statistically significant.

I am a bit more worried, for example, about the use of text analysis within the context of political science. There’s no shortage of publications out there which are attempting to collapse the distinction between Nicolas Maduro and Donald Trump as political actors and, again just in my opinion, I think computational literary criticism gets a disproportionate amount of flak considering how successful the anti-democratic project of promoting ‘populism’ as a category has been in Natural Language Processing. This is why the cheap rhetorical connections of quantitative literary criticism with the NSA or Amazon are so irritating: where is the awareness of the material facts of the economics into which our universities are locked? How many graduate students on your campus are being funded by private companies to demonstrate the health benefits of a noodle brand or skin cream? (These are real examples.) Does your university have investments in weapons manufacturing or cigarette companies? Ethical critiques are all well and good, but I think we need to start somewhere more fundamental than ‘you are reproducing hegemonic forms of knowledge-production’. Just in my opinion, I think a lot of the ‘digital humanities as vanguard of neoliberalisation’ argument represents a means for the humanities to wash their hands of all responsibility in the contemporary decimation of universities qua educational institution.

Finally, finally, the closing arguments about how people just invent means of measuring things and then talk about them in roundabout ways don’t meaningfully differentiate stylometry from the rest of academia for me.

Works on Delta

Argamon, S. “Interpreting Burrows’s Delta: Geometric and Probabilistic Foundations.” Literary and Linguistic Computing 23.2 (2007): 131–147. Web.

Burrows, J. “All the Way Through: Testing for Authorship in Different Frequency Strata.” Literary and Linguistic Computing 22.1 (2007): 27–47. Web.

---. “Questions of Authorship: Attribution and Beyond: a Lecture Delivered on the Occasion of the Roberto Busa Award ACH-ALLC 2001, New York.” Computers and the Humanities 37.1 (2003): 5–32. Print.

---. “‘Delta’: a Measure of Stylistic Difference and a Guide to Likely Authorship.” Literary and Linguistic Computing 17.3 (2002): 267–287. Print.

Eder, Maciej. “Visualization in Stylometry: Cluster Analysis Using Networks.” Digital Scholarship in the Humanities 32.1 (2017): 50–64. Web.

---, and Jan Rybicki. “Do Birds of a Feather Really Flock Together, or How to Choose Training Samples for Authorship Attribution.” Literary and Linguistic Computing 28.2 (2013): 229–236. Web.

Elliott, Jack. “Whole Genre Sequencing.” Digital Scholarship in the Humanities 32.1 (2017): 65–79. Web.

Evert, Stefan et al. “Understanding and Explaining Delta Measures for Authorship Attribution.” Digital Scholarship in the Humanities 32.suppl_2 (2017): ii4–ii16. Web.

Hoover, David L. “Quantitative Analysis and Literary Studies.” A Companion to Digital Literary Studies. Ed. Ray Siemens and Susan Schreibman. Oxford: Blackwell, 2008. 1–12. Print.

---. “The Microanalysis of Style Variation.” Digital Scholarship in the Humanities 32.suppl_2 (2017): ii17–ii30. Web.

Ilsemann, Hartmut. “Forensic Stylometry.” Digital Scholarship in the Humanities 17.3 (2018): 267–15. Web.

Jannidis, Fotis et al. “Improving Burrows’ Delta — an Empirical Evaluation of Text Distance Measures.” 2015. Print.

Rybicki, Jan. “Vive La Différence: Tracing the (Authorial) Gender Signal by Multivariate Analysis of Word Frequencies.” Digital Scholarship in the Humanities 31.4 (2016): 746–761. Web.

---, and Maciej Eder. “Deeper Delta Across Genres and Languages: Do We Really Need the Most Frequent Words?” Literary and Linguistic Computing 26.3 (2011): 315–321. Web.

Smith, Peter W H, and W Aldridge. “Improving Authorship Attribution: Optimizing Burrows’ Delta Method.” Journal of Quantitative Linguistics 18.1 (2011): 63–88. Web.

The quantitative analysis of literature in theory

This blog post will provide some notes towards the methodology underpinning my doctoral research. In completing my research project I will model 640 novels and short story collections within a consensus network in order to project a potential definition of modernist literary style through both qualitative and quantitative means. In the fullness of time I will have a full and replicable account up on RPubs and GitHub; for the moment this general introduction will have to do.

The quantitative analysis of literature has had a fraught history. Since the cultural turn of the sixties and seventies, when the political revisionisms of feminism, queer and critical race theory were gaining increasing currency, the concept of ‘style’, some quintessence of the work which could be instrumentally distilled from the text, became increasingly untenable. Context became the predominant means through which literature is understood in Anglo-American literature departments. Indeed the very idea of style would seem to recall the belles lettres approach of the nineteenth century.

Computational literary criticism, out of necessity, treats literary materials in more pragmatic terms. When filling a spreadsheet, things need to be inputted into cells, and there’s no real conversation in quantitative terms that’s possible outside of these terms. This stands in contrast to contemporary literary studies, in which one can quite happily have a long and involved discussion on what the text is not saying. New modernist studies and neo-Victorianism have more recently expanded the temporal and spatial limits of their respective objects of study into the present day, far into the past and beyond the metropoles of London, New York and Paris, aiming to de-tether the implicit value judgements of their respective categorisations from the more problematic aspects of modernity or colonialism. Since these developments, the two positions have only become more polarised.

This leaves quantitative literary critics in something of a quandary. Despite some of its more vociferous advocates claiming that the application of computational logic to literary materials represents a definitive paradigm shift which the discipline at large should take more account of, their epistemological conservatism is often reflected in their political conservatism. The notion of style as a combination of quantifiable features seems to underpin an uncritical celebration of formal competence and has been intriguingly read as an example of ‘third way’ knowledge production, as well as a backlash against politically oriented cultural criticism.

I would argue that falling into retrograde modes of thought is certainly a risk of analyses of this kind, but it doesn’t have to be a necessity, and networks, with their capacity to regard texts as embedded within a broader ecosystem, offer the possibility of bringing the new modernist studies dispensation into dialogue with quantitative literary criticism.

The quantitative analysis of literature can be said to have been kicking around as far back as when monks first devised manual concordances of the Bible. Every digital humanist will be familiar with the work of Roberto Busa, but the history of the statistical analysis of literature is a more decentralised phenomenon than the big tent digital humanities. The earliest example I can find is Louis Tonko Milic’s A Quantitative Approach to the Style of Jonathan Swift, which was published in 1967. Milic, bless him, seems to be under the impression that he stands at the brink of a newly invigorated formalism which can mobilise computation to reveal literary works as they truly are, bypassing the impressionism which elsewhere characterises appraisals of style within the field. Unfortunately literary critics are not terribly well-known for their command of statistics, and Milic’s tendency to reproduce pages and pages of tables without assessing their significance, with a Student’s t-test for instance, is symptomatic. Many of the earliest digital humanities journals simply reproduce the raw data in binary form and advance interpretations based on visual impressions rather than mathematical findings.

The development of analyses based on the richness of a text’s vocabulary (number of unique words/total number of words), hapax richness (number of words which appear once in the text/total number of words), average sentence length or word length represents an improvement on this approach, but not by much. These may be understood as indexes of style, but as before they were placed in tables and often ‘read’ in the same way literary critics usually do. There were no systematic attempts to assess sentence length across a broader corpus, nor was there any benchmark established for the assessment of significant differences.
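As a minimal sketch of the indexes just described, the Python below computes vocabulary richness, hapax richness and average sentence length for a toy sentence. The tokenizer, the function names and the sample text are illustrative assumptions of mine, not a reconstruction of any particular early study.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def vocabulary_richness(tokens):
    """Number of unique words divided by the total number of words."""
    return len(set(tokens)) / len(tokens)


def hapax_richness(tokens):
    """Number of words that appear exactly once divided by the total number of words."""
    counts = Counter(tokens)
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return hapaxes / len(tokens)


def average_sentence_length(text):
    """Mean number of word tokens per sentence, splitting naively on . ! and ?"""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)


sample = "The cat sat on the mat. The dog sat too."
tokens = tokenize(sample)

print(vocabulary_richness(tokens))      # 7 unique words out of 10
print(hapax_richness(tokens))           # 5 words occur only once out of 10
print(average_sentence_length(sample))  # sentences of 6 and 4 words
```

Ratios of this kind tend to fall as a text gets longer, which is one reason a bare table of them offers no benchmark for comparing texts of unequal length.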

The first quantitative analysis of literature which yielded replicable results was developed by the Australian literary critic J.F. Burrows. His Delta method, rather than focusing on the more evocative or longer words that literary critics usually focus their attention on, aimed to uncover stylistic signal by quantifying the relative occurrences of high-frequency terms, such as ‘the’, ‘an’, ‘a’, ‘and’ or ‘said’. Burrows’ original method involved using just the first 150 most frequent words (MFWs) but subsequent analyses have demonstrated successful authorship attribution increases all the way up to 5000 MFWs. The more of these particle words which are analysed, in effect, the better.
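For readers who want to see the arithmetic, here is a minimal Python sketch of Delta as it is usually described: take the relative frequencies of the most frequent words, standardise each word’s frequency as a z-score across the corpus, and take the mean absolute difference between two texts’ z-scores. The toy texts, the function names and the use of the population standard deviation are my assumptions; this is not the Stylo implementation.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and pull out word tokens (crude, for illustration only)."""
    return re.findall(r"[a-z']+", text.lower())


def most_frequent_words(token_lists, n):
    """The n most frequent words across the whole corpus."""
    pooled = Counter()
    for tokens in token_lists:
        pooled.update(tokens)
    return [word for word, _ in pooled.most_common(n)]


def relative_frequencies(tokens, vocabulary):
    """Frequency of each vocabulary word as a share of the text's tokens."""
    counts = Counter(tokens)
    return [counts[word] / len(tokens) for word in vocabulary]


def z_scores(freq_table):
    """Standardise each word's frequency across all texts in the table."""
    names = list(freq_table)
    n_words = len(freq_table[names[0]])
    stats = []
    for j in range(n_words):
        column = [freq_table[name][j] for name in names]
        stats.append((mean(column), pstdev(column)))
    return {
        name: [
            (freq_table[name][j] - mu) / sigma if sigma > 0 else 0.0
            for j, (mu, sigma) in enumerate(stats)
        ]
        for name in names
    }


def delta(z_a, z_b):
    """Mean absolute difference between two texts' z-scores (lower = more alike)."""
    return mean([abs(a - b) for a, b in zip(z_a, z_b)])


# Hypothetical toy corpus; real analyses use whole novels and 150+ MFWs.
texts = {
    "A1": "the man said that the dog and the cat were in the house",
    "A2": "the woman said that the bird and the fish were in the garden",
    "B1": "a storm came and a ship sank and a sailor swam to a shore",
}
tokenized = {name: tokenize(text) for name, text in texts.items()}
mfw = most_frequent_words(tokenized.values(), 5)
table = {name: relative_frequencies(toks, mfw) for name, toks in tokenized.items()}
z = z_scores(table)

for other in ("A2", "B1"):
    print("A1 vs", other, round(delta(z["A1"], z[other]), 3))
```

Because every word is standardised before comparison, a rare function word counts for as much as ‘the’, which is part of why the choice of how many MFWs to include matters so much.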

This leaves us with a problem as to what scale we analyse texts at. Eder has noted that analysing words at different scales broadcasts different stylistic signals, with discomfiting amounts of variation between them. I’ve noted this phenomenon myself when analysing individual words as opposed to combinations of words in twos (‘the man’, ‘she said’, ‘over there’), threes (‘she also said’, ‘over by the’) or even on the level of individual characters (‘th ‘, ‘a’, ‘n he’). Rybicki and Eder’s solution is to quantify all 5000 words six times, culling them in increments of twenty; rather than finding a single ‘best’ fit, we just throw everything in and attain the average level of similarity existing between each text, subject to particular conditions. I propose a similar approach, analysing single words, bigrams, trigrams, quadgrams and quingrams in both word and character form. This is all done through the ‘Stylo’ package, a custom-made library written in the R language.
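To make the different scales concrete, here is a short Python sketch that extracts word and character n-grams and enumerates the ten scales proposed above (lengths one to five, in word and character form). The function names and the whitespace handling are my assumptions; Stylo does this internally in R and may treat punctuation and spacing differently.

```python
import re


def word_ngrams(text, n):
    """Sequences of n consecutive words, joined by single spaces."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n):
    """Sequences of n consecutive characters, spaces included."""
    normalised = re.sub(r"\s+", " ", text.lower()).strip()
    return [normalised[i:i + n] for i in range(len(normalised) - n + 1)]


# The ten scales: words and characters, each at lengths 1 to 5.
SCALES = [(kind, n) for kind in ("word", "char") for n in range(1, 6)]


def extract(text, kind, n):
    """Dispatch to the right extractor for a given scale."""
    return word_ngrams(text, n) if kind == "word" else char_ngrams(text, n)


phrase = "She said the man over there"
print(word_ngrams(phrase, 2))
# ['she said', 'said the', 'the man', 'man over', 'over there']
print(char_ngrams("the man", 3))
# ['the', 'he ', 'e m', ' ma', 'man']

for kind, n in SCALES:
    print(kind, n, len(extract(phrase, kind, n)))
```

Each scale yields its own frequency table and therefore its own set of distances between texts, which is where the variation between signals comes from.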

Once all these analyses have been run, R outputs a list of edges into the working directory, which will form the basis of the network. It looks like this:

[Screenshot: the edge list, with ‘Source’, ‘Target’ and ‘Weight’ columns]

Each row here represents a relationship running from one text, the ‘Source’, to another, the ‘Target’; it is effectively a line drawn from column A to column B. The third column, marked ‘Weight’, signifies the intensity of the relationship, the weakest being 1 and the strongest being ~1125. This seems to be the maximum value possible, so I suspect the algorithm which creates this table cuts off the similarity calculation past a certain point. To return to the table, we can see that the rows run in descending order of intensity, and that Anne Brontë’s novel Agnes Grey is by far most like her other novel, The Tenant of Wildfell Hall. From there there’s a pronounced drop-off, from 902 to a weight of 226 for the next most similar novel, James Joyce’s Finnegans Wake.

A list of this kind is effectively outputted for every single scale mentioned above. They are then combined into a single massive list of edges (about 14720 rows in all). Because there are about ten edge lists, there are ten different weights for each relationship. These are averaged into a single ‘edge’, and this forms the basis of the network, which I’ll talk about in a subsequent post.
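The combining step can be sketched in a few lines of Python. I assume here that each edge list is a list of (source, target, weight) rows, that edges are kept directed as they appear in the table, and that a pair missing from one list is simply averaged over the lists in which it does appear. The weights 902 and 226 come from the table described above; the second list’s weights are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_edges(edge_lists):
    """Collapse several weighted edge lists into one averaged weight per pair."""
    weights = defaultdict(list)
    for edges in edge_lists:
        for source, target, weight in edges:
            weights[(source, target)].append(weight)
    return {pair: mean(values) for pair, values in weights.items()}


# One edge list per scale; the second list is hypothetical.
single_words = [
    ("Agnes Grey", "The Tenant of Wildfell Hall", 902),
    ("Agnes Grey", "Finnegans Wake", 226),
]
word_bigrams = [
    ("Agnes Grey", "The Tenant of Wildfell Hall", 1000),
    ("Agnes Grey", "Finnegans Wake", 100),
]

consensus = average_edges([single_words, word_bigrams])
for (source, target), weight in sorted(consensus.items(), key=lambda kv: -kv[1]):
    print(source, "->", target, weight)
```

Whether to average, sum or take the median of the weights is a design choice; averaging keeps the consensus weight on the same scale as the individual lists.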

A Statistical Analysis of the narrators of ‘Ulysses’ or ‘why ‘Ulysses’ isn’t wisdom literature’

The second time I read Ulysses, in advance of an undergraduate seminar, it was around the ninetieth anniversary of the original text’s publication. The newspapers were printing archive material relating to the novel, extended supplements about its importance from the usual quarters, as well as reviews of recently published monographs from both young and established scholars. Unfortunately, the critical trend of the time was to read Ulysses as wisdom literature. Critics urged prospective readers of the novel to wrest Joyce from the scholars and bring him ‘back to the people’. This school of thought treated Leopold Bloom as a model of the way in which the contemporary urban subject should be living: aloof, polite, well-intentioned but not dogmatic on political issues; moderately informed, but more often wrong; a reader, but not self-serious; an everyman. Ulysses’ structural indebtedness to cornerstones of The Canon such as William Shakespeare’s Hamlet and Homer’s The Odyssey frequently undergirds this line of argument, demonstrative in itself of how easily high literary art and everyday life may be set next to one another. This generally requires critics to treat the characters of Bloom and Stephen Dedalus as two opposites in need of the other. Each has a little to impart on life, love and literature, whether it be to reflect a little deeper on themselves or their marriage, move past their respective losses or to find in each other their lost son/father.

This interpretation of the novel reads it along a linear trajectory, as Stephen and Bloom come together to form Blephen and Stoom. Through computation it may be possible to examine the writing style of later chapters, and determine whether or not they bear formal witness to this change in character. We must first, however, consider the difficulty of locating where Joyce’s narrators actually are. Part of what makes Joyce’s writing style so unique is his use of free indirect discourse, a mode of writing in which the reality of the text is inflected by the consciousness(es) of the beholder(s). As such, putting a category on each episode of Ulysses as though it were narrated by one person or a combination of persons might seem reductive; it very much is. But in fusing computation and literature, certain assumptions have to be made.

In carrying out this analysis, I made use of R’s ‘Stylo’ package, which contains tools for breaking a number of texts into equal sizes, removing words which are not common to most samples, calculating the relative frequencies of these words, transforming these observations into new combinations of variables called ‘components’ with greater explanatory potential, and clustering them together. These words appear below:

These might seem like boring terms; as literary critics we tend to look past them to more evocative ones like ‘serpentine’ or ‘columbanus’, but unfortunately, in computational terms it is the relative frequencies of these ‘particles’ or ‘function words’ that provide the most secure means of modelling a writer’s particular idiom. These samples were then plotted on a correlation matrix, which can be taken as an index of similarity, based on where they cluster:
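The preparatory steps described above (equal-sized samples, culling, relative frequencies, then a measure of similarity between samples) can be sketched in Python as follows. This is a simplified stand-in under my own assumptions: the function names are hypothetical, culling is expressed as a minimum number of samples a word must occur in, and a plain sample-by-sample Pearson correlation replaces Stylo’s principal components step, which is not reproduced here.

```python
import re
from collections import Counter
from math import sqrt
from statistics import mean


def tokenize(text):
    """Lowercase the text and pull out word tokens (crude, for illustration only)."""
    return re.findall(r"[a-z']+", text.lower())


def equal_samples(tokens, size):
    """Consecutive samples of exactly `size` tokens; a short remainder is dropped."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def cull(samples, min_samples):
    """Keep only words that occur in at least `min_samples` of the samples."""
    doc_freq = Counter()
    for sample in samples:
        doc_freq.update(set(sample))
    return sorted(word for word, k in doc_freq.items() if k >= min_samples)


def relative_frequencies(sample, vocabulary):
    """Frequency of each vocabulary word as a share of the sample's tokens."""
    counts = Counter(sample)
    return [counts[word] / len(sample) for word in vocabulary]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def correlation_matrix(vectors):
    """Pairwise correlations between every pair of sample vectors."""
    return [[pearson(a, b) for b in vectors] for a in vectors]


text = "the cat sat and the dog sat and the bird sang and the fish swam"
samples = equal_samples(tokenize(text), 5)
vocabulary = cull(samples, min_samples=2)
vectors = [relative_frequencies(sample, vocabulary) for sample in samples]
matrix = correlation_matrix(vectors)

print(vocabulary)  # ['and', 'sat', 'the']
for row in matrix:
    print([round(value, 2) for value in row])
```

With real chapters the samples would run to thousands of words each and the vocabulary to hundreds of function words, so the correlations are far less coarse than in this toy case.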

The six different narrators of Ulysses appearing in the index above are:

‘Anon’, who narrates the episode ‘Cyclops’,

‘Blephen’, a composite delineation for episodes in which both characters feature, such as ‘Circe’, ‘Eumaeus’, ‘Ithaca’ and ‘Oxen of the Sun’,

Bloom, who narrates ‘Hades’, ‘Calypso’, ‘Lestrygonians’ and ‘The Lotus Eaters’,

Gerty, who narrates at least half of ‘Nausicaa’ (this is a controversial point within the literature; it might be Bloom who is narrating for her),

Molly, who narrates the book’s final chapter ‘Penelope’,

and finally Stephen, who narrates the first three episodes ‘Telemachus’, ‘Nestor’ and ‘Proteus’, as well as the novel A Portrait of the Artist as a Young Man, which has been thrown in here for comparison.

Here’s the same plot as above with the labels more clearly indicated

The first thing we could note is the gender divide. Molly and Gerty both spread over to the right, with Molly as an outlier. Both are more proximate to the A Portrait samples than to any other, and these are all taken from the earlier parts of the novel, suggesting that Joyce writes women and young children using the same words at the same rate. As the Gerty samples move through the episode, they move closer and closer to the Bloom cluster, visually confirming that the episode starts in Gerty’s voice before he takes over, and that Bloom doesn’t think much of women’s intelligence in the main either.

Overall we can say that there doesn’t look to be a fusing of perspectives here as such. Rather than the Blephen episodes meeting halfway between the Stephen and Bloom samples, Stephen and Bloom already seem quite comfortably clustered at the novel’s outset. Based on the divide between Stephen’s episodes of Ulysses and A Portrait, we might say that the way in which Stephen narrates A Portrait is very different from the way in which he narrates Ulysses. This is justified, I think, by how sensitive the analysis is to changes in narrator, demonstrated by the Gerty/Bloom example already discussed, as well as by the fact that the earlier part of ‘Aeolus’, in which Bloom is present, clusters with his samples, whereas the second part, after Stephen has entered, clusters with the Stephen samples.

Below is the plot with the Portrait samples removed:

Words Stephen’s narration is most likely to use in comparison to Bloom
Words Bloom’s narration is more likely to use in comparison to Stephen

There are a number of ways one could use these results to interrogate the notion of Ulysses as wisdom literature. We could begin by asking after the gendered aspects of the adjective ‘wise’, and ask why so many of these books which teach us how one might best live are written by men (and how tone-deaf this argument can sound because to read Ulysses one might almost think married women weren’t let out of the house) or we could ask what interests an Irish model of bourgeois respectability might serve, along the lines of an Irish ‘keep calm and carry on’ poster.

Ulysses as a guide to life risks rendering it a novel of parts coming together, the middle-class intellectual and the middle-class working stiff holding hands across whatever barricade is supposed to be dividing them. Not that I would go to the other extreme and frame it as one of dissolution. Ulysses’ shape is one I would be loath to put a vector to in fact; to say that Stephen and Bloom’s relationship moves from state a) to state b) would be too easy by half.

What makes Ulysses an interesting novel to me is its self-referentiality, the dialogue it establishes between the novel and its supposed referent of ‘real Dublin’, which is made most clear in ‘Circe’, but also in the book’s other failed attempts to understand itself, as in the cases of the characters referenced as being in particular places at particular times who may or may not be Bloom, the McIntosh mystery or the puzzle of crossing Dublin without passing a pub. In this context, I think ‘Eumaeus’ appearing as a stylistic outlier is significant.

It is in this episode that we get information about a sequence of coincidences, and resonant differences, between Bloom and Stephen’s lives. The depth of these coincidences (which I won’t provide a summary of here, because I think they’re among the most poignant parts of the novel) gestures towards something a bit more cosmically ordered than the rest of the novel, even as they take place within the circumscribed rituals of Irish urban middle-class life in the early twentieth century. ‘Eumaeus’ is written in a chill tone which most closely resembles that of a scientific paper, eliding the indirect discourse which ostensibly defines the rest of the text, and it is in the fact that these connections are raised here rather than anywhere else that the true interest of their relationship, such as it is, is to be found.

These connections, which remain unrealised by the two, should raise questions of alienation and of their unity in separation rather than bring us to some Forsterian notion of connection. The novel presents problems both epistemological and political, about how our reality is structured, the means through which it is circumscribed and how it is more defined by how little of it we are aware of than by how much. Rather than teaching us ‘how to live’, Ulysses shows us how we do not live, how we probably won’t live and how it could so easily have been otherwise. It is no more an explanation for life than it is an explanation of itself, or Homer, or Ireland.

Joanna Walsh’s ‘Seed’

The first thing one notices about Joanna Walsh’s online novella Seed is the quality of the design. Seed’s aesthetic is very consistent, and was obviously designed with an eye to the material at hand. For all this we have its illustrator Charlotte Hicks to thank, as well as the digital publishing company responsible for designing the platform on which the text is hosted. Seed is optimised for iOS, and, as the site tells us, is probably better viewed there, but it can also be read on a laptop or a PC.

The reader begins by being presented with seventeen different plants which open up onto different lexia, with suggestive and minimalistic titles such as ‘Baby’, ‘Touch’ or ‘Red’. Each one gives a brief insight into the life of an eighteen-year-old woman living in a middle-class housing estate in suburban England, coming to terms with herself, her environment, the people around her and the reality of her incipient young adulthood. By presenting the reader with seventeen different starting points (ignoring the opening explanatory remarks for a moment), and the means of proceeding in any way they might choose, the text emulates the same provisional and tentative steps that the narrator concurrently takes in the development of her own identity. In an interview, Walsh explains that the rhizoidal orientation of the text provided her with the opportunity to disorientate the reader, and perhaps engender in them the same uncertainty that the protagonist of the novella may be feeling at any given time, so that the reader has:

no sense of reading left to right, of the weight of the book, of how far they were through, or, sometimes, of the direction within the narrative.

Seed is therefore doing very deliberate and self-conscious things with the particularities of its format, typical of texts which, overtly or otherwise, draw attention to their digitality. Insofar as a firm distinction can be drawn between its form and its content, Seed introduces a coherence/tension between the two.

In a design quirk which enables this sense of openness that Seed conveys, the reader has the option of changing the text’s visual interface in order to display differently-coloured vines intertwined between each of the plants. The colours refer to each lexia’s subject matter, and invert the standardised and industrial nature of colour-coding, a tendency, or obsession, that the narrator exhibits throughout the text:

Fruits in the supermarket. They’re a different species. Those strawberries all white in the middle all the year round, like crunchy peaches. Everything so shiny. Not a speck of earth anywhere. Why would there be? It goes straight from the formica shed to our formica kitchen. Once cut my mother wraps it in cling film and puts it in the fridge.

The narrator’s sustained attention to post-industrial artefacts, the symptoms of contemporary, or then-contemporary, suburban living, is the strongest aspect of Seed. The narrator’s oscillation between a tone of matter-of-fact inventory and syntax-rupturing anxiety enacts the very process of interpretation, and the fact that so much narrative time is deployed in coming to terms with such quotidian objects, made to seem strange by their presence in a narrative medium known for attention to other, less strange things, intensifies the effect:

The doves in our garden say something else no they say somewhere else from their tall perspective looking down on lawns mowed with stripes, somewhere nature isn’t the same kind we have round here.

The site’s drawing together of Seed’s structure and content finds a corollary in the text’s actual word usage. Walsh uses leitmotifs, particularly the names of plants or descriptions of colours, to string the units of text together in more subtle ways, without making use of an overt visual interface.

It should be noted that the text is not as radically discontinuous as it might at first seem, or certainly was not regarded as such by Walsh, who said the following in an interview:

I’ve been thinking about the authority I’m still claiming as an ‘author’ in Seed; despite the degree of reader-control offered by the project, it’s still a fairly traditional ‘authorial’ work.

I had to write Seed as a linear text to ensure it will read ok for anyone who wants to follow the temporal narrative. That said, I never write in a ‘linear’ fashion, but in one that resembles the Seed reading experience: I write phrases, notes, paragraphs, then brings them together on shuffle, until they work.

Walsh’s comments may be surprising for those familiar with her writing methodology, which involves the use of cut-ups, or other aleatoric methods which introduce an element of chance into the composition process. It is surprising also, for those who are familiar with the somewhat niche history of digital or hypertextual literature. For many of hypertext’s trailblazing practitioners, such as Shelley Jackson or Michael Joyce, the crux of hypertextual literature was the game-playing that new digital formats allowed the author to engage in as an absent centre of meaning, which expedited the then-extremely trendy dalliances with post-structuralist philosophy and critical theory in a digital context. Within Seed’s units of text after all, there is no opportunity for interaction, except insofar as the text requires you to turn the page. In an interview with Review31, Walsh described how Seed barely resembles a hypertext in the original sense of the term at all, and that it is much better understood as a traditional work focalised around the author’s vision.

This is true, firstly for the structural reasons already outlined, but also because Seed’s formal architecture is best understood as functioning in the same way as literary works in print do, in that they imply, or gesture, far more readily than they state directly. This is axiomatic for all novels worthy of the name, but it presents an interesting means of thinking about how narrative works in the context of Seed in particular. While it might seem to present some amount of freedom or capacity for interaction, Seed is in fact circumscribing you even as it offers the chance of liberation. This has a nice metaphor in Seed’s visual interface, which deliberately places a number of other flowers beyond the reader’s reach in darkness, suggesting both the thwarted ambition to move beyond the text that we’re presented with, and, as I’ve said already, the myopia of the narrator in her own environment:

it’s a fairly tight work, and I’ve said what I wanted to say in it. I love the idea of locked passages: part of my intent was to create a feeling of implied space beyond what is described (isn’t that the intent of most novels, to create, in however abstract a sense, a ‘world’, even if ‘world’ means a set of conceptual parameters?). I’d like to do a print edition to see if and how the circle of nonlinearity could be squared.

Though we have the ability to read Seed in any order we might like, each section is up to five pages long, and therefore requires us to read chronologically for a far greater length of time than hypertexts of the nineties do. Whether this can be attributed to the now mainstream nature of micro-textual formats, which requires literature to aspire to something else, is probably a question for others to answer. Personally speaking, if writers working digitally can produce works as good as Seed, I won’t be unduly detained by the sociological reasons why.

How big are the words modernists use?

It’s a fairly straightforward question to ask, one which most literary scholars would be able to provide a halfway decent answer to based on their own readings: Ernest Hemingway, Samuel Beckett and Gertrude Stein are more likely to use short words, James Joyce, Marcel Proust and Virginia Woolf longer ones, with the rest falling somewhere between the two extremes.

Most Natural Language Processing textbooks or introductions to quantitative literary analysis demonstrate how the frequencies of the most common words in a corpus decline in inverse proportion to their rank (Zipf’s law), i.e. the most frequently occurring term will appear roughly twice as often as the second, three times as often as the third, and so on. I was curious to see whether another process was at work for word lengths, and whether we can see a similar decline at work in modernist novels, or whether more ‘experimental’ authors visibly buck the trend. With some fairly elementary analysis in NLTK, and the resulting data frames carried over into R, I generated a visualisation which looked nothing like this one.*

*The previous graph had twice as many authors and was far too noisy, with not enough distinction between the colours to make it anything other than a headwreck to read.

In narrowing down the number of authors I was going to plot, I inclined towards authors that I thought would be more variegated, getting rid of the ‘strong centre’ of modernist writing, not quite as prosodically charged as Marcel Proust, but not as brutalist as Stein either. I also put in a couple of contemporary writers for comparison, such as Will Self and Eimear McBride.

As we can see, after the rather disconnected percentages of corpora that use one-letter words, with McBride and Hemingway on top at around 25%, and Stein a massive outlier at 11%, things become increasingly harmonious, and the longer the words get, the more the lines of the vectors coalesce.

Self and Hemingway dip rather egregiously with regard to their use of two-letter words (almost certainly because of a mutual disregard for a particular word), but it is Stein who sharply increases her usage of two- and three-letter words. As in my previous analyses, Stein is an absolute outlier.

By the time the words are ten letters long, true to form it’s Self whose writing is the only one above 1%.
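The counting behind a plot of this kind is simple enough to sketch. What follows is a minimal standard-library version rather than the NLTK and R code actually used for the graph: the function name is hypothetical, words are taken to be unbroken runs of letters (so ‘don’t’ counts as ‘don’ and ‘t’), and everything longer than `max_length` letters is pooled into the last bucket.

```python
from collections import Counter
import re


def word_length_shares(text, max_length=10):
    """Percentage of a text's words at each length from 1 to max_length.

    Words longer than max_length are counted in the max_length bucket.
    Returns an empty dict if the text contains no words.
    """
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        return {}
    lengths = Counter(min(len(word), max_length) for word in words)
    return {n: 100 * lengths[n] / len(words) for n in range(1, max_length + 1)}


# One dictionary per author gives one line on the plot.
shares = word_length_shares("I am the cat")
print(shares[1], shares[2], shares[3])  # 25.0 25.0 50.0
```

Run over each author’s collected prose, the resulting dictionaries can be plotted with word length on the x-axis and percentage of the corpus on the y-axis.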

Can a recurrent neural network write good prose?

At this stage in my PhD research into literary style I am looking to machine learning and neural networks, and moving away from stylostatistical methodologies, partially out of fatigue. Statistical analyses are intensely process-based and always open, it seems to me, to fairly egregious ‘nudging’ in the name of reaching favourable outcomes. This brings a kind of bathos to some statistical analyses, as they account, to a greater extent than I’d like, for methodology and process, with the result that the novelty these approaches might have brought us is neglected. I have nothing against this emphasis on process necessarily, but I do also have a thing for outcomes, as well as the mysticism and relativity machine learning can bring, alienating us as it does from the process of the script’s decision making.

I first heard of the sci-fi writer from a colleague in my department. It’s Robin Sloan’s plug-in for the text editor Atom, which allows you to ‘autocomplete’ texts based on your input. After sixteen hours of installing, uninstalling, moving directories around and looking things up on Stack Overflow, I got it to work. I typed in some Joyce and got stuff about Chinese spaceships as output, which was great, but science fiction isn’t exactly my area, and I wanted to train the network on a corpus of modernist fiction. Fortunately, I had the complete works of Joyce, Virginia Woolf, Gertrude Stein, Sara Baume, Anne Enright, Will Self, F. Scott Fitzgerald, Eimear McBride, Ernest Hemingway, Jorge Luis Borges, Joseph Conrad, Ford Madox Ford, Franz Kafka, Katherine Mansfield, Marcel Proust, Elizabeth Bowen, Samuel Beckett, Flann O’Brien, Djuna Barnes, William Faulkner & D.H. Lawrence to hand.

My understanding of this recurrent neural network, such as it is, runs as follows. The script reads the entire corpus of over 100 novels, and calculates the distance that separates every word from every other word. The network then hazards a guess as to what word follows the word or words that you present it with, then validates this against what actually follows. It then does so over and over and over, getting ‘better’ at predicting each time. The size of the corpus is significant in determining the length of time this will take, and mine required something around twelve days. I had to cut it off after twenty-four hours because I was afraid my laptop wouldn’t be able to handle it. At this point it had carried out the process 135,000 times, just below 10% of the full process. Once I get access to a computer with better hardware I can look into getting better results.

How this will feed into my thesis remains nebulous; I might move in a sociological direction and take survey data on how closely readers reckon the final result approximates literary prose. But at this point I’m interested in what impact it might conceivably have on my own writing. I am currently trying to sustain progress on my first novel alongside my research, so, in a self-interested enough way, I pose the question: can neural networks be used in the creation of good prose?

There have been many books written on the place of cliometric methodologies in literary history. I’m thinking here of William S. Burroughs’ cut-ups, Mallarmé’s infinite book of sonnets, and the brief flirtation the literary world had with hypertext in the 90s, but beyond the avant-garde, I can’t think of an example of an author who has foregrounded their use of numerical methods of composition. A poet friend of mine has dabbled in this sort of thing but finds it expedient to not emphasise the aleatory aspect of what she’s doing, as publishers tend to give a frosty reception when their writers suggest that their work is automated to some extent.

And I can see where they’re coming from. No matter how good they get at it, I’m unlikely to get to a point where I’ll read automatically generated literary art. Speaking for myself, when I’m reading, it is not just about the words. I’m reading Enright or Woolf or Pynchon because I’m as interested in them as I am in what they produce. How synthetic would it be to set Faulkner and McCarthy in conversation with one another if their congruencies were wholly manufactured by outside interpretation or an anonymous algorithmic process as opposed to the discursive tissue of the literary sphere, if a work didn’t arise from material and actual conditions? I know I’m making a lot of value-based assessments here that wouldn’t have a place in academic discourse, and on that basis what I’m saying is indefensible, but the probabilistic infinitude of it bothers me too. When I think about all the novelists I have yet to read I immediately get panicky about my own death, and the limitless possibilities of neural networks to churn out tomes and tomes of literary data in seconds just seems to me to exacerbate the problem.

However, speaking outside of my reader-identity, as a writer, I find it invigorating. My biggest problem as a writer isn’t writing nice sentences; given enough time I’m more than capable of that. The difficulty is finding things to wrap them around. Mood, tone and image aren’t daunting, but a text’s momentum, the plot, I suppose, eludes me completely. It’s not something that bothers me; I consider plot to be a necessary evil, and resent novels that suspend information in a deliberate, keep-you-on-the-hook sort of way, but the ‘what next’ of composition is still a knotty issue.

The generation of text could be a useful way of getting an intelligent prompt that stylistically ‘borrows’ from a broad base of literary data, smashing words and images together in a generative manner to get the associative faculties going. I’m not suggesting that these scripts would be successful were they autonomous, I think we’re a few years off one of these algorithms writing a good novel, but I hope to demonstrate that my circa 350 generated words would be successful in facilitating the process of composition:

be as the whoo, put out and going to Ingleway effect themselves old shadows as she was like a farmers of his lake, for all or grips — that else bigs they perfectly clothes and the table and chest and under her destynets called a fingers of hanged staircase and cropping in her hand from him, “never married them my said?” know’s prode another hold of the utals of the bright silence and now he was much renderuched, his eyes. It was her natural dependent clothes, cattle that they came in loads of the remarks he was there inside him. There were she was solid drugs.

“I’m sons to see, then?’ she have no such description. The legs that somewhere to chair followed, the year disappeared curl at an entire of him frwented her in courage had approached. It was a long rose of visit. The moment, the audience on the people still the gulsion rowed because it was a travalious. But nothing in the rash.

“No, Jane. What does then they all get out him, but? Or perfect?”

“The advices?”

Of came the great as prayer. He said the aspect who, she lay on the white big remarking through the father — of the grandfather did he had seen her engoors, came garden, the irony opposition on his colling of the roof. Next parapes he had coming broken as though they fould

has a sort. Quite angry to captraita in the fact terror, and a sound and then raised the powerful knocking door crawling for a greatly keep, and is so many adventored and men. He went on. He had been her she had happened his hands on a little hand of a letter and a road that he had possibly became childish limp, her keep mind over her face went in himself voice. He came to the table, to a rashes right repairing that he fulfe, but it was soldier, to different and stuff was. The knees as it was a reason and that prone, the soul? And with grikening game. In such an inquisilled-road and commanded for a magbecross that has been deskled, tight gratulations in front standing again, very unrediction and automatiled spench and six in command, a

I don’t think I’d be alone in thinking that there’s some merit in parts of this writing. I wonder if there’s an extent to which Finnegans Wake has ‘tainted’ the corpus somewhat, because stylistically, I think that’s the closest analogue to what could be said to be going on here. Interestingly, it seems to be formulating its own puns: words like ‘unrediction’, ‘automatiled spench’ (a tantalising meta-textual reference, I think) and ‘destynets’ would all be reminiscent of what you could expect to find in any given section of the Wake, but they don’t turn up in the corpus proper, at least according to a ctrl + f search. What this suggests to me is that the algorithm is plotting relationships on the level of the character, as well as phrasal units. However, I don’t recall the sci-fi model turning up paragraphs that were quite so disjointed and surreal; they didn’t make loads of sense, but they were recognisable as grammatically coherent chunks of text. This could, though, be the result of working with a partially trained model.

So, how might they feed our creative process? Here’s my attempt at making nice sentences out of the above.

— I have never been married, she said. — There’s no good to be gotten out of that sort of thing at all.

He’d use his hands to do chin-ups, pull himself up over the second staircase that hung over the landing, and he’d hang then, wriggling across the awning it created over the first set of stairs, grunting out eight to ten numbers each time he passed, his feet just missing the carpeted surface of the real stairs, the proper stairs.

Every time she walked between them she would wonder which of the two that she preferred. Not the one that she preferred, but the one that were more her, which one of these two am I, which one of these two is actually me? It was the feeling of moving between the two that she could remember, not his hands. They were just an afterthought, something cropped in in retrospect.

She can’t remember her sons either.

Her life had been a slow rise, to come to what it was. A house full of men, chairs and staircases, and she wished for it now to coil into itself, like the corners of stale newspapers.

The first thing you’ll notice about this is that it is a lot shorter. I started off by translating the above, in as much as possible, into ‘plain words’ while remaining faithful to the n-grams I liked, like ‘bright silence’, ‘old shadows’ and ‘great as prayer’. In order to create images that play off one another, and to account for the dialogue, sentences that seemed to be doing similar things began to cluster together, so paragraphs organically started to shrink. Ultimately, once the ‘purpose’ of what I was doing started to come out (a critique of bourgeois values, memory loss), the nice phrasal units started to become spurious, and the eight or so paragraphs collapsed into the three and a half above. This is also one of my biggest writing issues: I’ll type three full pages and after the editing process they’ll come to no more than 1.5 paragraphs, maybe?

The thematic sense of dislocation and fragmentation could be a product of the source material, but most things I write are about substance-abusing depressives with broken brains cos I’m a twenty-five year old petit-bourgeois male. There’s also a fairly pallid Enright vibe to what I’ve done with the above, I think the staircases line could come straight out of The Portable Virgin.

Maybe a better-trained model could provide better prompts, but overall, if you want better results out of this for any kind of creative praxis, it’s probably better to be a good writer.