Literary Cluster Analysis

I: Introduction

My PhD research will involve arguing that there has been a resurgence of modernist aesthetics in the novels of a number of contemporary authors. These authors are Anne Enright, Will Self, Eimear McBride and Sara Baume. All these writers have, at various public events and in the course of many interviews, given very different accounts of their specific relation to modernism, and even if the definition of modernism weren’t totally overdetermined, we could spend the rest of our lives defining the ways in which their writing engages, or does not engage, with the modernist canon. Indeed, if I have my way, this is what I will spend a substantial portion of my life doing.

It is not in the spirit of reaching a methodology of greater objectivity that I propose we analyse these texts through digital methods; having begun my education in statistical and quantitative methodologies in September of last year, I can tell you that these really afford us no *better* a view of any text than just reading it would, but fortunately I intend to do that too.

This cluster dendrogram was generated in R, and owes its existence to Matthew Jockers’ book Text Analysis with R for Students of Literature, from which I developed a substantial portion of the code that creates the output above.

What the code is attentive to is the words that these authors use the most. When analysing literature qualitatively, we tend to have a magpie sensibility, zoning in on words which produce more effects or stand out in contrast to the literary matter which surrounds them. As such, the ways in which a writer uses the words ‘the’, ‘an’, ‘a’, or ‘this’ tend to pass us by, but they may be far more indicative of a writer’s style, at least in the way that a computer would attend to it; sentences that are ‘pretty’ are generally statistically insignificant.

II: Methodology

Every corpus that you can see in the above image was scanned into R, and then run through code which counted the number of times every word was used in the text. The resulting figure is called the word’s frequency, and was then converted to its relative frequency by dividing the figure by the total number of words in the corpus and multiplying the result by 100. Every word with a relative frequency above a certain threshold was put into a matrix, with one row per corpus, and a function was used to cluster the corpora together based on the similarity of the figures they contained, according to a Euclidean metric I don’t fully understand.
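The counting and comparing described above can be sketched roughly as follows. This is written in Python rather than the R actually used, and it is an illustration under assumptions, not the code behind the dendrogram: the function names, the tokenising rule (lowercased runs of letters and apostrophes) and the toy texts are all hypothetical. The "Euclidean metric" is taken here to mean the ordinary straight-line distance between two rows of relative frequencies.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of the text, as a percentage."""
    # Assumed tokeniser: lowercase the text, keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words)
    return {word: 100 * count / total for word, count in counts.items()}


def feature_row(freqs, vocabulary):
    """One matrix row: the relative frequency of each chosen word (0 if absent)."""
    return [freqs.get(word, 0.0) for word in vocabulary]


def euclidean(row_a, row_b):
    """Square root of the summed squared differences between two rows."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))


# Toy stand-ins for two corpora (hypothetical, not the real texts).
corpora = {
    "author_a": "the cat sat on the mat and the dog sat too",
    "author_b": "a cat and a dog and a bird",
}
# A hand-picked vocabulary; the real analysis chose words by a frequency threshold.
vocabulary = ["the", "a", "and"]

rows = {
    name: feature_row(relative_frequencies(text), vocabulary)
    for name, text in corpora.items()
}
distance = euclidean(rows["author_a"], rows["author_b"])
print(rows)
print(distance)
```

The smaller the distance between two rows, the more alike the two corpora are in their use of the chosen words; the clustering function works from a table of these pairwise distances.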

The final matrix was 21 X 57, and compared these 21 corpora on the basis of their relative usage of the words ‘a’, ‘all’, ‘an’, ‘and’, ‘are’, ‘as’, ‘at’, ‘be’, ‘but’, ‘by’, ‘for’, ‘from’, ‘had’, ‘have’, ‘he’, ‘her’, ‘him’, ‘his’, ‘I’, ‘if’, ‘in’, ‘is’, ‘it’, ‘like’, ‘me’, ‘my’, ‘no’, ‘not’, ‘now’, ‘of’, ‘on’, ‘one’, ‘or’, ‘out’, ‘said’, ‘she’, ‘so’, ‘that’, ‘the’, ‘them’, ‘then’, ‘there’, ‘they’, ‘this’, ‘to’, ‘up’, ‘was’, ‘we’, ‘were’, ‘what’, ‘when’, ‘which’, ‘with’, ‘would’, and ‘you’.

Anyway, now we can read the dendrogram.

III: Interpretation

Speaking about the dendrogram in broad terms can be difficult for precisely the reason that I indicated above; quantitative and qualitative methodologies for text analysis are totally opposed to one another. What is obvious, though, is that Eimear McBride and Gertrude Stein are extreme outliers, and comparable only to each other. This is in one way unsurprising, because of their brutish, repetitive styles, and in other ways very surprising, because McBride is on record as dismissing Stein’s work for being ‘too navel-gaze-y.’

Jorge Luis Borges and Marcel Proust have branched off in their own direction, as has Sara Baume, which I’m not quite sure what to make of. Franz Kafka, Ernest Hemingway and William Faulkner have formed their own nexus. More comprehensible is the Anne Enright, Katherine Mansfield, D.H. Lawrence, Elizabeth Bowen, F. Scott Fitzgerald and Virginia Woolf cluster; one could make admittedly sweeping judgements about how this could be said to be modernism’s extreme centre, in which the radical experimentalism of its more revanchiste wing was fused rather harmoniously with nineteenth-century social realism, producing a kind of indirect discourse at which I think each of these authors excels.

These revanchistes are well represented in the dendrogram’s right wing, with Flann O’Brien, James Joyce, Samuel Beckett and Djuna Barnes having clustered together, though I am not quite sure what to make of Ford Madox Ford’s and Joseph Conrad’s showing at all, being unfamiliar with their work.

IV: Conclusion

The basic rule in interpreting dendrograms is that the lower down the tree two ‘leaves’ are joined, the more similar they can be said to be. Therefore, Anne Enright and Will Self are the contemporary modernists most closely aligned to their forebears, if indeed forebears they can be said to be. It would be harder, from a quantitative perspective, to align Sara Baume with this trend in a straightforward manner, and McBride only seems to correlate with Stein because of how inalienably strange their respective prose styles are.
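To make the reading rule concrete, here is a small Python sketch of how a dendrogram’s merge heights come about. It assumes complete-linkage clustering on Euclidean distances (complete linkage is the default method of R’s hclust, but whether the analysis above relied on that default is an assumption), and the four labelled rows are invented numbers, not real frequencies.

```python
import math
from itertools import combinations


def euclidean(row_a, row_b):
    """Straight-line distance between two rows of figures."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(row_a, row_b)))


def complete_linkage_merges(rows):
    """Repeatedly join the two closest clusters; return (left, right, height) per join.

    The distance between two clusters is taken as the distance between
    their two most dissimilar members (complete linkage).
    """
    clusters = [(name,) for name in rows]
    merges = []

    def linkage(left, right):
        return max(euclidean(rows[x], rows[y]) for x in left for y in right)

    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((left, right, linkage(left, right)))
        clusters = [c for c in clusters if c not in (left, right)] + [left + right]
    return merges


# Invented rows for four hypothetical texts.
rows = {
    "text_w": [1.0, 1.0],
    "text_x": [1.0, 2.0],
    "text_y": [5.0, 6.0],
    "text_z": [9.0, 9.0],
}

merges = complete_linkage_merges(rows)
for left, right, height in merges:
    print(left, "+", right, "joined at height", round(height, 2))
```

The pair joined first, at the lowest height, is the most similar pair; the last join, at the top of the tree, marks the broadest division in the data.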

The primary point to take away here, if there is one, is that more investigations are required. The analysis is hardly unproblematic. For one, the corpus sizes vary enormously. Borges’ corpus is around 46 thousand words, whereas Proust’s reaches somewhere around 1.2 million. In one way, the results are encouraging: Borges and Barnes, two authors with only one text in their corpus, aren’t prevented from being compared to novelists with serious word counts. In another way, it is pretty well impossible to derive literary measurements from texts without taking their length into account. The next stage of the analysis will probably involve breaking the corpora up into units of 50 thousand words, so that the results for individual novels can be compared.
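Breaking a corpus into fixed-size units might look something like the Python sketch below. The function name and the drop_short option are hypothetical, and what to do with a final, shorter unit is a choice the plan above leaves open: keeping it means some units are much shorter than others, while dropping it would discard a corpus of around 46 thousand words entirely.

```python
def chunk_words(words, chunk_size=50_000, drop_short=False):
    """Split a list of words into consecutive units of chunk_size words.

    The final unit may be shorter than chunk_size; it is kept unless
    drop_short is True.
    """
    chunks = [words[i:i + chunk_size] for i in range(0, len(words), chunk_size)]
    if drop_short and chunks and len(chunks[-1]) < chunk_size:
        chunks.pop()
    return chunks


# Small demonstration with a 10-word "corpus" and units of 4 words.
toy_words = "one two three four five six seven eight nine ten".split()
kept = chunk_words(toy_words, chunk_size=4)
dropped = chunk_words(toy_words, chunk_size=4, drop_short=True)
print([len(chunk) for chunk in kept])
print([len(chunk) for chunk in dropped])
```

Each unit could then be given its own row of relative frequencies and clustered in the same way as the whole corpora.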

Elizabeth Bowen RTÉ Documentary

Really, really entertaining and informative documentary about the novelist Elizabeth Bowen, what Virginia Woolf made of Bowen’s gaff and Bowen’s extra-marital affairs. Most worth it, I think, for the details given of Bowen’s spying for the English. Her accounts of key Irish figures of the time are less bureaucratic and informative than one might expect; they more closely resemble the forensic character sketches that one encounters in her fiction.


John Banville’s ‘The Book of Evidence’ and Anglo-Irish Nostalgia

Every time I read a John Banville novel, I wish that it were the first time that I was reading a John Banville novel because, taken in a vacuum, each one is a work of great invention. Banville has a capacity to infuse into his high narratives of failed epistemology features of non-high literature (an impulse that Banville now channels into his Benjamin Black persona), and his post-Nabokovian reveries are surely among the most compelling of their kind. But, having read about four of them, I find that a pattern begins to stand out, and here we come to the less appealing aspects of his writing.

  • The perpetually waning, ethereal, always-described-relative-to-their-physical-features female ‘characters.’
  • The aging, reprehensibly lecherous but aesthetically attuned middle-aged or old men at the centre of each novel.
  • The deconstruction of the novel’s artifice every page or so.
  • Four or five points at which it is suggested that the plot in its entirety is contrived.
  • The quiet twist in the text’s last four or five pages.

I could go on and say a lot of other things that annoy me, but the London Review of Books pretty well covered it in its review of his most recent novel, The Blue Guitar. So I’ll just say that The Infinities, featuring an omniscient God-narrator rather than a mortal one, allowed the usual course of his writings to be unsettled and revitalised in a way. Still a shame about Helen Godley, as sketchily characterised as she is attractive. Similarly, Banville remains a good sentencer, with a firm grasp on underplayed humour; The Book of Evidence had more than the average number of good phrases, and the momentary diversions of his baroque prose style are generally enough to get me through one of his books.

However, there was more than just this to keep my interest throughout The Book of Evidence, and that was the main character’s apparent nostalgia for the departed world of Georgian Dublin, seen through the prism of the Anglo-Irish ruling class. Freddie Montgomery is of upper-middle class Catholic stock, though his household, when he returns to Ireland, seems to have Gone Down, as big houses in Irish books will do. Montgomery remembers his father’s attitude to modern Irish history in the following terms: ‘the world, the only worthwhile world, had ended with the last viceroy’s departure from these shores. After that it was all just a wrangle among peasants.’ He even calls Dún Laoghaire Kingstown. This nostalgic treatment of eighteenth-century Ireland is familiar within Irish literature, as one can see from the works of W.B. Yeats and Elizabeth Bowen. One can perhaps just about glimpse the emergent rhythms of Banville’s prose style in the following quote from Bowen’s Court:

‘The great bold rooms, the high doors imposed an order on life. Sun blazed in at the windows, fires roared in the grates. There was a sweet, fresh-planed smell from the floors. Life still kept a touch of colonial vigour; at the same time, because of the glory of everything, it was bound up in the quality of a dream.’

Some of Banville’s thematic preoccupations seem to be gestured towards here also: the faint oscillation of unreality beneath appearance and the intensity of things just in their raw being-ness. The wealth built on the backs of colonial subjects, presented without the compromising fact of their existence, relates to Banville’s capacity to keep his distance from the interiority of others, and perhaps from the interiority of his protagonists themselves. We see also an attraction to surface and a repudiation of tacky actualness.

Roy Foster sees the eighteenth-century Anglo-Irish pursuit of a high style in all things, from buildings and public works to the overwrought, intricate furbelows of their neo-classical architecture, as a try-hard pathology in response to their self-perception: a recognition of their colonial status, coupled with the attempt to construct a better capital with better public buildings than the English. Foster writes that many contemporary visitors to Dublin expected a provincial town and were confronted with a totally inappropriate level of architectural and civic grandeur. One, in a mode that is not entirely un-Banvillean, writes that visiting Dublin was like being ‘at table with a man who serves me Burgundy, but whose attendant is a bailiff disguised in livery.’ This pretentiousness emerges from Georgian Dublin’s precarious sense of itself and relates meaningfully to Banville’s high style, as a compensation for the insufficiency of one’s identity. Montgomery’s dreams, his notions, his self are even more dream-like than they at first seem, as they are constructed on a misinterpretation of history.

Tessa Hadley reads Elizabeth Bowen

I’ve been worrying lately that I’m a bit of an avant-garde literature fetishist, incapable of responding to stories that non-pretentious types read, and that I’ll never enjoy a ‘there once was a woman’ story ever again. Luckily, I just read/heard my first Elizabeth Bowen story, and it’s great.