Tag Archives: Against the Wordle

A (Proper) Statistical analysis of the prose works of Samuel Beckett


Content warning: If you want to get to the fun parts, the results of an analysis of Beckett’s use of language, skip to sections VII and VIII. Everything before that is navel-gazing methodology stuff.

If you want to know how I carried out my analysis, and utilise my code for your own purposes, here’s a link to my R code on my blog, with step-by-step instructions, because not enough places on the internet include that.

I: Things Wrong with my Dissertation’s Methodology

For my master’s, I wrote a 20,000-word dissertation, which took as its subject an empirical analysis of the works of Samuel Beckett. I had a corpus of his entire works with the exception of his first novel, Dream of Fair to Middling Women, which is a forgivable lapse, because he ended up cannibalising it for his collection of short stories, More Pricks than Kicks.

Quantitative literary analysis is generally carried out in one of two open-source programming languages, Python or R. The former you’re more likely to have heard of, being one of the few languages designed with usability in mind. The latter, R, would be more familiar to specialists, or people who work in the social sciences, as it is more abstruse than Python, doesn’t have many language cousins and has a very unfriendly learning curve. But I am attracted to difficulty, so I am using it for my PhD analysis.

I had about four months to carry out my analysis, so the idea of taking on a programming language in a self-directed learning environment was not feasible, particularly since I wanted to make a good go at the extensive body of secondary literature written on Beckett. I therefore made use of a corpus analysis tool called Voyant. This was a couple of years ago, so this was before its beta release, when it got all tricked out with some qualitative tools and a shiny new interface, which would have been helpful. Ah well. It can be run out of any browser, if you feel like giving it a look.

My analysis was also chronological, in that it looked at changes in Beckett’s use of language over time, with a view to proving the hypothesis that he used a narrower vocabulary as his career continued, in pursuit of his famed aesthetic of nothingness or deprivation. As I wanted to chart developments in his prose over time, I dated the composition of each text and built a corpus for each year from 1930 to 1987, excluding, of course, years in which he wrote only drama or poetry, which wouldn’t be helpful to quantify in conjunction with one another. Which didn’t stop me doing so for my master’s analysis. It was a disaster.

II: Uniqueness

Uniqueness, the measurement used to quantify the general spread of Beckett’s vocabulary, was obtained by the generally accepted formula below:

unique word tokens / total words
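As a rough illustration of the formula above, here is a minimal Python sketch. It is not the R code mentioned at the top of this post; the function names `tokenize` and `uniqueness` and the simple lowercase-and-split tokenisation rule are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This tokenisation rule is an assumption for the sketch; a real
    analysis would need to decide how to treat hyphens, numbers, etc.
    """
    return re.findall(r"[a-z']+", text.lower())


def uniqueness(text):
    """Distinct words divided by total words (the formula above)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


sample = "Nor of how. Nor of whom. None of anything."
print(uniqueness(sample))
```

The sample has nine words, six of them distinct, so the score is six divided by nine.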

There is a problem with this measurement, in that it takes no account of a text’s relative length. As a text gets longer, the likelihood that any given word has already been used approaches 1, so fewer and fewer of the words it adds are new. Therefore, a text gets less unique as it gets bigger. I have the correlations to prove it:

[Figure: correlations between uniqueness and text length]

There have been various solutions proposed to this quandary, which somewhat stymies our comparative analyses. One among them is the use of vectorised measurements, which plot the text’s declining uniqueness against its word count, so we see a more impressionistic graph, such as this one, which should allow us to compare the word counts for James Joyce’s novel A Portrait of the Artist as a Young Man and his short story collection Dubliners.

[Figure: uniqueness plotted against word count for A Portrait of the Artist as a Young Man and Dubliners]

All well and good for two or maybe even five texts, but one can see how, with large-scale corpora, this sort of thing can get very incoherent very quickly. Furthermore, if one were to examine the numbers on the y-axis, one would see that the differences here are tiny. This is another idiosyncrasy of stylostatistical methods; because of the way syntax works, the margins of difference wouldn’t be regarded as significant by most statisticians. These issues relating to the measurement are exacerbated by the fact that ‘particles,’ the atomic structures of literary speech (it, is, the, a, an, and, said, etc.), make up most of a text. In pursuit of greater statistical significance for their papers, digital literary critics remove these particles from their texts, which is another unforgivable thing that we do anyway. I did not, because I was concerned that I was complicit in the neoliberalisation of higher education. I also wrote a 4000-word chapter that outlined why what I was doing was awful.

IV: Ambiguity

Ambiguity was calculated using the following formula:

number of indefinite pronouns/total word count

I derived this measurement from Dr. Ian Lancashire’s study of the works of Agatha Christie, and counted Beckett’s use of a set of indefinite pronouns: ‘everyone,’ ‘everybody,’ ‘everywhere,’ ‘everything,’ ‘someone,’ ‘somebody,’ ‘somewhere,’ ‘something,’ ‘anyone,’ ‘anybody,’ ‘anywhere,’ ‘anything,’ ‘no one,’ ‘nobody,’ ‘nowhere,’ and ‘nothing.’ Those of you who know that there are more indefinite pronouns than just these are correct; I had found an incomplete list of indefinite pronouns, and I assumed that that was all. This is just one of the many things wrong with my study. My theory was that there were correlations to be detected between Beckett’s decreasing vocabulary and his increasing deployment of indefinite pronouns, relative to the total word count. I called the vocabulary measure ‘uniqueness,’ and the indefinite pronouns measure I called ‘ambiguity.’ This is tenuous, I know; indefinite pronouns advance information as they elide the provision of information. It is, like so much else in the quantitative analysis of literature, totally unforgivable, yet we do it anyway.
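For the curious, the ambiguity measure can be sketched in a few lines of Python. This is an illustration only, not the code used for the dissertation: the name `ambiguity`, the tokenisation rule, and the decision to count ‘no one’ as a two-word pair (while still counting both of its words in the total) are all assumptions.

```python
import re

# The sixteen words listed above; "no one" is handled separately
# because it spans two tokens.
INDEFINITE_PRONOUNS = {
    "everyone", "everybody", "everywhere", "everything",
    "someone", "somebody", "somewhere", "something",
    "anyone", "anybody", "anywhere", "anything",
    "nobody", "nowhere", "nothing",
}


def ambiguity(text):
    """Listed indefinite pronouns divided by total word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    # Single-word pronouns.
    hits = sum(1 for token in tokens if token in INDEFINITE_PRONOUNS)
    # "no one": count each adjacent pair of tokens "no", "one".
    hits += sum(
        1 for first, second in zip(tokens, tokens[1:])
        if first == "no" and second == "one"
    )
    return hits / len(tokens)


print(ambiguity("Somewhere again. Somehow again. Someone again. None of anything."))
```

Note that ‘somehow’ and ‘none’ are not on the list, so only three of the nine words in the sample count towards the score.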

V: Hapax Richness

I initially wanted to take into account another phenomenon known as the hapax score, which charts occurrences of words that appear only once in a text or corpus. The formula to obtain it would be the following:

number of words that appear once/total word count
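A minimal Python sketch of this formula, again with an assumed function name (`hapax_density`) and an assumed tokenisation rule rather than anything taken from my own scripts:

```python
import re
from collections import Counter


def hapax_density(text):
    """Words that occur exactly once, divided by total word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # A hapax is any word whose frequency in this text is exactly 1.
    hapaxes = sum(1 for frequency in counts.values() if frequency == 1)
    return hapaxes / len(tokens)


print(hapax_density("Nor of how. Nor of whom. None of anything."))
```

In the sample, ‘how,’ ‘whom,’ ‘none’ and ‘anything’ each appear once, giving four hapaxes in nine words.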

I believe that the hapax count would be of significance to a Beckett analysis because of the points at which his normally incompetent narrators have sudden bursts of loquaciousness, like when Molloy says something like ‘digital emunction and the peripatetic piss,’ before lapsing back into his ‘normal’ tone of voice. Once again, because I was often working with a pen and paper, this became impossible, but now that I know how to code, I plan to go over my master’s analysis and do it properly. The hapax score will form a part of this new analysis.

VI: Code & Software

A much more accurate way of analysing vocabulary for the purposes of comparative analysis, when your texts are of different lengths, would therefore be to randomly sample it. This is obviously not very easy when you’re working with an online corpus analysis tool, but far more straightforward when working through a programming language. A formula for representative sampling was found, and integrated into the code. My script is essentially a series of nested loops and if/else statements that randomly and sequentially sample a text, calculate the uniqueness, indefiniteness and hapax density ten times, store the results in a variable, and then calculate the mean value for each by dividing the result by ten, the number of times that the first loop runs. I inputted each value into the statistical analysis program SPSS, because it makes pretty graphs with less effort than R requires.
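The sampling idea can be sketched roughly as follows. This is a Python approximation under stated assumptions, not a transcription of my R script: the window size of 500 words, the function names, and the choice to sample a contiguous run of words from a random starting point are all hypothetical, and the sketch covers only uniqueness and hapax density for brevity.

```python
import random
import re
from collections import Counter
from statistics import mean


def tokenize(text):
    """Lowercase and split into word tokens (assumed tokenisation rule)."""
    return re.findall(r"[a-z']+", text.lower())


def measures(tokens):
    """Uniqueness and hapax density for one fixed-length sample."""
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "uniqueness": len(counts) / total,
        "hapax_density": sum(1 for c in counts.values() if c == 1) / total,
    }


def sampled_measures(text, window=500, runs=10, seed=None):
    """Average each measure over `runs` random windows of `window` words.

    Every text is measured on samples of the same length, so a long
    novel and a short prose piece can be compared on equal terms.
    """
    tokens = tokenize(text)
    if len(tokens) < window:
        raise ValueError("text is shorter than the sampling window")
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        # Pick a random starting point, then read sequentially from it.
        start = rng.randint(0, len(tokens) - window)
        results.append(measures(tokens[start:start + window]))
    return {key: mean(r[key] for r in results) for key in results[0]}
```

Calling `sampled_measures(text, window=500, runs=10)` on each year’s corpus would give one averaged pair of scores per year. Texts shorter than the window raise an error here; how to treat those is a separate decision.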

VII: Results

I used SPSS’ box plot function first to identify any outliers for uniqueness, hapax density and ambiguity. 1981 was the only year which scored particularly high for relative usage of indefinite pronouns.


It should be said that this measure, too, is correlated to the length of the text, which only stands to reason; as a text gets longer the relative incidence of a particular set of words will decrease. Therefore, as the only texts Beckett wrote this year, ‘The Way’ and ‘Ceiling,’ together add up to about 582 words (the fifth lowest year for prose output in his life), one would expect indefiniteness to be somewhat higher in comparison to other years. However, this doesn’t wholly account for its status as an outlier value. Towards the end of his life Beckett wrote increasingly short prose pieces. Comment C’est (How It Is) was his last novel, and was written almost thirty years before he died. This probably has a lot to do with his concentration on writing and directing his plays, but in his letters he attributed it to a failure to progress beyond the third novel in his so-called trilogy of Molloy, Malone meurt (Malone Dies) and L’Innommable (The Unnamable). It is in the year 1950, the year in which L’inno was completed, that Beckett began writing the Textes pour rien (Texts for Nothing), scrappy, disjointed pieces, many of which seem to take up from where L’inno left off, as do the Fizzles and the Faux Départs. ‘The Way,’ I think, is an outgrowth of a later phase in Beckett’s prose writing, which dispenses with the peripatetic loquaciousness and the understated lyricism of the trilogy and replaces them with a more brute and staccato syntax, one which is often dependent on the repetition of monosyllables:

No knowledge of where gone from. Nor of how. Nor of whom. None of whence come to. Partly to. Nor of how. Nor of whom. None of anything. Save dimly of having come to. Partly to. With dread of being again. Partly again. Somewhere again. Somehow again. Someone again.

Note also the prevalence of particle words, which will have been stripped out for the analysis, and the ways in which words with a ‘some’ prefix are repeated as a sort of refrain. This essential structure persists in the work, or at least in the artefact of the work that the code produces, and hence makes of it the outlier that it is.

[Figure: the values for each year plotted together]

From plotting all the values together at once, we can see that uniqueness is partially dependent on hapax density; the words that appear only once in a particular corpus would be important in driving up the score for uniqueness. While there could be said to be a case for the hypothesis that Beckett’s texts get less unique and more ambiguous up until 1944, when he completed his novel Watt, and, if we’re feeling particularly risky, up until 1960, when Comment C’est was completed, it would be wholly disingenuous to advance it beyond this point, when his style becomes far too erratic to categorise definitively. Comment C’est is Beckett’s most uncompromising prose work. It has no punctuation, no capitalisation, and narrates the story of two characters, in a kind of love, who communicate with one another by banging kitchen implements off one another:

as it comes bits and scraps all sorts not so many and to conclude happy end cut thrust DO YOU LOVE ME no or nails armpit and little song to conclude happy end of part two leaving only part three and last the day comes I come to the day Bom comes YOU BOM me Bom ME BOM you Bom we Bom

VIII: Conclusion

I would love to say that the general tone is what my model is being attentive to, which is why it identified Watt and How It Is as nadirs in Beckett’s career, but I think their presence on the chart is more a product of their relative length, as novels, versus the shorter pieces which he moved towards in his later career. Clearly, Beckett’s decision to write shorter texts makes this means of summing up his oeuvre in general insufficient. Whatever changes Beckett made to his aesthetic over time, we might not need such complicated graphs to map them, and I could have just used a word processor to find the main one: length. Bom and Pim aside, for whatever reason, after having written L’inno none of Beckett’s creatures presented themselves to him in novelistic form again. The partiality of vision and modal tone which pervades the post-L’inno works demonstrates, I think, far more effectively what it was that Beckett was ‘pitching’ for: a new conceptual aspect to his prose, which re-emphasised its bibliographic aspects, the most fundamental of which was their brevity, or the appearance of an incompleteness, by virtue of being honed to sometimes less than five hundred words.

The quantification of differing categories of words seems like a radical, and the most fun, thing to quantify in the analysis of literary texts, as the words are what we came for, but the problem is similar to one that overtakes one who attempts to read a literary text word by word by word, and unpack its significance as one goes: overdetermination. Words are kaleidoscopic, and the longer you look at them, the more threatening their darkbloom becomes, the more they swallow, excrete, the more alive they are, all round. Which is fine. Letting new things into your life is what it should be about, until their attendant drawbacks become clear, and you start to become ambivalent about all the fat and living things you have in your head. You start to wish you read poems instead, rather than novels, which make you go mad, and worse, start to write them. The point is words breed words, and their connections are too easily traced by computer. There’s something else about knowing their exact correlations to a decimal point. They seem so obvious now.


My Dissertation

I finished my dissertation – a quantitative analysis of the works of Samuel Beckett. There’s a copy available in Hodges & Figgis because I left one there.

Alternatively, here is the PDF.

Against the Wordle

The page 99 test: Samuel Beckett’s Company/Ill Seen Ill Said/Worstward Ho/Stirrings Still

The page 99 test (explanation here) is in many ways a somewhat imperfect methodology. Many texts, such as Rainer Maria Rilke’s Letters to a Young Poet, are immune to the approach by virtue of being fewer than 99 pages long; Rilke’s runs to 79. The Faber edition of Samuel Beckett’s Company/Ill Seen Ill Said/Worstward Ho/Stirrings Still, a collection of not-quite short stories, not-quite poems, not-quite proems, is another problematic test case.

Company obviously receives top billing for a reason; it is one of the few of Beckett’s later works that can be read in polite company. For example:

“There on summer Sundays after his midday meal your father loved to retreat with Punch and a cushion. The waist of his trousers unbuttoned he sat on the one ledge turning the pages. You on the other with your feet dangling. When he chuckled you tried to chuckle too. When his chuckle died yours too. That you should try to imitate his chuckle pleased and tickled him greatly and sometimes he would chuckle for no other reason than to hear you try to chuckle too. Sometimes you turn your head and look out through a rose-red pane. You press your little nose against the pane and all without is rosy. The years have flown and there at the same place as then you sit in the bloom of adulthood bathed in rainbow light gazing before you.”

As you can see, it’s very nice.

Unfortunately, it’s only nice for 42 pages, and then it ends.

The page 99 test therefore takes us to the mid-point in Worstward Ho. The fact that we start reading it halfway rather than at the beginning makes little difference; the staccato monologue in which it is written, not to mention the demi-paragraphs in which it is arranged, does more to obfuscate than to illuminate.

“Stare by words dimmed. Shades dimmed. Void dimmed. Dim dimmed.”

The above quotation expresses the kind of distaste for words that only a self-conscious practitioner of them can have. Nevertheless they are a necessary evil; when we think and when we speak, they are the tools that we think through and with.

“No. Shades cannot go.”

V.S. Pritchett’s ‘lyrical glints’ in the rough of Samuel Beckett’s ‘How It Is’

In his review of Beckett’s final novel, How It Is, V.S. Pritchett concluded that Beckett had paid ‘a heavy price in obscurity, pretentiousness and awful boredom.’ Evidently Pritchett was not a fan of Beckett’s free-wheeling with punctuation, lack of a plot and experiments with language. Blasphemous as it is, it’s possible to see his point of view; reading about the exploits of someone traversing a barren desert landscape with a bag of tins around their neck, seeking an other to rhythmically mash with a can-opener, isn’t everyone’s idea of a good story.

Pritchett qualifies his critique with the point that there are ‘lyrical glints’ aplenty that mollify his more righteous instincts in his crusade against all things pretentiously boring and obscure. This can sometimes reflect the experience of reading texts that are in some ways manufactured to be monotonous and alienating: the Pritchetts of the world soldier vainly onward like the ‘protagonist’ Pim on his face in the dirt (‘mouth opens the tongue comes out lolls in the mud and no question of thirst either’), tongue lolling outwards, thirsty for some more ‘lyrical glints’ amid the discordant grikes.

The following is one such lyrical glint:

we are on a veranda smothered in verbena the scented sun dapples the red tiles yes I assure you the huge head hatted with birds and flowers is bowed down over my curls the eyes burn with severe love I offer her mine pale upcast to the sky whence cometh our help and which I know perhaps even then with time shall pass away

Pritchett is correct in pinpointing these as one of the stand-out features of the novel; they are indicative of a certain kind of childhood memory that circulates throughout the text and occurs compulsively, saturated in the sepia of nostalgia. But what makes them that much more poignant is the contrast with Pim’s reality. The seeming intensity of his inner life at one point (whether it can be said to be dormant, or a remnant of what it once was, during the narration of How It Is is somewhat moot) makes the degradation of his current state all the more incomprehensible and, though one shouldn’t be prone to making these sorts of value judgements on a novel that repudiates the mechanism of characterisation, upsetting.

For example, a section of his monologue is rendered below. Words that are capitalised are ones he is communicating to his ‘companion’ Bom, by smacking him with a can-opener.

as it comes bits and scraps all sorts not so many and to conclude happy end cut thrust DO YOU LOVE ME no or nails armpit and little song to conclude happy end of part two leaving only part three and last the day comes I come to the day Bom comes YOU BOM me Bom ME BOM you Bom we Bom

Jack Emery does Samuel Beckett’s ‘Malone Dies’

Jack Emery’s take on the opening lines of Malone Dies is a strange beast. One is struck initially by his somewhat cartoonish intonation. He furthermore deviates from the text a couple of times. These are, however, only ever merely minor infractions, the insertion of a mere where it is unnecessary once or twice – as I say, minor infractions, minor, think no more of them.

For the most part, then, Emery is faithful to the words and punctuation, if not so much in delivery. Emery will occasionally become excited and raise the pitch of his voice to emphasise that which he may have believed merited apotheosis, particularly when citing the various days of celebration that he believes his death may occur before. What emerges from the text on reading it is the absurdity of keeping track of his progress relative to days in religious and civic life known for their feasting and celebration; for Malone they have a morbid tinge. Emery will develop his excited pitch while listing them off, as if invigorated by the thought of publicly brandished bunting, something that I can’t imagine Malone being. When Emery recites the line about Malone’s ‘old debtor,’ death, he speaks as if death were an old friend who just walked into the room for a long ‘aul confab.

Another impression one gets is that Malone sounds like the beneficiary of elocution classes. Every syllable is enunciated precisely, the words ‘neutral’ and ‘inert’ are sounded in a sing-song tone. When he declares that his stories are to be as lifeless as the teller is, it seems somewhat nonsensical; Malone as he appears here seems explosively spontaneous and hopped up on his own vim.

It is possible that it is the residua of a national pride within me that reacts against Emery’s accent and is determined to view Malone as being Irish. This is possibly counter-intuitive, as, despite his name, the novel was written originally in French. This is possibly why the fourteenth of July is named with the reverence that it is. From Malone’s description of the view from his window I had assumed that he lived in London, perhaps the bed-sit in Paultons Square that Beckett himself lived in. However, Beckett neglects to mention the charming Georgian square garden that sits between him and the woman he sees going about her chores every evening. Either Malone has a predilection for taking a negative view of his situation (probable), or Beckett, glimpsing the future of Blarney-inflected Joyce walking tours of Dublin and not wishing to be burdened with a similar legacy, purged his writings of all geographic identifiers (unlikely). Either way, it is Beckett’s decision not to state the location directly. I suppose that it is equally possible that he lives in a small bed-sit in Paris with a view into a woman’s flat. Then again, Malone says he doesn’t have to lift his head off the pillow to see her, so maybe the square is there. Then again again, Malone probably wouldn’t be able to see the whole way across the square, assuming that he’s visually impaired. In any case, one could say that Emery’s accent contributes to this overdetermined palimpsest of conflicting national identities.

My issue with the Beckett monologues, as spoken by Emery and Pinter, has been their tendency either to dabble in grandiloquence or to occasionally build momentum, attempting to deepen a sense of narrative or imminent arrival at some ‘point.’ In the pursuit of an ictus, or the speaking of one sentence in such a way that makes it more important than any other, the performer channels an interiority that I find sits uncomfortably with the source material. Perhaps I am being too hard on both Emery and Pinter; their performances may reflect an attempt to introduce contrast, and a cool recitation on the subject of one’s demise might simply not work under the demands of live performance.

En bref: Less is more.

Samuel Beckett’s ‘The Unnamable’ – Who Wore It Better?

The above video features the actor Jack MacGowran reading the closing lines of Beckett’s novel The Unnamable. MacGowran is one of a celebrated few whom Beckett personally approved during his lifetime, and when a generous patron of the arts put at Beckett’s disposal the means to direct a production, he cast him opposite Patrick Magee in a London production of Endgame. The play Eh Joe was written for him and he also staged a number of one-man shows based on Beckett’s works.

This video depicts the playwright and actor Harold Pinter, reading a longer extract of the same novel, albeit prefaced by an anecdote of the man Beckett, who helped him with his hangover.

Pinter delivers the lines with far more determination than MacGowran. His voice has a sonorous depth to it, not to mention his sophisticated accent, and though his capacity to keep pace with the breathless quality of the prose is to be praised, he sounds as though he is constantly barreling towards a conclusion.

The visual component also deserves treatment. Pinter stares into the lens while spouting all this, in itself an achievement, but with this seems to come a demand to stress, emphasise and bring to bear some of the pugnaciousness for which he is known, I say at the risk of looking to the man and not the work. I find it most pronounced at certain junctures when Pinter/The Unnamable seems to come to some sort of realisation, leaning quite a bit on the word ‘indictment.’ Being unfamiliar with the contents of Pinter’s mind, this is a guess on my part, but all told, this seems to be what The Unnamable represents for him, if the letter he reads at the start proclaiming Beckett to be an elucidator of the human condition is to be trusted. The close-up lends itself further to a crescendo, lending Pinter’s conclusion to the genre of reality television where the actors later confess by speaking about the day’s happenings.

MacGowran’s reading is barely above the volume of a whisper, his inhalations make his performance as much about what he does not say, or what his rasping gasps say. Its focus is more that of the rhythm of not-saying and creates the illusion that the voice could go on lisping into the void forever, which of course it does, an effect implied all the better by MacGowran’s shedding of trajectory.


Samuel Beckett’s ‘Molloy’

The video embedded above features one of my favourite voices, Barry McGovern, reading my favourite section of one of my favourite novels.

Molloy is the protagonist of the first part of Beckett’s 1951 novel, Molloy. Throughout the narrative he wanders around an uncertain city and through an uncertain countryside (albeit one with a distinctly Irish ambience) before being apprehended and made to commit his narrative to paper. The reasons for this are not stated clearly.

McGovern narrates a point in the novel at which Molloy finds himself at the seaside, determined to initiate a routine through which he can suck sixteen small stones that he has acquired for an equal amount of time. The more he thinks about how this system should be instituted, the more complicated the issue becomes.

This video functions more like a soundscape than a straightforward audio rendering, something I think more aural interpretations of an author’s works should aspire to, especially considering Beckett’s willingness to experiment with radio plays during his lifetime. A slower voice (also McGovern’s?), more drained of enunciation and distorted, both from its apparent distance and from its sounding like a tape recorder, repeats what Molloy has just said and at some points also manages to outrun his lagging train of thought, emphasising Molloy’s cognitive decline. Occasionally the voice will add words to McGovern’s enunciation, or deliver them with a greater degree of terseness.

The recording is also interpolated with a number of non sequiturs in the form of a motor starting up and an atonal note from a trumpet. These come to prominence as Molloy is outlining his methodology, undermining the sense that his approach proceeds along rational lines, or that he has a sympathetic listener. Whatever is causing this noise seems pretty determined to drown Molloy out.

McGovern’s voice occasionally increases in volume and proximity, as if he’s suddenly leaning quite intently into the microphone. These articulations have a surreptitious intimacy to them and expand the range of McGovern’s expression into three or so: normal McGovern, whisper-in-your-ear McGovern and tape recorder McGovern. This triumvirate suggests the increasing diminution of self-presence, a crucial theme in the Trilogy of Molloy, Malone Dies and The Unnamable. The failed unity of the three is established roughly halfway in, when the narrator allows the tension to build for the barest second before two of them pronounce ‘all!’ slightly out of step with one another.