I have recently begun to experiment with Natural Language Processing to determine how particular words in modernist texts are correlated. I’m still getting my head around Python and NLTK, but so far I’m finding it much more user-friendly than similar packages in R.
Long-term I hope to plot these collocations in a high-dimensional vector space, but for the moment I’m interested in three things: the prevalence of the term ‘young man’; the fact that Self and Baume are the only authors with female adjective-noun phrases; and the usage of titles which convey particular social hierarchies. Joyce, Woolf and Bowen’s collocations are almost exclusively composed of these, as are Stein’s, with the qualification that Stein’s appear shorn of their ‘Mr.’, ‘Miss.’ or ‘Doctor’.
Here are all the collocations in the modernist corpus:
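For anyone curious about what finding collocations involves, here is a minimal sketch in plain Python. It is not the NLTK routine that produced the lists in this post. It is a simplified stand-in that counts adjacent word pairs, drops pairs containing stopwords or very short words, and ranks what is left by pointwise mutual information (PMI). The stopword list, the sample text and the function names are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption; NLTK ships a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "he", "she",
             "it", "his", "her", "that", "at", "on", "with"}


def tokenize(text):
    """Lowercase the text and pull out runs of letters as word tokens."""
    return re.findall(r"[^\W\d_]+", text.lower())


def collocations(tokens, top_n=20, min_count=2):
    """Rank adjacent word pairs by PMI, highest first."""
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)
    scored = []
    for (w1, w2), n in pair_counts.items():
        # Skip rare pairs, stopwords and very short words.
        if n < min_count or w1 in STOPWORDS or w2 in STOPWORDS:
            continue
        if len(w1) < 3 or len(w2) < 3:
            continue
        # PMI: how much more often the pair occurs than chance would predict
        # (the number of pairs is approximated by the number of tokens).
        pmi = math.log2(n * total / (word_counts[w1] * word_counts[w2]))
        scored.append((pmi, n, f"{w1} {w2}"))
    # Highest PMI first; ties broken by raw count, then alphabetically.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [pair for _, _, pair in scored[:top_n]]


sample = ("The young man saw the old man. A young man and an old man "
          "went to New York. The young man liked New York.")
print("; ".join(collocations(tokenize(sample))))
```

On a real corpus the choice of association measure matters a great deal: PMI favours rare pairs, which is why a minimum count is applied before scoring.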
young man; robert jordan; new york; gertrude stein; old man; could see; henry martin; every one; years ago; first time; long time; hugh monckton; great deal; come back; david hersland; good deal; every day; edward colman; came back; alfred hersland
Canonical modernist texts:
young man; robert jordan; gertrude stein; henry martin; new york; every one; old man; could see; years ago; long time; hugh monckton; first time; great deal; david hersland; come back; good deal; every day; edward colman; alfred hersland; mr. bettesworth
fat controller; phar lap; von sasser; first time; per cent; could see; old man; one another; even though; years ago; new york; front door; young man; either side; someone else; dave rudman; last night; living room; steering wheel; every time
frau mann; nora said; english girl; someone else; long ago; leaned forward; london bridge; come upon; could never; god knows; doctor said; sweet sake; first time; five francs; terrible thing; francis joseph; hôtel récamier; orange blossoms; bowed slightly; would say
kentish town; someone else; first time; last night; jesus christ; something else; years ago; five minutes; every day; hail mary; take care; next week; arms around; never mind; every single; little girl; little boy; two years; soon enough; come back
mrs kerr; lady waters; mrs heccomb; major brutt; mme fisher; lady naylor; miss fisher; good deal; said mrs; first time; lady elfrida; one another; young man; colonel duperrier; aunt violet; last night; ann lee; one thing; sir robert; sir richard
robert jordan; old man; could see; colonel said; gran maestro; catherine said; jordan said; richard gordon; long time; pilar said; thou art; pablo said; nick said; bill said; girl said; captain willie; young man; automatic rifle; mr. frazer; david said
F. Scott FitzGerald
new york; young man; years ago; first time; sally carrol; several times; fifth avenue; ten minutes; minutes later; richard caramel; thousand dollars; five minutes; young men; evening post; old man; next day; saturday evening; long time; last night; come back
gertrude stein; every one; david hersland; alfred hersland; angry feeling; family living; independent dependent; jeff campbell; julia dehning; mrs. hersland; daily living; whole one; bottom nature; madeleine wyman; good deal; mary maxworthing; middle living; miss mathilda; mabel linker; every day
buck mulligan; said mr.; martin cunningham; aunt kate; says joe; mary jane; corny kelleher; ned lambert; mrs. kearney; stephen said; mr. henchy; ignatius gallaher; father conmee; nosey flynn; mr. kernan; myles crawford; cissy caffrey; ben dollard; mr. cunningham; miss douce
young man; faubourg saint-germain; long ago; caught sight; first time; every day; one day; great deal; des laumes; young men; could see; quite well; next day; one another; would never; nissim bernard; victor hugo; would say; louis xiv; long time
said camier; said mercier; miss counihan; lord gall; miss carridge; mr. kelly; panting stops; said belacqua; mr. endon; said wylie; said neary; one day; otto olaf; dr. killiecrankie; come back; vast stretch; mrs gorman; push pull; something else; ground floor
even though; tawny bay; living room; old man; passenger seat; bird walk; maggot nose; shut-up-and-locked room; stone fence; food bowl; lonely peephole; low chair; old woman; kennel keeper; rearview mirror; shih tzu; shore wall; safe space; every day; oneeye oneeye
miss barrett; mrs. ramsay; mrs. hilbery; young man; st. john; could see; years ago; peter walsh; mrs. thornbury; miss allan; said mrs.; young men; mrs. swithin; human beings; wimpole street; mrs. flushing; mr. ramsay; mrs. manresa; sir william; door opened
new york; per cent; eliza lynch; dear friend; years old; even though; first time; came back; years ago; long time; michael weiss; señor lópez; living room; every time; looked like; could see; one day; said constance; pat madigan; mrs hanratty
fat controller; phar lap; von sasser; one another; old man; could see; first time; per cent; dave rudman; let alone; front door; young man; skip tracer; quantity theory; jane bowen; los angeles; young woman; either side; charing cross; long since
father fahrt; good fairy; father cobble; said shanahan; mrs crotty; said furriskey; said lamont; mrs laverty; one thing; sergeant fottrell; said slug; old mathers; public house; far away; cardinal baldini; monsignor cahill; mrs furriskey; red swan; black box; said shorty
Ford Madox Ford
henry martin; hugh monckton; edward colman; privy seal; mr. bettesworth; mr. fleight; young man; mr. sorrell; sergius mihailovitch; young lovell; new york; jeanne becquerel; lady aldington; kerr howe; anne jeal; miss peabody; mr. pett; great deal; marie elizabeth; robert grimshaw
Jorge Luis Borges
ts’ui pên; buenos aires; pierre menard; eleventh volume; richard madden; nils runeberg; yiddische zeitung; stephen albert; hundred years; erik lönnrot; firing squad; henri bachelier; madame henri; orbis tertius; vincent moon; paint shop; seventeenth century; anglo-american cyclopaedia; fergus kilpatrick; years ago
mrs. travers; mrs verloc; mrs. fyne; peter ivanovitch; doña rita; miss haldin; mrs. gould; assistant commissioner; charles gould; san tomé; chief inspector; years ago; captain whalley; could see; van wyk; old man; dr. monygham; gaspar ruiz; young man; mr. jones
young man; st. mawr; mr. may; mrs. witt; blue eyes; miss frost; could see; one another; mrs bolton; ‘all right; come back; said alvina; two men; of course; good deal; long time; mr. george; next day
uncle buck; aleck sander; miss reba; years ago; dewey dell; mrs powers; could see; white man; four years; old man; ned said; division commander; general compson; miss habersham; new orleans; uncle buddy; let alone; one another; united states; old general
My PhD research will involve arguing that there has been a resurgence of modernist aesthetics in the novels of a number of contemporary authors. These authors are Anne Enright, Will Self, Eimear McBride and Sara Baume. All these writers have, at various public events and in the course of many interviews, given very different accounts of their specific relation to modernism, and even if the definition of modernism wasn’t totally overdetermined, we could spend the rest of our lives defining the ways in which their writing engages, or does not engage, with the modernist canon. Indeed, if I have my way, this is what I will spend a substantial portion of my life doing.
It is not in the spirit of reaching a methodology of greater objectivity that I propose we analyse these texts through digital methods; having begun my education in statistical and quantitative methodologies in September of last year, I can tell you that these really afford us no *better* a view of any text than just reading it would, but fortunately I intend to do that too.
This cluster dendrogram was generated in R, and owes its existence to Matthew Jockers’ book Text Analysis with R for Students of Literature, from which I developed a substantial portion of the code that creates the output above.
What the code is attentive to is the words that these authors use the most. When analysing literature qualitatively, we tend to have a magpie sensibility, zoning in on words which produce more effects or stand out in contrast to the literary matter which surrounds them. As such, the ways in which a writer uses the words ‘the’, ‘an’, ‘a’, or ‘this’ tend to pass us by, but they may be far more indicative of a writer’s style, or at least of style in the way that a computer would be attentive to it; sentences that are ‘pretty’ are generally statistically insignificant.
Every corpus that you can see in the above image was scanned into R, and then run through a script which counted the number of times every word was used in the text. The resulting figure is called the word’s frequency, and was then converted to its relative frequency by dividing it by the total number of words in the corpus and multiplying the result by 100. Every word with a relative frequency above a certain threshold was put into a matrix, and a function was used to cluster the corpora together based on the similarity of the figures they contained, according to a Euclidean metric I don’t fully understand.
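To make the two calculations in that paragraph concrete, here is a small Python sketch (Python rather than the R I actually used, and with made-up toy texts). Relative frequency is a word’s count divided by the total word count, times 100. The Euclidean distance between two corpora is the square root of the summed squared differences between their relative frequencies, taken word by word. The names `author_a`, `author_b` and the three-word feature list are placeholders of mine, not anything from the real analysis.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of the text, as a percentage."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 100 * n / total for word, n in counts.items()}


def euclidean(freqs_a, freqs_b, features):
    """Straight-line distance between two corpora over the chosen words."""
    return math.sqrt(sum(
        (freqs_a.get(w, 0.0) - freqs_b.get(w, 0.0)) ** 2 for w in features
    ))


# Toy stand-ins for two corpora (hypothetical, for illustration only).
texts = {
    "author_a": "the cat and the dog sat by the fire",
    "author_b": "a cat and a dog and a bird",
}
features = ["the", "a", "and"]  # the real analysis used a much longer list

freqs = {name: relative_frequencies(t) for name, t in texts.items()}
distance = euclidean(freqs["author_a"], freqs["author_b"], features)
print(f"distance between author_a and author_b: {distance:.2f}")
```

The smaller the distance, the more alike two corpora are in their use of the chosen words; a distance of zero would mean identical relative frequencies across the board.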
The final matrix was 21 X 57, and compared these 21 corpora on the basis of their relative usage of the words ‘a’, ‘all’, ‘an’, ‘and’, ‘are’, ‘as’, ‘at’, ‘be’, ‘but’, ‘by’, ‘for’, ‘from’, ‘had’, ‘have’, ‘he’, ‘her’, ‘him’, ‘his’, ‘I’, ‘if’, ‘in’, ‘is’, ‘it’, ‘like’, ‘me’, ‘my’, ‘no’, ‘not’, ‘now’, ‘of’, ‘on’, ‘one’, ‘or’, ‘out’, ‘said’, ‘she’, ‘so’, ‘that’, ‘the’, ‘them’, ‘then’, ‘there’, ‘they’, ‘this’, ‘to’, ‘up’, ‘was’, ‘we’, ‘were’, ‘what’, ‘when’, ‘which’, ‘with’, ‘would’, and ‘you’.
Anyway, now we can read the dendrogram.
Speaking about the dendrogram in broad terms can be difficult for precisely the reason that I indicated above: quantitative and qualitative methodologies for text analysis are totally opposed to one another. What is obvious, though, is that Eimear McBride and Gertrude Stein are extreme outliers, comparable only to each other. This is in one way unsurprising, because of their brutish, repetitive styles, and in other ways very surprising, because McBride is on record as dismissing Stein’s work for being ‘too navel-gaze-y.’
Jorge Luis Borges and Marcel Proust have branched off in their own direction, as has Sara Baume, which I’m not quite sure what to make of. Franz Kafka, Ernest Hemingway and William Faulkner have formed their own nexus. More comprehensible is the Anne Enright, Katherine Mansfield, D.H. Lawrence, Elizabeth Bowen, F. Scott FitzGerald and Virginia Woolf cluster; one could make admittedly sweeping judgements about how this could be said to be modernism’s extreme centre, in which the radical experimentalism of its more revanchiste wing was fused rather harmoniously with nineteenth-century social realism, producing a kind of indirect discourse at which I think each of these authors excels.
These revanchistes are well represented in the dendrogram’s right wing, with Flann O’Brien, James Joyce, Samuel Beckett and Djuna Barnes having clustered together, though I am not quite sure what to make of Ford Madox Ford/Joseph Conrad’s showing at all, being unfamiliar with the work.
The basic rule in interpreting dendrograms is that the closer the ‘leaves’ reach the bottom, the more similar they can be said to be. Therefore, Anne Enright and Will Self are the contemporary modernists most closely aligned to the forebears, if indeed forebears they can be said to be. It would be harder, from a quantitative perspective, to align Sara Baume with this trend in a straightforward manner, and McBride only seems to correlate with Stein because of how inalienably strange their respective prose styles are.
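What a dendrogram draws is the order in which clusters were merged and the distance at which each merge happened, so leaves that join low down are the similar ones. The sketch below is a naive Python version of agglomerative clustering with complete linkage, which I believe is the default for R’s hclust, though I am assuming rather than asserting that it matches my actual run. The four labels and their pairwise distances are invented purely to show the mechanics.

```python
from itertools import combinations


def complete_linkage(distances, labels):
    """Naive agglomerative clustering.

    `distances` maps frozenset({x, y}) to the distance between x and y.
    Returns the merges in order, each as (cluster_a, cluster_b, height).
    """
    clusters = [(label,) for label in labels]
    merges = []

    def dist(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(distances[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        # Merge whichever pair of clusters is currently closest.
        a, b = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((a, b, dist(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Invented distances between four hypothetical corpora.
labels = ["A", "B", "C", "D"]
distances = {
    frozenset(("A", "B")): 1, frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 5, frozenset(("A", "D")): 6,
    frozenset(("B", "C")): 4, frozenset(("B", "D")): 7,
}

for a, b, height in complete_linkage(distances, labels):
    print(f"merge {a} + {b} at height {height}")
```

Here A and B join first, at the lowest height, so they would sit nearest the bottom of the plot; the final merge, at the greatest height, is the split at the top of the tree.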
The primary point to take away here, if there is one, is that more investigations are required. The analysis is hardly unproblematic. For one, the corpus sizes vary enormously. Borges’ corpus is around 46 thousand words, whereas Proust’s reaches somewhere around 1.2 million. In one way, the results are encouraging: Borges and Barnes, two authors with only one text in their corpus, aren’t prevented from being compared to novelists with serious word counts. In another way, it is pretty well impossible to derive literary measurements from texts without taking their length into account. The next stage of the analysis will probably involve breaking the corpora up into units of 50 thousand words, so that the results for individual novels can be compared.
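Breaking a corpus into fixed-size units is simple enough to sketch. This is one possible Python version, not code I have run on the corpora; the function name and the `keep_remainder` flag are my own inventions. The flag matters because a corpus shorter than the chunk size, like the Borges one, would otherwise yield no chunks at all.

```python
def chunk_words(tokens, size=50_000, keep_remainder=True):
    """Split a list of word tokens into consecutive chunks of `size` words.

    If `keep_remainder` is False, a final chunk shorter than `size`
    is discarded so that every chunk has exactly the same length.
    """
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    if chunks and len(chunks[-1]) < size and not keep_remainder:
        chunks.pop()
    return chunks


# A toy example with ten 'words' and a chunk size of four.
words = [f"w{i}" for i in range(10)]
for chunk in chunk_words(words, size=4):
    print(len(chunk), chunk)
```

Whether to keep or drop the short final chunk is a judgement call: keeping it preserves every word, while dropping it keeps the units strictly comparable in length.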
When it comes to reading Anne Enright’s novels, I am guilty of teleological thinking. This is because I believe her most recent novel, The Green Road, to be one of the best novels I’ve ever read and until I’d read that, I believed The Gathering to be one of the best novels I’ve ever read. So, there is an extent to which I have come to view her oeuvre as an inexorable movement towards the twin apotheoses of these two works.
What is interesting then, about the history of The Gathering’s composition, is that it seems to have begun almost as a run-up to The Green Road. It was initially Enright’s intention to make The Gathering a Faulknerian 500-some page novel that would follow three generations of the Hegarty family through a century of Irish history, from the early 1900s to the early 2000s. The section in the novel in which the whole family is gathered for their brother Liam’s funeral certainly seems to emulate the set-piece of The Green Road’s Christmas dinner, albeit with substantially less information given about each family member. The Gathering apparently ‘fell apart’ in the drafting process, and became the far more fragmented work we now have, one which is at war with its own historical consciousness, an allegory of modern Irish history which acts as the novel’s framework.
Take Veronica’s account of her very Irish family, which is at once a detailed account of her own, as well as Irish families in a more general sense:
There is always a drunk. There is always someone who has been interfered with, as a child. There is always a colossal success, with several houses in various countries to which no one is ever invited. There is a mysterious sister. There are just trends, of course, and, like trends, they shift.
Take, also, Veronica’s name. The biblical Veronica wiped Jesus’ face with a piece of cloth, and took its imprint. A heavily freighted name, and one which carries with it the burden of creating truly mimetic art, an aspiration towards the re-creation of causality on the page which Veronica mostly fails to live up to. Veronica is conscious of all this, making fun of her mother in the following aside: ‘Such epic names she gave us — none of your Jimmy, Joe or Mick.’
The allegory also manifests itself in the novel’s portrait of the hundred years of Irish history from below. There is a suggestion that Veronica’s grandmother was a sex worker, part of the generation of ‘reformed’ prostitutes put into halfway houses by the church to dry out until they were deemed fit to re-join society. Veronica theorises that her grandmother was one of these, in an attempt to explain her brother’s suicide, and her family’s general fucked-up-edness, but casts doubt on her account even as she advances it, dismissing it as ‘A dusty, middle-class fantasy, of crinkled stockings and TB, and hunkering to wash over a basin on the floor’.
Her narrative fails to account for Liam’s suicide. No shape that she puts on the narrative remains secure because Liam, her grandmother and her uncle (institutionalised due to his being abused) are not victims in isolation; they are part of a far broader generation of victims over the state’s history, whether they be ‘fallen’ women put into Magdalene laundries, rape victims institutionalised on the suggestion of their rapists (who were often family members) or children molested and beaten in industrial schools. It is only after these testimonies begin to surface in public life that Veronica remembers witnessing Liam’s abuse, and places it within a national chronology:
This is what shame does. This is the anatomy and mechanism of a family — a whole fucking country — drowning in shame.
Over the next twenty years the world around us changed and I remembered Mr Nugent. But I never would have made that shift on my own if I hadn’t been listening to the radio and reading the paper and hearing about what went on in schools and churches and in people’s homes.
Of course, The Gathering is just one attempted explanation, for just one victim, and it can’t be expected to take the burden of just how many there were. This is highlighted at a stage in the novel in which Veronica visits a mass grave at a mental institution that has been recently closed:
Just one cross — quite new — at the end of a little central path. A double row of saplings promise rowan trees to come. There are no markers, no separate graves. I wonder how many people were slung into the dirt of this field, and realise, too late, that the place is boiling with corpses, the ground is knit out of their tangled bones.
Throughout the text, bones are associated with the act of narration: Veronica comforts her hand with the neat ‘arc’ of a cuttlefish bone, and feels for her children’s bones when she embraces them, enjoying their symmetry and their apparent lack of complication. The image of ‘tangled’ bones provides little hope of ever reaching closure for the innumerable victims of the Irish state’s negligence and cruelty.
To what extent The Gathering is about the history of systematic female oppression might all be Veronica’s contrivance, or Enright’s; she is not a heavy-handed novelist, and it is not just Veronica’s uncertainty that would prevent us from taking this reading up wholly, but Enright’s subtlety. (The one scene we might quibble with is set in an asylum named St. Ita’s, in which a brief history of the saint’s role in embodying a feminine ideal is also given.)
Perhaps any account is doomed to failure: given how pockmarked the historical record is by aporia and silence, enforced or otherwise, the extent of the suffering will be passed over, particularly as long as the state’s policy is to remain stingy with the provision of compensation or the bodies responsible continue to ‘deny till they die’.
I add it in to my life, as an event, and I think, well yes, that might explain some things. I add it into my brother’s life and it is crucial, it is the place where all cause meets all effect, the crux of an x. In a way, it explains too much.
At this stage in my PhD research into literary style I am looking to machine learning and neural networks, and moving away from stylostatistical methodologies, partially out of fatigue. Statistical analyses are intensely process-based and always open, it seems to me, to fairly egregious ‘nudging’ in the name of reaching favourable outcomes. This brings a kind of bathos to some statistical analyses, as they account, to a greater extent than I’d like, for methodology and process, with the result that the novelty these approaches might have brought us is neglected. I have nothing against this emphasis on process necessarily, but I do also have a thing for outcomes, as well as the mysticism and relativity machine learning can bring, alienating us as it does from the process of the script’s decision making.
I first heard of the sci-fi writer from a colleague of mine in my department. It’s Robin Sloan’s plug-in for the text editor Atom which allows you to ‘autocomplete’ texts based on your input. After sixteen hours of installing, uninstalling, moving directories around and looking up Stack Overflow, I got it to work. I typed in some Joyce and got stuff about Chinese spaceships as output, which was great, but science fiction isn’t exactly my area, and I wanted to train the network on a corpus of modernist fiction. Fortunately, I had the complete works of Joyce, Virginia Woolf, Gertrude Stein, Sara Baume, Anne Enright, Will Self, F. Scott FitzGerald, Eimear McBride, Ernest Hemingway, Jorge Luis Borges, Joseph Conrad, Ford Madox Ford, Franz Kafka, Katherine Mansfield, Marcel Proust, Elizabeth Bowen, Samuel Beckett, Flann O’Brien, Djuna Barnes, William Faulkner & D.H. Lawrence to hand.
My understanding of this recurrent neural network, such as it is, runs as follows. The script reads the entire corpus of over 100 novels, and calculates the distance that separates every word from every other word. The network then hazards a guess as to what word follows the word or words that you present it with, then validates this against what actually follows. It then does so over and over and over, getting ‘better’ at predicting each time. The size of the corpus is significant in determining the length of time this will take, and mine required something around twelve days. I had to cut it off after twenty-four hours because I was afraid my laptop wouldn’t be able to handle it. At this point it had carried out the process 135000 times, just below 10% of the full process. Once I get access to a computer with better hardware I can look into getting better results.
How this will feed into my thesis remains nebulous; I might move in a sociological direction and take survey data on how closely readers reckon the final result approximates literary prose. But at this point I’m interested in what impact it might conceivably have on my own writing. I am currently trying to sustain progress on my first novel alongside my research, so, in a self-interested enough way, I pose the question: can neural networks be used in the creation of good prose?
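The guess-the-next-thing idea can be illustrated without a neural network at all. What follows is emphatically not a recurrent neural network, and not how the plug-in works: it is a character-level Markov chain, a far cruder technique that simply counts which character tends to follow each short run of characters and then samples from those counts. I include it only as a sketch of predicting what comes next from what came before; the function names and the toy training string are my own.

```python
import random
from collections import Counter, defaultdict


def train(text, order=3):
    """Count which character follows each `order`-character context."""
    model = defaultdict(Counter)
    for i in range(len(text) - order):
        model[text[i:i + order]][text[i + order]] += 1
    return model


def generate(model, seed, length=80, rng=None):
    """Extend `seed` one character at a time by sampling from the counts.

    `seed` must be exactly as long as the `order` used in training.
    """
    rng = rng or random.Random(0)  # fixed seed so runs are repeatable
    order = len(seed)
    out = seed
    for _ in range(length):
        followers = model.get(out[-order:])
        if not followers:
            break  # context never seen in training: nothing to predict
        chars, weights = zip(*followers.items())
        out += rng.choices(chars, weights=weights)[0]
    return out


training_text = "she said that she saw the sea and the shore and the shadows"
model = train(training_text, order=3)
print(generate(model, "she", length=60))
```

A neural network differs in that it learns a compressed representation rather than a lookup table, which is what lets it produce plausible continuations for contexts it has never seen.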
There have been many books written on the place of cliometric methodologies in literary history. I’m thinking here of William S. Burroughs’ cut-ups, Mallarmé’s infinite book of sonnets, and the brief flirtation the literary world had with hypertext in the 90s, but beyond the avant-garde, I don’t think I could think of an example of an author who has foregrounded their use of numerical methods of composition. A poet friend of mine has dabbled in this sort of thing but finds it expedient to not emphasise the aleatory aspect of what she’s doing, as publishers tend to give a frosty reception when their writers suggest that their work is automated to some extent.
And I can see where they’re coming from. No matter how good they get at it, I’m unlikely to get to a point where I’ll read automatically generated literary art. Speaking for myself, when I’m reading, it is not just about the words. I’m reading Enright or Woolf or Pynchon because I’m as interested in them as I am in what they produce. How synthetic would it be to set Faulkner and McCarthy in conversation with one another if their congruencies were wholly manufactured by outside interpretation or an anonymous algorithmic process as opposed to the discursive tissue of the literary sphere, if a work didn’t arise from material and actual conditions? I know I’m making a lot of value-based assessments here that wouldn’t have a place in academic discourse, and on that basis what I’m saying is indefensible, but the probabilistic infinitude of it bothers me too. When I think about all the novelists I have yet to read I immediately get panicky about my own death, and the limitless possibilities of neural networks to churn out tomes and tomes of literary data in seconds just seems to me to exacerbate the problem.
However, speaking outside of my reader-identity, as a writer, I find it invigorating. My biggest problem as a writer isn’t writing nice sentences; given enough time I’m more than capable of that. The difficulty is finding things to wrap them around. Mood, tone, image, aren’t daunting, but a text’s momentum, the plot, I suppose, eludes me completely. It’s not something that bothers me: I consider plot to be a necessary evil, and resent novels that suspend information in a deliberate, keep-you-on-the-hook sort of way, but the ‘what next’ of composition is still a knotty issue.
The generation of text could be a useful way of getting an intelligent prompt that stylistically ‘borrows’ from a broad base of literary data, smashing words and images together in a generative manner to get the associative faculties going. I’m not suggesting that these scripts would be successful were they autonomous, I think we’re a few years off one of these algorithms writing a good novel, but I hope to demonstrate that my circa 350 generated words would be successful in facilitating the process of composition:
be as the whoo, put out and going to Ingleway effect themselves old shadows as she was like a farmers of his lake, for all or grips — that else bigs they perfectly clothes and the table and chest and under her destynets called a fingers of hanged staircase and cropping in her hand from him, “never married them my said?” know’s prode another hold of the utals of the bright silence and now he was much renderuched, his eyes. It was her natural dependent clothes, cattle that they came in loads of the remarks he was there inside him. There were she was solid drugs.
“I’m sons to see, then?’ she have no such description. The legs that somewhere to chair followed, the year disappeared curl at an entire of him frwented her in courage had approached. It was a long rose of visit. The moment, the audience on the people still the gulsion rowed because it was a travalious. But nothing in the rash.
“No, Jane. What does then they all get out him, but? Or perfect?”
Of came the great as prayer. He said the aspect who, she lay on the white big remarking through the father — of the grandfather did he had seen her engoors, came garden, the irony opposition on his colling of the roof. Next parapes he had coming broken as though they fould
has a sort. Quite angry to captraita in the fact terror, and a sound and then raised the powerful knocking door crawling for a greatly keep, and is so many adventored and men. He went on. He had been her she had happened his hands on a little hand of a letter and a road that he had possibly became childish limp, her keep mind over her face went in himself voice. He came to the table, to a rashes right repairing that he fulfe, but it was soldier, to different and stuff was. The knees as it was a reason and that prone, the soul? And with grikening game. In such an inquisilled-road and commanded for a magbecross that has been deskled, tight gratulations in front standing again, very unrediction and automatiled spench and six in command, a
I don’t think I’d be alone in thinking that there’s some merit in parts of this writing. I wonder if there’s an extent to which Finnegans Wake has ‘tainted’ the corpus somewhat, because stylistically, I think that’s the closest analogue to what could be said to be going on here. Interestingly, it seems to be formulating its own puns, words like ‘unrediction,’ ‘automatiled spench’ (a tantalising meta-textual reference I think) and ‘destynets’, I think, would all be reminiscent of what you could expect to find in any given section of the Wake, but they don’t turn up in the corpus proper, at least according to a ctrl + f search. What this suggests to me is that the algorithm is plotting relationships on the level of the character, as well as phrasal units. However, I don’t recall the sci-fi model turning up paragraphs that were quite so disjointed and surreal — they didn’t make loads of sense, but they were recognisable, as grammatically coherent chunks of text. Although this could be the result of working with a partially trained model.
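A slightly more systematic version of the ctrl + f check would be to compare the vocabulary of the generated text against the vocabulary of the corpus. The Python sketch below does this with set subtraction; the two strings are toy stand-ins I have made up, not the real corpus. Note that it matches whole words, whereas ctrl + f matches substrings, so the two checks can disagree.

```python
import re


def vocabulary(text):
    """The set of distinct lowercased words in a text."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def novel_words(generated, corpus):
    """Words in the generated text that never occur in the corpus."""
    return sorted(vocabulary(generated) - vocabulary(corpus))


# Toy stand-ins (hypothetical): a scrap of 'corpus' and a scrap of output.
corpus = "very bright silence and old shadows in command"
generated = "very unrediction and automatiled spench in command"

print(novel_words(generated, corpus))
```

Anything this returns is a word the network has assembled itself, character by character, rather than one it has copied whole from its training data.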
So, how might they feed our creative process? Here’s my attempt at making nice sentences out of the above.
— I have never been married, she said. — There’s no good to be gotten out of that sort of thing at all.
He’d use his hands to do chin-ups, pull himself up over the second staircase that hung over the landing, and he’d hang then, wriggling across the awning it created over the first set of stairs, grunting out eight to ten numbers each time he passed, his feet just missing the carpeted surface of the real stairs, the proper stairs.
Every time she walked between them she would wonder which of the two that she preferred. Not the one that she preferred, but the one that were more her, which one of these two am I, which one of these two is actually me? It was the feeling of moving between the two that she could remember, not his hands. They were just an afterthought, something cropped in in retrospect.
She can’t remember her sons either.
Her life had been a slow rise, to come to what it was. A house full of men, chairs and staircases, and she wished for it now to coil into itself, like the corners of stale newspapers.
The first thing you’ll notice about this is that it is a lot shorter. I started off by translating the above, in as much as possible, into ‘plain words’ while remaining faithful to the n-grams I liked, like ‘bright silence’, ‘old shadows’ and ‘great as prayer’. In order to create images that play off one another, and to account for the dialogue, sentences that seemed to be doing similar things began to cluster together, so paragraphs organically started to shrink. Ultimately, once the ‘purpose’ of what I was doing started to come out, a critique of bourgeois values, memory loss, the nice phrasal units started to become spurious, and the eight or so paragraphs collapsed into the three and a half above. This is also one of my biggest writing issues: I’ll type three full pages and after the editing process they’ll come to no more than 1.5 paragraphs, maybe?
The thematic sense of dislocation and fragmentation could be a product of the source material, but most things I write are about substance-abusing depressives with broken brains cos I’m a twenty-five year old petit-bourgeois male. There’s also a fairly pallid Enright vibe to what I’ve done with the above, I think the staircases line could come straight out of The Portable Virgin.
Maybe a better-trained model could provide better prompts, but overall, if you want better results out of this for any kind of creative praxis, it’s probably better to be a good writer.
Aspiration: 50/50 gender & POC split (currently at a lame and terrible 20% and 0% respectively)
1. Samuel Beckett — How It Is
Reaching the conclusion that How It Is represents Beckett’s prose writing reaching its most concentrated point of distillation and intensity is somewhat inevitable, seeing as it was his last novel; the longest prose work subsequent to How It Is barely reaches the length of a novella, almost as if the weight of the novelistic tradition, a form known for its expansiveness and maximalism, couldn’t withstand Beckett’s striving towards a more hermetic and taciturn literature.
Having said this, I don’t wish to fetishise How It Is for its impecuniousness alone, for there are plenty of sections in which traditionally pretty descriptive prose appears:
we are on a veranda smothered in verbena the scented sun dapples the red tiles yes I assure you the huge head hatted with birds and flowers is bowed down over my curls the eyes burn with severe love I offer her mine pale upcast to the sky whence cometh our help and which I know perhaps even then with time shall pass away
The ‘yes I assure you’ is demonstrative of How It Is’ overriding push/pull dynamic, advancing an almost sickly description, reminiscent of Keats, alongside a narrative commentary that subverts it. But this doesn’t deaden the effect of the writing, just as setting imagery of abject ugliness and inhumanity amid these lyrical digressions intensifies the effects of both:
as it comes bits and scraps all sorts not so many and to conclude happy end cut thrust DO YOU LOVE ME no or nails armpit and little song to conclude happy end of part two leaving only part three and last the day comes I come to the day Bom comes YOU BOM me Bom ME BOM you Bom we Bom
2. Jorge Luis Borges — Labyrinths
In talking about the short story as one of the more concentrated literary forms, one in which space is at a premium, and there can’t be too many words that don’t belong there, I think the work of Jorge Luis Borges is most deserving of mention. No other writer that I’m aware of is capable in under five hundred words of totally challenging the ways in which you think, how you think about how you think, and how you think about how you think about how you think. His capacity to do so through use of a style that is predominantly unadorned and perhaps uninviting makes him all the more fit to be praised.
Since ‘On Exactitude in Science’ is the length of just one paragraph, I’ll present it here:
In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.
What literary art prizes most is its capacity to open up entire worlds with just words on a page. For those who believe world-building to be a preserve of genre fiction only, I encourage them to read Borges.
3. J.M. Coetzee — Waiting for the Barbarians
The allegory, and playing with the conventions around allegory, is a way in which Coetzee’s writing career in its entirety has been characterised by critics, but it might be a line of interpretation advanced too tenuously; it might be more accurate to say that his novels reflect a radical scepticism regarding narrative itself; an unwillingness to confront anything directly. In the Heart of the Country is one of the most deft examples of metafiction I’ve ever come across, and in its refusal to fix its plot around any one sequence of events, we see a narrative force that is as congenial to the forces of its unmaking as its genesis.
Waiting for the Barbarians is more contained than In the Heart of the Country in this sense, but in no other. That it has parallels to South African society under apartheid will surprise no one familiar with the rich literary tradition of that political milieu of the past fifty years, but it has also an uncanny capacity to encompass and seemingly respond to the nature of racial and ethnically-based prejudice in general. I was so sure that it was a product of the Bush years that I Googled it to find out whether it was written in 2007 or 2005, only to discover that it was published in 1980. Not to turn my ignorance into a virtue, but I think this speaks to its universality.
Which is not to say that the narrative entire is grounded in geopolitics — in the colonial administrator’s love affair with one of the supposed barbarians, we are permitted to meditate on the unknowability of any love object, and by extension ourselves, how ‘In all of us, deep down, there seems to be something granite and unteachable.’
4. Don DeLillo — Underworld
To write a Great American Novel has, thankfully, become rather passé, after feminist critics drew attention to how unusual it is for a female author to be feted with this title, and after the liberal commentariat’s realisation that they had committed the error of elevating Jonathan Franzen to the role of cultural commentator. Underworld, I would say, is one of the few published in recent years that’s worth reading, for the reason that it is a novel about America that won’t allow real life in.
Underworld is a novel supposedly about baseball, the lost era of old New York, the faux-simplicity of the Cold War, and yet there is nothing ordinary, white bread or milquetoast about the America in this novel; the closest we get to a ‘nuclear’ family is the most distorted and unsettling sections in the text.
It is a novel about subterranean connections and invisible intersections. As you read it, you may find yourself compulsively noticing, drawing analogies, knowing that you’re missing others that only reveal themselves the second time around. This is Underworld’s underworld; more so than many other novels from the time, it is pointing you again and again to what is beyond the page, to what’s beneath the words. You could go mental doing it, wonder why some chapters would be more aptly named with the title that a different chapter has, in what precise order the baseball passes from one character to another, which I suppose is only fitting for a novel in which a baseball is semi-seriously analogous to the famous magic bullet. But for once, I’d encourage any potential reader not to spend their time trying to read past Underworld, not when the prose is this good.
Civilisation did not rise and flourish as men hammered out hunting scenes on bronze gates and whispered philosophy under the stars, with garbage as a noisome offshoot, swept away and forgotten. No, garbage rose first, inciting people to build a civilisation in response, in self-defense.
5. Anne Enright — The Green Road
Enright is one of those few authors that refuses to write the same book twice, and never makes you regret it. Because there is, as publishers well know, a great seductive quality in becoming used to one writing style. Many authors who are too protean simply do not catch on in a crowded marketplace. Well, Enright is interested in, and good at, change. This is how she can move from the hilariously picaresque and surreal The Wig my Father Wore through the tortured monologue of The Gathering to an adept Irish family novel about land, which one could almost call realist, so subtle is the indirect discourse which drives it.
Enright is a deeply intellectual author, but unlike many book-readin’ writers, her ideas exist beneath the surface of the words, just gestured towards, to be decoded on repeated readings. For first readings, just allow the sentences to do their thing. You could read The Green Road all the way through and have no notion of the fact that it’s in conversation with William Shakespeare’s King Lear. You wouldn’t want to, of course, but you could.
It is a novel of many parts. Each of Rosaleen Madigan’s children gets their own section and so the novel roves from Clare to New York to Mali and back, before they are all assembled for the set piece of the Christmas dinner. I really can’t emphasise enough how well this is done. It is in the novel’s closing sections that the function behind its structure becomes clear: seeing exactly where these people are coming from and their ambivalence regarding their role in the family before their adult lives, then watching those roles slowly overcome them, is great, hilarious and sad. A novel with characters you care about, things to say and great writing is too rare, which makes The Green Road all the more valuable.
6. William Faulkner — As I Lay Dying
7. David Foster Wallace — Infinite Jest
David Foster Wallace might be said to be undergoing his D.H. Lawrence moment, in having his reputation defined for too long by a reading community of dudey-bro-y dudebro brodudes, and y’know, to look at his representations of women, here and in The Pale King, not to mention his opinions, or life, it can be hard to say his books don’t deserve scrutiny. It is slightly disappointing all the same to see him, of all the authors of phallogocentric literary fiction, tarred as such, considering he’s among the most giving of them. Infinite Jest apportions its fun about twenty per cent more generously than your average example of the genre, and reading about eschaton is about as much fun as you can have with your eyes open.
Its flaws, the sections dealing with the Québecois separatists and the exposition-laden conversations between Hal Incandenza and his older brother Orin, don’t totally come good in the end, but the unavoidable ambivalence one develops when reading a novel of Infinite Jest’s length and ambition is a feature rather than a bug. As in any important relationship, the challenge is what matters.
So give yourself the chance to read it. It’s more than readable, and far more interesting than Foster Wallace’s persona as it has been construed in the pop-culture landscape since his death; as an icon, he simply cannot compare with the questions that his work throws up.
8. William Gaddis — The Recognitions
William Gaddis’ The Recognitions is a very conflicted novel. It is a profoundly generative work, one which may have given us every maximalist, encyclopaedic 500+ page text in contemporary American letters since, and it is also a profoundly angry text, one which lashes out at everything: organised religion, the commodification of great art, the hyper-mediation of our reality via advertising, the complacently bourgeois creative class, all these and more are targets of Gaddis’ ire.
However, it is also a novel based on profound erudition and cultural awareness. Its most proximate literary cousin is Marcel Proust’s In Search of Lost Time and just as gallantly as Proust does, Gaddis manages to balance many portentous thematic concerns with Being, death and sex, alongside a vibrant social comedy. If I had to guess, I would say about sixty-five percent of it is spent convincing the reader how shallow the hipsters of 1950s New York are.
And of course, the sentences are very powerful:
Undisciplined lights shone through the night instructed by the tireless precision of the squads of traffic lights, turning red to green, green to red, commanding voids with indifferent authority: for the night outside had not changed, with the whole history of night bound up inside it had not become better or worse, fewer lights and it was darker, less motion and it was more empty, more silent, less perturbed, and like the porous figures which continued to move against it, more itself.
It can often be a struggle; Jonathan Franzen tried, and mostly failed, to deal with it (in a public article no less). But the bonus of my edition is a foreword by William H. Gass himself, who provides us with a great key to the work, as well as a get-out clause, should we find it too difficult:
No great book is explicable, and I shall not attempt to explain this one. An explanation…would defile it, for reduction is precisely what a work of art opposes…Interpretation replaces the original with the lamest sort of substitute. It tames, disarms.
9. William H. Gass — The Tunnel
10. James Joyce — Ulysses
I was once challenged to sum up a novel’s plot in six words, and for Ulysses, my attempt was ‘2 sad men meet. a woman thinks.’ This is a perfect example of how, when it comes to summing up Ulysses, it’s hard to know where to begin. Humour, bathos, beauty, poetry, history, love, death, family, sex, great writing, it has everything you could ever want.
I won’t contest that it’s a grower, and if you come to it fresh (‘fresh’ in this case meaning, having read Dubliners and A Portrait of the Artist as a Young Man, which will be necessary), expect to find yourself moving your eyes over large tracts of text without quite knowing exactly what’s happening. Reading aloud helps.
For those who may be used to more genre fare, there are sections for you too; there’s an episode written in the manner of a nineteenth-century romance novel. And while the line attributed to Joyce about enigmas codified into the text in sufficient quantities to keep the professors busy for hundreds of years is definitely apocryphal, what it tells us about the novel is definitely true: the novel is so dense with allusion, red herrings and unresolved questions that you’ll find yourself in the role of a sort of detective. This is not a wholly inappropriate tack to take with Ulysses, since Joyce designed his one day in Dublin with meticulous attention to detail; his notes on how long it takes to walk down particular stretches of urban walkways, or the businesses Bloom encounters in his perambulations, were all derived from sources and from correspondence with people Joyce contacted in Dublin. A staggering work; everyone should make time for it.
11. Ben Marcus — The Flame Alphabet
12. Flann O’Brien — The Third Policeman
13. Marcel Proust — In Search of Lost Time
The term ‘baggy monster’, so often applied to the novel, is a rather ingenious one, as it captures a central ambivalence regarding the form in relation to itself. Both terms can be read negatively; in fact, they are perhaps more on the negative end of the spectrum than not, but taken together there’s something alluring about them, particularly when you have come to know, over the course of reading many of them, how successful a novel can be in reaching for exactly the kind of excess that ‘good taste’ might seem to advise against. Well, there’s plenty baggy and monstrous in Proust’s seven-volume work In Search of Lost Time, but, as much as it could be said to be in need of an editor, its vices are perhaps indissociable from its virtues.
And this is itself a virtue. What other work of fiction can be so assuming as to impose itself on you for 1,267,069 words? Well, it isn’t for no reason, and a close reading of fin-de-siècle French bourgeois culture next to the metaphysician Bergson is more than worth the time you’d spend on it. Yes, it is occasionally tedious, and seemingly repetitive, but you’re unlikely to come away from Proust without recognising yourself in at least a few of the characters, or without coming to some disturbing conclusions regarding the way you live your life. Write down your definitions of habit, love and time before getting into these novels. It’s unlikely they’ll have remained intact in your journey through these texts.
But don’t come to it with a pious reverence. James Grieve, a translator of À l’ombre des jeunes filles en fleurs, writes in his introduction to the second volume that
Proust’s reflections, his enunciation of philosophical and psychological truths…are often more importance to him than his verisimilitudes. His composition was often not linear; he wrote in bits and pieces; transitions from one scene to another are sometimes awkward, clumsy even…His paragraphing often seems idiosyncratic.
Far from being a virtuoso of words, or a fluent weaver of imaginative reality, Proust is in many ways inept, or amateurish, and it is in this way that we should appreciate him; the idiosyncrasies are what make In Search of Lost Time such a brilliantly bizarre novel.
14. Thomas Pynchon — Gravity’s Rainbow
15. J.D. Salinger — The Catcher in the Rye
Yes, I know, I should definitely have grown out of thinking this novel is great. Well, every time I’ve gotten back to it, convinced that this time, this time, I’ll realise that I am an adult, and that Holden Caulfield is an annoying idiot, and The Catcher in the Rye is a novel for teenagers, well, it doesn’t happen, and I could read a hundred novels with him just going about his business, being judgemental and obnoxious inside his own head forever and ever. My liking him is somewhat beside the point, and perhaps proves my immaturity, so I’ll try to deal with why his critics are wrong. They seem to miss the rather big reveal at the end that Holden’s been institutionalised, and the oscillation between two different periods of time in his narrative, a representation of his thoughts in the moment and his recollection, attests further to his divided state of mind. It’s a bit odd to hear literary critics condemn him so roundly when his curmudgeonly attitude surely doesn’t lack for a cause.
It’s a great testament to Salinger’s skill as a writer that the surface level of the text, a brash, abusive narrator, can seem so available, that going any deeper into it would seem wrongheaded, but I think he, like all unreliable narrators, provides you with a clue up front. The novel begins, after all, with an act of self-censorship, an invocation to silence, as Holden refuses to provide a holistic appraisal of his self or his place in the world, something that he dismisses as “all that David Copperfield kind of crap.”
The question that this blog post sets itself is: What differences and similarities can be detected in modernist and contemporary authors on the basis of three stylistic variables (hapax, unique and ambiguity), and how are these stylistic variables related to one another?
I: The Data
The data to be analysed in this project were derived from an analysis of twenty-one corpora of avant-garde literary prose through use of the open-source programming language R. The complete works of the authors James Joyce, Virginia Woolf, Gertrude Stein, Sara Baume, Anne Enright, Will Self, F. Scott Fitzgerald, Eimear McBride, Ernest Hemingway, Jorge Luis Borges, Joseph Conrad, Ford Madox Ford, Franz Kafka, Katherine Mansfield, Marcel Proust, Elizabeth Bowen, Samuel Beckett, Flann O’Brien, Djuna Barnes, William Faulkner & D.H. Lawrence were used.
Seventeen of these writers were active between the years 1895 and 1968, a period of time associated with a genre of writing referred to as ‘modernist’ within the field of literary criticism. The remaining four remain alive, and have novels published as early as 1991, and as late as 2016. These novelists are known for their identification as latter-day modernists, and perceive their novels as re-engaging with the modernist aesthetic in a significant way.
The unique variable is a generally accepted measurement used within digital literary criticism to quantify the ‘richness’ of a particular text’s vocabulary. The formula for uniqueness is obtained by dividing the number of distinct word types in a text by the total number of words. For example, if a novel contained 20000 word types, but 100000 total words, the formula for obtaining this text’s uniqueness would be as follows:
Uniqueness = 20000 / 100000 = 0.2
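For anyone who wants to try this on a text of their own, here is a minimal Python sketch of the uniqueness calculation. It is not the R code used for this project; the `tokenize` and `uniqueness` functions are my own illustrative names, and the tokenizer is a deliberately crude assumption (lowercase, letters only) that will treat hyphenated or apostrophised words differently from a proper NLP tokenizer.

```python
import re

def tokenize(text):
    """Lowercase the text and return runs of letters as word tokens.

    Assumption: punctuation, digits and apostrophes all act as separators.
    """
    return re.findall(r"[^\W\d_]+", text.lower())

def uniqueness(tokens):
    """Number of distinct word types divided by the total number of words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical sample sentence, for illustration only.
sample = "the cat sat on the mat and the dog sat too"
tokens = tokenize(sample)
print(len(set(tokens)), "types /", len(tokens), "words =", uniqueness(tokens))
```

The same function applied to the worked example above would simply return 20000 / 100000.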
Ambiguity is a measure used to calculate the approximate obscurity of a text, or the extent to which it is composed of indefinite pronouns. The indefinite pronouns quantified in this study are as follows: ‘another’, ‘anybody’, ‘anyone’, ‘anything’, ‘each’, ‘either’, ‘enough’, ‘everybody’, ‘everyone’, ‘everything’, ‘little’, ‘much’, ‘neither’, ‘nobody’, ‘no one’, ‘nothing’, ‘one’, ‘other’, ‘somebody’, ‘someone’, ‘something’, ‘both’, ‘few’, ‘everywhere’, ‘somewhere’, ‘nowhere’, ‘anywhere’, ‘many’, ‘others’, ‘all’, ‘any’, ‘more’, ‘most’, ‘none’, ‘some’, ‘such’. The formula for ambiguity is:
number of indefinite pronouns / number of total words
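A rough Python sketch of the ambiguity measure, again as an illustration and not the code behind the results reported here. One assumption to flag: ‘no one’ is two tokens, so in this sketch it is not listed separately and is instead counted once through its ‘one’. How the original analysis handled it may well differ.

```python
# The indefinite pronouns listed above. 'no one' is omitted on purpose:
# it is picked up once via 'one' (an assumption of this sketch).
INDEFINITE_PRONOUNS = {
    "another", "anybody", "anyone", "anything", "each", "either", "enough",
    "everybody", "everyone", "everything", "little", "much", "neither",
    "nobody", "nothing", "one", "other", "somebody", "someone", "something",
    "both", "few", "everywhere", "somewhere", "nowhere", "anywhere", "many",
    "others", "all", "any", "more", "most", "none", "some", "such",
}

def ambiguity(tokens):
    """Number of indefinite pronouns divided by the total number of words."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in INDEFINITE_PRONOUNS)
    return hits / len(tokens)

# Hypothetical sample sentence, already lowercased and split on whitespace.
tokens = "no one said anything and nothing much happened to anyone".split()
print(ambiguity(tokens))
```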
Finally, the hapax variable calculates the density of hapax legomena, words which appear only once in a particular author’s oeuvre. The formula for this variable is:
number of hapax legomena / number of total words
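The hapax measure can be sketched in the same spirit. This is a minimal illustration that assumes the whole oeuvre has already been tokenized into one flat list of lowercase words; `hapax_density` is a name I have made up for the example.

```python
from collections import Counter

def hapax_density(tokens):
    """Number of words occurring exactly once divided by the total number of words."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return hapaxes / len(tokens)

# Hypothetical sample: 'the' and 'sat' repeat, so they are not hapax legomena.
tokens = "the cat sat on the mat and the dog sat too".split()
print(hapax_density(tokens))
```

Note that every hapax is also a distinct word type, so the hapax count feeds directly into the numerator of uniqueness.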
II: Data Overview
Even before analysing the data in great depth, the fact that these variables are interrelated stands to reason. Hapax and unique are best understood as indications of a text’s heterogeneity: if a text is hapax-rich, the score for uniqueness will be similarly elevated. Ambiguity, as it is a set of pre-defined words, can be considered a measure of a text’s homogeneity, and if the occurrences of these commonplace words increase, hapax and uniqueness will be negatively affected. The aim of this study will be to determine, first, how these measures vary according to the time frame in which the different texts were written, i.e. across modern and contemporary corpora; second, which correlations between stylistic variables exist; and third, which of the three is most subject to the fluctuations of another.
IV.I: The Three Groups Hypothesis
A number of things are clear from these representations of the data. The first finding is that the authors fall into approximately three distinct groups. The first is the base-level of early twentieth-century modernist authors, who are all relatively undifferentiated. These are Ernest Hemingway, Virginia Woolf, William Faulkner, Elizabeth Bowen, Marcel Proust, F. Scott Fitzgerald, D.H. Lawrence, Joseph Conrad and Ford Madox Ford. They are all below the mean for the hapax and unique variables.
The second group reach into more extreme values for unique and hapax. These are Djuna Barnes, Jorge Luis Borges, Franz Kafka, Flann O’Brien, James Joyce, Eimear McBride and Sara Baume. Three of these authors are even outliers for the hapax variable, which can be seen in the box plot.
Joyce’s position as an extreme outlier in this context is probably due to his novel Finnegans Wake (1939), which was written in an amalgam of English, French, Irish, Italian and Norwegian. It’s no surprise then, that Joyce’s value for hapax is so high. The following quotation may be sufficient to give an indication of how eccentric the language of the novel is:
La la la lach! Hillary rillarry gibbous grist to our millery! A pushpull, qq: quiescence, pp: with extravent intervulve coupling. The savest lauf in the world. Paradoxmutose caring, but here in a present booth of Ballaclay, Barthalamou, where their dutchuncler mynhosts and serves them dram well right for a boors’ interior (homereek van hohmryk) that salve that selver is to screen its auntey and has ringround as worldwise eve her sins (pip, pip, pip)
Though Borges’ and Barnes’ prose may not be as far removed from modern English as Finnegans Wake, both of these authors are known for their highly idiosyncratic use of language; Borges for his use of obscure terms derived from archaic sources, and Barnes for reversing normative grammatical and syntactic structures in unique ways.
The third and final group may be thought of as an intermediary between these two extremes, and these are Katherine Mansfield, Samuel Beckett, Will Self and Anne Enright. These authors share characteristics of both groups, in that the values for ambiguity remain stable, while their uniqueness and hapax counts are far more pronounced than those of the first group, though not to the extent that they reach the values of the second group.
Gertrude Stein is the only author whose stylistic profile doesn’t quite fit into any of the three groups. She is perhaps best thought of as most closely analogous to the first group of early twentieth-century modernists, but her extreme value for ambiguity should be sufficient to distinguish her in this regard.
The value for ambiguity remains fairly stable throughout the dataset: the standard deviation is 0.03, and if Stein’s values are removed from the dataset, it narrows to 0.01.
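To show how much a single outlier can widen a standard deviation, here is a small Python sketch. The scores below are invented placeholders, not the values from this study, and I am assuming the sample standard deviation (`statistics.stdev`); the figures above may have been computed differently.

```python
from statistics import stdev

# Hypothetical ambiguity scores: five clustered authors and one extreme value.
ambiguity_scores = {
    "author_a": 0.05,
    "author_b": 0.06,
    "author_c": 0.05,
    "author_d": 0.07,
    "author_e": 0.06,
    "outlier": 0.15,
}

# Standard deviation with every author included.
sd_with = stdev(ambiguity_scores.values())

# Standard deviation after dropping the single extreme author.
sd_without = stdev(
    score for name, score in ambiguity_scores.items() if name != "outlier"
)

print(round(sd_with, 3), round(sd_without, 3))
```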
Two disclaimers need to be made about this general account from the descriptive statistics and graphs. The first is that there is a fundamental issue with making such a schematic account of these texts. The grouping approach that this project has taken thus far is insufficiently nuanced as it could probably be argued that McBride could just as easily fit into the third group as the second. Therefore, the stylistic variables do not adequately distinguish modern and contemporary corpora from one another.
IV.II Word Count
It should not escape our attention that the authors who score lowest for each variable, the first group of early twentieth-century authors, are also the most prolific. The correlation between word count and the stylistic variables was therefore calculated.
Both the Pearson correlation and Spearman’s rho suggest that word count is highly negatively correlated with hapax and unique (as word count increases, hapax and unique decrease and vice versa), but not with ambiguity.
The fact that Spearman’s rho scores significantly higher than the Pearson correlation suggests that the relationship between word count and these two variables is monotonic but non-linear. This can be seen in the scatter plot.
In the case of both variables, the correlation is obviously negative, but the data points fall in a non-linear way, suggesting that the Spearman’s rho is the better measure for calculating the relationship. In both cases it would seem that Joyce is the outlier, and most likely to be the author responsible for distorting the correlation.
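The difference between the two coefficients is easy to see on a toy example. The sketch below implements both by hand in Python (Spearman’s rho being the Pearson correlation of the ranks) and runs them on invented data in which a score falls off as one over the word count. None of these numbers come from the study, and the function names are my own.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """Rank from 1 upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranked data."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical corpora: the score decays as 1 / word count,
# which is monotonic but far from a straight line.
word_counts = [50_000, 100_000, 200_000, 400_000, 800_000, 1_600_000]
scores = [1_000 / w for w in word_counts]

r = pearson(word_counts, scores)
rho = spearman(word_counts, scores)
print(round(r, 3), round(rho, 3))
```

On data like this the rank-based measure sees a perfect monotonic decline, while Pearson is dragged towards zero by the curve.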
SPSS flags the correlation between hapax and unique as being significant, as this is clearly the most noteworthy relationship between the three stylistic variables. The Spearman’s rho exceeded the Pearson correlation by a marginal amount, and it was therefore decided that the relationship was non-linear, which is confirmed by the scatter plot below:
The stylistic variables of unique and hapax are therefore highly correlated.
As was said already, the notion that stylistic variables are correlated stands to reason. However, it was not until the correlation tests were carried out that the extent to which uniqueness and hapax are determined by one another was made clear.
The biggest issue with this study is the issue that is still present within digital comparative analyses in literature generally: our apparent incapacity to compare texts of differing lengths. Attempts have been made elsewhere to account for the huge difference that a text’s length clearly makes to measures of its vocabulary, such as vectorised analyses that take measurements in 1000-word windows, but none have yet been wholly successful in accounting for this difference. This study is therefore one among many which presents its results with some clarifiers, considering how corpora of similar lengths clustered together with one another to the extent that they did. The only author that violated this trend was Joyce, who, despite a lengthy corpus of 265500 words, has the highest values for hapax and uniqueness, which marks his corpus out as idiosyncratic. Joyce’s is therefore the only one of the twenty-one writing styles that can be meaningfully distinguished from the others on the basis of the stylistic variables, because he so egregiously reverses the trend.
But we hardly needed an analysis of this kind to say Joyce writes differently from most authors, did we?
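For the curious, the windowed approach mentioned above can be sketched as follows. This is only my guess at its simplest form, averaging uniqueness over consecutive, non-overlapping windows of a fixed size; it is not a method used in this study, and `windowed_uniqueness` is a hypothetical name.

```python
def windowed_uniqueness(tokens, window=1000):
    """Mean types/tokens ratio over consecutive non-overlapping windows.

    A trailing partial window is discarded; returns None if the text
    is shorter than a single window.
    """
    ratios = []
    for start in range(0, len(tokens) - window + 1, window):
        chunk = tokens[start:start + window]
        ratios.append(len(set(chunk)) / window)
    if not ratios:
        return None
    return sum(ratios) / len(ratios)

# Hypothetical texts built from the same five-word pattern, one ten times longer.
pattern = "a b c d e".split()
short_text = pattern * 5    # 25 words
long_text = pattern * 50    # 250 words

# Raw uniqueness falls as the text grows, although the style is identical.
print(len(set(short_text)) / len(short_text), len(set(long_text)) / len(long_text))

# The windowed score is the same for both texts.
print(windowed_uniqueness(short_text, window=5), windowed_uniqueness(long_text, window=5))
```

The toy case is rigged, of course: real prose does not repeat in neat windows, which is part of why no such correction has been wholly successful.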