Digital Humanities

The Nebula of Gephi

I have, unfortunately, been unable to use Gephi. I've uninstalled and reinstalled various versions of the beta - 7 and 8 - and it refuses to work. I hate to blame technology for something I could fix myself if I were more tech-savvy, but I'm pretty sure it keeps messing up because my computer runs on Vista.

That being said, I would like to discuss the idea of Gephi.

Gephi takes the vast world of literary analysis and compacts it into a tiny little nebula of information. Trends are turned into tiny planets and stars in the nebula that Gephi creates from each piece of work it reads. It takes information and data that would otherwise take hours to accrue, and consolidates them into easily-viewed "nodes" on its web graph. Looking at the graph itself is... different.

Personally, I have never studied literature in such a mathematical fashion, and, I'm going to be frank, it's weird to me. However, I do think it's necessary with the endlessly expanding universe of literature and knowledge. Without programs like Gephi, knowledge and information disappear into the abyss. As humbling a realization as this is, it is impossible for humans to capture, analyze, and use every bit of knowledge we come across. As The Library of Babel and the literary philosophy of Derrida's Mal d'Archive posit, an archive has a "death drive". Constantly expanding to the point of disappearing into the margins, the vast and expanding oeuvre of mankind does not want to be known.

While I generally roll my eyes at people who think machines will supersede mankind, it is when I see programs like Gephi that I can sympathize a little with that paranoia. Humans just aren't good enough anymore. We create at a faster rate than we can analyze and archive, and efforts to become more efficient are made in vain. Gephi can gather up and read information, then preserve it in cryogenic stasis for man to further explore.

The Research Pendulum

 

My understanding of the study of English has varied significantly throughout my academic career.  It's amusing to remember my first grade-school research paper--my teacher insisted that I only use print sources for references, and spoke with distaste (and apparent suspicion) of the internet (no one should rely on computers; what if we lost access to our electronic infrastructure?).  I resented the monotony of the card-catalog bibliography she had us construct from 5x7 note cards (following Turabian guidelines).  Of course, the actual report was typed on my mother's business typewriter because she couldn't bear for me to learn how to type on a computer keyboard.  

Though these research skills were intended to prepare me for college, my first English professor introduced me to academic databases and expected all of my papers to be turned in with 12-point Times New Roman font!  I quickly learned just how irrelevant some of my previous instruction proved to be.  As I read Kirschenbaum's article, "What Is Digital Humanities and What's It Doing in English Departments?" I remembered these experiences and found myself championing new modes for researching the humanities.  I can't imagine conducting some of the research I have done without the help of electronic databases and media.  Also, advanced forms of word processing make it easier to correct and revise material.

Conversely, I observed that much of Kirschenbaum's idea of "digital humanities" simply related to the collaboration of individuals--something that is not necessarily a "'next big thing.'"  Nevertheless, I'm interested in learning more about digital humanities with the rest of you!

Roundup: Final Projects 2012

Here is the list of final projects from ENGL 8203: Modernism and Digital Humanities that are currently on the web:

Here are the ones that are currently offline:

  • Kent Emerson -- Ezra Pound's Cantos Timeline Project
  • Emma Sloan -- Digital Analysis of Jennifer Egan's A Visit from the Goon Squad and the two works that inspired it, Marcel Proust's In Search of Lost Time and The Sopranos.
  • Holly Vista Nascosta -- The Auto-Archiving of Jean Rhys
  • Karen Woolie -- Racial Dynamics in the Linguistics of Five Magazines of 1914

Boundaries in Flux

On the home page of the journal Vectors, Tara McPherson and Steve Anderson help to contextualize their project amid paradoxes. They explain, “While we've long been urged not to judge a book by its cover, Stolen Time powerfully insists that such an adage works to conceal the myriad traces of labor that congeal in any textual artifact… [T]he long-standing scholarly distinction between tools and theories is profoundly destabilized by digital media, demanding a rethinking of long-held tenets of technological determinism” (Vectors).
These paradoxes are further explored at some length in Paul J. Voss and Marta L. Werner’s introduction, “Toward a Poetics of the Archive,” which notes that the archive has both “physical [limited, concrete] and imaginative [unlimited, abstract] space” (i). James Clifford, Susan Steward, and Michael Baxandall, cited in Voss and Werner’s article, help arrive at some delineation of interpretation of the archive with “cultural rules—of rational taxonomy, of gender, of aesthetics” (qtd. in Voss and Werner v).
Finally, an article that I coincidentally read in my 19th Century Novel seminar involves the “Quixotic Realism and Romance of the Novel.” Clifford's (et al.) idea is that cultural rules can help to circumscribe the abstractions emanating from the archive, but only insofar, Scott argues, as they are viewed as dynamic forces; an author or artist (as in Cervantes, Fielding, or Lewis) might even willfully draw on periods prior to his own in order to effectuate a literary impact. For example, when an author alludes in his novel to a work of 1,000 years prior, the itemization of cultural rules is also challenged, but the allusion creates a “feedback loop” in the mind of the reader.
The digital medium seems to add another dimension, one less bound by time or by medium than the novel was. While I have far more questions than answers, and am seeking at root to find a kind of pattern, it would seem that perhaps the answer lies more directly in the “feedback loop” of reception and an artist’s intent in craft.
Still, even this idea of “aporia” forces one to consider evidence for and against this tenuous stance and echoes the idea that digital humanities interpretation finds itself very much in an information-gathering stage more than synthesis.

Network Analysis, Text Mining, and Emergence in the September 1918 Little Review

(This piece is cross-posted at the Magazine Modernisms Blog)

This post is a shorter version of what will eventually become a longer piece about digital methods for the analysis and teaching of modernist magazines. By way of background, the occasion for this post was a workshop I did this week with two joint sessions of my graduate course in modernism and digital humanities and Sean Latham's graduate course in modernism and new media. Sean's course is interested in the ways in which modernist literature (and this week, magazines) embodies functionalities of 21st Century digital media (i.e. the "new" new media). They have been discussing N. Katherine Hayles' concept of emergence, through an unpublished essay of Sean's, "Unpacking My Digital Library: Programming Modernist Magazines," forthcoming in Editing Modernisms in Canada, eds. Colin Hill and Dean Irvine.

Emergence describes "a particular kind of complexity that arises not from the individual elements of a system, but only from their interaction" (15). It emphasizes the interactive system of meaning that derives from the connections among various content items of a magazine, and only from those connections. In arriving at this sense of a dynamic readerly coherence, Latham uses Espen Aarseth's concepts (from Cybertext) of "texton," a string of information that exists in the text (a poem, an advertisement, a headline, etc.), and "scripton," the idiosyncratic assemblage of textons by the reader.

Bound up in these concepts is the nature of magazine reading itself. Magazines are not codices. They are not books to be read serially from cover to cover (although a few of us 21st-Century denizens who study magazines might admit with blushed cheeks that we do).

So, our workshops looked at social network analysis and text mining as ways of potentially identifying, or at the very least of recording and analyzing, scriptons that might emerge in the magazines.

We picked the September 1918 issue of The Little Review at the Modernist Journals Project (MJP) because it features one of the few direct references to the First World War, W.B. Yeats' poem "In Memory of Robert Gregory," an Irish airman. It contains other items on the theme of death, including James Joyce's "ULYSSES Episode VI," later known as "Hades," depicting the funeral of Paddy Dignam. There are also a number of other items that deal with themes related to death, such as two short stories by Sherwood Anderson and Ben Hecht, respectively titled "Senility" and "Decay," which immediately follow Joyce's installment. Aside from these editorially juxtaposed pieces, however, there are numerous items of criticism or correspondence that rail against literary obsolescence as a kind of death, if not in so many words. For instance, it emerged in our discussion that Edgar Jepson's essay "The Western School" talks about the deterioration of contemporary literary production in a way that is evocative of death and shares valences with the ways in which other pieces deal with death more explicitly. Most importantly, it was only through the comparison with the other pieces dealing with death -- which appear later in the issue -- that we were able to read it in concert with the emergence of death at all.

For our workshop, I prepared a dataset from information my students have culled from the MJP. These data derive from an interactive timeline project that my students in periodical studies have done in four different courses at three different universities. The students curate content in the MJP by entering items and their bibliographic data into a shared Google Docs spreadsheet. The bibliographic data include author, genre, page numbers, and publication date, the last of which places the item on the timeline. More importantly, students assign topic tags to each item in order to provide a sense of its meaning. In the timeline, readers can click on an item to view a description and link to its location in the MJP. The timeline is additionally surrounded by filters from the data types (author, genre, magazine, topic tag) that allow the reader to refine further her exploration of the data. The idea is that codifying the metadata into the timeline will allow for discoveries and provoke questions as more and more content is entered. This is one possible technology for uncovering and articulating emergence within and across magazines. Using some (carefully massaged) data from the timeline, I made some CSV files to feed to Gephi for the generation of network graphs. While the timeline interface separates connections in time, and therefore also in space, Gephi presents them in a 2D, timeless space so that all are apparent. In the interest of transparency, I also posthumously added the Death tag and some other ones that reflect our discussion from the first day. These include Greatness and Mediocrity (among others), since we noticed that Yeats' poem takes pains to point out how much Gregory had in fact not accomplished relative to his more prolific peers.
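For readers who want to try the spreadsheet-to-Gephi step themselves, here is a minimal sketch of one way it could be done. It is not the script used for the workshop: the column names (title, author, genre, tags), the semicolon tag separator, and the three sample rows are assumptions made for illustration, as is the choice to link each item's genre and author to its topic tags. The output uses the Source, Target, and Weight headers that Gephi's spreadsheet importer recognizes for edge lists.

```python
import csv
import io
from collections import Counter

# Hypothetical extract of the shared spreadsheet; the column names and the
# semicolon-separated tags are assumptions, not the project's real schema.
SAMPLE_ROWS = """title,author,genre,tags
In Memory of Robert Gregory,W.B. Yeats,Poem,Death;Greatness;Mediocrity
Senility,Sherwood Anderson,Short Story,Death
Decay,Ben Hecht,Short Story,Death
"""


def build_edges(rows):
    """Count one edge from each item's genre and author to each of its tags."""
    edges = Counter()
    for row in rows:
        tags = [t.strip() for t in row["tags"].split(";") if t.strip()]
        for tag in tags:
            edges[(row["genre"], tag)] += 1
            edges[(row["author"], tag)] += 1
    return edges


def write_gephi_edges(edges, handle):
    """Write an edge list with the headers Gephi expects on import."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


rows = list(csv.DictReader(io.StringIO(SAMPLE_ROWS)))
edges = build_edges(rows)
out = io.StringIO()
write_gephi_edges(edges, out)
print(out.getvalue())
```

Counting repeated genre-to-tag pairs as edge weight is one design choice among several; it means that two short stories tagged Death give the Short Story to Death edge a weight of 2.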

So, with these connections in mind, we get this bibliographic and thematic overview of the September 1918 Little Review. The image uses the Fruchterman Reingold layout algorithm (see here for information about Gephi layouts) to place the more highly connected nodes in the center, grouped by edge weight. That means the nodes that have more connections with each other will be in closer geographical proximity. One pattern that emerges is the relative distance of the genres Poem and Short Story. They appear on opposite sides of Death and, aside from that, have nothing in common thematically or in terms of contributors. In Gephi, one can mouse over a node in order to view its nearest neighbors (a degree of separation of 1) in the context of the larger graph (see here).
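To give a feel for what a force-directed layout is doing, here is a bare-bones sketch of the Fruchterman-Reingold idea: every pair of nodes repels, every edge attracts, and a cooling "temperature" caps how far a node may move per step. It is not Gephi's implementation, which has its own parameters such as gravity and speed. Multiplying the attraction by edge weight is my own assumption, and the node names and weights are illustrative only.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=0):
    """Return {node: (x, y)} inside a size-by-size frame.

    nodes is a list of names; edges is a list of (a, b, weight) tuples.
    """
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = math.sqrt(size * size / len(nodes))  # ideal distance between nodes
    temperature = size / 10.0
    cooling = temperature / (iterations + 1)
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction: each edge pulls its endpoints together with force
        # weight * d^2 / k (the weight factor is an assumption).
        for a, b, weight in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = weight * dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node by at most the current temperature, then clamp it
        # to the frame and cool down.
        for n in nodes:
            dx, dy = disp[n]
            dist = max(math.hypot(dx, dy), 1e-9)
            step = min(dist, temperature)
            pos[n][0] = min(size, max(0.0, pos[n][0] + dx / dist * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + dy / dist * step))
        temperature -= cooling
    return {n: (p[0], p[1]) for n, p in pos.items()}


# Illustrative nodes and weights only, loosely modelled on the tags above.
NODES = ["Death", "Poem", "Short Story", "Greatness", "Mediocrity"]
EDGES = [
    ("Poem", "Death", 1),
    ("Short Story", "Death", 2),
    ("Poem", "Greatness", 1),
    ("Poem", "Mediocrity", 1),
]
layout = fruchterman_reingold(NODES, EDGES)
for name, (x, y) in layout.items():
    print(f"{name}: ({x:.2f}, {y:.2f})")
```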

 

That effect becomes even more pronounced if we generate an ego network of Death, with the node sizes re-set for context, and mouse over different terms within it so as to highlight their immediate connections. The disparity between Poem and Short Story becomes even clearer. In this picture, which shows the micronetwork that emerges when mousing over the Short Story node, we see that the Short Story genre has virtually nothing in common with any other content.
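An ego network and the "mouse over" view are both simple operations on an edge list, and a rough sketch may make them less mysterious. The edges below are hypothetical stand-ins rather than the workshop data, and the function names are my own; Gephi does all of this through its filters and its interface, not through code like this.

```python
from collections import defaultdict

# Hypothetical, unweighted edges for illustration only.
EDGES = [
    ("Poem", "Death"),
    ("Short Story", "Death"),
    ("Poem", "Greatness"),
    ("Poem", "Mediocrity"),
    ("Novel", "Greatness"),
    ("Novel", "Art"),
]


def build_adjacency(edges):
    """Map every node to the set of nodes it shares an edge with."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def neighbours(edges, node):
    """Nodes at a degree of separation of 1 (what a mouse-over highlights)."""
    return sorted(build_adjacency(edges)[node])


def ego_network(edges, ego):
    """Keep only edges among the ego and its immediate neighbours."""
    members = {ego} | build_adjacency(edges)[ego]
    return [(a, b) for a, b in edges if a in members and b in members]


death_ego = ego_network(EDGES, "Death")
print(death_ego)
print(neighbours(EDGES, "Short Story"))
print(neighbours(EDGES, "Poem"))
```

In this toy data the Short Story node touches only Death, while Poem reaches Death, Greatness, and Mediocrity, which is the kind of disparity the mouse-over views make visible.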

 

On the other hand, mousing over the Poem node reveals a much wider micronetwork that connects Yeats' poem with several works by Eliot, Jepson's essay, and the triad of Greatness, Mediocrity, and Irony. Why would Margaret Anderson and Jane Heap apparently use the short stories to carve out a separate space for the lowly and dissolute? Is it part of a strategy to explore different aspects of death and dying, using generic properties to present different facets?

 

Interestingly, the Novel and Essay genres have nothing in common with Short Story but have several connections with Poem, particularly in the topics of Greatness, Poetry, and Art, and in featuring Irony as a device. This graph shows what happens when we mouse over Novel. It would seem that the short stories in this issue are more straightforwardly realistic about death and dissolution than their longer-form and poetic peers.

 

 

The isolation of the Short Story group is even more pronounced if we change to the Yifan Hu Proportional layout algorithm, which calculates centrality and repulsion in such a way that clusters become apparent while emphasizing the differences as outlying branches (see here for more information about Gephi layouts). The bottom portion of this image shows the two clusters of the Hecht and Anderson stories as they attach to Death, while the entire field bears no relationship with them. Likewise, Ford Madox Hueffer's installment of Women and Men bears little relationship with the rest of the issue, constituting an outlying branch of the Novel node not connected with Death. Conversely, Joyce's installment of Ulysses is quite well integrated with other memes in the issue, while Hueffer's is the only literary piece not connected with Death.

It is interesting to note what else is not connected with Death: the advertisements at the back of the issue. One of them, for a Hammond typewriter, emphasizes the Greatness of Literature, as if the tool could somehow make the buyer a great writer ("No Other Typewriter Can Do This"), but obviously without the Ironic representation of Mediocrity that characterizes much of the actual literary content. The ad for Mason & Hamlin, "The Stradivarius of Pianos," also emphasizes Greatness as a selling point. In thinking about these relationships, the ads seem to represent the lack of Ironic insight into worthiness and Mediocrity that take front and center in Yeats' poetic argument. While we can't know what the editorial intent was in placing these objects together (or if there even was one), what we can be sure of is that a system which enables readers to tag content semantically can help to provoke new questions that might be worth going back to the magazines to investigate.

I raise the latter issue about advertisements because they constitute a part of the emergence of Death that was not discussed in class. It occurred to me as I was massaging the spreadsheet to feed to Gephi and thinking about the pieces we had read. This would be an example of how collaborative markup, say of a small working group of scholars or even one comprising all the members of our field, can aid in the discovery of emergences utilizing artifacts we might not individually have noticed or thought of as relevant. This would be a reactive and not so much a predictive method, one that utilizes both the stable bibliographic data and the idiosyncratic scriptons assembled by readers.

I would like to suggest, though, that a predictive method might be found in text mining. Using the Voyeur Tools for analyzing corpora, we can see the chronological surges in word frequency over the entire Little Review corpus in the MJP. A spike in word frequency in a given issue might mean that we are more likely to generate scriptons related to that word. As a brief example, see this Voyeur corpus of The Little Review from its beginning in 1914 through the Winter 1922 number (use the gear cogs to apply the Taporware stop word list, and be sure to make the change globally).

After applying the stopword list (and making a few other manual removals), this picture shows the raw frequency trends of the top five reoccurring words in the currently available segment of The Little Review. The word life has the top overall frequency in the corpus, but its usage declines precipitously with the start of the First World War. What does the trend look like for the word death, and is there a significant pattern around the September 1918 issue, as above, or perhaps at different key moments in the War?
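The raw-frequency trend that the graph plots boils down to counting words per issue after discarding stop words. Here is a minimal sketch of that counting, not of Voyeur's own implementation. The two "issues" are invented placeholder strings rather than text from The Little Review, the tiny stop-word set stands in for the Taporware list, and the function names are hypothetical.

```python
import re
from collections import Counter

# Tiny stand-in for a real stop-word list such as Taporware's.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is"}

# Invented placeholder texts keyed by issue label; not magazine content.
ISSUES = {
    "1914-03": "life and art and the life of the mind",
    "1918-09": "death in the life of art",
}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def issue_frequencies(issues, stop_words=STOP_WORDS):
    """Map each issue label to a Counter of its non-stop words."""
    return {
        label: Counter(w for w in tokenize(text) if w not in stop_words)
        for label, text in issues.items()
    }


def top_words(frequencies, n=5):
    """Most frequent words across the whole corpus."""
    total = Counter()
    for counts in frequencies.values():
        total.update(counts)
    return [word for word, _ in total.most_common(n)]


def trend(frequencies, word):
    """Raw count of one word in each issue, in corpus order."""
    return [(label, counts[word]) for label, counts in frequencies.items()]


frequencies = issue_frequencies(ISSUES)
print(top_words(frequencies, n=2))
print(trend(frequencies, "life"))
print(trend(frequencies, "death"))
```

Raw counts favor longer issues, so a real comparison across the run might divide each count by the issue's total word count; that normalization is left out here to match the raw trends shown in the picture.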

 

 

 

 

A much bigger, live version of this graph allows us to gain more information by mousing over and clicking. The word art has a massive and unique surge in Volume 3, number 8 (January 1917). The reason for it is that Jane Heap took over as content editor for that issue with a round of essays on art and aesthetics. Although a 3- or 4-issue arc of the run bears a higher focus upon art, the word (and the subject?) drops almost completely for a bit and then returns to a normal pattern for the rest of the run, with some higher spikes later on. I will address this sort of method in more depth, looking specifically at how we can locate emergences in the issues to which the graph directs us.

Lecture: The Future of Authorship: Scholarly Writing in the Digital Age (Kathleen Fitzpatrick)

Some of you might be interested in a lecture Wednesday on scholarly publication and digital media:

Date: March 2
Time: 4:00 p.m.
Location: Levis Faculty Center, Third Floor

More information can be found on the website of the IPRH, which is hosting Professor Fitzpatrick:

http://www.iprh.illinois.edu/news/iprhevents/default.aspx#KathleenFitzpa...

Digital Humanities 2009, Conference Twitter Feed

Since the 2009 Digital Humanities Conference is currently taking place at the University of Maryland, I decided to add its Twitter feed to the aggregator on our website. You can now peruse the updates by following the Feeds link at the top of the site, or by going directly to http://macaulay.cuny.edu/seminars/material-modernism/aggregator.

At the digital humanities conference, leading minds in the use of technology and information science for humanistic inquiry are presenting projects, ideas, theories, etc., many of which directly impinge on our work here. Feel free to blog any questions or observations, or to make comments on this post.

Check out participants' lightning interviews and other media here:

http://www.youtube.com/results?search_type=&search_query=dh09&aq=f

You might also wish to look at the leading journal in the field, Digital Humanities Quarterly.