Graph with the Wind: Gone with the Wind, Digital Humanities, and Adaptation

Harrison Brockwell

30 April 2019

Digital Humanities

Dr. Drouin

Whenever a book is adapted into a new medium, regardless of the popularity of the original text, someone is guaranteed to come out of the woodwork and claim that the book is better than the adapted version, often condescendingly and judgmentally so. An oft-cited reason for this is that the film, the most common method of adaptation in contemporary media, is unable to capture the essence of the book, and leaves out a lot of the aspects of the book that are not essential to the overall plot, making for a lesser experience overall. While this is often an exaggeration of the sort of things that change in the process of adaptation, it is not entirely untrue. Certain aspects of novels must be compressed or changed in the process of adaptation to film, and sometimes these alterations result in modifications to the underlying thematic elements or change the focal points of whole sections of narratives.

These are issues that must be considered even when examining Gone with the Wind, one of the largest cultural touchstones in American cinema. Both the novel and the film recount pretty much the same story, following southern belle Scarlett O’Hara as she has to cope with and process the Civil War and the effect it has on her family’s plantation, the city of Atlanta, and her own interpersonal relationships, romantic and platonic. This tale takes on the form of a grand, sweeping epic, as both a 1000 plus page novel and a film whose runtime clocks in at three hours fifty-eight minutes. But of course, in order to fit one thousand pages of anything into a run time shorter than 10 hours – something even high profile, high budget, multi-season television adaptations struggle to do – elements of the source text must be altered, adjusted, or even cut out in their entirety.

This is an aspect of analysis that digital humanities is well equipped to assist in. As both versions of Gone with the Wind are exceedingly large in scope, the amount of time required to close read both of them is equally large. By employing digital tools – such as Voyant Tools and the graphs and visualizations within – to comb through the textual elements of each version, it becomes easier to track some of the larger, recurring themes and relationship arcs that define Gone with the Wind. However, these tools are also useful for pointing out what goes unnoticed, and the things that are implied through subtext and character networks and connections, which a tool like Gephi is pretty much designed to illuminate. Even when adaptations of a work are mostly true to the source material, as is the case with Gone with the Wind, segments of the material will be changed due to the shape of the culture and the attitudes of the filmmaking industry at the time.

My methodology for this whole project ended up being fairly straightforward. At the outset, I sourced a plaintext version of the Gone with the Wind novel from Project Gutenberg Australia and trimmed out the metadata details about the novel and Project Gutenberg itself. Once the novel was isolated in this manner, I plugged the novel into Voyant Tools and then pulled out the graphs that I found to be the most revealing and enlightening. This also required trimming out some pronouns, conjunctions, and contractions which did not contribute any analytical value and were skewing some of the graphs. With these graphs now created, I turned my attention towards finding versions of the script. I pulled a version of the script with just the dialogue in it, and again ran it through Voyant Tools. After this, I took a PDF version of the full script and combed through it by hand, creating a spreadsheet to keep track of who is speaking, and to whom their lines are directed. At some points, I had to make an executive decision as to who is being spoken to, often defaulting to the last character the speaker was addressed by, unless otherwise specified. There are also instances where it is clear that the white characters are speaking past the black characters, and in those cases I had to consider which white character had last spoken to the speaker. After generating this spreadsheet, which totaled 2,222 lines of dialogue, I put the whole thing into Gephi, in order to create a network graph of the character interactions, spheres of influence, and how central they are to the overall tapestry of Gone with the Wind’s narrative. I also took the newly constructed list of speaking parts in the film and plugged that into Voyant Tools as well, to track who speaks the most and in which sections.
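The step from spreadsheet to network graph can be sketched in a few lines of Python. This is only an illustration of the general idea, not the procedure actually used for this project: it assumes each spreadsheet row has been reduced to a (speaker, addressee) pair, and it collapses repeated pairs into weighted edges written out with Source, Target, and Weight columns, which is a layout Gephi's spreadsheet importer can read. The function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def build_edge_list(rows):
    """Collapse (speaker, addressee) rows into weighted directed edges."""
    weights = Counter()
    for speaker, addressee in rows:
        speaker, addressee = speaker.strip(), addressee.strip()
        if speaker and addressee:  # skip rows with a missing name
            weights[(speaker, addressee)] += 1
    return weights


def edges_to_csv(weights):
    """Render the edges as CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()


# Hypothetical sample rows, NOT taken from the project's spreadsheet.
sample_rows = [
    ("Scarlett", "Rhett"),
    ("Rhett", "Scarlett"),
    ("Scarlett", "Rhett"),
    ("Mammy", "Scarlett"),
]
edges = build_edge_list(sample_rows)
print(edges_to_csv(edges))
```

Keeping the edges directed preserves the distinction between who speaks and who is spoken to, which matters for the judgment calls about addressees described above.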

The logical place to start this project is with the novel and general word count and frequency. Of the 422,779 words that make up the Gone with the Wind novel, 2539 of those are the word “Scarlett” (Fig. 1). This is not a surprising statistic, as Scarlett is the main character and focal point of the story. That being said, there are more instances of her name than the 3 next most common words – all of which are names as well – combined. As is clear in the frequency chart (Fig. 2) and the word loom (Fig. 3), instances of Scarlett’s name are more frequent than almost any other word in the novel by a fairly substantial margin.
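For readers who want to reproduce this kind of count outside of Voyant Tools, the sketch below shows one way to tally word frequencies after filtering out a stopword list. It is a simplified stand-in rather than Voyant's own tokenizer: the stopword set, the decision to fold possessives such as “Scarlett’s” into the bare name, and the sample sentence are all assumptions made for illustration, so counts produced this way may differ slightly from the figures reported here.

```python
import re
from collections import Counter

# Illustrative stopword list; the list actually used in Voyant is not reproduced here.
STOPWORDS = {"the", "and", "a", "to", "of", "was", "she", "he", "it", "her", "that", "didn't"}


def tokenize(text):
    """Lowercase the text, split it into words, and fold possessives into the base word."""
    normalized = text.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    return [w[:-2] if w.endswith("'s") else w for w in words]


def top_terms(text, stopwords=STOPWORDS, n=5):
    """Return the n most frequent non-stopword terms with their counts."""
    counts = Counter(t for t in tokenize(text) if t not in stopwords)
    return counts.most_common(n)


# Hypothetical sample sentence, not a quotation from the novel.
sample = "Scarlett's eyes met Ashley's. Scarlett knew Rhett was watching, and Scarlett didn't care."
print(top_terms(sample, n=3))
```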

What makes Figure 2 interesting is the relationship the other names have to both “Scarlett” and each other, specifically the occurrences of “Ashley” and “Rhett,” the two main romantic interests for Scarlett in the story. Everyone aware of Gone with the Wind usually recalls the love affair between Rhett Butler and Scarlett as the centerpiece of the story, which is true. However, it is not always the centerpiece of the actual text. Instances of Ashley’s name are more numerous than Rhett’s, at a count of 859 to 832, a difference of 27. While this may not seem like a lot, it contests the popular notion that this is just a story about Rhett and Scarlett; she has other men on her mind for a lot of the narrative. If only looking at the dialogue of the film, this would be supported, as there are 117 uttered instances of the name “Ashley” versus 109 for “Rhett” (Figure 4). That being said, the structure of the film does not support it, which explains why the culture at large has an understanding of the narrative centered on Rhett and Scarlett. Rhett has the second most lines of dialogue, clocking in at 495 lines across the film’s nearly four-hour run time, while Ashley comes in at a paltry 97 lines (Figure 5). This shift away from Ashley to focus more on Rhett makes sense from a Hollywood perspective, as the studio had Clark Gable cast as Rhett, and most likely wanted to ensure as much screen time for the superstar as it could.

This change in favor of Clark Gable is reflected not just in number of lines, but in the distribution of his lines and screen time. In figure 2, when tracking the relationship between mentions of Ashley and mentions of Rhett in the novel, Ashley’s line stays above Rhett for the first twenty-five percent or so of the text, which roughly corresponds to the section of the narrative before the Civil War begins, when nothing is occupying Scarlett’s mind except for social norms and marriage prospects. However, during the middle 50% or so of the novel, an inverse relationship forms, where Ashley becomes more prominent when Rhett is not being mentioned, which corresponds to the sections of the plot where he disappears for stretches of time. In these sequences, the graphs indicate that Scarlett’s mind wanders away from Rhett and back to Ashley, undercutting the magnetism of Rhett’s character, as he is unable to stay in her head for too long after he is gone. Indeed, even towards the end of the novel, once Scarlett and Rhett have been married, instances of Ashley’s name being mentioned in the text do not drop substantially; in fact, they increase in frequency, even in the face of the large spike in Rhett’s frequency.
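The kind of trend line discussed here can be approximated by cutting the text into equal-sized segments and counting a name within each one. The sketch below is a rough illustration under that assumption; the number of segments, the function name, and the toy string are all hypothetical, and Voyant Tools may segment and normalize the text differently.

```python
import re


def segment_counts(text, term, segments=10):
    """Count occurrences of `term` in each of `segments` equal-sized slices of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Ceiling division so that every token falls inside one of the segments.
    size = max(1, -(-len(tokens) // segments))
    counts = [0] * segments
    for index, token in enumerate(tokens):
        if token == term:
            counts[min(index // size, segments - 1)] += 1
    return counts


# Toy string standing in for a full text; it is not drawn from the novel.
toy_text = "ashley ashley rhett ashley rhett rhett rhett rhett"
print(segment_counts(toy_text, "ashley", segments=4))
print(segment_counts(toy_text, "rhett", segments=4))
```

Comparing the two lists segment by segment gives the same kind of reading as the graph: where one name dominates, where the two trade places, and where both rise together.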

These trends are generally reflected in the dialogue from the film adaptation of Gone with the Wind, as shown in the word frequency graph for the spoken lines (Fig. 6). Again, instances of Ashley remain above Rhett for the first 25%, an inverse relationship is established for the middle 50%, and then Rhett shoots up and stays above Ashley for the remaining 25%. However, the most interesting thing about comparing these two charts is the heights “Rhett” reaches in both of these. In figure 2, Rhett bypasses Scarlett in frequency only once, towards the end of the novel, and then drops back down below her frequency for the last section of the novel. In figure 6, Rhett bypasses Scarlett twice, also towards the end of the narrative. Both of these coincide with the section of the text where Scarlett and Rhett are married, have a child, and are deep in the midst of marital strife, so naturally it follows that Rhett’s name would be said a lot more than it had previously. However, the fact that Ashley’s name is still spoken with only a slightly lower frequency at the end of the novel, while Scarlett is married, as it is at the start of the novel when she is actively pining after him, would indicate some consistency in adaptation with regards to Scarlett’s views on Ashley as a romantic interest.

However, this falls apart when the actual number of spoken lines comes into consideration (figure 7). Rhett has more spoken lines than Ashley for 90% of the script, as Ashley is pushed aside in favor of foregrounding the relationship drama between Rhett and Scarlett, which has affected the general cultural perception of Gone with the Wind as being mostly about those two. In reality, it is a result of adaptational changes that have brought Rhett Butler to the fore as much as he is. This is not the only effect the focus on Rhett and Scarlett brings. Much like how figures 2 and 6 have spikes in “Rhett” frequency, so too is there a spike towards the end of the script (figure 7), where Rhett is on screen and involved in proceedings pretty thoroughly. What is interesting about this is that, once Rhett is speaking on screen consistently and frequently, Scarlett’s frequency of lines drops lower than it was at the start of the film, and does not rise again until after Rhett’s line frequency drops substantially right at the end. This is emblematic of a diversity and representation problem that Hollywood has had since the start of the studio era. Despite Scarlett having the most lines in the film by a margin of 256, her role is diminished somewhat when the male lead becomes a consistent part of the narrative. This interpretation is not only based in the graph, but also in the billing on the original poster for Gone with the Wind, where Clark Gable gets first billing over Vivien Leigh, again, despite her having more screen time and lines than him.

Shifting the romantic focus and dynamic of Gone with the Wind is not the only change in adaptation visible in the graphs from Voyant Tools. Circling back to the loom graph for the novel version of the text (figure 3), the word frequency lines for every unique, non-article word are layered on top of each other. A majority of this is an unreadable mess, which makes the outlying spikes in word frequency stick out exceedingly well. Scarlett’s line is high above the rest, and the spike in Rhett’s name is clearly visible near the end. Right in the middle of the chart, though, is a spike in frequency for the word “money.” As the Civil War was directly related to the main economic propellant for the South, it stands to reason that, at some point, money would be some sort of concern for Scarlett and the novel at large. As the O’Hara plantation, called Tara, is essentially destroyed in the course of the war, I generated a graph to show the relationship in frequencies between “home,” “tara,” and “money” in the novel (figure 8). For the first section of the novel, before the war and before Scarlett has to relocate to Atlanta, the topic of money is almost invisible, with an extremely low frequency that blends into the loom (figure 3) and stays below “home” and “tara” (figure 8) until the major spike in the middle of the novel. Right around that spike, however, both “home” and “tara” drop in frequency, which corresponds to the fact that Scarlett loses the plantation and takes it upon herself to repair it and find a home again. At the outset, Scarlett does not have to worry about where her money is coming from, or the fact that it is built upon slave labor, so it barely comes up. “Money” also trails off by the end of the novel, but never drops down below “home” and “tara” again, as Scarlett, through marrying Rhett, no longer has to worry about having money. That being said, her wealth has changed from old money, landed elite wealth into newer, earned wealth, which is why it stays in the topic of conversation to a degree, while still fading back into the cluster of graph lines at the end of the loom (figure 3).

These trends are not replicated in the dialogue analyses. Creating a graph for the same terms using the dialogue of the film (figure 9) shows a similar peak for “money,” but less of a focus on outright finances overall. Instead, there are more frequent peaks for “tara” in the dialogue, occurring both earlier and later in the script, including one that corresponds to the spike in “money.” This indicates a shift towards emphasizing the loss of the family homestead, legacy, and land, and Scarlett’s desire to reclaim all that Tara symbolizes. This reinforces the Lost Cause and South Will Rise mentalities, the notion that the Civil War was not about slavery at all, but really a war of aggression from the Northern States who disliked the southern way of life.

This denial of racism and slavery ends up reflected in the word count and frequency charts (figures 1 and 4). Out of the approximately 423,000 words in the novel, there are only eighty-two appearances of the root word “slave,” 203 instances of some variation of “negro,” and 104 uses of “nigger.” The script only has five spoken instances of “slave,” one of “negro,” and none whatsoever of “nigger.” The racist roots of the South are brushed aside almost entirely by the film adaptation, leaving any racial interpretations to implication and interpretation.

What goes unspoken in Gone with the Wind, however, does not go entirely unexamined, as film involves more than just the words that are said and when. There is a performative and visual component to film that is best represented by a network graph, which shows how everyone in the film relates to everyone else (figure 10). Franco Moretti demonstrates the potential effectiveness of network graphs through an analysis of Hamlet, highlighting how some characters may be more central to a scene’s action than others. In terms of the usefulness of network graphs, Moretti argues they are necessary for removing “characters and interactions from a dramatic structure, and turning them into a set of signs” which are easier to parse out, and in less time (11). This is exactly how the character network for all the spoken lines in Gone with the Wind works; by removing the actual words spoken and instead focusing on each character’s sphere of influence, it becomes easier to interpret aspects of the film that are left unstated.

This is exactly what Gephi provides when given a spreadsheet of all speaking parts and the addressee of each line (figure 10). A lot is apparent from just this graph as a whole. Scarlett is clearly the focal point of the narrative, having the most central location in the graph and the most connections to individual nodes. Her family and Rhett make up the next most central ring, and then the tertiary and single line parts fan out from there. What is most interesting, however, are the locations and spheres of influence of the O’Haras’ slaves, namely Mammy (figure 11), Pork (figure 12), Big Sam and Elijah (figure 13). Mammy is the most central of all the slaves, as she works in the house and travels with Scarlett through most of the film. Despite being the most mobile of the slaves and having the largest network, Mammy rarely has an interaction with someone who is not of direct use to the family or in its way in some manner. It gets more concerning the deeper into the plantation slaves the network gets, as Pork literally does not have an interaction outside of the family and family slaves, while Big Sam only talks to Frank Kennedy as he is approaching the house at Tara, and Elijah only speaks to Big Sam while they are both working on the plantation. This restriction of interaction for the black folks in Gone with the Wind is representative of the larger unspoken racism within the text. As these slaves were considered property, it stands to reason, by the logic of the film and script, that they should not have any interactions outside of the family unless absolutely required. This corralling and gatekeeping of black characters, even after the Civil War section of the film is complete, serves to reinforce the stereotypical notion that the majority of slaves actually enjoyed their work and their masters, and did not see any need to extend their lives outside of that sphere of influence.
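The notion of a character's “sphere of influence” can be made concrete by listing, for each character, the set of distinct people they exchange lines with. The sketch below does this with plain Python rather than Gephi, using the size of that set (degree) as the simplest possible measure of centrality; whether this matches the layout or statistics Gephi applied to figure 10 is an assumption. The function names are hypothetical, and the toy exchanges are invented for illustration rather than copied from the dialogue spreadsheet.

```python
from collections import defaultdict


def neighbors(exchanges):
    """Map each character to the set of characters they speak to or are spoken to by."""
    contacts = defaultdict(set)
    for speaker, addressee in exchanges:
        contacts[speaker].add(addressee)
        contacts[addressee].add(speaker)
    return contacts


def degree_ranking(exchanges):
    """Rank characters by number of distinct conversation partners, largest first."""
    contacts = neighbors(exchanges)
    pairs = ((name, len(peers)) for name, peers in contacts.items())
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


# Invented toy exchanges, NOT rows from the project's spreadsheet.
toy_exchanges = [
    ("Scarlett", "Rhett"),
    ("Scarlett", "Mammy"),
    ("Scarlett", "Ashley"),
    ("Rhett", "Mammy"),
    ("Elijah", "Big Sam"),
]
print(degree_ranking(toy_exchanges))
print(sorted(neighbors(toy_exchanges)["Elijah"]))
```

A character whose set of partners never extends beyond one household shows up immediately in this kind of listing, which is the pattern the network graph makes visible.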

This is not the only questionable material that goes unnoticed by digital tools through implication and insinuation. Late in the film and novel, there are numerous instances of domestic abuse, one count of marital rape, a child death, and a miscarriage, none of which are represented in either the Voyant Tools charts or the Gephi network graphs. This is most likely due to the sudden and isolated nature of the incidents in question, and to the fact that they are never explicitly named. Subtleties and isolated events avoid the watchful eye of digital humanities tools and algorithms because they are not quantifiable on a large scale. Digital humanities is said to be most useful with large data sets, but the downside is that it fails to notice the smaller, more subdued aspects of material that can often drastically shape an individual’s interpretation of a work. Tracking the changes made to a work in the process of adaptation feels like one of the better small-scale applications for digital humanities, as the tools make it easier for individual scholars to follow thematic alterations from novel to film, while the scholar can still fill in the minutiae that the algorithms end up missing, whether due to low word frequency or reliance on implication. Gone with the Wind, due to its mammoth length and cultural import, serves as the perfect test case in using digital tools to examine adaptations of works.


Works Cited

Mitchell, Margaret. Gone with the Wind. Project Gutenberg Australia. Uploaded February 2002, accessed April 2, 2019.

Moretti, Franco. “Network Theory, Plot Analysis.” Literary Lab. Pamphlet 2. 11 May 2011.


Novel Word Count Chart (Figure 1)

Novel Most Frequent Words Trend Graph (Figure 2)

Novel Word Frequency Loom Graph (Figure 3)

Script Dialogue Word Count Chart (Figure 4)

Script Lines Count Chart (Figure 5)

Script Dialogue Frequent Words Trend Graph (Figure 6)

Speaking Parts Frequency/Density Graph (Figure 7)

Novel Home/Tara/Money Frequency Graph (Figure 8)

Dialogue Home/Tara/Money Frequency Graph (Figure 9)