On the weakness of quantitative methods for qualitative data

I've had a lot of trouble figuring out how to get usable information out of Gephi.  While I have traditionally done well in English and decently in math, qualitative and quantitative study are sharply separated in my mind.  As a result, I spent a good deal of time fiddling with settings, messing things up, reopening the file, and trying again.  When it came down to it, changing the settings didn't reveal anything new for me.

What I ended up doing was to take Moretti's approach (removing points of data and looking at the effect on the overall network) and see where I could go with that.  I carried out a series of removals, following a fixed sequence of steps.

1) Screenshot.

2) Delete the most heavily linked pieces of data (those that show up large and red).

3) Screenshot.

4) Reapply the node size and colour parameters so that the next largest pieces of data move up to attention.

5) Screenshot.

6) Repeat steps 2-5 until no points remain.
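For anyone who would rather script this loop than click through it, here is a minimal sketch in plain Python.  It is not what I actually did in Gephi, and it rests on a few assumptions: that "most heavily linked" means highest degree (number of connections), that every node tied for the highest degree is removed in the same round, and that the network can be written as a simple list of pairs.  The function name `peel_by_degree` and the toy data are made up for illustration; the real tag network is far larger.

```python
def peel_by_degree(edges):
    """Repeatedly remove the highest-degree nodes from an undirected graph.

    Returns one (removed, isolated) pair per round, where `isolated` lists
    the nodes left with no connections after that round's removal.
    """
    # Build an adjacency map: node -> set of neighbouring nodes.
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    rounds = []
    while adj:
        # Step 2: find and delete every node tied for the highest degree.
        top = max(len(nbrs) for nbrs in adj.values())
        removed = sorted(n for n, nbrs in adj.items() if len(nbrs) == top)
        for n in removed:
            for m in adj.pop(n):
                if m in adj:  # the neighbour may already be gone this round
                    adj[m].discard(n)
        # Steps 3-5 stand-in: record the state instead of taking a screenshot.
        isolated = sorted(n for n, nbrs in adj.items() if not nbrs)
        rounds.append((removed, isolated))
    return rounds


# Hypothetical toy network: two themes linked to three placeholder works.
toy_edges = [
    ("death", "work_a"),
    ("death", "work_b"),
    ("death", "work_c"),
    ("art", "work_c"),
    ("art", "work_a"),
]

rounds = peel_by_degree(toy_edges)
for number, (removed, isolated) in enumerate(rounds, start=1):
    print(number, "removed:", removed, "isolated:", isolated)
```

Recording which nodes end up isolated after each round would make it possible to say exactly when the first node falls out of the network, rather than spotting it by eye in a screenshot.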

I have put together a gif to provide an overview of the process.  (It seemed a bit saner than trying to upload 16 similar screenshots…)



Some observations… In the earlier stages of this process, it was easy to pick out a few nodes at a time to remove: themes and genres early on, then authors, and finally particular works.  As I went on, the number of nodes to be removed in each round grew larger and larger, indicating a growing uniformity in the "value" of those nodes.  By the time the values were that uniform, the connections were almost the minimum necessary to keep the remaining points all part of the network.

I found it interesting that, going about it in this manner, it took until the sixth iteration to isolate a node from the rest of the network.  The entire web was so interconnected that it took a lot of removal to isolate anything.  This is one of the primary reasons I find Gephi a difficult medium to extract information from: the complex interconnectedness of the points, growing out of a subjective process of tagging, does not lend itself easily to identifying structural centers.  Using the language of Moretti's article, the central points (death, art, greatness, etc.) stand out quite easily, but the structural centers are much more elusive.

As far as quantitative analysis goes, I think I would be far more interested in looking at the frequency of particular words in texts, or something along those lines.  That would be far more objective, and it would reveal something new, whereas the thematic tagging we have here mostly shows what the process of labeling and sorting had already revealed.
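To give a sense of what I mean by word frequency, here is a minimal sketch in Python using only the standard library.  The function name `word_frequencies` and the sample sentence are invented for illustration, and the tokenizing rule (lowercase everything, keep runs of letters and apostrophes) is just one simple assumption among many possible ones.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each word appears, ignoring case and punctuation."""
    # Assumed tokenizer: lowercase the text, then keep runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample sentence, not taken from any of the tagged works.
sample = "Death and art; art and greatness. Art outlasts death."

freq = word_frequencies(sample)
print(freq.most_common(1))  # -> [('art', 3)]
```

Unlike the thematic tags, these counts come straight from the text itself, so nobody has to decide beforehand that a work is "about" art.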