What did nineteenth-century texts say about the British Empire, and how can we use code to figure it out? In the next few posts, I’m going to explore some digital humanities ideas, using R to sift through nineteenth-century texts from the HathiTrust Digital Library: novels, essays, travelogues, and more. I’ll start with HathiTrust’s Extracted Features dataset, which provides page-level word counts for millions of digitized volumes. To keep things manageable, I’m aiming for 500–1,000 texts that use terms like “colony,” “trade,” or “power.” In R, I’ll use tools like tidytext and dplyr to clean things up, removing stop words and focusing on words tied to empire. My approach will be simple.
First, I’ll look at which empire-related words show up most, broken down by decade: maybe “conquest” early, “trade” later? Then I’ll try sentiment analysis with the NRC lexicon to see whether the tone of writing about empire shifts over time. After that, I’ll run topic modeling with LDA to spot patterns: military topics, economic topics, whatever emerges. Finally, I’ll visualize the results in R so the patterns are easy to read.
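Here’s a rough sketch of two of those steps on invented counts: top words per decade, then a two-topic LDA via the topicmodels package. The decades, words, and counts below are made up purely for illustration; in practice they’d come from the Extracted Features data. The NRC sentiment step follows the same join pattern (an `inner_join` with `get_sentiments("nrc")`, which requires the textdata package), so it’s omitted here.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Invented per-decade word counts, for illustration only
counts <- tibble::tribble(
  ~decade, ~word,      ~n,
  "1810s", "conquest", 12,
  "1810s", "army",      9,
  "1870s", "trade",    15,
  "1870s", "commerce", 11
)

# Most frequent empire-related word per decade
counts %>%
  group_by(decade) %>%
  slice_max(n, n = 1)

# LDA wants a document-term matrix; here each decade acts as a "document"
dtm <- cast_dtm(counts, document = decade, term = word, value = n)
lda <- LDA(dtm, k = 2, control = list(seed = 42))

tidy(lda, matrix = "beta")  # per-topic word probabilities
```

With real data, each volume (not each decade) would be a document, `k` would need tuning, and the `beta` table is what you’d plot to label topics as “military,” “economic,” and so on.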
If you’re into digital humanities, this might spark ideas about blending tech with texts in the public domain. If you like coding, it’s an interesting exercise in working with messy data in R. Stick around!