What did nineteenth-century texts say about the British Empire, and how can we use code to figure it out? In the next few posts, I’m going to explore some digital humanities ideas, using R to sift through nineteenth-century texts from the HathiTrust Digital Library: novels, essays, travelogues, and more. I’ll start with HathiTrust’s Extracted Features dataset, which provides page-level word counts for millions of digitized volumes. To keep things manageable, I’m aiming for 500–1,000 texts that use terms like “colony,” “trade,” or “power.” In R, I’ll use tools like tidytext and dplyr to clean things up, removing stop words and focusing on words tied to empire. My approach will be simple.
First, I’ll look at which empire-related words show up most, broken down by decade: maybe “conquest” early, “trade” later? Then I’ll try sentiment analysis with the NRC lexicon to see whether the tone of writing about empire shifts over time. After that, I’ll run topic modeling with LDA to spot patterns: military topics, economic topics, whatever emerges. Finally, I’ll visualize the results in R so the patterns are easy to read.
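Here’s a rough sketch of two of those steps on invented counts: top words per decade, then a two-topic LDA via the topicmodels package. The decades, words, and counts below are made up purely for illustration; in practice they’d come from the Extracted Features data. The NRC sentiment step follows the same join pattern (an `inner_join` with `get_sentiments("nrc")`, which requires the textdata package), so it’s omitted here.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Invented per-decade word counts, for illustration only
counts <- tibble::tribble(
  ~decade, ~word,      ~n,
  "1810s", "conquest", 12,
  "1810s", "army",      9,
  "1870s", "trade",    15,
  "1870s", "commerce", 11
)

# Most frequent empire-related word per decade
counts %>%
  group_by(decade) %>%
  slice_max(n, n = 1)

# LDA wants a document-term matrix; here each decade acts as a "document"
dtm <- cast_dtm(counts, document = decade, term = word, value = n)
lda <- LDA(dtm, k = 2, control = list(seed = 42))

tidy(lda, matrix = "beta")  # per-topic word probabilities
```

With real data, each volume (not each decade) would be a document, `k` would need tuning, and the `beta` table is what you’d plot to label topics as “military,” “economic,” and so on.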
If you’re into digital humanities, this might spark ideas about blending tech with texts in the public domain. If you like coding, it’s an interesting exercise in working with messy data in R. Stick around!