Digging into the British Empire with R: Maps!, part 4

Probably the coolest thing you can do with a text analysis project in R is to create data visualizations. In this script, I’ve used our corpus to create some interesting maps based on different regions and their prevalence in the corpus (what I’ve called “narrative intensity” in the charts).

This script takes advantage of the ggplot2 package, which has a lot of flexibility in terms of data display and aesthetic customizations. Here is an example of a map of narrative intensity–in the code, I supply the regions I want to measure intensity for. For this particular map, I also have to define the coordinates, which is kind of a pain, but it also allows you customize on a granular level. In this map, I’ve used the default coordinates for the middle of each country (the generic coordinates available on most maps). But presumably most of the writing about Australia concerns the west coast of the continent and not its center; this is something I could adjust in the code.

A heat map probably makes more sense for this data, and I can make that too. Below, I’ve used a heat map with a customized aesthetic theme and defined fixed coordinates.

ggplot2 is a powerful library, and it’s easy to sink lots of time into beautiful visualizations–the first image was the default visual I didn’t change, the second one I changed a bit. Now that I’m uploading it, the colors are probably a bit brighter than I’d normally recommend using. It’s hard to see some of the gradations in the heat map. I think this is a good example of the fact that the best thing about R–seemingly infinite customizations–can also be the biggest drawback.

In any event, hopefully you can see how powerful text mining and visualizations can be–in mining a corpus of 1,000 works on the British Empire, we can see pretty clearly which locations are the object of the most narrative intensity–what places were being talked about the most. One question that arises from this for me is the place (or lack thereof!) of Latin America in this data. Economic interest in Latin American mining was prevalent in the Victorian Era, so it might be interesting to dive into Latin American regions in the corpus and see how they compare to places that do figure in the map, such as Australia, India, or Canada. And of course, by manually reviewing the corpus data, it would be possible to select texts based on genre; I could imagine novels, for example, might be more focused on one region than financial reports or newspaper articles, or vice versa.

Here’s the script I used for these:

# In this script, we're going to plot some maps with our data to look at discrete 
# regions and evaluate the "narrative intensity" of those regions--in other words,
# how often they were depicted in the corpus 

# First let's install our packages
install.packages("stringi")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("maps")
install.packages("stringr")
install.packages("tidyr")
install.packages("tidytext")

# Now we'll load them
library(stringi)
library(dplyr)
library(ggplot2)
library(maps)
library(stringr)
library(tidyr)
library(tidytext)


# Alternative approach using base R to handle problematic characters
empire_texts <- empire_texts %>%
  mutate(
    text_length = sapply(text, function(x) {
#     # Try to handle encoding issues safely
      clean_text <- tryCatch({
      clean <- iconv(x, from = "UTF-8", to = "UTF-8", sub = "")
      if(is.na(clean)) return(0)
      return(nchar(clean))
    }, error = function(e) {
      return(0)  # Return 0 for length if all else fails
    })
     return(clean_text)
   })
 )

regions <- c("India", "Africa", "Australia", "Canada", "Caribbean", "Egypt")


# calculates the proportional regional focus for texts in the corpus
# important to mutate sub out invalid utf-8, or else subsequent code will fail
regional_focus <- empire_texts %>%
  mutate(text = iconv(text, to = "UTF-8", sub = "")) %>%  # Clean at the start
  filter(gutenberg_id != 1470, gutenberg_id != 3310, gutenberg_id != 6134, 
         gutenberg_id != 6329, gutenberg_id != 6358, gutenberg_id != 6469) %>%
  group_by(gutenberg_id, title) %>%
  summarize(text = paste(text, collapse = " ")) %>%
  mutate(
    text_length = str_length(text),
    india_focus = str_count(text, regex("India|Indian|Hindustan", ignore_case = TRUE)),
    africa_focus = str_count(text, regex("Africa|African|Cape|Natal|Zulu", ignore_case = TRUE)),
    australia_focus = str_count(text, regex("Australia|Sydney|Melbourne", ignore_case = TRUE)),
    caribbean_focus = str_count(text, regex("Jamaica|Barbados|Caribbean|West Indies", ignore_case = TRUE)),
    egypt_focus = str_count(text, regex("Egypt|Egyptian|Nile|Cairo", ignore_case = TRUE)),
    canada_focus = str_count(text, regex("Canada|Canadian|Ontario|Quebec", ignore_case = TRUE))
  ) %>%
  mutate(
    india_ratio = india_focus / text_length * 10000,
    africa_ratio = africa_focus / text_length * 10000,
    australia_ratio = australia_focus / text_length * 10000,
    caribbean_ratio = caribbean_focus / text_length * 10000,
    egypt_ratio = egypt_focus / text_length * 10000,
    canada_ratio = canada_focus / text_length * 10000
  a_ratio = canada_focus / text_length * 10000)
  

# Let's start putting a map together with a focus on different regions 
regional_summary <- regional_focus %>%
  summarise(
    India = sum(india_ratio, na.rm = TRUE),
    Africa = sum(africa_ratio, na.rm = TRUE),
    Australia = sum(australia_ratio, na.rm = TRUE),
    Caribbean = sum(caribbean_ratio, na.rm = TRUE),
    Egypt = sum(egypt_ratio, na.rm = TRUE),
    Canada = sum(canada_ratio, na.rm = TRUE)
  ) %>%
  # Reshape to long format for mapping
  pivot_longer(cols = everything(), 
               names_to = "region", 
               values_to = "focus_strength")

# Creates a data frame with coordinates for each region prevalent in the data
region_coords <- data.frame(
  region = c("India", "Africa", "Australia", "Caribbean", "Egypt", "Canada"),
  lon = c(78.9629, 21.0936, 133.7751, -76.8099, 31.2357, -106.3468),
  lat = c(20.5937, 7.1881, -25.2744, 18.7357, 30.0444, 56.1304)
)

# Combine text focus with region 
map_data <- left_join(region_coords, regional_summary, by = "region")

# Get world map data
world <- map_data("world")

# Create the map! 
ggplot() +
  # World map background
  geom_polygon(data = world, 
               aes(x = long, y = lat, group = group), 
               fill = "gray90", color = "gray70", size = 0.1) +
  # Points for each region sized and colored by focus strength
  geom_point(data = map_data, 
             aes(x = lon, y = lat, 
                 size = focus_strength, 
                 color = focus_strength),
             alpha = 0.7) +
  # Optional: Add region labels
  geom_text(data = map_data,
            aes(x = lon, y = lat, label = region),
            vjust = -1, size = 3) +
  # Customize the appearance
  scale_size_continuous(range = c(3, 15), name = "Focus Strength") +
  scale_color_gradient(low = "blue", high = "red", name = "Focus Strength") +
  theme_minimal() +
  labs(title = "Regional Focus in Texts about the British Empire",
       subtitle = "Size and color intensity show relative focus strength",
       x = NULL, y = NULL) +
  coord_fixed(1.3) +  # Keeps map proportions reasonable
  theme(legend.position = "bottom")

# This is a bit easier because you don't have to supply the coordinates 
# but we do need to normalize the values in the focus strength field, so we'll
# add those values to regional_summary 

# Assuming world_subset is prepared as in previous examples
# If not, here's a quick setup (replace with your actual data prep):
world <- map_data("world")
# Example map_data from earlier
region_coords <- data.frame(
  region = c("India", "Africa", "Australia", "Caribbean", "Egypt", "Canada"),
  lon = c(78.9629, 21.0936, 133.7751, -76.8099, 31.2357, -106.3468),
  lat = c(20.5937, 7.1881, -25.2744, 18.7357, 30.0444, 56.1304),
  focus_strength = c(50, 30, 20, 15, 25, 40)  # Replace with your actual focus_strength
)
country_regions <- data.frame(
  region = c("India", "Africa", "Africa", "Australia", "Caribbean", "Caribbean", "Egypt", "Canada"),
  country = c("India", "South Africa", "Nigeria", "Australia", "Jamaica", "Barbados", "Egypt", "Canada")
)
map_data_countries <- left_join(country_regions, region_coords[, c("region", "focus_strength")], by = "region")
world_subset <- world %>% left_join(map_data_countries, by = c("region" = "country"))

# Create the heatmap
ggplot() +
  # World map with heatmap fill
  geom_polygon(data = world_subset, 
               aes(x = long, y = lat, group = group, fill = focus_strength),
               color = "#2a4d69", size = 0.05) +  # Thin, dark borders for contrast
  # Landmass background (middle)
  geom_polygon(data = world, 
               aes(x = long, y = lat, group = group), 
               fill = "#5A5A5A", color = "#B0B0B0", size = 0.3) +  # Even lighter gray
  # Heatmap layer (top, fully opaque)
  geom_polygon(data = world_subset, 
               aes(x = long, y = lat, group = group, fill = focus_strength),
               color = "#D0D0D0", size = 0.4, alpha = 1) +  # No transparency
  # Bright, adjusted gradient
  scale_fill_gradientn(
    colors = c("#80CFFF", "#CCFF99", "#FFFF99", "#FFCCCC", "#FF99CC"),  # Super bright palette
    name = "Focus Strength",
    na.value = "transparent",
    limits = c(min(world_subset$focus_strength, na.rm = TRUE), 
               max(world_subset$focus_strength, na.rm = TRUE)),  # Full data range
    breaks = seq(min(world_subset$focus_strength, na.rm = TRUE), 
                 max(world_subset$focus_strength, na.rm = TRUE), length.out = 5),
    guide = guide_colorbar(barwidth = 15, barheight = 0.5, title.position = "top")
  ) +
  # Theme
  theme_void() +
  theme(
    plot.background = element_rect(fill = "#2A4D69", color = NA),
    panel.background = element_rect(fill = "#2A4D69", color = NA),
    plot.title = element_text(family = "Arial", size = 16, color = "#FFFFFF", 
                              face = "bold", hjust = 0.5),
    plot.subtitle = element_text(family = "Arial", size = 12, color = "#E0E0E0", 
                                 hjust = 0.5),
    legend.position = "bottom",
    legend.title = element_text(color = "#FFFFFF", size = 10, face = "bold"),
    legend.text = element_text(color = "#FFFFFF", size = 8),
    legend.background = element_rect(fill = "transparent", color = NA)
  ) +
  labs(
    title = "Regional Focus in Texts about the British Empire",
    subtitle = "Heatmap of narrative intensity across colonial regions"
  ) +
  coord_fixed(1.3)

Digging into the British Empire with R: Maps!, part 4

Leave a Reply Cancel reply

You may also enjoy…

Living in the cloud(s)

Jastrow search

Digging into the British Empire with R: Lexical topics, part 5

Digging into the British Empire with R: Maps!, part 4

Digging into the British Empire with R: Loading a corpus, part 3

Digging into the British Empire with R: Swapping out HTRC for Gutenbergr, part 2

Digging into the British Empire with R and HathiTrust, step 1

Digging into the British Empire with R and HathiTrust

Using Ruby for Scripting