Living in the cloud(s)

It’s been a while since my last few posts here. A new job and other issues have taken up a lot of my time lately, but I do want to dive into another blog series documenting a project. This time I’m going to shift gears from data analysis and presentation to building a cloud lab. More and more of my work involves planning and implementing infrastructure, and that’s probably true of many of us on the technical end of cultural organizations, whether in libraries, museums, archives, or digital humanities projects.

There’s probably not much reason to spell out the virtues of the cloud these days, but, just in case, there are quite a few benefits for libraries, archives, and museums. While we’re all familiar with the arguments that the cloud provides scalability, redundancy, and stability, the hybrid cloud lets our institutions bridge on-premises systems with the cloud, keeping control over sensitive data (user data or otherwise) while retaining the benefits of cloud computing.

The major pressures pushing research and digital cultural heritage toward cloud computing are, among others:

  • Massive Data Growth: Digital archives, from digitized manuscripts to born-digital datasets, are growing exponentially.
  • Collaboration Needs: Scholars and archivists need secure, real-time access to resources across institutions, often globally.
  • Cost and Complexity: Fully cloud-based solutions can be expensive and rigid, while on-premises systems lack scalability.
  • Data Sensitivity: Cultural heritage data often requires strict compliance with privacy and preservation standards (e.g., GDPR, DPC).

But the hybrid cloud specifically has its benefits, offering:

  • Flexibility: Store sensitive archival data on-premises while leveraging the cloud’s compute power for data processing or public-facing apps.
  • Cost Efficiency: Scale cloud resources up or down based on project needs, avoiding the expense of overprovisioned local hardware.
  • Collaboration: Enable global access to digital collections via secure, cloud-based tools, fostering interdisciplinary research.
  • Resilience: Protect against data loss with backup and disaster recovery, critical for preserving irreplaceable cultural artifacts.

For example, a digital humanities project analyzing the personal correspondence of Anthony Trollope could process large datasets in the cloud while keeping original scans on local servers for compliance.
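As a rough illustration of that split, here is a minimal Python sketch using only the standard library. It assumes a hypothetical rule that original scans (TIFF files) stay on-premises while everything else, such as transcriptions, may go to the cloud. The names LOCAL_ONLY_SUFFIXES and plan_hybrid_split are mine, invented for this example, and the checksum is there so files can be verified later wherever they end up.

```python
import hashlib
from pathlib import Path

# Hypothetical policy: files with these suffixes are original scans
# and must stay on local servers.
LOCAL_ONLY_SUFFIXES = {".tif", ".tiff"}


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_hybrid_split(root: Path) -> dict:
    """Sort files under root into 'on_premises' and 'cloud' lists.

    Each entry records the path relative to root and a checksum.
    Nothing is uploaded or moved; this only produces a plan.
    """
    plan = {"on_premises": [], "cloud": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        is_original = path.suffix.lower() in LOCAL_ONLY_SUFFIXES
        tier = "on_premises" if is_original else "cloud"
        plan[tier].append(
            {"file": str(path.relative_to(root)), "sha256": sha256_of(path)}
        )
    return plan
```

Calling plan_hybrid_split(Path("letters")) on a folder of scans and transcriptions would return a dictionary you could review before any upload happens. A real project would likely base the rule on metadata or rights statements rather than file extensions.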

Tools

To demonstrate how we might set up a cloud lab, I’m going to use Python and Azure, but you could use Bash and AWS or PowerShell and the Google Cloud Platform. It doesn’t really matter; use whatever tools you’re most comfortable with. I’m using Python for its portability and Azure because I’m already familiar with AWS and am pursuing an Azure administrator cert.

One quick plug for Python: Python’s versatility and huge number of libraries make it ideal for automating workflows, managing data pipelines, and scripting hybrid cloud operations. Libraries like pandas for data analysis or the Azure SDK for Python for cloud management streamline tasks for systems administrators and researchers alike.

And a quick plug for Azure (as if Azure needs me to plug it): Azure has specific tools for this kind of work, like Azure Arc for hybrid management and Blob Storage, which is flexible and ideal for the kinds of unstructured data we often see in our field. AWS definitely has similar tools if that’s your preference (I’m less familiar with GCP).

Finally, both Python and Azure are well documented, with plenty of traditional documentation plus a huge user base producing YouTube videos, online courses, and the like.

Hopefully this won’t just be a technical exercise, but a method to future-proof an archive or a digital humanities project. It offers the flexibility to scale, the security to protect cultural heritage, and the automation to save time and easily repeat tasks when necessary.

In the next post, we’ll work on setting up the hybrid environment, starting with the tools and configurations.
