azure

  • Living in the Clouds, 2/x

    In the previous post, I laid out the case for creating a hybrid cloud lab along with various use cases. I’ll use Python and Azure for this project. I’m working on macOS, but all of these steps can be done on Windows or Linux as well.

    The first practical step in setting up this lab is installing Python and a few Python libraries we’ll use to connect to Azure. (I should also point out that it’s a good idea to keep this project in Git for version control.)

    On a Mac with Homebrew installed, open the terminal and run:

    brew install python

    This should install Python as well as pip, Python’s package manager. On Windows, you can download the installer from python.org or install Python from the Microsoft Store. Python comes preinstalled on almost every Linux distro, though on Debian and Ubuntu you may also need the python3-venv package for the next step.

    Now that we have Python, we’ll create a virtual environment for the project. This is a step it took me too long to learn: it keeps the project’s packages isolated, so you avoid version conflicts and dependency nightmares with other projects or the system Python. Open your terminal on Mac or Linux (if you’re using Windows, I strongly recommend following these directions in WSL) and type:

    # Create a project directory (if you haven't already)
    mkdir hybrid-cloud-lab
    cd hybrid-cloud-lab
    
    # Create a virtual environment called 'venv'
    python3 -m venv venv
    
    # Activate the virtual environment
    source venv/bin/activate

    Once it’s activated, your terminal prompt will likely change to show (venv), indicating you’re inside the virtual environment. Any Python package you install from now on will only apply to this specific project.
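    If you want to double-check from Python itself, here’s a minimal sketch that relies on a standard-library detail: inside a venv, sys.prefix points at the environment while sys.base_prefix still points at the base installation. The function name in_virtualenv is just an illustrative helper of mine, not part of any library.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    In a venv, sys.prefix points at the venv directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print("Inside a virtual environment:", in_virtualenv())
```

    Saved as, say, check_venv.py and run with the environment activated, it should print True; after running deactivate it should print False.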

    To confirm Python is working, we’ll start a simple web server that serves a single web page. The point here is to make sure we can host services on localhost. In the terminal, type:

    echo "<h1>Howdy from localhost!</h1>" > index.html
    
    # Serve the current directory (including index.html) on port 8000
    python -m http.server 8000

    If everything is working correctly, you should be able to open your web browser and navigate to http://localhost:8000. If you see “Howdy from localhost!,” you’re all set. You can stop the server by clicking into the terminal and pressing Ctrl+C.
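    The same check can be scripted, which will be handy once we start automating the lab. This sketch uses only the standard library: it serves a directory on a free port chosen by the OS (port 0) in a background thread, fetches the page with urllib, and then shuts the server down. The names serve_and_fetch and QuietHandler are hypothetical helpers for illustration.

```python
import functools
import threading
import urllib.request
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class QuietHandler(SimpleHTTPRequestHandler):
    """Static file handler that doesn't log every request to the terminal."""

    def log_message(self, format, *args):
        pass


def serve_and_fetch(directory: str = ".", path: str = "/index.html") -> str:
    """Serve `directory` on localhost, fetch `path` once, and return the body."""
    handler = functools.partial(QuietHandler, directory=directory)
    # Port 0 lets the OS pick any free port, so this won't clash with port 8000.
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An opener with an empty ProxyHandler ignores any proxy settings,
    # so the request stays on this machine.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}{path}", timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        # Stop the serve_forever loop and release the socket.
        server.shutdown()
        server.server_close()
```

    Run from the hybrid-cloud-lab directory, print(serve_and_fetch()) should show the same <h1>Howdy from localhost!</h1> line the browser rendered.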

    Connecting to Azure

    Now that the local environment is ready, we can install Python’s Azure libraries. The Azure SDK for Python provides a comprehensive set of libraries that let us interact with virtually any Azure service.

    Making sure you’re still in the virtual environment, install the Azure packages:

    pip install azure-identity azure-mgmt-resource azure-mgmt-storage azure-storage-blob

    Each of these packages handles a specific part of working with Azure:

    • azure-identity: handles authentication to Azure
    • azure-mgmt-resource: manages Azure resource groups (a resource group is a logical container for related Azure resources)
    • azure-mgmt-storage: creates and manages Azure Storage accounts
    • azure-storage-blob: interacts with Blob Storage (like uploading files)
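    To confirm the install landed in the virtual environment, a small standard-library check works. This is a minimal sketch using importlib.metadata; installed_versions is a hypothetical helper name, and the package list simply mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# The distribution names installed with pip in the step above.
AZURE_PACKAGES = [
    "azure-identity",
    "azure-mgmt-resource",
    "azure-mgmt-storage",
    "azure-storage-blob",
]


def installed_versions(packages: list[str]) -> dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    found: dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(AZURE_PACKAGES).items():
    print(f"{name:22} {ver or 'NOT INSTALLED'}")
```

    If any line says NOT INSTALLED, the usual culprit is running pip outside the activated environment.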

    Now that we have these packages, we need the Azure Command-Line Interface (Azure CLI). This may seem confusing since we’re using Python for the scripting, so why do we need the CLI? Mainly because it makes authenticating our Python scripts easier: once you sign in with the CLI, azure-identity can reuse that login (through AzureCliCredential or DefaultAzureCredential), so the scripts don’t need to store credentials. It’s also handy for quick admin tasks. You can download the CLI from Microsoft for any OS, but I’m using Homebrew on macOS here because it’s easy:

    brew install azure-cli

    Now that we’ve installed the Azure CLI, we can log in to our account with a quick command:

    az login

    This command opens a web browser and asks for your Azure credentials.
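    This is where the CLI pays off for Python. Here’s a minimal sketch, assuming you’ve run az login and installed the packages above: it reads the active subscription from az account show, then uses AzureCliCredential (from azure-identity) to reuse the CLI’s login and ResourceManagementClient to list the subscription’s resource groups. The helper names current_subscription_id, read_cli_account, and list_resource_groups are my own, for illustration.

```python
import json
import subprocess


def current_subscription_id(az_account_json: str) -> str:
    """Extract the subscription ID from `az account show --output json`."""
    return json.loads(az_account_json)["id"]


def read_cli_account() -> str:
    """Ask the Azure CLI which subscription `az login` left active."""
    result = subprocess.run(
        ["az", "account", "show", "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def list_resource_groups(subscription_id: str) -> list[str]:
    """List resource group names using the credentials from `az login`."""
    # Imported here so the helpers above still work before the SDK is installed.
    from azure.identity import AzureCliCredential
    from azure.mgmt.resource import ResourceManagementClient

    client = ResourceManagementClient(AzureCliCredential(), subscription_id)
    return [group.name for group in client.resource_groups.list()]
```

    After logging in, list_resource_groups(current_subscription_id(read_cli_account())) should return your resource group names; on a brand-new subscription that may well be an empty list.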

    The last part of the setup process is to register an Azure resource provider with our subscription. A resource provider is the Azure service that supplies a particular type of resource (Microsoft.Storage, for example, supplies storage accounts). Many common providers are registered by default, but it’s good practice to explicitly register the ones a project depends on. In our case, that’s Microsoft.Storage.

    az provider register --namespace Microsoft.Storage

    The command returns right away, but registration itself can take several minutes, and the terminal won’t notify you when it’s complete (unless you add the --wait flag). Give it a few minutes, then check the registration state with the following command:

    az provider show --namespace Microsoft.Storage --query "registrationState"

    Once that command returns “Registered,” you’re all set.
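    Rather than re-running that command by hand, you could poll it from Python. This is a minimal sketch, assuming the Azure CLI is on your PATH and you’re logged in: it shells out to the same az provider show command (plus --output tsv, so the CLI prints the bare state) until the state reads Registered or a timeout passes. The 15-second interval and 10-minute timeout are arbitrary defaults of mine, not Azure recommendations, and wait_until_registered is a hypothetical helper name.

```python
import subprocess
import time
from typing import Callable

# Same query as above; --output tsv prints Registered rather than "Registered".
SHOW_STATE = [
    "az", "provider", "show",
    "--namespace", "Microsoft.Storage",
    "--query", "registrationState",
    "--output", "tsv",
]


def run_az(args: list[str]) -> str:
    """Run an Azure CLI command and return its trimmed standard output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


def wait_until_registered(
    run: Callable[[list[str]], str] = run_az,
    interval: float = 15.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll Microsoft.Storage's registrationState until it reads Registered.

    `run` and `sleep` are parameters so the polling logic can be tried
    without calling Azure; by default they use the real CLI and clock.
    """
    waited = 0.0
    while True:
        # Strip JSON quotes too, in case the CLI returns "Registered".
        state = run(SHOW_STATE).strip().strip('"')
        if state == "Registered":
            return state
        if waited >= timeout:
            raise TimeoutError(f"Microsoft.Storage still {state!r} after {waited:.0f}s")
        print(f"Current state: {state}; checking again in {interval:.0f}s")
        sleep(interval)
        waited += interval
```

    Calling wait_until_registered() right after az provider register should print a few "Registering" updates and return once the provider is ready.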

    That covers setting up the local environment and connecting your local machine to Azure through Python and the Azure CLI.

    In my next post, I’ll walk through setting up a storage account.

  • Living in the cloud(s)

    It’s been a while since my last few posts here. A new job and other commitments have taken up a lot of my time lately, but I do want to dive into another blog series documenting a project. This time I’m shifting gears from data analysis and presentation to building a cloud lab. More and more of my work involves planning and implementing infrastructure, and that’s probably true of many of us on the technical end of cultural organizations, whether libraries, museums, archives, or digital humanities projects.

    There’s probably not much reason to spell out the virtues of the cloud these days, but, just in case, there are quite a few benefits for libraries, archives, and museums. While we’re all familiar with the arguments that the cloud provides scalability, redundancy, and stability, the hybrid cloud lets our institutions bridge on-premises systems and the cloud, keeping control over sensitive data (user data or otherwise) while retaining the benefits of cloud computing.

    Research and digital cultural heritage face several pressures that make cloud computing attractive, among them:

    • Massive Data Growth: Digital archives, from digitized manuscripts to born-digital datasets, are growing exponentially.
    • Collaboration Needs: Scholars and archivists need secure, real-time access to resources across institutions, often globally.
    • Cost and Complexity: Fully cloud-based solutions can be expensive and rigid, while on-premises systems lack scalability.
    • Data Sensitivity: Cultural heritage data often requires strict compliance with privacy and preservation standards (e.g., GDPR, DPC).

    The hybrid cloud specifically offers:

    • Flexibility: Store sensitive archival data on-premises while leveraging the cloud’s compute power for data processing or public-facing apps.
    • Cost Efficiency: Scale cloud resources up or down based on project needs, avoiding the expense of overprovisioned local hardware.
    • Collaboration: Enable global access to digital collections via secure, cloud-based tools, fostering interdisciplinary research.
    • Resilience: Protect against data loss with backup and disaster recovery, critical for preserving irreplaceable cultural artifacts.

    For example, a digital humanities project analyzing the personal correspondence of Anthony Trollope could process large datasets in the cloud while keeping original scans on local servers for compliance.

    Tools

    To demonstrate how we might set up a cloud lab, I’m going to use Python and Azure, but you could use Bash and AWS or PowerShell and the Google Cloud Platform. It doesn’t really matter; use whatever tools you’re most comfortable with. I’m using Python for its portability and Azure because I’m already familiar with AWS and am pursuing an Azure administrator certification.

    One quick plug for Python: its versatility and huge ecosystem of libraries make it ideal for automating workflows, managing data pipelines, and scripting hybrid cloud operations. Libraries like pandas for data analysis or the Azure SDK for Python for cloud management streamline tasks for systems administrators and researchers alike.

    And a quick plug for Azure (as if Azure needs me to plug it): Azure has specific tools like Azure Arc for hybrid management and Blob Storage, which is flexible and well suited to the kinds of unstructured data we often see in our field. AWS has similar tools if that’s your preference (I’m less familiar with GCP).

    Finally, both Python and Azure have tons of documentation: official docs, plus a huge user community producing YouTube videos, online courses, and the like.

    Hopefully this won’t just be a technical exercise, but a way to future-proof an archive or a digital humanities project. A hybrid setup offers the flexibility to scale, the security to protect cultural heritage, and the automation to save time and easily repeat tasks when necessary.

    In the next post, we’ll work on setting up the hybrid environment, starting with the tools and configurations.