Apache Airflow is a powerful open-source tool for orchestrating workflows, but getting it up and running, especially on Windows, can be a challenge. During a recent deep dive into Airflow, I tried several installation methods and settled on the one that worked best for me. In this post, I’ll share what I found and walk you through setting up Airflow on a Windows machine in a way that actually works.


What is Apache Airflow?

Apache Airflow is an open-source platform for developing, scheduling, and monitoring batch-oriented workflows. It allows users to define workflows as Directed Acyclic Graphs (DAGs) using Python. Airflow then handles the scheduling and execution of tasks, ensuring efficient workflow management.

While Airflow itself isn’t particularly difficult to use, setting it up—especially on a Windows system—can be quite tricky due to its reliance on Linux-based dependencies.


Challenges of Installing Airflow on Windows

Airflow is designed to run on Unix-based systems, which means Windows users need to find workarounds. The most common approaches include:

  • Running Airflow inside Docker containers
  • Setting up Windows Subsystem for Linux (WSL)
  • Using a Virtual Machine (VM) with Ubuntu

Each method has its pros and cons, but in my experience, running Airflow on an Ubuntu VM provided the most stable and reliable setup.


Setting Up Airflow on an Ubuntu Virtual Machine

If you’re looking for a robust way to run Apache Airflow on Windows, using a virtual machine with Ubuntu is a great option. Here’s how you can set it up:

1. Install VirtualBox and Set Up Ubuntu

  • Download and install VirtualBox.
  • Create a new Ubuntu VM and allocate sufficient resources (at least 2 CPU cores and 4 GB of RAM).
  • Ensure the system is updated: sudo apt update && sudo apt upgrade -y

2. Install Required Dependencies

  • First, install the necessary system packages:
      sudo apt-get install software-properties-common
      sudo apt-add-repository universe
  • Install Python and required libraries: sudo apt-get install python3-pip python3-setuptools
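
Before moving on, it can help to confirm that the tools from this step are actually on the PATH. The snippet below is a minimal sketch of such a check; check_tools is a hypothetical helper written for this post, not part of Ubuntu or Airflow.

```shell
# Hypothetical helper: report whether each named command is on the PATH.
# Returns 0 if every tool was found and 1 if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For this step you would call it as check_tools python3 pip3 and re-run the apt-get commands if anything is reported missing.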

3. Set Up a Virtual Environment and Install Airflow

To keep things organized, it’s a good idea to use a virtual environment for Airflow. You can do this using Conda:

  • Install Anaconda:
    • Download the Anaconda installer for Linux (a .sh shell script) from Anaconda.com
    • Install it with: bash Anaconda3-<version>-Linux-x86_64.sh
  • Create and activate a new Airflow virtual environment:
      conda create --name airflow-environment python=3.9
      conda activate airflow-environment
  • Set the AIRFLOW_HOME environment variable (add the same line to ~/.bashrc if you want it to persist across terminal sessions): export AIRFLOW_HOME=~/airflow-environment
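  • Install the system packages Airflow needs to build its dependencies (use the python3- package names; the unversioned python-dev and python-setuptools are Python 2 packages that recent Ubuntu releases no longer ship):
      sudo apt-get update
      sudo apt-get install -y python3-setuptools python3-pip python3-dev libffi-dev libssl-dev zip wget gcc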
  • Install Airflow and additional packages (quote the extra so the shell does not try to interpret the square brackets):
      pip install apache-airflow
      pip install "apache-airflow[kubernetes]"
      pip install statsd cryptography pyspark
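
An unpinned pip install apache-airflow pulls whatever release is newest, which may not match the CLI commands used in the next step. The Airflow project publishes constraint files for reproducible installs, and the sketch below shows how the pinned command is put together. The version numbers are assumptions chosen for illustration (an Airflow 2.x release and the Python 3.9 environment created above), and install_airflow is a hypothetical helper name.

```shell
# Assumed versions for illustration; swap in the release you actually want.
AIRFLOW_VERSION="2.10.5"
# Major.minor version of the Python interpreter in the active environment.
PYTHON_VERSION="3.9"
# Constraint file published by the Airflow project for this version pair.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Hypothetical helper: install the pinned Airflow release against the
# matching constraint file so dependency versions stay compatible.
install_airflow() {
    pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
}

echo "Constraint file: ${CONSTRAINT_URL}"
```

With the conda environment active, running install_airflow performs the pinned install in place of the plain pip install apache-airflow.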

4. Initialize and Start Airflow

  • Initialize the Airflow database: airflow db init (on Airflow 2.7 and later this command is deprecated in favor of airflow db migrate)
  • Create an admin user (only needs to be done once): airflow users create --role Admin --username admin --email admin@example.com --firstname Admin --lastname User --password my_secure_password
  • Start the Airflow web server: airflow webserver --port 8080 (this is the Airflow 2.x command; Airflow 3 replaces it with airflow api-server)
  • In a separate terminal, start the scheduler: airflow scheduler
  • Open a browser and navigate to http://localhost:8080 to access the Airflow UI.
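
If you would rather not keep two terminals open, the start-up steps above can be sketched as a pair of shell functions. This assumes an Airflow 2.x install, where the web server exposes a /health endpoint. start_airflow and wait_for_airflow are hypothetical helper names, and the log file names are arbitrary.

```shell
# Base URL of the web server; the port matches the webserver command above.
AIRFLOW_URL="http://localhost:8080"

# Hypothetical helper: start the web server and scheduler in the background,
# writing their output to log files in the current directory.
start_airflow() {
    airflow webserver --port 8080 > webserver.log 2>&1 &
    airflow scheduler > scheduler.log 2>&1 &
}

# Hypothetical helper: poll the /health endpoint until it responds.
# $1 = maximum number of attempts (default 30)
# $2 = seconds to sleep between attempts (default 2)
wait_for_airflow() {
    max_attempts="${1:-30}"
    delay="${2:-2}"
    attempts=0
    until curl --silent --fail --output /dev/null "${AIRFLOW_URL}/health"; do
        attempts=$((attempts + 1))
        if [ "$attempts" -ge "$max_attempts" ]; then
            echo "Airflow did not respond at ${AIRFLOW_URL}" >&2
            return 1
        fi
        sleep "$delay"
    done
    echo "Airflow is up at ${AIRFLOW_URL}"
}
```

Calling start_airflow followed by wait_for_airflow tells you when the UI is ready to open in the browser. If the second function gives up, webserver.log and scheduler.log are the first places to look.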

Final Thoughts

After experimenting with different setups, I found that running Apache Airflow on an Ubuntu Virtual Machine was the most effective way to get it running on Windows. While Docker and WSL are viable alternatives, a VM provided the most stable and predictable experience for my workflows.

If you’re looking for a hassle-free way to get started with Airflow on Windows, I highly recommend trying the virtual machine approach. Hopefully, this guide saves you some time and frustration! If you have any questions or want to share your own experience with setting up Airflow, feel free to drop a comment!