Customized ZenML Pipeline

Airflow is designed for workflow management rather than data flow, so it offers only limited support for passing data between tasks. ZenML, introduced in 2020, builds on the Airflow experience with an easier setup and more flexible data transfer between steps.

ZenML Setup

Open your terminal, and follow these steps for the first-time setup:

  1. pip install zenml

  2. pip install google-cloud-bigquery-storage

  3. sudo python -m pip install google-cloud

  4. Run initialization commands:

  • pip install zenml tensorflow

  • git init

  • zenml init
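After the installs above, a quick sanity check in Python can confirm which packages are actually importable in your environment (this helper is just an illustration, not part of ZenML):

```python
import importlib.util

def is_installed(package: str) -> bool:
    """Return True if `package` can be imported in the current environment."""
    return importlib.util.find_spec(package) is not None

# After running the pip installs above, each of these should report "installed".
for pkg in ["zenml", "tensorflow"]:
    print(f"{pkg}: {'installed' if is_installed(pkg) else 'missing'}")
```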

Simple ZenML Airflow Pipeline

Lady H. built this pipeline with the same two tasks used in the simple Airflow pipeline: a data splitting task followed by a model training task.

๐ŸŒป Check Simple ZenML Airflow DAG >>

Compared with the Airflow pipeline, there are 3 major differences in ZenML:

  1. User-configurable parameters can be defined in a class that is accessible to all the step functions. In this example, the parameters in pipeline_config can be read by both the split_data step and the train_evaltor step.

  2. A pandas dataframe can be passed between tasks: the output of the split_data step serves directly as the input of the train_evaltor step.

  3. In DAG = pipeline.run(), the "DAG =" assignment is needed to make sure your DAG appears in the Airflow UI at http://0.0.0.0:8080
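The first two differences can be sketched in plain Python. The sketch below is a hypothetical stand-in for the ZenML step decorators (the actual @step/@pipeline API varies by ZenML version): one shared config object is read by both steps, and the dataframes produced by split_data flow directly into train_evaltor.

```python
import pandas as pd

# Hypothetical stand-in for a ZenML step config class:
# one object whose parameters are visible to every step.
class PipelineConfig:
    test_ratio: float = 0.25   # fraction of rows held out for evaluation
    label_col: str = "label"

config = PipelineConfig()

def split_data(df: pd.DataFrame, config: PipelineConfig):
    """Step 1: split the dataframe; both outputs are passed on as dataframes."""
    n_test = int(len(df) * config.test_ratio)
    return df.iloc[:-n_test], df.iloc[-n_test:]

def train_evaltor(train: pd.DataFrame, test: pd.DataFrame, config: PipelineConfig):
    """Step 2: consumes the dataframes produced by split_data directly."""
    # Placeholder "model": predict the majority label seen during training.
    majority = train[config.label_col].mode()[0]
    accuracy = (test[config.label_col] == majority).mean()
    return accuracy

df = pd.DataFrame({"x": range(8), "label": [0, 0, 1, 0, 0, 1, 0, 0]})
df_train, df_test = split_data(df, config)        # dataframes out ...
print(train_evaltor(df_train, df_test, config))   # ... dataframes in
```

In real ZenML code the two functions would be decorated steps and the wiring would happen inside the pipeline definition, but the data-flow pattern is the same.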

ZenML also allows you to inspect each step of the pipeline. For instance, the code below inspects the train_evaltor step:

๐ŸŒป Check ZenML pipeline inspection code >>

The user interface looks the same as in the Airflow pipeline:
