Version: 2.0

GraphGrid Python Template Project

Introduction

The GraphGrid Python Template Project provides a generalized codebase for creating an Airflow deployment to tackle tasks which benefit from Airflow orchestration.

Environment

The GraphGrid Python Template Project requires the following tools to be installed:

  • helm
  • kubectl
  • minikube
  • docker
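Before deploying, it can help to confirm that these tools are on the PATH. The following is a minimal sketch using only the Python standard library. It is not part of the template project, and the names `REQUIRED_TOOLS` and `missing_tools` are hypothetical.

```python
import shutil

# Tools listed in the Environment section.
REQUIRED_TOOLS = ("helm", "kubectl", "minikube", "docker")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing required tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that each executable exists. It does not check versions or whether Docker and minikube are running.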

Features Overview

  • template-project command line interface
  • local-airflow command line interface
  • Basic expandable Airflow DAG
  • Testing suite
  • Linting

Usage

This project is intended to be extended for a specific task (e.g. machine learning or ingest). Until then, the existing CLIs are the main way of using the codebase. The local-airflow CLI includes commands for deploying Airflow locally and managing the images it is built upon. The template-project CLI is a simple skeleton to be built upon, and currently includes the print, lint, and test commands.

The DAG and other Airflow-specific functionality are tied directly to the Airflow Dockerfile and Docker image. Therefore, the Airflow image needs to be rebuilt whenever the DAG is expanded with new tasks or new functionality. The same applies to anything related to backend Airflow functionality, e.g. where DAGs are stored or how logging works.

All other functionality is tied to the template-project image. For example, if a new invokable command is added to the template-project CLI, or the behavior of an existing command changes (e.g. the print/test/lint commands), then the template-project image needs to be rebuilt to reflect that.
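The rebuild rules above can be summarized as a small lookup. This is an illustrative sketch only: the change categories are hypothetical labels, and it assumes that the update_dag command of the local-airflow CLI is what rebuilds the Airflow image, and that update_template_image rebuilds the template-project image.

```python
# Which local-airflow command to run after a given kind of change.
# The keys are hypothetical labels chosen for this sketch.
REBUILD_COMMAND = {
    "dag": "update_dag",                # new DAG tasks or functionality
    "airflow_backend": "update_dag",    # DAG storage, logging, etc.
    "cli": "update_template_image",     # new or changed template-project commands
}


def rebuild_commands(changes):
    """Return the local-airflow commands needed for the given changes.

    Commands are de-duplicated and kept in first-seen order.
    An unknown change category raises KeyError.
    """
    commands = []
    for change in changes:
        command = REBUILD_COMMAND[change]
        if command not in commands:
            commands.append(command)
    return commands
```

For example, `rebuild_commands(["dag", "cli"])` yields both commands, since a change to the DAG and a change to the CLI affect different images.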

Local Airflow CLI

The local-airflow CLI has multiple commands to facilitate local development and testing. The expected usage takes the following form: local-airflow <command> <command_args>.

This section covers all the available commands for the local-airflow CLI.

deploy

Locally deploys the Airflow cluster with all dependencies

Arguments: None

dashboard

Opens the minikube dashboard to debug the deployment

Arguments: None

port_forward

Forwards ports to allow local access to Airflow from outside the cluster

Arguments: None

teardown

Tears down the local minikube deployment

Arguments: None

update_dag

Rebuilds the Airflow DAG image and redeploys with the latest image

Arguments: None

update_template_image

Rebuilds the template project image

Arguments: None

build_images

Builds the template project image and the Airflow DAG image

Arguments: None

Template project CLI

The template-project CLI is set up to be expanded upon, as it is how the DAG spins up tasks within the Kubernetes cluster. It is also flexible, in that it can be invoked locally, outside of Kubernetes, to perform tasks.

The expected usage takes the following form: template-project <task> <task_args>.

This section covers all the available commands for the template-project CLI.

print

Prints a given message to stdout

Arguments:

  • message (Required): The message to print

test

Runs the entire test suite on the codebase and reports basic coverage

Arguments: None

lint

Lints the entire codebase and reports a score

Arguments: None
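A CLI of the form template-project <task> <task_args> could be laid out as follows. This is a minimal sketch using argparse from the standard library, not the project's actual implementation. The function names are hypothetical, and the test and lint tasks are left as placeholders because the tools they wrap are not stated here.

```python
import argparse


def build_parser():
    """Build a parser of the form: template-project <task> <task_args>."""
    parser = argparse.ArgumentParser(prog="template-project")
    tasks = parser.add_subparsers(dest="task", required=True)

    # print takes one required positional argument: the message.
    print_task = tasks.add_parser("print", help="Print a message to stdout")
    print_task.add_argument("message", help="The message to print")

    # test and lint take no arguments.
    tasks.add_parser("test", help="Run the test suite")
    tasks.add_parser("lint", help="Lint the codebase")
    return parser


def main(argv=None):
    """Dispatch to the selected task and return a process exit code."""
    args = build_parser().parse_args(argv)
    if args.task == "print":
        print(args.message)
        return 0
    # Placeholder: the real test and lint behavior is not reproduced here.
    print(f"'{args.task}' is not implemented in this sketch")
    return 1
```

Calling `main(["print", "hello"])` writes the message to stdout. A new task is added by registering another subparser and a matching branch in `main`.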

Development and Extension

The intended way to develop with the template-project is to extend the template-project CLI. For example, if the template-project is used for a machine learning project, this would mean adding commands to the CLI that build, train, and evaluate machine learning models. Because these steps follow a basic directed workflow (building before training, and training before evaluating), they define the DAG for the machine learning task. Each invokable command is therefore added as an Airflow task within the DAG, and the local-airflow CLI can then be used to deploy and test the workflow and codebase.
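The ordering in the machine learning example can be sketched without Airflow, using graphlib from the Python standard library (Python 3.9+). The task names build, train, and evaluate are hypothetical CLI commands taken from the example above. In the actual project each would be an Airflow task in the DAG rather than an entry in a dictionary.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical tasks, each mapped to the tasks that must finish first.
DEPENDENCIES = {
    "build": set(),
    "train": {"build"},
    "evaluate": {"train"},
}


def task_command(task):
    """Return the command line a DAG task would invoke for this task."""
    return ["template-project", task]


def execution_order(dependencies=DEPENDENCIES):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for task in execution_order():
        print(" ".join(task_command(task)))
```

Adding a new step to the workflow means adding a command to the CLI and one entry to the dependency mapping, which mirrors adding a task to the DAG.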