GraphGrid Python Template Project
Introduction
The GraphGrid Python Template Project provides a generalized codebase for creating an Airflow deployment to tackle tasks which benefit from Airflow orchestration.
Environment
The GraphGrid Python Template Project requires the following tools to be installed:
helm
kubectl
minikube
docker
Features Overview
- template-project command line interface
- local-airflow command line interface
- Basic expandable Airflow DAG
- Testing suite
- Linting
Usage
This project is intended to be extended for a specific task (e.g. machine learning, ingest). Until then, the existing CLIs are the main way of using this codebase. The local-airflow CLI includes commands for deploying Airflow locally and managing the images it is built upon. The template-project CLI is a simple skeleton to be built upon, and includes the print, lint, and test commands.
The DAG and other Airflow-specific functionality are tied directly to the Airflow Dockerfile and Docker image. Therefore, the Airflow image needs to be rebuilt when expanding the DAG to include new tasks or new functionality. This also applies to anything related to backend Airflow functionality, e.g. where DAGs are stored or how logging functions.
All other functionality is tied to the template-project image. For example, if a new invokable command is added to the template-project CLI, or the behavior of an existing command changes (e.g. the print, test, or lint commands), then the template-project image needs to be rebuilt to reflect that.
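The rule above can be summarized as a small lookup from the kind of change to the local-airflow command that rebuilds the affected image. This is only a sketch: the change-kind labels (dag, airflow_backend, cli_command) and the helper name are hypothetical and are not part of the project.

```python
# Hypothetical labels for kinds of change, mapped to the local-airflow
# command that rebuilds the image the change is tied to.
REBUILD_COMMAND = {
    "dag": "update_dag",                     # new or changed DAG tasks
    "airflow_backend": "update_dag",         # DAG storage, logging, etc.
    "cli_command": "update_template_image",  # new or changed CLI commands
}


def rebuild_command(change_kind: str) -> str:
    """Return the local-airflow invocation to run after a given kind of change."""
    try:
        return f"local-airflow {REBUILD_COMMAND[change_kind]}"
    except KeyError:
        raise ValueError(f"unknown change kind: {change_kind!r}") from None
```

For example, rebuild_command("cli_command") returns the string local-airflow update_template_image.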
Local Airflow CLI
The local Airflow CLI has multiple commands to facilitate local development and testing. The expected usage for the CLI takes the following form: local-airflow <command> <command_args>
This section covers all the available commands for the local-airflow CLI.
deploy
Locally deploy the Airflow cluster with all dependencies
Arguments: None
dashboard
Open the minikube dashboard to debug the deployment
Arguments: None
port_forward
Forwards ports to allow local access to Airflow from outside the cluster
Arguments: None
teardown
Tears down the local minikube deployment
Arguments: None
update_dag
Rebuilds the Airflow DAG image and redeploys with the latest image
Arguments: None
update_template_image
Rebuilds the template project image
Arguments: None
build_images
Builds the template project image and the Airflow DAG image
Arguments: None
Template project CLI
The template project CLI is set up to be expanded upon, as it is how the DAG spins up tasks within the Kubernetes cluster. However, it is flexible in that it can also be invoked locally, outside of Kubernetes, to perform tasks.
The expected usage for the CLI takes the following form: template-project <task> <task_args>
This section covers all the available commands for the template-project CLI.
print
Prints a given message to stdout
Arguments:
- message (Required): The message to print
test
Runs the entire test suite on the codebase and reports basic coverage
Arguments: None
lint
Lints the entire codebase and reports a score
Arguments: None
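To make the template-project <task> <task_args> form concrete, here is a minimal sketch of how such a CLI could be laid out with Python's standard argparse module. It is an illustration only, not the project's actual implementation: the function names are hypothetical, and the test and lint branches are placeholders rather than real runners.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser shaped like: template-project <task> <task_args>."""
    parser = argparse.ArgumentParser(prog="template-project")
    tasks = parser.add_subparsers(dest="task", required=True)

    # print takes one required positional argument, the message.
    print_task = tasks.add_parser("print", help="Print a message to stdout")
    print_task.add_argument("message", help="The message to print")

    # test and lint take no arguments.
    tasks.add_parser("test", help="Run the test suite")
    tasks.add_parser("lint", help="Lint the codebase")
    return parser


def main(argv=None) -> int:
    """Parse argv (or sys.argv when None) and dispatch to the chosen task."""
    args = build_parser().parse_args(argv)
    if args.task == "print":
        print(args.message)
    else:
        # Placeholder: the real test and lint runners are not shown here.
        print(f"would run task: {args.task}")
    return 0
```

Calling main(["print", "hello"]) writes hello to stdout. A new task would be added as another add_parser call plus a branch in main.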
Development and Extension
The intended way to develop with the template-project is to extend the template-project CLI. For example, if the template-project is used for a machine learning project, this would mean adding commands to the CLI that build, train, and evaluate machine learning models. Because these steps follow a basic directed workflow (building before training, and training before evaluating), they define the DAG for the machine learning task. Each invokable command is therefore added as an Airflow task within the DAG. The local-airflow CLI can then be used to deploy and test this workflow and codebase.
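The build, train, evaluate ordering described above can be sketched with Python's standard graphlib module (Python 3.9+). This is a plain-Python illustration of the dependency structure, not Airflow code; the build, train, and evaluate commands are the hypothetical CLI additions from the example, and in an actual Airflow DAG the same ordering would be declared between task objects.

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks that must finish first.
# The task names are the hypothetical CLI commands from the example above.
dependencies = {
    "build": set(),
    "train": {"build"},
    "evaluate": {"train"},
}


def execution_order(deps):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)

# Each task corresponds to one template-project CLI invocation.
commands = [["template-project", task] for task in order]
print(order)
```

Because the dependencies form a single chain, the only valid order is build, then train, then evaluate.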