
Royal Blue

Table of Contents

  1. Introduction
  2. Project Description
  3. Requirements
  4. Tech Stack
  5. Installation Instructions
  6. Running Python Scripts Locally
  7. Available Makefile Commands

Introduction

Royal Blue is an Extract, Transform, Load (ETL) data pipeline built on top of AWS using Python, Terraform and pandas.

It was built as a graduation project for the Northcoders Data Engineering in Python bootcamp that ran from March to June 2025.


Project Description

The purpose of this repository is to build an entire ETL (Extract, Transform, Load) data pipeline in AWS (Amazon Web Services).

It implements a robust and scalable solution that extracts data from an OLTP (Online Transaction Processing) PostgreSQL database, transforms it, and loads it into an OLAP (Online Analytical Processing) database.

Follow these links for the origin and destination database Entity Relationship Diagrams.

The data is transformed from day-to-day transactional business records into an analysis-ready format suitable for a range of business intelligence purposes.
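
As a rough illustration of that reshaping (the table and column names below are made up, not the project's actual schema), pandas can split a flat transactional table into a dimension table and a fact table:

    import pandas as pd

    # Hypothetical transactional rows, as they might arrive from the OLTP source.
    sales = pd.DataFrame(
        {
            "sale_id": [1, 2, 3],
            "staff_name": ["Ada", "Grace", "Ada"],
            "amount": [120.0, 75.5, 210.0],
        }
    )

    # Dimension table: one row per staff member, keyed by a surrogate id.
    dim_staff = (
        sales[["staff_name"]]
        .drop_duplicates()
        .reset_index(drop=True)
        .rename_axis("staff_key")
        .reset_index()
    )

    # Fact table: measures plus a foreign key into the dimension.
    fact_sales = sales.merge(dim_staff, on="staff_name")[
        ["sale_id", "staff_key", "amount"]
    ]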

It uses Python as the main programming language and Terraform for infrastructure as code. Bash and SQL scripts support the build process and integration testing, and a full-featured Makefile is included for convenience.

This project follows the specification for the Northcoders Data Engineering graduation project and was developed as a group effort by @theorib, @Brxzee, @charleybolton, @JanetteSamuels, @sxnfer and @josephtheodore.


Requirements

  • This project uses uv to manage Python environments and dependencies, run scripts, and drive the build process. Make sure it is installed by following the official guide.

  • You will also need to install the latest version of Python (3.13.3 at the time of this writing).

  • For local development, you will need the AWS CLI installed and configured with your AWS credentials.

  • To run some of the integration tests, you will need PostgreSQL (v14 or higher) installed locally.


Tech Stack

Programming Language & Runtime

  • Python 3.13.3+

Core Python Dependencies

  • Psycopg 3 - PostgreSQL database adapter
  • Boto3 - for AWS SDK integration
  • pandas - for data manipulation
  • PyArrow - for Parquet file handling
  • Pydantic - for data validation and JSON serialisation
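
A minimal sketch of how these pieces typically fit together in a pipeline like this one; the connection string, table, and bucket names are placeholders, not the project's actual code:

    import io

    import boto3
    import pandas as pd
    import psycopg

    # Extract: pull rows from the OLTP source with Psycopg 3
    # (the DSN and table name here are placeholders).
    with psycopg.connect("postgresql://user:pass@host:5432/totesys") as conn:
        cursor = conn.execute("SELECT * FROM sales_order;")
        df = pd.DataFrame(
            cursor.fetchall(), columns=[col.name for col in cursor.description]
        )

    # Transform with pandas as needed, then serialise to Parquet
    # (pandas delegates Parquet I/O to PyArrow).
    buffer = io.BytesIO()
    df.to_parquet(buffer, engine="pyarrow")

    # Load: push the Parquet file to S3 with Boto3 (bucket name is a placeholder).
    boto3.client("s3").put_object(
        Bucket="ingest-zone-bucket", Key="sales_order.parquet", Body=buffer.getvalue()
    )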

Development Dependencies

  • pytest - and related plugins for testing and coverage
  • pytest-postgresql - used for running integration tests against local PostgreSQL databases
  • Ruff - for linting and formatting
  • Moto - for mocking AWS services during tests
  • Bandit - for vulnerability and security scanning of source code
  • IPython Kernel - for VS Code Jupyter notebook support, used for testing and experimenting
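
To give a flavour of the Moto workflow, here is a minimal pytest sketch using Moto 5's mock_aws decorator (earlier Moto versions used per-service decorators such as mock_s3); the bucket and key names are made up:

    import boto3
    from moto import mock_aws


    @mock_aws
    def test_upload_lands_in_ingest_bucket():
        # Inside the decorator, boto3 talks to Moto's in-memory AWS stand-in,
        # so no real buckets are touched.
        s3 = boto3.client("s3", region_name="us-east-1")
        s3.create_bucket(Bucket="ingest-zone-bucket")

        s3.put_object(Bucket="ingest-zone-bucket", Key="data.parquet", Body=b"...")

        contents = s3.list_objects_v2(Bucket="ingest-zone-bucket")["Contents"]
        assert [obj["Key"] for obj in contents] == ["data.parquet"]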

Databases

  • PostgreSQL (used locally for integration tests, and as the database on both ends of the data pipeline).
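
As a sketch of what a local integration test can look like with pytest-postgresql's default postgresql fixture (which yields a Psycopg connection to a temporary database; the table here is hypothetical):

    def test_staff_table_roundtrip(postgresql):
        # `postgresql` is the connection fixture provided by pytest-postgresql;
        # it points at a throwaway database that is discarded after the test.
        cur = postgresql.cursor()
        cur.execute("CREATE TABLE staff (staff_id serial PRIMARY KEY, name text);")
        cur.execute("INSERT INTO staff (name) VALUES ('Ada');")
        cur.execute("SELECT name FROM staff;")
        assert cur.fetchall() == [("Ada",)]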

AWS

  • Lambda, S3, Step Functions, IAM, CloudWatch, SNS email alerts, and more, all accessed using Boto3, deployed with Terraform, and tested with Moto.
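
For a flavour of the Boto3 side, starting a Step Functions execution from Python looks roughly like this (the state machine ARN below is a placeholder):

    import json

    import boto3

    # Kick off an execution of the deployed state machine via Boto3.
    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn="arn:aws:states:eu-west-2:123456789012:stateMachine:etl-pipeline",
        input=json.dumps({"trigger": "manual"}),
    )
    print(response["executionArn"])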

Utilities & Tooling

  • uv for managing Python environments, dependencies, and running scripts
  • Makefile for convenience, centralising and simplifying the project’s most common commands (testing, linting, formatting, deployment, etc.).

Local Testing Scripts

  • Bash scripts to run SQL test files against the local PostgreSQL database and capture output for validation.

Installation Instructions

  1. Clone or fork this repository to your local machine:

    git clone https://github.com/theorib/royal-blue.git
  2. Change directory into the cloned repository:

    cd royal-blue
  3. Create a .env file at the root, based on the .env.example provided. Ensure the essential database variables (those prefixed with TOTESYS_DB_ and DATAWAREHOUSE_DB_) are set. Others, such as the S3 bucket names, are only needed if you plan to run scripts locally (see the sketch after these steps).

    TOTESYS_DB_USER=some_user_abc
    TOTESYS_DB_PASSWORD=some_password_xyz
    TOTESYS_DB_HOST=host.something.com
    TOTESYS_DB_DATABASE=database_name
    TOTESYS_DB_PORT=0000
    DATAWAREHOUSE_DB_USER=some_user_abc
    DATAWAREHOUSE_DB_PASSWORD=some_password_xyz
    DATAWAREHOUSE_DB_HOST=host.something.com
    DATAWAREHOUSE_DB_DATABASE=database_name
    DATAWAREHOUSE_DB_PORT=0000
    
    # For local integration tests only:
    INGEST_ZONE_BUCKET_NAME=some_bucket_name
    PROCESS_ZONE_BUCKET_NAME=some_bucket_name
    LAMBDA_STATE_BUCKET_NAME=another_bucket_name
  4. If you forked this repository and want CI/CD to work as intended, you will need to create GitHub Secrets for the environment variables above (except those used only for local integration tests).

  5. Run the setup script (this will install dependencies, run tests and checks):

    make setup
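
Once uv (or your shell) has loaded the .env file, scripts can read these variables from the environment. A minimal sketch using only the standard library; the helper function itself is hypothetical:

    import os


    def totesys_dsn() -> str:
        # Hypothetical helper: assembles a PostgreSQL DSN from the
        # TOTESYS_DB_* variables defined in the .env file above.
        return (
            f"postgresql://{os.environ['TOTESYS_DB_USER']}"
            f":{os.environ['TOTESYS_DB_PASSWORD']}"
            f"@{os.environ['TOTESYS_DB_HOST']}"
            f":{os.environ['TOTESYS_DB_PORT']}"
            f"/{os.environ['TOTESYS_DB_DATABASE']}"
        )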

Running Python Scripts Locally

With uv managing the environment, running your scripts is clean and consistent. Here's how to start:

  1. Activate the Python virtual environment:

    source .venv/bin/activate
  2. Set the PYTHONPATH environment variable to the current directory:

    export PYTHONPATH=$(pwd)
  3. Point uv to your local .env file so that environment variables are available to running scripts:

    export UV_ENV_FILE=.env
  4. Run Python scripts or tests using uv run:

    uv run src/lambdas/extract_lambda.py

    For example, to run the tests:

    uv run pytest

Available Makefile Commands

Use these main commands for common tasks:

Command      Description
make setup   Complete installation and validation (sync, build, checks)
make test    Run all tests
make fix     Run formatter and linter
make safe    Run security scans
make help    Show all available make commands

For a full list of commands and their descriptions, run:

make help
