About Me
I’m an Analytics Engineering professional based in New Zealand, specializing in building scalable data pipelines, transforming raw data into actionable insights, and enabling data-driven decision-making through modern data stack solutions.
Currently at Canva in Data & Platform Engineering, scaling warehouse infrastructure and data platforms:
Core Focus: Snowflake optimization • dbt pipeline management • Data architecture (Iceberg, Semantic Layer, Cortex Intelligence)
Engineering Practices: CI/CD automation • Observability • Pipeline orchestration • Stakeholder enablement
Skills & Tech Stack
Languages: Python • SQL • R • Rust
Data Engineering: dbt • Snowflake • Fivetran • Apache Spark • Airflow • PostgreSQL
Machine Learning: Scikit-Learn • Pandas • MLflow • Weights & Biases
Cloud & DevOps: AWS • Terraform • Docker • Kubernetes • CI/CD
Tools: Git • Looker • Jupyter • Power BI • FastAPI • Flask
ML & Data Engineering Portfolio
dbt Automated Data Pipeline
Modern data stack implementation combining dbt Core for SQL transformations with Meltano for ELT orchestration. Ingests cryptocurrency API data into PostgreSQL, featuring multi-layer transformations and automated GitHub Actions workflows.
View Project
Machine Learning Model Deployment to the Cloud
Learn how to deploy a trained salary prediction model to the cloud and test it, using a Flask web API endpoint on the Heroku platform.
View Project
AWS RDS Automated Setup & Power BI Dashboard
Discover how to set up an AWS RDS Postgres instance using Terraform, deploy a database, and connect it to Power BI to build a comprehensive dashboard.
View Project
Disaster Response NLP Pipeline
Build an NLP pipeline that classifies disaster messages into categories to help relief organizations quickly identify and prioritize emergency responses. Features Flask web app deployment on Azure.
View Project
NYC Rental Price Prediction ML Pipeline
Deploy an end-to-end machine learning pipeline for predicting NYC short-term rental prices using MLOps best practices with MLflow, Weights & Biases, and Hydra for experiment tracking and artifact management.
View Project
Dynamic Customer Churn Risk Assessment Pipeline
Automated ML workflow that continuously recomputes customer attrition risk as new data arrives, featuring automated retraining, deployment, and monitoring through Flask API endpoints and cron-based orchestration.
View Project
Census Bureau Salary Classification Pipeline
Complete CI/CD pipeline for deploying an ML salary classifier using GitHub Actions, DVC for data versioning, FastAPI for serving predictions, and automated Heroku deployment with testing and linting.
View Project
Data Science Portfolio
Market Basket Analysis of Instacart Dataset
Explore how market basket analysis using Association Rules and the Apriori algorithm can identify customer purchase patterns and provide personalized item recommendations.
View Project
Predicting Operational Status of Water Assets
Investigate the application of classification models to predict the operational state (functional, repair needed, non-functional) of water pumps in Tanzania.
View Project
Wholesale Distributor Customer Spending Analysis
Examine the use of unsupervised learning techniques (PCA, Gaussian Mixture Model) to analyze customer spending patterns and optimize a new delivery service.
View Project