MLOps
Architecture
In this project, I built a full-stack MLOps pipeline on AWS using modern DevOps practices: CI/CD pipelines, GitOps workflows, and Kubernetes deployment. Here’s a deep dive into how the whole system works, from data preprocessing to model deployment, with every step automated and the infrastructure defined as code.
🚀 Overview
The pipeline centers on a regression model trained on the California Housing dataset. It includes:
- AWS Glue for preprocessing
- EventBridge & Lambda for orchestration
- SageMaker for training and deployment
- Terraform for Infrastructure as Code
- GitHub Actions for CI/CD and GitOps
- Kubernetes (EKS) for frontend deployment
- Streamlit as a user-facing interface

🧱 System Architecture
🔄 ETL Pipeline with AWS Glue
- Raw housing data is uploaded to an S3 bucket.
- EventBridge detects the new file and triggers a Lambda function.
- Lambda starts a Glue workflow that runs a Python job to clean and transform the data.
- The processed data is saved to another S3 bucket for training.
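As a rough illustration, the Lambda step in this flow could look like the sketch below. It assumes the EventBridge rule forwards the standard S3 "Object Created" event, and it uses a hypothetical workflow name and hypothetical run-property keys (`source_bucket`, `source_key`); none of these names come from the project itself. The Glue client is injectable so the handler can be exercised without AWS access.

```python
import json
import os

# Hypothetical workflow name; the real one would come from the Terraform config.
WORKFLOW_NAME = os.environ.get("GLUE_WORKFLOW_NAME", "housing-etl-workflow")


def handler(event, context, glue_client=None):
    """Start the Glue workflow for the S3 object named in an EventBridge event."""
    if glue_client is None:
        import boto3  # imported lazily so the handler can be tested without AWS
        glue_client = boto3.client("glue")

    # S3 "Object Created" events delivered by EventBridge carry the bucket
    # name and object key under the `detail` field.
    detail = event["detail"]
    bucket = detail["bucket"]["name"]
    key = detail["object"]["key"]

    response = glue_client.start_workflow_run(
        Name=WORKFLOW_NAME,
        # Hypothetical property names; jobs in the workflow can read them back.
        RunProperties={"source_bucket": bucket, "source_key": key},
    )
    return {"statusCode": 200, "body": json.dumps({"run_id": response["RunId"]})}
```

Passing the bucket and key as run properties keeps the Glue job generic: it reads its input location at run time instead of hard-coding a path.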
🤖 MLOps with SageMaker Pipelines
- A second EventBridge rule detects the new preprocessed data and triggers a SageMaker pipeline.
- This pipeline:
  - Executes additional transformations
  - Trains a regression model using XGBoost
  - Registers the model in SageMaker Model Registry
  - Deploys it as a real-time endpoint
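For manual runs or testing, the same pipeline can also be started programmatically. The sketch below is only an illustration of that call: the pipeline name and the `InputDataUri` parameter are hypothetical and would have to match whatever the pipeline definition actually declares.

```python
from datetime import datetime, timezone


def start_training(bucket, key, sm_client=None,
                   pipeline_name="housing-training-pipeline"):
    """Start one pipeline execution for a preprocessed S3 object.

    `pipeline_name` and the `InputDataUri` parameter are hypothetical names.
    """
    if sm_client is None:
        import boto3  # imported lazily so the function can be tested without AWS
        sm_client = boto3.client("sagemaker")

    # A timestamped display name makes executions easy to tell apart in the console.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")

    response = sm_client.start_pipeline_execution(
        PipelineName=pipeline_name,
        PipelineExecutionDisplayName=f"housing-{stamp}",
        PipelineParameters=[
            {"Name": "InputDataUri", "Value": f"s3://{bucket}/{key}"},
        ],
    )
    return response["PipelineExecutionArn"]
```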
🌐 Model Serving and API
Once deployed, the model is available behind a SageMaker endpoint. A lightweight REST API serves predictions over HTTP.
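To make the serving path concrete, here is a minimal sketch of a client call to the endpoint. It assumes the endpoint accepts one CSV row of features and returns a single numeric prediction as text, which is how SageMaker's built-in XGBoost container typically behaves; the endpoint name is a placeholder supplied by the caller.

```python
def predict(features, endpoint_name, runtime_client=None):
    """Send one row of features to the endpoint and return the prediction.

    Assumes a text/csv request and a single number in the response body.
    """
    if runtime_client is None:
        import boto3  # imported lazily so the function can be tested without AWS
        runtime_client = boto3.client("sagemaker-runtime")

    # Serialize the feature vector as one comma-separated line.
    payload = ",".join(str(value) for value in features)

    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload,
    )
    # The response body is a stream; read it and parse the single value.
    return float(response["Body"].read().decode("utf-8").strip())
```

A REST layer or the Streamlit app would wrap a call like this, passing the housing features in the same column order the model was trained on.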
🧪 CI/CD for Frontend (Streamlit)
Every push to the frontend/ directory:
- Builds a Docker image via GitHub Actions
- Pushes the image to Amazon ECR
- Updates a Kubernetes deployment on EKS
- Exposes the app via an Ingress controller
This enables real-time user interaction and visualization.
⚙️ Infrastructure Automation with Terraform
All AWS resources (Glue, Lambda, S3, SageMaker, IAM, EventBridge) are managed via Terraform. The infrastructure code is modular, reusable, and version-controlled.
🌀 GitOps for IaC
Changes pushed to the iac/ directory:
- Trigger terraform plan and terraform apply via CI/CD
- Apply changes to the infrastructure using AWS credentials
- Enforce full GitOps compliance and reproducibility
☸️ Kubernetes Deployment for Frontend
Frontend deployment leverages Kubernetes for scalable serving:
- A Deployment runs the Dockerized Streamlit app in pods.
- A Service exposes it internally.
- An Ingress route exposes it externally (e.g., via NGINX).
This structure allows secure, modular, and scalable user-facing components.
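The three objects above can be sketched as plain Python dictionaries and rendered to JSON, which kubectl accepts alongside YAML. Everything specific here is an assumption for illustration: the app name, the image placeholder, the replica count, and the `nginx` ingress class. Port 8501 is Streamlit's default.

```python
import json

APP = "streamlit-frontend"  # hypothetical app name
IMAGE = "<account-id>.dkr.ecr.<region>.amazonaws.com/streamlit-frontend:latest"  # placeholder
PORT = 8501  # Streamlit's default port
LABELS = {"app": APP}

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": APP},
    "spec": {
        "replicas": 2,  # assumed replica count
        "selector": {"matchLabels": LABELS},
        "template": {
            "metadata": {"labels": LABELS},
            "spec": {"containers": [
                {"name": APP, "image": IMAGE, "ports": [{"containerPort": PORT}]},
            ]},
        },
    },
}

# ClusterIP keeps the Service internal; only the Ingress is reachable from outside.
service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": APP},
    "spec": {
        "type": "ClusterIP",
        "selector": LABELS,
        "ports": [{"port": 80, "targetPort": PORT}],
    },
}

ingress = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "Ingress",
    "metadata": {"name": APP},
    "spec": {
        "ingressClassName": "nginx",  # assumes an NGINX ingress controller
        "rules": [{"http": {"paths": [{
            "path": "/",
            "pathType": "Prefix",
            "backend": {"service": {"name": APP, "port": {"number": 80}}},
        }]}}],
    },
}


def render(manifests):
    """Bundle manifests into a single v1 List document as JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
```

Writing the output of `render([deployment, service, ingress])` to a file and applying it with kubectl creates all three objects in one step. The shared `LABELS` dictionary is what ties the Service selector to the pods the Deployment creates.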
📈 Benefits of the Architecture
- Scalable: Serverless data processing and container-based serving
- Fully automated: CI/CD and GitOps reduce manual steps
- Repeatable: Terraform ensures consistent provisioning
- User-friendly: Streamlit frontend enables easy interaction
- Modular: Each stage can be reused or replaced independently
🔮 Future Work
- Add model monitoring (e.g., drift detection)
- Introduce approval gates before production deployment
- Expand to multi-model deployment patterns
- Integrate with ArgoCD for advanced GitOps
📎 References
If you’re looking to build production-grade MLOps pipelines or want to explore automation with AWS and Kubernetes, feel free to fork the project repository or drop your questions.
Stay open. Stay automated. 🚀