kgraph-mcp-agent-platform / CI_CD_DEPLOYMENT_PLAN.md
BasalGanglia's picture
πŸ† Multi-Track Hackathon Submission
1f2d50a verified
# KGraph-MCP CI/CD & Deployment Plan
## 🎯 Overview
This document outlines the comprehensive CI/CD pipeline and deployment strategy for the KGraph-MCP project, implementing a robust development workflow with separate dev, staging, and production environments.
## πŸ“Š Git Flow Strategy
### Branch Structure
- **`main`** - Production-ready code only, protected branch
- **`develop`** - Integration branch for features, auto-deploys to dev
- **`release/*`** - Release candidates, deploys to staging
- **`feature/*`** - Individual feature branches (PR to develop)
- **`hotfix/*`** - Emergency fixes (PR to main and develop)
### Workflow Rules
1. All development work starts from `develop`
2. Feature branches follow pattern: `feat/<task-id>_<description>`
3. PRs target `develop` branch (never main directly)
4. Release branches created from `develop` for staging
5. Only release branches can merge to `main`
6. Hotfixes are the only exception for direct main access
## πŸš€ Environment Architecture
### 1. Development Environment
- **URL**: `dev.kgraph-mcp.example.com`
- **Purpose**: Latest integrated features, continuous deployment
- **Database**: PostgreSQL dev instance
- **Features**:
- Auto-deploy on develop branch updates
- Debug mode enabled
- Verbose logging
- Test data seeding
### 2. Staging Environment
- **URL**: `staging.kgraph-mcp.example.com`
- **Purpose**: Pre-production testing, UAT
- **Database**: PostgreSQL staging (production-like data)
- **Features**:
- Deploy from release branches
- Production-like configuration
- Performance testing enabled
- Integration testing suite
### 3. Production Environment
- **URL**: `api.kgraph-mcp.example.com`
- **Purpose**: Live production system
- **Database**: PostgreSQL production (with backups)
- **Features**:
- Deploy only from main branch tags
- High availability setup
- Monitoring and alerting
- Automated backups
## πŸ”§ CI/CD Pipeline Configuration
### GitHub Actions Workflows
#### 1. Continuous Integration (`.github/workflows/ci.yml`)
```yaml
name: Continuous Integration
on:
pull_request:
branches: [develop, main]
push:
branches: [develop]
jobs:
test:
runs-on: ubuntu-latest
strategy:
matrix:
python-version: ["3.11", "3.12"]
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
pip install uv
uv pip install -r requirements.txt
uv pip install -r requirements-dev.txt
- name: Run linting
run: |
ruff check .
black --check .
- name: Run type checking
run: mypy .
- name: Run tests
run: |
pytest tests/ -v --cov=. --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
```
#### 2. Deploy to Dev (`.github/workflows/deploy-dev.yml`)
```yaml
name: Deploy to Development
on:
push:
branches: [develop]
workflow_dispatch:
jobs:
deploy:
runs-on: ubuntu-latest
environment: development
steps:
- uses: actions/checkout@v4
- name: Build Docker image
run: |
docker build -t kgraph-mcp:dev-${{ github.sha }} .
- name: Deploy to Dev
env:
DEPLOY_KEY: ${{ secrets.DEV_DEPLOY_KEY }}
run: |
# Deploy script for dev environment
./scripts/deploy.sh dev ${{ github.sha }}
```
#### 3. Deploy to Staging (`.github/workflows/deploy-staging.yml`)
```yaml
name: Deploy to Staging
on:
push:
branches: ['release/*']
workflow_dispatch:
inputs:
version:
description: 'Release version'
required: true
jobs:
deploy:
runs-on: ubuntu-latest
environment: staging
steps:
- uses: actions/checkout@v4
- name: Run integration tests
run: |
pytest tests/integration/ -v
- name: Build Docker image
run: |
docker build -t kgraph-mcp:staging-${{ github.sha }} .
- name: Deploy to Staging
env:
DEPLOY_KEY: ${{ secrets.STAGING_DEPLOY_KEY }}
run: |
./scripts/deploy.sh staging ${{ github.sha }}
```
#### 4. Deploy to Production (`.github/workflows/deploy-prod.yml`)
```yaml
name: Deploy to Production
on:
push:
tags:
- 'v*'
workflow_dispatch:
inputs:
tag:
description: 'Release tag'
required: true
jobs:
deploy:
runs-on: ubuntu-latest
environment: production
steps:
- uses: actions/checkout@v4
with:
ref: ${{ github.event.inputs.tag || github.ref }}
- name: Validate release
run: |
# Ensure we're deploying from main
git branch --contains ${{ github.sha }} | grep -q main
- name: Build Docker image
run: |
docker build -t kgraph-mcp:prod-${{ github.ref_name }} .
- name: Deploy to Production
env:
DEPLOY_KEY: ${{ secrets.PROD_DEPLOY_KEY }}
run: |
./scripts/deploy.sh prod ${{ github.ref_name }}
- name: Create GitHub Release
uses: softprops/action-gh-release@v1
with:
generate_release_notes: true
```
## 🐳 Docker Configuration
### Multi-stage Dockerfile
```dockerfile
# Build stage
FROM python:3.11-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir uv && \
uv pip install --system -r requirements.txt
# Runtime stage
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . .
# Environment-specific configuration
ARG ENVIRONMENT=production
ENV ENVIRONMENT=${ENVIRONMENT}
EXPOSE 8000
CMD ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"]
```
## πŸ” Environment Configuration
### Environment Variables Structure
```bash
# Common across all environments
APP_NAME=kgraph-mcp
LOG_LEVEL=INFO
# Environment-specific
# Development
DATABASE_URL=postgresql://dev_user:dev_pass@localhost/kgraph_dev
REDIS_URL=redis://localhost:6379/0
DEBUG=true
# Staging
DATABASE_URL=postgresql://staging_user:staging_pass@staging-db/kgraph_staging
REDIS_URL=redis://staging-redis:6379/0
DEBUG=false
# Production
DATABASE_URL=postgresql://prod_user:${PROD_DB_PASS}@prod-db/kgraph_prod
REDIS_URL=redis://prod-redis:6379/0
DEBUG=false
SENTRY_DSN=${SENTRY_DSN}
```
## πŸ“‹ Deployment Checklist
### Pre-deployment Steps
- [ ] All tests passing in CI
- [ ] Code review approved
- [ ] Documentation updated
- [ ] Database migrations prepared
- [ ] Environment variables configured
- [ ] Security scan completed
### Deployment Process
1. **Feature to Dev**
- Merge PR to develop
- Auto-deploy triggered
- Smoke tests run
2. **Dev to Staging**
- Create release branch
- Update version numbers
- Deploy to staging
- Run full test suite
3. **Staging to Production**
- Create PR from release to main
- Final approval required
- Tag release
- Deploy to production
- Monitor metrics
### Post-deployment Steps
- [ ] Verify deployment health
- [ ] Check monitoring dashboards
- [ ] Run smoke tests
- [ ] Update status page
- [ ] Notify stakeholders
## πŸ” Monitoring & Observability
### Metrics Collection
- **Application Metrics**: Prometheus + Grafana
- **Logs**: ELK Stack (Elasticsearch, Logstash, Kibana)
- **APM**: Sentry for error tracking
- **Uptime**: StatusPage or equivalent
### Key Metrics to Monitor
- API response times
- Error rates
- Database query performance
- Knowledge graph query latency
- Memory and CPU usage
- Active connections
## πŸ”„ Rollback Strategy
### Automated Rollback Triggers
- Health check failures (3 consecutive)
- Error rate > 5% for 5 minutes
- Response time > 2s p95 for 10 minutes
### Manual Rollback Process
```bash
# Quick rollback to previous version
just rollback-prod
# Specific version rollback
just deploy-prod v1.2.3
```
## πŸ›‘οΈ Security Considerations
### CI/CD Security
- Secrets stored in GitHub Secrets
- Environment-specific deploy keys
- Branch protection rules enforced
- Required reviews for production
### Runtime Security
- Container scanning in CI
- Dependency vulnerability checks
- OWASP security headers
- Rate limiting enabled
- API authentication required
## πŸ“… Release Schedule
### Regular Releases
- **Dev**: Continuous (on merge)
- **Staging**: Weekly (Wednesdays)
- **Production**: Bi-weekly (every other Friday)
### Hotfix Process
1. Create hotfix branch from main
2. Fix issue with tests
3. PR to main and develop
4. Emergency deploy to production
5. Verify fix in all environments
## 🎯 Success Metrics
### Deployment Success Criteria
- Zero-downtime deployments
- < 5 minute deployment time
- 99.9% deployment success rate
- < 1% rollback rate
### Performance Targets
- API latency < 100ms p50
- Knowledge graph queries < 500ms p95
- 99.95% uptime SLA
- < 0.1% error rate
## πŸ“š Additional Resources
- [Deployment Scripts](./scripts/deploy.sh)
- [Environment Setup Guide](./docs/deployment/ENVIRONMENT_SETUP.md)
- [Troubleshooting Guide](./docs/deployment/TROUBLESHOOTING.md)
- [Disaster Recovery Plan](./docs/deployment/DISASTER_RECOVERY.md)
---
*This CI/CD plan ensures reliable, secure, and efficient deployment of the KGraph-MCP system across all environments.*