GitOps with PloyD AI Gateway

GitOps Overview

GitOps is a core component of the PloyD AI Gateway platform, providing declarative infrastructure management, automated deployments, and configuration drift detection for AI model serving at scale.

🎯 What is GitOps?

GitOps is an operational framework that takes DevOps best practices used for application development, such as version control, collaboration, compliance, and CI/CD, and applies them to infrastructure automation.

Key Benefits

🔄 Declarative Deployments

Git as single source of truth for all configurations

🚀 Automated CI/CD

Multi-environment promotion with automated validation

🔒 Security & Compliance

Policy enforcement and audit trails built-in

📊 Full Observability

Comprehensive monitoring and alerting

GitOps Architecture

GitOps Layer: YAML Configs, ArgoCD/Flux, CI/CD Pipeline

↓

PloyD SDK (Core Engine): ModelServing, AIGateway, ModelRegistry, Security, Monitoring

↓

Infrastructure Layer: Kubernetes, Cloud APIs, Databases

GitOps Flow

1. Developer Commits: YAML configuration changes
2. CI/CD Validation: Automated testing & security scanning
3. ArgoCD Sync: Detects changes & deploys
4. Model Serving: AI models deployed & monitored

SDK Integration

✅ GitOps Uses PloyD SDK Internally

The GitOps implementation is built on top of the PloyD SDK as its core engine. GitOps acts as a declarative layer that translates YAML configurations into SDK API calls.

Translation Examples

GitOps YAML Configuration

apiVersion: ployd.ai/v1
kind: ModelDeployment
metadata:
  name: llama-7b-chat-v2-1
spec:
  model:
    framework: "vllm"
    path: "s3://models/llama-7b-chat-v2.1"
  resources:
    gpu:
      count: 2
      type: "nvidia-tesla-v100"
  scaling:
    minReplicas: 1
    maxReplicas: 5

Translates to the following PloyD SDK API Call

# Internal GitOps script uses PloyD SDK
import asyncio

from ployd import ModelServing


async def main():
    model_serving = ModelServing()

    # deploy_model() is awaited, so the call has to run inside a coroutine
    deployment = await model_serving.deploy_model(
        name="llama-7b-chat-v2-1",
        model_path="s3://models/llama-7b-chat-v2.1",
        framework="vllm",
        gpu_count=2,
        min_replicas=1,
        max_replicas=5,
        framework_config={
            "vllm": {
                "tensor_parallel_size": 2,
                "gpu_memory_utilization": 0.9
            }
        }
    )
    return deployment


deployment = asyncio.run(main())
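The translation step can be pictured as a plain mapping from manifest fields to keyword arguments. The following is a minimal sketch, not the actual GitOps translator: the helper name `manifest_to_deploy_kwargs` is hypothetical, the manifest is assumed to be already parsed into a dict, and the keyword names are taken from the `deploy_model()` call shown above.

```python
from typing import Any


def manifest_to_deploy_kwargs(manifest: dict[str, Any]) -> dict[str, Any]:
    """Flatten a ModelDeployment manifest into deploy_model() keyword arguments.

    Hypothetical helper: only the fields present in the example manifest are mapped.
    """
    spec = manifest["spec"]
    return {
        "name": manifest["metadata"]["name"],
        "model_path": spec["model"]["path"],
        "framework": spec["model"]["framework"],
        "gpu_count": spec["resources"]["gpu"]["count"],
        "min_replicas": spec["scaling"]["minReplicas"],
        "max_replicas": spec["scaling"]["maxReplicas"],
    }


# The example manifest above, already parsed into a dict
manifest = {
    "apiVersion": "ployd.ai/v1",
    "kind": "ModelDeployment",
    "metadata": {"name": "llama-7b-chat-v2-1"},
    "spec": {
        "model": {"framework": "vllm", "path": "s3://models/llama-7b-chat-v2.1"},
        "resources": {"gpu": {"count": 2, "type": "nvidia-tesla-v100"}},
        "scaling": {"minReplicas": 1, "maxReplicas": 5},
    },
}

kwargs = manifest_to_deploy_kwargs(manifest)
print(kwargs)
```

The resulting dict could then be passed as `deploy_model(**kwargs)`. Fields such as the GPU type or vLLM settings are left out here because the source does not show how they are mapped.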

SDK Components Used by GitOps

SDK Component	GitOps Usage	Purpose
`ModelServing`	`deploy_model()`, `scale_model()`	Deploy and manage model instances
`AIGateway`	`create_route()`, `update_route()`	Configure intelligent routing
`ModelRegistry`	`register_model()`, `track_deployment()`	Model lifecycle management
`Security`	`create_policy()`, `apply_rbac()`	Security and access control
`Monitoring`	`setup_alerts()`, `track_metrics()`	Observability and monitoring

Setup & Configuration

Prerequisites

✓ Kubernetes cluster (v1.24+)

✓ kubectl configured and connected

✓ Helm 3.x installed

✓ Git repository for configurations

✓ PloyD AI Gateway SDK installed

One-Command Setup

# Setup GitOps infrastructure
./scripts/gitops-setup.sh \
  --tool argocd \
  --environment production \
  --repo https://github.com/company/ployd-config

# This will:
# ✅ Install ArgoCD or Flux
# ✅ Configure repositories
# ✅ Set up monitoring
# ✅ Apply security policies
# ✅ Create initial applications

Repository Structure

ployd-platform-config/
├── environments/
│   ├── dev/
│   │   ├── applications/
│   │   ├── infrastructure/
│   │   └── policies/
│   ├── staging/
│   └── production/
├── clusters/
│   ├── on-premise/
│   ├── aws-us-west-2/
│   ├── azure-eastus/
│   └── gcp-us-central1/
└── shared/
    ├── monitoring/
    ├── security/
    └── networking/

Model Deployment with GitOps

Step 1: Register Model

# Register model in PloyD registry
import asyncio

from ployd import ModelRegistry


async def main():
    registry = ModelRegistry()

    # register_model() is awaited, so the call has to run inside a coroutine
    model_id = await registry.register_model(
        name="llama-7b-chat",
        version="v2.1",
        model_path="s3://company-models/llama-7b-chat-v2.1",
        framework="vllm",
        metadata={
            "accuracy": 0.92,
            "gpu_memory": "14GB",
            "tags": ["production-ready"]
        }
    )
    return model_id


model_id = asyncio.run(main())

Step 2: Create GitOps Configuration

# gitops/models/llama-7b-chat-deployment.yaml
apiVersion: ployd.ai/v1
kind: ModelDeployment
metadata:
  name: llama-7b-chat-v2-1
  namespace: ployd-ai-gateway
  labels:
    app.kubernetes.io/name: llama-7b-chat
    app.kubernetes.io/version: v2.1
    model.ployd.ai/framework: vllm
spec:
  # Model Registry Integration
  modelRegistry:
    modelId: "model_123456"
    version: "v2.1"
    
  # Model Configuration
  model:
    name: "llama-7b-chat"
    version: "v2.1"
    framework: "vllm"
    path: "s3://company-models/llama-7b-chat-v2.1"
    parameters:
      max_tokens: 2048
      temperature: 0.7
    vllm:
      tensor_parallel_size: 2
      max_model_len: 4096
      gpu_memory_utilization: 0.9
      
  # Resource Requirements
  resources:
    gpu:
      count: 2
      type: "nvidia-tesla-v100"
    cpu:
      requests: "4000m"
      limits: "8000m"
    memory:
      requests: "16Gi"
      limits: "32Gi"
      
  # Scaling Configuration
  scaling:
    minReplicas: 1
    maxReplicas: 5
    targetGPUUtilization: 70
    
  # Health Checks
  health:
    livenessProbe:
      httpGet:
        path: /health
        port: 8080
      initialDelaySeconds: 60
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 30

Step 3: Commit and Deploy

# Commit configuration to Git
git add gitops/models/llama-7b-chat-deployment.yaml
git commit -m "Deploy llama-7b-chat v2.1"
git push origin main

# ArgoCD automatically:
# 1. Detects Git changes
# 2. Validates configuration
# 3. Calls PloyD SDK to deploy model
# 4. Monitors deployment health
# 5. Reports status

Gateway Routing Configuration

Route Configuration

# gitops/gateway/routes.yaml
apiVersion: ployd.ai/v1
kind: GatewayRoute
metadata:
  name: chat-api-v2
  namespace: ployd-ai-gateway
spec:
  # Route Configuration
  path: "/v2/chat"
  methods: ["POST", "OPTIONS"]
  
  # Model Backends with Traffic Splitting
  models:
    - name: "llama-7b-chat-v2-1"
      service: "llama-7b-chat-v2-1.ployd-ai-gateway.svc.cluster.local"
      port: 80
      weight: 90  # 90% traffic to new version
      priority: 1
      
    - name: "llama-7b-chat-v2-0"
      service: "llama-7b-chat-v2-0.ployd-ai-gateway.svc.cluster.local"
      port: 80
      weight: 10  # 10% traffic to old version (fallback)
      priority: 2
      fallback: true
  
  # Routing Strategy
  routing:
    strategy: "weighted_round_robin"
    timeout: "30s"
    retries: 3
    
  # Rate Limiting
  rateLimit:
    enabled: true
    global:
      rpm: 10000
      burst: 1000
    perClient:
      rpm: 1000
      burst: 100
      
  # Authentication
  authentication:
    required: true
    methods: ["api_key", "jwt"]
    
  # Circuit Breaker
  circuitBreaker:
    enabled: true
    failureThreshold: 5
    recoveryTimeout: "30s"
    
  # Monitoring
  monitoring:
    enabled: true
    metrics:
      - "request_latency"
      - "request_throughput"
      - "error_rate"

Advanced Routing Features

🎯 Intelligent Routing

Latency-based routing
Cost-optimized routing
Load balancing strategies

🔄 Traffic Management

Weighted traffic splitting
Canary deployments
Blue-green deployments
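Weighted traffic splitting, such as the 90/10 split in the route configuration, can be implemented in several ways. The sketch below uses smooth weighted round robin, one well-known variant that interleaves backends instead of sending long runs to the heaviest one. Whether the gateway's `weighted_round_robin` strategy works this way is not stated in the source; the class name is hypothetical.

```python
from collections import Counter


class SmoothWeightedRoundRobin:
    """Pick backends in proportion to their weights, spreading picks evenly."""

    def __init__(self, weights: dict[str, int]):
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def next_backend(self) -> str:
        # Every backend gains its weight; the leader is chosen and pays back the total
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


picker = SmoothWeightedRoundRobin({"llama-7b-chat-v2-1": 90, "llama-7b-chat-v2-0": 10})
counts = Counter(picker.next_backend() for _ in range(100))
print(counts)
```

Over any 100 consecutive picks the two backends receive exactly 90 and 10 requests, and the old version is selected once in every block of ten rather than ten times in a row.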

🛡️ Resilience

Circuit breakers
Automatic failover
Health checks
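A circuit breaker stops sending requests to a backend after repeated failures and retries after a cool-down. The sketch below is a simplified illustration using the `failureThreshold: 5` and `recoveryTimeout: "30s"` values from the route configuration. The `CircuitBreaker` class is hypothetical, counts consecutive failures (an assumption), and passes time in explicitly to stay deterministic.

```python
class CircuitBreaker:
    """Open after N consecutive failures; allow trial requests after a timeout."""

    def __init__(self, failure_threshold: int = 5, recovery_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.consecutive_failures = 0
        self.opened_at = None  # time the breaker opened, or None while closed

    def allow_request(self, now: float) -> bool:
        if self.opened_at is None:
            return True
        # Once the recovery timeout has passed, let trial requests through (half-open)
        return now - self.opened_at >= self.recovery_timeout

    def record_success(self) -> None:
        # Any success closes the breaker and resets the failure count
        self.consecutive_failures = 0
        self.opened_at = None

    def record_failure(self, now: float) -> None:
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Opens the breaker, or re-opens it when a half-open trial fails
            self.opened_at = now


breaker = CircuitBreaker(failure_threshold=5, recovery_timeout=30.0)
for _ in range(5):
    breaker.record_failure(now=0.0)

print(breaker.allow_request(now=10.0))  # False: still inside the recovery timeout
print(breaker.allow_request(now=30.0))  # True: a trial request is allowed
```

While the breaker is open, a router could send traffic to the backend marked `fallback: true` instead.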

📊 Observability

Real-time metrics
Distributed tracing
Custom dashboards

CI/CD Workflows

GitHub Actions Pipeline

# .github/workflows/gitops.yml
name: GitOps - PloyD Platform

on:
  push:
    branches: [main, develop]
    paths:
      - 'gitops/**'
      - 'infrastructure/**'

jobs:
  validate-gitops:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
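      - name: Validate Kubernetes manifests
        # kubeval must be available on the runner. Custom kinds such as ployd.ai/v1
        # have no built-in schema, so missing schemas are skipped instead of failing.
        run: |
          find gitops/ -name "*.yaml" -print0 | xargs -0 kubeval --ignore-missing-schemas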
      
      - name: Security scanning
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: 'gitops/'

  deploy-production:
    needs: validate-gitops
    if: github.ref == 'refs/heads/main'
    environment: production
    runs-on: ubuntu-latest
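    steps:
      # Check out the repository so the deploy script and gitops/ configs are present
      - uses: actions/checkout@v4

      - name: Deploy via PloyD SDK
        run: |
          python scripts/deploy-models-gitops.py \
            --environment production \
            --config-path gitops/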

Environment Promotion

Environment	Branch	Auto Deploy	Validation	Resources
Development	develop	✅ Yes	Basic	Minimal
Staging	main	✅ Yes	Full	Production-like
Production	main	❌ Manual	Comprehensive	Full

Promotion order: Development → Staging → Production
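The promotion rules can be expressed as data, which makes it easy to see what a push to a given branch triggers. This is a minimal sketch: the `Environment` class and `plan_for_branch` function are hypothetical names, and only the branch and auto-deploy settings listed above are modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    name: str
    branch: str
    auto_deploy: bool


# Branch and auto-deploy settings as listed for each environment
ENVIRONMENTS = [
    Environment("development", "develop", True),
    Environment("staging", "main", True),
    Environment("production", "main", False),
]


def plan_for_branch(branch: str) -> dict[str, list[str]]:
    """Split the environments tracking `branch` into automatic and manual deployments."""
    plan = {"auto": [], "manual": []}
    for env in ENVIRONMENTS:
        if env.branch == branch:
            plan["auto" if env.auto_deploy else "manual"].append(env.name)
    return plan


print(plan_for_branch("main"))
# {'auto': ['staging'], 'manual': ['production']}
```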

Monitoring & Observability

GitOps-Specific Alerts

# Prometheus alerting rules (firing alerts are routed by Alertmanager)
groups:
  - name: gitops
    rules:
      - alert: GitOpsAppOutOfSync
        expr: argocd_app_info{sync_status!="Synced"} == 1
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "GitOps application {{ $labels.name }} is out of sync"
      
      - alert: GitOpsAppUnhealthy
        expr: argocd_app_info{health_status!="Healthy"} == 1
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "GitOps application {{ $labels.name }} is unhealthy"

Model Performance Monitoring

📈 Model Performance

Request latency and throughput
Model accuracy and drift detection
GPU utilization and memory usage
Error rates and failure patterns

🏗️ Infrastructure Health

Kubernetes pod status and restarts
Node resource utilization
Network connectivity and latency
Storage performance and capacity

🔄 GitOps Operations

Deployment frequency and success rate
Configuration drift detection
Sync status and health checks
Rollback frequency and causes
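Configuration drift detection comes down to comparing the state declared in Git with the state observed in the cluster. The sketch below shows the idea on plain dicts. `find_drift` is a hypothetical helper, and it deliberately ignores fields that exist only in the live state, on the assumption that the cluster adds defaults that are not declared in Git.

```python
def find_drift(desired, live, path=""):
    """Return dotted paths where the live state differs from the desired state."""
    if isinstance(desired, dict) and isinstance(live, dict):
        drift = []
        for key in desired:
            child = f"{path}.{key}" if path else key
            if key not in live:
                drift.append(child)  # declared in Git but missing in the cluster
            else:
                drift.extend(find_drift(desired[key], live[key], child))
        return drift
    # Leaf values (or mismatched types): compare directly
    return [] if desired == live else [path]


desired = {"spec": {"scaling": {"minReplicas": 1, "maxReplicas": 5}}}
live = {
    "spec": {"scaling": {"minReplicas": 1, "maxReplicas": 8}},
    "status": {"phase": "Running"},  # live-only field, ignored
}

print(find_drift(desired, live))  # ['spec.scaling.maxReplicas']
```

Here someone has manually raised `maxReplicas` in the cluster. A GitOps controller would report the application as out of sync and, if self-healing is enabled, restore the declared value.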

Security & Compliance

Policy as Code

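# policies/security/pod-security.rego
package kubernetes.admission

# Deny any container that does not set securityContext.runAsNonRoot to true
# (Kubernetes has no runAsRoot field; runAsNonRoot is the supported setting)
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.securityContext.runAsNonRoot
  msg := "Containers must not run as root"
}

# Deny any container without a read-only root filesystem.
# Binding `container` first makes the rule fire per container rather than
# only when no container at all sets the field.
deny[msg] {
  input.request.kind.kind == "Pod"
  container := input.request.object.spec.containers[_]
  not container.securityContext.readOnlyRootFilesystem
  msg := "Containers must use read-only root filesystem"
}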
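The intent of these policies (no root containers, read-only root filesystems) can also be checked locally, for example in a pre-commit hook, before the admission controller sees a manifest. The sketch below is a hypothetical Python mirror of that intent, not a replacement for the Rego rules. It checks `runAsNonRoot`, the Kubernetes `securityContext` field, and assumes the pod manifest is already parsed into a dict.

```python
def pod_violations(pod: dict) -> list[str]:
    """Return one message per container setting that breaks the two policies."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        context = container.get("securityContext") or {}
        # An unset field counts as a violation, matching a deny-by-default stance
        if context.get("runAsNonRoot") is not True:
            violations.append(f"{name}: Containers must not run as root")
        if context.get("readOnlyRootFilesystem") is not True:
            violations.append(f"{name}: Containers must use read-only root filesystem")
    return violations


pod = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "model-server",
                "securityContext": {"runAsNonRoot": True, "readOnlyRootFilesystem": True},
            },
            {"name": "sidecar", "securityContext": {"runAsNonRoot": True}},
        ]
    },
}

for message in pod_violations(pod):
    print(message)
```

Only the `sidecar` container is reported, because it does not set `readOnlyRootFilesystem`.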

Security Features

🔐 Access Control

Git-based access control
RBAC for deployment permissions
Audit trail for all changes
Signed commits verification

🛡️ Model Security

Model artifact scanning
Encrypted storage and transmission
Runtime security monitoring
Compliance policy enforcement

🏗️ Infrastructure Security

Network policies for isolation
Secret management for API keys
Container image security scanning
Runtime security monitoring

Canary Deployments

Canary Configuration

# gitops/canary/llama-7b-chat-canary.yaml
apiVersion: ployd.ai/v1
kind: CanaryDeployment
metadata:
  name: llama-7b-chat-canary
spec:
  stable:
    model: "llama-7b-chat-v2-0"
    replicas: 3
  canary:
    model: "llama-7b-chat-v2-1"
    replicas: 1
  traffic:
    canaryWeight: 10  # Start with 10% traffic
    maxWeight: 100
    stepWeight: 10    # Increase by 10% each step
    interval: "5m"    # Wait 5 minutes between steps
  analysis:
    metrics:
    - name: "success_rate"
      threshold: 99.5
    - name: "latency_p95"
      threshold: 200
    - name: "error_rate"
      threshold: 0.1
    failureThreshold: 3
    successThreshold: 5
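The `traffic` and `analysis` settings imply a schedule of traffic weights and a pass/fail check at each step. The sketch below illustrates both under stated assumptions: the function names are hypothetical, and the direction of each threshold (success rate as a minimum, p95 latency and error rate as maximums) is inferred from the metric names rather than documented in the source.

```python
def canary_schedule(start: int, step: int, maximum: int) -> list[int]:
    """List the canary traffic weights from `start` up to `maximum` in `step` increments."""
    weights = list(range(start, maximum, step))
    weights.append(maximum)  # always finish at the configured maximum
    return weights


def metrics_pass(observed: dict[str, float]) -> bool:
    """Check one analysis sample against the thresholds from the canary configuration."""
    return (
        observed["success_rate"] >= 99.5   # assumed: minimum acceptable percentage
        and observed["latency_p95"] <= 200  # assumed: maximum acceptable latency
        and observed["error_rate"] <= 0.1   # assumed: maximum acceptable percentage
    )


# canaryWeight: 10, stepWeight: 10, maxWeight: 100
print(canary_schedule(start=10, step=10, maximum=100))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

print(metrics_pass({"success_rate": 99.8, "latency_p95": 180, "error_rate": 0.05}))  # True
print(metrics_pass({"success_rate": 99.8, "latency_p95": 250, "error_rate": 0.05}))  # False
```

With a 5-minute interval between steps, a fully successful rollout under this schedule takes ten steps. `failureThreshold` and `successThreshold` would then govern how many failed or passed samples trigger a rollback or a promotion.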

Canary Process

1. Deploy with 0% Traffic: New model version deployed but receives no traffic
2. Health Validation: Run health checks and validation tests
3. Gradual Traffic Increase: 10% → 20% → ... → 100% traffic progression, in 10% steps at 5-minute intervals as set in the canary configuration
4. Metrics Monitoring: Monitor performance and error rates
5. Automatic Decision: Promote or rollback based on metrics

Disaster Recovery

Backup Strategy

📊 Database Backups

Automated backups with point-in-time recovery

Frequency: Hourly

Retention: 30 days

🤖 Model Artifacts

Cross-region replication of model files

Frequency: Daily

Retention: 90 days

⚙️ Configuration

GitOps repository with version control

Method: Git

Retention: Unlimited

Recovery Procedures

Scenario	RTO	RPO	Procedure
Pod Failure	< 1 min	0	Kubernetes auto-restart
Node Failure	< 5 min	0	Auto-scaling + pod rescheduling
AZ Failure	< 15 min	< 1 min	Multi-AZ deployment
Region Failure	< 1 hour	< 15 min	Cross-region failover

Best Practices

Repository Management

📁 Separate Repositories

Use separate repos for infrastructure, platform config, and applications

🔒 Branch Protection

Enforce branch protection rules for main/production branches

👥 Required Reviews

Mandate code reviews for all configuration changes

🧪 Automated Testing

Run validation tests before merging changes
