AIGC Platform: DeepSeek & Stable Diffusion Production Deployment

1. Production Environment Architecture Overview

This document provides a complete deployment solution for an enterprise AIGC platform that serves internal employees in production, with a focus on:

  • High Availability: Multi-instance load balancing, automatic fault recovery
  • Security: Identity authentication, access control, data encryption
  • Scalability: Horizontal scaling, elastic resource scaling
  • Monitoring & Alerting: Comprehensive system monitoring and anomaly alerts
  • Compliance: Data privacy protection, audit logs

1.1 System Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Load Balancer │    │   API Gateway   │    │   Auth Service  │
│   (Nginx/HAProxy)│    │   (Kong/Traefik) │    │   (Keycloak)    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
         ┌───────────────────────┼───────────────────────┐
         │                       │                       │
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│  DeepSeek API   │    │ Stable Diffusion│    │   Monitoring    │
│   (Multiple)    │    │   API (Multiple)│    │   (Prometheus)  │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                    ┌─────────────────┐
                    │   Storage Layer │
                    │ (Redis + MinIO) │
                    └─────────────────┘

2. Production Environment Infrastructure Preparation

2.1 Hardware Requirements

Minimum Configuration (Development/Test Environment):

  • CPU: 16-core AMD EPYC or Intel Xeon
  • Memory: 64GB DDR4 ECC
  • GPU: 2x NVIDIA RTX 4090 (24GB VRAM) or 1x NVIDIA A100 (40GB VRAM)
  • Storage: 2TB NVMe SSD (RAID 1)
  • Network: 10Gbps network interface

Production Environment Recommended Configuration:

  • CPU: 32-core AMD EPYC or Intel Xeon
  • Memory: 128GB DDR4 ECC
  • GPU: 4x NVIDIA A100 (40GB VRAM) or 8x NVIDIA RTX 4090
  • Storage: 4TB NVMe SSD (RAID 10) + 10TB HDD (backup)
  • Network: 25Gbps network interface

2.2 Software Environment

  • Operating System: Ubuntu 22.04 LTS (Long Term Support version)
  • Container Platform: Docker 24.0+ or Kubernetes 1.28+
  • Orchestration Tools: Docker Compose or Helm
  • Database: PostgreSQL 15+ (user management) + Redis 7+ (caching)
  • Monitoring: Prometheus + Grafana + AlertManager
  • Logging: ELK Stack (Elasticsearch + Logstash + Kibana)
  • Security: Vault (key management) + OpenVPN (VPN access)

2.3 Network Security Requirements

  • Firewall: UFW or iptables configuration
  • SSL/TLS: Let’s Encrypt or enterprise certificates
  • VPN: OpenVPN or WireGuard
  • Access Control: Role-based access control (RBAC)
  • Audit Logs: Complete operation audit records

3. Security Configuration and Identity Authentication

3.1 System Security Hardening

# System update and security hardening
sudo apt update && sudo apt upgrade -y
sudo apt install -y ufw fail2ban unattended-upgrades

# Configure firewall
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 443/tcp  # HTTPS
sudo ufw allow 80/tcp   # HTTP (redirect to HTTPS)
sudo ufw enable

# Configure automatic security updates
sudo dpkg-reconfigure -plow unattended-upgrades
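
fail2ban is installed above but never configured. A minimal SSH jail such as the following is a reasonable starting point; the thresholds are illustrative and should be aligned with your own security policy:

sudo tee /etc/fail2ban/jail.local > /dev/null <<EOF
[sshd]
enabled = true
maxretry = 5
findtime = 10m
bantime = 1h
EOF

sudo systemctl enable --now fail2ban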

3.2 Identity Authentication System (Keycloak)

# docker-compose.auth.yml
version: "3.8"
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: keycloak
      POSTGRES_USER: keycloak
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    networks:
      - auth_network

  keycloak:
    image: quay.io/keycloak/keycloak:23.0
    environment:
      KC_DB: postgres
      KC_DB_URL: jdbc:postgresql://postgres:5432/keycloak
      KC_DB_USERNAME: keycloak
      KC_DB_PASSWORD: ${POSTGRES_PASSWORD}
      KEYCLOAK_ADMIN: admin
      KEYCLOAK_ADMIN_PASSWORD: ${KEYCLOAK_ADMIN_PASSWORD}
    # start-dev is convenient for first-time setup; switch to "start" with proper
    # hostname and TLS options before exposing Keycloak in production
    command: start-dev
    ports:
      - "8080:8080"
    depends_on:
      - postgres
    networks:
      - auth_network

volumes:
  postgres_data:

networks:
  auth_network:
    driver: bridge
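
Once Keycloak is up, clients obtain tokens from the standard OpenID Connect token endpoint. A sketch of a password-grant request is shown below; the realm name (aigc) and client ID (aigc-platform) are assumptions and must exist with direct access grants enabled. Note that the sample API services later in this guide validate HS256 tokens signed with a shared JWT_SECRET, so they would need to be adapted to verify Keycloak-issued tokens before the two pieces work together.

curl -X POST "http://localhost:8080/realms/aigc/protocol/openid-connect/token" \
  -d "grant_type=password" \
  -d "client_id=aigc-platform" \
  -d "username=alice" \
  -d "password=changeme"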

3.3 API Gateway Configuration (Kong)

# docker-compose.gateway.yml
version: "3.8"
services:
  kong:
    image: kong:3.4
    environment:
      KONG_DATABASE: postgres
      KONG_PG_HOST: kong-database
      KONG_PG_USER: kong
      KONG_PG_PASSWORD: ${KONG_PG_PASSWORD}
      KONG_PROXY_ACCESS_LOG: /dev/stdout
      KONG_ADMIN_ACCESS_LOG: /dev/stdout
      KONG_PROXY_ERROR_LOG: /dev/stderr
      KONG_ADMIN_ERROR_LOG: /dev/stderr
      # Binding the Admin API to 0.0.0.0 and publishing port 8001 exposes full
      # gateway control; restrict it to a management network in production
      KONG_ADMIN_LISTEN: 0.0.0.0:8001
      KONG_ADMIN_GUI_URL: http://localhost:8002
    ports:
      - "8000:8000"
      - "8443:8443"
      - "8001:8001"
      - "8444:8444"
    depends_on:
      - kong-database
    networks:
      - gateway_network

  kong-database:
    image: postgres:15
    environment:
      POSTGRES_USER: kong
      POSTGRES_DB: kong
      POSTGRES_PASSWORD: ${KONG_PG_PASSWORD}
    volumes:
      - kong_data:/var/lib/postgresql/data
    networks:
      - gateway_network

volumes:
  kong_data:

networks:
  gateway_network:
    driver: bridge
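
Kong's database mode requires a one-time schema migration before the gateway will start, and services and routes still have to be registered through the Admin API. The commands below are a sketch; the service name and upstream URL are placeholders matching the backend layout used later in this guide.

# Run migrations once against the kong-database container
docker-compose -f docker-compose.gateway.yml run --rm kong kong migrations bootstrap

# Register the DeepSeek backend and expose it under /api/deepseek
curl -i -X POST http://localhost:8001/services \
  --data name=deepseek --data url=http://deepseek1:8001
curl -i -X POST http://localhost:8001/services/deepseek/routes \
  --data "paths[]=/api/deepseek"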

4. High Availability Configuration

4.1 Load Balancer (HAProxy)

# Install HAProxy
sudo apt install -y haproxy

# Configure HAProxy
sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<EOF
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    timeout connect 5000
    timeout client  50000
    timeout server  50000

frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/ssl/certs/aigc-platform.pem
    redirect scheme https if !{ ssl_fc }

    # Health check
    http-request add-header X-Forwarded-Proto https if { ssl_fc }

    # Route to backend services
    use_backend deepseek_backend if { path_beg /api/deepseek }
    use_backend stable_diffusion_backend if { path_beg /api/sd }
    use_backend web_backend if { path_beg / }

backend deepseek_backend
    balance roundrobin
    option httpchk GET /health
    server deepseek1 10.0.1.10:8001 check
    server deepseek2 10.0.1.11:8001 check
    server deepseek3 10.0.1.12:8001 check

backend stable_diffusion_backend
    balance roundrobin
    option httpchk GET /health
    server sd1 10.0.1.20:7860 check
    server sd2 10.0.1.21:7860 check

backend web_backend
    balance roundrobin
    server web1 10.0.1.30:3000 check
    server web2 10.0.1.31:3000 check

listen stats
    bind *:8404
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:${HAPROXY_STATS_PASSWORD}
EOF

sudo systemctl enable haproxy
sudo systemctl start haproxy
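
Whenever the configuration changes, validate it before reloading the service:

sudo haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl reload haproxy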

4.2 Database High Availability (PostgreSQL)

# docker-compose.db.yml
version: "3.8"
services:
  postgres-primary:
    image: postgres:15
    environment:
      POSTGRES_DB: aigc_platform
      POSTGRES_USER: aigc_user
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
    volumes:
      - postgres_primary_data:/var/lib/postgresql/data
      - ./postgres/init:/docker-entrypoint-initdb.d
    ports:
      - "5432:5432"
    networks:
      - db_network

  # NOTE: as written this is only a second independent instance; streaming
  # replication still has to be configured (pg_basebackup from the primary,
  # primary_conninfo and standby.signal) before it acts as a real replica
  postgres-replica:
    image: postgres:15
    environment:
      POSTGRES_DB: aigc_platform
      POSTGRES_USER: aigc_user
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_replica_data:/var/lib/postgresql/data
    ports:
      - "5433:5432"
    depends_on:
      - postgres-primary
    networks:
      - db_network

  redis-master:
    image: redis:7-alpine
    command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_master_data:/data
    ports:
      - "6379:6379"
    networks:
      - db_network

  redis-slave:
    image: redis:7-alpine
    # --masterauth is required because the master is password protected
    command: redis-server --replicaof redis-master 6379 --masterauth ${REDIS_PASSWORD} --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis_slave_data:/data
    ports:
      - "6380:6379"
    depends_on:
      - redis-master
    networks:
      - db_network

volumes:
  postgres_primary_data:
  postgres_replica_data:
  redis_master_data:
  redis_slave_data:

networks:
  db_network:
    driver: bridge

5. Monitoring and Alerting System

5.1 Prometheus Configuration

# docker-compose.monitoring.yml
version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    command:
      - "--config.file=/etc/prometheus/prometheus.yml"
      - "--storage.tsdb.path=/prometheus"
      - "--web.console.libraries=/etc/prometheus/console_libraries"
      - "--web.console.templates=/etc/prometheus/consoles"
      - "--storage.tsdb.retention.time=200h"
      - "--web.enable-lifecycle"
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    networks:
      - monitoring_network

  grafana:
    image: grafana/grafana:latest
    environment:
      GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    networks:
      - monitoring_network

  alertmanager:
    image: prom/alertmanager:latest
    command:
      - "--config.file=/etc/alertmanager/alertmanager.yml"
      - "--storage.path=/alertmanager"
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    networks:
      - monitoring_network

volumes:
  prometheus_data:
  grafana_data:
  alertmanager_data:

networks:
  monitoring_network:
    driver: bridge
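
The scrape configuration in the next section expects node-exporter instances and database exporters that are not part of the compose file above. A sketch of the additional services, assuming the standard community exporter images, is shown below; network wiring between the monitoring and database compose files still has to be resolved in your environment.

# docker-compose.monitoring.yml (additional services, sketch)
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    networks:
      - monitoring_network

  postgres-exporter:
    image: quay.io/prometheuscommunity/postgres-exporter:latest
    environment:
      DATA_SOURCE_NAME: "postgresql://aigc_user:${POSTGRES_PASSWORD}@postgres-primary:5432/aigc_platform?sslmode=disable"
    networks:
      - monitoring_network

  redis-exporter:
    image: oliver006/redis_exporter:latest
    environment:
      REDIS_ADDR: "redis://redis-master:6379"
      REDIS_PASSWORD: ${REDIS_PASSWORD}
    networks:
      - monitoring_network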

5.2 Monitoring Configuration

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: "deepseek-api"
    static_configs:
      - targets: ["deepseek1:8001", "deepseek2:8001", "deepseek3:8001"]
    metrics_path: /metrics
    scrape_interval: 10s

  - job_name: "stable-diffusion-api"
    static_configs:
      - targets: ["sd1:7860", "sd2:7860"]
    metrics_path: /metrics
    scrape_interval: 10s

  - job_name: "node-exporter"
    static_configs:
      - targets: ["node1:9100", "node2:9100", "node3:9100"]

  - job_name: "postgres"
    static_configs:
      - targets: ["postgres-primary:5432"]
    metrics_path: /metrics

  - job_name: "redis"
    static_configs:
      - targets: ["redis-master:6379"]
    metrics_path: /metrics
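
The AlertManager container mounts ./alertmanager/alertmanager.yml, which is not shown above. A minimal configuration that routes everything to a single e-mail receiver might look like the following; the SMTP host, credentials, and addresses are placeholders.

# alertmanager/alertmanager.yml
global:
  smtp_smarthost: "smtp.example.com:587"
  smtp_from: "alertmanager@example.com"
  smtp_auth_username: "alertmanager@example.com"
  smtp_auth_password: "changeme"

route:
  receiver: ops-email
  group_by: ["alertname", "instance"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: ops-email
    email_configs:
      - to: "ops-team@example.com"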

5.3 Alert Rules

# prometheus/alert_rules.yml
groups:
  - name: aigc_platform_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for 5 minutes"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% for 5 minutes"

      - alert: GPUOutOfMemory
        expr: nvidia_gpu_memory_used_bytes / nvidia_gpu_memory_total_bytes * 100 > 90
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "GPU out of memory on {{ $labels.instance }}"
          description: "GPU memory usage is above 90%"

      - alert: APIDown
        expr: up{job=~"deepseek-api|stable-diffusion-api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "API service down: {{ $labels.job }}"
          description: "API service has been down for more than 1 minute"

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time on {{ $labels.job }}"
          description: "95th percentile response time is above 2 seconds"

6. AI Services Production Environment Deployment

6.1 DeepSeek API Service (Production Environment)

# deepseek_api.py - Production Environment API Service
import os
import time
import hashlib
import logging
import torch
import redis
import json
import jwt
from datetime import datetime, timedelta
from typing import Dict, Any, Optional
from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks, Response
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import prometheus_client
from prometheus_client import Counter, Histogram, Gauge

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/deepseek_api.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

# Prometheus metrics
REQUEST_COUNT = Counter('deepseek_requests_total', 'Total requests', ['endpoint', 'status'])
REQUEST_DURATION = Histogram('deepseek_request_duration_seconds', 'Request duration')
MODEL_LOAD_TIME = Histogram('deepseek_model_load_seconds', 'Model load time')
GPU_MEMORY_USAGE = Gauge('deepseek_gpu_memory_bytes', 'GPU memory usage')

# Redis connection
redis_client = redis.Redis(
    host=os.getenv('REDIS_HOST', 'localhost'),
    port=int(os.getenv('REDIS_PORT', 6379)),
    password=os.getenv('REDIS_PASSWORD'),
    decode_responses=True
)

# JWT configuration
JWT_SECRET = os.getenv('JWT_SECRET', 'your-secret-key')
JWT_ALGORITHM = "HS256"

# Model configuration
MODEL_NAME = os.getenv('MODEL_NAME', 'deepseek-ai/deepseek-llm-7b-instruct')
MAX_LENGTH = int(os.getenv('MAX_LENGTH', 2048))
TEMPERATURE = float(os.getenv('TEMPERATURE', 0.7))

app = FastAPI(title="DeepSeek API", version="1.0.0")

# CORS configuration
app.add_middleware(
    CORSMiddleware,
    allow_origins=os.getenv('ALLOWED_ORIGINS', '*').split(','),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Security authentication
security = HTTPBearer()

class GenerateRequest(BaseModel):
    prompt: str
    max_length: Optional[int] = MAX_LENGTH
    temperature: Optional[float] = TEMPERATURE
    top_p: Optional[float] = 0.9
    top_k: Optional[int] = 50

class GenerateResponse(BaseModel):
    response: str
    tokens_used: int
    processing_time: float
    model_name: str

# Global model variables
model = None
tokenizer = None

def load_model():
    """Load model"""
    global model, tokenizer
    start_time = time.time()

    try:
        logger.info(f"Loading model: {MODEL_NAME}")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME,
            torch_dtype=torch.float16,
            device_map="auto",
            trust_remote_code=True,
            load_in_8bit=True  # 8-bit quantization to save memory
        )

        load_time = time.time() - start_time
        MODEL_LOAD_TIME.observe(load_time)
        logger.info(f"Model loaded successfully in {load_time:.2f} seconds")

    except Exception as e:
        logger.error(f"Failed to load model: {e}")
        raise

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)) -> Dict[str, Any]:
    """Verify JWT token"""
    try:
        payload = jwt.decode(credentials.credentials, JWT_SECRET, algorithms=[JWT_ALGORITHM])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

def _cache_key(prompt: str, params: Dict[str, Any]) -> str:
    """Build a deterministic cache key (Python's hash() is salted per process)"""
    digest = hashlib.sha256((prompt + json.dumps(params, sort_keys=True)).encode()).hexdigest()
    return f"deepseek:{digest}"

def get_cached_response(prompt: str, params: Dict[str, Any]) -> Optional[str]:
    """Get response from cache"""
    return redis_client.get(_cache_key(prompt, params))

def set_cached_response(prompt: str, params: Dict[str, Any], response: str, ttl: int = 3600):
    """Set cached response"""
    redis_client.setex(_cache_key(prompt, params), ttl, response)

@app.on_event("startup")
async def startup_event():
    """Load model on startup"""
    load_model()

@app.get("/health")
async def health_check():
    """Health check"""
    return {
        "status": "healthy",
        "model_loaded": model is not None,
        "gpu_available": torch.cuda.is_available(),
        "timestamp": datetime.utcnow().isoformat()
    }

@app.get("/metrics")
async def metrics():
    """Prometheus metrics"""
    return prometheus_client.generate_latest()

@app.post("/generate", response_model=GenerateResponse)
async def generate_text(
    request: GenerateRequest,
    background_tasks: BackgroundTasks,
    user_info: Dict[str, Any] = Depends(verify_token)
):
    """Generate text"""
    start_time = time.time()

    try:
        # Check user quota
        user_id = user_info.get('user_id')
        quota_key = f"quota:{user_id}:daily"
        current_usage = int(redis_client.get(quota_key) or 0)
        daily_limit = int(os.getenv('DAILY_LIMIT', 1000))

        if current_usage >= daily_limit:
            raise HTTPException(status_code=429, detail="Daily quota exceeded")

        # Check cache
        params = {
            'max_length': request.max_length,
            'temperature': request.temperature,
            'top_p': request.top_p,
            'top_k': request.top_k
        }

        cached_response = get_cached_response(request.prompt, params)
        if cached_response:
            REQUEST_COUNT.labels(endpoint='/generate', status='cache_hit').inc()
            return GenerateResponse(
                response=cached_response,
                tokens_used=0,
                processing_time=0.0,
                model_name=MODEL_NAME
            )

        # Generate response
        inputs = tokenizer(request.prompt, return_tensors="pt").to(model.device)

        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_length=request.max_length,
                temperature=request.temperature,
                top_p=request.top_p,
                top_k=request.top_k,
                do_sample=True,
                pad_token_id=tokenizer.eos_token_id
            )

        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        processing_time = time.time() - start_time

        # Update metrics
        REQUEST_DURATION.observe(processing_time)
        REQUEST_COUNT.labels(endpoint='/generate', status='success').inc()

        # Update GPU memory usage
        if torch.cuda.is_available():
            GPU_MEMORY_USAGE.set(torch.cuda.memory_allocated())

        # Cache response
        background_tasks.add_task(set_cached_response, request.prompt, params, response)

        # Update user quota
        redis_client.incr(quota_key)
        redis_client.expire(quota_key, 86400)  # 24 hours expiration

        return GenerateResponse(
            response=response,
            tokens_used=len(outputs[0]),
            processing_time=processing_time,
            model_name=MODEL_NAME
        )

    except Exception as e:
        REQUEST_COUNT.labels(endpoint='/generate', status='error').inc()
        logger.error(f"Generation error: {e}")
        raise HTTPException(status_code=500, detail=str(e))

@app.get("/quota/{user_id}")
async def get_quota(user_id: str, user_info: Dict[str, Any] = Depends(verify_token)):
    """Get user quota information"""
    quota_key = f"quota:{user_id}:daily"
    current_usage = int(redis_client.get(quota_key) or 0)
    daily_limit = int(os.getenv('DAILY_LIMIT', 1000))

    return {
        "user_id": user_id,
        "current_usage": current_usage,
        "daily_limit": daily_limit,
        "remaining": max(0, daily_limit - current_usage)
    }

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8001)

6.2 DeepSeek Docker Deployment

# Dockerfile.deepseek
FROM nvidia/cuda:12.0-devel-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV PYTHONPATH=/app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    python3-dev \
    git \
    wget \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy dependency files
COPY requirements.txt .

# Install Python dependencies
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy application code
COPY deepseek_api.py .
COPY models/ ./models/

# Create log directory
RUN mkdir -p /var/log

# Expose port
EXPOSE 8001

HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8001/health || exit 1

CMD ["python3", "deepseek_api.py"]

6.3 Stable Diffusion API Service

# stable_diffusion_api.py
import os
import time
import logging
import base64
import io
from typing import Optional
from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks, File, UploadFile, Response
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import torch
from diffusers import StableDiffusionPipeline
import prometheus_client
from prometheus_client import Counter, Histogram, Gauge
import redis
import jwt

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/sd_api.log'),
        logging.StreamHandler()
    ]
)
logger = logging.getLogger(__name__)

SD_REQUEST_COUNT = Counter('sd_requests_total', 'Total SD requests', ['endpoint', 'status'])
SD_REQUEST_DURATION = Histogram('sd_request_duration_seconds', 'SD request duration')
SD_GPU_MEMORY_USAGE = Gauge('sd_gpu_memory_bytes', 'SD GPU memory usage')

redis_client = redis.Redis(
    host=os.getenv('REDIS_HOST', 'localhost'),
    port=int(os.getenv('REDIS_PORT', 6379)),
    password=os.getenv('REDIS_PASSWORD'),
    decode_responses=True
)

# JWT configuration
JWT_SECRET = os.getenv('JWT_SECRET', 'your-secret-key')
JWT_ALGORITHM = "HS256"

app = FastAPI(title="Stable Diffusion API", version="1.0.0")

app.add_middleware(
    CORSMiddleware,
    allow_origins=os.getenv('ALLOWED_ORIGINS', '*').split(','),
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

security = HTTPBearer()

class GenerateImageRequest(BaseModel):
    prompt: str
    negative_prompt: Optional[str] = ""
    width: Optional[int] = 512
    height: Optional[int] = 512
    num_inference_steps: Optional[int] = 50
    guidance_scale: Optional[float] = 7.5
    seed: Optional[int] = None

class GenerateImageResponse(BaseModel):
    image_base64: str
    seed: int
    processing_time: float
    model_name: str

pipe = None

def load_model():
    """Load Stable Diffusion model"""
    global pipe
    try:
        logger.info("Loading Stable Diffusion model...")
        model_id = os.getenv('SD_MODEL_ID', 'runwayml/stable-diffusion-v1-5')

        pipe = StableDiffusionPipeline.from_pretrained(
            model_id,
            torch_dtype=torch.float16,
            use_safetensors=True
        )

        if torch.cuda.is_available():
            pipe = pipe.to("cuda")
            pipe.enable_attention_slicing()
            pipe.enable_vae_slicing()

        logger.info("Stable Diffusion model loaded successfully")

    except Exception as e:
        logger.error(f"Failed to load SD model: {e}")
        raise

def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
    """Verify JWT token"""
    try:
        payload = jwt.decode(credentials.credentials, JWT_SECRET, algorithms=[JWT_ALGORITHM])
        return payload
    except jwt.ExpiredSignatureError:
        raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
        raise HTTPException(status_code=401, detail="Invalid token")

@app.on_event("startup")
async def startup_event():
    """Load model on startup"""
    load_model()

@app.get("/health")
async def health_check():
    """Health check"""
    return {
        "status": "healthy",
        "model_loaded": pipe is not None,
        "gpu_available": torch.cuda.is_available(),
        "timestamp": time.time()
    }

@app.get("/metrics")
async def metrics():
    """Prometheus metrics"""
    return prometheus_client.generate_latest()

@app.post("/generate", response_model=GenerateImageResponse)
async def generate_image(
    request: GenerateImageRequest,
    user_info: dict = Depends(verify_token)
):
    """Generate image"""
    start_time = time.time()

    try:
        # Check user quota
        user_id = user_info.get('user_id')
        quota_key = f"sd_quota:{user_id}:daily"
        current_usage = int(redis_client.get(quota_key) or 0)
        daily_limit = int(os.getenv('SD_DAILY_LIMIT', 100))

        if current_usage >= daily_limit:
            raise HTTPException(status_code=429, detail="Daily SD quota exceeded")

        # Set random seed
        if request.seed is None:
            request.seed = torch.randint(0, 2**32, (1,)).item()

        # Generate image
        generator = torch.Generator("cuda" if torch.cuda.is_available() else "cpu").manual_seed(request.seed)

        image = pipe(
            prompt=request.prompt,
            negative_prompt=request.negative_prompt,
            width=request.width,
            height=request.height,
            num_inference_steps=request.num_inference_steps,
            guidance_scale=request.guidance_scale,
            generator=generator
        ).images[0]

        # Convert to base64
        buffer = io.BytesIO()
        image.save(buffer, format="PNG")
        image_base64 = base64.b64encode(buffer.getvalue()).decode()

        processing_time = time.time() - start_time

        # Update metrics
        SD_REQUEST_DURATION.observe(processing_time)
        SD_REQUEST_COUNT.labels(endpoint='/generate', status='success').inc()

        # Update GPU memory usage
        if torch.cuda.is_available():
            SD_GPU_MEMORY_USAGE.set(torch.cuda.memory_allocated())

        # Update user quota
        redis_client.incr(quota_key)
        redis_client.expire(quota_key, 86400)

        return GenerateImageResponse(
            image_base64=image_base64,
            seed=request.seed,
            processing_time=processing_time,
            model_name="stable-diffusion-v1-5"
        )

    except Exception as e:
        SD_REQUEST_COUNT.labels(endpoint='/generate', status='error').inc()
        logger.error(f"Image generation error: {e}")
        raise HTTPException(status_code=500, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=7860)

Deploy DeepSeek Locally

1. System Preparation

Update your system and install essential packages:

sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential python3-dev python3-pip git wget curl

2. Install CUDA and PyTorch (GPU Users)

For GPU acceleration, install the CUDA Toolkit and a matching PyTorch build:

# Install CUDA Toolkit (adjust version as needed)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-0

# Install PyTorch with CUDA support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

3. Install DeepSeek

Clone the DeepSeek repository and install dependencies:

# Clone DeepSeek repository
git clone https://github.com/deepseek-ai/DeepSeek.git
cd DeepSeek

# Install dependencies
pip3 install -r requirements.txt

# Install additional dependencies for inference
pip3 install transformers accelerate sentencepiece

4. Download Model Weights

Download the DeepSeek model weights:

# Create models directory
mkdir -p models
cd models

# Download DeepSeek-7B-Instruct (adjust model size as needed)
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/pytorch_model.bin
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/config.json
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/tokenizer.json
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/tokenizer_config.json
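
Direct wget of individual files can miss sharded weight files; pulling the full snapshot with the Hugging Face CLI is usually more robust (this assumes huggingface_hub is installed and the repository ID above is correct):

pip3 install -U "huggingface_hub[cli]"
huggingface-cli download deepseek-ai/deepseek-llm-7b-instruct --local-dir ./deepseek-llm-7b-instruct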

5. Create Inference Script

Create a simple inference script for testing:

# inference.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def load_model():
    model_name = "deepseek-ai/deepseek-llm-7b-instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )
    return model, tokenizer

def generate_response(prompt, model, tokenizer, max_length=512):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_length=max_length,
            temperature=0.7,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

if __name__ == "__main__":
    print("Loading DeepSeek model...")
    model, tokenizer = load_model()

    while True:
        user_input = input("\nEnter your prompt (or 'quit' to exit): ")
        if user_input.lower() == 'quit':
            break

        response = generate_response(user_input, model, tokenizer)
        print(f"\nDeepSeek Response: {response}")

6. Run DeepSeek

Execute the inference script:

python3 inference.py

7. Performance Optimization

For better performance, consider these optimizations:

# Install additional optimization libraries
pip3 install bitsandbytes accelerate

# For 4-bit quantization (reduces memory usage)
pip3 install transformers[torch] accelerate bitsandbytes
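
With bitsandbytes installed, 4-bit loading is requested through a quantization config at model load time. A sketch of the relevant change to the loading code (the parameter values are typical defaults, not tuned recommendations):

# load_4bit.py - load the model with 4-bit quantization (sketch)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "deepseek-ai/deepseek-llm-7b-instruct"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)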

8. Docker Deployment (Alternative)

Create a Dockerfile for containerized deployment:

# Dockerfile
FROM nvidia/cuda:12.0-devel-ubuntu20.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

# Install system dependencies
RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    wget \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# Copy application code
COPY . .

# Expose port for API (if using web interface)
EXPOSE 8000

# Run the application
CMD ["python3", "inference.py"]

9. Monitoring and Logging

Add monitoring capabilities:

# monitoring.py
import psutil
import torch
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def monitor_resources():
    cpu_percent = psutil.cpu_percent()
    memory = psutil.virtual_memory()
    gpu_memory = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0

    logger.info(f"CPU Usage: {cpu_percent}%")
    logger.info(f"Memory Usage: {memory.percent}%")
    logger.info(f"GPU Memory: {gpu_memory:.2f} GB")

    return {
        'cpu_percent': cpu_percent,
        'memory_percent': memory.percent,
        'gpu_memory_gb': gpu_memory
    }

10. Troubleshooting

Common issues and solutions:

Out of Memory Error:

# Reduce batch size or use model quantization
pip3 install bitsandbytes
# Use 4-bit quantization in your model loading

CUDA Version Mismatch:

# Check CUDA version
nvidia-smi
# Install matching PyTorch version
pip3 install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118

Model Download Issues:

# Use git-lfs for large files
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct

11. Production Deployment

For production use, consider:

  • Load Balancing: Use multiple model instances
  • Caching: Implement response caching
  • Rate Limiting: Add request rate limiting
  • Health Checks: Monitor model health and performance
  • Backup: Regular model weight backups

12. Integration with CI/CD

Add to your CI/CD pipeline:

# .github/workflows/deepseek-deploy.yml
name: Deploy DeepSeek

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
      - name: Test model loading
        run: |
          python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('deepseek-ai/deepseek-llm-7b-instruct')"

Deploy Stable Diffusion Locally

Prerequisites

  • Ubuntu 20.04+
  • NVIDIA GPU (8GB+ VRAM recommended)
  • Python 3.9+
  • CUDA 11.8+

1. Install Stable Diffusion

# Clone Stable Diffusion WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# Install dependencies
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -r requirements.txt

2. Download Models

# Create models directory
mkdir -p models/Stable-diffusion

# Download base model (example: Stable Diffusion 1.5)
wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.safetensors -O models/Stable-diffusion/v1-5-pruned.safetensors

3. Launch WebUI

# Launch with GPU acceleration
python3 launch.py --listen --port 7860 --enable-insecure-extension-access

4. API Integration

For programmatic access, enable the API:

python3 launch.py --api --listen --port 7860
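
With the --api flag, the WebUI exposes REST endpoints such as /sdapi/v1/txt2img. A minimal request looks like this; the payload fields shown are a small subset of what the endpoint accepts:

# webui_txt2img.py - call the AUTOMATIC1111 WebUI API (sketch)
import base64

import requests

resp = requests.post(
    "http://localhost:7860/sdapi/v1/txt2img",
    json={"prompt": "a cozy cabin in the snow", "steps": 25, "width": 512, "height": 512},
    timeout=300,
)
resp.raise_for_status()

# The API returns a list of base64-encoded images
with open("webui_output.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))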

5. Docker Deployment

# Dockerfile for Stable Diffusion
FROM nvidia/cuda:11.8-devel-ubuntu20.04

ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1

RUN apt-get update && apt-get install -y \
    python3 \
    python3-pip \
    git \
    wget \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git .

RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip3 install -r requirements.txt

EXPOSE 7860

CMD ["python3", "launch.py", "--listen", "--port", "7860", "--api"]

Conclusion

This guide provides a comprehensive approach to deploying both DeepSeek and Stable Diffusion locally on Ubuntu. The setup includes:

  • Environment preparation with proper CUDA support
  • Model installation and optimization
  • Performance monitoring and troubleshooting
  • Production deployment considerations
  • CI/CD integration for automated deployments

For production environments, consider implementing:

  • Load balancing across multiple GPU instances
  • Automated model updates and versioning
  • Comprehensive monitoring and alerting
  • Security hardening and access controls
  • Backup and disaster recovery procedures

Remember to monitor resource usage and adjust configurations based on your specific hardware and requirements.

10. Production Environment Best Practices

10.1 Performance Optimization

# System-level optimization
# Adjust kernel parameters
sudo tee /etc/sysctl.d/99-performance.conf > /dev/null <<EOF
# Increase file descriptor limits
fs.file-max = 65536

# Network optimization
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

EOF

sudo sysctl -p /etc/sysctl.d/99-performance.conf

# GPU optimization: the NVreg_* settings below are NVIDIA kernel module
# parameters, not sysctl keys, so they are configured via modprobe instead
sudo tee /etc/modprobe.d/nvidia-performance.conf > /dev/null <<EOF
options nvidia NVreg_UsePageAttributeTable=1 NVreg_EnablePCIeGen3=1
EOF

# Adjust process limits
sudo tee /etc/security/limits.d/99-aigc.conf > /dev/null <<EOF
* soft nofile 65536
* hard nofile 65536
* soft nproc 32768
* hard nproc 32768
EOF

10.2 Resource Management

# resource_manager.py - Resource Manager
import psutil
import torch
import logging
from typing import Dict, Any
import time

class ResourceManager:
    def __init__(self):
        self.logger = logging.getLogger(__name__)
        self.gpu_memory_threshold = 0.9  # 90% GPU memory usage threshold
        self.cpu_threshold = 0.8  # 80% CPU usage threshold
        self.memory_threshold = 0.85  # 85% memory usage threshold

    def get_system_resources(self) -> Dict[str, Any]:
        """Get system resource usage"""
        cpu_percent = psutil.cpu_percent(interval=1)
        memory = psutil.virtual_memory()

        gpu_info = {}
        if torch.cuda.is_available():
            for i in range(torch.cuda.device_count()):
                gpu_memory = torch.cuda.get_device_properties(i).total_memory
                gpu_memory_allocated = torch.cuda.memory_allocated(i)
                gpu_memory_usage = gpu_memory_allocated / gpu_memory

                gpu_info[f"gpu_{i}"] = {
                    "memory_total": gpu_memory,
                    "memory_allocated": gpu_memory_allocated,
                    "memory_usage": gpu_memory_usage,
                    "temperature": self._get_gpu_temperature(i)
                }

        return {
            "cpu_percent": cpu_percent,
            "memory_percent": memory.percent,
            "memory_available": memory.available,
            "gpu_info": gpu_info,
            "timestamp": time.time()
        }

    def check_resource_health(self) -> Dict[str, bool]:
        """Check resource health status"""
        resources = self.get_system_resources()
        health_status = {
            "cpu_healthy": resources["cpu_percent"] < (self.cpu_threshold * 100),
            "memory_healthy": resources["memory_percent"] < (self.memory_threshold * 100),
            "gpu_healthy": True
        }

        # Check GPU health status
        for gpu_id, gpu_data in resources["gpu_info"].items():
            if gpu_data["memory_usage"] > self.gpu_memory_threshold:
                health_status["gpu_healthy"] = False
                self.logger.warning(f"GPU {gpu_id} memory usage high: {gpu_data['memory_usage']:.2%}")

        return health_status

    def _get_gpu_temperature(self, gpu_id: int) -> float:
        """Get GPU temperature"""
        try:
            import subprocess
            result = subprocess.run(
                ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits", "-i", str(gpu_id)],
                capture_output=True, text=True
            )
            return float(result.stdout.strip())
        except Exception:
            return 0.0

    def cleanup_gpu_memory(self):
        """Clean up GPU memory"""
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            self.logger.info("GPU memory cleaned up")

# Usage example
resource_manager = ResourceManager()

# Periodically check resources
def monitor_resources():
    while True:
        health = resource_manager.check_resource_health()
        if not all(health.values()):
            resource_manager.cleanup_gpu_memory()
        time.sleep(60)  # Check every minute

10.3 Error Handling and Recovery

# error_handler.py - Error Handling and Recovery
import logging
import time
import traceback
from functools import wraps
from typing import Any, Callable, Dict
import redis
import requests

class ErrorHandler:
    def __init__(self, redis_client: redis.Redis):
        self.logger = logging.getLogger(__name__)
        self.redis_client = redis_client
        self.max_retries = 3
        self.retry_delay = 1

    def retry_on_failure(self, max_retries: int = None, delay: float = None):
        """Retry decorator"""
        def decorator(func: Callable) -> Callable:
            @wraps(func)
            def wrapper(*args, **kwargs) -> Any:
                retries = max_retries or self.max_retries
                retry_delay = delay or self.retry_delay

                for attempt in range(retries + 1):
                    try:
                        return func(*args, **kwargs)
                    except Exception as e:
                        if attempt == retries:
                            self.logger.error(f"Function {func.__name__} failed after {retries} retries: {e}")
                            raise

                        self.logger.warning(f"Attempt {attempt + 1} failed for {func.__name__}: {e}")
                        time.sleep(retry_delay * (2 ** attempt))  # Exponential backoff

            return wrapper
        return decorator

    def handle_model_loading_error(self, model_name: str) -> bool:
        """Handle model loading error"""
        try:
            # Record error
            error_key = f"model_error:{model_name}:{int(time.time())}"
            self.redis_client.setex(error_key, 3600, "model_loading_failed")

            # Try to reload model
            self.logger.info(f"Attempting to reload model: {model_name}")

            # Add model reloading logic here
            return True

        except Exception as e:
            self.logger.error(f"Failed to handle model loading error: {e}")
            return False

    def handle_api_error(self, endpoint: str, error: Exception) -> Dict[str, Any]:
        """Handle API error"""
        error_info = {
            "endpoint": endpoint,
            "error_type": type(error).__name__,
            "error_message": str(error),
            "timestamp": time.time(),
            "traceback": traceback.format_exc()
        }

        # Log error
        self.logger.error(f"API Error: {error_info}")

        # Store error info in Redis
        error_key = f"api_error:{endpoint}:{int(time.time())}"
        self.redis_client.setex(error_key, 3600, str(error_info))

        return error_info

# Usage example
redis_client = redis.Redis(host="localhost", port=6379)
error_handler = ErrorHandler(redis_client)

@error_handler.retry_on_failure(max_retries=3, delay=1)
def generate_text_with_retry(prompt: str, model, tokenizer):
    """Text generation with retry"""
    return model.generate(prompt, tokenizer)

11. Deployment Checklist

11.1 Pre-deployment Check

  • Hardware Check

    • GPU driver and CUDA version compatible
    • Memory capacity meets requirements
    • Storage space sufficient
    • Network bandwidth meets requirements
  • Software Check

    • Operating system version correct
    • Docker and Docker Compose installed
    • All dependencies installed
    • Firewall configuration correct
  • Security Check

    • SSL certificate configured
    • Firewall rules set
    • User permissions configured
    • Keys and passwords generated

11.2 Post-deployment Verification

  • Service Health Check

    • All containers running normally
    • API endpoints accessible
    • Database connection normal
    • Monitoring system working properly
  • Performance Test

    • API response time test
    • Concurrent user test
    • Memory usage monitoring
    • GPU utilization check
  • Security Test

    • Identity authentication test
    • Permission control test
    • Data encryption verification
    • Network security scan

11.3 Operations Monitoring

  • Daily Monitoring

    • System resource usage
    • API response time
    • Error rate statistics
    • User activity logs
  • Regular Maintenance

    • Log file rotation
    • Database backup
    • System updates
    • Performance optimization

12. Troubleshooting Guide

12.1 Common Issues

Problem 1: GPU Memory Insufficient

# Solution
# 1. Check GPU memory usage
nvidia-smi

# 2. Clean GPU memory (empty_cache() in a fresh interpreter only affects that
#    process; memory held by the API containers is reclaimed by restarting them)
python3 -c "import torch; torch.cuda.empty_cache()"

# 3. Restart AI services
docker restart deepseek-api-1 sd-api-1

Problem 2: API Response Slow

# Solution
# 1. Check system load
htop

# 2. Check network latency
ping your-api-endpoint

# 3. Check Redis cache
redis-cli info memory

# 4. Restart load balancer
sudo systemctl restart haproxy

Problem 3: Database Connection Failed

# Solution
# 1. Check database status
docker ps | grep postgres

# 2. Check database logs
docker logs postgres-primary

# 3. Restart database service
docker-compose -f docker-compose.db.yml restart

12.2 Emergency Recovery Process

#!/bin/bash
# emergency_recovery.sh - Emergency recovery script

echo "Starting emergency recovery process..."

# 1. Stop all services
docker-compose down

# 2. Clean up resources
docker system prune -f
python3 -c "import torch; torch.cuda.empty_cache()"

# 3. Restart infrastructure
docker-compose -f docker-compose.db.yml up -d
sleep 30

# 4. Restart AI services
docker-compose -f docker-compose.ai.yml up -d

# 5. Verify service status
./health_check.sh

echo "Emergency recovery completed"

13. Summary

This guide provides a complete production environment deployment solution for enterprise AIGC platforms, including:

13.1 Core Features

  • High Availability: Multi-instance deployment, load balancing, automatic fault recovery
  • Security: Identity authentication, access control, data encryption, network security
  • Scalability: Horizontal scaling, elastic resource scaling, microservice architecture
  • Monitoring & Alerting: Comprehensive system monitoring, performance metrics, anomaly alerts
  • Operations Management: Automated deployment, log management, backup and recovery

13.2 Technology Stack

  • Containerization: Docker + Docker Compose
  • Load Balancing: HAProxy
  • API Gateway: Kong
  • Identity Authentication: Keycloak
  • Database: PostgreSQL + Redis
  • Monitoring: Prometheus + Grafana + AlertManager
  • Logging: ELK Stack
  • AI Framework: PyTorch + Transformers + Diffusers

13.3 Best Practices

  • Security First: Multi-layer security protection, regular security audits
  • Performance Optimization: Resource monitoring, auto-scaling, caching strategies
  • Fault Recovery: Automated recovery processes, data backup, disaster recovery
  • Operations Automation: CI/CD pipeline, monitoring alerts, log analysis

13.4 Future Optimization Suggestions

  1. Performance Optimization

    • Implement model quantization optimization
    • Add distributed inference support
    • Optimize caching strategies
  2. Feature Extension

    • Support more AI models
    • Add batch processing capabilities
    • Implement model version management
  3. Operations Improvement

    • Implement blue-green deployment
    • Add A/B testing support
    • Improve monitoring dashboards
  4. Security Enhancement

    • Implement zero-trust architecture
    • Add threat detection
    • Improve audit logging

By following this guide, you can build a stable, secure, and efficient enterprise AIGC platform that provides reliable AI services for internal employees.

