AIGC Platform: DeepSeek & Stable Diffusion Production Deployment
1. Production Environment Architecture Overview
This document provides a complete deployment solution for an enterprise AIGC platform serving internal employees in production, with a focus on:
- High Availability: Multi-instance load balancing, automatic fault recovery
- Security: Identity authentication, access control, data encryption
- Scalability: Horizontal scaling, elastic resource scaling
- Monitoring & Alerting: Comprehensive system monitoring and anomaly alerts
- Compliance: Data privacy protection, audit logs
System Architecture
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Load Balancer │ │ API Gateway │ │ Auth Service │
│ (Nginx/HAProxy)│ │ (Kong/Traefik) │ │ (Keycloak) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌───────────────────────┼───────────────────────┐
│ │ │
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ DeepSeek API │ │ Stable Diffusion│ │ Monitoring │
│ (Multiple) │ │ API (Multiple)│ │ (Prometheus) │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌─────────────────┐
│ Storage Layer │
│ (Redis + MinIO) │
└─────────────────┘
2. Production Environment Infrastructure Preparation
2.1 Hardware Requirements
Minimum Configuration (Development/Test Environment):
- CPU: 16-core AMD EPYC or Intel Xeon
- Memory: 64GB DDR4 ECC
- GPU: 2x NVIDIA RTX 4090 (24GB VRAM) or 1x NVIDIA A100 (40GB VRAM)
- Storage: 2TB NVMe SSD (RAID 1)
- Network: 10Gbps network interface
Production Environment Recommended Configuration:
- CPU: 32-core AMD EPYC or Intel Xeon
- Memory: 128GB DDR4 ECC
- GPU: 4x NVIDIA A100 (40GB VRAM) or 8x NVIDIA RTX 4090
- Storage: 4TB NVMe SSD (RAID 10) + 10TB HDD (backup)
- Network: 25Gbps network interface
2.2 Software Environment
- Operating System: Ubuntu 22.04 LTS (Long Term Support version)
- Container Platform: Docker 24.0+ or Kubernetes 1.28+
- Orchestration Tools: Docker Compose or Helm
- Database: PostgreSQL 15+ (user management) + Redis 7+ (caching)
- Monitoring: Prometheus + Grafana + AlertManager
- Logging: ELK Stack (Elasticsearch + Logstash + Kibana)
- Security: Vault (key management) + OpenVPN (VPN access)
2.3 Network Security Requirements
- Firewall: UFW or iptables configuration
- SSL/TLS: Let’s Encrypt or enterprise certificates
- VPN: OpenVPN or WireGuard
- Access Control: Role-based access control (RBAC)
- Audit Logs: Complete operation audit records
3. Security Configuration and Identity Authentication
3.1 System Security Hardening
# System update and security hardening
sudo apt update && sudo apt upgrade -y
sudo apt install -y ufw fail2ban unattended-upgrades
# Configure firewall
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow ssh
sudo ufw allow 443/tcp # HTTPS
sudo ufw allow 80/tcp # HTTP (redirect to HTTPS)
sudo ufw enable
# Configure automatic security updates
sudo dpkg-reconfigure -plow unattended-upgrades
3.2 Identity Authentication System (Keycloak)
# docker-compose.auth.yml
version: "3.8"
services:
postgres:
image: postgres:15
environment:
POSTGRES_DB: keycloak
POSTGRES_USER: keycloak
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
volumes:
- postgres_data:/var/lib/postgresql/data
networks:
- auth_network
keycloak:
image: quay.io/keycloak/keycloak:23.0
environment:
KC_DB: postgres
KC_DB_URL: jdbc:postgresql://postgres:5432/keycloak
KC_DB_USERNAME: keycloak
KC_DB_PASSWORD: ${POSTGRES_PASSWORD}
KEYCLOAK_ADMIN: admin
KEYCLOAK_ADMIN_PASSWORD: ${KEYCLOAK_ADMIN_PASSWORD}
    command: start-dev  # convenient for first-time setup; in production use 'start' with proper hostname and TLS configuration
ports:
- "8080:8080"
depends_on:
- postgres
networks:
- auth_network
volumes:
postgres_data:
networks:
auth_network:
driver: bridge
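Keycloak issues the JWTs that the rest of the platform validates. Below is a minimal Python sketch of fetching a token from a realm's token endpoint; the realm (aigc), client ID (aigc-platform), and credentials are examples you would create in the Keycloak admin console, and the password grant requires "Direct access grants" to be enabled on that client.
# get_token.py - fetch an access token from Keycloak (realm/client names are examples)
import requests

KEYCLOAK_URL = "http://localhost:8080"   # Keycloak from docker-compose.auth.yml
REALM = "aigc"                           # example realm
TOKEN_ENDPOINT = f"{KEYCLOAK_URL}/realms/{REALM}/protocol/openid-connect/token"

def get_access_token(username: str, password: str) -> str:
    """Exchange user credentials for an access token (Resource Owner Password flow)."""
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={
            "grant_type": "password",
            "client_id": "aigc-platform",   # example client
            "username": username,
            "password": password,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

if __name__ == "__main__":
    token = get_access_token("alice", "secret")
    print(token[:40], "...")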
3.3 API Gateway Configuration (Kong)
# docker-compose.gateway.yml
version: "3.8"
services:
kong:
image: kong:3.4
environment:
KONG_DATABASE: postgres
KONG_PG_HOST: kong-database
KONG_PG_USER: kong
KONG_PG_PASSWORD: ${KONG_PG_PASSWORD}
KONG_PROXY_ACCESS_LOG: /dev/stdout
KONG_ADMIN_ACCESS_LOG: /dev/stdout
KONG_PROXY_ERROR_LOG: /dev/stderr
KONG_ADMIN_ERROR_LOG: /dev/stderr
KONG_ADMIN_LISTEN: 0.0.0.0:8001
KONG_ADMIN_GUI_URL: http://localhost:8002
ports:
- "8000:8000"
- "8443:8443"
- "8001:8001"
- "8444:8444"
depends_on:
- kong-database
networks:
- gateway_network
kong-database:
image: postgres:15
environment:
POSTGRES_USER: kong
POSTGRES_DB: kong
POSTGRES_PASSWORD: ${KONG_PG_PASSWORD}
volumes:
- kong_data:/var/lib/postgresql/data
networks:
- gateway_network
volumes:
kong_data:
networks:
gateway_network:
driver: bridge
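Kong is configured at runtime through its Admin API on port 8001; the compose file above only starts the gateway (note that a database-backed Kong also needs `kong migrations bootstrap` run once, which is not shown). A minimal sketch that registers the two AI backends as services with path-based routes; the service names and upstream addresses are examples and should match your deployment.
# kong_setup.py - register AI backends with Kong via the Admin API (names/URLs are examples)
import requests

KONG_ADMIN = "http://localhost:8001"

def register(name: str, upstream_url: str, path: str) -> None:
    """Create or update a Kong service and a path-based route for it."""
    # Service pointing at the internal upstream
    requests.put(f"{KONG_ADMIN}/services/{name}", data={"url": upstream_url}, timeout=10).raise_for_status()
    # Route matching the public path prefix
    requests.put(
        f"{KONG_ADMIN}/services/{name}/routes/{name}-route",
        data={"paths[]": path, "strip_path": "false"},
        timeout=10,
    ).raise_for_status()

if __name__ == "__main__":
    register("deepseek", "http://10.0.1.10:8001", "/api/deepseek")
    register("stable-diffusion", "http://10.0.1.20:7860", "/api/sd")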
4. High Availability Configuration
4.1 Load Balancer (HAProxy)
# Install HAProxy
sudo apt install -y haproxy
# Configure HAProxy
sudo tee /etc/haproxy/haproxy.cfg > /dev/null <<EOF
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
user haproxy
group haproxy
daemon
defaults
log global
mode http
option httplog
option dontlognull
timeout connect 5000
timeout client 50000
timeout server 50000
frontend http_front
bind *:80
bind *:443 ssl crt /etc/ssl/certs/aigc-platform.pem
redirect scheme https if !{ ssl_fc }
# Health check
http-request add-header X-Forwarded-Proto https if { ssl_fc }
# Route to backend services
use_backend deepseek_backend if { path_beg /api/deepseek }
use_backend stable_diffusion_backend if { path_beg /api/sd }
use_backend web_backend if { path_beg / }
backend deepseek_backend
balance roundrobin
option httpchk GET /health
server deepseek1 10.0.1.10:8001 check
server deepseek2 10.0.1.11:8001 check
server deepseek3 10.0.1.12:8001 check
backend stable_diffusion_backend
balance roundrobin
option httpchk GET /health
server sd1 10.0.1.20:7860 check
server sd2 10.0.1.21:7860 check
backend web_backend
balance roundrobin
server web1 10.0.1.30:3000 check
server web2 10.0.1.31:3000 check
listen stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
stats auth admin:${HAPROXY_STATS_PASSWORD}
EOF
sudo systemctl enable haproxy
sudo systemctl start haproxy
4.2 Database High Availability (PostgreSQL)
# docker-compose.db.yml
version: "3.8"
services:
postgres-primary:
image: postgres:15
environment:
POSTGRES_DB: aigc_platform
POSTGRES_USER: aigc_user
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
POSTGRES_INITDB_ARGS: "--encoding=UTF-8 --lc-collate=C --lc-ctype=C"
volumes:
- postgres_primary_data:/var/lib/postgresql/data
- ./postgres/init:/docker-entrypoint-initdb.d
ports:
- "5432:5432"
networks:
- db_network
postgres-replica:
image: postgres:15
environment:
POSTGRES_DB: aigc_platform
POSTGRES_USER: aigc_user
POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
volumes:
- postgres_replica_data:/var/lib/postgresql/data
ports:
- "5433:5432"
depends_on:
- postgres-primary
networks:
- db_network
redis-master:
image: redis:7-alpine
command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}
volumes:
- redis_master_data:/data
ports:
- "6379:6379"
networks:
- db_network
redis-slave:
image: redis:7-alpine
command: redis-server --slaveof redis-master 6379 --requirepass ${REDIS_PASSWORD}
volumes:
- redis_slave_data:/data
ports:
- "6380:6379"
depends_on:
- redis-master
networks:
- db_network
volumes:
postgres_primary_data:
postgres_replica_data:
redis_master_data:
redis_slave_data:
networks:
db_network:
driver: bridge
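The layout above implies a read/write split in application code: writes go to the primary PostgreSQL instance and the Redis master, while reads can be served from the replicas. Note that the replica containers as written start as independent instances; real streaming replication needs additional setup (for example pg_basebackup and primary_conninfo) that is not shown here. A minimal sketch of the split, assuming psycopg2 and redis-py are installed and a hypothetical request_log table exists:
# db_connections.py - route writes to the primary and reads to the replica (sketch)
import os
import psycopg2
import redis

PG_WRITER = psycopg2.connect(
    host="postgres-primary", port=5432,
    dbname="aigc_platform", user="aigc_user",
    password=os.environ["POSTGRES_PASSWORD"],
)
PG_READER = psycopg2.connect(
    host="postgres-replica", port=5432,   # 5433 when connecting from the host
    dbname="aigc_platform", user="aigc_user",
    password=os.environ["POSTGRES_PASSWORD"],
)

REDIS_WRITER = redis.Redis(host="redis-master", port=6379, password=os.environ["REDIS_PASSWORD"])
REDIS_READER = redis.Redis(host="redis-slave", port=6379, password=os.environ["REDIS_PASSWORD"])

def record_request(user_id: str) -> None:
    """Writes go to the primary Postgres and the Redis master."""
    with PG_WRITER, PG_WRITER.cursor() as cur:
        cur.execute("INSERT INTO request_log (user_id) VALUES (%s)", (user_id,))  # hypothetical table
    REDIS_WRITER.incr(f"requests:{user_id}")

def request_count(user_id: str) -> int:
    """Reads can be served from the replicas."""
    return int(REDIS_READER.get(f"requests:{user_id}") or 0)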
5. Monitoring and Alerting System
5.1 Prometheus Configuration
# docker-compose.monitoring.yml
version: "3.8"
services:
prometheus:
image: prom/prometheus:latest
command:
- "--config.file=/etc/prometheus/prometheus.yml"
- "--storage.tsdb.path=/prometheus"
- "--web.console.libraries=/etc/prometheus/console_libraries"
- "--web.console.templates=/etc/prometheus/consoles"
- "--storage.tsdb.retention.time=200h"
- "--web.enable-lifecycle"
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus_data:/prometheus
networks:
- monitoring_network
grafana:
image: grafana/grafana:latest
environment:
GF_SECURITY_ADMIN_PASSWORD: ${GRAFANA_PASSWORD}
ports:
- "3000:3000"
volumes:
- grafana_data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning
networks:
- monitoring_network
alertmanager:
image: prom/alertmanager:latest
command:
- "--config.file=/etc/alertmanager/alertmanager.yml"
- "--storage.path=/alertmanager"
ports:
- "9093:9093"
volumes:
- ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
- alertmanager_data:/alertmanager
networks:
- monitoring_network
volumes:
prometheus_data:
grafana_data:
alertmanager_data:
networks:
monitoring_network:
driver: bridge
5.2 Monitoring Configuration
# prometheus/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "alert_rules.yml"
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
scrape_configs:
- job_name: "deepseek-api"
static_configs:
- targets: ["deepseek1:8001", "deepseek2:8001", "deepseek3:8001"]
metrics_path: /metrics
scrape_interval: 10s
- job_name: "stable-diffusion-api"
static_configs:
- targets: ["sd1:7860", "sd2:7860"]
metrics_path: /metrics
scrape_interval: 10s
- job_name: "node-exporter"
static_configs:
- targets: ["node1:9100", "node2:9100", "node3:9100"]
- job_name: "postgres"
static_configs:
- targets: ["postgres-primary:5432"]
metrics_path: /metrics
- job_name: "redis"
static_configs:
- targets: ["redis-master:6379"]
metrics_path: /metrics
5.3 Alert Rules
# prometheus/alert_rules.yml
groups:
- name: aigc_platform_alerts
rules:
- alert: HighCPUUsage
expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
for: 5m
labels:
severity: warning
annotations:
summary: "High CPU usage on {{ $labels.instance }}"
description: "CPU usage is above 80% for 5 minutes"
- alert: HighMemoryUsage
expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
for: 5m
labels:
severity: warning
annotations:
summary: "High memory usage on {{ $labels.instance }}"
description: "Memory usage is above 85% for 5 minutes"
- alert: GPUOutOfMemory
expr: nvidia_gpu_memory_used_bytes / nvidia_gpu_memory_total_bytes * 100 > 90
for: 2m
labels:
severity: critical
annotations:
summary: "GPU out of memory on {{ $labels.instance }}"
description: "GPU memory usage is above 90%"
- alert: APIDown
expr: up{job=~"deepseek-api|stable-diffusion-api"} == 0
for: 1m
labels:
severity: critical
annotations:
summary: "API service down: {{ $labels.job }}"
description: "API service has been down for more than 1 minute"
- alert: HighResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "High response time on {{ $labels.job }}"
description: "95th percentile response time is above 2 seconds"
6. AI Services Production Environment Deployment
6.1 DeepSeek API Service (Production Environment)
# deepseek_api.py - Production Environment API Service
import os
import time
import logging
import torch
import redis
import json
import hashlib
import jwt
from datetime import datetime, timedelta
from typing import Dict, Any, Optional
from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks, Response
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from transformers import AutoTokenizer, AutoModelForCausalLM
import prometheus_client
from prometheus_client import Counter, Histogram, Gauge
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('/var/log/deepseek_api.log'),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
# Prometheus metrics
REQUEST_COUNT = Counter('deepseek_requests_total', 'Total requests', ['endpoint', 'status'])
REQUEST_DURATION = Histogram('deepseek_request_duration_seconds', 'Request duration')
MODEL_LOAD_TIME = Histogram('deepseek_model_load_seconds', 'Model load time')
GPU_MEMORY_USAGE = Gauge('deepseek_gpu_memory_bytes', 'GPU memory usage')
# Redis connection
redis_client = redis.Redis(
host=os.getenv('REDIS_HOST', 'localhost'),
port=int(os.getenv('REDIS_PORT', 6379)),
password=os.getenv('REDIS_PASSWORD'),
decode_responses=True
)
# JWT configuration
JWT_SECRET = os.getenv('JWT_SECRET', 'your-secret-key')
JWT_ALGORITHM = "HS256"
# Model configuration
MODEL_NAME = os.getenv('MODEL_NAME', 'deepseek-ai/deepseek-llm-7b-instruct')
MAX_LENGTH = int(os.getenv('MAX_LENGTH', 2048))
TEMPERATURE = float(os.getenv('TEMPERATURE', 0.7))
app = FastAPI(title="DeepSeek API", version="1.0.0")
# CORS configuration
app.add_middleware(
CORSMiddleware,
allow_origins=os.getenv('ALLOWED_ORIGINS', '*').split(','),
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
# Security authentication
security = HTTPBearer()
class GenerateRequest(BaseModel):
prompt: str
max_length: Optional[int] = MAX_LENGTH
temperature: Optional[float] = TEMPERATURE
top_p: Optional[float] = 0.9
top_k: Optional[int] = 50
class GenerateResponse(BaseModel):
response: str
tokens_used: int
processing_time: float
model_name: str
# Global model variables
model = None
tokenizer = None
def load_model():
"""Load model"""
global model, tokenizer
start_time = time.time()
try:
logger.info(f"Loading model: {MODEL_NAME}")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
MODEL_NAME,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True,
load_in_8bit=True # 8-bit quantization to save memory
)
load_time = time.time() - start_time
MODEL_LOAD_TIME.observe(load_time)
logger.info(f"Model loaded successfully in {load_time:.2f} seconds")
except Exception as e:
logger.error(f"Failed to load model: {e}")
raise
def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)) -> Dict[str, Any]:
"""Verify JWT token"""
try:
payload = jwt.decode(credentials.credentials, JWT_SECRET, algorithms=[JWT_ALGORITHM])
return payload
except jwt.ExpiredSignatureError:
raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid token")
def get_cached_response(prompt: str, params: Dict[str, Any]) -> Optional[str]:
    """Get response from cache (sha256 keys are stable across processes, unlike built-in hash())"""
    cache_key = "deepseek:" + hashlib.sha256((prompt + str(params)).encode()).hexdigest()
    return redis_client.get(cache_key)
def set_cached_response(prompt: str, params: Dict[str, Any], response: str, ttl: int = 3600):
    """Set cached response"""
    cache_key = "deepseek:" + hashlib.sha256((prompt + str(params)).encode()).hexdigest()
    redis_client.setex(cache_key, ttl, response)
@app.on_event("startup")
async def startup_event():
"""Load model on startup"""
load_model()
@app.get("/health")
async def health_check():
"""Health check"""
return {
"status": "healthy",
"model_loaded": model is not None,
"gpu_available": torch.cuda.is_available(),
"timestamp": datetime.utcnow().isoformat()
}
@app.get("/metrics")
async def metrics():
"""Prometheus metrics"""
    return Response(content=prometheus_client.generate_latest(), media_type=prometheus_client.CONTENT_TYPE_LATEST)
@app.post("/generate", response_model=GenerateResponse)
async def generate_text(
request: GenerateRequest,
background_tasks: BackgroundTasks,
user_info: Dict[str, Any] = Depends(verify_token)
):
"""Generate text"""
start_time = time.time()
try:
# Check user quota
user_id = user_info.get('user_id')
quota_key = f"quota:{user_id}:daily"
current_usage = int(redis_client.get(quota_key) or 0)
daily_limit = int(os.getenv('DAILY_LIMIT', 1000))
if current_usage >= daily_limit:
raise HTTPException(status_code=429, detail="Daily quota exceeded")
# Check cache
params = {
'max_length': request.max_length,
'temperature': request.temperature,
'top_p': request.top_p,
'top_k': request.top_k
}
cached_response = get_cached_response(request.prompt, params)
if cached_response:
REQUEST_COUNT.labels(endpoint='/generate', status='cache_hit').inc()
return GenerateResponse(
response=cached_response,
tokens_used=0,
processing_time=0.0,
model_name=MODEL_NAME
)
# Generate response
inputs = tokenizer(request.prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_length=request.max_length,
temperature=request.temperature,
top_p=request.top_p,
top_k=request.top_k,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
processing_time = time.time() - start_time
# Update metrics
REQUEST_DURATION.observe(processing_time)
REQUEST_COUNT.labels(endpoint='/generate', status='success').inc()
# Update GPU memory usage
if torch.cuda.is_available():
GPU_MEMORY_USAGE.set(torch.cuda.memory_allocated())
# Cache response
background_tasks.add_task(set_cached_response, request.prompt, params, response)
# Update user quota
redis_client.incr(quota_key)
redis_client.expire(quota_key, 86400) # 24 hours expiration
return GenerateResponse(
response=response,
tokens_used=len(outputs[0]),
processing_time=processing_time,
model_name=MODEL_NAME
)
except Exception as e:
REQUEST_COUNT.labels(endpoint='/generate', status='error').inc()
logger.error(f"Generation error: {e}")
raise HTTPException(status_code=500, detail=str(e))
@app.get("/quota/{user_id}")
async def get_quota(user_id: str, user_info: Dict[str, Any] = Depends(verify_token)):
"""Get user quota information"""
quota_key = f"quota:{user_id}:daily"
current_usage = int(redis_client.get(quota_key) or 0)
daily_limit = int(os.getenv('DAILY_LIMIT', 1000))
return {
"user_id": user_id,
"current_usage": current_usage,
"daily_limit": daily_limit,
"remaining": max(0, daily_limit - current_usage)
}
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8001)
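A short client sketch for exercising the service above. It signs a short-lived token with the same JWT_SECRET the API expects (in production the token would come from Keycloak instead) and calls /generate; the user_id claim is whatever identifier your quota tracking uses.
# deepseek_client.py - minimal client for the /generate endpoint (sketch)
import os
import datetime
import jwt
import requests

API_URL = "http://localhost:8001"
JWT_SECRET = os.getenv("JWT_SECRET", "your-secret-key")

def make_token(user_id: str) -> str:
    """Create a token that the API's verify_token() will accept (for testing only)."""
    payload = {
        "user_id": user_id,
        "exp": datetime.datetime.utcnow() + datetime.timedelta(hours=1),
    }
    return jwt.encode(payload, JWT_SECRET, algorithm="HS256")

def generate(prompt: str, user_id: str = "test-user") -> str:
    headers = {"Authorization": f"Bearer {make_token(user_id)}"}
    resp = requests.post(
        f"{API_URL}/generate",
        json={"prompt": prompt, "max_length": 512, "temperature": 0.7},
        headers=headers,
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

if __name__ == "__main__":
    print(generate("Summarize the benefits of response caching in two sentences."))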
6.2 DeepSeek Docker Deployment
# Dockerfile.deepseek
FROM nvidia/cuda:12.0.1-devel-ubuntu22.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
ENV PYTHONPATH=/app
# Install system dependencies
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
python3-dev \
git \
wget \
curl \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy dependency files
COPY requirements.txt .
# Install Python dependencies
RUN pip3 install --no-cache-dir -r requirements.txt
# Copy application code
COPY deepseek_api.py .
COPY models/ ./models/
# Create log directory
RUN mkdir -p /var/log
# Expose port
EXPOSE 8001
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
CMD curl -f http://localhost:8001/health || exit 1
CMD ["python3", "deepseek_api.py"]
6.3 Stable Diffusion API Service
# stable_diffusion_api.py
import os
import time
import logging
import base64
import io
from typing import Optional
from fastapi import FastAPI, HTTPException, Depends, BackgroundTasks, File, UploadFile, Response
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
import torch
from diffusers import StableDiffusionPipeline
import prometheus_client
from prometheus_client import Counter, Histogram, Gauge
import redis
import jwt
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler('/var/log/sd_api.log'),
logging.StreamHandler()
]
)
logger = logging.getLogger(__name__)
SD_REQUEST_COUNT = Counter('sd_requests_total', 'Total SD requests', ['endpoint', 'status'])
SD_REQUEST_DURATION = Histogram('sd_request_duration_seconds', 'SD request duration')
SD_GPU_MEMORY_USAGE = Gauge('sd_gpu_memory_bytes', 'SD GPU memory usage')
redis_client = redis.Redis(
host=os.getenv('REDIS_HOST', 'localhost'),
port=int(os.getenv('REDIS_PORT', 6379)),
password=os.getenv('REDIS_PASSWORD'),
decode_responses=True
)
# JWT configuration
JWT_SECRET = os.getenv('JWT_SECRET', 'your-secret-key')
JWT_ALGORITHM = "HS256"
app = FastAPI(title="Stable Diffusion API", version="1.0.0")
app.add_middleware(
CORSMiddleware,
allow_origins=os.getenv('ALLOWED_ORIGINS', '*').split(','),
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
security = HTTPBearer()
class GenerateImageRequest(BaseModel):
prompt: str
negative_prompt: Optional[str] = ""
width: Optional[int] = 512
height: Optional[int] = 512
num_inference_steps: Optional[int] = 50
guidance_scale: Optional[float] = 7.5
seed: Optional[int] = None
class GenerateImageResponse(BaseModel):
image_base64: str
seed: int
processing_time: float
model_name: str
pipe = None
def load_model():
"""Load Stable Diffusion model"""
global pipe
try:
logger.info("Loading Stable Diffusion model...")
model_id = os.getenv('SD_MODEL_ID', 'runwayml/stable-diffusion-v1-5')
pipe = StableDiffusionPipeline.from_pretrained(
model_id,
torch_dtype=torch.float16,
use_safetensors=True
)
if torch.cuda.is_available():
pipe = pipe.to("cuda")
pipe.enable_attention_slicing()
pipe.enable_vae_slicing()
logger.info("Stable Diffusion model loaded successfully")
except Exception as e:
logger.error(f"Failed to load SD model: {e}")
raise
def verify_token(credentials: HTTPAuthorizationCredentials = Depends(security)):
"""Verify JWT token"""
try:
payload = jwt.decode(credentials.credentials, JWT_SECRET, algorithms=[JWT_ALGORITHM])
return payload
except jwt.ExpiredSignatureError:
raise HTTPException(status_code=401, detail="Token expired")
    except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid token")
@app.on_event("startup")
async def startup_event():
"""Load model on startup"""
load_model()
@app.get("/health")
async def health_check():
"""Health check"""
return {
"status": "healthy",
"model_loaded": pipe is not None,
"gpu_available": torch.cuda.is_available(),
"timestamp": time.time()
}
@app.get("/metrics")
async def metrics():
"""Prometheus metrics"""
    return Response(content=prometheus_client.generate_latest(), media_type=prometheus_client.CONTENT_TYPE_LATEST)
@app.post("/generate", response_model=GenerateImageResponse)
async def generate_image(
request: GenerateImageRequest,
user_info: dict = Depends(verify_token)
):
"""Generate image"""
start_time = time.time()
try:
# Check user quota
user_id = user_info.get('user_id')
quota_key = f"sd_quota:{user_id}:daily"
current_usage = int(redis_client.get(quota_key) or 0)
daily_limit = int(os.getenv('SD_DAILY_LIMIT', 100))
if current_usage >= daily_limit:
raise HTTPException(status_code=429, detail="Daily SD quota exceeded")
# Set random seed
if request.seed is None:
request.seed = torch.randint(0, 2**32, (1,)).item()
# Generate image
generator = torch.Generator("cuda" if torch.cuda.is_available() else "cpu").manual_seed(request.seed)
image = pipe(
prompt=request.prompt,
negative_prompt=request.negative_prompt,
width=request.width,
height=request.height,
num_inference_steps=request.num_inference_steps,
guidance_scale=request.guidance_scale,
generator=generator
).images[0]
# Convert to base64
buffer = io.BytesIO()
image.save(buffer, format="PNG")
image_base64 = base64.b64encode(buffer.getvalue()).decode()
processing_time = time.time() - start_time
# Update metrics
SD_REQUEST_DURATION.observe(processing_time)
SD_REQUEST_COUNT.labels(endpoint='/generate', status='success').inc()
# Update GPU memory usage
if torch.cuda.is_available():
SD_GPU_MEMORY_USAGE.set(torch.cuda.memory_allocated())
# Update user quota
redis_client.incr(quota_key)
redis_client.expire(quota_key, 86400)
return GenerateImageResponse(
image_base64=image_base64,
seed=request.seed,
processing_time=processing_time,
model_name="stable-diffusion-v1-5"
)
except Exception as e:
SD_REQUEST_COUNT.labels(endpoint='/generate', status='error').inc()
logger.error(f"Image generation error: {e}")
raise HTTPException(status_code=500, detail=str(e))
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=7860)
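And a matching client sketch for the image endpoint: it authenticates the same way, requests an image, and decodes the base64 payload to a PNG file. The output path is arbitrary.
# sd_client.py - call the Stable Diffusion /generate endpoint and save the result (sketch)
import os
import base64
import datetime
import jwt
import requests

API_URL = "http://localhost:7860"
JWT_SECRET = os.getenv("JWT_SECRET", "your-secret-key")

def make_token(user_id: str) -> str:
    payload = {"user_id": user_id, "exp": datetime.datetime.utcnow() + datetime.timedelta(hours=1)}
    return jwt.encode(payload, JWT_SECRET, algorithm="HS256")

def generate_image(prompt: str, out_path: str = "output.png") -> str:
    headers = {"Authorization": f"Bearer {make_token('test-user')}"}
    resp = requests.post(
        f"{API_URL}/generate",
        json={"prompt": prompt, "width": 512, "height": 512, "num_inference_steps": 30},
        headers=headers,
        timeout=300,
    )
    resp.raise_for_status()
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(resp.json()["image_base64"]))
    return out_path

if __name__ == "__main__":
    print(generate_image("a watercolor painting of a lighthouse at dawn"))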
Deploy DeepSeek Locally
The sections that follow switch from the production architecture above to a simpler, single-node local deployment of DeepSeek and Stable Diffusion for development and evaluation.
1. System Preparation
Update your system and install essential packages:
sudo apt update && sudo apt upgrade -y
sudo apt install -y build-essential python3-dev python3-pip git wget curl
2. Install CUDA and PyTorch (GPU Users)
For GPU acceleration, install the CUDA Toolkit and PyTorch:
# Install CUDA Toolkit (adjust version as needed)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install -y cuda-toolkit-12-0
# Apply PATH environment settings (after adding the CUDA paths to your shell profile)
source ~/.zshrc
# Install PyTorch with CUDA support
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
3. Install DeepSeek
Clone the DeepSeek repository and install dependencies:
# Clone DeepSeek repository
git clone https://github.com/deepseek-ai/DeepSeek.git
cd DeepSeek
# Install dependencies
pip3 install -r requirements.txt
# Install additional dependencies for inference
pip3 install transformers accelerate sentencepiece
4. Download Model Weights
Download the DeepSeek model weights:
# Create models directory
mkdir -p models
cd models
# Download DeepSeek-7B-Instruct (adjust model size as needed)
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/pytorch_model.bin
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/config.json
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/tokenizer.json
wget https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct/resolve/main/tokenizer_config.json
5. Create Inference Script
Create a simple inference script for testing:
# inference.py
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
def load_model():
model_name = "deepseek-ai/deepseek-llm-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto",
trust_remote_code=True
)
return model, tokenizer
def generate_response(prompt, model, tokenizer, max_length=512):
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
outputs = model.generate(
**inputs,
max_length=max_length,
temperature=0.7,
do_sample=True,
pad_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
return response
if __name__ == "__main__":
print("Loading DeepSeek model...")
model, tokenizer = load_model()
while True:
user_input = input("\nEnter your prompt (or 'quit' to exit): ")
if user_input.lower() == 'quit':
break
response = generate_response(user_input, model, tokenizer)
print(f"\nDeepSeek Response: {response}")
6. Run DeepSeek
Execute the inference script:
python3 inference.py
7. Performance Optimization
For better performance, consider these optimizations:
# Install additional optimization libraries
pip3 install bitsandbytes accelerate
# For 4-bit quantization (reduces memory usage)
pip3 install transformers[torch] accelerate bitsandbytes
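If memory is tight, 4-bit loading through bitsandbytes is one option. A minimal sketch using transformers' BitsAndBytesConfig (requires reasonably recent transformers and bitsandbytes; the parameter choices here are illustrative):
# load_4bit.py - load DeepSeek with 4-bit quantization (sketch)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_name = "deepseek-ai/deepseek-llm-7b-instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,   # compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)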
8. Docker Deployment (Alternative)
Create a Dockerfile for containerized deployment:
# Dockerfile
FROM nvidia/cuda:12.0.1-devel-ubuntu20.04
# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
# Install system dependencies
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
git \
wget \
curl \
&& rm -rf /var/lib/apt/lists/*
# Set working directory
WORKDIR /app
# Copy requirements and install Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt
# Copy application code
COPY . .
# Expose port for API (if using web interface)
EXPOSE 8000
# Run the application
CMD ["python3", "inference.py"]
9. Monitoring and Logging
Add monitoring capabilities:
# monitoring.py
import psutil
import torch
import logging
from datetime import datetime
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
def monitor_resources():
cpu_percent = psutil.cpu_percent()
memory = psutil.virtual_memory()
gpu_memory = torch.cuda.memory_allocated() / 1024**3 if torch.cuda.is_available() else 0
logger.info(f"CPU Usage: {cpu_percent}%")
logger.info(f"Memory Usage: {memory.percent}%")
logger.info(f"GPU Memory: {gpu_memory:.2f} GB")
return {
'cpu_percent': cpu_percent,
'memory_percent': memory.percent,
'gpu_memory_gb': gpu_memory
}
10. Troubleshooting
Common issues and solutions:
Out of Memory Error:
# Reduce batch size or use model quantization
pip3 install bitsandbytes
# Use 4-bit quantization in your model loading
CUDA Version Mismatch:
# Check CUDA version
nvidia-smi
# Install matching PyTorch version
pip3 install torch==2.0.1+cu118 --index-url https://download.pytorch.org/whl/cu118
Model Download Issues:
# Use git-lfs for large files
sudo apt install git-lfs
git lfs install
git clone https://huggingface.co/deepseek-ai/deepseek-llm-7b-instruct
11. Production Deployment
For production use, consider:
- Load Balancing: Use multiple model instances
- Caching: Implement response caching
- Rate Limiting: Add request rate limiting (a Redis-based sketch follows this list)
- Health Checks: Monitor model health and performance
- Backup: Regular model weight backups
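For the rate-limiting item above, a fixed-window counter in Redis is usually enough for an internal platform and mirrors the daily-quota logic already used by the API services. A minimal sketch; the key prefix, limits, and Redis location are examples:
# rate_limit.py - fixed-window rate limiting with Redis (sketch)
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(user_id: str, limit: int = 60, window_seconds: int = 60) -> bool:
    """Allow at most `limit` requests per user per time window."""
    window = int(time.time() // window_seconds)
    key = f"ratelimit:{user_id}:{window}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # first hit in this window sets the TTL
    return count <= limit

if __name__ == "__main__":
    for i in range(5):
        print(i, allow_request("demo-user", limit=3))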
12. Integration with CI/CD
Add to your CI/CD pipeline:
# .github/workflows/deepseek-deploy.yml
name: Deploy DeepSeek
on:
push:
branches: [main]
jobs:
deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
- name: Install dependencies
run: |
pip install -r requirements.txt
- name: Test model loading
run: |
python -c "from transformers import AutoModelForCausalLM; AutoModelForCausalLM.from_pretrained('deepseek-ai/deepseek-llm-7b-instruct')"
Deploy Stable Diffusion Locally
Prerequisites
- Ubuntu 20.04+
- NVIDIA GPU (8GB+ VRAM recommended)
- Python 3.9+
- CUDA 11.8+
1. Install Stable Diffusion
# Clone Stable Diffusion WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
# Install dependencies
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip3 install -r requirements.txt
2. Download Models
# Create models directory
mkdir -p models/Stable-diffusion
# Download base model (example: Stable Diffusion 1.5)
wget https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned.safetensors -O models/Stable-diffusion/v1-5-pruned.safetensors
3. Launch WebUI
# Launch with GPU acceleration
python3 launch.py --listen --port 7860 --enable-insecure-extension-access
4. API Integration
For programmatic access, enable the API:
python3 launch.py --api --listen --port 7860
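With --api enabled, the WebUI exposes REST endpoints under /sdapi/v1/. A small sketch that calls txt2img and writes the first returned image to disk (the payload fields shown are only a subset of what the endpoint accepts):
# webui_txt2img.py - call the AUTOMATIC1111 WebUI API (sketch)
import base64
import requests

WEBUI_URL = "http://localhost:7860"

def txt2img(prompt: str, out_path: str = "webui_output.png") -> str:
    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality",
        "steps": 25,
        "width": 512,
        "height": 512,
    }
    resp = requests.post(f"{WEBUI_URL}/sdapi/v1/txt2img", json=payload, timeout=300)
    resp.raise_for_status()
    images = resp.json()["images"]           # list of base64-encoded PNGs
    with open(out_path, "wb") as f:
        f.write(base64.b64decode(images[0]))
    return out_path

if __name__ == "__main__":
    print(txt2img("an isometric illustration of a data center"))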
5. Docker Deployment
# Dockerfile for Stable Diffusion
FROM nvidia/cuda:11.8.0-devel-ubuntu20.04
ENV DEBIAN_FRONTEND=noninteractive
ENV PYTHONUNBUFFERED=1
RUN apt-get update && apt-get install -y \
python3 \
python3-pip \
git \
wget \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git .
RUN pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
RUN pip3 install -r requirements.txt
EXPOSE 7860
CMD ["python3", "launch.py", "--listen", "--port", "7860", "--api"]
Conclusion
This guide provides a comprehensive approach to deploying both DeepSeek and Stable Diffusion locally on Ubuntu. The setup includes:
- Environment preparation with proper CUDA support
- Model installation and optimization
- Performance monitoring and troubleshooting
- Production deployment considerations
- CI/CD integration for automated deployments
For production environments, consider implementing:
- Load balancing across multiple GPU instances
- Automated model updates and versioning
- Comprehensive monitoring and alerting
- Security hardening and access controls
- Backup and disaster recovery procedures
Remember to monitor resource usage and adjust configurations based on your specific hardware and requirements.
10. Production Environment Best Practices
10.1 Performance Optimization
# System-level optimization
# Adjust kernel parameters
sudo tee /etc/sysctl.d/99-performance.conf > /dev/null <<EOF
# Increase file descriptor limits
fs.file-max = 65536
# Network optimization
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
EOF
sudo sysctl -p /etc/sysctl.d/99-performance.conf
# GPU optimization: NVreg_* settings are NVIDIA kernel module parameters, not sysctl keys
sudo tee /etc/modprobe.d/nvidia-performance.conf > /dev/null <<EOF
options nvidia NVreg_UsePageAttributeTable=1 NVreg_EnablePCIeGen3=1
EOF
# Adjust process limits
sudo tee /etc/security/limits.d/99-aigc.conf > /dev/null <<EOF
* soft nofile 65536
* hard nofile 65536
* soft nproc 32768
* hard nproc 32768
EOF
10.2 Resource Management
# resource_manager.py - Resource Manager
import psutil
import torch
import logging
from typing import Dict, Any
import time
class ResourceManager:
def __init__(self):
self.logger = logging.getLogger(__name__)
self.gpu_memory_threshold = 0.9 # 90% GPU memory usage threshold
self.cpu_threshold = 0.8 # 80% CPU usage threshold
self.memory_threshold = 0.85 # 85% memory usage threshold
def get_system_resources(self) -> Dict[str, Any]:
"""Get system resource usage"""
cpu_percent = psutil.cpu_percent(interval=1)
memory = psutil.virtual_memory()
gpu_info = {}
if torch.cuda.is_available():
for i in range(torch.cuda.device_count()):
gpu_memory = torch.cuda.get_device_properties(i).total_memory
gpu_memory_allocated = torch.cuda.memory_allocated(i)
gpu_memory_usage = gpu_memory_allocated / gpu_memory
gpu_info[f"gpu_{i}"] = {
"memory_total": gpu_memory,
"memory_allocated": gpu_memory_allocated,
"memory_usage": gpu_memory_usage,
"temperature": self._get_gpu_temperature(i)
}
return {
"cpu_percent": cpu_percent,
"memory_percent": memory.percent,
"memory_available": memory.available,
"gpu_info": gpu_info,
"timestamp": time.time()
}
def check_resource_health(self) -> Dict[str, bool]:
"""Check resource health status"""
resources = self.get_system_resources()
health_status = {
"cpu_healthy": resources["cpu_percent"] < (self.cpu_threshold * 100),
"memory_healthy": resources["memory_percent"] < (self.memory_threshold * 100),
"gpu_healthy": True
}
# Check GPU health status
for gpu_id, gpu_data in resources["gpu_info"].items():
if gpu_data["memory_usage"] > self.gpu_memory_threshold:
health_status["gpu_healthy"] = False
self.logger.warning(f"GPU {gpu_id} memory usage high: {gpu_data['memory_usage']:.2%}")
return health_status
def _get_gpu_temperature(self, gpu_id: int) -> float:
"""Get GPU temperature"""
try:
import subprocess
result = subprocess.run(
["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits", "-i", str(gpu_id)],
capture_output=True, text=True
)
return float(result.stdout.strip())
        except Exception:
            return 0.0
def cleanup_gpu_memory(self):
"""Clean up GPU memory"""
if torch.cuda.is_available():
torch.cuda.empty_cache()
self.logger.info("GPU memory cleaned up")
# Usage example
resource_manager = ResourceManager()
# Periodically check resources
def monitor_resources():
while True:
health = resource_manager.check_resource_health()
if not all(health.values()):
resource_manager.cleanup_gpu_memory()
time.sleep(60) # Check every minute
10.3 Error Handling and Recovery
# error_handler.py - Error Handling and Recovery
import logging
import time
import traceback
from functools import wraps
from typing import Callable, Any, Dict
import redis
import requests
class ErrorHandler:
def __init__(self, redis_client: redis.Redis):
self.logger = logging.getLogger(__name__)
self.redis_client = redis_client
self.max_retries = 3
self.retry_delay = 1
def retry_on_failure(self, max_retries: int = None, delay: float = None):
"""Retry decorator"""
def decorator(func: Callable) -> Callable:
@wraps(func)
def wrapper(*args, **kwargs) -> Any:
retries = max_retries or self.max_retries
retry_delay = delay or self.retry_delay
for attempt in range(retries + 1):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == retries:
self.logger.error(f"Function {func.__name__} failed after {retries} retries: {e}")
raise
self.logger.warning(f"Attempt {attempt + 1} failed for {func.__name__}: {e}")
time.sleep(retry_delay * (2 ** attempt)) # Exponential backoff
return wrapper
return decorator
def handle_model_loading_error(self, model_name: str) -> bool:
"""Handle model loading error"""
try:
# Record error
error_key = f"model_error:{model_name}:{int(time.time())}"
self.redis_client.setex(error_key, 3600, "model_loading_failed")
# Try to reload model
self.logger.info(f"Attempting to reload model: {model_name}")
# Add model reloading logic here
return True
except Exception as e:
self.logger.error(f"Failed to handle model loading error: {e}")
return False
def handle_api_error(self, endpoint: str, error: Exception) -> Dict[str, Any]:
"""Handle API error"""
error_info = {
"endpoint": endpoint,
"error_type": type(error).__name__,
"error_message": str(error),
"timestamp": time.time(),
"traceback": traceback.format_exc()
}
# Log error
self.logger.error(f"API Error: {error_info}")
# Store error info in Redis
error_key = f"api_error:{endpoint}:{int(time.time())}"
self.redis_client.setex(error_key, 3600, str(error_info))
return error_info
# Usage example (assumes a Redis connection is available in this module)
redis_client = redis.Redis(host="localhost", port=6379)
error_handler = ErrorHandler(redis_client)

@error_handler.retry_on_failure(max_retries=3, delay=1)
def generate_text_with_retry(prompt: str, model, tokenizer):
    """Text generation with retry"""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    return model.generate(**inputs)
11. Deployment Checklist
11.1 Pre-deployment Check
- Hardware Check
  - GPU driver and CUDA version compatible
  - Memory capacity meets requirements
  - Storage space sufficient
  - Network bandwidth meets requirements
- Software Check
  - Operating system version correct
  - Docker and Docker Compose installed
  - All dependencies installed
  - Firewall configuration correct
- Security Check
  - SSL certificate configured
  - Firewall rules set
  - User permissions configured
  - Keys and passwords generated
11.2 Post-deployment Verification
- Service Health Check
  - All containers running normally
  - API endpoints accessible
  - Database connection normal
  - Monitoring system working properly
- Performance Test
  - API response time test
  - Concurrent user test
  - Memory usage monitoring
  - GPU utilization check
- Security Test
  - Identity authentication test
  - Permission control test
  - Data encryption verification
  - Network security scan
11.3 Operations Monitoring
- Daily Monitoring
  - System resource usage
  - API response time
  - Error rate statistics
  - User activity logs
- Regular Maintenance
  - Log file rotation
  - Database backup
  - System updates
  - Performance optimization
12. Troubleshooting Guide
12.1 Common Issues
Problem 1: GPU Memory Insufficient
# Solution
# 1. Check GPU memory usage
nvidia-smi
# 2. Clean GPU memory (empty_cache() only frees cache inside the process that runs it; restart the service to fully reclaim VRAM)
python3 -c "import torch; torch.cuda.empty_cache()"
# 3. Restart AI services
docker restart deepseek-api-1 sd-api-1
Problem 2: API Response Slow
# Solution
# 1. Check system load
htop
# 2. Check network latency
ping your-api-endpoint
# 3. Check Redis cache
redis-cli info memory
# 4. Restart load balancer
sudo systemctl restart haproxy
Problem 3: Database Connection Failed
# Solution
# 1. Check database status
docker ps | grep postgres
# 2. Check database logs
docker logs postgres-primary
# 3. Restart database service
docker-compose -f docker-compose.db.yml restart
12.2 Emergency Recovery Process
#!/bin/bash
# emergency_recovery.sh - Emergency recovery script
echo "Starting emergency recovery process..."
# 1. Stop all services
docker-compose down
# 2. Clean up unused Docker resources (GPU memory is released as the AI containers stop)
docker system prune -f
# 3. Restart infrastructure
docker-compose -f docker-compose.db.yml up -d
sleep 30
# 4. Restart AI services
docker-compose -f docker-compose.ai.yml up -d
# 5. Verify service status
./health_check.sh
echo "Emergency recovery completed"
13. Summary
This guide provides a complete production environment deployment solution for enterprise AIGC platforms, including:
13.1 Core Features
- High Availability: Multi-instance deployment, load balancing, automatic fault recovery
- Security: Identity authentication, access control, data encryption, network security
- Scalability: Horizontal scaling, elastic resource scaling, microservice architecture
- Monitoring & Alerting: Comprehensive system monitoring, performance metrics, anomaly alerts
- Operations Management: Automated deployment, log management, backup and recovery
13.2 Technology Stack
- Containerization: Docker + Docker Compose
- Load Balancing: HAProxy
- API Gateway: Kong
- Identity Authentication: Keycloak
- Database: PostgreSQL + Redis
- Monitoring: Prometheus + Grafana + AlertManager
- Logging: ELK Stack
- AI Framework: PyTorch + Transformers + Diffusers
13.3 Best Practices
- Security First: Multi-layer security protection, regular security audits
- Performance Optimization: Resource monitoring, auto-scaling, caching strategies
- Fault Recovery: Automated recovery processes, data backup, disaster recovery
- Operations Automation: CI/CD pipeline, monitoring alerts, log analysis
13.4 Future Optimization Suggestions
- Performance Optimization
  - Implement model quantization optimization
  - Add distributed inference support
  - Optimize caching strategies
- Feature Extension
  - Support more AI models
  - Add batch processing capabilities
  - Implement model version management
- Operations Improvement
  - Implement blue-green deployment
  - Add A/B testing support
  - Improve monitoring dashboards
- Security Enhancement
  - Implement zero-trust architecture
  - Add threat detection
  - Improve audit logging
By following this guide, you can build a stable, secure, and efficient enterprise AIGC platform that provides reliable AI services for internal employees.