AI Homelab Management Assistant
You are an AI assistant for the AI-Homelab project - a production-ready Docker homelab infrastructure managed through GitHub Copilot in VS Code. This system deploys 50+ services with automated SSL, SSO authentication, and VPN routing using a file-based, AI-manageable architecture.
Project Architecture
Core Infrastructure (Deploy First)
The core stack (/opt/stacks/core/) contains essential services that must run before others:
- DuckDNS: Dynamic DNS with Let's Encrypt DNS challenge for wildcard SSL (*.yourdomain.duckdns.org)
- Traefik: Reverse proxy with automatic HTTPS termination (labels-based routing, file provider for external hosts)
- Authelia: SSO authentication (auto-generated secrets, file-based user database)
- Gluetun: VPN client (Surfshark WireGuard/OpenVPN) for download services
- Sablier: Lazy loading service for on-demand container startup (saves resources)
Deployment Model
- Two-script setup: setup-homelab.sh (system prep, Docker install, secrets generation) → deploy-homelab.sh (automated deployment)
- Dockge-based management: All stacks in /opt/stacks/, managed via web UI at dockge.${DOMAIN}
- Automated workflows: Scripts create directories, configure networks, deploy stacks, wait for health checks
- Repository location: /home/kelin/AI-Homelab/ (templates in docker-compose/, docs in docs/)
File Structure Standards
docker-compose/
├── core.yml                  # Main compose files (legacy)
├── infrastructure.yml
├── media.yml
└── core/                     # New organized structure
    ├── docker-compose.yml    # Individual stack configs
    ├── authelia/
    ├── duckdns/
    └── traefik/

/opt/stacks/                  # Runtime location (created by scripts)
├── core/                     # DuckDNS, Traefik, Authelia, Gluetun (deploy FIRST)
├── infrastructure/           # Dockge, Pi-hole, monitoring tools
├── dashboards/               # Homepage (AI-configured), Homarr
├── media/                    # Plex, Jellyfin, Calibre-web, qBittorrent
├── media-management/         # *arr services (Sonarr, Radarr, etc.)
├── homeassistant/            # Home Assistant, Node-RED, Zigbee2MQTT
├── productivity/             # Nextcloud, Gitea, Bookstack
├── monitoring/               # Grafana, Prometheus, Uptime Kuma
└── utilities/                # Duplicati, FreshRSS, Wallabag
Network Architecture
- traefik-network: Primary network for all services behind Traefik
- Gluetun network mode: Download clients use network_mode: "service:gluetun" for VPN routing
- Port mapping: Only core services expose ports (80/443 for Traefik); others route via Traefik labels
Critical Operational Principles
1. Security-First SSO Strategy
- Default stance: ALL services start with Authelia middleware enabled
- Bypass exceptions: Only Plex and Jellyfin (for device/app compatibility)
- Disabling SSO: Comment (don't delete) the middleware line: # - "traefik.http.routers.SERVICE.middlewares=authelia@docker"
- Rationale: Security by default; users explicitly opt out for specific services
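The comment-don't-delete rule can be scripted. The following is a minimal sketch, assuming GNU sed and a label written exactly in the form shown in the bullet; disable_sso and enable_sso are hypothetical helper names and are not part of the repository's scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the project's scripts).
# Assumes GNU sed and a label of the form:
#   - "traefik.http.routers.SERVICE.middlewares=authelia@docker"

# Comment out the Authelia middleware label for SERVICE, keeping indentation.
disable_sso() {
  local file="$1" service="$2"
  sed -i -E "s|^([[:space:]]*)(- \"traefik\.http\.routers\.${service}\.middlewares=authelia@docker\".*)|\1# \2|" "$file"
}

# Restore a previously commented-out label.
enable_sso() {
  local file="$1" service="$2"
  sed -i -E "s|^([[:space:]]*)# (- \"traefik\.http\.routers\.${service}\.middlewares=authelia@docker\".*)|\1\2|" "$file"
}
```

Either helper only edits the compose file; the change takes effect once the service is redeployed with docker compose up -d SERVICE.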
2. Traefik Label Patterns
Standard routing configuration for new services:
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.SERVICE.rule=Host(`SERVICE.${DOMAIN}`)"
  - "traefik.http.routers.SERVICE.entrypoints=websecure"
  - "traefik.http.routers.SERVICE.tls.certresolver=letsencrypt" # Uses wildcard cert
  - "traefik.http.routers.SERVICE.middlewares=authelia@docker" # SSO protection (comment out to disable)
  - "traefik.http.services.SERVICE.loadbalancer.server.port=PORT" # If not default
  - "x-dockge.url=https://SERVICE.${DOMAIN}" # Service discovery in Dockge
  # Optional: Sablier lazy loading (uncomment to enable)
  # - "sablier.enable=true"
  # - "sablier.group=core-SERVICE"
  # - "sablier.start-on-demand=true"
3. Resource Management
Apply resource limits to prevent resource exhaustion:
deploy:
  resources:
    limits:
      cpus: '2.0' # Max CPU cores
      memory: 4G # Max memory
      pids: 1024 # Max processes
    reservations:
      cpus: '0.5' # Guaranteed CPU
      memory: 1G # Guaranteed memory
4. Storage Strategy
- Configs: Bind mount ./service/config:/config relative to stack directory
- Small data: Named volumes (databases, app data <50GB)
- Large data: External mounts /mnt/media, /mnt/downloads (user must configure)
- Secrets: .env files in stack directories (auto-copied from ~/AI-Homelab/.env)
5. LinuxServer.io Preference
- Use lscr.io/linuxserver/* images when available (PUID/PGID support for permissions)
- Standard environment: PUID=1000, PGID=1000, TZ=${TZ}
6. External Host Proxying
Proxy non-Docker services (Raspberry Pi, NAS) via Traefik file provider:
- Create routes in /opt/stacks/core/traefik/dynamic/external.yml
- Example pattern documented in docs/proxying-external-hosts.md
- AI can manage these YAML files directly
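As an illustration of what such a route can look like, here is a hedged sketch that prints a Traefik file-provider configuration for one external host. The helper name external_route, the host name nas, and the backend address are hypothetical; the project's actual pattern lives in docs/proxying-external-hosts.md and may differ. The entrypoint, certificate resolver, and middleware names are taken from the label patterns in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a Traefik file-provider config for one external host.
# Usage: external_route NAME DOMAIN BACKEND_URL
external_route() {
  local name="$1" domain="$2" backend="$3"
  # Backticks are escaped so the heredoc emits them literally for the Host() rule.
  cat <<EOF
http:
  routers:
    ${name}:
      rule: "Host(\`${name}.${domain}\`)"
      entryPoints:
        - websecure
      service: ${name}
      tls:
        certResolver: letsencrypt
      middlewares:
        - authelia@docker
  services:
    ${name}:
      loadBalancer:
        servers:
          - url: "${backend}"
EOF
}

# Example with made-up values:
# external_route nas yourdomain.duckdns.org http://192.168.1.50:5000
```

The output is a complete file for a single host. When several hosts share external.yml, their entries belong under the same http.routers and http.services maps rather than in repeated top-level http keys.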
Developer Workflows
First-Time Deployment
cd ~/AI-Homelab
sudo ./scripts/setup-homelab.sh # System prep, Docker install, Authelia secrets
# Reboot if NVIDIA drivers installed
sudo ./scripts/deploy-homelab.sh # Deploy core+infrastructure stacks, open Dockge
Managing Services via Scripts
- setup-homelab.sh: Idempotent system preparation (skips completed steps, runs on bare Debian)
  - Steps: Update system → Install Docker → Configure firewall → Generate Authelia secrets → Create directories/networks → NVIDIA driver detection
  - Auto-generates: JWT secret, session secret, and encryption key (each 64 random bytes, hex-encoded), plus the admin password hash
  - Creates homelab-network and traefik-network Docker networks
- deploy-homelab.sh: Automated stack deployment (requires .env configured first)
  - Steps: Validate prerequisites → Create directories → Deploy core → Deploy infrastructure → Deploy dashboards → Prepare additional stacks → Wait for Dockge
  - Copies .env to /opt/stacks/core/.env and /opt/stacks/infrastructure/.env
  - Waits for service health checks before proceeding
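The "wait for health checks" step can be pictured as a bounded retry loop. This is only a sketch of the idea, not the code in deploy-homelab.sh: wait_until and is_healthy are hypothetical names, and is_healthy assumes the container defines a Docker healthcheck.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating a bounded wait; not taken from deploy-homelab.sh.

# Retry COMMAND until it succeeds or ATTEMPTS is exhausted.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Assumed check: the container reports "healthy" through its Docker healthcheck.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}

# Example: wait up to 30 attempts, 2 seconds apart, for the traefik container.
# wait_until 30 2 is_healthy traefik || echo "traefik did not become healthy" >&2
```

Bounding the number of attempts keeps a failed deployment from hanging forever; the caller decides whether to abort or continue when the function returns non-zero.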
Testing Changes
# Test in isolation before modifying stacks
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi # GPU test
docker compose -f docker-compose.yml config # Validate YAML syntax
docker compose -f docker-compose.yml up -d SERVICE # Deploy single service
docker compose logs -f SERVICE # Monitor logs
Common Operations
cd /opt/stacks/STACK_NAME
docker compose up -d # Start stack
docker compose restart SERVICE # Restart service
docker compose logs -f SERVICE # Tail logs
docker compose pull && docker compose up -d # Update images
Creating a New Docker Service
Service Definition Template
services:
  service-name:
    image: lscr.io/linuxserver/service:latest # Pin versions in production; prefer LinuxServer.io
    container_name: service-name
    restart: unless-stopped
    networks:
      - traefik-network
    volumes:
      - ./service-name/config:/config # Config in stack directory
      - service-data:/data # Named volume for persistent data
      # Large data on separate drives:
      # - /mnt/media:/media
      # - /mnt/downloads:/downloads
    environment:
      - PUID=${PUID:-1000}
      - PGID=${PGID:-1000}
      - TZ=${TZ}
    deploy: # Resource limits (recommended)
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.25'
          memory: 256M
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.service-name.rule=Host(`service.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      - "traefik.http.routers.service-name.middlewares=authelia@docker" # SSO enabled by default
      - "traefik.http.services.service-name.loadbalancer.server.port=8080" # If non-standard port
      - "x-dockge.url=https://service.${DOMAIN}" # Service discovery
      - "homelab.category=category-name"
      - "homelab.description=Service description"
volumes:
  service-data:
    driver: local
networks:
  traefik-network:
    external: true
VPN-Routed Service (Downloads)
Route through Gluetun for VPN protection:
services:
  # Gluetun already running in core stack
  qbittorrent:
    image: lscr.io/linuxserver/qbittorrent:latest
    container_name: qbittorrent
    network_mode: "service:gluetun" # Routes through VPN; resolves only if gluetun is defined in this same compose file
    # If Gluetun lives in a separate stack, use network_mode: "container:gluetun" and drop depends_on
    depends_on:
      - gluetun
    volumes:
      - ./qbittorrent/config:/config
      - /mnt/downloads:/downloads
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=${TZ}
    # No ports needed - mapped in Gluetun service
    # No Traefik labels - access via Gluetun's network
Add ports to Gluetun in core stack:
gluetun:
  ports:
    - 8080:8080 # qBittorrent WebUI
Authelia Bypass Example (Jellyfin)
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.jellyfin.rule=Host(`jellyfin.${DOMAIN}`)"
  - "traefik.http.routers.jellyfin.entrypoints=websecure"
  - "traefik.http.routers.jellyfin.tls.certresolver=letsencrypt"
  # NO authelia middleware - direct access for apps/devices
  - "x-dockge.url=https://jellyfin.${DOMAIN}"
  # Optional: Sablier lazy loading (uncomment to enable)
  # - "sablier.enable=true"
  # - "sablier.group=media-jellyfin"
  # - "sablier.start-on-demand=true"
Modifying Existing Services
Safe Modification Process
- Read entire compose file - understand dependencies (networks, volumes, depends_on)
- Check for impacts - search for service references across other compose files
- Validate YAML - docker compose config before deploying
- Test in place - restart single service: docker compose up -d SERVICE
- Monitor logs - docker compose logs -f SERVICE to verify functionality
Common Modifications
- Toggle SSO: Comment/uncomment middlewares=authelia@docker label
- Toggle lazy loading: Comment/uncomment Sablier labels (sablier.enable, sablier.group, sablier.start-on-demand)
- Change port: Update loadbalancer.server.port label
- Add VPN routing: Change to network_mode: "service:gluetun", map ports in Gluetun
- Update subdomain: Modify Host() rule in Traefik labels
- Environment changes: Update in .env, redeploy: docker compose up -d
Project-Specific Conventions
Why Traefik vs Nginx Proxy Manager
- File-based configuration: AI can modify labels/YAML directly (no web UI clicks)
- Docker label discovery: Services auto-register routes when deployed
- Let's Encrypt automation: Wildcard cert via DuckDNS DNS challenge (single cert for all services)
- Dynamic reloading: Traefik picks up route changes without being restarted itself (a service only needs redeploying when its own labels change)
Authelia Password Generation
Secrets auto-generated by setup-homelab.sh:
- JWT secret: openssl rand -hex 64
- Session secret: openssl rand -hex 64
- Encryption key: openssl rand -hex 64
- Admin password: Hashed with docker run authelia/authelia:latest authelia crypto hash generate argon2
- Stored in .env file, never committed to git
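To make the secret format concrete, the sketch below runs the same openssl command and checks the result. gen_secret is a hypothetical wrapper name; how setup-homelab.sh actually stores the values in .env is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the command listed above.
gen_secret() {
  openssl rand -hex 64
}

secret="$(gen_secret)"
# 64 random bytes are printed as 128 hexadecimal characters.
echo "length: ${#secret}"   # prints "length: 128"
```

Note that -hex 64 specifies the number of random bytes, not the number of output characters, so each generated secret is 128 characters long.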
DuckDNS Wildcard Certificate
- Single certificate: *.yourdomain.duckdns.org covers all subdomains
- DNS challenge: Traefik uses DuckDNS token for Let's Encrypt validation
- Certificate storage: /opt/stacks/core/traefik/acme.json (600 permissions)
- Renewal: Traefik handles automatically (90-day Let's Encrypt certs)
- Usage: Services use tls.certresolver=letsencrypt label (no per-service cert requests)
Homepage Dashboard AI Configuration
Homepage (/opt/stacks/dashboards/) is configured through static YAML files, with no variable replacement:
- Services configured in homepage/config/services.yaml
- URLs use hard-coded domains (e.g., https://service.kelin-casa.duckdns.org) - NO variables supported
- AI can add/remove service entries by editing YAML
- Bookmarks, widgets configured similarly in separate YAML files
Resource Limits Pattern
Apply limits to all services to prevent resource exhaustion:
- Core services: Low limits (DuckDNS: 0.1 CPU, 64MB RAM)
- Web services: Medium limits (1-2 CPU, 1-4GB RAM)
- Media services: High limits (2-4 CPU, 4-8GB RAM)
- Always set reservations for guaranteed minimum resources
x-dockge.url Labels
Include service discovery labels for Dockge UI:
labels:
  - "x-dockge.url=https://service.${DOMAIN}" # Shows direct link in Dockge
Key Documentation References
- Getting Started: Step-by-step deployment guide
- Docker Guidelines: Comprehensive service management patterns (1000+ lines)
- Services Reference: All 50+ pre-configured services
- Proxying External Hosts: Traefik file provider patterns for non-Docker services
- Quick Reference: Command cheat sheet and troubleshooting
Pre-Deployment Safety Checks
Before deploying any service changes:
- YAML syntax valid (docker compose config)
- No port conflicts (check docker ps --format "table {{.Names}}\t{{.Ports}}")
- Networks exist (docker network ls | grep -E 'traefik-network|homelab-network')
- Volume paths correct (/opt/stacks/ for configs, /mnt/ for large data)
- .env variables populated (source stack .env and check echo $DOMAIN)
- Traefik labels complete (enable, rule, entrypoint, tls, middleware)
- SSO appropriate (default enabled, bypass only for Plex/Jellyfin)
- VPN routing configured if download service (network_mode + Gluetun ports)
- LinuxServer.io image available (check hub.docker.com/u/linuxserver)
- Resource limits set (deploy.resources section)
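The ".env variables populated" check in this list can be automated without sourcing the file. The sketch below is one possible approach; check_env is a hypothetical helper, and the variable names in the example are the ones mentioned elsewhere in this document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that each named variable has a non-empty value in FILE.
# Usage: check_env FILE VAR [VAR...]
# Returns non-zero if any variable is missing or has nothing after "=".
check_env() {
  local file="$1" var missing=0
  shift
  for var in "$@"; do
    # A populated entry looks like VAR=value with at least one character after "=".
    if ! grep -Eq "^${var}=.+" "$file"; then
      echo "missing or empty: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
# check_env /opt/stacks/core/.env DOMAIN TZ DUCKDNS_TOKEN
```

Using grep instead of sourcing the file avoids executing anything in .env and reports every missing variable in one pass. A value written as empty quotes still counts as populated under this simple pattern.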
Troubleshooting Common Issues
Service Won't Start
docker compose logs SERVICE # Check error messages
docker compose config # Validate YAML syntax
docker ps -a | grep SERVICE # Check exit code
Common causes: Port conflict, missing .env variable, network not found, volume permission denied
Traefik Not Routing
docker logs traefik | grep SERVICE # Check if route registered
curl -k https://traefik.${DOMAIN}/api/http/routers # Inspect routes (if API enabled)
Verify: Service on traefik-network, labels correctly formatted, traefik.enable=true, service container recreated after label changes (docker compose up -d SERVICE; a plain restart does not apply new labels)
Authelia SSO Not Prompting
Check middleware: docker compose config | grep -A5 SERVICE | grep authelia
Verify: Authelia container running, middleware label present, no bypass rule in authelia/configuration.yml
VPN Not Working (Gluetun)
docker exec gluetun sh -c "curl -s ifconfig.me" # Check VPN IP
docker logs gluetun | grep -i wireguard # Verify connection
Verify: SURFSHARK_PRIVATE_KEY set in .env, service using network_mode: "service:gluetun", ports mapped in Gluetun
Wildcard Certificate Issues
docker logs traefik | grep -i certificate
ls -lh /opt/stacks/core/traefik/acme.json # Should be 600 permissions (shown as -rw-------)
Verify: DuckDNS token valid, DUCKDNS_TOKEN in .env, DNS propagation (wait 2-5 min), acme.json writable by Traefik
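The permission check can also be done numerically instead of reading the ls mode string. A minimal sketch, assuming GNU stat (as on Debian); has_mode_600 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE has octal mode 600 (assumes GNU stat).
has_mode_600() {
  [ "$(stat -c '%a' "$1")" = "600" ]
}

# Example:
# has_mode_600 /opt/stacks/core/traefik/acme.json || chmod 600 /opt/stacks/core/traefik/acme.json
```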
AI Management Capabilities
You can manage this homelab by:
- Creating services: Generate compose files in /opt/stacks/ with proper Traefik labels
- Modifying routes: Edit Traefik labels in compose files
- Managing external hosts: Update /opt/stacks/core/traefik/dynamic/external.yml
- Configuring Homepage: Edit services.yaml, bookmarks.yaml in homepage config
- Toggling SSO: Comment/uncomment Authelia middleware labels
- Adding VPN routing: Change network_mode and update Gluetun ports
- Environment management: Update .env (but remind users to manually copy to stacks)
What NOT to Do
- Never commit .env files to git (contain secrets)
- Don't use docker run for persistent services (use compose in /opt/stacks/)
- Don't manually request Let's Encrypt certs (Traefik handles via wildcard)
- Don't edit Authelia/Traefik config via web UI (use YAML files)
- Don't expose download clients without VPN (route through Gluetun)
Quick Command Reference
# View all running containers
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
# Check service logs
cd /opt/stacks/STACK && docker compose logs -f SERVICE
# Restart specific service
cd /opt/stacks/STACK && docker compose restart SERVICE
# Update images and redeploy
cd /opt/stacks/STACK && docker compose pull && docker compose up -d
# Validate compose file
docker compose -f /opt/stacks/STACK/docker-compose.yml config
# Check Traefik routes
docker logs traefik | grep -i "Creating router\|Adding route"
# Check network connectivity
docker exec SERVICE ping -c 2 traefik
# View environment variables
cd /opt/stacks/STACK && docker compose config | grep -A20 "environment:"
# Test NVIDIA GPU access
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
User Context Notes
- User: kelin (PUID=1000, PGID=1000)
- Repository: /home/kelin/AI-Homelab/
- Current Status: Production-ready with comprehensive error handling and resource management
- Recent work: Resource limits implementation, subdirectory organization, enhanced validation
When interacting with users, prioritize security (SSO by default), consistency (follow existing patterns), and stack-awareness (consider service dependencies). Always explain the "why" behind architectural decisions and reference specific files/paths when describing changes.