# Docker Service Management Guidelines

## Overview

This document provides comprehensive guidelines for managing Docker services in your AI-powered homelab using Dockge, Traefik, and Authelia. These guidelines ensure consistency, maintainability, and reliability across your entire infrastructure.

## Table of Contents

1. [Philosophy](#philosophy)
2. [Dockge Structure](#dockge-structure)
3. [Traefik and Authelia Integration](#traefik-and-authelia-integration)
4. [Docker Compose vs Docker Run](#docker-compose-vs-docker-run)
5. [Service Creation Guidelines](#service-creation-guidelines)
6. [Service Modification Guidelines](#service-modification-guidelines)
7. [Naming Conventions](#naming-conventions)
8. [Network Architecture](#network-architecture)
9. [Volume Management](#volume-management)
10. [Security Best Practices](#security-best-practices)
11. [Monitoring and Logging](#monitoring-and-logging)
12. [Troubleshooting](#troubleshooting)

## Philosophy

### Core Principles

1. **Dockge First**: Manage all stacks through Dockge in `/opt/stacks/`
2. **Infrastructure as Code**: All services defined in Docker Compose files
3. **File-Based Configuration**: Traefik labels and Authelia YAML (AI-manageable)
4. **Reproducibility**: Any service should be rebuildable from compose files
5. **Automatic HTTPS**: All services routed through Traefik with Let's Encrypt
6. **Smart SSO**: Authelia protects admin interfaces, bypasses media apps
7. **Documentation**: Every non-obvious configuration must be commented
8. **Consistency**: Use the same patterns across all services
9. **Safety First**: Always test changes in isolation before deploying

### The Stack Mindset

Think of your homelab as an interconnected stack where:

- Services depend on networks (especially traefik-network)
- Traefik routes all traffic with automatic SSL
- Authelia protects sensitive services
- VPN (Gluetun) secures downloads
- Changes ripple through the system

Always ask: "How does this change affect other services and routing?"

## Dockge Structure

### Directory Organization

All stacks live in `/opt/stacks/stack-name/`:

```
/opt/stacks/
├── traefik/
│   ├── docker-compose.yml
│   ├── traefik.yml          # Static config
│   ├── dynamic/             # Dynamic routes
│   │   ├── routes.yml
│   │   └── external.yml     # External host proxying
│   ├── acme.json            # SSL certificates (chmod 600)
│   └── .env
├── authelia/
│   ├── docker-compose.yml
│   ├── configuration.yml    # Authelia settings
│   ├── users_database.yml   # User accounts
│   └── .env
├── media/
│   ├── docker-compose.yml
│   └── .env
└── ...
```

### Why Dockge?

- **Visual Management**: Web UI at `https://dockge.${DOMAIN}`
- **Direct File Editing**: Edit compose files in-place
- **Stack Organization**: Each service stack is independent
- **AI Compatible**: Files can be managed by AI
- **Git Integration**: Easy to version control

### Storage Strategy

**Small Data** (configs, DBs < 10GB): `/opt/stacks/stack-name/`

```yaml
volumes:
  - /opt/stacks/sonarr/config:/config
```

**Large Data** (media, downloads, backups): `/mnt/`

```yaml
volumes:
  - /mnt/media/movies:/movies
  - /mnt/media/tv:/tv
  - /mnt/downloads:/downloads
  - /mnt/backups:/backups
```

AI will suggest `/mnt/` when data may exceed 50GB or grow continuously.

## Traefik and Authelia Integration

### Every Local (on the same server) Service Needs Traefik Labels

**Default Configuration**: All services should use authelia SSO, traefik routing, and sablier lazy loading by default.
Standard pattern for all services using the standardized TRAEFIK CONFIGURATION format:

```yaml
services:
  myservice:
    image: myimage:latest
    container_name: myservice
    networks:
      - homelab-network
      - traefik-network  # Required for Traefik
    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels
      - "traefik.enable=true"
      # Router configuration
      - "traefik.http.routers.myservice.rule=Host(`myservice.${DOMAIN}`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      - "traefik.http.routers.myservice.middlewares=authelia@docker"
      # Service configuration
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
      # Sablier configuration
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-myservice"
      - "sablier.start-on-demand=true"
```

### Label Structure Explanation

**Service Metadata Section:**

- `com.centurylinklabs.watchtower.enable=true` - Enables automatic container updates
- `homelab.category=category-name` - Groups services by function (media, productivity, infrastructure, etc.)
- `homelab.description=Brief description` - Documents service purpose

**Router Configuration Section:**

- `traefik.enable=true` - Enables Traefik routing for this service
- ``rule=Host(`myservice.${DOMAIN}`)`` - Defines the domain routing rule
- `entrypoints=websecure` - Routes through HTTPS entrypoint
- `tls.certresolver=letsencrypt` - Enables automatic SSL certificates
- `middlewares=authelia@docker` - **Default: Enables SSO protection** (remove line to disable)

**Service Configuration Section:**

- `loadbalancer.server.port=8080` - Specifies internal container port (if not 80)

**Sablier Configuration Section:**

- `sablier.enable=true` - **Default: Enables lazy loading** (remove section to disable)
- `sablier.group=${SERVER_HOSTNAME}-myservice` - Groups containers for coordinated startup
- `sablier.start-on-demand=true` - Starts containers only when accessed

**x-dockge Section:**

At the bottom of the compose file, add a top-level `x-dockge` section for service discovery in Dockge:

```yaml
x-dockge:
  urls:
    - https://myservice.${DOMAIN}
    - http://localhost:8080  # Direct local access
```

### If Traefik is on a Remote Server, configure routes & services on the Remote Server

When Traefik runs on a separate server from your application services, you cannot use Docker labels for configuration. Instead, create YAML files in the Traefik server's `dynamic/` directory to define routes and services.
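As a rough illustration of what such a file contains, the sketch below prints a router and a service definition for a single remote service. The helper name `gen_route` and its three arguments are hypothetical and not part of any tool mentioned here; the `websecure` entrypoint, `letsencrypt` resolver, and `authelia` middleware are assumed to match the names used elsewhere in this guide, and the Sablier middleware is left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Traefik or Dockge): print a file-provider
# snippet for one service that runs on a different host than Traefik.
# Usage: gen_route <service-name> <fqdn> <backend-url>
gen_route() (
  # The ( ... ) body runs in a subshell, so these variables stay local.
  name="$1"
  fqdn="$2"
  url="$3"
  cat <<EOF
http:
  routers:
    ${name}:
      rule: "Host(\`${fqdn}\`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
      service: ${name}
  services:
    ${name}:
      loadbalancer:
        servers:
          - url: "${url}"
        passhostheader: true
EOF
)

# Example: print a definition for Sonarr running on an assumed internal IP.
gen_route sonarr sonarr.example.com http://192.168.1.100:8989
```

Redirecting the output into a file under `dynamic/` would be one way to use it; review the generated YAML before relying on it.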
#### When to Use Remote Traefik Configuration

Use this approach when:

- Traefik runs on a dedicated reverse proxy server
- Application services run on separate application servers
- You want centralized routing configuration
- Docker labels cannot be used (different servers)

#### File Organization

Create one YAML file per application server in `/opt/stacks/traefik/dynamic/`:

```
/opt/stacks/traefik/dynamic/
├── server1.example.com.yml   # Services on server1
├── server2.example.com.yml   # Services on server2
├── shared-services.yml       # Common services
└── sablier.yml               # Sablier middlewares
```

#### YAML File Structure

Each server-specific YAML file should contain:

```yaml
# /opt/stacks/traefik/dynamic/server1.example.com.yml
http:
  routers:
    # Router definitions for services on server1
    sonarr:
      rule: "Host(`sonarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-server1-sonarr
      service: sonarr
    radarr:
      rule: "Host(`radarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-server1-radarr
      service: radarr
  services:
    # Service definitions for services on server1
    sonarr:
      loadbalancer:
        servers:
          - url: "http://server1.example.com:8989"  # Internal IP/port of service
        passhostheader: true
    radarr:
      loadbalancer:
        servers:
          - url: "http://server1.example.com:7878"  # Internal IP/port of service
        passhostheader: true
```

#### Complete Example for a Media Server

```yaml
# /opt/stacks/traefik/dynamic/media-server.yml
http:
  routers:
    jellyfin:
      rule: "Host(`jellyfin.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      # No authelia for app access
      middlewares:
        - sablier-media-server-jellyfin
      service: jellyfin
    sonarr:
      rule: "Host(`sonarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-media-server-sonarr
      service: sonarr
    radarr:
      rule: "Host(`radarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-media-server-radarr
      service: radarr
  services:
    jellyfin:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:8096"  # Media server internal IP
        passhostheader: true
    sonarr:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:8989"  # Media server internal IP
        passhostheader: true
    radarr:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:7878"  # Media server internal IP
        passhostheader: true
```

#### Key Configuration Notes

**Router Configuration:**

- `rule`: Domain matching rule (same as Docker labels)
- `entrypoints`: Use `websecure` for HTTPS
- `tls.certresolver`: Use `letsencrypt` for automatic SSL
- `middlewares`: List of middlewares (authelia, sablier, custom)
- `service`: Reference to service definition below

**Service Configuration:**

- `url`: Internal IP address and port of the actual service
- `passhostheader: true`: Required for most web applications
- Use internal IPs, not public domains

**Middleware References:**

- `authelia`: References the authelia middleware (defined in another file)
- `sablier-server1-sonarr`: References sablier middleware for lazy loading
- Custom middlewares can be added as needed

#### Deployment Process

1. **Create/Update YAML files** in `/opt/stacks/traefik/dynamic/`

2. **Validate syntax**:

   ```bash
   docker exec traefik traefik validate --configFile=/etc/traefik/traefik.yml
   ```

3. **Reload configuration** (if hot-reload enabled) or restart Traefik
4. **Test services** by accessing their domains
5. **Monitor logs** for any routing errors

#### Migration from Docker Labels

When moving from Docker labels to YAML configuration:

1. Copy router rules from Docker labels to YAML format
2. Convert service ports to full URLs with internal IPs
3. Ensure middlewares are properly referenced
4. Remove Traefik labels from docker-compose files
5.
   Test all services after migration

This approach provides centralized, version-controllable routing configuration while maintaining the same security and performance benefits as Docker label-based configuration.

### When to Use Authelia SSO

**Protect with Authelia** (Default for all services):

- Admin interfaces (Sonarr, Radarr, Prowlarr, etc.)
- Infrastructure tools (Portainer, Dockge, Grafana)
- Personal data (Nextcloud, Mealie, wikis)
- Development tools (code-server, GitLab)
- Monitoring dashboards

**Bypass Authelia**:

- Media servers (Plex, Jellyfin) - need app access
- Request services (Jellyseerr) - family-friendly access
- Public services (WordPress, status pages)
- Services with their own auth (Home Assistant)

Configure bypasses in `/opt/stacks/authelia/configuration.yml`:

```yaml
access_control:
  rules:
    - domain: jellyfin.yourdomain.duckdns.org
      policy: bypass
    - domain: plex.yourdomain.duckdns.org
      policy: bypass
```

### Routing Through VPN (Gluetun)

For services that need VPN (downloads):

```yaml
services:
  mydownloader:
    image: downloader:latest
    container_name: mydownloader
    network_mode: "service:gluetun"  # Route through VPN
    depends_on:
      - gluetun
```

Expose ports through Gluetun's compose file:

```yaml
# In gluetun.yml
gluetun:
  ports:
    - "8080:8080"  # mydownloader web UI
```

## Docker Compose vs Docker Run

### Docker Compose: For Everything Persistent

Use Docker Compose for:

- All production services
- Services that need to restart automatically
- Multi-container applications
- Services with complex configurations
- Anything you want to keep long-term

**Example:**

```yaml
# docker-compose/plex.yml
services:
  plex:
    image: plexinc/pms-docker:1.40.0.7998-f68041501
    container_name: plex
    restart: unless-stopped
    networks:
      - media-network
    ports:
      - "32400:32400"
    volumes:
      - ./config/plex:/config
      - /media:/media:ro
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York
```

### Docker Run: For Temporary Operations Only

Use `docker run` for:

- Testing new images
- One-off commands
- Debugging
- Verification tasks (like GPU testing)

**Examples:**

```bash
# Test if NVIDIA GPU is accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Quick test of a new image
docker run --rm -it alpine:latest /bin/sh

# One-off database backup (mount a host directory so the archive outlives the container)
docker run --rm -v mydata:/data -v "$(pwd)/backups:/backup" busybox tar czf /backup/data.tar.gz /data
```

## Service Creation Guidelines

### Step-by-Step Process

#### 1. Planning Phase

**Before writing any YAML:**

- [ ] What problem does this service solve?
- [ ] Does a similar service already exist?
- [ ] What are the dependencies?
- [ ] What ports are needed?
- [ ] What data needs to persist?
- [ ] What environment variables are required?
- [ ] What networks should it connect to? (include `traefik-network`)
- [ ] Are there any security considerations?
- [ ] **Should this service be protected by Authelia SSO?** (default: yes)
- [ ] **Should this service use lazy loading?** (default: yes)
- [ ] **What category does this service belong to?** (media, productivity, infrastructure, etc.)
- [ ] **What subdomain should it use?** (service-name.${DOMAIN})

#### 2. Research Phase

- Read the official image documentation
- Check for a service doc in the `EZ-Homelab/docs/service-docs` folder; if the new service doesn't have one, be prepared to create it at the end
- Consult https://awesome-docker-compose.com/apps
- Check example configurations
- Review resource requirements
- Understand health check requirements
- Note any special permissions needed

#### 3. Implementation Phase

**Start with a minimal configuration:**

```yaml
services:
  service-name:
    image: vendor/image:specific-version
    container_name: service-name
    restart: unless-stopped  # Set to "no" if lazy loading (Sablier) is to be enabled
```

**Add networks (required for Traefik):**

```yaml
    networks:
      - homelab-network
      - traefik-network  # Required for Traefik routing
```

**Add ports (if externally accessible):**

```yaml
    ports:
      - "8080:8080"  # Web UI
```

**Add volumes:**

```yaml
    volumes:
      - ./config/service-name:/config
      - service-data:/data
```

**Add environment variables:**

```yaml
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=${TIMEZONE}
```

**Add health checks (if compatible):**

```yaml
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
```

**Add TRAEFIK CONFIGURATION labels (required for all web services):**

```yaml
    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels
      - "traefik.enable=true"
      # Router configuration
      - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # Service configuration
      - "traefik.http.services.service-name.loadbalancer.server.port=8080"
      # Sablier configuration
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-service-name"
      - "sablier.start-on-demand=true"
```

**Add x-dockge section at the bottom of the compose file (before networks):**

```yaml
x-dockge:
  urls:
    - https://service-name.${DOMAIN}
    - http://${SERVER_IP}:8080

volumes:
  service-data:
    driver: local

networks:
  traefik-network:
    external: true
  homelab-network:
    external: true
```

If Traefik & Sablier are on a remote server:

- Comment out the Traefik labels, since they won't be used; don't delete them.
- Notify the user to add the service and middleware to the Traefik external host YAML file and to the `sablier.yml` file.

**Example: Comment out Traefik labels in docker-compose.yml:**

```yaml
    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels - COMMENTED OUT for remote server
      # - "traefik.enable=true"
      # - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
      # - "traefik.http.routers.service-name.entrypoints=websecure"
      # - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      # - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # - "traefik.http.services.service-name.loadbalancer.server.port=8080"
      # Sablier configuration
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-service-name"
      - "sablier.start-on-demand=true"
```

**Required: Add to Traefik external host YAML file (e.g., `/opt/stacks/traefik/dynamic/remote-host-server1.yml`):**

```yaml
http:
  routers:
    service-name:
      rule: "Host(`service-name.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-server1-service-name
      service: service-name
  services:
    service-name:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:8080"  # Internal IP of application server
        passhostheader: true
```

**Required: Add to Sablier YAML file (e.g., `/opt/stacks/traefik/dynamic/sablier.yml`):**

```yaml
# Middleware name and group must match the router reference and the sablier.group label
sablier-server1-service-name:
  plugin:
    sablier:
      sablierUrl: http://sablier-service:10000
      group: server1-service-name
      sessionDuration: 5m
      ignoreUserAgent: curl
      dynamic:
        displayName: Service Name
        theme: ghost
        show-details-by-default: true
```

**Deployment Steps:**

1. Comment out Traefik labels in the service's docker-compose.yml
2. Add router and service definitions to the appropriate Traefik dynamic YAML file
3. Add sablier middleware to the sablier.yml file
4. Validate Traefik configuration: `docker exec traefik traefik validate --configFile=/etc/traefik/traefik.yml`
5. Restart Traefik or wait for hot-reload
6. Test service access through Traefik

#### 4. Testing Phase

```bash
# Validate syntax
docker compose -f docker-compose/service.yml config

# Start in foreground to see logs
docker compose -f docker-compose/service.yml up

# If successful, restart in background
docker compose -f docker-compose/service.yml down
docker compose -f docker-compose/service.yml up -d
```

#### 5. Documentation Phase

Add comments to your compose file:

```yaml
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:4.0.0
    container_name: sonarr
    # Sonarr - TV Show management and automation
    # Protected by: Authelia SSO, Sablier lazy loading
    restart: "no"  # Quoted so YAML does not read it as a boolean
```

Update your main README or service-specific README with:

- Service purpose
- Access URLs (Traefik HTTPS URLs)
- Default credentials (if any)
- Configuration notes (SSO enabled/disabled, lazy loading, etc.)
- Backup instructions
- Any special routing considerations (VPN, remote server, etc.)

If the service doesn't already have a service doc in the `EZ-Homelab/docs/service-docs` folder, create one from the information gathered about the service, using the same format as the other service docs.

## Service Modification Guidelines

### Before Modifying

1. **Back up current configuration:**

   ```bash
   cp docker-compose/service.yml docker-compose/service.yml.backup
   ```

2. **Document why you're making the change**
   - Create a comment in the compose file
   - Note in your changelog or docs

3.
   **Understand the current state:**

   ```bash
   # Check if service is running
   docker compose -f docker-compose/service.yml ps

   # Review current configuration
   docker compose -f docker-compose/service.yml config

   # Check logs for any existing issues
   docker compose -f docker-compose/service.yml logs --tail=50
   ```

### Making the Change

1. **Edit the compose file**
   - Make minimal, targeted changes
   - Keep existing structure when possible
   - Add comments for new configurations

2. **Validate syntax:**

   ```bash
   docker compose -f docker-compose/service.yml config
   ```

3. **Apply the change:**

   ```bash
   # Pull new image if version changed
   docker compose -f docker-compose/service.yml pull

   # Recreate the service
   docker compose -f docker-compose/service.yml up -d
   ```

4. **Verify the change:**

   ```bash
   # Check service is running
   docker compose -f docker-compose/service.yml ps

   # Watch logs for errors
   docker compose -f docker-compose/service.yml logs -f

   # Test functionality
   curl http://localhost:port/health
   ```

### Rollback Plan

If something goes wrong:

```bash
# Stop the service
docker compose -f docker-compose/service.yml down

# Restore backup
mv docker-compose/service.yml.backup docker-compose/service.yml

# Restart with old configuration
docker compose -f docker-compose/service.yml up -d
```

### Common Modifications

**Add TRAEFIK CONFIGURATION to existing service:**

```yaml
labels:
  # TRAEFIK CONFIGURATION
  # ==========================================
  # Service metadata
  - "com.centurylinklabs.watchtower.enable=true"
  - "homelab.category=category-name"
  - "homelab.description=Brief service description"
  # Traefik labels
  - "traefik.enable=true"
  # Router configuration
  - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
  - "traefik.http.routers.service-name.entrypoints=websecure"
  - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
  - "traefik.http.routers.service-name.middlewares=authelia@docker"
  # Service configuration
  - "traefik.http.services.service-name.loadbalancer.server.port=8080"
  # Sablier configuration
  - "sablier.enable=true"
  - "sablier.group=${SERVER_HOSTNAME}-service-name"
  - "sablier.start-on-demand=true"
```

**Toggle SSO**: Comment/uncomment the Authelia middleware label:

```yaml
# Enable SSO (default)
- "traefik.http.routers.service.middlewares=authelia@docker"

# Disable SSO (remove line for media servers, public services)
# - "traefik.http.routers.service.middlewares=authelia@docker"
```

**Toggle Lazy Loading**: Comment/uncomment Sablier labels:

```yaml
# Enable lazy loading (default)
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-service"
- "sablier.start-on-demand=true"

# Disable lazy loading (remove section for always-on services)
# - "sablier.enable=true"
# - "sablier.group=${SERVER_HOSTNAME}-service"
# - "sablier.start-on-demand=true"
```

**Change Port**: Update the loadbalancer server port:

```yaml
- "traefik.http.services.service.loadbalancer.server.port=8080"
```

**Add VPN Routing**: Change network mode and update Gluetun ports:

```yaml
network_mode: "service:gluetun"
# Add port mapping in Gluetun service
```

**Update Subdomain**: Modify the Host rule:

```yaml
- "traefik.http.routers.service.rule=Host(`newservice.${DOMAIN}`)"
```

## Naming Conventions

### Service Names

Use lowercase with hyphens:

- ✅ `plex-media-server`
- ✅ `home-assistant`
- ❌ `PlexMediaServer`
- ❌ `home_assistant`

### Container Names

Match service names or be descriptive:

```yaml
services:
  plex:
    container_name: plex  # Simple match
  database:
    container_name: media-database  # Descriptive
```

### Network Names

Use purpose-based naming:

- `homelab-network` - Main network
- `media-network` - Media services
- `monitoring-network` - Observability stack
- `isolated-network` - Untrusted services

### Volume Names

Use `service-purpose` pattern:

```yaml
volumes:
  plex-config:
  plex-metadata:
  database-data:
  nginx-certs:
```

### File Names

Organize by function:

- `docker-compose/media.yml` -
  Media services (Plex, Jellyfin, etc.)
- `docker-compose/monitoring.yml` - Monitoring stack
- `docker-compose/infrastructure.yml` - Core services (DNS, reverse proxy)
- `docker-compose/development.yml` - Dev tools

## Network Architecture

### Network Types

1. **Bridge Networks** (Most Common)

   ```yaml
   networks:
     homelab-network:
       driver: bridge
       ipam:
         config:
           - subnet: 172.20.0.0/16
   ```

2. **Host Network** (When Performance Critical)

   ```yaml
   services:
     performance-critical:
       network_mode: host
   ```

3. **Overlay Networks** (For Swarm/Multi-host)

   ```yaml
   networks:
     swarm-network:
       driver: overlay
   ```

### Network Design Patterns

#### Pattern 1: Single Shared Network

Simplest approach for small homelabs:

```yaml
networks:
  homelab-network:
    external: true
```

Create once manually:

```bash
docker network create homelab-network
```

#### Pattern 2: Segmented Networks

Better security through isolation:

```yaml
networks:
  frontend-network:    # Web-facing services
  backend-network:     # Databases, internal services
  monitoring-network:  # Observability
```

#### Pattern 3: Service-Specific Networks

Each service group has its own network:

```yaml
services:
  web:
    networks:
      - frontend
      - backend
  database:
    networks:
      - backend  # Not exposed to frontend
```

### Network Security

- Place databases on internal networks only
- Use separate networks for untrusted services
- Expose minimal ports to the host
- Use reverse proxies for web services

## Volume Management

### Volume Types

#### Named Volumes (Managed by Docker)

```yaml
volumes:
  database-data:
    driver: local
```

**Use for:**

- Database files
- Application data
- Anything Docker should manage

**Advantages:**

- Docker handles permissions
- Easy to backup/restore
- Portable across systems

#### Bind Mounts (Direct Host Paths)

```yaml
volumes:
  - ./config/app:/config
  - /media:/media:ro
```

**Use for:**

- Configuration files you edit directly
- Large media libraries
- Shared data with host

**Advantages:**

- Direct file access
- Easy to edit
- Can share with host applications

#### tmpfs Mounts (RAM)

```yaml
tmpfs:
  - /tmp
```

**Use for:**

- Temporary data
- Cache that doesn't need persistence
- Sensitive data that shouldn't touch disk

### Volume Best Practices

1. **Consistent Paths:**

   ```yaml
   volumes:
     - ./config/service:/config  # Always use /config inside container
     - service-data:/data        # Always use /data for application data
   ```

2. **Read-Only When Possible:**

   ```yaml
   volumes:
     - /media:/media:ro  # Media library is read-only
   ```

3. **Separate Config from Data:**

   ```yaml
   volumes:
     - ./config/plex:/config    # Editable configuration
     - plex-metadata:/metadata  # Application-managed data
   ```

4. **Backup Strategy:**

   ```bash
   # Backup named volume
   docker run --rm \
     -v plex-metadata:/data \
     -v "$(pwd)/backups:/backup" \
     busybox tar czf /backup/plex-metadata.tar.gz /data
   ```

## Security Best Practices

### 1. Image Security

**Pin Specific Versions:**

```yaml
# ✅ Good - Specific version
image: nginx:1.25.3-alpine

# ❌ Bad - Latest tag
image: nginx:latest
```

**Use Official or Trusted Images:**

- Official Docker images
- LinuxServer.io (lscr.io)
- Trusted vendors

**Scan Images:**

```bash
# `docker scan` was removed from recent Docker releases; Docker Scout replaces it
docker scout cves vendor/image:tag
```

### 2. Secret Management

**Never Commit Secrets:**

```yaml
# .env file (gitignored)
DB_PASSWORD=super-secret-password
API_KEY=sk-1234567890

# docker-compose.yml
environment:
  - DB_PASSWORD=${DB_PASSWORD}
  - API_KEY=${API_KEY}
```

**Provide Templates:**

```bash
# .env.example (committed)
DB_PASSWORD=changeme
API_KEY=your-api-key-here
```

### 3. User Permissions

**Run as Non-Root:**

```yaml
# PUID/PGID only take effect in images that support them (e.g. LinuxServer.io)
environment:
  - PUID=1000  # Your user ID
  - PGID=1000  # Your group ID
```

**Check Current User:**

```bash
id -u  # Gets your UID
id -g  # Gets your GID
```

### 4. Network Security

**Minimal Exposure:**

```yaml
# ✅ Good - Only expose what's needed
ports:
  - "127.0.0.1:8080:8080"  # Only accessible from localhost

# ❌ Bad - Exposed to all interfaces
ports:
  - "8080:8080"
```

**Use Reverse Proxy:**

```yaml
# Don't expose services directly
# Use Nginx/Traefik to proxy with SSL
```

### 5. Resource Limits

**Prevent Resource Exhaustion:**

```yaml
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '0.5'
      memory: 1G
```

## Monitoring and Logging

### Logging Configuration

**Standard Logging:**

```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```

**Centralized Logging:**

```yaml
logging:
  driver: "syslog"
  options:
    syslog-address: "tcp://192.168.1.100:514"
```

### Health Checks

**HTTP Health Check:**

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
```

**TCP Health Check:**

```yaml
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 5432 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```

**Custom Script:**

```yaml
healthcheck:
  test: ["CMD", "/healthcheck.sh"]
  interval: 30s
  timeout: 10s
  retries: 3
```

### Monitoring Stack Example

```yaml
# docker-compose/monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./config/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - monitoring-network

  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring-network
    depends_on:
      - prometheus

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring-network:
    driver: bridge
```

## Troubleshooting

### Common Issues

#### Service Won't Start

**1. Check logs:**

```bash
docker compose -f docker-compose/service.yml logs
```

**2. Validate configuration:**

```bash
docker compose -f docker-compose/service.yml config
```

**3. Check for port conflicts:**

```bash
# See what's using a port
sudo netstat -tlnp | grep :8080
```

**4. Verify image exists:**

```bash
docker images | grep service-name
```

#### Permission Errors

**1. Check PUID/PGID:**

```bash
# Your user ID
id -u

# Your group ID
id -g
```

**2. Fix directory permissions:**

```bash
sudo chown -R 1000:1000 ./config/service-name
```

**3. Check volume permissions:**

```bash
docker compose -f docker-compose/service.yml exec service-name ls -la /config
```

#### Network Connectivity Issues

**1. Verify network exists:**

```bash
docker network ls
docker network inspect homelab-network
```

**2. Check if services are on same network:**

```bash
docker network inspect homelab-network | grep Name
```

**3. Test connectivity:**

```bash
docker compose -f docker-compose/service.yml exec service1 ping service2
```

#### Container Keeps Restarting

**1. Watch logs:**

```bash
docker compose -f docker-compose/service.yml logs -f
```

**2. Check health status:**

```bash
docker compose -f docker-compose/service.yml ps
```

**3. Inspect container:**

```bash
docker inspect container-name
```

### Debugging Commands

```bash
# Enter running container
docker compose -f docker-compose/service.yml exec service-name /bin/sh

# View full container configuration
docker inspect container-name

# See resource usage
docker stats container-name

# View recent events
docker events --since 10m

# Check disk space
docker system df
```

### Recovery Procedures

#### Service Corrupted

```bash
# Stop service
docker compose -f docker-compose/service.yml down

# Remove container and volumes (backup first!)
docker compose -f docker-compose/service.yml down -v

# Recreate from scratch
docker compose -f docker-compose/service.yml up -d
```

#### Network Issues

```bash
# Remove and recreate network
docker network rm homelab-network
docker network create homelab-network

# Restart services (`-f` takes one file per flag, so loop instead of passing a glob)
for f in docker-compose/*.yml; do
  docker compose -f "$f" up -d
done
```

#### Full System Reset (Nuclear Option)

```bash
# ⚠️ WARNING: This removes everything!
# Backup first!

# Stop all containers
docker stop $(docker ps -aq)

# Remove all containers
docker rm $(docker ps -aq)

# Remove all volumes (careful!)
docker volume rm $(docker volume ls -q)

# Remove all networks (except defaults)
docker network prune -f

# Rebuild from compose files (`-f` takes one file per flag, so loop over them)
for f in docker-compose/*.yml; do
  docker compose -f "$f" up -d
done
```

## Maintenance

### Regular Tasks

**Weekly:**

- Review logs for errors
- Check disk space: `docker system df`
- Update security patches on images

**Monthly:**

- Update images to latest versions
- Review and prune unused resources
- Backup volumes
- Review and optimize compose files

**Quarterly:**

- Full stack review
- Documentation update
- Performance optimization
- Security audit

### Update Procedure

```bash
# 1. Backup current state
docker compose -f docker-compose/service.yml config > backup/service-config.yml

# 2. Update image version in compose file
# Edit docker-compose/service.yml

# 3. Pull new image
docker compose -f docker-compose/service.yml pull

# 4. Recreate service
docker compose -f docker-compose/service.yml up -d

# 5. Verify
docker compose -f docker-compose/service.yml logs -f

# 6. Test functionality
# Access service and verify it works
```

## AI Automation Guidelines

### Homepage Dashboard Management

**Automatic Configuration Updates**

Homepage configuration must be kept synchronized with deployed services. The AI assistant handles this automatically:

**Template Location:**

- Config templates: `/home/kelin/AI-Homelab/config-templates/homepage/`
- Active configs: `/opt/stacks/homepage/config/`

**Key Principles:**

1.
   **Hard-Coded URLs Required**: Homepage does NOT support variables in href links
   - Template uses `{{HOMEPAGE_VAR_DOMAIN}}` as placeholder
   - Active config uses `yourdomain.duckdns.org` hard-coded
   - AI must replace placeholders when deploying configs

2. **No Container Restart Needed**: Homepage picks up config changes instantly
   - Simply edit YAML files in `/opt/stacks/homepage/config/`
   - Refresh browser to see changes
   - DO NOT restart the container

3. **Stack-Based Organization**: Services grouped by their compose file
   - **Currently Installed**: Shows running services grouped by stack
   - **Available to Install**: Shows undeployed services from repository

4. **Automatic Updates Required**: AI must update Homepage configs when:
   - New service is deployed → Add to appropriate stack section
   - Service is removed → Remove from stack section
   - Domain/subdomain changes → Update all affected href URLs
   - Stack file is renamed → Update section headers

**Configuration Structure:**

```yaml
# services.yaml
- Stack Name (compose-file.yml):
    - Service Name:
        icon: service.png
        href: https://subdomain.yourdomain.duckdns.org  # Hard-coded!
        description: Service description
```

**Deployment Workflow:**

```bash
# When deploying from template:
cp /home/kelin/AI-Homelab/config-templates/homepage/*.yaml /opt/stacks/homepage/config/
sed -i 's/{{HOMEPAGE_VAR_DOMAIN}}/yourdomain.duckdns.org/g' /opt/stacks/homepage/config/services.yaml

# No restart needed - configs load instantly
```

**Critical Reminder:** Homepage is the single source of truth for service inventory. Keep it updated or users won't know what's deployed.

---

## Conclusion

Following these guidelines ensures:

- Consistent infrastructure
- Easy troubleshooting
- Reproducible deployments
- Maintainable system
- Better security

Remember: **Infrastructure as Code** means treating your Docker Compose files as critical documentation. Keep them clean, commented, and version-controlled.
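
Two of the conventions in this guide, lowercase hyphenated service names and pinned image versions, are simple enough to spot-check mechanically. The sketch below is one possible way to do that; the function names `is_valid_service_name` and `is_pinned_image` are hypothetical, and the rules are a simplified reading of the naming and image-pinning sections rather than an official validator.

```shell
#!/usr/bin/env bash
# Hypothetical convention checks based on this guide's naming and pinning rules.

# Succeeds when the name is lowercase letters/digits separated by single hyphens.
is_valid_service_name() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

# Succeeds when the image reference has an explicit tag (or digest) other than "latest".
is_pinned_image() {
  # Drop any registry/namespace prefix so a registry port is not mistaken for a tag.
  ref="${1##*/}"
  case "$ref" in
    *:latest) return 1 ;;
    *:*)      return 0 ;;
    *)        return 1 ;;  # No tag at all implies "latest"
  esac
}

# Example: report on a few names and image references.
for name in plex-media-server home_assistant; do
  if is_valid_service_name "$name"; then echo "name ok:  $name"; else echo "name bad: $name"; fi
done
for image in nginx:1.25.3-alpine nginx:latest; do
  if is_pinned_image "$image"; then echo "pinned:   $image"; else echo "unpinned: $image"; fi
done
```

Checks like these could run before committing a compose file, though they do not replace `docker compose config` for syntax validation.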