EZ-Homelab/docs/docker-guidelines.md
2026-01-24 23:11:05 -05:00

Docker Service Management Guidelines

Overview

This document provides comprehensive guidelines for managing Docker services in your AI-powered homelab using Dockge, Traefik, and Authelia. These guidelines ensure consistency, maintainability, and reliability across your entire infrastructure.

Table of Contents

  1. Philosophy
  2. Dockge Structure
  3. Traefik and Authelia Integration
  4. Docker Compose vs Docker Run
  5. Service Creation Guidelines
  6. Service Modification Guidelines
  7. Naming Conventions
  8. Network Architecture
  9. Volume Management
  10. Security Best Practices
  11. Monitoring and Logging
  12. Troubleshooting

Philosophy

Core Principles

  1. Dockge First: Manage all stacks through Dockge in /opt/stacks/
  2. Infrastructure as Code: All services defined in Docker Compose files
  3. File-Based Configuration: Traefik labels and Authelia YAML (AI-manageable)
  4. Reproducibility: Any service should be rebuildable from compose files
  5. Automatic HTTPS: All services routed through Traefik with Let's Encrypt
  6. Smart SSO: Authelia protects admin interfaces, bypasses media apps
  7. Documentation: Every non-obvious configuration must be commented
  8. Consistency: Use the same patterns across all services
  9. Safety First: Always test changes in isolation before deploying

The Stack Mindset

Think of your homelab as an interconnected stack where:

  • Services depend on networks (especially traefik-network)
  • Traefik routes all traffic with automatic SSL
  • Authelia protects sensitive services
  • VPN (Gluetun) secures downloads
  • Changes ripple through the system

Always ask: "How does this change affect other services and routing?"

Dockge Structure

Directory Organization

All stacks live in /opt/stacks/stack-name/:

/opt/stacks/
├── traefik/
│   ├── docker-compose.yml
│   ├── traefik.yml           # Static config
│   ├── dynamic/              # Dynamic routes
│   │   ├── routes.yml
│   │   └── external.yml      # External host proxying
│   ├── acme.json            # SSL certificates (chmod 600)
│   └── .env
├── authelia/
│   ├── docker-compose.yml
│   ├── configuration.yml     # Authelia settings
│   ├── users_database.yml    # User accounts
│   └── .env
├── media/
│   ├── docker-compose.yml
│   └── .env
└── ...

Why Dockge?

  • Visual Management: Web UI at https://dockge.${DOMAIN}
  • Direct File Editing: Edit compose files in-place
  • Stack Organization: Each service stack is independent
  • AI Compatible: Files can be managed by AI
  • Git Integration: Easy to version control

Storage Strategy

Small Data (configs, DBs < 10GB): /opt/stacks/stack-name/

volumes:
  - /opt/stacks/sonarr/config:/config

Large Data (media, downloads, backups): /mnt/

volumes:
  - /mnt/media/movies:/movies
  - /mnt/media/tv:/tv
  - /mnt/downloads:/downloads
  - /mnt/backups:/backups

AI will suggest /mnt/ when data may exceed 50GB or grow continuously.
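As a rough illustration, the sizing rule can be written as a small shell helper. The function name suggest_data_root and its yes/no growth flag are hypothetical; only the 50GB threshold and the two base paths come from the guidelines above. Sizes between 10GB and 50GB are not covered by the guidelines, so this sketch assumes they stay under /opt/stacks/.

```shell
# suggest_data_root SIZE_GB [GROWS]
# Prints the base directory suggested for a service's data.
# Hypothetical helper: only the 50 GB cutoff and the two paths come from
# the guidelines; the argument format is an assumption for illustration.
suggest_data_root() {
  size_gb=$1
  grows=${2:-no}
  if [ "$size_gb" -gt 50 ] || [ "$grows" = "yes" ]; then
    echo "/mnt"
  else
    echo "/opt/stacks"
  fi
}

suggest_data_root 2        # config directory for a small app
suggest_data_root 500      # media library
suggest_data_root 5 yes    # small today, but grows continuously
```

The first call prints /opt/stacks; the other two print /mnt.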

Traefik and Authelia Integration

Every Local (on the same server) Service Needs Traefik Labels

Default Configuration: All services should use Authelia SSO, Traefik routing, and Sablier lazy loading.

Standard pattern for all services using the standardized TRAEFIK CONFIGURATION format:

services:
  myservice:
    image: myimage:latest
    container_name: myservice
    networks:
      - homelab-network
      - traefik-network    # Required for Traefik
    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels
      - "traefik.enable=true"
      # Router configuration
      - "traefik.http.routers.myservice.rule=Host(`myservice.${DOMAIN}`)"
      - "traefik.http.routers.myservice.entrypoints=websecure"
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      - "traefik.http.routers.myservice.middlewares=authelia@docker"
      # Service configuration
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
      # Sablier configuration
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-myservice"
      - "sablier.start-on-demand=true"

Label Structure Explanation

Service Metadata Section:

  • com.centurylinklabs.watchtower.enable=true - Enables automatic container updates
  • homelab.category=category-name - Groups services by function (media, productivity, infrastructure, etc.)
  • homelab.description=Brief description - Documents service purpose

Router Configuration Section:

  • traefik.enable=true - Enables Traefik routing for this service
  • rule=Host(`myservice.${DOMAIN}`) - Defines the domain routing rule
  • entrypoints=websecure - Routes through HTTPS entrypoint
  • tls.certresolver=letsencrypt - Enables automatic SSL certificates
  • middlewares=authelia@docker - Default: Enables SSO protection (remove line to disable)

Service Configuration Section:

  • loadbalancer.server.port=8080 - Specifies internal container port (if not 80)

Sablier Configuration Section:

  • sablier.enable=true - Default: Enables lazy loading (remove section to disable)
  • sablier.group=${SERVER_HOSTNAME}-myservice - Groups containers for coordinated startup
  • sablier.start-on-demand=true - Starts containers only when accessed

x-dockge Section: At the bottom of the compose file, add a top-level x-dockge section for service discovery in Dockge:

x-dockge:
  urls:
    - https://myservice.${DOMAIN}
    - http://localhost:8080  # Direct local access

If Traefik Is on a Remote Server, Configure Routes and Services on the Traefik Server

When Traefik runs on a separate server from your application services, you cannot use Docker labels for configuration. Instead, create YAML files in the Traefik server's dynamic/ directory to define routes and services.

When to Use Remote Traefik Configuration

Use this approach when:

  • Traefik runs on a dedicated reverse proxy server
  • Application services run on separate application servers
  • You want centralized routing configuration
  • Docker labels cannot be used (different servers)

File Organization

Create one YAML file per application server in /opt/stacks/traefik/dynamic/:

/opt/stacks/traefik/dynamic/
├── server1.example.com.yml    # Services on server1
├── server2.example.com.yml    # Services on server2
├── shared-services.yml        # Common services
└── sablier.yml               # Sablier middlewares

YAML File Structure

Each server-specific YAML file should contain:

# /opt/stacks/traefik/dynamic/server1.example.com.yml
http:
  routers:
    # Router definitions for services on server1
    sonarr:
      rule: "Host(`sonarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-server1-sonarr
      service: sonarr

    radarr:
      rule: "Host(`radarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-server1-radarr
      service: radarr

  services:
    # Service definitions for services on server1
    sonarr:
      loadbalancer:
        servers:
          - url: "http://server1.example.com:8989"  # Internal IP/port of service
        passhostheader: true

    radarr:
      loadbalancer:
        servers:
          - url: "http://server1.example.com:7878"  # Internal IP/port of service
        passhostheader: true

Complete Example for a Media Server

# /opt/stacks/traefik/dynamic/media-server.yml
http:
  routers:
    jellyfin:
      rule: "Host(`jellyfin.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      # No authelia for app access
      middlewares:
        - sablier-media-server-jellyfin
      service: jellyfin

    sonarr:
      rule: "Host(`sonarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-media-server-sonarr
      service: sonarr

    radarr:
      rule: "Host(`radarr.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-media-server-radarr
      service: radarr

  services:
    jellyfin:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:8096"  # Media server internal IP
        passhostheader: true

    sonarr:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:8989"  # Media server internal IP
        passhostheader: true

    radarr:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:7878"  # Media server internal IP
        passhostheader: true

Key Configuration Notes

Router Configuration:

  • rule: Domain matching rule (same as Docker labels)
  • entrypoints: Use websecure for HTTPS
  • tls.certresolver: Use letsencrypt for automatic SSL
  • middlewares: List of middlewares (authelia, sablier, custom)
  • service: Reference to service definition below

Service Configuration:

  • url: Internal IP address and port of the actual service
  • passhostheader: true: Forwards the client's original Host header to the backend, which most web applications expect (this is also Traefik's default)
  • Use internal IPs, not public domains

Middleware References:

  • authelia: References the authelia middleware (defined in another file)
  • sablier-server1-sonarr: References sablier middleware for lazy loading
  • Custom middlewares can be added as needed

Deployment Process

  1. Create/Update YAML files in /opt/stacks/traefik/dynamic/
  2. Check for configuration errors. The traefik binary has no validate subcommand, so review the logs after saving the file instead:
    docker logs traefik --tail=50
    
  3. Reload configuration (if hot-reload enabled) or restart Traefik
  4. Test services by accessing their domains
  5. Monitor logs for any routing errors

Migration from Docker Labels

When moving from Docker labels to YAML configuration:

  1. Copy router rules from Docker labels to YAML format
  2. Convert service ports to full URLs with internal IPs
  3. Ensure middlewares are properly referenced
  4. Remove Traefik labels from docker-compose files
  5. Test all services after migration

This approach provides centralized, version-controllable routing configuration while maintaining the same security and performance benefits as Docker label-based configuration.
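The label-to-YAML conversion in steps 1 to 3 can be sketched as a shell function that prints a file-provider snippet for one service. The name remote_route and its four arguments (service name, domain, backend URL, server name) are hypothetical; the keys and middleware naming follow the examples earlier in this section. Treat the output as a starting point to review, not a finished configuration.

```shell
# remote_route NAME DOMAIN BACKEND_URL SERVER
# Prints a Traefik dynamic-config snippet for one remotely hosted service.
# Hypothetical helper; key names mirror the YAML examples in this document.
remote_route() {
  name=$1
  domain=$2
  backend=$3
  server=$4
  printf 'http:\n'
  printf '  routers:\n'
  printf '    %s:\n' "$name"
  printf '      rule: "Host(`%s.%s`)"\n' "$name" "$domain"
  printf '      entrypoints:\n'
  printf '        - websecure\n'
  printf '      tls:\n'
  printf '        certresolver: letsencrypt\n'
  printf '      middlewares:\n'
  printf '        - authelia\n'
  printf '        - sablier-%s-%s\n' "$server" "$name"
  printf '      service: %s\n' "$name"
  printf '  services:\n'
  printf '    %s:\n' "$name"
  printf '      loadbalancer:\n'
  printf '        servers:\n'
  printf '          - url: "%s"\n' "$backend"
  printf '        passhostheader: true\n'
}

# Port 8989 from the Docker label becomes a full URL with the internal IP
remote_route sonarr yourdomain.duckdns.org http://192.168.1.100:8989 server1
```

Because every snippet starts with its own http: key, write each service to its own file or merge the routers and services by hand when combining several into one file.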

When to Use Authelia SSO

Protect with Authelia (Default for all services):

  • Admin interfaces (Sonarr, Radarr, Prowlarr, etc.)
  • Infrastructure tools (Portainer, Dockge, Grafana)
  • Personal data (Nextcloud, Mealie, wikis)
  • Development tools (code-server, GitLab)
  • Monitoring dashboards

Bypass Authelia:

  • Media servers (Plex, Jellyfin) - need app access
  • Request services (Jellyseerr) - family-friendly access
  • Public services (WordPress, status pages)
  • Services with their own auth (Home Assistant)

Configure bypasses in /opt/stacks/authelia/configuration.yml:

access_control:
  rules:
    - domain: jellyfin.yourdomain.duckdns.org
      policy: bypass
    
    - domain: plex.yourdomain.duckdns.org
      policy: bypass

Routing Through VPN (Gluetun)

For services that need VPN (downloads):

services:
  mydownloader:
    image: downloader:latest
    container_name: mydownloader
    network_mode: "service:gluetun"  # Route through VPN
    depends_on:
      - gluetun

Expose ports through Gluetun's compose file:

# In gluetun.yml
gluetun:
  ports:
    - "8080:8080"  # mydownloader web UI

Docker Compose vs Docker Run

Docker Compose: For Everything Persistent

Use Docker Compose for:

  • All production services
  • Services that need to restart automatically
  • Multi-container applications
  • Services with complex configurations
  • Anything you want to keep long-term

Example:

# docker-compose/plex.yml
services:
  plex:
    image: plexinc/pms-docker:1.40.0.7998-f68041501
    container_name: plex
    restart: unless-stopped
    networks:
      - media-network
    ports:
      - "32400:32400"
    volumes:
      - ./config/plex:/config
      - /media:/media:ro
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/New_York

Docker Run: For Temporary Operations Only

Use docker run for:

  • Testing new images
  • One-off commands
  • Debugging
  • Verification tasks (like GPU testing)

Examples:

# Test if NVIDIA GPU is accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Quick test of a new image
docker run --rm -it alpine:latest /bin/sh

# One-off database backup
docker run --rm -v mydata:/data:ro -v "$(pwd)/backups:/backup" busybox tar czf /backup/data.tar.gz /data

Service Creation Guidelines

Step-by-Step Process

1. Planning Phase

Before writing any YAML:

  • What problem does this service solve?
  • Does a similar service already exist?
  • What are the dependencies?
  • What ports are needed?
  • What data needs to persist?
  • What environment variables are required?
  • What networks should it connect to? (include traefik-network)
  • Are there any security considerations?
  • Should this service be protected by Authelia SSO? (default: yes)
  • Should this service use lazy loading? (default: yes)
  • What category does this service belong to? (media, productivity, infrastructure, etc.)
  • What subdomain should it use? (service-name.${DOMAIN})

2. Research Phase

  • Read the official image documentation
  • Check for a service doc in the EZ-Homelab/docs/service-docs folder; if the new service doesn't have one, be prepared to create it at the end
  • Utilize https://awesome-docker-compose.com/apps
  • Check example configurations
  • Review resource requirements
  • Understand health check requirements
  • Note any special permissions needed

3. Implementation Phase

Start with a minimal configuration:

services:
  service-name:
    image: vendor/image:specific-version
    container_name: service-name
    restart: unless-stopped    # Set to 'no' if lazyloading (Sablier) is to be enabled

Add networks (required for Traefik):

    networks:
      - homelab-network
      - traefik-network    # Required for Traefik routing

Add ports (if externally accessible):

    ports:
      - "8080:8080"  # Web UI

Add volumes:

    volumes:
      - ./config/service-name:/config
      - service-data:/data

Add environment variables:

    environment:
      - PUID=1000
      - PGID=1000
      - TZ=${TIMEZONE}

Add health checks (if compatible):

    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

Add TRAEFIK CONFIGURATION labels (required for all web services):

    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels
      - "traefik.enable=true"
      # Router configuration
      - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # Service configuration
      - "traefik.http.services.service-name.loadbalancer.server.port=8080"
      # Sablier configuration
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-service-name"
      - "sablier.start-on-demand=true"

Add x-dockge section at the bottom of the compose file (before networks):

x-dockge:
  urls:
    - https://service-name.${DOMAIN}
    - http://${SERVER_IP}:8080

volumes:
  service-data:
    driver: local

networks:
  traefik-network:
    external: true
  homelab-network:
    external: true

If Traefik & Sablier are on a remote server:

  • Comment out the Traefik labels, since they won't be used; don't delete them.
  • Notify the user to add the router and service to the Traefik external host YAML file, and the middleware to the sablier.yml file.

Example: Comment out Traefik labels in docker-compose.yml:

    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels - COMMENTED OUT for remote server
      # - "traefik.enable=true"
      # - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
      # - "traefik.http.routers.service-name.entrypoints=websecure"
      # - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      # - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # - "traefik.http.services.service-name.loadbalancer.server.port=8080"
      # Sablier configuration 
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-service-name"
      - "sablier.start-on-demand=true"

Required: Add to Traefik external host YAML file (e.g., /opt/stacks/traefik/dynamic/remote-host-server1.yml):

http:
  routers:
    service-name:
      rule: "Host(`service-name.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      middlewares:
        - authelia
        - sablier-server1-service-name
      service: service-name

  services:
    service-name:
      loadbalancer:
        servers:
          - url: "http://192.168.1.100:8080"  # Internal IP of application server
        passhostheader: true

Required: Add to Sablier YAML file (e.g., /opt/stacks/traefik/dynamic/sablier.yml):

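    sablier-server1-service-name:    # Must match the middleware name referenced in the router
      plugin:
        sablier:
          sablierUrl: http://sablier-service:10000
          group: server1-service-name    # Must match the container's sablier.group label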
          sessionDuration: 5m
          ignoreUserAgent: curl
          dynamic:
            displayName: Service Name
            theme: ghost
            show-details-by-default: true

Deployment Steps:

  1. Comment out Traefik labels in the service's docker-compose.yml
  2. Add router and service definitions to the appropriate Traefik dynamic YAML file
  3. Add sablier middleware to the sablier.yml file
  4. Check the Traefik logs for configuration errors (the traefik binary has no validate subcommand): docker logs traefik --tail=50
  5. Restart Traefik or wait for hot-reload
  6. Test service access through Traefik

4. Testing Phase

# Validate syntax
docker compose -f docker-compose/service.yml config

# Start in foreground to see logs
docker compose -f docker-compose/service.yml up

# If successful, restart in background
docker compose -f docker-compose/service.yml down
docker compose -f docker-compose/service.yml up -d

5. Documentation Phase

Add comments to your compose file:

services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:4.0.0
    container_name: sonarr
    # Sonarr - TV Show management and automation
    # Protected by: Authelia SSO, Sablier lazy loading
    restart: "no"    # Quoted so YAML does not read it as a boolean

Update your main README or service-specific README with:

  • Service purpose
  • Access URLs (Traefik HTTPS URLs)
  • Default credentials (if any)
  • Configuration notes (SSO enabled/disabled, lazy loading, etc.)
  • Backup instructions
  • Any special routing considerations (VPN, remote server, etc.)

If the service doesn't already have a service doc in the EZ-Homelab/docs/service-docs folder, create one from the information gathered about the service, following the format of the existing service docs.

Service Modification Guidelines

Before Modifying

  1. Back up current configuration:

    cp docker-compose/service.yml docker-compose/service.yml.backup
    
  2. Document why you're making the change

    • Create a comment in the compose file
    • Note in your changelog or docs
  3. Understand the current state:

    # Check if service is running
    docker compose -f docker-compose/service.yml ps
    
    # Review current configuration
    docker compose -f docker-compose/service.yml config
    
    # Check logs for any existing issues
    docker compose -f docker-compose/service.yml logs --tail=50
    

Making the Change

  1. Edit the compose file

    • Make minimal, targeted changes
    • Keep existing structure when possible
    • Add comments for new configurations
  2. Validate syntax:

    docker compose -f docker-compose/service.yml config
    
  3. Apply the change:

    # Pull new image if version changed
    docker compose -f docker-compose/service.yml pull
    
    # Recreate the service
    docker compose -f docker-compose/service.yml up -d
    
  4. Verify the change:

    # Check service is running
    docker compose -f docker-compose/service.yml ps
    
    # Watch logs for errors
    docker compose -f docker-compose/service.yml logs -f
    
    # Test functionality
    curl http://localhost:port/health
    

Rollback Plan

If something goes wrong:

# Stop the service
docker compose -f docker-compose/service.yml down

# Restore backup
mv docker-compose/service.yml.backup docker-compose/service.yml

# Restart with old configuration
docker compose -f docker-compose/service.yml up -d

Common Modifications

Add TRAEFIK CONFIGURATION to existing service:

    labels:
      # TRAEFIK CONFIGURATION
      # ==========================================
      # Service metadata
      - "com.centurylinklabs.watchtower.enable=true"
      - "homelab.category=category-name"
      - "homelab.description=Brief service description"
      # Traefik labels
      - "traefik.enable=true"
      # Router configuration
      - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
      - "traefik.http.routers.service-name.entrypoints=websecure"
      - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
      - "traefik.http.routers.service-name.middlewares=authelia@docker"
      # Service configuration
      - "traefik.http.services.service-name.loadbalancer.server.port=8080"
      # Sablier configuration
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-service-name"
      - "sablier.start-on-demand=true"

Toggle SSO: Comment/uncomment the Authelia middleware label:

# Enable SSO (default)
- "traefik.http.routers.service.middlewares=authelia@docker"

# Disable SSO (remove line for media servers, public services)
# - "traefik.http.routers.service.middlewares=authelia@docker"

Toggle Lazy Loading: Comment/uncomment Sablier labels:

# Enable lazy loading (default)
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-service"
- "sablier.start-on-demand=true"

# Disable lazy loading (remove section for always-on services)
# - "sablier.enable=true"
# - "sablier.group=${SERVER_HOSTNAME}-service"
# - "sablier.start-on-demand=true"

Change Port: Update the loadbalancer server port:

- "traefik.http.services.service.loadbalancer.server.port=8080"

Add VPN Routing: Change network mode and update Gluetun ports:

network_mode: "service:gluetun"
# Add port mapping in Gluetun service

Update Subdomain: Modify the Host rule:

- "traefik.http.routers.service.rule=Host(`newservice.${DOMAIN}`)"

Naming Conventions

Service Names

Use lowercase with hyphens:

  • ✅ plex-media-server
  • ✅ home-assistant
  • ❌ PlexMediaServer
  • ❌ home_assistant
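One way to check a name against this convention is a small shell test. The function name is_valid_service_name is hypothetical, and allowing digits alongside lowercase letters is an assumption; the guideline itself only says "lowercase with hyphens".

```shell
# is_valid_service_name NAME
# Succeeds when NAME is lowercase letters/digits in hyphen-separated groups.
# Hypothetical helper; LC_ALL=C keeps [a-z] from matching uppercase letters
# under locale-dependent collation.
is_valid_service_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

for name in plex-media-server home-assistant PlexMediaServer home_assistant; do
  if is_valid_service_name "$name"; then
    echo "ok   $name"
  else
    echo "bad  $name"
  fi
done
```

The first two names pass; the camel-case and underscore variants are rejected.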

Container Names

Match service names or be descriptive:

services:
  plex:
    container_name: plex  # Simple match
  
  database:
    container_name: media-database  # Descriptive

Network Names

Use purpose-based naming:

  • homelab-network - Main network
  • media-network - Media services
  • monitoring-network - Observability stack
  • isolated-network - Untrusted services

Volume Names

Use service-purpose pattern:

volumes:
  plex-config:
  plex-metadata:
  database-data:
  nginx-certs:

File Names

Organize by function:

  • docker-compose/media.yml - Media services (Plex, Jellyfin, etc.)
  • docker-compose/monitoring.yml - Monitoring stack
  • docker-compose/infrastructure.yml - Core services (DNS, reverse proxy)
  • docker-compose/development.yml - Dev tools

Network Architecture

Network Types

  1. Bridge Networks (Most Common)

    networks:
      homelab-network:
        driver: bridge
        ipam:
          config:
            - subnet: 172.20.0.0/16
    
  2. Host Network (When Performance Critical)

    services:
      performance-critical:
        network_mode: host
    
  3. Overlay Networks (For Swarm/Multi-host)

    networks:
      swarm-network:
        driver: overlay
    

Network Design Patterns

Pattern 1: Single Shared Network

Simplest approach for small homelabs:

networks:
  homelab-network:
    external: true

Create once manually:

docker network create homelab-network

Pattern 2: Segmented Networks

Better security through isolation:

networks:
  frontend-network:  # Web-facing services
  backend-network:   # Databases, internal services
  monitoring-network:  # Observability

Pattern 3: Service-Specific Networks

Each service group has its own network:

services:
  web:
    networks:
      - frontend
      - backend
  
  database:
    networks:
      - backend  # Not exposed to frontend

Network Security

  • Place databases on internal networks only
  • Use separate networks for untrusted services
  • Expose minimal ports to the host
  • Use reverse proxies for web services

Volume Management

Volume Types

Named Volumes (Managed by Docker)

volumes:
  database-data:
    driver: local

Use for:

  • Database files
  • Application data
  • Anything Docker should manage

Advantages:

  • Docker handles permissions
  • Easy to backup/restore
  • Portable across systems

Bind Mounts (Direct Host Paths)

volumes:
  - ./config/app:/config
  - /media:/media:ro

Use for:

  • Configuration files you edit directly
  • Large media libraries
  • Shared data with host

Advantages:

  • Direct file access
  • Easy to edit
  • Can share with host applications

tmpfs Mounts (RAM)

tmpfs:
  - /tmp

Use for:

  • Temporary data
  • Cache that doesn't need persistence
  • Sensitive data that shouldn't touch disk

Volume Best Practices

  1. Consistent Paths:

    volumes:
      - ./config/service:/config  # Always use /config inside container
      - service-data:/data         # Always use /data for application data
    
  2. Read-Only When Possible:

    volumes:
      - /media:/media:ro  # Media library is read-only
    
  3. Separate Config from Data:

    volumes:
      - ./config/plex:/config      # Editable configuration
      - plex-metadata:/metadata    # Application-managed data
    
  4. Backup Strategy:

    # Backup named volume
    docker run --rm \
      -v plex-metadata:/data \
      -v $(pwd)/backups:/backup \
      busybox tar czf /backup/plex-metadata.tar.gz /data
    

Security Best Practices

1. Image Security

Pin Specific Versions:

# ✅ Good - Specific version
image: nginx:1.25.3-alpine

# ❌ Bad - Latest tag
image: nginx:latest
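A quick way to spot unpinned images is to search compose files for the latest tag. The helper below is a heuristic sketch with a hypothetical name (find_unpinned_images); it only catches unquoted image: lines that end in :latest, and it does not detect images written without any tag.

```shell
# find_unpinned_images [FILE...]
# Prints "line-number:text" for every image: line pinned to :latest.
# Reads standard input when no file is given. Hypothetical helper.
find_unpinned_images() {
  grep -nE 'image:[[:space:]]*[^[:space:]]+:latest([[:space:]]|$)' "$@"
}

# Demo on two inline lines; pass real compose files as arguments instead
printf 'image: nginx:1.25.3-alpine\nimage: nginx:latest\n' | find_unpinned_images
```

The demo reports only the second line. The function returns a non-zero status when nothing is found, which makes it usable as a pre-commit check.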

Use Official or Trusted Images:

  • Official Docker images
  • LinuxServer.io (lscr.io)
  • Trusted vendors

Scan Images:

# docker scan has been retired; Docker Scout is its replacement
docker scout cves vendor/image:tag

2. Secret Management

Never Commit Secrets:

# .env file (gitignored)
DB_PASSWORD=super-secret-password
API_KEY=sk-1234567890

# docker-compose.yml
environment:
  - DB_PASSWORD=${DB_PASSWORD}
  - API_KEY=${API_KEY}

Provide Templates:

# .env.example (committed)
DB_PASSWORD=changeme
API_KEY=your-api-key-here
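The template can be derived from the real .env so the two files never drift apart. This is a minimal sketch: the function name make_env_example is hypothetical, and it replaces every value with the single placeholder changeme rather than per-variable hints.

```shell
# make_env_example
# Reads .env-style lines on standard input and prints the same lines with
# every value replaced by a placeholder. Comments and blank lines pass
# through unchanged. Hypothetical helper.
make_env_example() {
  sed -E 's/^([A-Za-z_][A-Za-z0-9_]*)=.*/\1=changeme/'
}

# Demo on inline data; in practice: make_env_example < .env > .env.example
printf 'DB_PASSWORD=super-secret-password\nAPI_KEY=sk-1234567890\n' | make_env_example
```

Review the generated file before committing it, since any secret placed in a comment line would be copied through as-is.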

3. User Permissions

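Run as Non-Root (PUID/PGID are honored by LinuxServer.io-style images; other images generally need the Compose user: key instead):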

environment:
  - PUID=1000  # Your user ID
  - PGID=1000  # Your group ID

Check Current User:

id -u  # Gets your UID
id -g  # Gets your GID

4. Network Security

Minimal Exposure:

# ✅ Good - Only expose what's needed
ports:
  - "127.0.0.1:8080:8080"  # Only accessible from localhost

# ❌ Bad - Exposed to all interfaces
ports:
  - "8080:8080"
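To find mappings that bind to all interfaces, a compose file can be searched for port entries that lack a host IP. This is a heuristic sketch with a hypothetical function name (find_open_ports); it looks for list items of the form HOST:CONTAINER and may also match other numeric list items that happen to have the same shape.

```shell
# find_open_ports [FILE...]
# Prints "line-number:text" for port mappings with no host IP prefix.
# Reads standard input when no file is given. Hypothetical helper.
find_open_ports() {
  grep -nE '^[[:space:]]*-[[:space:]]*"?[0-9]+:[0-9]+' "$@"
}

# Demo: only the mapping without 127.0.0.1 is reported
printf 'ports:\n  - "8080:8080"\n  - "127.0.0.1:9090:9090"\n' | find_open_ports
```

Services reached only through Traefik usually do not need a published port at all, so a hit is a prompt to check whether the mapping is still required.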

Use Reverse Proxy:

# Don't expose services directly
# Use Nginx/Traefik to proxy with SSL

5. Resource Limits

Prevent Resource Exhaustion:

deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '0.5'
      memory: 1G

Monitoring and Logging

Logging Configuration

Standard Logging:

logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

Centralized Logging:

logging:
  driver: "syslog"
  options:
    syslog-address: "tcp://192.168.1.100:514"

Health Checks

HTTP Health Check:

healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3

TCP Health Check:

healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 5432 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3

Custom Script:

healthcheck:
  test: ["CMD", "/healthcheck.sh"]
  interval: 30s
  timeout: 10s
  retries: 3
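Once a health check is defined, its result can be polled from a script, for example before running a smoke test. The sketch below uses docker inspect with the .State.Health.Status field; the function name wait_healthy, the 2-second poll interval, and the 120-second default timeout are assumptions chosen for illustration.

```shell
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Polls the container's health status until it is healthy, unhealthy,
# or the timeout expires. Hypothetical helper.
wait_healthy() {
  name=$1
  timeout=${2:-120}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    case $status in
      healthy)   echo "$name is healthy"; return 0 ;;
      unhealthy) echo "$name is unhealthy" >&2; return 1 ;;
    esac
    sleep 2
    elapsed=$((elapsed + 2))
  done
  echo "timed out waiting for $name" >&2
  return 1
}

# Usage (requires a running container that defines a healthcheck):
#   wait_healthy sonarr 60 && echo "ready for testing"
```

A container without a healthcheck never reports healthy, so the function times out for such containers.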

Monitoring Stack Example

# docker-compose/monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./config/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - monitoring-network

  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring-network
    depends_on:
      - prometheus

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring-network:
    driver: bridge

Troubleshooting

Common Issues

Service Won't Start

1. Check logs:

docker compose -f docker-compose/service.yml logs

2. Validate configuration:

docker compose -f docker-compose/service.yml config

3. Check for port conflicts:

# See what's using a port
sudo netstat -tlnp | grep :8080

4. Verify image exists:

docker images | grep service-name

Permission Errors

1. Check PUID/PGID:

# Your user ID
id -u

# Your group ID
id -g

2. Fix directory permissions:

sudo chown -R 1000:1000 ./config/service-name

3. Check volume permissions:

docker compose -f docker-compose/service.yml exec service-name ls -la /config

Network Connectivity Issues

1. Verify network exists:

docker network ls
docker network inspect homelab-network

2. Check if services are on same network:

docker network inspect homelab-network | grep Name

3. Test connectivity:

docker compose -f docker-compose/service.yml exec service1 ping service2

Container Keeps Restarting

1. Watch logs:

docker compose -f docker-compose/service.yml logs -f

2. Check health status:

docker compose -f docker-compose/service.yml ps

3. Inspect container:

docker inspect container-name

Debugging Commands

# Enter running container
docker compose -f docker-compose/service.yml exec service-name /bin/sh

# View full container configuration
docker inspect container-name

# See resource usage
docker stats container-name

# View recent events
docker events --since 10m

# Check disk space
docker system df

Recovery Procedures

Service Corrupted

# Stop service
docker compose -f docker-compose/service.yml down

# Remove container and volumes (backup first!)
docker compose -f docker-compose/service.yml down -v

# Recreate from scratch
docker compose -f docker-compose/service.yml up -d

Network Issues

# Remove and recreate network
docker network rm homelab-network
docker network create homelab-network

# Restart services (each -f takes one file, so a bare glob after -f would fail)
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done

Full System Reset (Nuclear Option)

# ⚠️ WARNING: This removes everything!
# Backup first!

# Stop all containers
docker stop $(docker ps -aq)

# Remove all containers
docker rm $(docker ps -aq)

# Remove all volumes (careful!)
docker volume rm $(docker volume ls -q)

# Remove all networks (except defaults)
docker network prune -f

# Rebuild from compose files (each -f takes one file, so loop over them)
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done

Maintenance

Regular Tasks

Weekly:

  • Review logs for errors
  • Check disk space: docker system df
  • Apply security updates to images

Monthly:

  • Update images to latest versions
  • Review and prune unused resources
  • Backup volumes
  • Review and optimize compose files

Quarterly:

  • Full stack review
  • Documentation update
  • Performance optimization
  • Security audit

Update Procedure

# 1. Backup current state
docker compose -f docker-compose/service.yml config > backup/service-config.yml

# 2. Update image version in compose file
# Edit docker-compose/service.yml

# 3. Pull new image
docker compose -f docker-compose/service.yml pull

# 4. Recreate service
docker compose -f docker-compose/service.yml up -d

# 5. Verify
docker compose -f docker-compose/service.yml logs -f

# 6. Test functionality
# Access service and verify it works

AI Automation Guidelines

Homepage Dashboard Management

Automatic Configuration Updates

Homepage configuration must be kept synchronized with deployed services. The AI assistant handles this automatically:

Template Location:

  • Config templates: /home/kelin/AI-Homelab/config-templates/homepage/
  • Active configs: /opt/stacks/homepage/config/

Key Principles:

  1. Hard-Coded URLs Required: Homepage does NOT support variables in href links

    • Template uses {{HOMEPAGE_VAR_DOMAIN}} as placeholder
    • Active config uses kelin-hass.duckdns.org hard-coded
    • AI must replace placeholders when deploying configs
  2. No Container Restart Needed: Homepage picks up config changes instantly

    • Simply edit YAML files in /opt/stacks/homepage/config/
    • Refresh browser to see changes
    • DO NOT restart the container
  3. Stack-Based Organization: Services grouped by their compose file

    • Currently Installed: Shows running services grouped by stack
    • Available to Install: Shows undeployed services from repository
  4. Automatic Updates Required: AI must update Homepage configs when:

    • New service is deployed → Add to appropriate stack section
    • Service is removed → Remove from stack section
    • Domain/subdomain changes → Update all affected href URLs
    • Stack file is renamed → Update section headers

Configuration Structure:

# services.yaml
- Stack Name (compose-file.yml):
    - Service Name:
        icon: service.png
        href: https://subdomain.kelin-hass.duckdns.org  # Hard-coded!
        description: Service description

Deployment Workflow:

# When deploying from template:
cp /home/kelin/AI-Homelab/config-templates/homepage/*.yaml /opt/stacks/homepage/config/
sed -i 's/{{HOMEPAGE_VAR_DOMAIN}}/kelin-hass.duckdns.org/g' /opt/stacks/homepage/config/services.yaml

# No restart needed - configs load instantly
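Since Homepage needs hard-coded URLs, it can help to confirm that no template placeholder survived the sed step. The following is a small sketch; the function name check_placeholders is hypothetical, and it assumes every placeholder starts with {{HOMEPAGE_VAR_ as in the template described above.

```shell
# check_placeholders DIR
# Fails and lists the offending lines if any file under DIR still contains
# a {{HOMEPAGE_VAR_...}} placeholder. Hypothetical helper.
check_placeholders() {
  if grep -rnF '{{HOMEPAGE_VAR_' "$1"; then
    echo "unreplaced placeholders found" >&2
    return 1
  fi
  echo "no placeholders left in $1"
}

# Usage after deploying the templates:
#   check_placeholders /opt/stacks/homepage/config/
```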

Critical Reminder: Homepage is the single source of truth for service inventory. Keep it updated or users won't know what's deployed.


Conclusion

Following these guidelines ensures:

  • Consistent infrastructure
  • Easy troubleshooting
  • Reproducible deployments
  • Maintainable system
  • Better security

Remember: Infrastructure as Code means treating your Docker Compose files as critical documentation. Keep them clean, commented, and version-controlled.