Docker Service Management Guidelines
Overview
This document provides comprehensive guidelines for managing Docker services in your AI-powered homelab using Dockge, Traefik, and Authelia. These guidelines ensure consistency, maintainability, and reliability across your entire infrastructure.
Table of Contents
- Philosophy
- Dockge Structure
- Traefik and Authelia Integration
- Docker Compose vs Docker Run
- Service Creation Guidelines
- Service Modification Guidelines
- Naming Conventions
- Network Architecture
- Volume Management
- Security Best Practices
- Monitoring and Logging
- Troubleshooting
Philosophy
Core Principles
- Dockge First: Manage all stacks through Dockge in /opt/stacks/
- Infrastructure as Code: All services defined in Docker Compose files
- File-Based Configuration: Traefik labels and Authelia YAML (AI-manageable)
- Reproducibility: Any service should be rebuildable from compose files
- Automatic HTTPS: All services routed through Traefik with Let's Encrypt
- Smart SSO: Authelia protects admin interfaces, bypasses media apps
- Documentation: Every non-obvious configuration must be commented
- Consistency: Use the same patterns across all services
- Safety First: Always test changes in isolation before deploying
The Stack Mindset
Think of your homelab as an interconnected stack where:
- Services depend on networks (especially traefik-network)
- Traefik routes all traffic with automatic SSL
- Authelia protects sensitive services
- VPN (Gluetun) secures downloads
- Changes ripple through the system
Always ask: "How does this change affect other services and routing?"
Dockge Structure
Directory Organization
All stacks live in /opt/stacks/stack-name/:
/opt/stacks/
├── traefik/
│ ├── docker-compose.yml
│ ├── traefik.yml # Static config
│ ├── dynamic/ # Dynamic routes
│ │ ├── routes.yml
│ │ └── external.yml # External host proxying
│ ├── acme.json # SSL certificates (chmod 600)
│ └── .env
├── authelia/
│ ├── docker-compose.yml
│ ├── configuration.yml # Authelia settings
│ ├── users_database.yml # User accounts
│ └── .env
├── media/
│ ├── docker-compose.yml
│ └── .env
└── ...
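The chmod 600 requirement on acme.json noted in the layout above can be verified with a short check. This sketch runs against a scratch file; in production the path would be /opt/stacks/traefik/acme.json:

```shell
# Traefik ignores a certificate store that other users can read, so
# acme.json must be mode 600. Demonstrated on a temp file here; substitute
# /opt/stacks/traefik/acme.json in production.
ACME="$(mktemp)"
chmod 600 "$ACME"
perms="$(stat -c '%a' "$ACME")"
echo "acme.json permissions: $perms"   # expect 600
rm -f "$ACME"
```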
Why Dockge?
- Visual Management: Web UI at https://dockge.${DOMAIN}
- Direct File Editing: Edit compose files in-place
- Stack Organization: Each service stack is independent
- AI Compatible: Files can be managed by AI
- Git Integration: Easy to version control
Storage Strategy
Small Data (configs, DBs < 10GB): /opt/stacks/stack-name/
volumes:
- /opt/stacks/sonarr/config:/config
Large Data (media, downloads, backups): /mnt/
volumes:
- /mnt/media/movies:/movies
- /mnt/media/tv:/tv
- /mnt/downloads:/downloads
- /mnt/backups:/backups
AI will suggest /mnt/ when data may exceed 50GB or grow continuously.
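A quick way to spot stacks that are outgrowing /opt/stacks (a sketch; adjust the path and the ~50GB threshold to taste):

```shell
# List the largest stack directories, biggest first. Anything trending
# past ~50 GB is a candidate for relocation to /mnt/. Errors are
# suppressed in case /opt/stacks does not exist yet.
du -sh /opt/stacks/*/ 2>/dev/null | sort -rh | head -n 10
```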
Traefik and Authelia Integration
Every Local Service (on the Same Server) Needs Traefik Labels
Default Configuration: All services should use Authelia SSO, Traefik routing, and Sablier lazy loading by default.
Standard pattern for all services using the standardized TRAEFIK CONFIGURATION format:
services:
myservice:
image: myimage:latest
container_name: myservice
networks:
- homelab-network
- traefik-network # Required for Traefik
labels:
# TRAEFIK CONFIGURATION
# ==========================================
# Service metadata
- "com.centurylinklabs.watchtower.enable=true"
- "homelab.category=category-name"
- "homelab.description=Brief service description"
# Traefik labels
- "traefik.enable=true"
# Router configuration
- "traefik.http.routers.myservice.rule=Host(`myservice.${DOMAIN}`)"
- "traefik.http.routers.myservice.entrypoints=websecure"
- "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
- "traefik.http.routers.myservice.middlewares=authelia@docker"
# Service configuration
- "traefik.http.services.myservice.loadbalancer.server.port=8080"
# Sablier configuration
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-myservice"
- "sablier.start-on-demand=true"
Label Structure Explanation
Service Metadata Section:
- `com.centurylinklabs.watchtower.enable=true` - Enables automatic container updates
- `homelab.category=category-name` - Groups services by function (media, productivity, infrastructure, etc.)
- `homelab.description=Brief description` - Documents the service's purpose
Router Configuration Section:
- `traefik.enable=true` - Enables Traefik routing for this service
- ``rule=Host(`myservice.${DOMAIN}`)`` - Defines the domain routing rule
- `entrypoints=websecure` - Routes through the HTTPS entrypoint
- `tls.certresolver=letsencrypt` - Enables automatic SSL certificates
- `middlewares=authelia@docker` - Default: enables SSO protection (remove this line to disable)
Service Configuration Section:
- `loadbalancer.server.port=8080` - Specifies the internal container port (needed when it is not 80)
Sablier Configuration Section:
- `sablier.enable=true` - Default: enables lazy loading (remove the section to disable)
- `sablier.group=${SERVER_HOSTNAME}-myservice` - Groups containers for coordinated startup
- `sablier.start-on-demand=true` - Starts containers only when they are accessed
x-dockge Section:
At the bottom of the compose file, add a top-level x-dockge section for service discovery in Dockge:
x-dockge:
urls:
- https://myservice.${DOMAIN}
- http://localhost:8080 # Direct local access
If Traefik Is on a Remote Server, Configure Routes and Services on That Server
When Traefik runs on a separate server from your application services, you cannot use Docker labels for configuration. Instead, create YAML files in the Traefik server's dynamic/ directory to define routes and services.
When to Use Remote Traefik Configuration
Use this approach when:
- Traefik runs on a dedicated reverse proxy server
- Application services run on separate application servers
- You want centralized routing configuration
- Docker labels cannot be used (different servers)
File Organization
Create one YAML file per application server in /opt/stacks/traefik/dynamic/:
/opt/stacks/traefik/dynamic/
├── server1.example.com.yml # Services on server1
├── server2.example.com.yml # Services on server2
├── shared-services.yml # Common services
└── sablier.yml # Sablier middlewares
YAML File Structure
Each server-specific YAML file should contain:
# /opt/stacks/traefik/dynamic/server1.example.com.yml
http:
routers:
# Router definitions for services on server1
sonarr:
rule: "Host(`sonarr.yourdomain.duckdns.org`)"
entrypoints:
- websecure
tls:
certresolver: letsencrypt
middlewares:
- authelia
- sablier-server1-sonarr
service: sonarr
radarr:
rule: "Host(`radarr.yourdomain.duckdns.org`)"
entrypoints:
- websecure
tls:
certresolver: letsencrypt
middlewares:
- authelia
- sablier-server1-radarr
service: radarr
services:
# Service definitions for services on server1
sonarr:
loadbalancer:
servers:
- url: "http://server1.example.com:8989" # Internal IP/port of service
passHostHeader: true
radarr:
loadbalancer:
servers:
- url: "http://server1.example.com:7878" # Internal IP/port of service
passHostHeader: true
Complete Example for a Media Server
# /opt/stacks/traefik/dynamic/media-server.yml
http:
routers:
jellyfin:
rule: "Host(`jellyfin.yourdomain.duckdns.org`)"
entrypoints:
- websecure
tls:
certresolver: letsencrypt
# No authelia for app access
middlewares:
- sablier-media-server-jellyfin
service: jellyfin
sonarr:
rule: "Host(`sonarr.yourdomain.duckdns.org`)"
entrypoints:
- websecure
tls:
certresolver: letsencrypt
middlewares:
- authelia
- sablier-media-server-sonarr
service: sonarr
radarr:
rule: "Host(`radarr.yourdomain.duckdns.org`)"
entrypoints:
- websecure
tls:
certresolver: letsencrypt
middlewares:
- authelia
- sablier-media-server-radarr
service: radarr
services:
jellyfin:
loadbalancer:
servers:
- url: "http://192.168.1.100:8096" # Media server internal IP
passHostHeader: true
sonarr:
loadbalancer:
servers:
- url: "http://192.168.1.100:8989" # Media server internal IP
passHostHeader: true
radarr:
loadbalancer:
servers:
- url: "http://192.168.1.100:7878" # Media server internal IP
passHostHeader: true
Key Configuration Notes
Router Configuration:
- `rule`: Domain matching rule (same as Docker labels)
- `entrypoints`: Use `websecure` for HTTPS
- `tls.certresolver`: Use `letsencrypt` for automatic SSL
- `middlewares`: List of middlewares (authelia, sablier, custom)
- `service`: Reference to the service definition below
Service Configuration:
- `url`: Internal IP address and port of the actual service
- `passHostHeader: true`: Required for most web applications
- Use internal IPs, not public domains
Middleware References:
- `authelia`: References the authelia middleware (defined in another file)
- `sablier-server1-sonarr`: References the sablier middleware for lazy loading
- Custom middlewares can be added as needed
Deployment Process
- Create/update YAML files in /opt/stacks/traefik/dynamic/
- Validate syntax: docker exec traefik traefik validate --configFile=/etc/traefik/traefik.yml
- Reload the configuration (if hot-reload is enabled) or restart Traefik
- Test services by accessing their domains
- Monitor logs for any routing errors
Migration from Docker Labels
When moving from Docker labels to YAML configuration:
- Copy router rules from Docker labels to YAML format
- Convert service ports to full URLs with internal IPs
- Ensure middlewares are properly referenced
- Remove Traefik labels from docker-compose files
- Test all services after migration
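As a minimal illustration of the first two steps, a label-based router and service (a hypothetical app on port 8080) maps to the file provider like this:

```yaml
# Before: Docker labels on the service
#   - "traefik.http.routers.myapp.rule=Host(`myapp.${DOMAIN}`)"
#   - "traefik.http.services.myapp.loadbalancer.server.port=8080"

# After: a file in /opt/stacks/traefik/dynamic/ on the Traefik host
http:
  routers:
    myapp:
      rule: "Host(`myapp.yourdomain.duckdns.org`)"
      entrypoints:
        - websecure
      tls:
        certresolver: letsencrypt
      service: myapp
  services:
    myapp:
      loadbalancer:
        servers:
          - url: "http://192.168.1.50:8080"  # internal IP replaces the label's bare port
        passHostHeader: true
```

The app name, IP, and domain are illustrative; substitute your own.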
This approach provides centralized, version-controllable routing configuration while maintaining the same security and performance benefits as Docker label-based configuration.
When to Use Authelia SSO
Protect with Authelia (Default for all services):
- Admin interfaces (Sonarr, Radarr, Prowlarr, etc.)
- Infrastructure tools (Portainer, Dockge, Grafana)
- Personal data (Nextcloud, Mealie, wikis)
- Development tools (code-server, GitLab)
- Monitoring dashboards
Bypass Authelia:
- Media servers (Plex, Jellyfin) - need app access
- Request services (Jellyseerr) - family-friendly access
- Public services (WordPress, status pages)
- Services with their own auth (Home Assistant)
Configure bypasses in /opt/stacks/authelia/configuration.yml:
access_control:
rules:
- domain: jellyfin.yourdomain.duckdns.org
policy: bypass
- domain: plex.yourdomain.duckdns.org
policy: bypass
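Authelia evaluates rules top to bottom and applies the first match, so bypass rules should sit above any broader policy. A fuller sketch (domains and policies are illustrative):

```yaml
access_control:
  default_policy: deny          # anything unmatched is blocked
  rules:
    # Bypass rules first: media apps need direct app access
    - domain: jellyfin.yourdomain.duckdns.org
      policy: bypass
    - domain: plex.yourdomain.duckdns.org
      policy: bypass
    # Everything else on the wildcard requires a login
    - domain: "*.yourdomain.duckdns.org"
      policy: one_factor
```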
Routing Through VPN (Gluetun)
For services that need VPN (downloads):
services:
mydownloader:
image: downloader:latest
container_name: mydownloader
network_mode: "service:gluetun" # Route through VPN
depends_on:
- gluetun
Expose ports through Gluetun's compose file:
# In gluetun.yml
gluetun:
ports:
- "8080:8080" # mydownloader web UI
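Putting the two snippets together, a minimal sketch of a VPN-routed downloader in one stack (the downloader image and port are illustrative):

```yaml
services:
  gluetun:
    image: qmcgaw/gluetun:latest
    container_name: gluetun
    cap_add:
      - NET_ADMIN                # required for the VPN tunnel
    environment:
      - VPN_SERVICE_PROVIDER=${VPN_PROVIDER}
    ports:
      - "8080:8080"              # mydownloader web UI, exposed via gluetun

  mydownloader:
    image: downloader:latest     # illustrative image name
    container_name: mydownloader
    network_mode: "service:gluetun"   # all traffic goes through the VPN
    depends_on:
      - gluetun
```

Note that the port mapping lives on gluetun, not on the downloader: with `network_mode: "service:gluetun"` the downloader has no network stack of its own.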
Docker Compose vs Docker Run
Docker Compose: For Everything Persistent
Use Docker Compose for:
- All production services
- Services that need to restart automatically
- Multi-container applications
- Services with complex configurations
- Anything you want to keep long-term
Example:
# docker-compose/plex.yml
services:
plex:
image: plexinc/pms-docker:1.40.0.7998-f68041501
container_name: plex
restart: unless-stopped
networks:
- media-network
ports:
- "32400:32400"
volumes:
- ./config/plex:/config
- /media:/media:ro
environment:
- PUID=1000
- PGID=1000
- TZ=America/New_York
Docker Run: For Temporary Operations Only
Use docker run for:
- Testing new images
- One-off commands
- Debugging
- Verification tasks (like GPU testing)
Examples:
# Test if NVIDIA GPU is accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Quick test of a new image
docker run --rm -it alpine:latest /bin/sh
# One-off database backup
docker run --rm -v mydata:/data -v $(pwd)/backups:/backup busybox tar czf /backup/data.tar.gz /data
Service Creation Guidelines
Step-by-Step Process
1. Planning Phase
Before writing any YAML:
- What problem does this service solve?
- Does a similar service already exist?
- What are the dependencies?
- What ports are needed?
- What data needs to persist?
- What environment variables are required?
- What networks should it connect to? (include traefik-network)
- Are there any security considerations?
- Should this service be protected by Authelia SSO? (default: yes)
- Should this service use lazy loading? (default: yes)
- What category does this service belong to? (media, productivity, infrastructure, etc.)
- What subdomain should it use? (service-name.${DOMAIN})
2. Research Phase
- Read the official image documentation
- Check for a service doc in the EZ-Homelab/docs/service-docs folder; if the new service doesn't have one, be prepared to create it at the end
- Consult https://awesome-docker-compose.com/apps for reference stacks
- Check example configurations
- Review resource requirements
- Understand health check requirements
- Note any special permissions needed
3. Implementation Phase
Start with a minimal configuration:
services:
service-name:
image: vendor/image:specific-version
container_name: service-name
restart: unless-stopped # Set to "no" (quoted, so YAML keeps it a string) if lazy loading (Sablier) is to be enabled
Add networks (required for Traefik):
networks:
- homelab-network
- traefik-network # Required for Traefik routing
Add ports (if externally accessible):
ports:
- "8080:8080" # Web UI
Add volumes:
volumes:
- ./config/service-name:/config
- service-data:/data
Add environment variables:
environment:
- PUID=1000
- PGID=1000
- TZ=${TIMEZONE}
Add health checks (if the image supports them):
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
Add TRAEFIK CONFIGURATION labels (required for all web services):
labels:
# TRAEFIK CONFIGURATION
# ==========================================
# Service metadata
- "com.centurylinklabs.watchtower.enable=true"
- "homelab.category=category-name"
- "homelab.description=Brief service description"
# Traefik labels
- "traefik.enable=true"
# Router configuration
- "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
- "traefik.http.routers.service-name.entrypoints=websecure"
- "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
- "traefik.http.routers.service-name.middlewares=authelia@docker"
# Service configuration
- "traefik.http.services.service-name.loadbalancer.server.port=8080"
# Sablier configuration
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-service-name"
- "sablier.start-on-demand=true"
Add x-dockge section at the bottom of the compose file (before networks):
x-dockge:
urls:
- https://service-name.${DOMAIN}
- http://${SERVER_IP}:8080
volumes:
service-data:
driver: local
networks:
traefik-network:
external: true
homelab-network:
external: true
If Traefik & Sablier are on a remote server:
- Comment out the Traefik labels since they won't be used; don't delete them.
- Notify the user to add the service and middleware to the Traefik external host YAML file and to the sablier.yml file.
Example: Comment out Traefik labels in docker-compose.yml:
labels:
# TRAEFIK CONFIGURATION
# ==========================================
# Service metadata
- "com.centurylinklabs.watchtower.enable=true"
- "homelab.category=category-name"
- "homelab.description=Brief service description"
# Traefik labels - COMMENTED OUT for remote server
# - "traefik.enable=true"
# - "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
# - "traefik.http.routers.service-name.entrypoints=websecure"
# - "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
# - "traefik.http.routers.service-name.middlewares=authelia@docker"
# - "traefik.http.services.service-name.loadbalancer.server.port=8080"
# Sablier configuration
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-service-name"
- "sablier.start-on-demand=true"
Required: Add to Traefik external host YAML file (e.g., /opt/stacks/traefik/dynamic/remote-host-server1.yml):
http:
routers:
service-name:
rule: "Host(`service-name.yourdomain.duckdns.org`)"
entrypoints:
- websecure
tls:
certresolver: letsencrypt
middlewares:
- authelia
- sablier-server1-service-name
service: service-name
services:
service-name:
loadbalancer:
servers:
- url: "http://192.168.1.100:8080" # Internal IP of application server
passhostheader: true
Required: Add to Sablier YAML file (e.g., /opt/stacks/traefik/dynamic/sablier.yml):
sablier-server1-service-name:
plugin:
sablier:
sablierUrl: http://sablier-service:10000
group: server1-service-name
sessionDuration: 5m
ignoreUserAgent: curl
dynamic:
displayName: Service Name
theme: ghost
show-details-by-default: true
Deployment Steps:
- Comment out Traefik labels in the service's docker-compose.yml
- Add router and service definitions to the appropriate Traefik dynamic YAML file
- Add sablier middleware to the sablier.yml file
- Validate Traefik configuration: docker exec traefik traefik validate --configFile=/etc/traefik/traefik.yml
- Restart Traefik or wait for hot-reload
- Test service access through Traefik
4. Testing Phase
# Validate syntax
docker compose -f docker-compose/service.yml config
# Start in foreground to see logs
docker compose -f docker-compose/service.yml up
# If successful, restart in background
docker compose -f docker-compose/service.yml down
docker compose -f docker-compose/service.yml up -d
5. Documentation Phase
Add comments to your compose file:
services:
sonarr:
image: lscr.io/linuxserver/sonarr:4.0.0
container_name: sonarr
# Sonarr - TV Show management and automation
# Protected by: Authelia SSO, Sablier lazy loading
restart: "no" # Quoted so YAML treats it as a string, not a boolean
Update your main README or service-specific README with:
- Service purpose
- Access URLs (Traefik HTTPS URLs)
- Default credentials (if any)
- Configuration notes (SSO enabled/disabled, lazy loading, etc.)
- Backup instructions
- Any special routing considerations (VPN, remote server, etc.)
If the service doesn't already have a service doc in the EZ-Homelab/docs/service-docs folder, create one from the compiled information about the service, following the same format as the other service docs.
Service Modification Guidelines
Before Modifying
1. Back up the current configuration:
   cp docker-compose/service.yml docker-compose/service.yml.backup
2. Document why you're making the change:
   - Add a comment in the compose file
   - Note it in your changelog or docs
3. Understand the current state:
   # Check if the service is running
   docker compose -f docker-compose/service.yml ps
   # Review the current configuration
   docker compose -f docker-compose/service.yml config
   # Check logs for any existing issues
   docker compose -f docker-compose/service.yml logs --tail=50
Making the Change
1. Edit the compose file:
   - Make minimal, targeted changes
   - Keep the existing structure when possible
   - Add comments for new configurations
2. Validate syntax:
   docker compose -f docker-compose/service.yml config
3. Apply the change:
   # Pull the new image if the version changed
   docker compose -f docker-compose/service.yml pull
   # Recreate the service
   docker compose -f docker-compose/service.yml up -d
4. Verify the change:
   # Check the service is running
   docker compose -f docker-compose/service.yml ps
   # Watch logs for errors
   docker compose -f docker-compose/service.yml logs -f
   # Test functionality
   curl http://localhost:port/health
Rollback Plan
If something goes wrong:
# Stop the service
docker compose -f docker-compose/service.yml down
# Restore backup
mv docker-compose/service.yml.backup docker-compose/service.yml
# Restart with old configuration
docker compose -f docker-compose/service.yml up -d
Common Modifications
Add TRAEFIK CONFIGURATION to existing service:
labels:
# TRAEFIK CONFIGURATION
# ==========================================
# Service metadata
- "com.centurylinklabs.watchtower.enable=true"
- "homelab.category=category-name"
- "homelab.description=Brief service description"
# Traefik labels
- "traefik.enable=true"
# Router configuration
- "traefik.http.routers.service-name.rule=Host(`service-name.${DOMAIN}`)"
- "traefik.http.routers.service-name.entrypoints=websecure"
- "traefik.http.routers.service-name.tls.certresolver=letsencrypt"
- "traefik.http.routers.service-name.middlewares=authelia@docker"
# Service configuration
- "traefik.http.services.service-name.loadbalancer.server.port=8080"
# Sablier configuration
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-service-name"
- "sablier.start-on-demand=true"
Toggle SSO: Comment/uncomment the Authelia middleware label:
# Enable SSO (default)
- "traefik.http.routers.service.middlewares=authelia@docker"
# Disable SSO (remove line for media servers, public services)
# - "traefik.http.routers.service.middlewares=authelia@docker"
Toggle Lazy Loading: Comment/uncomment Sablier labels:
# Enable lazy loading (default)
- "sablier.enable=true"
- "sablier.group=${SERVER_HOSTNAME}-service"
- "sablier.start-on-demand=true"
# Disable lazy loading (remove section for always-on services)
# - "sablier.enable=true"
# - "sablier.group=${SERVER_HOSTNAME}-service"
# - "sablier.start-on-demand=true"
Change Port: Update the loadbalancer server port:
- "traefik.http.services.service.loadbalancer.server.port=8080"
Add VPN Routing: Change network mode and update Gluetun ports:
network_mode: "service:gluetun"
# Add port mapping in Gluetun service
Update Subdomain: Modify the Host rule:
- "traefik.http.routers.service.rule=Host(`newservice.${DOMAIN}`)"
Naming Conventions
Service Names
Use lowercase with hyphens:
- ✅ plex-media-server
- ✅ home-assistant
- ❌ PlexMediaServer
- ❌ home_assistant
Container Names
Match service names or be descriptive:
services:
plex:
container_name: plex # Simple match
database:
container_name: media-database # Descriptive
Network Names
Use purpose-based naming:
- homelab-network - Main network
- media-network - Media services
- monitoring-network - Observability stack
- isolated-network - Untrusted services
Volume Names
Use service-purpose pattern:
volumes:
plex-config:
plex-metadata:
database-data:
nginx-certs:
File Names
Organize by function:
- docker-compose/media.yml - Media services (Plex, Jellyfin, etc.)
- docker-compose/monitoring.yml - Monitoring stack
- docker-compose/infrastructure.yml - Core services (DNS, reverse proxy)
- docker-compose/development.yml - Dev tools
Network Architecture
Network Types
1. Bridge Networks (Most Common):
   networks:
     homelab-network:
       driver: bridge
       ipam:
         config:
           - subnet: 172.20.0.0/16
2. Host Network (When Performance Critical):
   services:
     performance-critical:
       network_mode: host
3. Overlay Networks (For Swarm/Multi-host):
   networks:
     swarm-network:
       driver: overlay
Network Design Patterns
Pattern 1: Single Shared Network
Simplest approach for small homelabs:
networks:
homelab-network:
external: true
Create once manually:
docker network create homelab-network
Pattern 2: Segmented Networks
Better security through isolation:
networks:
frontend-network: # Web-facing services
backend-network: # Databases, internal services
monitoring-network: # Observability
Pattern 3: Service-Specific Networks
Each service group has its own network:
services:
web:
networks:
- frontend
- backend
database:
networks:
- backend # Not exposed to frontend
Network Security
- Place databases on internal networks only
- Use separate networks for untrusted services
- Expose minimal ports to the host
- Use reverse proxies for web services
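The first two points can be expressed directly in Compose: marking a network `internal: true` gives its containers no route outside the Docker host. A sketch (service names and images are illustrative):

```yaml
services:
  web:
    image: nginx:1.25.3-alpine
    networks:
      - frontend-network
      - backend-network
  database:
    image: postgres:16-alpine
    networks:
      - backend-network          # reachable only from other backend services

networks:
  frontend-network:
  backend-network:
    internal: true               # no external connectivity on this network
```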
Volume Management
Volume Types
Named Volumes (Managed by Docker)
volumes:
database-data:
driver: local
Use for:
- Database files
- Application data
- Anything Docker should manage
Advantages:
- Docker handles permissions
- Easy to backup/restore
- Portable across systems
Bind Mounts (Direct Host Paths)
volumes:
- ./config/app:/config
- /media:/media:ro
Use for:
- Configuration files you edit directly
- Large media libraries
- Shared data with host
Advantages:
- Direct file access
- Easy to edit
- Can share with host applications
tmpfs Mounts (RAM)
tmpfs:
- /tmp
Use for:
- Temporary data
- Cache that doesn't need persistence
- Sensitive data that shouldn't touch disk
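When the cache could grow unbounded, the long volume syntax lets you cap the tmpfs size so a runaway service can't exhaust RAM. A sketch (the service name and the limit are illustrative):

```yaml
services:
  myservice:
    volumes:
      - type: tmpfs
        target: /tmp
        tmpfs:
          size: 268435456   # 256 MiB cap, specified in bytes
```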
Volume Best Practices
1. Consistent Paths:
   volumes:
     - ./config/service:/config # Always use /config inside the container
     - service-data:/data # Always use /data for application data
2. Read-Only When Possible:
   volumes:
     - /media:/media:ro # Media library is read-only
3. Separate Config from Data:
   volumes:
     - ./config/plex:/config # Editable configuration
     - plex-metadata:/metadata # Application-managed data
4. Backup Strategy:
   # Back up a named volume
   docker run --rm \
     -v plex-metadata:/data \
     -v $(pwd)/backups:/backup \
     busybox tar czf /backup/plex-metadata.tar.gz /data
Security Best Practices
1. Image Security
Pin Specific Versions:
# ✅ Good - Specific version
image: nginx:1.25.3-alpine
# ❌ Bad - Latest tag
image: nginx:latest
Use Official or Trusted Images:
- Official Docker images
- LinuxServer.io (lscr.io)
- Trusted vendors
Scan Images (recent Docker releases replace docker scan with Docker Scout):
docker scout cves vendor/image:tag
2. Secret Management
Never Commit Secrets:
# .env file (gitignored)
DB_PASSWORD=super-secret-password
API_KEY=sk-1234567890
# docker-compose.yml
environment:
- DB_PASSWORD=${DB_PASSWORD}
- API_KEY=${API_KEY}
Provide Templates:
# .env.example (committed)
DB_PASSWORD=changeme
API_KEY=your-api-key-here
3. User Permissions
Run as Non-Root:
environment:
- PUID=1000 # Your user ID
- PGID=1000 # Your group ID
Check Current User:
id -u # Gets your UID
id -g # Gets your GID
4. Network Security
Minimal Exposure:
# ✅ Good - Only expose what's needed
ports:
- "127.0.0.1:8080:8080" # Only accessible from localhost
# ❌ Bad - Exposed to all interfaces
ports:
- "8080:8080"
Use Reverse Proxy:
# Don't expose services directly
# Use Nginx/Traefik to proxy with SSL
5. Resource Limits
Prevent Resource Exhaustion:
deploy:
resources:
limits:
cpus: '2'
memory: 4G
reservations:
cpus: '0.5'
memory: 1G
Monitoring and Logging
Logging Configuration
Standard Logging:
logging:
driver: "json-file"
options:
max-size: "10m"
max-file: "3"
Centralized Logging:
logging:
driver: "syslog"
options:
syslog-address: "tcp://192.168.1.100:514"
Health Checks
HTTP Health Check:
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
interval: 30s
timeout: 10s
retries: 3
TCP Health Check:
healthcheck:
test: ["CMD-SHELL", "nc -z localhost 5432 || exit 1"]
interval: 30s
timeout: 5s
retries: 3
Custom Script:
healthcheck:
test: ["CMD", "/healthcheck.sh"]
interval: 30s
timeout: 10s
retries: 3
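A hypothetical /healthcheck.sh for a service without an HTTP endpoint might test a heartbeat file that the application refreshes. The path and 60-second window below are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical healthcheck: healthy when the app's heartbeat file was
# touched within the last 60 seconds. The touch is demo-only; a real
# service would refresh the heartbeat itself.
HEARTBEAT="${HEARTBEAT:-/tmp/heartbeat}"
touch "$HEARTBEAT"   # demo only
if find "$HEARTBEAT" -newermt '-60 seconds' 2>/dev/null | grep -q .; then
  echo healthy       # Docker treats exit 0 as healthy
else
  echo unhealthy
  exit 1
fi
```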
Monitoring Stack Example
# docker-compose/monitoring.yml
services:
prometheus:
image: prom/prometheus:v2.48.0
container_name: prometheus
restart: unless-stopped
volumes:
- ./config/prometheus:/etc/prometheus
- prometheus-data:/prometheus
ports:
- "9090:9090"
networks:
- monitoring-network
grafana:
image: grafana/grafana:10.2.2
container_name: grafana
restart: unless-stopped
volumes:
- grafana-data:/var/lib/grafana
ports:
- "3000:3000"
networks:
- monitoring-network
depends_on:
- prometheus
volumes:
prometheus-data:
grafana-data:
networks:
monitoring-network:
driver: bridge
Troubleshooting
Common Issues
Service Won't Start
1. Check logs:
docker compose -f docker-compose/service.yml logs
2. Validate configuration:
docker compose -f docker-compose/service.yml config
3. Check for port conflicts:
# See what's using a port
sudo netstat -tlnp | grep :8080
4. Verify image exists:
docker images | grep service-name
Permission Errors
1. Check PUID/PGID:
# Your user ID
id -u
# Your group ID
id -g
2. Fix directory permissions:
sudo chown -R 1000:1000 ./config/service-name
3. Check volume permissions:
docker compose -f docker-compose/service.yml exec service-name ls -la /config
Network Connectivity Issues
1. Verify network exists:
docker network ls
docker network inspect homelab-network
2. Check if services are on same network:
docker network inspect homelab-network --format '{{range .Containers}}{{.Name}} {{end}}'
3. Test connectivity:
docker compose -f docker-compose/service.yml exec service1 ping service2
Container Keeps Restarting
1. Watch logs:
docker compose -f docker-compose/service.yml logs -f
2. Check health status:
docker compose -f docker-compose/service.yml ps
3. Inspect container:
docker inspect container-name
Debugging Commands
# Enter running container
docker compose -f docker-compose/service.yml exec service-name /bin/sh
# View full container configuration
docker inspect container-name
# See resource usage
docker stats container-name
# View recent events
docker events --since 10m
# Check disk space
docker system df
Recovery Procedures
Service Corrupted
# Stop service
docker compose -f docker-compose/service.yml down
# Remove container and volumes (backup first!)
docker compose -f docker-compose/service.yml down -v
# Recreate from scratch
docker compose -f docker-compose/service.yml up -d
Network Issues
# Remove and recreate network
docker network rm homelab-network
docker network create homelab-network
# Restart services (loop because -f takes a single file, not a glob)
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done
Full System Reset (Nuclear Option)
# ⚠️ WARNING: This removes everything!
# Backup first!
# Stop all containers
docker stop $(docker ps -aq)
# Remove all containers
docker rm $(docker ps -aq)
# Remove all volumes (careful!)
docker volume rm $(docker volume ls -q)
# Remove all networks (except defaults)
docker network prune -f
# Rebuild from compose files (one file at a time; -f does not take a glob)
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done
Maintenance
Regular Tasks
Weekly:
- Review logs for errors
- Check disk space: docker system df
- Update security patches on images
Monthly:
- Update images to latest versions
- Review and prune unused resources
- Backup volumes
- Review and optimize compose files
Quarterly:
- Full stack review
- Documentation update
- Performance optimization
- Security audit
Update Procedure
# 1. Backup current state
docker compose -f docker-compose/service.yml config > backup/service-config.yml
# 2. Update image version in compose file
# Edit docker-compose/service.yml
# 3. Pull new image
docker compose -f docker-compose/service.yml pull
# 4. Recreate service
docker compose -f docker-compose/service.yml up -d
# 5. Verify
docker compose -f docker-compose/service.yml logs -f
# 6. Test functionality
# Access service and verify it works
AI Automation Guidelines
Homepage Dashboard Management
Automatic Configuration Updates
Homepage configuration must be kept synchronized with deployed services. The AI assistant handles this automatically:
Template Location:
- Config templates: /home/kelin/AI-Homelab/config-templates/homepage/
- Active configs: /opt/stacks/homepage/config/
Key Principles:
1. Hard-Coded URLs Required: Homepage does NOT support variables in href links
   - The template uses {{HOMEPAGE_VAR_DOMAIN}} as a placeholder
   - The active config uses kelin-hass.duckdns.org hard-coded
   - AI must replace placeholders when deploying configs
2. No Container Restart Needed: Homepage picks up config changes instantly
   - Simply edit the YAML files in /opt/stacks/homepage/config/
   - Refresh the browser to see changes
   - DO NOT restart the container
3. Stack-Based Organization: Services grouped by their compose file
   - Currently Installed: Shows running services grouped by stack
   - Available to Install: Shows undeployed services from the repository
4. Automatic Updates Required: AI must update Homepage configs when:
   - A new service is deployed → add it to the appropriate stack section
   - A service is removed → remove it from its stack section
   - Domain/subdomain changes → update all affected href URLs
   - A stack file is renamed → update the section headers
Configuration Structure:
# services.yaml
- Stack Name (compose-file.yml):
- Service Name:
icon: service.png
href: https://subdomain.kelin-hass.duckdns.org # Hard-coded!
description: Service description
Deployment Workflow:
# When deploying from template:
cp /home/kelin/AI-Homelab/config-templates/homepage/*.yaml /opt/stacks/homepage/config/
sed -i 's/{{HOMEPAGE_VAR_DOMAIN}}/kelin-hass.duckdns.org/g' /opt/stacks/homepage/config/services.yaml
# No restart needed - configs load instantly
Critical Reminder: Homepage is the single source of truth for service inventory. Keep it updated or users won't know what's deployed.
Conclusion
Following these guidelines ensures:
- Consistent infrastructure
- Easy troubleshooting
- Reproducible deployments
- Maintainable system
- Better security
Remember: Infrastructure as Code means treating your Docker Compose files as critical documentation. Keep them clean, commented, and version-controlled.