EZ-Homelab/wiki/Docker-Guidelines.md
kelinfoxy ef55974b50 Wiki major update
updated with recent documentation
2026-01-21 19:18:39 -05:00


# Docker Service Management Guidelines
## Overview
This document provides comprehensive guidelines for managing Docker services in your AI-powered homelab using Dockge, Traefik, and Authelia. These guidelines ensure consistency, maintainability, and reliability across your entire infrastructure.
## Table of Contents
1. [Philosophy](#philosophy)
2. [Dockge Structure](#dockge-structure)
3. [Traefik and Authelia Integration](#traefik-and-authelia-integration)
4. [Docker Compose vs Docker Run](#docker-compose-vs-docker-run)
5. [Service Creation Guidelines](#service-creation-guidelines)
6. [Service Modification Guidelines](#service-modification-guidelines)
7. [Naming Conventions](#naming-conventions)
8. [Network Architecture](#network-architecture)
9. [Volume Management](#volume-management)
10. [Security Best Practices](#security-best-practices)
11. [Monitoring and Logging](#monitoring-and-logging)
12. [Troubleshooting](#troubleshooting)
13. [Maintenance](#maintenance)
14. [AI Automation Guidelines](#ai-automation-guidelines)
## Philosophy
### Core Principles
1. **Dockge First**: Manage all stacks through Dockge in `/opt/stacks/`
2. **Infrastructure as Code**: All services defined in Docker Compose files
3. **File-Based Configuration**: Traefik labels and Authelia YAML (AI-manageable)
4. **Reproducibility**: Any service should be rebuildable from compose files
5. **Automatic HTTPS**: All services routed through Traefik with Let's Encrypt
6. **Smart SSO**: Authelia protects admin interfaces, bypasses media apps
7. **Documentation**: Every non-obvious configuration must be commented
8. **Consistency**: Use the same patterns across all services
9. **Safety First**: Always test changes in isolation before deploying
### The Stack Mindset
Think of your homelab as an interconnected stack where:
- Services depend on networks (especially traefik-network)
- Traefik routes all traffic with automatic SSL
- Authelia protects sensitive services
- VPN (Gluetun) secures downloads
- Changes ripple through the system
Always ask: "How does this change affect other services and routing?"
## Dockge Structure
### Directory Organization
All stacks live in `/opt/stacks/stack-name/`:
```
/opt/stacks/
├── traefik/
│ ├── docker-compose.yml
│ ├── traefik.yml # Static config
│ ├── dynamic/ # Dynamic routes
│ │ ├── routes.yml
│ │ └── external.yml # External host proxying
│ ├── acme.json # SSL certificates (chmod 600)
│ └── .env
├── authelia/
│ ├── docker-compose.yml
│ ├── configuration.yml # Authelia settings
│ ├── users_database.yml # User accounts
│ └── .env
├── media/
│ ├── docker-compose.yml
│ └── .env
└── ...
```
### Why Dockge?
- **Visual Management**: Web UI at `https://dockge.${DOMAIN}`
- **Direct File Editing**: Edit compose files in-place
- **Stack Organization**: Each service stack is independent
- **AI Compatible**: Files can be managed by AI
- **Git Integration**: Easy to version control
### Storage Strategy
**Small Data** (configs, DBs < 10GB): `/opt/stacks/stack-name/`
```yaml
volumes:
  - /opt/stacks/sonarr/config:/config
```
**Large Data** (media, downloads, backups): `/mnt/`
```yaml
volumes:
  - /mnt/media/movies:/movies
  - /mnt/media/tv:/tv
  - /mnt/downloads:/downloads
  - /mnt/backups:/backups
```
AI will suggest `/mnt/` when data may exceed 50GB or grow continuously.
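A quick way to check whether an existing stack's data has outgrown `/opt/stacks/` is to compare its current size with that threshold. The helper below is a minimal sketch: the function name `exceeds_gb` is hypothetical, the 50GB default simply mirrors the rule of thumb above, and it measures current size only, not growth rate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if a directory is larger than a
# threshold in GB, suggesting its data belongs under /mnt/ instead.
# Usage: exceeds_gb <dir> [threshold_gb]   (threshold defaults to 50)
exceeds_gb() {
  local dir="$1" threshold_gb="${2:-50}" size_kb
  # Return 2 for a missing directory so it is not mistaken for "small"
  [[ -d "$dir" ]] || return 2
  # du -sk prints "<size in KiB><TAB><path>"; keep only the number
  size_kb=$(du -sk -- "$dir" | cut -f1)
  (( size_kb > threshold_gb * 1024 * 1024 ))
}
```

For example, `exceeds_gb /opt/stacks/plex/config && echo "consider moving this data under /mnt/"`.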
## Traefik and Authelia Integration
### Every Local Service (on the Same Server as Traefik) Needs Traefik Labels
Standard pattern for all services:
```yaml
services:
  myservice:
    image: myimage:latest
    container_name: myservice
    networks:
      - homelab-network
      - traefik-network # Required for Traefik
    labels:
      # Enable Traefik
      - "traefik.enable=true"
      # Define routing rule
      - "traefik.http.routers.myservice.rule=Host(`myservice.${DOMAIN}`)"
      # Use websecure entrypoint (HTTPS)
      - "traefik.http.routers.myservice.entrypoints=websecure"
      # Enable Let's Encrypt
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"
      # Add Authelia SSO (if needed) - comment out to disable SSO
      - "traefik.http.routers.myservice.middlewares=authelia@docker"
      # Port the app listens on (required if the image exposes several ports or none)
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"
      # Optional: Sablier lazy loading (comment out to disable)
      # - "sablier.enable=true"
      # - "sablier.group=core-myservice"
      # - "sablier.start-on-demand=true"

networks:
  homelab-network:
    external: true # Created once with: docker network create homelab-network
  traefik-network:
    external: true
```
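### If Traefik Is on a Remote Server, Configure Routes and Services There
Traefik only reads labels from containers on the Docker host it is connected to, so a service running on a different machine needs a file-based route on the server where Traefik runs.
Add a YAML file to the `traefik/dynamic/` folder for each remote server (for example, `external.yml` in the layout above).
In that file, add an entry under `routers:` and an entry under `services:` for each service.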
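As an illustration, such a file can be generated instead of hand-written. The sketch below prints a Traefik file-provider config with one router and one service per `name=url` argument. The function name, the argument format, and attaching `authelia@docker` to every route are assumptions; remove the middleware lines for services that should bypass SSO.

```bash
#!/usr/bin/env bash
# Hypothetical generator: print a Traefik file-provider config that routes
# <name>.<domain> to a service running on another host.
# Usage: gen_remote_routes <domain> <name>=<url> [<name>=<url> ...]
gen_remote_routes() {
  local domain="$1" pair name
  shift
  echo "http:"
  echo "  routers:"
  for pair in "$@"; do
    name="${pair%%=*}"
    # Backticks are escaped so the heredoc emits them literally for Host()
    cat <<EOF
    ${name}:
      rule: "Host(\`${name}.${domain}\`)"
      entryPoints:
        - websecure
      service: ${name}
      tls:
        certResolver: letsencrypt
      middlewares:
        - authelia@docker
EOF
  done
  echo "  services:"
  for pair in "$@"; do
    # Everything after the first "=" is the backend URL
    cat <<EOF
    ${pair%%=*}:
      loadBalancer:
        servers:
          - url: "${pair#*=}"
EOF
  done
}
```

For example, `gen_remote_routes "$DOMAIN" nas=http://192.168.1.50:5000 > /opt/stacks/traefik/dynamic/external.yml` (the IP and port are placeholders). Generating both sections in one pass keeps each router's `service:` name in sync with its service entry.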
### When to Use Authelia SSO
**Protect with Authelia**:
- Admin interfaces (Sonarr, Radarr, Prowlarr, etc.)
- Infrastructure tools (Portainer, Dockge, Grafana)
- Personal data (Nextcloud, Mealie, wikis)
- Development tools (code-server, GitLab)
- Monitoring dashboards
**Bypass Authelia**:
- Media servers (Plex, Jellyfin) - need app access
- Request services (Jellyseerr) - family-friendly access
- Public services (WordPress, status pages)
- Services with their own auth (Home Assistant)
Configure bypasses in `/opt/stacks/authelia/configuration.yml`:
```yaml
access_control:
rules:
- domain: jellyfin.yourdomain.duckdns.org
policy: bypass
- domain: plex.yourdomain.duckdns.org
policy: bypass
```
### Routing Through VPN (Gluetun)
For services that need VPN (downloads):
```yaml
services:
mydownloader:
image: downloader:latest
container_name: mydownloader
network_mode: "service:gluetun" # Route through VPN
depends_on:
- gluetun
```
Expose ports through Gluetun's compose file:
```yaml
# In gluetun.yml
gluetun:
  ports:
    - "8080:8080" # mydownloader web UI
```
## Docker Compose vs Docker Run
### Docker Compose: For Everything Persistent
Use Docker Compose for:
- All production services
- Services that need to restart automatically
- Multi-container applications
- Services with complex configurations
- Anything you want to keep long-term
**Example:**
```yaml
# docker-compose/plex.yml
services:
  plex:
    image: plexinc/pms-docker:1.40.0.7998-f68041501
    container_name: plex
    restart: unless-stopped
    networks:
      - media-network
    ports:
      - "32400:32400"
    volumes:
      - ./config/plex:/config
      - /media:/media:ro
    environment:
      # The official plexinc/pms-docker image reads PLEX_UID/PLEX_GID
      # (LinuxServer.io images use PUID/PGID instead)
      - PLEX_UID=1000
      - PLEX_GID=1000
      - TZ=America/New_York
```
### Docker Run: For Temporary Operations Only
Use `docker run` for:
- Testing new images
- One-off commands
- Debugging
- Verification tasks (like GPU testing)
**Examples:**
```bash
# Test if NVIDIA GPU is accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Quick test of a new image
docker run --rm -it alpine:latest /bin/sh
# One-off volume backup (mount a host folder at /backup so the archive survives --rm)
docker run --rm -v mydata:/data -v "$(pwd)/backups:/backup" busybox tar czf /backup/data.tar.gz /data
```
## Service Creation Guidelines
### Step-by-Step Process
#### 1. Planning Phase
**Before writing any YAML:**
- [ ] What problem does this service solve?
- [ ] Does a similar service already exist?
- [ ] What are the dependencies?
- [ ] What ports are needed?
- [ ] What data needs to persist?
- [ ] What environment variables are required?
- [ ] What networks should it connect to?
- [ ] Are there any security considerations?
#### 2. Research Phase
- Read the official image documentation
- Check example configurations
- Review resource requirements
- Understand health check requirements
- Note any special permissions needed
#### 3. Implementation Phase
**Start with a minimal configuration:**
```yaml
services:
  service-name:
    image: vendor/image:specific-version
    container_name: service-name
    restart: unless-stopped
```
**Add networks:**
```yaml
networks:
  - homelab-network
```
**Add ports (if externally accessible):**
```yaml
ports:
  - "8080:8080" # Web UI
```
**Add volumes:**
```yaml
volumes:
  - ./config/service-name:/config
  - service-data:/data
```
**Add environment variables:**
```yaml
environment:
  - PUID=1000
  - PGID=1000
  - TZ=${TIMEZONE}
```
**Add health checks (if applicable):**
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"] # curl must exist in the image
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
#### 4. Testing Phase
```bash
# Validate syntax
docker compose -f docker-compose/service.yml config
# Start in foreground to see logs
docker compose -f docker-compose/service.yml up
# If successful, restart in background
docker compose -f docker-compose/service.yml down
docker compose -f docker-compose/service.yml up -d
```
#### 5. Documentation Phase
Add comments to your compose file:
```yaml
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:4.0.0
    container_name: sonarr
    # Sonarr - TV Show management and automation
    # Web UI: http://server-ip:8989
    # Connects to: Prowlarr (indexers), qBittorrent (downloads)
    restart: unless-stopped
```
Update your main README or service-specific README with:
- Service purpose
- Access URLs
- Default credentials (if any)
- Configuration notes
- Backup instructions
## Service Modification Guidelines
### Before Modifying
1. **Back up current configuration:**
```bash
cp docker-compose/service.yml docker-compose/service.yml.backup
```
2. **Document why you're making the change**
- Create a comment in the compose file
- Note in your changelog or docs
3. **Understand the current state:**
```bash
# Check if service is running
docker compose -f docker-compose/service.yml ps
# Review current configuration
docker compose -f docker-compose/service.yml config
# Check logs for any existing issues
docker compose -f docker-compose/service.yml logs --tail=50
```
### Making the Change
1. **Edit the compose file**
- Make minimal, targeted changes
- Keep existing structure when possible
- Add comments for new configurations
2. **Validate syntax:**
```bash
docker compose -f docker-compose/service.yml config
```
3. **Apply the change:**
```bash
# Pull new image if version changed
docker compose -f docker-compose/service.yml pull
# Recreate the service
docker compose -f docker-compose/service.yml up -d
```
4. **Verify the change:**
```bash
# Check service is running
docker compose -f docker-compose/service.yml ps
# Watch logs for errors
docker compose -f docker-compose/service.yml logs -f
# Test functionality
curl http://localhost:port/health
```
### Rollback Plan
If something goes wrong:
```bash
# Stop the service
docker compose -f docker-compose/service.yml down
# Restore backup
mv docker-compose/service.yml.backup docker-compose/service.yml
# Restart with old configuration
docker compose -f docker-compose/service.yml up -d
```
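Because `service.yml.backup` is overwritten on every edit, only one previous version survives. A minimal sketch of timestamped backup and restore helpers; the function names are hypothetical, and two backups taken within the same second overwrite each other.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: keep timestamped copies of a compose file so older
# known-good versions survive repeated edits.

# backup_compose <file>: copy to <file>.YYYYmmdd-HHMMSS.bak and print its name
backup_compose() {
  local file="$1" dest
  dest="${file}.$(date +%Y%m%d-%H%M%S).bak"
  cp -p -- "$file" "$dest" && printf '%s\n' "$dest"
}

# restore_latest <file>: overwrite <file> with its newest .bak copy
restore_latest() {
  local file="$1" latest
  # The timestamp format sorts lexically, so the last name is the newest
  latest=$(printf '%s\n' "${file}".*.bak | sort | tail -n 1)
  # With no matches the glob stays unexpanded, so check the file really exists
  if [[ ! -e "$latest" ]]; then
    echo "no backup found for $file" >&2
    return 1
  fi
  cp -p -- "$latest" "$file"
}
```

Run `backup_compose docker-compose/service.yml` before editing; to roll back, run `restore_latest docker-compose/service.yml` and then `docker compose -f docker-compose/service.yml up -d`.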
### Common Modifications
**Toggle SSO**: Comment/uncomment the Authelia middleware label:
```yaml
# Enable SSO
- "traefik.http.routers.service.middlewares=authelia@docker"
# Disable SSO (comment out)
# - "traefik.http.routers.service.middlewares=authelia@docker"
```
**Toggle Lazy Loading**: Comment/uncomment Sablier labels:
```yaml
# Enable lazy loading
- "sablier.enable=true"
- "sablier.group=core-service"
- "sablier.start-on-demand=true"
# Disable lazy loading (comment out)
# - "sablier.enable=true"
# - "sablier.group=core-service"
# - "sablier.start-on-demand=true"
```
**Change Port**: Update the loadbalancer server port:
```yaml
- "traefik.http.services.service.loadbalancer.server.port=8080"
```
**Add VPN Routing**: Change network mode and update Gluetun ports:
```yaml
network_mode: "service:gluetun"
# Add port mapping in Gluetun service
```
**Update Subdomain**: Modify the Host rule:
```yaml
- "traefik.http.routers.service.rule=Host(`newservice.${DOMAIN}`)"
```
## Naming Conventions
### Service Names
Use lowercase with hyphens:
- ✅ `plex-media-server`
- ✅ `home-assistant`
- ❌ `PlexMediaServer`
- ❌ `home_assistant`
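The service-name rule can be checked mechanically. A minimal sketch, assuming names may contain only lowercase letters, digits, and single hyphens between words; the function name is hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical check for the naming convention above: lowercase letters and
# digits, with single hyphens between words (no underscores or capitals).
# POSIX classes are used instead of [a-z] so the locale cannot widen the range.
valid_service_name() {
  [[ "$1" =~ ^[[:lower:][:digit:]]+(-[[:lower:][:digit:]]+)*$ ]]
}
```

For example, `valid_service_name home_assistant || echo "rename it"` prints the warning.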
### Container Names
Match service names or be descriptive:
```yaml
services:
  plex:
    container_name: plex # Simple match
  database:
    container_name: media-database # Descriptive
```
### Network Names
Use purpose-based naming:
- `homelab-network` - Main network
- `media-network` - Media services
- `monitoring-network` - Observability stack
- `isolated-network` - Untrusted services
### Volume Names
Use `service-purpose` pattern:
```yaml
volumes:
  plex-config:
  plex-metadata:
  database-data:
  nginx-certs:
```
### File Names
Organize by function:
- `docker-compose/media.yml` - Media services (Plex, Jellyfin, etc.)
- `docker-compose/monitoring.yml` - Monitoring stack
- `docker-compose/infrastructure.yml` - Core services (DNS, reverse proxy)
- `docker-compose/development.yml` - Dev tools
## Network Architecture
### Network Types
1. **Bridge Networks** (Most Common)
```yaml
networks:
  homelab-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
```
2. **Host Network** (When Performance Critical)
```yaml
services:
  performance-critical:
    network_mode: host
```
3. **Overlay Networks** (For Swarm/Multi-host)
```yaml
networks:
  swarm-network:
    driver: overlay
```
### Network Design Patterns
#### Pattern 1: Single Shared Network
Simplest approach for small homelabs:
```yaml
networks:
  homelab-network:
    external: true
```
Create once manually:
```bash
docker network create homelab-network
```
#### Pattern 2: Segmented Networks
Better security through isolation:
```yaml
networks:
  frontend-network: # Web-facing services
  backend-network: # Databases, internal services
  monitoring-network: # Observability
```
#### Pattern 3: Service-Specific Networks
Each service group has its own network:
```yaml
services:
  web:
    networks:
      - frontend
      - backend
  database:
    networks:
      - backend # Not exposed to frontend
```
### Network Security
- Place databases on internal networks only
- Use separate networks for untrusted services
- Expose minimal ports to the host
- Use reverse proxies for web services
## Volume Management
### Volume Types
#### Named Volumes (Managed by Docker)
```yaml
volumes:
  database-data:
    driver: local
```
**Use for:**
- Database files
- Application data
- Anything Docker should manage
**Advantages:**
- Docker handles permissions
- Easy to backup/restore
- Portable across systems
#### Bind Mounts (Direct Host Paths)
```yaml
volumes:
  - ./config/app:/config
  - /media:/media:ro
```
**Use for:**
- Configuration files you edit directly
- Large media libraries
- Shared data with host
**Advantages:**
- Direct file access
- Easy to edit
- Can share with host applications
#### tmpfs Mounts (RAM)
```yaml
tmpfs:
  - /tmp
```
**Use for:**
- Temporary data
- Cache that doesn't need persistence
- Sensitive data that shouldn't touch disk
### Volume Best Practices
1. **Consistent Paths:**
```yaml
volumes:
  - ./config/service:/config # Always use /config inside container
  - service-data:/data # Always use /data for application data
```
2. **Read-Only When Possible:**
```yaml
volumes:
  - /media:/media:ro # Media library is read-only
```
3. **Separate Config from Data:**
```yaml
volumes:
  - ./config/plex:/config # Editable configuration
  - plex-metadata:/metadata # Application-managed data
```
4. **Backup Strategy:**
```bash
# Backup named volume
docker run --rm \
  -v plex-metadata:/data \
  -v "$(pwd)/backups:/backup" \
  busybox tar czf /backup/plex-metadata.tar.gz /data
```
## Security Best Practices
### 1. Image Security
**Pin Specific Versions:**
```yaml
# ✅ Good - Specific version
image: nginx:1.25.3-alpine
# ❌ Bad - Latest tag
image: nginx:latest
```
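To find images that still use `latest`, a rough audit can scan compose files for `image:` lines. This is a sketch, not a YAML parser: it assumes one `image:` key per line, treats digest-pinned references (`@sha256:`) as pinned, and will also flag variable-based values such as `${IMAGE}`. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical audit: print "file: image" for every image in the given compose
# files that is tagged :latest or has no tag (Docker then assumes :latest).
find_unpinned_images() {
  local file image name
  for file in "$@"; do
    while read -r image; do
      # Digest-pinned references (name@sha256:...) count as pinned
      [[ "$image" == *@sha256:* ]] && continue
      # Look only at the part after the last "/" so a registry port
      # (registry.local:5000/app) is not mistaken for a tag
      name="${image##*/}"
      if [[ "$name" != *:* || "$name" == *:latest ]]; then
        printf '%s: %s\n' "$file" "$image"
      fi
    # sed extracts the value after "image:", dropping an optional opening quote
    done < <(sed -nE 's/^[[:space:]]*image:[[:space:]]*["'\'']?([^"'\''[:space:]#]+).*/\1/p' "$file")
  done
}
```

For example, `find_unpinned_images /opt/stacks/*/docker-compose.yml`.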
**Use Official or Trusted Images:**
- Official Docker images
- LinuxServer.io (lscr.io)
- Trusted vendors
**Scan Images:**
```bash
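# docker scan is deprecated; Docker Scout replaces it (a CLI plugin bundled
# with Docker Desktop that is installed separately on plain Docker Engine)
docker scout cves vendor/image:tag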
```
### 2. Secret Management
**Never Commit Secrets:**
```yaml
# .env file (gitignored)
DB_PASSWORD=super-secret-password
API_KEY=sk-1234567890
# docker-compose.yml
environment:
  - DB_PASSWORD=${DB_PASSWORD}
  - API_KEY=${API_KEY}
```
**Provide Templates:**
```bash
# .env.example (committed)
DB_PASSWORD=changeme
API_KEY=your-api-key-here
```
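A missing `.env` entry usually surfaces only as a warning that the variable is not set, and the service then starts with a blank value. The sketch below lists `${VAR}` references in a compose file that have no `VAR=` line in a given `.env`. The function name is hypothetical, and it will also report variables that have a `${VAR:-default}` fallback or are exported in your shell instead.

```bash
#!/usr/bin/env bash
# Hypothetical check: list ${VAR} references in a compose file that have no
# matching VAR= line in the given .env file.
missing_env_vars() {
  local compose="$1" envfile="$2" var
  # grep -o prints each "${NAME" match on its own line; cut drops the "${"
  grep -oE '\$\{[A-Za-z_][A-Za-z0-9_]*' "$compose" | cut -c3- | sort -u |
    while read -r var; do
      grep -qE "^[[:space:]]*${var}=" "$envfile" 2>/dev/null || printf '%s\n' "$var"
    done
}
```

To confirm the secrets file itself is excluded from Git, `git check-ignore -q .env || echo ".env is NOT ignored"`.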
### 3. User Permissions
**Run as Non-Root:**
```yaml
environment:
  - PUID=1000 # Your user ID
  - PGID=1000 # Your group ID
```
**Check Current User:**
```bash
id -u # Gets your UID
id -g # Gets your GID
```
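Rather than typing the IDs by hand, they can be written into the stack's `.env` so compose files can reference `${PUID}` and `${PGID}`. A minimal sketch; the helper name is hypothetical. It replaces an existing `KEY=` line or appends one, and because it rewrites through a `mktemp` file, the resulting `.env` ends up readable only by its owner, which suits a secrets file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# KEY= line or appending one if it is missing.
set_env_var() {
  local file="$1" key="$2" value="$3" tmp
  touch -- "$file"
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }  # replace existing line
    { print }                                                  # keep everything else
    END { if (!found) print k "=" v }                          # append if absent
  ' "$file" > "$tmp" && mv -- "$tmp" "$file"
}
```

For example, `set_env_var .env PUID "$(id -u)"` followed by `set_env_var .env PGID "$(id -g)"`.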
### 4. Network Security
**Minimal Exposure:**
```yaml
# ✅ Good - Only expose what's needed
ports:
  - "127.0.0.1:8080:8080" # Only accessible from localhost
# ❌ Bad - Exposed to all interfaces
ports:
  - "8080:8080"
```
**Use Reverse Proxy:**
```yaml
# Don't expose services directly
# Use Nginx/Traefik to proxy with SSL
```
### 5. Resource Limits
**Prevent Resource Exhaustion:**
```yaml
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '0.5'
      memory: 1G
```
## Monitoring and Logging
### Logging Configuration
**Standard Logging:**
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```
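The same rotation can be made the default for all containers in the Docker daemon's `/etc/docker/daemon.json`, so services without a `logging:` block are covered too. The sketch below writes that file; the function name is hypothetical, it overwrites any existing `daemon.json` (merge by hand if you already have one), and the new defaults apply only to containers created after the daemon restarts (for example `sudo systemctl restart docker`). Docker requires `log-opts` values to be strings, which is why `max-file` is quoted.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write daemon-wide json-file log rotation defaults.
# Usage: write_log_defaults [path] [max_size] [max_file]
write_log_defaults() {
  local path="${1:-/etc/docker/daemon.json}" max_size="${2:-10m}" max_file="${3:-3}"
  # log-opts values must be JSON strings, even numeric ones like max-file
  cat > "$path" <<EOF
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "${max_size}",
    "max-file": "${max_file}"
  }
}
EOF
}
```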
**Centralized Logging:**
```yaml
logging:
  driver: "syslog"
  options:
    syslog-address: "tcp://192.168.1.100:514"
```
### Health Checks
**HTTP Health Check:**
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
```
**TCP Health Check:**
```yaml
healthcheck:
  test: ["CMD-SHELL", "nc -z localhost 5432 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```
**Custom Script:**
```yaml
healthcheck:
  test: ["CMD", "/healthcheck.sh"]
  interval: 30s
  timeout: 10s
  retries: 3
```
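The HTTP and TCP checks above rely on `curl` and `nc` being present in the image, which is not always the case. For images that include bash, a custom `/healthcheck.sh` could probe the port with bash's `/dev/tcp` redirection instead. A minimal sketch; the function name and the host/port defaults are assumptions, and `timeout` (coreutils or BusyBox) must exist in the image:

```bash
#!/usr/bin/env bash
# healthcheck.sh (sketch): succeed if something accepts TCP connections on
# HOST:PORT, using bash's /dev/tcp redirection instead of nc or curl.
tcp_healthy() {
  local host="${1:-127.0.0.1}" port="${2:-8080}"
  # Opening fd 3 on /dev/tcp/host/port performs a TCP connect; timeout caps
  # the wait so a hung service fails the check instead of blocking it.
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}
```

End the script with a call such as `tcp_healthy 127.0.0.1 5432` so that its exit status becomes the health result Docker records.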
### Monitoring Stack Example
```yaml
# docker-compose/monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./config/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - monitoring-network
  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring-network
    depends_on:
      - prometheus
volumes:
  prometheus-data:
  grafana-data:
networks:
  monitoring-network:
    driver: bridge
```
## Troubleshooting
### Common Issues
#### Service Won't Start
**1. Check logs:**
```bash
docker compose -f docker-compose/service.yml logs
```
**2. Validate configuration:**
```bash
docker compose -f docker-compose/service.yml config
```
**3. Check for port conflicts:**
```bash
# See what's using a port (ss replaces netstat, which many current distros no longer install)
sudo ss -tlnp | grep :8080
```
**4. Verify image exists:**
```bash
docker images | grep service-name
```
#### Permission Errors
**1. Check PUID/PGID:**
```bash
# Your user ID
id -u
# Your group ID
id -g
```
**2. Fix directory permissions:**
```bash
sudo chown -R 1000:1000 ./config/service-name
```
**3. Check volume permissions:**
```bash
docker compose -f docker-compose/service.yml exec service-name ls -la /config
```
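Before running `chown -R`, it can help to see exactly which files have the wrong owner. A sketch; the function name is hypothetical and the default UID of 1000 matches the PUID used throughout this guide:

```bash
#!/usr/bin/env bash
# Hypothetical check: succeed if everything under <dir> is owned by <uid>;
# otherwise print the offending paths and return 1.
check_owner() {
  local dir="$1" uid="${2:-1000}" offenders
  # find -uid N matches numeric owner N; "!" selects everything else
  offenders=$(find "$dir" ! -uid "$uid" -print)
  if [[ -n "$offenders" ]]; then
    printf '%s\n' "$offenders"
    return 1
  fi
}
```

For example, `check_owner ./config/service-name 1000` prints each path that the `chown` above would change.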
#### Network Connectivity Issues
**1. Verify network exists:**
```bash
docker network ls
docker network inspect homelab-network
```
**2. Check if services are on same network:**
```bash
docker network inspect homelab-network --format '{{range .Containers}}{{.Name}} {{end}}'
```
**3. Test connectivity:**
```bash
docker compose -f docker-compose/service.yml exec service1 ping service2
```
#### Container Keeps Restarting
**1. Watch logs:**
```bash
docker compose -f docker-compose/service.yml logs -f
```
**2. Check health status:**
```bash
docker compose -f docker-compose/service.yml ps
```
**3. Inspect container:**
```bash
docker inspect container-name
```
### Debugging Commands
```bash
# Enter running container
docker compose -f docker-compose/service.yml exec service-name /bin/sh
# View full container configuration
docker inspect container-name
# See resource usage
docker stats container-name
# View recent events
docker events --since 10m
# Check disk space
docker system df
```
### Recovery Procedures
#### Service Corrupted
```bash
# Stop and remove the container AND its volumes (backup first!)
# down -v deletes named volumes declared in the compose file plus anonymous ones;
# bind-mounted host folders such as ./config are left untouched
docker compose -f docker-compose/service.yml down -v
# Recreate from scratch
docker compose -f docker-compose/service.yml up -d
```
#### Network Issues
```bash
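# Stop the stacks first: docker network rm fails while containers are attached
for f in docker-compose/*.yml; do docker compose -f "$f" down; done
# Remove and recreate network
docker network rm homelab-network
docker network create homelab-network
# Restart services one file at a time (a glob after a single -f passes only the first file to -f)
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done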
```
#### Full System Reset (Nuclear Option)
```bash
# ⚠️ WARNING: This removes everything!
# Backup first!
# Stop all containers
docker stop $(docker ps -aq)
# Remove all containers
docker rm $(docker ps -aq)
# Remove all volumes (careful!)
docker volume rm $(docker volume ls -q)
# Remove all networks (except defaults)
docker network prune -f
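# Recreate the shared network (and any other external networks) removed by the prune
docker network create homelab-network
# Rebuild from compose files, one -f per file
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done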
```
## Maintenance
### Regular Tasks
**Weekly:**
- Review logs for errors
- Check disk space: `docker system df`
- Update security patches on images
**Monthly:**
- Update pinned image versions to current releases
- Review and prune unused resources
- Backup volumes
- Review and optimize compose files
**Quarterly:**
- Full stack review
- Documentation update
- Performance optimization
- Security audit
### Update Procedure
```bash
# 1. Backup current state (the rendered config has .env values filled in, so keep it private)
mkdir -p backup
docker compose -f docker-compose/service.yml config > backup/service-config.yml
# 2. Update image version in compose file
# Edit docker-compose/service.yml
# 3. Pull new image
docker compose -f docker-compose/service.yml pull
# 4. Recreate service
docker compose -f docker-compose/service.yml up -d
# 5. Verify
docker compose -f docker-compose/service.yml logs -f
# 6. Test functionality
# Access service and verify it works
```
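Step 2 can be scripted. A minimal sketch, assuming unquoted `image:` values that already carry a tag, as in this guide's examples; the function name is hypothetical. Because the name must be followed directly by `:`, bumping one image leaves a differently named image that merely shares its prefix alone.

```bash
#!/usr/bin/env bash
# Hypothetical helper: change the tag of one image in a compose file.
# Usage: bump_image_tag <compose-file> <image-name> <new-tag>
bump_image_tag() {
  local file="$1" image="$2" tag="$3" pattern
  # Escape dots so a name like lscr.io/linuxserver/sonarr matches literally
  pattern=$(printf '%s' "$image" | sed 's/[.]/\\./g')
  # Rewrite "image: <name>:<old-tag>" as "image: <name>:<new-tag>", keeping indentation
  sed -E "s|^([[:space:]]*image:[[:space:]]*${pattern}):[^[:space:]]+|\1:${tag}|" \
    "$file" > "$file.tmp" && mv -- "$file.tmp" "$file"
}
```

For example, `bump_image_tag docker-compose/service.yml vendor/image 1.2.3`, then continue with steps 3 to 6.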
## AI Automation Guidelines
### Homepage Dashboard Management
**Automatic Configuration Updates**
Homepage configuration must be kept synchronized with deployed services. The AI assistant handles this automatically:
**Template Location:**
- Config templates: `/home/kelin/AI-Homelab/config-templates/homepage/`
- Active configs: `/opt/stacks/homepage/config/`
**Key Principles:**
1. **Hard-Coded URLs Required**: Homepage does NOT support variables in href links
- Template uses `{{HOMEPAGE_VAR_DOMAIN}}` as placeholder
- Active config uses `kelin-hass.duckdns.org` hard-coded
- AI must replace placeholders when deploying configs
2. **No Container Restart Needed**: Homepage picks up config changes instantly
- Simply edit YAML files in `/opt/stacks/homepage/config/`
- Refresh browser to see changes
- DO NOT restart the container
3. **Stack-Based Organization**: Services grouped by their compose file
- **Currently Installed**: Shows running services grouped by stack
- **Available to Install**: Shows undeployed services from repository
4. **Automatic Updates Required**: AI must update Homepage configs when:
- New service is deployed → Add to appropriate stack section
- Service is removed → Remove from stack section
- Domain/subdomain changes → Update all affected href URLs
- Stack file is renamed → Update section headers
**Configuration Structure:**
```yaml
# services.yaml
- Stack Name (compose-file.yml):
    - Service Name:
        icon: service.png
        href: https://subdomain.kelin-hass.duckdns.org # Hard-coded!
        description: Service description
```
**Deployment Workflow:**
```bash
# When deploying from template:
cp /home/kelin/AI-Homelab/config-templates/homepage/*.yaml /opt/stacks/homepage/config/
sed -i 's/{{HOMEPAGE_VAR_DOMAIN}}/kelin-hass.duckdns.org/g' /opt/stacks/homepage/config/services.yaml
# No restart needed - configs load instantly
```
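The `sed` line above only touches `services.yaml`; if other copied files (bookmarks, widgets) also contain the placeholder, they keep it. A sketch that substitutes the domain in every copied `.yaml` file and fails if any placeholder survives. The function name is hypothetical, it relies on GNU `sed -i`, and, like the `cp` above, it overwrites the active configs.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy Homepage templates, hard-code the domain in every
# YAML file, and fail if any {{HOMEPAGE_VAR_DOMAIN}} placeholder remains.
# Usage: render_homepage <template-dir> <config-dir> <domain>
render_homepage() {
  local src="$1" dst="$2" domain="$3"
  mkdir -p -- "$dst"
  cp -- "$src"/*.yaml "$dst"/ || return 1
  # GNU sed -i edits in place; {{ }} are literal characters in a basic regex
  sed -i "s/{{HOMEPAGE_VAR_DOMAIN}}/${domain}/g" "$dst"/*.yaml
  # grep -lF prints any file that still contains the placeholder
  if grep -lF '{{HOMEPAGE_VAR_DOMAIN}}' "$dst"/*.yaml; then
    echo "placeholder left unreplaced in the files listed above" >&2
    return 1
  fi
}
```

For example, `render_homepage /home/kelin/AI-Homelab/config-templates/homepage /opt/stacks/homepage/config kelin-hass.duckdns.org`.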
**Critical Reminder:** Homepage is the single source of truth for service inventory. Keep it updated or users won't know what's deployed.
---
## Conclusion
Following these guidelines ensures:
- Consistent infrastructure
- Easy troubleshooting
- Reproducible deployments
- Maintainable system
- Better security
Remember: **Infrastructure as Code** means treating your Docker Compose files as critical documentation. Keep them clean, commented, and version-controlled.