EZ-Homelab/docs/docker-guidelines.md
2026-01-11 22:56:13 +00:00


# Docker Service Management Guidelines
## Overview
This document provides comprehensive guidelines for managing Docker services in your AI-powered homelab. These guidelines ensure consistency, maintainability, and reliability across your entire infrastructure.
## Table of Contents
1. [Philosophy](#philosophy)
2. [Docker Compose vs Docker Run](#docker-compose-vs-docker-run)
3. [Service Creation Guidelines](#service-creation-guidelines)
4. [Service Modification Guidelines](#service-modification-guidelines)
5. [Naming Conventions](#naming-conventions)
6. [Network Architecture](#network-architecture)
7. [Volume Management](#volume-management)
8. [Security Best Practices](#security-best-practices)
9. [Monitoring and Logging](#monitoring-and-logging)
10. [Troubleshooting](#troubleshooting)
11. [Maintenance](#maintenance)
12. [Conclusion](#conclusion)
## Philosophy
### Core Principles
1. **Infrastructure as Code**: All services must be defined in Docker Compose files
2. **Reproducibility**: Any service should be rebuildable from compose files
3. **Documentation**: Every non-obvious configuration must be commented
4. **Consistency**: Use the same patterns across all services
5. **Safety First**: Always test changes in isolation before deploying
### The Stack Mindset
Think of your homelab as an interconnected stack where:
- Services depend on networks
- Networks connect services
- Volumes persist data
- Changes ripple through the system
Always ask: "How does this change affect other services?"
## Docker Compose vs Docker Run
### Docker Compose: For Everything Persistent
Use Docker Compose for:
- All production services
- Services that need to restart automatically
- Multi-container applications
- Services with complex configurations
- Anything you want to keep long-term
**Example:**
```yaml
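# docker-compose/plex.yml
services:
  plex:
    image: plexinc/pms-docker:1.40.0.7998-f68041501
    container_name: plex
    restart: unless-stopped
    networks:
      - media-network
    ports:
      - "32400:32400"
    volumes:
      - ./config/plex:/config
      - /media:/media:ro
    environment:
      # The official Plex image reads PLEX_UID/PLEX_GID (LinuxServer.io images use PUID/PGID)
      - PLEX_UID=1000
      - PLEX_GID=1000
      - TZ=America/New_York

networks:
  media-network:
    external: true  # create once with: docker network create media-network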
```
### Docker Run: For Temporary Operations Only
Use `docker run` for:
- Testing new images
- One-off commands
- Debugging
- Verification tasks (like GPU testing)
**Examples:**
```bash
# Test if NVIDIA GPU is accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
# Quick test of a new image
docker run --rm -it alpine:latest /bin/sh
# One-off backup of a named volume into ./backups on the host
docker run --rm -v mydata:/data -v "$(pwd)/backups:/backup" busybox tar czf /backup/data.tar.gz /data
```
## Service Creation Guidelines
### Step-by-Step Process
#### 1. Planning Phase
**Before writing any YAML:**
- [ ] What problem does this service solve?
- [ ] Does a similar service already exist?
- [ ] What are the dependencies?
- [ ] What ports are needed?
- [ ] What data needs to persist?
- [ ] What environment variables are required?
- [ ] What networks should it connect to?
- [ ] Are there any security considerations?
#### 2. Research Phase
- Read the official image documentation
- Check example configurations
- Review resource requirements
- Understand health check requirements
- Note any special permissions needed
#### 3. Implementation Phase
**Start with a minimal configuration:**
```yaml
services:
  service-name:
    image: vendor/image:specific-version
    container_name: service-name
    restart: unless-stopped
```
**Add networks:**
```yaml
networks:
  - homelab-network
```
**Add ports (if externally accessible):**
```yaml
ports:
  - "8080:8080" # Web UI
```
**Add volumes:**
```yaml
volumes:
  - ./config/service-name:/config
  - service-data:/data
```
**Add environment variables:**
```yaml
environment:
  - PUID=1000
  - PGID=1000
  - TZ=${TIMEZONE}
```
**Add health checks (if applicable):**
```yaml
healthcheck:
  # The test runs inside the container, so curl must exist in the image
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```
#### 4. Testing Phase
```bash
# Validate syntax
docker compose -f docker-compose/service.yml config
# Start in foreground to see logs
docker compose -f docker-compose/service.yml up
# If successful, restart in background
docker compose -f docker-compose/service.yml down
docker compose -f docker-compose/service.yml up -d
```
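The validate-then-start steps above can be wrapped in a small helper so a YAML mistake is caught before `up -d` runs. This is a sketch: the function name `compose_up_checked` and the `DOCKER` variable (which lets you dry-run the commands with `DOCKER=echo`) are hypothetical conventions, not Docker features.

```bash
# Validate a compose file, then start it detached.
# DOCKER defaults to the real CLI; DOCKER=echo prints the commands instead (hypothetical convention).
compose_up_checked() {
  local file=$1
  if [ ! -f "$file" ]; then
    echo "no such compose file: $file" >&2
    return 1
  fi
  # --quiet only validates; it does not print the merged configuration
  "${DOCKER:-docker}" compose -f "$file" config --quiet || {
    echo "validation failed: $file" >&2
    return 1
  }
  "${DOCKER:-docker}" compose -f "$file" up -d
}
```

For example, `compose_up_checked docker-compose/service.yml`. Starting detached skips the foreground log check, so follow a first deployment with `docker compose -f docker-compose/service.yml logs -f`.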
#### 5. Documentation Phase
Add comments to your compose file:
```yaml
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:4.0.0
    container_name: sonarr
    # Sonarr - TV Show management and automation
    # Web UI: http://server-ip:8989
    # Connects to: Prowlarr (indexers), qBittorrent (downloads)
    restart: unless-stopped
```
Update your main README or service-specific README with:
- Service purpose
- Access URLs
- Default credentials (if any)
- Configuration notes
- Backup instructions
## Service Modification Guidelines
### Before Modifying
1. **Back up current configuration:**
```bash
cp docker-compose/service.yml docker-compose/service.yml.backup
```
2. **Document why you're making the change**
   - Add a comment in the compose file
   - Note it in your changelog or docs
3. **Understand the current state:**
```bash
# Check if service is running
docker compose -f docker-compose/service.yml ps
# Review current configuration
docker compose -f docker-compose/service.yml config
# Check logs for any existing issues
docker compose -f docker-compose/service.yml logs --tail=50
```
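A single `.backup` file is overwritten by the next edit, which loses older known-good versions. A sketch that keeps one timestamped copy per change instead; the `backup_compose` name and the `FILE.YYYYmmdd-HHMMSS.bak` naming scheme are illustrative choices.

```bash
# Copy a compose file to a timestamped backup next to it and print the backup path.
backup_compose() {
  local file=$1 stamp dest
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
  stamp=$(date +%Y%m%d-%H%M%S)
  dest="$file.$stamp.bak"
  # Refuse to clobber a backup taken in the same second
  if [ -e "$dest" ]; then
    echo "backup already exists: $dest" >&2
    return 1
  fi
  # -p keeps the original mode and timestamps
  cp -p -- "$file" "$dest" && printf '%s\n' "$dest"
}
```

Because the timestamps have a fixed width, the copies sort chronologically, so `ls docker-compose/service.yml.*.bak | tail -n 1` names the most recent one to restore during a rollback.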
### Making the Change
1. **Edit the compose file**
   - Make minimal, targeted changes
   - Keep the existing structure when possible
   - Add comments for new configurations
2. **Validate syntax:**
```bash
docker compose -f docker-compose/service.yml config
```
3. **Apply the change:**
```bash
# Pull new image if version changed
docker compose -f docker-compose/service.yml pull
# Recreate the service
docker compose -f docker-compose/service.yml up -d
```
4. **Verify the change:**
```bash
# Check service is running
docker compose -f docker-compose/service.yml ps
# Watch logs for errors
docker compose -f docker-compose/service.yml logs -f
# Test functionality
curl -f http://localhost:8080/health  # use the service's actual port and health endpoint
```
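Watching logs by hand works, but a script can also wait for Docker's own health status to settle after `up -d`. A sketch, assuming the container defines a `healthcheck`; `wait_healthy`, `POLL_INTERVAL`, and the `DOCKER` override are hypothetical names. The `--format` template guards against containers without a health check, where `.State.Health` is empty.

```bash
# Poll a container's health until it is healthy, unhealthy, or the timeout expires.
# Usage: wait_healthy CONTAINER [TIMEOUT_SECONDS]   (POLL_INTERVAL must be > 0)
wait_healthy() {
  local name=$1 timeout=${2:-60} interval=${POLL_INTERVAL:-2} elapsed=0 status=
  local fmt='{{if .State.Health}}{{.State.Health.Status}}{{else}}none{{end}}'
  while [ "$elapsed" -lt "$timeout" ]; do
    status=$("${DOCKER:-docker}" inspect --format "$fmt" "$name" 2>/dev/null) || status=missing
    case $status in
      healthy) echo "$name: healthy"; return 0 ;;
      none) echo "$name: no healthcheck defined"; return 0 ;;
      unhealthy) echo "$name: unhealthy" >&2; return 1 ;;
    esac
    # "starting" or "missing" (container not created yet): wait and retry
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "$name: timed out after ${timeout}s (last status: $status)" >&2
  return 1
}
```

For example, `wait_healthy sonarr 120`. For containers without a health check the function can only confirm they exist, so check their logs as well.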
### Rollback Plan
If something goes wrong:
```bash
# Stop the service
docker compose -f docker-compose/service.yml down
# Restore backup
mv docker-compose/service.yml.backup docker-compose/service.yml
# Restart with old configuration
docker compose -f docker-compose/service.yml up -d
```
## Naming Conventions
### Service Names
Use lowercase with hyphens:
- ✅ `plex-media-server`
- ✅ `home-assistant`
- ❌ `PlexMediaServer`
- ❌ `home_assistant`
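The lowercase-with-hyphens rule is easy to check mechanically, for example in a pre-commit hook. A sketch; `valid_name` and `check_names` are hypothetical helpers, and the pattern (lowercase letters and digits, single hyphens between words) is one reading of the convention above.

```bash
# Return 0 if a name is lowercase words separated by single hyphens, e.g. "home-assistant".
valid_name() {
  local re='^[[:lower:][:digit:]]+(-[[:lower:][:digit:]]+)*$'
  [[ $1 =~ $re ]]
}

# Print every argument that breaks the convention; return non-zero if any do.
check_names() {
  local name bad=0
  for name in "$@"; do
    if ! valid_name "$name"; then
      echo "bad name: $name"
      bad=1
    fi
  done
  return "$bad"
}
```

To check the services in one file: `check_names $(docker compose -f docker-compose/media.yml config --services)`.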
### Container Names
Match service names or be descriptive:
```yaml
services:
  plex:
    container_name: plex # Simple match
  database:
    container_name: media-database # Descriptive
```
### Network Names
Use purpose-based naming:
- `homelab-network` - Main network
- `media-network` - Media services
- `monitoring-network` - Observability stack
- `isolated-network` - Untrusted services
### Volume Names
Use `service-purpose` pattern:
```yaml
volumes:
  plex-config:
  plex-metadata:
  database-data:
  nginx-certs:
```
### File Names
Organize by function:
- `docker-compose/media.yml` - Media services (Plex, Jellyfin, etc.)
- `docker-compose/monitoring.yml` - Monitoring stack
- `docker-compose/infrastructure.yml` - Core services (DNS, reverse proxy)
- `docker-compose/development.yml` - Dev tools
## Network Architecture
### Network Types
1. **Bridge Networks** (Most Common)
```yaml
networks:
  homelab-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
```
2. **Host Network** (When Performance Critical)
```yaml
services:
  performance-critical:
    network_mode: host  # shares the host's network stack; ports: mappings are ignored
```
3. **Overlay Networks** (For Swarm/Multi-host)
```yaml
networks:
  swarm-network:
    driver: overlay
```
### Network Design Patterns
#### Pattern 1: Single Shared Network
Simplest approach for small homelabs:
```yaml
networks:
  homelab-network:
    external: true
```
Create once manually:
```bash
docker network create homelab-network
```
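With `external: true`, Compose reports an error and does not start a stack whose network does not exist yet, so setup scripts often create it idempotently instead of by hand. A sketch; `ensure_network` and the `DOCKER` override are hypothetical.

```bash
# Create a Docker network only if it does not already exist.
ensure_network() {
  local net=${1:-homelab-network}
  if "${DOCKER:-docker}" network inspect "$net" >/dev/null 2>&1; then
    echo "network exists: $net"
  else
    "${DOCKER:-docker}" network create "$net" >/dev/null && echo "network created: $net"
  fi
}
```

Running `ensure_network homelab-network` at the top of a deploy script is safe to repeat.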
#### Pattern 2: Segmented Networks
Better security through isolation:
```yaml
networks:
  frontend-network: # Web-facing services
  backend-network: # Databases, internal services
  monitoring-network: # Observability
```
#### Pattern 3: Service-Specific Networks
Each service group has its own network:
```yaml
services:
  web:
    networks:
      - frontend
      - backend
  database:
    networks:
      - backend # Not exposed to frontend
```
### Network Security
- Place databases on internal-only networks (a network declared with `internal: true` has no outbound connectivity)
- Use separate networks for untrusted services
- Expose minimal ports to the host
- Use reverse proxies for web services
## Volume Management
### Volume Types
#### Named Volumes (Managed by Docker)
```yaml
volumes:
  database-data:
    driver: local
```
**Use for:**
- Database files
- Application data
- Anything Docker should manage
**Advantages:**
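- Docker manages the storage location, so compose files do not depend on host paths
- An empty volume is pre-populated with the image's files at the mount path, ownership included
- Easy to back up and restore with a throwaway container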
#### Bind Mounts (Direct Host Paths)
```yaml
volumes:
  - ./config/app:/config
  - /media:/media:ro
```
**Use for:**
- Configuration files you edit directly
- Large media libraries
- Shared data with host
**Advantages:**
- Direct file access
- Easy to edit
- Can share with host applications
#### tmpfs Mounts (RAM)
```yaml
tmpfs:
  - /tmp
```
**Use for:**
- Temporary data
- Cache that doesn't need persistence
- Sensitive data that shouldn't touch disk
### Volume Best Practices
1. **Consistent Paths:**
```yaml
volumes:
  - ./config/service:/config # Always use /config inside container
  - service-data:/data # Always use /data for application data
```
2. **Read-Only When Possible:**
```yaml
volumes:
  - /media:/media:ro # Media library is read-only
```
3. **Separate Config from Data:**
```yaml
volumes:
  - ./config/plex:/config # Editable configuration
  - plex-metadata:/metadata # Application-managed data
```
4. **Backup Strategy:**
```bash
# Backup named volume
docker run --rm \
  -v plex-metadata:/data \
  -v "$(pwd)/backups:/backup" \
  busybox tar czf /backup/plex-metadata.tar.gz /data
```
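A backup is only useful if it can be restored. A sketch of the matching restore, assuming the archive was created by the backup command above (tar strips the leading `/`, so entries start with `data/`). The `restore_volume` name and the `DOCKER` override are hypothetical; stop the service that uses the volume first so nothing writes to it during the restore.

```bash
# Restore a tar.gz made by the backup command above into a named volume.
# Usage: restore_volume VOLUME ARCHIVE
# Files in the volume that are not in the archive are left in place.
restore_volume() {
  local volume=$1 archive=$2 dir
  [ -f "$archive" ] || { echo "no such archive: $archive" >&2; return 1; }
  dir=$(cd "$(dirname "$archive")" && pwd)
  # Entries are "data/...", so extracting at / writes them into /data (the mounted volume)
  "${DOCKER:-docker}" run --rm \
    -v "$volume":/data \
    -v "$dir":/backup:ro \
    busybox tar xzf "/backup/$(basename "$archive")" -C /
}
```

For example, `restore_volume plex-metadata backups/plex-metadata.tar.gz`.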
## Security Best Practices
### 1. Image Security
**Pin Specific Versions:**
```yaml
# ✅ Good - Specific version
image: nginx:1.25.3-alpine
# ❌ Bad - Latest tag
image: nginx:latest
```
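To catch unpinned images before deployment, `docker compose config --images` lists every image a compose file references, and a small filter can flag untagged or `:latest` references. The `check_pinned` helper below is hypothetical; it treats a digest (`@sha256:`) as pinned.

```bash
# Read image references (one per line) and report any that are not pinned.
# A reference counts as pinned if it has a digest or a tag other than "latest".
check_pinned() {
  local image name status=0
  while IFS= read -r image; do
    [ -n "$image" ] || continue
    case $image in
      *@sha256:*) continue ;;   # pinned by digest
    esac
    name=${image##*/}           # last path segment, e.g. "nginx:1.25.3-alpine"
    # Only the last segment carries a tag; in "localhost:5000/app" the colon is a port
    if [[ $name != *:* || $name == *:latest ]]; then
      echo "unpinned: $image"
      status=1
    fi
  done
  return "$status"
}
```

Usage: `docker compose -f docker-compose/media.yml config --images | check_pinned`.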
**Use Official or Trusted Images:**
- Official Docker images
- LinuxServer.io (lscr.io)
- Trusted vendors
**Scan Images:**
```bash
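# docker scan has been retired; use Docker Scout (a CLI plugin, bundled with Docker Desktop)
docker scout cves vendor/image:tag
# or Trivy
trivy image vendor/image:tag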
```
### 2. Secret Management
**Never Commit Secrets:**
```yaml
# .env file (gitignored)
DB_PASSWORD=super-secret-password
API_KEY=sk-1234567890

# docker-compose.yml
environment:
  - DB_PASSWORD=${DB_PASSWORD}
  - API_KEY=${API_KEY}
```
**Provide Templates:**
```bash
# .env.example (committed)
DB_PASSWORD=changeme
API_KEY=your-api-key-here
```
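When `.env.example` gains a variable, existing `.env` files silently lack it and Compose substitutes an empty string (with a warning). A sketch that lists keys present in the template but missing from the real file; `env_keys` and `missing_env_keys` are hypothetical helpers, and only simple `KEY=value` lines are parsed.

```bash
# Print the variable names defined in a KEY=value file (comments and blank lines ignored).
env_keys() {
  grep -E '^[A-Za-z_][A-Za-z0-9_]*=' "$1" | cut -d= -f1 | sort -u
}

# List keys defined in the template but missing from the real env file.
missing_env_keys() {
  local env_file=${1:-.env} template=${2:-.env.example} missing key
  [ -f "$env_file" ] || { echo "no such file: $env_file" >&2; return 1; }
  # comm -13 keeps lines that appear only in the second (template) list
  missing=$(comm -13 <(env_keys "$env_file") <(env_keys "$template"))
  [ -z "$missing" ] && return 0
  while IFS= read -r key; do
    echo "missing from $env_file: $key"
  done <<<"$missing"
  return 1
}
```

Running `missing_env_keys .env .env.example` before `up -d` catches variables added by a `git pull`.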
### 3. User Permissions
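**Run as Non-Root:**
LinuxServer.io images switch to the user and group given in `PUID`/`PGID`. Other images may use different variables or expect Compose's `user:` key instead, so check each image's documentation.
```yaml
environment:
  - PUID=1000 # Your user ID
  - PGID=1000 # Your group ID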
```
**Check Current User:**
```bash
id -u # Gets your UID
id -g # Gets your GID
```
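Rather than typing the IDs by hand, a setup script can write them into `.env`, assuming your compose files reference `${PUID}` and `${PGID}` instead of hard-coding `1000`. A sketch; `set_env_ids` is hypothetical, and it rewrites the file through a temporary copy so other variables survive.

```bash
# Write the current user's UID/GID into an env file, replacing existing PUID/PGID lines.
set_env_ids() {
  local file=${1:-.env} tmp
  tmp=$(mktemp "$file.XXXXXX") || return 1
  # Keep every other line; grep exits 1 when it selects nothing, which is fine here
  if [ -f "$file" ]; then
    grep -Ev '^(PUID|PGID)=' "$file" >"$tmp" || true
  fi
  printf 'PUID=%s\nPGID=%s\n' "$(id -u)" "$(id -g)" >>"$tmp"
  # mktemp creates the file with mode 0600, so the result is readable only by you
  mv -- "$tmp" "$file"
}
```

Run it once as the user who owns `./config`: `set_env_ids .env`.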
### 4. Network Security
**Minimal Exposure:**
```yaml
# ✅ Good - Only expose what's needed
ports:
  - "127.0.0.1:8080:8080" # Only accessible from localhost
# ❌ Bad - Exposed to all interfaces
ports:
  - "8080:8080"
```
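To audit exposure on a running host, `docker ps` can print each container's published ports; anything bound to `0.0.0.0` or `::` is reachable on every interface. A sketch; `exposed_ports` is a hypothetical filter, and it assumes Docker's usual port notation such as `0.0.0.0:8080->8080/tcp`.

```bash
# Print containers whose published ports listen on all interfaces.
# Reads "name|ports" lines, as produced by:
#   docker ps --format '{{.Names}}|{{.Ports}}'
exposed_ports() {
  local name ports mapping
  local -a mappings
  while IFS='|' read -r name ports; do
    # Ports is comma-separated, e.g. "0.0.0.0:8080->8080/tcp, :::8080->8080/tcp"
    IFS=',' read -ra mappings <<<"$ports"
    for mapping in "${mappings[@]}"; do
      mapping=${mapping# }   # drop the space after each comma
      case $mapping in
        0.0.0.0:*|:::*|'[::]:'*) echo "$name $mapping" ;;
      esac
    done
  done
}
```

Usage: `docker ps --format '{{.Names}}|{{.Ports}}' | exposed_ports`. Each line it prints is a candidate for a `127.0.0.1:` binding or a move behind the reverse proxy.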
**Use Reverse Proxy:**
```yaml
# Don't expose services directly
# Use Nginx/Traefik to proxy with SSL
```
### 5. Resource Limits
**Prevent Resource Exhaustion:**
```yaml
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '0.5'
      memory: 1G
```
## Monitoring and Logging
### Logging Configuration
**Standard Logging:**
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```
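The `logging:` key applies per service. To give every container the same rotation by default, the Docker daemon accepts `log-driver` and `log-opts` in `/etc/docker/daemon.json`. A sketch that writes such a file; `write_log_defaults` is hypothetical, and it refuses to overwrite an existing file because `daemon.json` often holds other settings that must be merged by hand.

```bash
# Write a daemon.json that makes json-file log rotation the default for new containers.
# Usage: write_log_defaults [PATH]   (default /etc/docker/daemon.json; needs root there)
write_log_defaults() {
  local path=${1:-/etc/docker/daemon.json}
  if [ -e "$path" ]; then
    echo "$path already exists; merge log-driver/log-opts into it manually" >&2
    return 1
  fi
  # log-opts values must be strings, including max-file
  cat >"$path" <<'EOF'
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
EOF
}
```

With these values each container keeps at most about 30 MB of logs (three files of up to 10 MB). Restart the daemon afterwards (for example `sudo systemctl restart docker` on systemd hosts); existing containers keep the logging settings they were created with.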
**Centralized Logging:**
```yaml
logging:
  driver: "syslog"
  options:
    syslog-address: "tcp://192.168.1.100:514"
```
### Health Checks
**HTTP Health Check:**
```yaml
healthcheck:
  # Runs inside the container, so curl must be installed in the image
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
```
**TCP Health Check:**
```yaml
healthcheck:
test: ["CMD-SHELL", "nc -z localhost 5432 || exit 1"]
interval: 30s
timeout: 5s
retries: 3
```
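`nc` is missing from many images. If the image ships bash, its `/dev/tcp/HOST/PORT` redirection can test a port without extra tools. A sketch; `tcp_check` is a hypothetical name, `/dev/tcp` is a bash feature (not available in BusyBox `sh` or `dash`), and the `timeout` command is assumed to be present.

```bash
# Succeed if something accepts TCP connections on HOST:PORT within TIMEOUT seconds.
# Usage: tcp_check HOST PORT [TIMEOUT]
tcp_check() {
  local host=$1 port=$2 timeout=${3:-3}
  # bash opens a TCP connection for a redirection from /dev/tcp/HOST/PORT;
  # timeout keeps a filtered port from hanging the check
  timeout "$timeout" bash -c '</dev/tcp/"$1"/"$2"' _ "$host" "$port" 2>/dev/null
}
```

Inside a compose file the same idea fits on one line, for example `test: ["CMD", "bash", "-c", "</dev/tcp/localhost/5432"]`, with the healthcheck's own `timeout` bounding the wait.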
**Custom Script:**
```yaml
healthcheck:
test: ["CMD", "/healthcheck.sh"]
interval: 30s
timeout: 10s
retries: 3
```
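A custom script helps when an image is not guaranteed to ship `curl`. A sketch of what a hypothetical `/healthcheck.sh` could contain: it tries `curl`, falls back to `wget` (BusyBox-based images usually include it), and sticks to POSIX `sh` syntax so it also runs on Alpine. The function name `http_ok` is illustrative.

```bash
# Exit 0 if URL answers with a non-error HTTP status, non-zero otherwise.
http_ok() {
  url=$1
  if command -v curl >/dev/null 2>&1; then
    # -f: fail on HTTP >= 400; -sS: quiet but show errors; --max-time bounds the wait
    curl -fsS --max-time 5 -o /dev/null "$url"
  elif command -v wget >/dev/null 2>&1; then
    # wget also exits non-zero on HTTP errors; -T sets the timeout in seconds
    wget -q -T 5 -O /dev/null "$url"
  else
    echo "healthcheck: neither curl nor wget is installed" >&2
    return 1
  fi
}
```

Saved as `/healthcheck.sh` with `#!/bin/sh` as its first line and a final line such as `http_ok http://localhost:8080/health`, the script's exit status becomes the container's health result.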
### Monitoring Stack Example
```yaml
# docker-compose/monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./config/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - monitoring-network

  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring-network
    depends_on:
      - prometheus

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring-network:
    driver: bridge
```
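The Prometheus service above bind-mounts `./config/prometheus` over `/etc/prometheus`, which hides the image's bundled `prometheus.yml`, so that file must exist before the first start. A minimal sketch that creates a config scraping Prometheus itself; `write_prometheus_config` is hypothetical and the 15-second interval is an arbitrary example. Because both containers share `monitoring-network`, Grafana can reach Prometheus at `http://prometheus:9090`.

```bash
# Create a minimal prometheus.yml without overwriting an existing one.
# Usage: write_prometheus_config [DIR]   (default ./config/prometheus)
write_prometheus_config() {
  local dir=${1:-./config/prometheus}
  mkdir -p "$dir"
  if [ -e "$dir/prometheus.yml" ]; then
    echo "$dir/prometheus.yml already exists" >&2
    return 1
  fi
  cat >"$dir/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  # Prometheus scraping its own metrics endpoint
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]
EOF
}
```

Run `write_prometheus_config` from the directory that holds `docker-compose/` and `config/`, then add further `scrape_configs` entries for each exporter you deploy.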
## Troubleshooting
### Common Issues
#### Service Won't Start
**1. Check logs:**
```bash
docker compose -f docker-compose/service.yml logs
```
**2. Validate configuration:**
```bash
docker compose -f docker-compose/service.yml config
```
**3. Check for port conflicts:**
```bash
# See what's listening on a port (ss replaces the older netstat)
sudo ss -tlnp | grep :8080
```
**4. Verify image exists:**
```bash
docker image ls vendor/image  # repository name from the service's image: line
```
#### Permission Errors
**1. Check PUID/PGID:**
```bash
# Your user ID
id -u
# Your group ID
id -g
```
**2. Fix directory permissions:**
```bash
sudo chown -R 1000:1000 ./config/service-name
```
**3. Check volume permissions:**
```bash
docker compose -f docker-compose/service.yml exec service-name ls -la /config
```
#### Network Connectivity Issues
**1. Verify network exists:**
```bash
docker network ls
docker network inspect homelab-network
```
**2. Check if services are on same network:**
```bash
# List the containers attached to the network
docker network inspect --format '{{range .Containers}}{{.Name}} {{end}}' homelab-network
```
**3. Test connectivity:**
```bash
# Requires ping in the service1 image; -c 3 stops after three packets
docker compose -f docker-compose/service.yml exec service1 ping -c 3 service2
```
#### Container Keeps Restarting
**1. Watch logs:**
```bash
docker compose -f docker-compose/service.yml logs -f
```
**2. Check health status:**
```bash
docker compose -f docker-compose/service.yml ps
```
**3. Inspect container:**
```bash
docker inspect container-name
```
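For a restart loop, the exit code and OOM flag usually say more than the full `docker inspect` dump: `docker inspect --format '{{.RestartCount}} {{.State.ExitCode}} {{.State.OOMKilled}}' container-name` prints just those fields. The helper below translates common exit codes. `explain_exit` is hypothetical, and codes above 128 follow the usual shell convention of 128 plus the signal number.

```bash
# Translate a container exit code into a likely cause.
explain_exit() {
  case $1 in
    0) echo "exited normally" ;;
    126) echo "command found but not executable (check permissions)" ;;
    127) echo "command not found (check entrypoint/command and image)" ;;
    137) echo "killed by SIGKILL (out of memory, or did not stop within the stop timeout)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM (usually docker stop)" ;;
    *)
      # Non-numeric input makes the test fail quietly and falls through to the else branch
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application error (exit $1); check the logs"
      fi
      ;;
  esac
}
```

If `OOMKilled` is `true` alongside exit code 137, raise the service's memory limit or find the leak before restarting it again.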
### Debugging Commands
```bash
# Enter running container
docker compose -f docker-compose/service.yml exec service-name /bin/sh
# View full container configuration
docker inspect container-name
# See resource usage
docker stats container-name
# View recent events
docker events --since 10m
# Check disk space
docker system df
```
### Recovery Procedures
#### Service Corrupted
```bash
# Back up named volumes first (see Backup Strategy): down -v deletes them
# Stop the service and remove its containers, networks, and named volumes
docker compose -f docker-compose/service.yml down -v
# Recreate from scratch
docker compose -f docker-compose/service.yml up -d
```
#### Network Issues
```bash
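# Stop the stacks first: docker network rm fails while containers are still connected
for f in docker-compose/*.yml; do docker compose -f "$f" down; done
# Remove and recreate network
docker network rm homelab-network
docker network create homelab-network
# Restart services one file at a time (-f takes a single file, so a bare glob does not work)
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done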
```
#### Full System Reset (Nuclear Option)
```bash
# ⚠️ WARNING: This removes everything!
# Backup first!
# Stop all containers
docker stop $(docker ps -aq)
# Remove all containers
docker rm $(docker ps -aq)
# Remove all volumes (careful!)
docker volume rm $(docker volume ls -q)
# Remove all networks (except defaults)
docker network prune -f
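# Recreate the shared external network, then rebuild one compose file at a time
docker network create homelab-network
for f in docker-compose/*.yml; do docker compose -f "$f" up -d; done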
```
## Maintenance
### Regular Tasks
**Weekly:**
- Review logs for errors
- Check disk space: `docker system df`
- Check for security updates to the images you run
**Monthly:**
- Bump pinned image tags to current releases (read the release notes first)
- Review and prune unused resources
- Backup volumes
- Review and optimize compose files
**Quarterly:**
- Full stack review
- Documentation update
- Performance optimization
- Security audit
### Update Procedure
```bash
# 1. Back up current state (the rendered config includes values substituted from .env)
mkdir -p backup
docker compose -f docker-compose/service.yml config > backup/service-config.yml
# 2. Update image version in compose file
# Edit docker-compose/service.yml
# 3. Pull new image
docker compose -f docker-compose/service.yml pull
# 4. Recreate service
docker compose -f docker-compose/service.yml up -d
# 5. Verify
docker compose -f docker-compose/service.yml logs -f
# 6. Test functionality
# Access service and verify it works
```
## Conclusion
Following these guidelines ensures:
- Consistent infrastructure
- Easy troubleshooting
- Reproducible deployments
- Maintainable system
- Better security
Remember: **Infrastructure as Code** means treating your Docker Compose files as critical documentation. Keep them clean, commented, and version-controlled.