# Docker Service Management Guidelines

## Overview

This document provides comprehensive guidelines for managing Docker services in your AI-powered homelab using Dockge, Traefik, and Authelia. These guidelines ensure consistency, maintainability, and reliability across your entire infrastructure.

## Table of Contents

1. [Philosophy](#philosophy)
2. [Dockge Structure](#dockge-structure)
3. [Traefik and Authelia Integration](#traefik-and-authelia-integration)
4. [Docker Compose vs Docker Run](#docker-compose-vs-docker-run)
5. [Service Creation Guidelines](#service-creation-guidelines)
6. [Service Modification Guidelines](#service-modification-guidelines)
7. [Naming Conventions](#naming-conventions)
8. [Network Architecture](#network-architecture)
9. [Volume Management](#volume-management)
10. [Security Best Practices](#security-best-practices)
11. [Monitoring and Logging](#monitoring-and-logging)
12. [Troubleshooting](#troubleshooting)
13. [Maintenance](#maintenance)

## Philosophy

### Core Principles

1. **Dockge First**: Manage all stacks through Dockge in `/opt/stacks/`
2. **Infrastructure as Code**: All services defined in Docker Compose files
3. **File-Based Configuration**: Traefik labels and Authelia YAML (AI-manageable)
4. **Reproducibility**: Any service should be rebuildable from compose files
5. **Automatic HTTPS**: All services routed through Traefik with Let's Encrypt
6. **Smart SSO**: Authelia protects admin interfaces, bypasses media apps
7. **Documentation**: Every non-obvious configuration must be commented
8. **Consistency**: Use the same patterns across all services
9. **Safety First**: Always test changes in isolation before deploying

### The Stack Mindset

Think of your homelab as an interconnected stack where:
- Services depend on networks (especially `traefik-network`)
- Traefik routes all traffic with automatic SSL
- Authelia protects sensitive services
- VPN (Gluetun) secures downloads
- Changes ripple through the system

Always ask: "How does this change affect other services and routing?"

## Dockge Structure

### Directory Organization

All stacks live in `/opt/stacks/stack-name/`:

```
/opt/stacks/
├── traefik/
│   ├── docker-compose.yml
│   ├── traefik.yml          # Static config
│   ├── dynamic/             # Dynamic routes
│   │   ├── routes.yml
│   │   └── external.yml     # External host proxying
│   ├── acme.json            # SSL certificates (chmod 600)
│   └── .env
├── authelia/
│   ├── docker-compose.yml
│   ├── configuration.yml    # Authelia settings
│   ├── users_database.yml   # User accounts
│   └── .env
├── media/
│   ├── docker-compose.yml
│   └── .env
└── ...
```

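A minimal sketch of scaffolding the `traefik` stack from the tree above. The function name `scaffold_traefik_stack` is hypothetical and it only creates empty placeholder files; the root is a parameter so you can try it outside `/opt/stacks`. The detail that matters is `acme.json`: Traefik expects it to be readable and writable only by its owner and reports an error (so certificates are not issued) when the permissions are broader than 600.

```bash
# Hypothetical helper: create the traefik stack layout shown above.
# Usage: scaffold_traefik_stack [root]   (root defaults to /opt/stacks)
scaffold_traefik_stack() {
  local root="${1:-/opt/stacks}"
  local dir="$root/traefik"

  mkdir -p "$dir/dynamic" || return 1
  # Empty placeholders; fill them with your real configuration
  touch "$dir/docker-compose.yml" "$dir/traefik.yml" "$dir/.env" \
        "$dir/dynamic/routes.yml" "$dir/dynamic/external.yml"

  # Let's Encrypt certificate store: owner read/write only
  touch "$dir/acme.json"
  chmod 600 "$dir/acme.json"

  echo "$dir"
}
```

With the default root this usually needs elevated permissions, since `/opt/stacks` is typically owned by root.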
### Why Dockge?

- **Visual Management**: Web UI at `https://dockge.${DOMAIN}`
- **Direct File Editing**: Edit compose files in place
- **Stack Organization**: Each service stack is independent
- **AI Compatible**: Files can be managed by AI
- **Git Integration**: Easy to version control

### Storage Strategy

**Small Data** (configs, DBs < 10GB): `/opt/stacks/stack-name/`
```yaml
volumes:
  - /opt/stacks/sonarr/config:/config
```

**Large Data** (media, downloads, backups): `/mnt/`
```yaml
volumes:
  - /mnt/media/movies:/movies
  - /mnt/media/tv:/tv
  - /mnt/downloads:/downloads
  - /mnt/backups:/backups
```

The AI assistant will suggest `/mnt/` when data may exceed 50GB or grows continuously.

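To spot stacks whose data has outgrown `/opt/stacks`, a quick `du` audit helps. This is a hypothetical helper (`oversized_stacks` is not part of Dockge): it prints each stack directory whose on-disk size exceeds a limit, defaulting to the 10GB small-data boundary above, so you can decide whether its data belongs on `/mnt/`.

```bash
# Hypothetical helper: print stack directories under ROOT whose on-disk size
# exceeds LIMIT_KB (default 10 GB), i.e. candidates for moving data to /mnt/.
# Usage: oversized_stacks [root] [limit-kb]
oversized_stacks() {
  local root="${1:-/opt/stacks}"
  local limit_kb="${2:-10485760}"   # 10 GB expressed in KiB
  local dir size_kb

  for dir in "$root"/*/; do
    [ -d "$dir" ] || continue       # skip when the glob matched nothing
    size_kb=$(du -sk "$dir" | cut -f1)
    if [ "$size_kb" -gt "$limit_kb" ]; then
      printf '%s\t%s KiB\n' "${dir%/}" "$size_kb"
    fi
  done
}
```

Running `oversized_stacks` with no arguments audits `/opt/stacks` against the 10GB limit; `du` may need `sudo` to read every file.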
## Traefik and Authelia Integration

### Every Service Needs Traefik Labels

Standard pattern for all services:

```yaml
services:
  myservice:
    image: myimage:latest
    container_name: myservice
    networks:
      - homelab-network
      - traefik-network  # Required for Traefik
    labels:
      # Enable Traefik
      - "traefik.enable=true"

      # Define routing rule
      - "traefik.http.routers.myservice.rule=Host(`myservice.${DOMAIN}`)"

      # Use websecure entrypoint (HTTPS)
      - "traefik.http.routers.myservice.entrypoints=websecure"

      # Enable Let's Encrypt
      - "traefik.http.routers.myservice.tls.certresolver=letsencrypt"

      # Add Authelia SSO (if needed)
      - "traefik.http.routers.myservice.middlewares=authelia@docker"

      # Specify the container port (required if the image exposes no port or several)
      - "traefik.http.services.myservice.loadbalancer.server.port=8080"

# Both networks are created once outside this stack, so declare them external
networks:
  homelab-network:
    external: true
  traefik-network:
    external: true
```

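The `authelia@docker` middleware referenced above has to be defined once, typically with labels on the Authelia container itself. A sketch assuming Authelia 4.38 or newer, whose forward-auth endpoint is `/api/authz/forward-auth` (older releases used `/api/verify?rd=...`; check the Authelia docs for your version). The image tag and the `auth.${DOMAIN}` hostname are assumptions, and Authelia's own `configuration.yml` (sessions, storage, notifier) is not shown.

```yaml
services:
  authelia:
    image: authelia/authelia:4.38   # assumption: pin the release you actually run
    container_name: authelia
    restart: unless-stopped
    volumes:
      - /opt/stacks/authelia:/config   # holds configuration.yml and users_database.yml
    networks:
      - traefik-network
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.authelia.rule=Host(`auth.${DOMAIN}`)"
      - "traefik.http.routers.authelia.entrypoints=websecure"
      - "traefik.http.routers.authelia.tls.certresolver=letsencrypt"
      - "traefik.http.services.authelia.loadbalancer.server.port=9091"
      # Defines the middleware that other routers reference as authelia@docker
      - "traefik.http.middlewares.authelia.forwardauth.address=http://authelia:9091/api/authz/forward-auth"
      - "traefik.http.middlewares.authelia.forwardauth.trustForwardHeader=true"
      - "traefik.http.middlewares.authelia.forwardauth.authResponseHeaders=Remote-User,Remote-Groups,Remote-Name,Remote-Email"

networks:
  traefik-network:
    external: true
```

Do not attach `authelia@docker` to Authelia's own router, or the login portal would try to authenticate itself.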
### When to Use Authelia SSO

**Protect with Authelia**:
- Admin interfaces (Sonarr, Radarr, Prowlarr, etc.)
- Infrastructure tools (Portainer, Dockge, Grafana)
- Personal data (Nextcloud, Mealie, wikis)
- Development tools (code-server, GitLab)
- Monitoring dashboards

**Bypass Authelia**:
- Media servers (Plex, Jellyfin) - need app access
- Request services (Jellyseerr) - family-friendly access
- Public services (WordPress, status pages)
- Services with their own auth (Home Assistant)

Configure bypasses in `/opt/stacks/authelia/configuration.yml`:

```yaml
access_control:
  rules:
    - domain: jellyfin.yourdomain.duckdns.org
      policy: bypass

    - domain: plex.yourdomain.duckdns.org
      policy: bypass
```

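Authelia evaluates `rules` from top to bottom and applies the first match; a request that matches no rule falls back to `default_policy`. A sketch of a fuller policy built on that behavior, assuming `yourdomain.duckdns.org` is your domain: bypass the media apps, require a login for every other subdomain, and deny anything else.

```yaml
access_control:
  # Applied when no rule below matches
  default_policy: deny

  rules:
    # Evaluated top to bottom; the first matching rule wins
    - domain:
        - jellyfin.yourdomain.duckdns.org
        - plex.yourdomain.duckdns.org
        - jellyseerr.yourdomain.duckdns.org
      policy: bypass

    # Every other subdomain requires a login (use two_factor for stricter control)
    - domain: "*.yourdomain.duckdns.org"
      policy: one_factor
```

A `bypass` rule only matters for routers that actually carry the `authelia@docker` middleware; a service without that label never reaches Authelia at all.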
### Routing Through VPN (Gluetun)

For services that need a VPN (downloads):

```yaml
services:
  mydownloader:
    image: downloader:latest
    container_name: mydownloader
    network_mode: "service:gluetun"  # Route through VPN
    depends_on:
      - gluetun
```

Because the downloader shares Gluetun's network stack, expose its ports on the `gluetun` service:
```yaml
# gluetun must be defined in the same stack (compose file) as mydownloader,
# because network_mode "service:..." only resolves services in that file
gluetun:
  ports:
    - "8080:8080"  # mydownloader web UI
```

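When the downloader should also be reachable through Traefik, the Traefik labels go on the `gluetun` container (its network is the one the downloader uses) and the service port points at the downloader's web UI. A sketch assuming a WireGuard provider; `VPN_PROVIDER` and `WIREGUARD_PRIVATE_KEY` are hypothetical `.env` entries, and most providers need further variables listed in the Gluetun wiki.

```yaml
services:
  gluetun:
    image: qmcgaw/gluetun:v3   # pin the release you run
    container_name: gluetun
    restart: unless-stopped
    cap_add:
      - NET_ADMIN              # needed to manage the VPN tunnel and firewall
    devices:
      - /dev/net/tun:/dev/net/tun
    environment:
      - VPN_SERVICE_PROVIDER=${VPN_PROVIDER}
      - VPN_TYPE=wireguard
      - WIREGUARD_PRIVATE_KEY=${WIREGUARD_PRIVATE_KEY}
    networks:
      - traefik-network
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.mydownloader.rule=Host(`mydownloader.${DOMAIN}`)"
      - "traefik.http.routers.mydownloader.entrypoints=websecure"
      - "traefik.http.routers.mydownloader.tls.certresolver=letsencrypt"
      - "traefik.http.routers.mydownloader.middlewares=authelia@docker"
      # Port of mydownloader's web UI, reached through gluetun's network stack
      - "traefik.http.services.mydownloader.loadbalancer.server.port=8080"

  mydownloader:
    image: downloader:latest
    container_name: mydownloader
    network_mode: "service:gluetun"
    depends_on:
      - gluetun

networks:
  traefik-network:
    external: true
```

With Traefik doing the routing, no `ports:` entry is needed. If the UI is unreachable, check Gluetun's firewall options in its documentation.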
## Docker Compose vs Docker Run

### Docker Compose: For Everything Persistent

Use Docker Compose for:
- All production services
- Services that need to restart automatically
- Multi-container applications
- Services with complex configurations
- Anything you want to keep long-term

**Example:**
```yaml
# docker-compose/plex.yml
services:
  plex:
    image: plexinc/pms-docker:1.40.0.7998-f68041501
    container_name: plex
    restart: unless-stopped
    networks:
      - media-network
    ports:
      - "32400:32400"
    volumes:
      - ./config/plex:/config
      - /media:/media:ro
    environment:
      - PLEX_UID=1000  # the official Plex image uses PLEX_UID/PLEX_GID, not PUID/PGID
      - PLEX_GID=1000
      - TZ=America/New_York

networks:
  media-network:
    external: true  # Create once with: docker network create media-network
```

### Docker Run: For Temporary Operations Only

Use `docker run` for:
- Testing new images
- One-off commands
- Debugging
- Verification tasks (like GPU testing)

**Examples:**
```bash
# Test if NVIDIA GPU is accessible
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

# Quick test of a new image
docker run --rm -it alpine:latest /bin/sh

# One-off backup of a named volume; mount a host directory at /backup so the
# archive survives the container (stop databases first, or use their dump tool)
docker run --rm -v mydata:/data -v "$(pwd)/backups:/backup" busybox tar czf /backup/data.tar.gz /data
```

## Service Creation Guidelines

### Step-by-Step Process

#### 1. Planning Phase

**Before writing any YAML:**

- [ ] What problem does this service solve?
- [ ] Does a similar service already exist?
- [ ] What are the dependencies?
- [ ] What ports are needed?
- [ ] What data needs to persist?
- [ ] What environment variables are required?
- [ ] What networks should it connect to?
- [ ] Are there any security considerations?

#### 2. Research Phase

- Read the official image documentation
- Check example configurations
- Review resource requirements
- Understand health check requirements
- Note any special permissions needed

#### 3. Implementation Phase

**Start with a minimal configuration:**

```yaml
services:
  service-name:
    image: vendor/image:specific-version
    container_name: service-name
    restart: unless-stopped
```

**Add networks:**
```yaml
networks:
  - homelab-network
```

**Add ports (if externally accessible):**
```yaml
ports:
  - "8080:8080"  # Web UI
```

**Add volumes:**
```yaml
volumes:
  - ./config/service-name:/config
  - service-data:/data
```

**Add environment variables:**
```yaml
environment:
  - PUID=1000
  - PGID=1000
  - TZ=${TIMEZONE}
```

**Add health checks (if applicable):**
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 40s
```

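Put together, the snippets above form one complete file. Compose refuses to start a service that references an undeclared named volume or network, so `service-data` and `homelab-network` also need top-level entries. This sketch assumes `homelab-network` was created beforehand (see Network Architecture) and that the image ships `curl` for its health check.

```yaml
services:
  service-name:
    image: vendor/image:specific-version
    container_name: service-name
    restart: unless-stopped
    networks:
      - homelab-network
    ports:
      - "8080:8080"  # Web UI
    volumes:
      - ./config/service-name:/config
      - service-data:/data
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=${TIMEZONE}
    healthcheck:
      # Assumes curl exists in the image and /health is served on port 8080
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

# Named volumes must be declared at the top level
volumes:
  service-data:

# Shared network created once with: docker network create homelab-network
networks:
  homelab-network:
    external: true
```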
#### 4. Testing Phase

```bash
# Validate syntax
docker compose -f docker-compose/service.yml config

# Start in foreground to see logs (press Ctrl+C to stop)
docker compose -f docker-compose/service.yml up

# If successful, restart in background
docker compose -f docker-compose/service.yml down
docker compose -f docker-compose/service.yml up -d
```

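For services with a health check, `up -d` returns before the container is actually ready. A hypothetical helper (`wait_healthy` is not a Docker command) that polls `docker inspect` until the container reports `healthy`, fails fast on `unhealthy`, and gives up after a timeout:

```bash
# Hypothetical helper: wait until a container's health check reports "healthy".
# Usage: wait_healthy <container-name> [timeout-seconds]
wait_healthy() {
  local name="$1"
  local timeout="${2:-120}"
  local waited=0
  local status="unknown"

  while [ "$waited" -lt "$timeout" ]; do
    # Errors (e.g. the container has no health check) count as "unknown"
    status=$(docker inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null) || status="unknown"
    case "$status" in
      healthy)
        echo "$name is healthy"
        return 0
        ;;
      unhealthy)
        echo "$name is unhealthy" >&2
        return 1
        ;;
    esac
    sleep 5
    waited=$((waited + 5))
  done

  echo "Timed out waiting for $name (last status: $status)" >&2
  return 1
}
```

Typical use: `docker compose -f docker-compose/service.yml up -d && wait_healthy service-name`.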
#### 5. Documentation Phase

Add comments to your compose file:
```yaml
services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:4.0.0
    container_name: sonarr
    # Sonarr - TV Show management and automation
    # Web UI: http://server-ip:8989
    # Connects to: Prowlarr (indexers), qBittorrent (downloads)
    restart: unless-stopped
```

Update your main README or service-specific README with:
- Service purpose
- Access URLs
- Default credentials (if any)
- Configuration notes
- Backup instructions

## Service Modification Guidelines

### Before Modifying

1. **Back up current configuration:**
   ```bash
   cp docker-compose/service.yml docker-compose/service.yml.backup
   ```

2. **Document why you're making the change**
   - Create a comment in the compose file
   - Note in your changelog or docs

3. **Understand the current state:**
   ```bash
   # Check if service is running
   docker compose -f docker-compose/service.yml ps

   # Review current configuration
   docker compose -f docker-compose/service.yml config

   # Check logs for any existing issues
   docker compose -f docker-compose/service.yml logs --tail=50
   ```

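A single `.backup` file is overwritten by the next edit. A hypothetical variant (`backup_compose` is an illustrative name, not an existing tool) keeps a timestamped copy per change and prints its path:

```bash
# Hypothetical helper: copy a compose file to <file>.<timestamp>.backup
# and print the path of the copy.
backup_compose() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "No such file: $file" >&2
    return 1
  fi
  local backup
  backup="${file}.$(date +%Y%m%d-%H%M%S).backup"
  cp -p "$file" "$backup"   # -p keeps mode and timestamps
  echo "$backup"
}
```

Two backups taken within the same second share a name, so the second overwrites the first; for finer history, keep `/opt/stacks` under Git instead.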
### Making the Change

1. **Edit the compose file**
   - Make minimal, targeted changes
   - Keep existing structure when possible
   - Add comments for new configurations

2. **Validate syntax:**
   ```bash
   docker compose -f docker-compose/service.yml config
   ```

3. **Apply the change:**
   ```bash
   # Pull new image if version changed
   docker compose -f docker-compose/service.yml pull

   # Recreate the service
   docker compose -f docker-compose/service.yml up -d
   ```

4. **Verify the change:**
   ```bash
   # Check service is running
   docker compose -f docker-compose/service.yml ps

   # Watch logs for errors (Ctrl+C to stop)
   docker compose -f docker-compose/service.yml logs -f

   # Test functionality (adjust the port and path for your service)
   curl -f http://localhost:8080/health
   ```

### Rollback Plan

If something goes wrong:
```bash
# Stop the service
docker compose -f docker-compose/service.yml down

# Restore backup
mv docker-compose/service.yml.backup docker-compose/service.yml

# Restart with old configuration
docker compose -f docker-compose/service.yml up -d
```

## Naming Conventions

### Service Names

Use lowercase with hyphens:
- ✅ `plex-media-server`
- ✅ `home-assistant`
- ❌ `PlexMediaServer`
- ❌ `home_assistant`

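The naming rule is simple enough to check mechanically, for example before a script or an AI assistant adds a new service. A minimal sketch; `is_valid_service_name` is a hypothetical helper, and `LC_ALL=C` keeps `[a-z]` from matching uppercase letters under locales with unusual collation.

```bash
# Hypothetical check for the naming rule above:
# lowercase letters and digits, words separated by single hyphens.
is_valid_service_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]+(-[a-z0-9]+)*$'
}

for name in plex-media-server home-assistant PlexMediaServer home_assistant; do
  if is_valid_service_name "$name"; then
    echo "ok:  $name"
  else
    echo "bad: $name"
  fi
done
```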
### Container Names

Match service names or be descriptive:
```yaml
services:
  plex:
    container_name: plex  # Simple match

  database:
    container_name: media-database  # Descriptive
```

### Network Names

Use purpose-based naming:
- `homelab-network` - Main network
- `media-network` - Media services
- `monitoring-network` - Observability stack
- `isolated-network` - Untrusted services

### Volume Names

Use `service-purpose` pattern:
```yaml
volumes:
  plex-config:
  plex-metadata:
  database-data:
  nginx-certs:
```

### File Names

Organize by function:
- `docker-compose/media.yml` - Media services (Plex, Jellyfin, etc.)
- `docker-compose/monitoring.yml` - Monitoring stack
- `docker-compose/infrastructure.yml` - Core services (DNS, reverse proxy)
- `docker-compose/development.yml` - Dev tools

## Network Architecture

### Network Types

1. **Bridge Networks** (Most Common)
   ```yaml
   networks:
     homelab-network:
       driver: bridge
       ipam:
         config:
           - subnet: 172.20.0.0/16
   ```

2. **Host Network** (When Performance Critical)
   ```yaml
   services:
     performance-critical:
       network_mode: host
   ```

3. **Overlay Networks** (For Swarm/Multi-host)
   ```yaml
   networks:
     swarm-network:
       driver: overlay
   ```

### Network Design Patterns

#### Pattern 1: Single Shared Network

Simplest approach for small homelabs:
```yaml
networks:
  homelab-network:
    external: true
```

Create once manually:
```bash
docker network create homelab-network
```

#### Pattern 2: Segmented Networks

Better security through isolation:
```yaml
networks:
  frontend-network:    # Web-facing services
  backend-network:     # Databases, internal services
  monitoring-network:  # Observability
```

#### Pattern 3: Service-Specific Networks

Each service group has its own network:
```yaml
services:
  web:
    networks:
      - frontend
      - backend

  database:
    networks:
      - backend  # Not exposed to frontend
```

### Network Security

- Place databases on internal networks only
- Use separate networks for untrusted services
- Expose minimal ports to the host
- Use reverse proxies for web services

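Compose can enforce the first point directly: a network marked `internal: true` is isolated from the outside, so a database attached only to it can be reached by other members of that network but not from outside, and it cannot reach the internet either. A sketch with a hypothetical `app` image:

```yaml
services:
  app:
    image: vendor/app:1.0.0   # hypothetical application image
    networks:
      - frontend-network      # reachable by the reverse proxy
      - backend-network       # can talk to the database

  database:
    image: postgres:16-alpine
    environment:
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    networks:
      - backend-network       # only reachable from backend-network members

networks:
  frontend-network:
  backend-network:
    internal: true            # no external connectivity in or out
```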
## Volume Management

### Volume Types

#### Named Volumes (Managed by Docker)
```yaml
volumes:
  database-data:
    driver: local
```

**Use for:**
- Database files
- Application data
- Anything Docker should manage

**Advantages:**
- Docker handles permissions
- Easy to backup/restore
- Portable across systems

#### Bind Mounts (Direct Host Paths)
```yaml
volumes:
  - ./config/app:/config
  - /media:/media:ro
```

**Use for:**
- Configuration files you edit directly
- Large media libraries
- Shared data with host

**Advantages:**
- Direct file access
- Easy to edit
- Can share with host applications

#### tmpfs Mounts (RAM)
```yaml
tmpfs:
  - /tmp
```

**Use for:**
- Temporary data
- Cache that doesn't need persistence
- Sensitive data that shouldn't touch disk

### Volume Best Practices

1. **Consistent Paths:**
   ```yaml
   volumes:
     - ./config/service:/config  # Always use /config inside container
     - service-data:/data        # Always use /data for application data
   ```

2. **Read-Only When Possible:**
   ```yaml
   volumes:
     - /media:/media:ro  # Media library is read-only
   ```

3. **Separate Config from Data:**
   ```yaml
   volumes:
     - ./config/plex:/config    # Editable configuration
     - plex-metadata:/metadata  # Application-managed data
   ```

4. **Backup Strategy:**
   ```bash
   # Backup named volume
   docker run --rm \
     -v plex-metadata:/data \
     -v "$(pwd)/backups:/backup" \
     busybox tar czf /backup/plex-metadata.tar.gz /data
   ```

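A backup is only useful if it can be restored. A hypothetical counterpart to the backup command above (`restore_volume` is an illustrative name): `tar` strips the leading `/` when archiving, so the files are stored under `data/`, and extracting at `/` puts them back into the volume mounted at `/data`.

```bash
# Hypothetical helper: restore a named volume from an archive created by the
# backup command above.
# Usage: restore_volume <volume> <archive.tar.gz>
restore_volume() {
  local volume="$1"
  local archive="$2"
  local dir
  # Absolute path of the directory holding the archive
  dir=$(cd "$(dirname "$archive")" && pwd) || return 1

  docker run --rm \
    -v "$volume":/data \
    -v "$dir":/backup:ro \
    busybox tar xzf "/backup/$(basename "$archive")" -C /
}
```

Stop the service before restoring: files in the archive overwrite their counterparts, but files that exist only in the volume are left in place.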
## Security Best Practices

### 1. Image Security

**Pin Specific Versions:**
```yaml
# ✅ Good - Specific version
image: nginx:1.25.3-alpine

# ❌ Bad - Latest tag
image: nginx:latest
```

**Use Official or Trusted Images:**
- Official Docker images
- LinuxServer.io (lscr.io)
- Trusted vendors

**Scan Images:**
```bash
# `docker scan` is deprecated; Docker Scout replaces it
docker scout cves vendor/image:tag

# Alternative: Trivy, a standalone open-source scanner
trivy image vendor/image:tag
```

### 2. Secret Management

**Never Commit Secrets:**
```yaml
# .env file (gitignored)
DB_PASSWORD=super-secret-password
API_KEY=sk-1234567890

# docker-compose.yml
environment:
  - DB_PASSWORD=${DB_PASSWORD}
  - API_KEY=${API_KEY}
```

**Provide Templates:**
```bash
# .env.example (committed)
DB_PASSWORD=changeme
API_KEY=your-api-key-here
```

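Environment variables remain visible in `docker inspect`. For values you would rather keep out of the environment, Compose also supports file-based secrets, mounted read-only under `/run/secrets/`. This only helps with images that can read a secret from a file; the official `postgres` image does so through `POSTGRES_PASSWORD_FILE`. A sketch, with `./secrets/db_password.txt` as an assumed path that should be gitignored:

```yaml
services:
  database:
    image: postgres:16-alpine
    environment:
      # The image reads the password from this file instead of an env var
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
    secrets:
      - db_password

secrets:
  db_password:
    file: ./secrets/db_password.txt  # keep out of Git
```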
### 3. User Permissions

**Run as Non-Root:**
```yaml
environment:
  - PUID=1000  # Your user ID
  - PGID=1000  # Your group ID
```

`PUID`/`PGID` are a LinuxServer.io image convention; for images that don't support them, Compose's `user: "1000:1000"` option is the usual alternative.

**Check Current User:**
```bash
id -u  # Gets your UID
id -g  # Gets your GID
```

### 4. Network Security

**Minimal Exposure:**
```yaml
# ✅ Good - Only expose what's needed
ports:
  - "127.0.0.1:8080:8080"  # Only accessible from localhost

# ❌ Bad - Exposed to all interfaces
ports:
  - "8080:8080"
```

**Use Reverse Proxy:**

Don't expose services directly; route them through Traefik so they are served over SSL (see [Traefik and Authelia Integration](#traefik-and-authelia-integration)).

### 5. Resource Limits

**Prevent Resource Exhaustion:**
```yaml
deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '0.5'
      memory: 1G
```

## Monitoring and Logging

### Logging Configuration

**Standard Logging:**
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```

**Centralized Logging:**
```yaml
logging:
  driver: "syslog"
  options:
    syslog-address: "tcp://192.168.1.100:514"
```

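Rather than repeating the logging block in every service, Compose extension fields (top-level keys starting with `x-`) combined with YAML anchors let you define it once. A sketch; `service-name` and its image are placeholders:

```yaml
# Extension field: Compose ignores top-level keys starting with "x-"
x-logging: &default-logging
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:4.0.0
    logging: *default-logging   # reuse the shared settings

  service-name:
    image: vendor/image:specific-version
    logging: *default-logging
```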
### Health Checks

**HTTP Health Check:**
```yaml
healthcheck:
  # Requires curl inside the image
  test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
  interval: 30s
  timeout: 10s
  retries: 3
```

**TCP Health Check:**
```yaml
healthcheck:
  # Requires nc (netcat) inside the image
  test: ["CMD-SHELL", "nc -z localhost 5432 || exit 1"]
  interval: 30s
  timeout: 5s
  retries: 3
```

**Custom Script:**
```yaml
healthcheck:
  test: ["CMD", "/healthcheck.sh"]
  interval: 30s
  timeout: 10s
  retries: 3
```

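Health checks also let Compose order startup: with the long form of `depends_on` and `condition: service_healthy`, a dependent service starts only after its dependency reports healthy. A sketch using PostgreSQL, where `pg_isready` (shipped in the official image) is a more reliable probe than `nc`; the `app` image is hypothetical.

```yaml
services:
  database:
    image: postgres:16-alpine
    environment:
      - POSTGRES_PASSWORD=${DB_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  app:
    image: vendor/app:1.0.0   # hypothetical image
    depends_on:
      database:
        condition: service_healthy   # wait until pg_isready succeeds
```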
### Monitoring Stack Example

```yaml
# docker-compose/monitoring.yml
services:
  prometheus:
    image: prom/prometheus:v2.48.0
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./config/prometheus:/etc/prometheus
      - prometheus-data:/prometheus
    ports:
      - "9090:9090"
    networks:
      - monitoring-network

  grafana:
    image: grafana/grafana:10.2.2
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana-data:/var/lib/grafana
    ports:
      - "3000:3000"
    networks:
      - monitoring-network
    depends_on:
      - prometheus

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring-network:
    driver: bridge
```

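The Prometheus service above mounts `./config/prometheus` at `/etc/prometheus`, where the image looks for `prometheus.yml` by default. A minimal sketch of that file that only scrapes Prometheus itself; add jobs for your exporters as you deploy them. In Grafana, the data source URL would then be `http://prometheus:9090`, since both containers share `monitoring-network`.

```yaml
# ./config/prometheus/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
```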
## Troubleshooting

### Common Issues

#### Service Won't Start

**1. Check logs:**
```bash
docker compose -f docker-compose/service.yml logs
```

**2. Validate configuration:**
```bash
docker compose -f docker-compose/service.yml config
```

**3. Check for port conflicts:**
```bash
# See what's using a port (ss replaces netstat on most modern distributions)
sudo ss -tlnp | grep :8080
```

**4. Verify image exists:**
```bash
docker images | grep service-name
```

#### Permission Errors

**1. Check PUID/PGID:**
```bash
# Your user ID
id -u

# Your group ID
id -g
```

**2. Fix directory permissions:**
```bash
# Use the same IDs as PUID/PGID in the compose file
sudo chown -R 1000:1000 ./config/service-name
```

**3. Check volume permissions:**
```bash
docker compose -f docker-compose/service.yml exec service-name ls -la /config
```

#### Network Connectivity Issues

**1. Verify network exists:**
```bash
docker network ls
docker network inspect homelab-network
```

**2. Check if services are on the same network:**
```bash
docker network inspect homelab-network | grep Name
```

**3. Test connectivity:**
```bash
# Requires ping inside the service1 image
docker compose -f docker-compose/service.yml exec service1 ping service2
```

#### Container Keeps Restarting

**1. Watch logs:**
```bash
docker compose -f docker-compose/service.yml logs -f
```

**2. Check health status:**
```bash
docker compose -f docker-compose/service.yml ps
```

**3. Inspect container:**
```bash
docker inspect container-name
```

### Debugging Commands

```bash
# Enter running container
docker compose -f docker-compose/service.yml exec service-name /bin/sh

# View full container configuration
docker inspect container-name

# See resource usage
docker stats container-name

# View recent events (keeps streaming; Ctrl+C to stop)
docker events --since 10m

# Check disk space
docker system df
```

### Recovery Procedures

#### Service Corrupted

```bash
# Stop service
docker compose -f docker-compose/service.yml down

# Remove containers plus the named volumes declared in the file
# (bind mounts are left alone); back up first!
docker compose -f docker-compose/service.yml down -v

# Recreate from scratch
docker compose -f docker-compose/service.yml up -d
```

#### Network Issues

```bash
# Remove and recreate network (stop the containers attached to it first)
docker network rm homelab-network
docker network create homelab-network

# Restart services (-f takes one file per flag, so start each file in turn)
for f in docker-compose/*.yml; do
  docker compose -f "$f" up -d
done
```

#### Full System Reset (Nuclear Option)

```bash
# ⚠️ WARNING: This removes everything!
# Backup first!

# Stop all containers
docker stop $(docker ps -aq)

# Remove all containers
docker rm $(docker ps -aq)

# Remove all volumes (careful!)
docker volume rm $(docker volume ls -q)

# Remove all networks (except defaults)
docker network prune -f

# Rebuild from compose files (one file per -f flag)
for f in docker-compose/*.yml; do
  docker compose -f "$f" up -d
done
```

## Maintenance

### Regular Tasks

**Weekly:**
- Review logs for errors
- Check disk space: `docker system df`
- Check your images for security updates

**Monthly:**
- Update pinned image versions to newer releases
- Review and prune unused resources
- Back up volumes
- Review and optimize compose files

**Quarterly:**
- Full stack review
- Documentation update
- Performance optimization
- Security audit

### Update Procedure

```bash
# 1. Back up current state (the rendered config includes values from .env, so keep it private)
mkdir -p backup
docker compose -f docker-compose/service.yml config > backup/service-config.yml

# 2. Update image version in compose file
#    (edit docker-compose/service.yml)

# 3. Pull new image
docker compose -f docker-compose/service.yml pull

# 4. Recreate service
docker compose -f docker-compose/service.yml up -d

# 5. Verify
docker compose -f docker-compose/service.yml logs -f

# 6. Test functionality
#    (access the service and verify it works)
```

## Conclusion

Following these guidelines ensures:
- Consistent infrastructure
- Easy troubleshooting
- Reproducible deployments
- Maintainable system
- Better security

Remember: **Infrastructure as Code** means treating your Docker Compose files as critical documentation. Keep them clean, commented, and version-controlled.