Files
EZ-Homelab/docs/multi-server-implementation-plan.md
kelinfoxy 5cbb106160 Add multi-server support and update docs
Introduce multi-server architecture documentation and reorganize README content. Top-level README now documents Core vs Remote server roles, links to local docs instead of wiki pages, and highlights Traefik/Sablier multi-server behavior. docker-compose/README.md was rewritten to be a template-style reference with single- and multi-server deployment guidance, Traefik label examples, and sablier usage; dockge README was moved into docker-compose/dockge/. docker-compose/core/README.md was updated to describe core responsibilities, shared CA artifacts, and startup order for multi-server deployments. Several obsolete/duplicated docs and action reports were removed and a new multi-server deployment doc was added to centralize on-demand/remote service guidance. Overall this cleans up legacy docs and documents the multi-server workflow and TLS/shared-CA requirements.
2026-02-05 22:30:52 -05:00

50 KiB

Multi-Server Traefik and Sablier Implementation Plan

⚠️ IMPLEMENTATION STATUS: COMPLETED (with modified approach)

Date: December 2024
Status: The multi-server architecture has been successfully implemented, but with a simplified approach that differs from this original plan.

What Changed:

  • Traefik Multi-Docker Provider: NOT implemented (complexity, Docker API security concerns)
  • Per-Server Traefik: Each server runs its own Traefik for LOCAL discovery only
  • Per-Server Sablier: Each server runs its own Sablier for LOCAL lazy loading
  • Core Routing: Core Traefik uses file provider (YAML) to route to remote services
  • No Docker API TLS: Servers communicate via HTTP/HTTPS, not Docker API

Current Documentation:

This Document: Preserved for historical reference and future consideration. The original plan's centralized Traefik approach may be revisited in the future if label-based remote discovery becomes a priority.


Original Implementation Plan (Historical Reference)

Executive Summary

This document outlines the implementation plan for enabling label-based automatic routing and lazy loading across multiple servers in the EZ-Homelab infrastructure. The goals are to:

  1. Traefik Multi-Server Setup: Enable Traefik on the core server to automatically discover and route to Docker services on remote servers using labels, eliminating manual YAML file maintenance.
  2. Per-Server Sablier Deployment: Deploy Sablier instances on each server to control local lazy loading, eliminating docker-proxy dependencies for Sablier control.

System Constraints

Development/Test Environment: Raspberry Pi 4 (4GB RAM)

  • Critical: Avoid memory-intensive operations that could hang the system
  • Strategy: Use lightweight validation, avoid large file operations in memory, implement timeouts
  • Monitoring: Check process resource usage before long-running operations

Current State Analysis

Working Components (DO NOT MODIFY)

  • Prerequisites installation (Docker, packages)
  • Core stack deployment (DuckDNS, Traefik, Authelia)
  • TLS certificate generation for docker-proxy
  • Variable replacement in labels (localize_compose_labels)
  • Variable replacement in config files (localize_config_file, localize_users_database_file)
  • Image tags and service configurations

Current Architecture

Traefik Configuration

  • Static Config (traefik.yml): Single Docker provider (local socket)
  • Dynamic Config (dynamic/*.yml): Manual YAML files for external hosts
  • Problem: Each non-local service requires manual YAML file creation in /opt/stacks/core/traefik/dynamic/external-host-*.yml

Sablier Configuration

  • Current: Single Sablier instance on core server
  • Remote Control: Uses DOCKER_HOST=tcp://remote-ip:2376 with TLS
  • Problem: Centralized control requires docker-proxy on all servers; single point of failure

TLS Infrastructure

  • Already Working: setup_docker_tls() function generates:
    • CA certificates
    • Server certificates
    • Client certificates
    • Configures Docker daemon for TLS on port 2376

Proposed Architecture

Overview

┌─────────────────────────────────────────────────────────────────┐
│  Router (Ports 80/443 forwarded to Core Server)                 │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────────┐
│  CORE SERVER                                                     │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Traefik (Multiple Docker Providers)                        │ │
│  │  • Local Docker Provider: /var/run/docker.sock            │ │
│  │  • Remote Provider 1: tcp://remote1-ip:2376 (TLS)         │ │
│  │  • Remote Provider 2: tcp://remote2-ip:2376 (TLS)         │ │
│  │  • Auto-discovers all containers with traefik.enable=true │ │
│  └────────────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Sablier (Local Control Only)                               │ │
│  │  • DOCKER_HOST: unix:///var/run/docker.sock               │ │
│  │  • Controls only core server containers                    │ │
│  └────────────────────────────────────────────────────────────┘ │
│  ┌────────────────────────────────────────────────────────────┐ │
│  │ Shared CA: /opt/stacks/core/shared-ca/                     │ │
│  │  • ca.pem, ca-key.pem (distributed to all servers)        │ │
│  └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                           │
                ┌──────────┴──────────┐
                ▼                     ▼
┌──────────────────────────┐  ┌──────────────────────────┐
│  REMOTE SERVER 1         │  │  REMOTE SERVER 2         │
│  ┌────────────────────┐  │  │  ┌────────────────────┐  │
│  │ Docker Daemon      │  │  │  │ Docker Daemon      │  │
│  │ Port: 2376 (TLS)   │  │  │  │ Port: 2376 (TLS)   │  │
│  │ Uses shared CA     │  │  │  │ Uses shared CA     │  │
│  └────────────────────┘  │  │  └────────────────────┘  │
│  ┌────────────────────┐  │  │  ┌────────────────────┐  │
│  │ Sablier (Local)    │  │  │  │ Sablier (Local)    │  │
│  │ Controls local     │  │  │  │ Controls local     │  │
│  │ containers only    │  │  │  │ containers only    │  │
│  └────────────────────┘  │  │  └────────────────────┘  │
└──────────────────────────┘  └──────────────────────────┘

Key Changes

1. Traefik Multi-Provider Configuration

Location: /opt/stacks/core/traefik/config/traefik.yml

Change: Modify providers.docker section to support multiple endpoints

providers:
  docker:
    # Local Docker provider (always present)
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: "traefik-network"
    watch: true
    
  # Additional remote Docker providers (added dynamically)
  docker-remote1:
    endpoint: "tcp://REMOTE1_IP:2376"
    exposedByDefault: false
    network: "traefik-network"
    watch: true
    tls:
      ca: "/certs/ca.pem"
      cert: "/certs/client-cert.pem"
      key: "/certs/client-key.pem"
      insecureSkipVerify: false
      
  # Pattern repeats for each remote server

Note: Traefik v3 supports multiple Docker providers with different names. Each remote server gets its own provider definition.

2. Sablier Per-Server Deployment

Current: One Sablier in core stack with remote docker-proxy
Proposed: Sablier as separate stack, deployed on each server (core, remote1, remote2, etc.)

Sablier Stack (identical on all servers):

# /opt/stacks/sablier/docker-compose.yml
services:
  sablier:
    image: sablierapp/sablier:latest
    container_name: sablier
    restart: unless-stopped
    networks:
      - traefik-network
    environment:
      - SABLIER_PROVIDER=docker
      - DOCKER_HOST=unix:///var/run/docker.sock  # Local only
      - SABLIER_DOCKER_NETWORK=traefik-network
      - SABLIER_LOG_LEVEL=info
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - 10000:10000  # Accessible from core Traefik for middleware

networks:
  traefik-network:
    external: true

Key Change: Sablier moved from core stack to dedicated stack for consistency across servers

Sablier Middleware Labels (per service):

labels:
  - "sablier.enable=true"
  - "sablier.group=${SERVER_HOSTNAME}-servicename"
  - "traefik.http.routers.service.middlewares=sablier-${SERVER_HOSTNAME}-servicename@file,authelia@docker"

Note: Container name changed from sablier-service to sablier for consistency

Dynamic Sablier Middleware Config (per server):

# /opt/stacks/core/traefik/dynamic/sablier-remote1.yml
http:
  middlewares:
    sablier-remote1-service:
      plugin:
        sablier:
          sablierUrl: http://REMOTE1_IP:10000  # Points to remote Sablier
          group: remote1-service
          sessionDuration: 30m

Implementation Plan

Phase 1: Shared CA Infrastructure (Already Complete)

Status: Already implemented in generate_shared_ca() and setup_multi_server_tls()

Components:

  • generate_shared_ca(): Creates /opt/stacks/core/shared-ca/ with ca.pem and ca-key.pem
  • setup_multi_server_tls(): Fetches shared CA from core server via SSH/SCP
  • setup_docker_tls(): Uses shared CA to generate server/client certs

No changes needed - this infrastructure already supports multi-server TLS.


Phase 2: Script Enhancements

2.1 New Functions in scripts/common.sh

Purpose: Shared utilities for multi-server management

# Function: detect_server_role
# Purpose: Determine if this is a core server or remote server
# Logic:
#   - Checks if CORE_SERVER_IP is set in .env
#   - If empty or matches SERVER_IP: this is core
#   - If different: this is remote
# Returns: "core" or "remote"
detect_server_role() {
    local server_ip="${SERVER_IP}"
    local core_ip="${CORE_SERVER_IP:-}"
    
    if [ -z "$core_ip" ] || [ "$core_ip" == "$server_ip" ]; then
        echo "core"
    else
        echo "remote"
    fi
}

# Function: generate_traefik_provider_config
# Purpose: Generate a Traefik Docker provider block for a remote server
# Input: $1 = provider name (e.g., "docker-remote1")
#        $2 = remote server IP
#        $3 = TLS certs directory
# Output: YAML snippet for traefik.yml providers section
# Usage: Called when adding a new remote server to core
generate_traefik_provider_config() {
    local provider_name="$1"
    local remote_ip="$2"
    local tls_dir="$3"
    
    cat <<EOF
  ${provider_name}:
    endpoint: "tcp://${remote_ip}:2376"
    exposedByDefault: false
    network: "traefik-network"
    watch: true
    tls:
      ca: "${tls_dir}/ca.pem"
      cert: "${tls_dir}/client-cert.pem"
      key: "${tls_dir}/client-key.pem"
      insecureSkipVerify: false
EOF
}

# Function: generate_sablier_middleware_config
# Purpose: Generate Sablier middleware config for a remote server
# Input: $1 = server hostname
#        $2 = remote server IP
#        $3 = service name
# Output: YAML file in /opt/stacks/core/traefik/dynamic/
# Usage: Auto-generate middleware configs for remote Sablier instances
generate_sablier_middleware_config() {
    local server_hostname="$1"
    local remote_ip="$2"
    local service_name="$3"
    
    local output_file="/opt/stacks/core/traefik/dynamic/sablier-${server_hostname}-${service_name}.yml"
    
    cat > "$output_file" <<EOF
# Auto-generated Sablier middleware for ${server_hostname}
# Generated: $(date)
http:
  middlewares:
    sablier-${server_hostname}-${service_name}:
      plugin:
        sablier:
          sablierUrl: http://${remote_ip}:10000
          group: ${server_hostname}-${service_name}
          sessionDuration: 30m
          ignoreUserAgent: curl
          dynamic:
            displayName: ${service_name^}  # Capitalize first letter
            theme: ghost
            show-details-by-default: true
EOF
    
    log_success "Generated Sablier middleware: $output_file"
}

# Function: add_remote_server_to_traefik
# Purpose: Add a new remote server to core Traefik configuration
# Input: $1 = remote server hostname
#        $2 = remote server IP
# Process:
#   1. Generate provider config block
#   2. Append to traefik.yml (if not already present)
#   3. Generate Sablier middleware template
#   4. Restart Traefik to apply changes
# Usage: Called on core server when adding a new remote
add_remote_server_to_traefik() {
    local remote_hostname="$1"
    local remote_ip="$2"
    local traefik_config="/opt/stacks/core/traefik/config/traefik.yml"
    local provider_name="docker-${remote_hostname}"
    
    log_info "Adding remote server ${remote_hostname} (${remote_ip}) to Traefik..."
    
    # Check if provider already exists
    if grep -q "  ${provider_name}:" "$traefik_config"; then
        log_warning "Provider ${provider_name} already exists in Traefik config"
        return 0
    fi
    
    # Generate provider config
    local provider_config=$(generate_traefik_provider_config \
        "$provider_name" \
        "$remote_ip" \
        "/certs")
    
    # Append to traefik.yml under providers section
    # Note: Requires careful YAML formatting
    # Find the line with "providers:" and append after the last provider
    awk -v config="$provider_config" '
        /^providers:/ { in_providers=1; print; next }
        in_providers && /^[^ ]/ { print config; in_providers=0 }
        { print }
        END { if (in_providers) print config }
    ' "$traefik_config" > "${traefik_config}.tmp"
    
    mv "${traefik_config}.tmp" "$traefik_config"
    
    log_success "Added provider ${provider_name} to Traefik config"
    
    # Generate base Sablier middleware config
    generate_sablier_middleware_config "$remote_hostname" "$remote_ip" "example"
    
    log_warning "Restart Traefik for changes to take effect: docker compose -f /opt/stacks/core/docker-compose.yml restart traefik"
}

2.2 Enhancements to Existing Functions in scripts/ez-homelab.sh

Purpose: Enhance existing validation and workflow functions

# Function: check_docker_installed (NEW)
# Purpose: Silently check if Docker is installed and running
# Returns: 0 if Docker ready, 1 if not
# Note: Lightweight check to avoid hanging on limited resources (Pi 4 4GB)
check_docker_installed() {
    # Quick check without spawning heavy processes
    if command -v docker >/dev/null 2>&1; then
        # Verify Docker daemon is responsive (with timeout)
        if timeout 3 docker ps >/dev/null 2>&1; then
            return 0
        fi
    fi
    return 1
}

# Modification to REQUIRED_VARS (EXISTING - line 398)
# Purpose: Make REQUIRED_VARS dynamic based on deployment type
# Current: REQUIRED_VARS=("SERVER_IP" "SERVER_HOSTNAME" "DUCKDNS_SUBDOMAINS" "DUCKDNS_TOKEN" "DOMAIN" "DEFAULT_USER" "DEFAULT_PASSWORD" "DEFAULT_EMAIL")
# Enhancement: Add function to set required vars based on deployment choice
set_required_vars_for_deployment() {
    local deployment_type="${1:-core}"
    
    if [ "$deployment_type" == "core" ]; then
        # Core deployment requires DuckDNS and domain variables
        REQUIRED_VARS=("SERVER_IP" "SERVER_HOSTNAME" "DUCKDNS_SUBDOMAINS" "DUCKDNS_TOKEN" "DOMAIN" "DEFAULT_USER" "DEFAULT_PASSWORD" "DEFAULT_EMAIL")
    elif [ "$deployment_type" == "remote" ]; then
        # Remote deployment requires remote server connection variables
        REQUIRED_VARS=("SERVER_IP" "SERVER_HOSTNAME" "REMOTE_SERVER_IP" "REMOTE_SERVER_HOSTNAME" "REMOTE_SERVER_USER" "DEFAULT_USER" "DEFAULT_EMAIL")
    fi
}

# Enhancement to validate_and_prompt_variables() (EXISTING - line 562)
# Purpose: Existing function already validates and prompts for missing variables
# No changes needed - this function already:
#   1. Checks if each var in REQUIRED_VARS is valid
#   2. Prompts user for invalid/missing values
#   3. Loops until all valid
# Simply call set_required_vars_for_deployment() before calling this function

# Function: register_remote_server_with_core
# Purpose: After deploying remote server, register it with core Traefik
# Process:
#   1. Use variables from .env (REMOTE_SERVER_IP, REMOTE_SERVER_USER, etc.)
#   2. Only prompt for missing/invalid values
#   3. SSH to core server and run registration
#   4. Restart Traefik to apply changes
# Usage: Called at end of remote server deployment
# Note: Uses existing .env variables to minimize user interaction
register_remote_server_with_core() {
    log_info "Registering this server with core Traefik..."
    
    # Load variables from .env
    local remote_ip="${REMOTE_SERVER_IP:-}"
    local remote_hostname="${REMOTE_SERVER_HOSTNAME:-}"
    local remote_user="${REMOTE_SERVER_USER:-${ACTUAL_USER}}"
    local remote_password="${REMOTE_SERVER_PASSWORD:-}"
    
    # Validate and prompt only for missing values
    if [ -z "$remote_ip" ] || [[ "$remote_ip" == your.* ]]; then
        read -p "Core server IP address: " remote_ip
        sed -i "s|^REMOTE_SERVER_IP=.*|REMOTE_SERVER_IP=$remote_ip|" "$REPO_DIR/.env"
    fi
    
    if [ -z "$remote_hostname" ] || [[ "$remote_hostname" == your-* ]]; then
        read -p "Core server hostname [$(echo $remote_ip | tr '.' '-')]: " remote_hostname
        remote_hostname=${remote_hostname:-$(echo $remote_ip | tr '.' '-')}
        sed -i "s|^REMOTE_SERVER_HOSTNAME=.*|REMOTE_SERVER_HOSTNAME=$remote_hostname|" "$REPO_DIR/.env"
    fi
    
    if [ -z "$remote_user" ]; then
        read -p "SSH username for core server [${ACTUAL_USER}]: " remote_user
        remote_user=${remote_user:-$ACTUAL_USER}
        sed -i "s|^REMOTE_SERVER_USER=.*|REMOTE_SERVER_USER=$remote_user|" "$REPO_DIR/.env"
    fi
    
    echo ""
    echo "Registering with core server: ${remote_user}@${remote_ip} (${remote_hostname})"
    read -p "Proceed? (y/n) [y]: " register_choice
    register_choice=${register_choice:-y}
    
    if [ "$register_choice" != "y" ]; then
        log_warning "Skipping registration. Manual steps:"
        echo "  1. SSH to core: ssh ${remote_user}@${remote_ip}"
        echo "  2. Run: cd ~/EZ-Homelab && source scripts/common.sh"
        echo "  3. Run: add_remote_server_to_traefik ${SERVER_HOSTNAME} ${SERVER_IP}"
        echo "  4. Restart Traefik: docker compose -f /opt/stacks/core/docker-compose.yml restart traefik"
        return 0
    fi
    
    # Test SSH connection with timeout (Pi resource constraint)
    log_info "Testing SSH connection..."
    if ! timeout 10 ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no \
         "${remote_user}@${remote_ip}" "echo 'SSH OK'" 2>/dev/null; then
        log_error "Cannot connect to core server via SSH"
        log_warning "Check: SSH keys configured, core server reachable, user has access"
        return 1
    fi
    
    # Execute registration on core server (with timeout for Pi)
    log_info "Executing registration on core server..."
    timeout 30 ssh "${remote_user}@${remote_ip}" bash <<EOF
cd ~/EZ-Homelab
source scripts/common.sh
add_remote_server_to_traefik "${SERVER_HOSTNAME}" "${SERVER_IP}"
docker compose -f /opt/stacks/core/docker-compose.yml restart traefik
EOF
    
    if [ $? -eq 0 ]; then
        log_success "Server registered with core Traefik"
        log_success "Services with traefik.enable=true will now auto-route"
    else
        log_error "Registration failed or timed out"
        log_warning "Try manual registration (see above)"
    fi
}

2.3 Modifications to Core Stack Compose File

Changes: Remove Sablier from core stack (moved to separate stack)

Location: docker-compose/core/docker-compose.yml

Action: Delete the sablier-service section entirely from core compose file

Rationale:

  • Sablier will be deployed as a separate stack (/opt/stacks/sablier/)
  • Keeps core stack focused on routing/auth (DuckDNS, Traefik, Authelia)
  • Allows consistent Sablier deployment across core and remote servers
  • No script modifications needed - changes are in repo compose files

2.4 New Sablier Stack

Location: docker-compose/sablier/docker-compose.yml

# Sablier Stack - Lazy Loading Service
# Deploy on ALL servers (core and remote) for local container control
#
# This stack is identical on all servers - no configuration differences needed
# Sablier controls only containers on the local server via Docker socket

services:
  sablier:
    image: sablierapp/sablier:latest
    container_name: sablier
    restart: unless-stopped
    networks:
      - traefik-network
    environment:
      - SABLIER_PROVIDER=docker
      - SABLIER_DOCKER_API_VERSION=1.51
      - SABLIER_DOCKER_NETWORK=traefik-network
      - SABLIER_LOG_LEVEL=info
      # Local Docker socket only - no remote access needed
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "10000:10000"
    labels:
      - "homelab.category=infrastructure"
      - "homelab.description=Lazy loading service for local containers"

networks:
  traefik-network:
    external: true

x-dockge:
  urls:
    - http://${SERVER_IP}:10000

Deployment:

  • Core server: Deployed automatically after core stack
  • Remote servers: Deployed automatically during remote setup

#### 2.5 New Deployment Function for Remote Server

**Purpose**: Deploy remote server infrastructure (Docker TLS + Sablier)

```bash
# Function: deploy_remote_server
# Purpose: Deploy Docker TLS and Sablier on remote servers
# Process:
#   1. Validate required variables using existing mechanism
#   2. Setup Docker TLS using shared CA from core
#   3. Deploy Sablier stack (from repo compose file)
#   4. Configure firewall for port 2376
#   5. Register with core Traefik
# Note: Optimized for Pi 4 - uses timeouts and lightweight operations
deploy_remote_server() {
    log_info "Deploying remote server infrastructure..."
    
    # Set required variables for remote deployment
    set_required_vars_for_deployment "remote"
    
    # Use existing validation and prompting function
    # This will check all variables in REQUIRED_VARS and prompt for missing/invalid ones
    validate_and_prompt_variables
    
    # Save updated configuration
    save_env_file
    
    log_info "  - Docker TLS (port 2376)"
    log_info "  - Sablier stack (lazy loading)"
    echo ""
    
    # Ensure shared CA is fetched from core
    if [ ! -f "/opt/stacks/core/shared-ca/ca.pem" ]; then
        log_info "Fetching shared CA from core server..."
        setup_multi_server_tls
        if [ $? -ne 0 ]; then
            log_error "Failed to fetch shared CA from core server"
            log_warning "Ensure core server is accessible and has shared CA"
            return 1
        fi
    fi
    
    # Setup Docker TLS with shared CA
    setup_docker_tls
    
    # Deploy Sablier stack (using repo compose file, not generated)
    log_info "Deploying Sablier stack..."
    sudo mkdir -p /opt/stacks/sablier
    sudo chown "$ACTUAL_USER:$ACTUAL_USER" /opt/stacks/sablier
    
    # Copy compose file from repo
    sudo cp "$REPO_DIR/docker-compose/sablier/docker-compose.yml" /opt/stacks/sablier/docker-compose.yml
    sudo cp "$REPO_DIR/.env" /opt/stacks/sablier/.env
    sudo chown "$ACTUAL_USER:$ACTUAL_USER" /opt/stacks/sablier/docker-compose.yml
    sudo chown "$ACTUAL_USER:$ACTUAL_USER" /opt/stacks/sablier/.env
    
    # Remove sensitive variables from Sablier .env
    sed -i '/^AUTHELIA_/d' /opt/stacks/sablier/.env
    
    # Deploy (with timeout for Pi resource constraint)
    cd /opt/stacks/sablier
    timeout 60 docker compose up -d || {
        log_error "Sablier deployment timed out or failed"
        return 1
    }
    
    log_success "Sablier stack deployed"
    
    # Configure firewall for Docker API access
    if command -v ufw >/dev/null 2>&1; then
        log_info "Configuring firewall for Docker API access..."
        sudo ufw allow from "${CORE_SERVER_IP}" to any port 2376 proto tcp
        log_success "Firewall configured"
    fi
    
    # Register with core Traefik
    register_remote_server_with_core
    
    echo ""
    log_success "Remote server setup complete!"
    echo ""
    echo "Next steps:"
    echo "  1. Deploy additional stacks (media, productivity, etc.)"
    echo "  2. Use Traefik labels as usual - routing is automatic"
    echo "  3. Use Sablier labels for lazy loading local services"
    echo ""
}

2.6 Enhanced Main Workflow in scripts/ez-homelab.sh

Changes: Keep existing menu, add smart pre-checks and use existing validation

# Main workflow enhancement - called at script start
main() {
    # ... existing args parsing ...
    
    # STEP 1: Silent Docker check (lightweight for Pi)
    if ! check_docker_installed; then
        # Docker not installed - only show prerequisites option
        echo ""
        echo "=================================================="
        echo "  EZ-Homelab Setup"
        echo "=================================================="
        echo ""
        echo "Docker is not installed or not running."
        echo ""
        echo "  1) Install Prerequisites (Docker, packages)"
        echo "  2) Exit"
        echo ""
        read -p "Selection: " pre_choice
        
        case $pre_choice in
            1)
                prepare_deployment
                # After install, restart script to show full menu
                exec "$0" "$@"
                ;;
            2)
                exit 0
                ;;
            *)
                log_error "Invalid selection"
                exit 1
                ;;
        esac
    fi
    
    # STEP 2: Docker installed - show existing main menu
    show_main_menu
    
    read -p "Selection: " DEPLOY_CHOICE
    
    case $DEPLOY_CHOICE in
        1)
            # Install prerequisites (if needed)
            ;;
        2)
            # Deploy Core
            # Set required variables for core deployment
            set_required_vars_for_deployment "core"
            # Use existing validation function - prompts for missing/invalid vars
            validate_and_prompt_variables
            # Save configuration
            save_env_file
            DEPLOY_CORE=true
            ;;
        3)
            # Deploy Additional Server (Remote)
            # Set required variables for remote deployment
            set_required_vars_for_deployment "remote"
            # Use existing validation function - prompts for all required remote vars
            validate_and_prompt_variables
            # Save configuration
            save_env_file
            # Deploy remote infrastructure
            deploy_remote_server
            ;;
        # ... rest of existing menu options ...
    esac
    
    # ... rest of existing main logic ...
}

**Key Points**:
- **Reuses existing code**: `validate_and_prompt_variables()` already handles validation and prompting
- **Dynamic requirements**: `set_required_vars_for_deployment()` adjusts REQUIRED_VARS based on deployment type
- **No manual .env editing**: Script always prompts for missing/invalid variables
- **Existing logic preserved**: No changes to prompt_for_variable() or validation logic

Key Changes:

  • Pre-check: Silent Docker check before showing menu
  • Minimal Menu: If no Docker, only show "Install Prerequisites"
  • Keep Existing: Main menu unchanged (user's requirement)
  • Smart Validation: Check .env before core/remote deployments
  • Existing Prompts: Reuse validate_and_prompt_variables function
  • Pi-Friendly: All checks use timeouts and lightweight operations

---

### Phase 3: Configuration Files

#### 3.1 Updated `docker-compose/core/traefik/traefik.yml`

**Changes**: Add commented template for remote providers

**Note**: The `config-templates/` folder is deprecated. All working configs are in `docker-compose/` folder.

```yaml
# Traefik Static Configuration
# Source: docker-compose/core/traefik/traefik.yml

experimental:
  plugins:
    sablier:
      moduleName: github.com/sablierapp/sablier-traefik-plugin
      version: v1.1.0

providers:
  # Local Docker provider (always enabled)
  docker:
    endpoint: "unix:///var/run/docker.sock"
    exposedByDefault: false
    network: "traefik-network"
    watch: true
  
  # REMOTE DOCKER PROVIDERS
  # Uncomment and configure for each remote server in your homelab
  # These are auto-added by the add_remote_server_to_traefik function
  # when deploying remote servers
  
  # Example remote provider (auto-generated):
  # docker-remote1:
  #   endpoint: "tcp://192.168.1.100:2376"
  #   exposedByDefault: false
  #   network: "traefik-network"
  #   watch: true
  #   tls:
  #     ca: "/certs/ca.pem"
  #     cert: "/certs/client-cert.pem"
  #     key: "/certs/client-key.pem"
  #     insecureSkipVerify: false

  # File provider for dynamic configuration
  file:
    directory: /dynamic
    watch: true

entryPoints:
  web:
    address: ":80"
  websecure:
    address: ":443"
  traefik:
    address: ":8080"

certificatesResolvers:
  letsencrypt:
    acme:
      dnsChallenge:
        provider: duckdns
      email: ${DEFAULT_EMAIL}
      storage: /letsencrypt/acme.json

log:
  level: DEBUG

accessLog:
  format: json

api:
  dashboard: true
  insecure: true

ping:
  manualRouting: true

3.2 Updated Core docker-compose.yml

Changes: Remove Sablier section entirely (moved to separate stack)

Action: Delete the entire sablier-service section from docker-compose/core/docker-compose.yml

Rationale:

  • Sablier is now a separate, reusable stack
  • Core stack focuses on: DuckDNS, Traefik, Authelia only
  • Sablier deployed separately on all servers (core and remote)

3.3 New Sablier Stack Template

Location: docker-compose/sablier/docker-compose.yml

# Sablier Stack - Lazy Loading Service
# Deploy on ALL servers (core and remote) for local container control
#
# This stack is identical on all servers - no configuration differences needed
# Sablier controls only containers on the local server via Docker socket
#
# Deployment:
#   - Core server: Deployed automatically after core stack
#   - Remote servers: Deployed automatically during remote setup

services:
  sablier:
    image: sablierapp/sablier:latest
    container_name: sablier
    restart: unless-stopped
    networks:
      - traefik-network
    environment:
      - SABLIER_PROVIDER=docker
      - SABLIER_DOCKER_API_VERSION=1.51
      - SABLIER_DOCKER_NETWORK=traefik-network
      - SABLIER_LOG_LEVEL=info
      # Local Docker socket only - no remote Docker access needed
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - "10000:10000"
    labels:
      - "homelab.category=infrastructure"
      - "homelab.description=Lazy loading service for local containers"
      # No Traefik routing labels - accessed directly by Traefik plugin

networks:
  traefik-network:
    external: true

x-dockge:
  urls:
    - http://${SERVER_IP}:10000

Key Change: Container name sablier (not sablier-service) for consistency


Phase 4: Deployment Workflow

4.1 Core Server Setup (First Server)

User Actions:

# 1. Clone repository and run setup
cd ~/EZ-Homelab
./scripts/ez-homelab.sh

# 2. If Docker not installed:
#    - Script shows limited menu: "1) Install Prerequisites"
#    - Select option 1, script installs Docker
#    - Script restarts and shows full menu

# 3. Select "2) Deploy Core"
#    - Script validates .env
#    - If invalid/missing: runs configuration wizard
#    - Deploys: DuckDNS, Traefik, Authelia
#    - Generates shared CA in /opt/stacks/core/shared-ca/
#    - Automatically deploys Sablier stack

# 4. Select "3) Infrastructure" (optional)
#    - Dockge, Portainer, etc.

# 5. Deploy other stacks as needed

Script Behavior:

  • Pre-check: Silently checks Docker (lightweight, Pi-safe)
  • If no Docker: Shows only "Install Prerequisites" option
  • If Docker present: Shows full existing menu (unchanged)
  • Option 2 (Deploy Core):
    • Validates .env for core variables
    • Runs existing prompt function if needed
    • Deploys core stack (DuckDNS, Traefik, Authelia)
    • Automatically deploys Sablier stack after core
    • Role detection: REMOTE_SERVER_IP empty or equals SERVER_IP → Core server

4.2 Remote Server Setup (Additional Servers)

User Actions:

# 1. Clone repository and run setup on remote server
cd ~/EZ-Homelab
./scripts/ez-homelab.sh

# 2. If Docker not installed:
#    - Script shows limited menu: "1) Install Prerequisites"
#    - Select option 1, script installs Docker
#    - Script restarts and shows full menu

# 3. Select "3) Deploy Additional Server"
#    - Script prompts for all required variables:
#      * SERVER_IP (this server's IP)
#      * SERVER_HOSTNAME (this server's hostname)
#      * REMOTE_SERVER_IP (core server IP)
#      * REMOTE_SERVER_HOSTNAME (core server hostname)
#      * REMOTE_SERVER_USER (SSH user on core)
#      * DEFAULT_USER (default user)
#      * DEFAULT_EMAIL (default email)
#    - Script validates each value
#    - Saves to .env file

# Note: Editing .env manually is optional - script will prompt for everything

# 4. After configuration:
#    - Script validates .env for remote variables
#    - If missing/invalid: prompts only for those variables
#    - Fetches shared CA from core server via SSH (with timeout)
#    - Configures Docker TLS (port 2376)
#    - Deploys Sablier stack (local control)
#    - Prompts to register with core Traefik

# 5. Registration (semi-automatic):
#    - Uses REMOTE_SERVER_* variables from .env
#    - Only prompts for missing values
#    - SSH to core server (with 10s timeout - Pi-safe)
#    - Runs add_remote_server_to_traefik on core
#    - Restarts Traefik on core

# 6. Deploy additional stacks via Dockge
#    - Use standard Traefik labels
#    - Services auto-route through core

Script Behavior:

  • Pre-check: Docker installed (same as core)
  • Option 3 (Deploy Additional Server):
    • Sets REQUIRED_VARS to include remote server variables:
      • SERVER_IP, SERVER_HOSTNAME (this server)
      • REMOTE_SERVER_IP, REMOTE_SERVER_HOSTNAME, REMOTE_SERVER_USER (core server)
      • DEFAULT_USER, DEFAULT_EMAIL
    • Calls existing validate_and_prompt_variables() function
    • Prompts for ALL variables in REQUIRED_VARS (validates and prompts if missing/invalid)
    • Saves complete configuration to .env
    • Fetches shared CA via SSH (timeout 30s for Pi)
    • Configures Docker TLS with shared CA
    • Deploys Sablier stack from repo compose file
    • Registration: uses saved .env variables
    • All operations have timeouts (Pi resource constraint)

4.3 Label-Based Service Deployment

Example Service on Remote Server:

services:
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    networks:
      - traefik-network
    environment:
      - PUID=1000
      - PGID=1000
    volumes:
      - ./sonarr/config:/config
      - /mnt/media:/media
    labels:
      # TRAEFIK ROUTING (automatic via core Traefik)
      - "traefik.enable=true"
      - "traefik.http.routers.sonarr.rule=Host(`sonarr.${DOMAIN}`)"
      - "traefik.http.routers.sonarr.entrypoints=websecure"
      - "traefik.http.routers.sonarr.tls.certresolver=letsencrypt"
      - "traefik.http.routers.sonarr.middlewares=authelia@docker"
      - "traefik.http.services.sonarr.loadbalancer.server.port=8989"
      
      # SABLIER LAZY LOADING (local Sablier control)
      - "sablier.enable=true"
      - "sablier.group=${SERVER_HOSTNAME}-arr"
      # Middleware points to remote Sablier via dynamic config
      # Generated by add_remote_server_to_traefik function

networks:
  traefik-network:
    external: true

How it works:

  1. Container starts on remote server (e.g., remote1)
  2. Traefik on core server discovers it via docker-remote1 provider
  3. Traefik reads labels and creates router: sonarr.domain.comhttp://remote1-ip:8989
  4. SSL certificate issued by core Traefik (Let's Encrypt)
  5. Sablier middleware points to http://remote1-ip:10000 for lazy loading
  6. User accesses sonarr.domain.com → Routes through core → Reaches remote container

No manual YAML files required - everything is label-driven!


Phase 5: Testing Strategy

5.1 Unit Tests

Test 1: Server Role Detection

# Test: Core server detection
export SERVER_IP="192.168.1.10"
export CORE_SERVER_IP=""
result=$(detect_server_role)
assert_equals "$result" "core"

# Test: Remote server detection
export CORE_SERVER_IP="192.168.1.10"
export SERVER_IP="192.168.1.20"
result=$(detect_server_role)
assert_equals "$result" "remote"

Test 2: Traefik Provider Generation

# Test: Provider config generation
config=$(generate_traefik_provider_config "docker-remote1" "192.168.1.20" "/certs")
assert_contains "$config" "docker-remote1:"
assert_contains "$config" "tcp://192.168.1.20:2376"

5.2 Integration Tests

Test 1: Core Server Deployment

  1. Deploy core stack on fresh Debian VM
  2. Verify Traefik has single local Docker provider
  3. Verify Sablier uses local Docker socket
  4. Check shared CA exists: /opt/stacks/core/shared-ca/ca.pem

Test 2: Remote Server Deployment

  1. Deploy remote infrastructure on second VM
  2. Verify Docker TLS listening on port 2376
  3. Verify Sablier uses local Docker socket
  4. Verify registration added provider to core Traefik
  5. Test: docker -H tcp://remote-ip:2376 --tlsverify ps from core

Test 3: Label-Based Routing

  1. Deploy test service on remote with Traefik labels
  2. Verify Traefik discovers service (check dashboard)
  3. Verify DNS resolves: nslookup test.domain.com
  4. Verify HTTPS access: curl https://test.domain.com
  5. Check certificate issued by core Traefik

Test 4: Lazy Loading

  1. Deploy service with Sablier labels on remote
  2. Stop service manually
  3. Access via browser
  4. Verify Sablier loading page appears
  5. Verify service starts within 30 seconds
  6. Verify service accessible after start

5.3 Rollback Testing

Test: Manual Provider Removal

# Remove provider from traefik.yml
sed -i '/docker-remote1:/,/insecureSkipVerify: false/d' \
    /opt/stacks/core/traefik/config/traefik.yml

# Restart Traefik
docker compose -f /opt/stacks/core/docker-compose.yml restart traefik

# Verify remote services no longer accessible
curl -I https://remote-service.domain.com  # Should fail

Migration Path

For Existing Deployments

Scenario: User has existing EZ-Homelab with manual external host YAML files

Step 1: Verify Automatic Backups

Important: The deploy scripts should automatically backup configurations before making changes.

Current Backup Location Check:

# Verify deploy-core.sh backs up from /opt/stacks/core/ (correct)
# NOT from ~/EZ-Homelab/docker-compose/core/ (incorrect - repo source files)

# Expected backup in deploy_core():
# cp -r /opt/stacks/core/traefik /opt/stacks/core/traefik.backup.$(date +%Y%m%d-%H%M%S)

Action Required: Review scripts/ez-homelab.sh deploy functions to ensure:

  • Backups are created from /opt/stacks/*/ (deployed location)
  • NOT from ~/EZ-Homelab/docker-compose/*/ (repo source)
  • Timestamped backups for rollback capability

If Backups Not Working:

# Manual backup as safety measure
cp -r /opt/stacks/core/traefik /opt/stacks/core/traefik.backup.$(date +%Y%m%d-%H%M%S)

Step 2: Update Core Server

# Pull latest repository changes
cd ~/EZ-Homelab
git pull

# Important: Traefik dynamic folder files need replacement
# Old method files (external-host-*.yml) will be replaced with:
#   - Updated sablier.yml (with per-server middleware configs)
#   - Auto-generated provider-specific configs

# Run deployment to update configs
./scripts/ez-homelab.sh
# Select: 2) Deploy Core (will update all configs)

Step 3: Configure Remote Servers

For each remote server that currently has manual YAML entries:

# 1. SSH to remote server
ssh remote-server

# 2. Setup Docker TLS
./scripts/ez-homelab.sh
# Select: Install Prerequisites → Configure Docker TLS

# 3. Deploy remote infrastructure
./scripts/ez-homelab.sh
# Select: Deploy Remote Infrastructure

# 4. Verify registration
# Check core Traefik logs for provider discovery

Step 4: Migrate Services

For each service currently using external YAML:

# 1. Add Traefik labels to docker-compose.yml on remote
labels:
  - "traefik.enable=true"
  - "traefik.http.routers.service.rule=Host(`service.${DOMAIN}`)"
  # ... other labels

# 2. Redeploy service
docker compose up -d

# 3. Verify Traefik discovers service
# Check Traefik dashboard

# 4. Remove manual YAML file
rm /opt/stacks/core/traefik/dynamic/external-host-service.yml

# 5. Restart Traefik
docker compose -f /opt/stacks/core/docker-compose.yml restart traefik

Step 5: Cleanup and Verification

# Note: Old external-host-*.yml files are replaced during core deployment
# The deploy script should:
#   1. Backup existing /opt/stacks/core/traefik/dynamic/
#   2. Copy new config files from repo: docker-compose/core/traefik/dynamic/
#   3. Process variables in new configs

# Verify new config structure:
ls -la /opt/stacks/core/traefik/dynamic/
# Expected files:
#   - sablier.yml (updated with new middleware format)
#   - Any auto-generated provider configs

# Old backups available at:
# /opt/stacks/core/traefik.backup.TIMESTAMP/

# Restart Traefik to apply changes
docker compose -f /opt/stacks/core/docker-compose.yml restart traefik

# Verify Traefik discovers local services
docker logs traefik | grep -i "provider.docker"

Documentation Updates

Files to Update

  1. AGENT_INSTRUCTIONS.md

    • Update "Service Creation with Traefik on a different server Template" section
    • Remove manual YAML instructions
    • Add "Multi-Server Label-Based Routing" section
  2. docs/getting-started.md

    • Add "Multi-Server Setup" section
    • Document core vs remote server deployment
  3. docs/proxying-external-hosts.md

    • Update title: "Multi-Server Docker Services"
    • Remove manual YAML method (legacy section)
    • Document label-based routing
  4. README.md

    • Update architecture diagram
    • Add multi-server features to feature list
  5. New Documentation

    • Create: docs/multi-server-setup.md
    • Create: docs/troubleshooting-multi-server.md

Risk Assessment and Mitigation

Risk 1: Breaking Existing Deployments

Impact: High
Probability: Medium
Mitigation:

  • Maintain backward compatibility
  • Keep manual YAML method functional
  • Provide migration script with rollback
  • Test on fresh VMs before release

Risk 2: TLS Certificate Issues

Impact: High (Docker API inaccessible)
Probability: Low (already tested)
Mitigation:

  • Shared CA already working
  • Detailed error messages in scripts
  • Fallback to local CA if core unavailable
  • Documentation for manual cert generation

Risk 3: Network Connectivity Issues

Impact: Medium
Probability: Medium
Mitigation:

  • Test SSH connectivity before operations
  • Provide manual registration steps
  • Firewall configuration warnings
  • Network troubleshooting guide

Risk 4: Traefik Configuration Corruption

Impact: High (all routing broken)
Probability: Low
Mitigation:

  • Backup traefik.yml before modifications
  • Validate YAML syntax before applying
  • Atomic file operations (write to .tmp, then mv)
  • Automatic rollback on validation failure

Risk 5: Sablier Cross-Server Confusion

Impact: Low (services don't wake up)
Probability: Medium
Mitigation:

  • Clear naming: sablier-${SERVER_HOSTNAME}-service
  • Separate middleware configs per server
  • Documentation on group naming conventions
  • Validation function to check Sablier reachability

Success Criteria

Phase 1: Core Functionality

  • Core server deploys with local Traefik + Sablier
  • Remote server deploys with Docker TLS + local Sablier
  • Remote server auto-registers with core Traefik
  • Service with Traefik labels on remote auto-routes
  • Service with Sablier labels on remote lazy loads

Phase 2: User Experience

  • Zero manual YAML editing for standard services
  • One-time registration per remote server
  • Clear error messages and recovery steps
  • Migration path for existing deployments

Phase 3: Documentation

  • Updated AGENT_INSTRUCTIONS.md
  • Multi-server setup guide
  • Troubleshooting documentation
  • Example service configurations

Phase 4: Testing

  • Fresh deployment on 2-server lab passes all tests
  • Migration from single-server passes all tests
  • Rollback procedures validated
  • No regression in existing functionality

Implementation Timeline

Week 1: Core Functions

  • Day 1-2: Implement common.sh functions
  • Day 3-4: Update ez-homelab.sh with role detection
  • Day 5: Test core and remote deployments separately

Week 2: Integration

  • Day 1-2: Implement registration workflow
  • Day 3: Update config templates
  • Day 4-5: Integration testing (2-server setup)

Week 3: Documentation and Polish

  • Day 1-2: Update all documentation
  • Day 3: Create migration guide
  • Day 4: User acceptance testing
  • Day 5: Bug fixes and refinement

Week 4: Release

  • Day 1-2: Final testing on fresh VMs
  • Day 3: Create release notesfrom .env | | generate_traefik_provider_config() | common.sh | Generate provider YAML | | generate_sablier_middleware_config() | common.sh | Generate middleware YAML | | add_remote_server_to_traefik() | common.sh | Register remote with core | | check_docker_installed() | ez-homelab.sh | Check Docker (with timeout for Pi) | | validate_env_file() | ez-homelab.sh | Validate .env for deployment type | | register_remote_server_with_core() | ez-homelab.sh | SSH registration (uses .env vars) | | deploy_remote_server() | ez-homelab.sh | Deploy remote server (TLS + Sablier)

A. Function Reference

New Functions:

Function File Purpose
detect_server_role() common.sh Determine core vs remote
generate_traefik_provider_config() common.sh Generate provider YAML
generate_sablier_middleware_config() common.sh Generate middleware YAML
add_remote_server_to_traefik() common.sh Register remote with core
prompt_for_server_role() ez-homelab.sh Interactive role selection
register_remote_server_with_core() ez-homelab.sh SSH registration workflow
deploy_remote_infrastructure() ez-homelab.sh Deploy remote stack

B. File Locations Reference

File Purpose Server
/opt/stacks/core/shared-ca/ca.pem Shared CA certificate Core
/opt/stacks/core/traefik/config/traefik.yml Traefik static config Core
/opt/stacks/core/traefik/dynamic/sablier-*.yml Sablier middlewares Core
/home/user/EZ-Homelab/docker-tls/ TLS certs for Docker API All
/opt/stacks/sablier/ Sablier stack All servers

C. Port Reference

Port Service Direction
80 Traefik HTTP Inbound (core)
443 Traefik HTTPS Inbound (core)
2376 Docker API (TLS) Core → Remote
10000 Sablier API Traefik → Sablier
9091 Authelia Internal (core)

Conclusion

This implementation plan provides a comprehensive roadmap for achieving label-based automatic routing and lazy loading across multiple servers. The approach maintains backward compatibility, leverages existing working infrastructure (TLS setup), and provides clear migration paths for existing deployments.

Key Benefits:

  • Zero manual YAML editing for standard services
  • Scalable to unlimited remote servers
  • Decentralized Sablier (no single point of failure)
  • Minimal core server changes (one-time per remote)
  • Full label-driven automation
  • Pi-friendly: Timeouts and lightweight checks prevent system hangs
  • Reuses existing menu structure and prompting logic
  • .env-driven: Minimal user interaction via smart defaults

Next Steps:

  1. Review and approve this plan
  2. Begin Phase 1 implementation (core functions)
  3. Test on development VMs
  4. Iterate based on testing results
  5. Document and release

Summary of Key Changes from Initial Plan

Simplified Workflow

  1. Removed prompt_for_server_role(): Uses existing menu options 2 & 3 instead
  2. Sablier Stack Separation: Moved from core to dedicated stack for consistency
  3. Container Naming: sablier (not sablier-service) across all deployments
  4. Compose Changes in Repo: No script overrides, changes made to source files

Enhanced User Experience

  1. Docker Pre-Check: Silent check before menu display
  2. Smart Menu: Limited menu if Docker missing, full menu if present
  3. .env Validation: Check required variables before each deployment type
  4. Minimal Prompting: Only ask for missing/invalid values
  5. Use .env Variables: REMOTE_SERVER_IP, REMOTE_SERVER_HOSTNAME, etc.

Pi 4 Optimizations

  1. Timeouts: All network operations (SSH, Docker, SCP) have timeouts
  2. Lightweight Checks: Avoid memory-heavy operations
  3. Efficient Validation: Use grep instead of loading entire files
  4. Process Monitoring: Prevent hanging processes on resource-constrained hardware

Implementation Priority

  1. Keep existing working code (prerequisites, core deployment)
  2. Enhance with validation and smart checks
  3. Add remote deployment as new option 3
  4. Minimal changes to existing flows

Plan Version: 2.0 (Revised)
Date: February 4, 2026
*Updated: Based on user feedback and Pi 4 constraints

Plan Version: 1.0
Date: February 4, 2026
Author: GitHub Copilot + User