- Add pre-flight validation checks (internet, disk space, Docker availability) - Fix Authelia password hash extraction (handle 'Digest:' prefix format) - Improve credential flow between setup and deploy scripts - Save plain password for user reference in ADMIN_PASSWORD.txt - Add cleanup for directory/file conflicts on re-runs - Add automatic Authelia database cleanup for encryption key mismatches - Add error recovery guidance with cleanup trap - Display credentials prominently after deployment - Update step numbering (now 10 steps with pre-flight) - Update documentation to Round 6 Tested on fresh Debian 12 installation - both scripts now complete successfully.
14 KiB
Round 6 Testing - Agent Instructions for Script Optimization
Mission Context
Test and optimize the AI-Homelab deployment scripts (setup-homelab.sh and deploy-homelab.sh) on a test server environment. Focus on improving repository code quality and reliability, not test server infrastructure.
Current Status - Round 6
- Test Environment: Fresh Debian installation
- Testing Complete: ✅ Both scripts now working successfully
- Issues Fixed:
- Password hash extraction from Authelia output (added proper parsing)
- Credential flow between setup and deploy scripts (saved plain password)
- Directory vs file conflicts on re-runs (added cleanup logic)
- Authelia database encryption key mismatches (added auto-cleanup)
- Password display for user reference (now shows and saves password)
- Previous Rounds: 1-5 focused on permission handling, deployment order, and stability
Primary Objectives
1. Script Reliability Analysis
- Identify why setup-homelab.sh failed during Authelia password hash generation
- Review error handling and fallback mechanisms
- Check Docker availability before container-based operations
- Validate all external dependencies (curl, openssl, docker)
2. Repository Improvements (Priority Focus)
- Enhance error messages with actionable debugging steps
- Add pre-flight validation checks before critical operations
- Improve script idempotency (safe to run multiple times)
- Add progress indicators for long-running operations
- Document assumptions and prerequisites more clearly
3. Testing Methodology
- Create test checklist for each script function
- Document expected vs. actual behavior at failure point
- Test edge cases (partial runs, re-runs, missing dependencies)
- Verify scripts work on truly fresh Debian installation
Critical Operating Principles
DO Focus On (Repository Optimization)
✅ Script robustness: Better error handling, validation, recovery ✅ User experience: Clear messages, progress indicators, helpful errors ✅ Documentation: Update docs to match actual script behavior ✅ Idempotency: Scripts can be safely re-run without breaking state ✅ Dependency checks: Validate requirements before attempting operations ✅ Rollback capability: Provide recovery steps when operations fail
DON'T Focus On (Test Server Infrastructure)
❌ Test server setup: Don't modify the test server's base configuration ❌ System packages: Test scripts should work with standard Debian ❌ Hardware setup: Focus on software reliability, not hardware optimization ❌ Performance tuning: Prioritize reliability over speed
Specific Issues to Address (Round 6)
Issue 1: Authelia Password Hash Generation Failure
Problem: Script failed while generating password hash using Docker container
Investigation Steps:
- Check if Docker daemon is running before attempting container operations
- Verify Docker image pull capability (network connectivity)
- Add timeout handling for Docker run operations
- Provide fallback mechanism if Docker-based hash generation fails
Proposed Solutions:
# Option 1: Pre-validate Docker availability
if ! docker info &> /dev/null 2>&1; then
log_error "Docker is not available for password hash generation"
log_info "This requires Docker to be running. Please ensure:"
log_info " 1. Docker daemon is started: sudo systemctl start docker"
log_info " 2. User can access Docker: docker ps"
exit 1
fi
# Option 2: Add timeout and error handling
log_info "Generating password hash (this may take a moment)..."
if ! PASSWORD_HASH=$(timeout 30 docker run --rm authelia/authelia:4.37 authelia crypto hash generate argon2 --password "$ADMIN_PASSWORD" 2>&1 | grep '^\$argon2'); then
log_error "Failed to generate password hash"
log_info "Error output:"
echo "$PASSWORD_HASH"
log_info ""
log_info "Troubleshooting steps:"
log_info " 1. Check Docker is running: docker ps"
log_info " 2. Test image pull: docker pull authelia/authelia:4.37"
log_info " 3. Check network connectivity"
exit 1
fi
# Option 3: Provide manual hash generation instructions
if [ -z "$PASSWORD_HASH" ]; then
log_error "Automated hash generation failed"
log_info ""
log_info "You can generate the hash manually after setup:"
log_info " docker run --rm authelia/authelia:4.37 authelia crypto hash generate argon2"
log_info " Then edit: /opt/stacks/core/authelia/users_database.yml"
log_info ""
read -p "Continue without setting admin password now? (y/N): " -n 1 -r
echo ""
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
exit 1
fi
# Set placeholder that requires manual configuration
PASSWORD_HASH="\$argon2id\$v=19\$m=65536,t=3,p=4\$CHANGEME"
fi
Issue 2: Script Exit Codes and Error Recovery
Problem: Script exits on first error without cleanup or recovery guidance
Improvements:
# Add at script beginning
set -e # Exit on error
set -u # Exit on undefined variable
set -o pipefail # Exit on pipe failures
# Add trap for cleanup on error
cleanup_on_error() {
local exit_code=$?
log_error "Script failed with exit code: $exit_code"
log_info ""
log_info "Partial setup may have occurred. To resume:"
log_info " 1. Review error messages above"
log_info " 2. Fix the issue if possible"
log_info " 3. Re-run: sudo ./setup-homelab.sh"
log_info ""
log_info "The script is designed to be idempotent (safe to re-run)"
}
trap cleanup_on_error ERR
Issue 3: Pre-flight Validation
Add comprehensive checks before starting:
# New Step 0: Pre-flight validation
log_info "Step 0/9: Running pre-flight checks..."
# Check internet connectivity
if ! ping -c 1 8.8.8.8 &> /dev/null && ! ping -c 1 1.1.1.1 &> /dev/null; then
log_error "No internet connectivity detected"
log_info "Internet access is required for:"
log_info " - Installing packages"
log_info " - Downloading Docker images"
log_info " - Accessing Docker Hub"
exit 1
fi
# Check DNS resolution
if ! nslookup google.com &> /dev/null; then
log_warning "DNS resolution may be impaired"
fi
# Check disk space
AVAILABLE_SPACE=$(df / | tail -1 | awk '{print $4}')
REQUIRED_SPACE=5000000 # 5GB in KB
if [ "$AVAILABLE_SPACE" -lt "$REQUIRED_SPACE" ]; then
log_error "Insufficient disk space"
log_info "Available: $(($AVAILABLE_SPACE / 1024))MB"
log_info "Required: $(($REQUIRED_SPACE / 1024))MB"
exit 1
fi
log_success "Pre-flight checks passed"
echo ""
Testing Checklist for Round 6
setup-homelab.sh Testing
- Clean run: Fresh Debian 12 system with no prior setup
- Re-run safety: Run script twice, verify idempotency
- Interrupted run: Stop script mid-execution, verify recovery
- Network failure: Simulate network issues during package installation
- Docker unavailable: Test behavior when Docker daemon isn't running
- User input validation: Test invalid inputs (weak passwords, invalid emails)
- NVIDIA skip: Test skipping GPU driver installation
- NVIDIA install: Test GPU driver installation path (if hardware available)
deploy-homelab.sh Testing
- Missing .env: Test behavior when .env file doesn't exist
- Invalid .env: Test with incomplete or malformed .env values
- Port conflicts: Test when ports 80/443 are already in use
- Network issues: Simulate Docker network creation failures
- Core stack failure: Test recovery if core services fail to start
- Certificate generation: Monitor Let's Encrypt certificate acquisition
- Re-deployment: Test redeploying over existing infrastructure
- Partial deployment: Stop mid-deployment, verify recovery
Documentation Updates Required
1. Update docs/getting-started.md
- Document actual script behavior observed in testing
- Add troubleshooting section for common failure scenarios
- Include recovery procedures for partial deployments
- Update Round 4 → Round 6 references
2. Update AGENT_INSTRUCTIONS.md
- Change "Round 4" references to current round
- Add learnings from Round 5 and 6 testing
- Document new error handling patterns
- Update success criteria
3. Create Testing Guide
- Document test environment setup
- Provide test case templates
- Include expected vs actual result tracking
- Create test automation scripts if beneficial
Agent Action Workflow
Phase 1: Analysis (30 minutes)
- Review last terminal output to identify exact failure point
- Read relevant script sections around the failure
- Check for similar issues in previous rounds (git history)
- Document root cause and contributing factors
Phase 2: Solution Design (20 minutes)
- Propose specific code changes to address issues
- Design error handling and recovery mechanisms
- Create validation checks to prevent recurrence
- Draft user-facing error messages
Phase 3: Implementation (40 minutes)
- Apply changes to scripts using multi_replace_string_in_file
- Update related documentation
- Add comments explaining complex error handling
- Update .env.example if new variables needed
Phase 4: Validation (30 minutes)
- Review script syntax:
bash -n scripts/*.sh - Check for common shell script issues (shellcheck if available)
- Verify all error paths have recovery guidance
- Test modified sections in isolation if possible
Phase 5: Documentation (20 minutes)
- Update CHANGELOG or version notes
- Document breaking changes if any
- Update README if testing procedure changed
- Create/update this round's testing summary
Success Criteria for Round 6
Must Have
- ✅ setup-homelab.sh completes successfully on fresh Debian 12
- ✅ Script provides clear error messages when failures occur
- ✅ All error conditions have recovery/troubleshooting guidance
- ✅ Scripts can be safely re-run without breaking existing setup
- ✅ Docker availability validated before Docker operations
Should Have
- ✅ Pre-flight validation catches issues before they cause failures
- ✅ Progress indicators show user what's happening
- ✅ Timeout handling prevents infinite hangs
- ✅ Network failures provide helpful next steps
- ✅ Documentation updated to match current behavior
Nice to Have
- ⭐ Automated testing framework for scripts
- ⭐ Rollback capability for partial deployments
- ⭐ Log collection for debugging
- ⭐ Summary report at end of execution
- ⭐ Colored output supports colorless terminals
Key Files to Modify
Primary Targets
- scripts/setup-homelab.sh - Lines 1-491
- Focus areas: Steps 7 (Authelia secrets), error handling
- scripts/deploy-homelab.sh - Lines 1-340
- Focus areas: Pre-flight checks, error recovery
- AGENT_INSTRUCTIONS.md - Round references
- docs/getting-started.md - Testing documentation
Supporting Files
- .env.example - Validate all variables documented
- README.md - Update if quick start changed
- docs/quick-reference.md - Add troubleshooting entries
Constraints and Guidelines
Follow Repository Standards
- Maintain existing code style and conventions
- Use established logging functions (log_info, log_error, etc.)
- Keep user-facing messages clear and actionable
- Preserve idempotency where it exists
Avoid Scope Creep
- Don't rewrite entire scripts - make targeted improvements
- Don't add features - focus on fixing reliability issues
- Don't optimize performance - focus on correctness
- Don't modify test server - fix repository code
Safety First
- All changes must maintain backward compatibility
- Don't remove existing functionality
- Add validation, don't skip steps
- Provide rollback/recovery for all operations
Communication with User
Regular Updates
- Report progress through testing phases
- Highlight blockers or unexpected findings
- Ask for clarification when needed
- Summarize changes made
Final Report Format
# Round 6 Testing Summary
## Issues Identified
1. [Issue name] - [Brief description]
- Root cause: ...
- Impact: ...
## Changes Implemented
1. [File]: [Change description]
- Why: ...
- Testing: ...
## Testing Results
- ✅ [What works]
- ⚠️ [What needs attention]
- ❌ [What still fails]
## Next Steps for Round 7
1. [Remaining issue to address]
2. [Enhancement to consider]
Immediate Actions for Agent
-
Investigate current failure:
- Review terminal output showing exit code 1
- Identify line in setup-homelab.sh where failure occurred
- Determine if Docker was available for password hash generation
-
Propose specific fixes:
- Add Docker availability check before Authelia password hash step
- Add timeout handling for docker run command
- Provide fallback if automated hash generation fails
-
Implement improvements:
- Update setup-homelab.sh with error handling
- Add pre-flight validation checks
- Enhance error messages with recovery steps
-
Update documentation:
- Document Round 6 findings
- Update troubleshooting guide
- Revise Round 4 references to current round
-
Prepare for validation:
- Create test checklist
- Document expected behavior
- Note any manual testing steps needed
Generated for Round 6 testing of AI-Homelab deployment scripts Focus: Repository optimization, not test server modification