- Add pre-flight validation checks (internet, disk space, Docker availability) - Fix Authelia password hash extraction (handle 'Digest:' prefix format) - Improve credential flow between setup and deploy scripts - Save plain password for user reference in ADMIN_PASSWORD.txt - Add cleanup for directory/file conflicts on re-runs - Add automatic Authelia database cleanup for encryption key mismatches - Add error recovery guidance with cleanup trap - Display credentials prominently after deployment - Update step numbering (now 10 steps with pre-flight) - Update documentation to Round 6 Tested on fresh Debian 12 installation - both scripts now complete successfully.
368 lines
14 KiB
Markdown
368 lines
14 KiB
Markdown
# Round 6 Testing - Agent Instructions for Script Optimization
|
|
|
|
## Mission Context
|
|
Test and optimize the AI-Homelab deployment scripts ([setup-homelab.sh](scripts/setup-homelab.sh) and [deploy-homelab.sh](scripts/deploy-homelab.sh)) on a **test server** environment. Focus on improving repository code quality and reliability, not test server infrastructure.
|
|
|
|
## Current Status - Round 6
|
|
- **Test Environment**: Fresh Debian installation
|
|
- **Testing Complete**: ✅ Both scripts now working successfully
|
|
- **Issues Fixed**:
|
|
- Password hash extraction from Authelia output (added proper parsing)
|
|
- Credential flow between setup and deploy scripts (saved plain password)
|
|
- Directory vs file conflicts on re-runs (added cleanup logic)
|
|
- Authelia database encryption key mismatches (added auto-cleanup)
|
|
- Password display for user reference (now shows and saves password)
|
|
- **Previous Rounds**: 1-5 focused on permission handling, deployment order, and stability
|
|
|
|
## Primary Objectives
|
|
|
|
### 1. Script Reliability Analysis
|
|
- [ ] Identify why setup-homelab.sh failed during Authelia password hash generation
|
|
- [ ] Review error handling and fallback mechanisms
|
|
- [ ] Check Docker availability before container-based operations
|
|
- [ ] Validate all external dependencies (curl, openssl, docker)
|
|
|
|
### 2. Repository Improvements (Priority Focus)
|
|
- [ ] Enhance error messages with actionable debugging steps
|
|
- [ ] Add pre-flight validation checks before critical operations
|
|
- [ ] Improve script idempotency (safe to run multiple times)
|
|
- [ ] Add progress indicators for long-running operations
|
|
- [ ] Document assumptions and prerequisites more clearly
|
|
|
|
### 3. Testing Methodology
|
|
- [ ] Create test checklist for each script function
|
|
- [ ] Document expected vs. actual behavior at failure point
|
|
- [ ] Test edge cases (partial runs, re-runs, missing dependencies)
|
|
- [ ] Verify scripts work on truly fresh Debian installation
|
|
|
|
## Critical Operating Principles
|
|
|
|
### DO Focus On (Repository Optimization)
|
|
✅ **Script robustness**: Better error handling, validation, recovery
|
|
✅ **User experience**: Clear messages, progress indicators, helpful errors
|
|
✅ **Documentation**: Update docs to match actual script behavior
|
|
✅ **Idempotency**: Scripts can be safely re-run without breaking state
|
|
✅ **Dependency checks**: Validate requirements before attempting operations
|
|
✅ **Rollback capability**: Provide recovery steps when operations fail
|
|
|
|
### DON'T Focus On (Test Server Infrastructure)
|
|
❌ **Test server setup**: Don't modify the test server's base configuration
|
|
❌ **System packages**: Test scripts should work with standard Debian
|
|
❌ **Hardware setup**: Focus on software reliability, not hardware optimization
|
|
❌ **Performance tuning**: Prioritize reliability over speed
|
|
|
|
## Specific Issues to Address (Round 6)
|
|
|
|
### Issue 1: Authelia Password Hash Generation Failure
|
|
**Problem**: Script failed while generating password hash using Docker container
|
|
|
|
**Investigation Steps**:
|
|
1. Check if Docker daemon is running before attempting container operations
|
|
2. Verify Docker image pull capability (network connectivity)
|
|
3. Add timeout handling for Docker run operations
|
|
4. Provide fallback mechanism if Docker-based hash generation fails
|
|
|
|
**Proposed Solutions**:
|
|
```bash
|
|
# Option 1: Pre-validate Docker availability
|
|
if ! docker info &> /dev/null 2>&1; then
|
|
log_error "Docker is not available for password hash generation"
|
|
log_info "This requires Docker to be running. Please ensure:"
|
|
log_info " 1. Docker daemon is started: sudo systemctl start docker"
|
|
log_info " 2. User can access Docker: docker ps"
|
|
exit 1
|
|
fi
|
|
|
|
# Option 2: Add timeout and error handling
|
|
log_info "Generating password hash (this may take a moment)..."
|
|
if ! PASSWORD_HASH=$(timeout 30 docker run --rm authelia/authelia:4.37 authelia crypto hash generate argon2 --password "$ADMIN_PASSWORD" 2>&1 | grep '^\$argon2'); then
|
|
log_error "Failed to generate password hash"
|
|
log_info "Error output:"
|
|
echo "$PASSWORD_HASH"
|
|
log_info ""
|
|
log_info "Troubleshooting steps:"
|
|
log_info " 1. Check Docker is running: docker ps"
|
|
log_info " 2. Test image pull: docker pull authelia/authelia:4.37"
|
|
log_info " 3. Check network connectivity"
|
|
exit 1
|
|
fi
|
|
|
|
# Option 3: Provide manual hash generation instructions
|
|
if [ -z "$PASSWORD_HASH" ]; then
|
|
log_error "Automated hash generation failed"
|
|
log_info ""
|
|
log_info "You can generate the hash manually after setup:"
|
|
log_info " docker run --rm authelia/authelia:4.37 authelia crypto hash generate argon2"
|
|
log_info " Then edit: /opt/stacks/core/authelia/users_database.yml"
|
|
log_info ""
|
|
read -p "Continue without setting admin password now? (y/N): " -n 1 -r
|
|
echo ""
|
|
if [[ ! $REPLY =~ ^[Yy]$ ]]; then
|
|
exit 1
|
|
fi
|
|
# Set placeholder that requires manual configuration
|
|
PASSWORD_HASH="\$argon2id\$v=19\$m=65536,t=3,p=4\$CHANGEME"
|
|
fi
|
|
```
|
|
|
|
### Issue 2: Script Exit Codes and Error Recovery
|
|
|
|
**Problem**: Script exits on first error without cleanup or recovery guidance
|
|
|
|
**Improvements**:
|
|
```bash
|
|
# Add at script beginning
|
|
set -e # Exit on error
|
|
set -u # Exit on undefined variable
|
|
set -o pipefail # Exit on pipe failures
|
|
|
|
# Add trap for cleanup on error
|
|
cleanup_on_error() {
|
|
local exit_code=$?
|
|
log_error "Script failed with exit code: $exit_code"
|
|
log_info ""
|
|
log_info "Partial setup may have occurred. To resume:"
|
|
log_info " 1. Review error messages above"
|
|
log_info " 2. Fix the issue if possible"
|
|
log_info " 3. Re-run: sudo ./setup-homelab.sh"
|
|
log_info ""
|
|
log_info "The script is designed to be idempotent (safe to re-run)"
|
|
}
|
|
|
|
trap cleanup_on_error ERR
|
|
```
|
|
|
|
### Issue 3: Pre-flight Validation
|
|
|
|
**Add comprehensive checks before starting**:
|
|
```bash
|
|
# New Step 0: Pre-flight validation
|
|
log_info "Step 0/9: Running pre-flight checks..."
|
|
|
|
# Check internet connectivity
|
|
if ! ping -c 1 8.8.8.8 &> /dev/null && ! ping -c 1 1.1.1.1 &> /dev/null; then
|
|
log_error "No internet connectivity detected"
|
|
log_info "Internet access is required for:"
|
|
log_info " - Installing packages"
|
|
log_info " - Downloading Docker images"
|
|
log_info " - Accessing Docker Hub"
|
|
exit 1
|
|
fi
|
|
|
|
# Check DNS resolution
|
|
if ! nslookup google.com &> /dev/null; then
|
|
log_warning "DNS resolution may be impaired"
|
|
fi
|
|
|
|
# Check disk space
|
|
AVAILABLE_SPACE=$(df / | tail -1 | awk '{print $4}')
|
|
REQUIRED_SPACE=5000000 # 5GB in KB
|
|
if [ "$AVAILABLE_SPACE" -lt "$REQUIRED_SPACE" ]; then
|
|
log_error "Insufficient disk space"
|
|
log_info "Available: $(($AVAILABLE_SPACE / 1024))MB"
|
|
log_info "Required: $(($REQUIRED_SPACE / 1024))MB"
|
|
exit 1
|
|
fi
|
|
|
|
log_success "Pre-flight checks passed"
|
|
echo ""
|
|
```
|
|
|
|
## Testing Checklist for Round 6
|
|
|
|
### setup-homelab.sh Testing
|
|
- [ ] **Clean run**: Fresh Debian 12 system with no prior setup
|
|
- [ ] **Re-run safety**: Run script twice, verify idempotency
|
|
- [ ] **Interrupted run**: Stop script mid-execution, verify recovery
|
|
- [ ] **Network failure**: Simulate network issues during package installation
|
|
- [ ] **Docker unavailable**: Test behavior when Docker daemon isn't running
|
|
- [ ] **User input validation**: Test invalid inputs (weak passwords, invalid emails)
|
|
- [ ] **NVIDIA skip**: Test skipping GPU driver installation
|
|
- [ ] **NVIDIA install**: Test GPU driver installation path (if hardware available)
|
|
|
|
### deploy-homelab.sh Testing
|
|
- [ ] **Missing .env**: Test behavior when .env file doesn't exist
|
|
- [ ] **Invalid .env**: Test with incomplete or malformed .env values
|
|
- [ ] **Port conflicts**: Test when ports 80/443 are already in use
|
|
- [ ] **Network issues**: Simulate Docker network creation failures
|
|
- [ ] **Core stack failure**: Test recovery if core services fail to start
|
|
- [ ] **Certificate generation**: Monitor Let's Encrypt certificate acquisition
|
|
- [ ] **Re-deployment**: Test redeploying over existing infrastructure
|
|
- [ ] **Partial deployment**: Stop mid-deployment, verify recovery
|
|
|
|
## Documentation Updates Required
|
|
|
|
### 1. Update [docs/getting-started.md](docs/getting-started.md)
|
|
- Document actual script behavior observed in testing
|
|
- Add troubleshooting section for common failure scenarios
|
|
- Include recovery procedures for partial deployments
|
|
- Update Round 4 → Round 6 references
|
|
|
|
### 2. Update [AGENT_INSTRUCTIONS.md](AGENT_INSTRUCTIONS.md)
|
|
- Change "Round 4" references to current round
|
|
- Add learnings from Round 5 and 6 testing
|
|
- Document new error handling patterns
|
|
- Update success criteria
|
|
|
|
### 3. Create Testing Guide
|
|
- Document test environment setup
|
|
- Provide test case templates
|
|
- Include expected vs actual result tracking
|
|
- Create test automation scripts if beneficial
|
|
|
|
## Agent Action Workflow
|
|
|
|
### Phase 1: Analysis (30 minutes)
|
|
1. Review last terminal output to identify exact failure point
|
|
2. Read relevant script sections around the failure
|
|
3. Check for similar issues in previous rounds (git history)
|
|
4. Document root cause and contributing factors
|
|
|
|
### Phase 2: Solution Design (20 minutes)
|
|
1. Propose specific code changes to address issues
|
|
2. Design error handling and recovery mechanisms
|
|
3. Create validation checks to prevent recurrence
|
|
4. Draft user-facing error messages
|
|
|
|
### Phase 3: Implementation (40 minutes)
|
|
1. Apply changes to scripts using multi_replace_string_in_file
|
|
2. Update related documentation
|
|
3. Add comments explaining complex error handling
|
|
4. Update .env.example if new variables needed
|
|
|
|
### Phase 4: Validation (30 minutes)
|
|
1. Review script syntax: `bash -n scripts/*.sh`
|
|
2. Check for common shell script issues (shellcheck if available)
|
|
3. Verify all error paths have recovery guidance
|
|
4. Test modified sections in isolation if possible
|
|
|
|
### Phase 5: Documentation (20 minutes)
|
|
1. Update CHANGELOG or version notes
|
|
2. Document breaking changes if any
|
|
3. Update README if testing procedure changed
|
|
4. Create/update this round's testing summary
|
|
|
|
## Success Criteria for Round 6
|
|
|
|
### Must Have
|
|
- ✅ setup-homelab.sh completes successfully on fresh Debian 12
|
|
- ✅ Script provides clear error messages when failures occur
|
|
- ✅ All error conditions have recovery/troubleshooting guidance
|
|
- ✅ Scripts can be safely re-run without breaking existing setup
|
|
- ✅ Docker availability validated before Docker operations
|
|
|
|
### Should Have
|
|
- ✅ Pre-flight validation catches issues before they cause failures
|
|
- ✅ Progress indicators show user what's happening
|
|
- ✅ Timeout handling prevents infinite hangs
|
|
- ✅ Network failures provide helpful next steps
|
|
- ✅ Documentation updated to match current behavior
|
|
|
|
### Nice to Have
|
|
- ⭐ Automated testing framework for scripts
|
|
- ⭐ Rollback capability for partial deployments
|
|
- ⭐ Log collection for debugging
|
|
- ⭐ Summary report at end of execution
|
|
- ⭐ Colored output supports colorless terminals
|
|
|
|
## Key Files to Modify
|
|
|
|
### Primary Targets
|
|
1. [scripts/setup-homelab.sh](scripts/setup-homelab.sh) - Lines 1-491
|
|
- Focus areas: Steps 7 (Authelia secrets), error handling
|
|
2. [scripts/deploy-homelab.sh](scripts/deploy-homelab.sh) - Lines 1-340
|
|
- Focus areas: Pre-flight checks, error recovery
|
|
3. [AGENT_INSTRUCTIONS.md](AGENT_INSTRUCTIONS.md) - Round references
|
|
4. [docs/getting-started.md](docs/getting-started.md) - Testing documentation
|
|
|
|
### Supporting Files
|
|
- [.env.example](.env.example) - Validate all variables documented
|
|
- [README.md](README.md) - Update if quick start changed
|
|
- [docs/quick-reference.md](docs/quick-reference.md) - Add troubleshooting entries
|
|
|
|
## Constraints and Guidelines
|
|
|
|
### Follow Repository Standards
|
|
- Maintain existing code style and conventions
|
|
- Use established logging functions (log_info, log_error, etc.)
|
|
- Keep user-facing messages clear and actionable
|
|
- Preserve idempotency where it exists
|
|
|
|
### Avoid Scope Creep
|
|
- Don't rewrite entire scripts - make targeted improvements
|
|
- Don't add features - focus on fixing reliability issues
|
|
- Don't optimize performance - focus on correctness
|
|
- Don't modify test server - fix repository code
|
|
|
|
### Safety First
|
|
- All changes must maintain backward compatibility
|
|
- Don't remove existing functionality
|
|
- Add validation, don't skip steps
|
|
- Provide rollback/recovery for all operations
|
|
|
|
## Communication with User
|
|
|
|
### Regular Updates
|
|
- Report progress through testing phases
|
|
- Highlight blockers or unexpected findings
|
|
- Ask for clarification when needed
|
|
- Summarize changes made
|
|
|
|
### Final Report Format
|
|
```markdown
|
|
# Round 6 Testing Summary
|
|
|
|
## Issues Identified
|
|
1. [Issue name] - [Brief description]
|
|
- Root cause: ...
|
|
- Impact: ...
|
|
|
|
## Changes Implemented
|
|
1. [File]: [Change description]
|
|
- Why: ...
|
|
- Testing: ...
|
|
|
|
## Testing Results
|
|
- ✅ [What works]
|
|
- ⚠️ [What needs attention]
|
|
- ❌ [What still fails]
|
|
|
|
## Next Steps for Round 7
|
|
1. [Remaining issue to address]
|
|
2. [Enhancement to consider]
|
|
```
|
|
|
|
---
|
|
|
|
## Immediate Actions for Agent
|
|
|
|
1. **Investigate current failure**:
|
|
- Review terminal output showing exit code 1
|
|
- Identify line in setup-homelab.sh where failure occurred
|
|
- Determine if Docker was available for password hash generation
|
|
|
|
2. **Propose specific fixes**:
|
|
- Add Docker availability check before Authelia password hash step
|
|
- Add timeout handling for docker run command
|
|
- Provide fallback if automated hash generation fails
|
|
|
|
3. **Implement improvements**:
|
|
- Update setup-homelab.sh with error handling
|
|
- Add pre-flight validation checks
|
|
- Enhance error messages with recovery steps
|
|
|
|
4. **Update documentation**:
|
|
- Document Round 6 findings
|
|
- Update troubleshooting guide
|
|
- Revise Round 4 references to current round
|
|
|
|
5. **Prepare for validation**:
|
|
- Create test checklist
|
|
- Document expected behavior
|
|
- Note any manual testing steps needed
|
|
|
|
---
|
|
|
|
*Generated for Round 6 testing of AI-Homelab deployment scripts*
|
|
*Focus: Repository optimization, not test server modification*
|