Server Maintenance Procedures

Overview

Regular maintenance is essential for keeping our infrastructure secure, stable, and performant. This guide outlines routine maintenance tasks and procedures.

Maintenance Schedule

Daily Tasks

Automated (verify running):

Backups completed successfully
Monitoring checks all passing
No critical alerts

Manual check (5 minutes):

Check support tickets for issues
Review monitoring dashboard
Check for any failed services
Verify email queue is processing

Weekly Tasks (30-60 minutes)

Every Monday morning:

Review backup logs (all successful?)
Check disk space on all servers
Review resource usage trends
Check for security updates
Review error logs for patterns
Verify no SSL certificates expire within the next 30 days
Check for suspended/overdue accounts
Review bandwidth usage
Check spam blacklist status

Monthly Tasks (2-3 hours)

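First weekend of each month:

Run the monthly restoration test (see Backup Verification)
Apply regular updates in the monthly maintenance window (see Security Updates)
Optimize MySQL/MariaDB databases (see Database Optimization)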

Quarterly Tasks (Half day)

Every 3 months:

Full security audit
Test disaster recovery procedures
Review and update emergency contacts
Audit all passwords/credentials
Review capacity planning (need more resources?)
Update internal documentation
Review and update client contracts if needed
Check for software EOL (end of life) notices
Full backup integrity verification

Annual Tasks (Full day)

Once per year:

Complete infrastructure review
Evaluate new technologies/improvements
Review and renew licenses
Update disaster recovery plan
Review insurance and liability coverage
Security penetration testing (if budget allows)
Full documentation audit and update
Review and update all procedures
Team training on new features/systems

Routine Procedures

Backup Verification

Daily Check:

Log in to the backup management dashboard
Verify all scheduled backups completed
Check backup sizes (significant changes?)
Review any backup failures
Investigate and resolve any issues
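
The "significant changes?" check above can be made less subjective with a small helper. This is a minimal sketch, not part of any existing tooling: the function names and the 20% default threshold are assumptions, and sizes are expected in bytes.

```shell
# Hypothetical helper: signed percent change between two backup sizes (bytes).
# Returns 1 without output when there is no previous size to compare against.
size_change_pct() {
  local prev="$1"
  local curr="$2"
  if [ "$prev" -eq 0 ]; then
    return 1
  fi
  echo $(( (curr - prev) * 100 / prev ))
}

# Hypothetical helper: succeeds when the size moved by more than the
# threshold in either direction (third argument, assumed default 20%).
size_change_is_significant() {
  local threshold="${3:-20}"
  local pct
  pct=$(size_change_pct "$1" "$2") || return 0   # no baseline: flag for review
  [ "${pct#-}" -gt "$threshold" ]
}
```

A sudden drop is usually more worrying than growth, since it can mean a backup job silently skipped data, which is why the check compares the absolute change.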

Monthly Restoration Test:

Select random client account
Restore to test environment
Verify all files present
Test database restoration
Verify email restoration (if applicable)
Document test results
Delete test restoration
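
For the "verify all files present" step, one possible approach is a recursive comparison between a reference copy and the restored tree. This is a sketch only: the function name is hypothetical, and it assumes you have a reference copy taken at the same time as the backup (a live account will have changed since the backup ran, so differences against it are expected).

```shell
# Hypothetical helper: compare a reference tree with a restored tree.
# diff -rq lists files that differ or exist on only one side, and exits
# non-zero if it finds any difference.
verify_restore() {
  local source_dir="$1"
  local restored_dir="$2"
  diff -rq "$source_dir" "$restored_dir"
}

# Usage (paths are placeholders):
#   verify_restore /path/to/reference /path/to/restored && echo "restore matches"
```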

Security Updates

Critical Security Updates (Within 24-48 hours):

Review security bulletin
Assess impact and urgency
Schedule emergency maintenance if critical
Notify clients if downtime required
Take pre-update backup
Apply update
Test services
Monitor for issues
Document update

Regular Updates (Monthly maintenance window):

Review available updates
Note any that require reboot
Schedule maintenance window
Notify clients 48-72 hours in advance
Take backups before starting
Apply updates
Reboot if required
Verify all services running
Monitor for 24 hours post-update
Document updates applied

Log Review

Weekly Log Analysis:

Check these logs:

DirectAdmin/Web Server:

# Apache errors
tail -n 100 /var/log/httpd/error_log | grep -i "critical\|error"

# PHP errors
tail -n 100 /var/log/php/error.log

# DirectAdmin errors
tail -n 100 /var/log/directadmin/error.log

Email Server (SmarterMail):

# Check for delivery issues (tail -n reads a snapshot; tail -f would follow the log and never exit)
tail -n 100 /opt/SmarterMail/Logs/smtp.log | grep -i "error\|failed"

# Check spam filter
tail -n 100 /opt/SmarterMail/Logs/spool.log | grep -i "error"

System Logs:

# System messages
tail -n 100 /var/log/messages | grep -i "error\|fail\|critical"

# Authentication logs
tail -n 100 /var/log/secure | grep -i "failed"

What to look for:

Repeated errors (indicates systemic issue)
Failed login attempts (security concern)
Disk space warnings
Service crashes
Database errors
Permission issues
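
Repeated errors are easier to spot when identical messages are counted rather than read line by line. The sketch below is one way to do that. The function name is hypothetical, and it assumes syslog-style lines ("Mon DD HH:MM:SS host message"); other log formats would need a different timestamp pattern.

```shell
# Hypothetical helper: read log lines on stdin and print the most frequent
# error messages with their counts (first argument: how many to show).
top_repeated_errors() {
  local limit="${1:-10}"
  grep -iE 'error|fail|critical' \
    | sed -E -e 's/^[A-Z][a-z]{2} +[0-9]+ [0-9:]{8} [^ ]+ //' \
             -e 's/\[[0-9]+\]//' \
    | sort | uniq -c | sort -rn | head -n "$limit"
}

# Usage:
#   top_repeated_errors 10 < /var/log/messages
```

The first sed expression strips the timestamp and hostname, and the second strips the process ID, so the same message from different processes is grouped together.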

Disk Space Management

When disk usage > 85%:

Identify what's using space:

du -sh /* | sort -h
du -sh /home/* | sort -h

Common culprits:
- Old backups
- Log files
- Client uploads/files
- Database dumps
- Email storage
Clean up:
- Rotate old logs: logrotate -f /etc/logrotate.conf
- Delete old backups beyond retention
- Contact clients with excessive usage
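- Clear stale temp files: find /tmp -type f -mtime +7 -delete (avoid rm -rf /tmp/*, which can delete sockets and session files that running services still need)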
- Empty trash folders
Prevent recurrence:
- Adjust backup retention if needed
- Implement quotas
- Set up alerts at 80%
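
If no monitoring tool handles the 80% alert yet, a cron-friendly check can be sketched as below. The function name is an assumption, and it parses `df -P` output, so mount points containing spaces are not handled.

```shell
# Hypothetical helper: read `df -P` output on stdin and print
# "<mount point> <usage>%" for each filesystem at or above the threshold.
mounts_over_threshold() {
  local threshold="${1:-80}"
  awk -v limit="$threshold" 'NR > 1 {
    usage = $5
    sub(/%/, "", usage)
    if (usage + 0 >= limit + 0) print $6, usage "%"
  }'
}

# Usage:
#   df -P | mounts_over_threshold 80
```

Reading from stdin rather than calling df inside the function keeps it easy to test against saved output from other servers.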

Database Optimization

Monthly MySQL/MariaDB Optimization:

# Log in to MySQL
mysql -u root -p

# Show database sizes (run inside the mysql client)
SELECT 
  table_schema AS "Database",
  ROUND(SUM(data_length + index_length) / 1024 / 1024, 2) AS "Size (MB)"
FROM information_schema.TABLES
GROUP BY table_schema;

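# Optimize tables (exit the mysql client first; mysqlcheck is a shell command)
mysqlcheck -o --all-databases -u root -p

# Repair if needed (shell command; repair applies to MyISAM tables, not InnoDB)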
mysqlcheck -r --all-databases -u root -p

After optimization:

Monitor database performance
Check query execution times
Verify applications working correctly

Service Health Checks

Daily quick check:

# Check all services running
systemctl status httpd
systemctl status mysqld
systemctl status directadmin
systemctl status sshd

# Or use:
systemctl list-units --state=failed
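
The individual status commands above can be wrapped in a loop that reports only what is down. This is a minimal sketch with a hypothetical function name. It relies on `systemctl is-active --quiet`, which exits non-zero when a unit is not active.

```shell
# Hypothetical helper: print each service that is not active and
# return non-zero if any of them is down.
check_services() {
  local svc
  local failed=0
  for svc in "$@"; do
    if ! systemctl is-active --quiet "$svc"; then
      echo "NOT RUNNING: $svc"
      failed=1
    fi
  done
  return "$failed"
}

# Usage (service names as used in this guide):
#   check_services httpd mysqld directadmin sshd
```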

Web Hosting:

Test website loading
Check DirectAdmin accessible
Verify FTP connection
Test email sending/receiving

Email Service:

Log in to webmail
Send test email
Check mail queue
Verify spam filtering

Bot Hosting:

Log in to Pterodactyl
Check active servers
Verify resource allocation
Test server starts

SSL Certificate Monitoring

Check certificate expiry:

# Check SSL cert expiration
echo | openssl s_client -servername domain.com -connect domain.com:443 2>/dev/null | openssl x509 -noout -dates

Automated renewal check:

Verify Let's Encrypt auto-renewal working
Check renewal logs
Test manual renewal if issues

Certificates expiring < 30 days:

Verify auto-renewal configured
Manually renew if needed
Test certificate after renewal
Update clients if manual action needed
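
To turn the expiry date into a number that can be compared against the 30-day threshold, something like the following could be used. It is a sketch under assumptions: the function name is hypothetical, it expects the `notAfter=...` line printed by `openssl x509 -noout -enddate`, and it depends on GNU `date -d` for parsing.

```shell
# Hypothetical helper: whole days until the given notAfter date.
# $1: "notAfter=Jan 31 00:00:00 2030 GMT" (the prefix is optional)
# $2: optional current time as epoch seconds (defaults to now)
days_until_expiry() {
  local not_after
  local now_epoch
  local end_epoch
  not_after="${1#notAfter=}"
  now_epoch="${2:-$(date +%s)}"
  end_epoch=$(date -d "$not_after" +%s) || return 1
  echo $(( (end_epoch - now_epoch) / 86400 ))
}

# Usage (domain.com is a placeholder):
#   end=$(echo | openssl s_client -servername domain.com -connect domain.com:443 2>/dev/null \
#           | openssl x509 -noout -enddate)
#   [ "$(days_until_expiry "$end")" -lt 30 ] && echo "domain.com: renew soon"
```

A negative result means the certificate has already expired.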

Maintenance Window Procedures

Pre-Maintenance

48-72 hours before:

Schedule window
- Low-traffic time (e.g., 2-4 AM)
- Outside business hours
- Avoid weekends if possible (unless emergency)
Client notification

Subject: Scheduled Maintenance - [Date/Time]

Dear Valued Clients,

We will be performing scheduled maintenance on [Date] from [Start Time] to [End Time] GMT.

Expected impact:
- [Service] may experience brief interruptions
- [Estimated downtime: X minutes]

We apologize for any inconvenience and appreciate your understanding.

If you have any concerns, please contact support.

Best regards,
Alfie Web Solutions Team

Preparation
- Review maintenance steps
- Take full backups
- Prepare rollback plan
- Test in staging (if applicable)
- Have emergency contacts ready
- Clear schedule for monitoring

During Maintenance

Start maintenance
- Post status update (if status page exists)
- Begin work according to plan
- Document steps taken
Apply changes
- Follow procedures carefully
- Take notes of any issues
- Test each change before proceeding
Monitor
- Watch for errors
- Check service status
- Verify connectivity
Rollback if needed
- If major issues, rollback immediately
- Restore from pre-maintenance backup
- Reschedule maintenance

Post-Maintenance

Verification (immediate)
- All services running
- Test functionality
- Check for errors
- Verify client access
Monitoring (24 hours)
- Watch error logs
- Monitor performance
- Check for client reports
- Respond to any issues quickly
Communication

Subject: Maintenance Complete

Dear Clients,

Our scheduled maintenance has been completed successfully. All services are now operational.

Thank you for your patience.

If you experience any issues, please contact support.

Best regards,
Alfie Web Solutions Team

Documentation
- Update change log
- Document any issues encountered
- Note lessons learned
- Update procedures if needed

Emergency Procedures

Service Outage

If service goes down unexpectedly:

Immediate Response (Within 5 minutes)
- Confirm outage (not local issue)
- Check server status
- Review recent changes
- Start investigating cause
Communication (Within 15 minutes)
- Post status update
- Notify clients via email
- Set expectations on resolution time
Resolution
- Identify root cause
- Implement fix
- Test thoroughly
- Monitor closely
Post-Incident
- Post-mortem analysis
- Document incident
- Identify prevention measures
- Update procedures
- Client apology/compensation if warranted

Server Compromise

If security breach suspected:

Immediate Actions
- Isolate affected server (disconnect if necessary)
- Preserve logs and evidence
- Change all passwords
- Notify Tom/Alfie immediately
Investigation
- Identify breach method
- Assess scope of compromise
- Check for backdoors
- Review access logs
Remediation
- Remove malware/backdoors
- Patch vulnerabilities
- Restore from clean backup if needed
- Strengthen security
Client Notification
- Inform affected clients
- Provide guidance
- Offer assistance
- Document incident
Prevention
- Security audit
- Implement additional protections
- Update policies
- Team training

Performance Optimization

Web Server Optimization

If websites loading slowly:

Identify bottleneck
- CPU usage high?
- RAM exhausted?
- Disk I/O slow?
- Network latency?
Common optimizations
- Enable caching (OPcache for PHP)
- Optimize Apache/Nginx config
- Review .htaccess rules
- Optimize databases
- Implement CDN if needed
Client-side
- Image optimization
- Minify CSS/JS
- Enable compression
- Review plugins/themes (if WordPress)

Database Performance

If database slow:

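# List currently running queries (long-running ones show a high Time value)
mysqladmin -u root -p processlist

# Enable slow query log (run inside the mysql client)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;

# Review slow queries (shell command, requires Percona Toolkit; adjust the path to match slow_query_log_file)
pt-query-digest /var/log/mysql/slow.log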

Optimization steps:

Add indexes to frequently queried columns
Optimize table structure
Increase MySQL buffer pool
Tune MySQL configuration

Email Performance

If email slow/delayed:

Check mail queue:

# In SmarterMail admin panel
# Navigate to: Manage → Spool → View Active Messages
# Or check via command line:
find /opt/SmarterMail/Spool -type f | wc -l

Common issues:
- Queue backlog (processing)
- Greylist delays (normal)
- External spam filtering
- Slow recipient servers
Actions:
- Flush queue if stuck
- Adjust rate limits if needed
- Check for blacklisting
- Monitor delivery times
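
The command-line spool count above can be wrapped into a simple backlog check. This is a sketch only: the function names are hypothetical, and the default threshold of 500 messages is an assumed value that should be tuned to normal traffic.

```shell
# Hypothetical helper: number of files currently under the spool directory.
spool_message_count() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Hypothetical helper: succeeds when the spool holds more files than the
# threshold (second argument, assumed default 500).
spool_is_backlogged() {
  local count
  count=$(spool_message_count "$1")
  [ "$count" -gt "${2:-500}" ]
}

# Usage:
#   spool_is_backlogged /opt/SmarterMail/Spool 500 && echo "mail queue backlog"
```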

Tools & Resources

Monitoring Tools

Uptime monitoring: [Tool name]
Resource monitoring: top, htop, netdata
Log analysis: grep, awk, logwatch
Network: ping, traceroute, mtr
SSL: openssl, SSL Labs

Useful Commands Reference

System Status:

# Load average
uptime

# Memory
free -h

# Disk
df -h

# Network
netstat -tuln
ss -tuln

# Processes
ps aux | head -20

Service Management:

# Status
systemctl status [service]

# Start/Stop/Restart
systemctl start [service]
systemctl stop [service]
systemctl restart [service]

# Enable/Disable auto-start
systemctl enable [service]
systemctl disable [service]

DirectAdmin:

# User list
cd /usr/local/directadmin/scripts
./listusers.sh

# Suspended users
./listsuspended.sh

# Disk usage
./quota.sh

Best Practices

Always backup first - Before any changes
Test in staging - When possible
Document everything - Changes, issues, solutions
Schedule wisely - Low-traffic times
Communicate proactively - Keep clients informed
Monitor closely - After changes
Have rollback plan - Know how to undo
Learn from incidents - Post-mortems
Keep updated - Security patches promptly
Stay organized - Follow procedures

Maintenance Checklist Template

Pre-Maintenance:

During Maintenance:

Post-Maintenance:

Keep this document updated with any procedure changes.

Last updated: [Date]
