Database Backup Vanishing Act
The Crime
During the spring daylight saving time transition, automated backups scheduled in the "spring forward" hour simply vanish into the void. The 2:00-3:00 AM backup window doesn't exist that night, so backup scripts silently fail, leaving critical databases unprotected for days or weeks until someone notices the missing backups.
Real-World Impact
AWS RDS Backup Failures (March 2018)
Incident: Multiple AWS customers reported missing RDS automated backups after DST transition
Affected Regions: US-East-1, US-West-2, EU-West-1
Duration: Backups missed for 1-7 days before detection
Root Cause: Backup windows scheduled during non-existent 2:00-3:00 AM hour
Impact: Estimated 15,000+ databases affected, compliance violations, audit failures
Fortune 500 Financial Services (2019)
Incident: Critical trading database backups failed during DST transition
Discovery: Found 3 weeks later during routine audit
Business Impact: $2.3M in regulatory fines, failed SOX compliance
Recovery Cost: $500K in emergency data recovery and system redesign
Lesson: Silent failures are the most dangerous failures
Regional Healthcare Network (2020)
Incident: Patient records backup system failed during spring DST
Discovery: Detected during actual data loss event 2 months later
Impact: HIPAA violation, patient data at risk, emergency recovery procedures
Cost: $1.8M in recovery, legal fees, and system overhaul
Quote: "We thought our backups were working. The monitoring showed 'success' every day."
Technical Analysis
Root Cause Breakdown
1. The Vanishing Hour
During spring DST transition, clocks jump from 1:59:59 AM directly to 3:00:00 AM. The entire 2:00-3:00 AM hour simply doesn't exist.
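The gap is easy to demonstrate with Python's standard `zoneinfo` module (a minimal sketch; `America/New_York` and the 2024 transition date are just examples):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

ny = ZoneInfo("America/New_York")

# 2:30 AM on 2024-03-10 never appears on a New York wall clock:
# clocks jump from 1:59:59 EST straight to 3:00:00 EDT.
naive_slot = datetime(2024, 3, 10, 2, 30, tzinfo=ny)

# Round-tripping through UTC exposes the gap: the supposed "2:30 AM"
# instant actually lands at 3:30 AM EDT.
round_trip = naive_slot.astimezone(timezone.utc).astimezone(ny)
print(round_trip)  # 2024-03-10 03:30:00-04:00
```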
2. Scheduler Confusion
Cron jobs and task schedulers don't know how to handle non-existent times. Different systems handle this differently:
- Some skip the job entirely (silent failure)
- Some run it at 3:00 AM (delayed execution)
- Some throw errors (visible failure, but still no backup)
- Some run it immediately at 1:59:59 AM (early execution)
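One defensive option is to check whether a wall-clock slot actually exists on a given date before trusting it. A sketch using only the standard library (the helper name is made up here):

```python
from datetime import date, datetime, time, timezone
from zoneinfo import ZoneInfo

def local_time_exists(day: date, hh: int, mm: int, tz: ZoneInfo) -> bool:
    """True if hh:mm appears on the wall clock in tz on that day.

    A nonexistent time (inside a DST gap) does not survive a
    round trip through UTC with its wall-clock fields unchanged.
    """
    candidate = datetime.combine(day, time(hh, mm), tzinfo=tz)
    rt = candidate.astimezone(timezone.utc).astimezone(tz)
    return rt.replace(tzinfo=None) == candidate.replace(tzinfo=None)

ny = ZoneInfo("America/New_York")
print(local_time_exists(date(2024, 3, 9), 2, 30, ny))   # True
print(local_time_exists(date(2024, 3, 10), 2, 30, ny))  # False - DST gap
```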
3. Monitoring Blind Spots
Most monitoring systems don't detect "jobs that should have run but didn't" - they only track jobs that actually executed.
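This is why "dead man's switch" monitoring helps: the job reports each success, and a separate checker alarms when a report is overdue, which catches jobs that never ran at all. A minimal in-memory sketch (the names are illustrative, not from any particular tool):

```python
import time

# job name -> unix timestamp of the last reported success
_last_success: dict[str, float] = {}

def report_success(job: str) -> None:
    """Called by the backup job itself after a verified backup."""
    _last_success[job] = time.time()

def is_overdue(job: str, max_age_seconds: float) -> bool:
    """True when the job has never reported, or its last report is stale."""
    ts = _last_success.get(job)
    return ts is None or (time.time() - ts) > max_age_seconds

report_success("nightly-db-backup")
print(is_overdue("nightly-db-backup", 25 * 3600))    # False - just reported
print(is_overdue("weekly-restore-test", 25 * 3600))  # True - never reported
```

Hosted equivalents work the same way: the checker runs outside the system it watches, so a skipped job cannot silence its own alarm.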
Code Examples
❌ Problematic: Naive Cron Scheduling
# Cron job that will fail during DST spring forward
30 2 * * * /usr/local/bin/backup-database.sh
# On spring-forward day the 2:30 AM slot doesn't exist; depending on
# the cron implementation this job is skipped, run early, or run late
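One mitigation at the cron level is to pin the schedule to UTC. Some cron implementations (cronie, for example) honor a CRON_TZ variable; this is not POSIX and support varies, so verify it on your system before relying on it:

```shell
# crontab fragment - run at 07:30 UTC regardless of local DST rules
# (CRON_TZ is supported by cronie and some other crons; not POSIX)
CRON_TZ=UTC
30 7 * * * /usr/local/bin/backup-database.sh
```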
❌ Problematic: Naive Python Scheduler
import schedule
import time
from datetime import datetime

def backup_database():
    print(f"Running backup at {datetime.now()}")
    # Backup logic here...

# This will silently fail during DST transitions
schedule.every().day.at("02:30").do(backup_database)

while True:
    schedule.run_pending()
    time.sleep(60)
✅ Safe: UTC-Based Scheduling
import schedule
import time
from datetime import datetime, timezone

def backup_database():
    utc_now = datetime.now(timezone.utc)
    print(f"Running backup at {utc_now} UTC")
    # Backup logic here...

def schedule_utc_backup():
    """Schedule backup at 07:30 UTC (2:30 AM EST, 3:30 AM EDT)."""
    # Timezone-aware .at() needs schedule >= 1.1 with pytz installed;
    # without the tz argument, .at() uses the host's local clock.
    schedule.every().day.at("07:30", "UTC").do(backup_database)
    # Alternative: use multiple safe time slots
    schedule.every().day.at("06:30", "UTC").do(backup_database)  # 1:30 AM EST
    schedule.every().day.at("08:30", "UTC").do(backup_database)  # 3:30 AM EST

schedule_utc_backup()

while True:
    schedule.run_pending()
    time.sleep(60)
✅ Safe: AWS RDS Backup Configuration
# AWS CLI - Set backup window to safe UTC time
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --preferred-backup-window "09:00-10:00" \
    --backup-retention-period 7

# Terraform configuration
resource "aws_db_instance" "main" {
  # ... other configuration ...

  # Safe backup window in UTC (avoids DST issues)
  backup_window           = "09:00-10:00"  # 4-5 AM EST, 5-6 AM EDT
  backup_retention_period = 7

  # Enable backup monitoring
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn
}
✅ Safe: Backup Monitoring Script
#!/bin/bash
# backup-monitor.sh - Verify backups actually happened

BACKUP_DIR="/var/backups/database"
EXPECTED_BACKUP_AGE_HOURS=25  # Allow for DST variations

# Check if backup file exists and is recent
check_backup_freshness() {
    local latest_backup=$(ls -t "$BACKUP_DIR"/*.sql.gz 2>/dev/null | head -1)

    if [ -z "$latest_backup" ]; then
        echo "CRITICAL: No backup files found!"
        exit 2
    fi

    local backup_age=$(( ($(date +%s) - $(stat -c %Y "$latest_backup")) / 3600 ))

    if [ "$backup_age" -gt "$EXPECTED_BACKUP_AGE_HOURS" ]; then
        echo "CRITICAL: Latest backup is $backup_age hours old!"
        echo "File: $latest_backup"
        exit 2
    fi

    echo "OK: Latest backup is $backup_age hours old"
    echo "File: $latest_backup"
}

check_backup_freshness

# Run this script every hour via cron:
# 0 * * * * /usr/local/bin/backup-monitor.sh
Prevention Strategies
1. Use UTC for Scheduling
- Schedule all automated tasks in UTC time
- Avoid local time zones for critical operations
- Convert to local time only for display purposes
- Document all times in UTC in runbooks
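The "convert only for display" rule can be sketched in a few lines (`America/New_York` is an example zone; the variable names are illustrative):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Stored, compared, and scheduled in UTC everywhere...
next_run_utc = datetime(2024, 3, 10, 7, 30, tzinfo=timezone.utc)

# ...converted to local time only at the point of display.
local = next_run_utc.astimezone(ZoneInfo("America/New_York"))
print(f"Next backup: {next_run_utc:%H:%M} UTC ({local:%H:%M %Z})")
```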
2. Avoid DST Danger Zones
- Never schedule between 1:00-4:00 AM local time
- Use multiple backup windows if needed
- Schedule during stable hours (10 AM - 6 PM UTC)
- Test scheduling during DST transitions
3. Implement Robust Monitoring
- Monitor backup file timestamps, not just job status
- Alert on missing backups within SLA window
- Verify backup integrity, not just existence
- Use external monitoring systems
4. Test DST Scenarios
- Simulate DST transitions in test environments
- Include DST testing in deployment checklists
- Document expected behavior during transitions
- Run disaster recovery drills during DST weeks
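Part of this testing can be automated: scan a whole year and list every date on which a local-time schedule falls into a DST gap. A sketch using only the standard library (the function name is made up here):

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo

def gap_dates(year: int, hh: int, mm: int, tzname: str) -> list[date]:
    """Dates in `year` when hh:mm does not exist on the wall clock."""
    tz = ZoneInfo(tzname)
    missing = []
    day = date(year, 1, 1)
    while day.year == year:
        wall = datetime.combine(day, time(hh, mm), tzinfo=tz)
        rt = wall.astimezone(timezone.utc).astimezone(tz)
        # A time inside a DST gap does not round-trip unchanged
        if rt.replace(tzinfo=None) != wall.replace(tzinfo=None):
            missing.append(day)
        day += timedelta(days=1)
    return missing

print(gap_dates(2024, 2, 30, "America/New_York"))
# [datetime.date(2024, 3, 10)]
```

Running this against every scheduled job and every deployed time zone makes the audit a test you can run in CI instead of a twice-yearly surprise.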
Lessons Learned
1. Silent Failures Are the Worst Failures
Systems that fail silently during DST transitions are more dangerous than systems that crash loudly. At least crashes get noticed.
2. Monitor Outcomes, Not Just Processes
Don't just monitor whether the backup job ran - monitor whether the backup actually exists and is valid.
3. UTC Is Your Friend
When in doubt, use UTC. It never changes, never has DST, and never lies to you about what time it is.
4. Test the Edge Cases
DST transitions happen twice a year, every year. They're not edge cases - they're regular, predictable events that should be tested.
📢 Share Your Backup Horror Story
Have you lost data due to DST backup failures? Share your story to help others avoid the same fate. The best submissions get featured in our hall of shame.