💾 Database Backup Vanishing Act

High Severity · DST Spring Forward · March 2018
DST · Backup Failure · AWS RDS · Data Loss · Silent Failure

💀 The Crime

During the daylight saving time "spring forward" transition, automated database backups scheduled inside the skipped hour simply vanish into the void. The 2:00-3:00 AM backup window doesn't exist that night, so backup jobs silently fail to run, leaving critical databases unprotected for days or weeks until someone notices the missing backups.

🌍 Real-World Impact

AWS RDS Backup Failures (March 2018)

Incident: Multiple AWS customers reported missing RDS automated backups after DST transition

Affected Regions: US-East-1, US-West-2, EU-West-1

Duration: Backups missed for 1-7 days before detection

Root Cause: Backup windows scheduled during non-existent 2:00-3:00 AM hour

Impact: Estimated 15,000+ databases affected, compliance violations, audit failures

Fortune 500 Financial Services (2019)

Incident: Critical trading database backups failed during DST transition

Discovery: Found 3 weeks later during routine audit

Business Impact: $2.3M in regulatory fines, failed SOX compliance

Recovery Cost: $500K in emergency data recovery and system redesign

Lesson: Silent failures are the most dangerous failures

Regional Healthcare Network (2020)

Incident: Patient records backup system failed during spring DST

Discovery: Detected during actual data loss event 2 months later

Impact: HIPAA violation, patient data at risk, emergency recovery procedures

Cost: $1.8M in recovery, legal fees, and system overhaul

Quote: "We thought our backups were working. The monitoring showed 'success' every day."

🔍 Technical Analysis

Root Cause Breakdown

1. The Vanishing Hour

During the spring DST transition, local clocks jump from 1:59:59 AM directly to 3:00:00 AM. The entire 2:00-3:00 AM hour simply doesn't exist.
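
A minimal sketch of the gap, assuming Python 3.9+ (standard-library zoneinfo) and the 2018-03-11 US transition date: round-tripping the naive 2:30 AM wall time through UTC shows it comes back as a different wall clock, i.e. that local time never actually occurs.

# Demonstration: 2:30 AM on 2018-03-11 does not exist in US Eastern time
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

eastern = ZoneInfo("America/New_York")

# Naively build the scheduled backup time: 2:30 AM local on transition day
scheduled = datetime(2018, 3, 11, 2, 30, tzinfo=eastern)

# Round-trip through UTC and compare wall-clock values. A real local time
# survives the round trip unchanged; a time inside the DST gap comes back
# shifted past the gap.
roundtrip = scheduled.astimezone(timezone.utc).astimezone(eastern)

print(scheduled)   # 2018-03-11 02:30:00-05:00 (an interpretation, not a real wall time)
print(roundtrip)   # 2018-03-11 03:30:00-04:00 (the wall clock that actually happens)

exists = roundtrip.replace(tzinfo=None) == scheduled.replace(tzinfo=None)
print(f"Does 2:30 AM exist on 2018-03-11? {exists}")  # False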

2. Scheduler Confusion

Cron daemons and task schedulers have no universal convention for wall-clock times that don't exist, and different systems handle the gap differently (a detection-and-adjustment sketch follows this list):

  • Some skip the job entirely (silent failure)
  • Some run it at 3:00 AM (delayed execution)
  • Some throw errors (visible failure, but still no backup)
  • Some run it immediately at 1:59:59 AM (early execution)
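
One defensive policy, sketched here on the assumption that running late is better than not running at all: detect that a requested local run time falls inside the spring-forward gap and shift it past the jump instead of dropping it. The next_run helper below is hypothetical, not part of any scheduler library.

from datetime import date, datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def next_run(run_date: date, hour: int, minute: int, tz_name: str) -> datetime:
    """Return an aware datetime for the requested local time, shifted forward
    if that wall-clock time does not exist (spring-forward gap)."""
    tz = ZoneInfo(tz_name)
    wanted = datetime(run_date.year, run_date.month, run_date.day, hour, minute, tzinfo=tz)
    # Round-tripping through UTC leaves real wall times unchanged; a time in
    # the gap is interpreted with the pre-transition offset, so it comes back
    # pushed forward by the length of the gap (here, one hour).
    actual = wanted.astimezone(timezone.utc).astimezone(tz)
    if actual.replace(tzinfo=None) != wanted.replace(tzinfo=None):
        print(f"{wanted:%Y-%m-%d %H:%M} does not exist in {tz_name}; "
              f"running at {actual:%H:%M} instead")
    return actual

# 2:30 AM on the 2018 spring-forward date becomes 3:30 AM EDT
print(next_run(date(2018, 3, 11), 2, 30, "America/New_York"))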

3. Monitoring Blind Spots

Most monitoring systems don't detect "jobs that should have run but didn't" - they only track jobs that actually executed.
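
One hedged sketch of outcome-based detection, assuming a hypothetical convention in which the backup job appends a UTC ISO-8601 timestamp to /var/log/backup-runs.log after each successful run: enumerating the days that should contain a run and checking them against the log catches jobs that never fired at all, not just jobs that ran and reported an error.

from datetime import datetime, timedelta, timezone
from pathlib import Path

RUN_LOG = Path("/var/log/backup-runs.log")  # hypothetical: one UTC ISO timestamp per successful run
LOOKBACK_DAYS = 7

def missing_backup_days():
    """Return the UTC dates in the lookback window with no recorded backup run."""
    recorded = set()
    if RUN_LOG.exists():
        for line in RUN_LOG.read_text().splitlines():
            try:
                recorded.add(datetime.fromisoformat(line.strip()).date())
            except ValueError:
                continue  # ignore malformed lines
    today_utc = datetime.now(timezone.utc).date()
    return [
        (today_utc - timedelta(days=d)).isoformat()
        for d in range(1, LOOKBACK_DAYS + 1)
        if (today_utc - timedelta(days=d)) not in recorded
    ]

if __name__ == "__main__":
    gaps = missing_backup_days()
    if gaps:
        print("ALERT: no backup recorded on: " + ", ".join(gaps))
    else:
        print("OK: a backup ran on every day in the lookback window")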

Typical Incident Timeline

  • 01:59:59 - Backup job scheduled to run at 02:30:00
  • 02:00:00 - ⚠️ Time jumps to 03:00:00; the backup window vanishes
  • 03:00:00 - System continues normally, no backup executed
  • 03:30:00 - ✅ Monitoring shows "no errors"; everything looks fine
  • Days later - 🚨 Audit discovers missing backup files

💻 Code Examples

❌ Problematic: Naive Cron Scheduling

# Cron job scheduled inside the DST spring-forward gap
30 2 * * * /usr/local/bin/backup-database.sh

# On transition day the 2:30 AM slot doesn't exist. Depending on the cron
# implementation, the job is either skipped silently or run at an
# unintended time shortly after the clock jumps to 3:00 AM.

❌ Problematic: Naive Python Scheduler

import schedule
import time
from datetime import datetime

def backup_database():
    print(f"Running backup at {datetime.now()}")
    # Backup logic here...

# schedule interprets "02:30" as host-local wall-clock time, so on DST
# transition days this job can be skipped or run at an unintended time
schedule.every().day.at("02:30").do(backup_database)

while True:
    schedule.run_pending()
    time.sleep(60)

✅ Safe: UTC-Based Scheduling

import schedule
import time
from datetime import datetime, timezone
import pytz  # used by schedule for the timezone argument to .at()

def backup_database():
    utc_now = datetime.now(timezone.utc)
    print(f"Running backup at {utc_now} UTC")
    # Backup logic here...

def schedule_utc_backup():
    """Schedule backup at 07:30 UTC (2:30 AM EST / 3:30 AM EDT)."""
    # Passing a timezone name pins the schedule to UTC instead of the
    # host's local time (recent versions of schedule; requires pytz)
    schedule.every().day.at("07:30", "UTC").do(backup_database)

    # Optional: redundant windows on either side of the primary slot
    schedule.every().day.at("06:30", "UTC").do(backup_database)  # 1:30 AM EST / 2:30 AM EDT
    schedule.every().day.at("08:30", "UTC").do(backup_database)  # 3:30 AM EST / 4:30 AM EDT

schedule_utc_backup()

while True:
    schedule.run_pending()
    time.sleep(60)

✅ Safe: AWS RDS Backup Configuration

# AWS CLI - Set backup window to safe UTC time
aws rds modify-db-instance \
    --db-instance-identifier mydb \
    --preferred-backup-window "09:00-10:00" \
    --backup-retention-period 7

# Terraform configuration
resource "aws_db_instance" "main" {
  # ... other configuration ...
  
  # Safe backup window in UTC (avoids DST issues)
  backup_window = "09:00-10:00"  # 4-5 AM EST, 5-6 AM EDT
  backup_retention_period = 7
  
  # Enable backup monitoring
  monitoring_interval = 60
  monitoring_role_arn = aws_iam_role.rds_monitoring.arn
}

✅ Safe: Backup Monitoring Script

#!/bin/bash
# backup-monitor.sh - Verify backups actually happened

BACKUP_DIR="/var/backups/database"
EXPECTED_BACKUP_AGE_HOURS=25  # Allow for DST variations

# Check if backup file exists and is recent
check_backup_freshness() {
    local latest_backup=$(ls -t $BACKUP_DIR/*.sql.gz 2>/dev/null | head -1)
    
    if [ -z "$latest_backup" ]; then
        echo "CRITICAL: No backup files found!"
        exit 2
    fi
    
    local backup_age=$(( ($(date +%s) - $(stat -c %Y "$latest_backup")) / 3600 ))
    
    if [ $backup_age -gt $EXPECTED_BACKUP_AGE_HOURS ]; then
        echo "CRITICAL: Latest backup is $backup_age hours old!"
        echo "File: $latest_backup"
        exit 2
    fi
    
    echo "OK: Latest backup is $backup_age hours old"
    echo "File: $latest_backup"
}

check_backup_freshness

# Run this script every hour via cron
# 0 * * * * /usr/local/bin/backup-monitor.sh

🛡️ Prevention Strategies

1. Use UTC for Scheduling

  • Schedule all automated tasks in UTC
  • Avoid local time zones for critical operations
  • Convert to local time only for display purposes (see the sketch below this list)
  • Document all times in UTC in runbooks
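
A small illustration of the "UTC internally, local only for display" rule (the time and zone names below are arbitrary examples, not values from the incidents above): the canonical schedule is an aware UTC datetime, and per-region wall-clock strings are derived from it only when rendering dashboards or runbooks.

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# Canonical value used by schedulers and stored in configuration
BACKUP_AT_UTC = datetime(2024, 3, 10, 9, 0, tzinfo=timezone.utc)

def display_local(dt_utc: datetime, tz_name: str) -> str:
    """Convert the canonical UTC time to a local string for display only."""
    return dt_utc.astimezone(ZoneInfo(tz_name)).strftime("%Y-%m-%d %H:%M %Z")

print(BACKUP_AT_UTC.isoformat())                         # what the scheduler sees
print(display_local(BACKUP_AT_UTC, "America/New_York"))  # what US-East operators see
print(display_local(BACKUP_AT_UTC, "Europe/London"))     # what EU operators see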

2. Avoid DST Danger Zones

  • Never schedule between 1:00-4:00 AM local time
  • Use multiple backup windows if needed
  • Schedule during stable hours (10 AM - 6 PM UTC)
  • Test scheduling during DST transitions

3. Implement Robust Monitoring

  • Monitor backup file timestamps, not just job status
  • Alert on missing backups within the SLA window
  • Verify backup integrity, not just existence (see the sketch below this list)
  • Use external monitoring systems
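
A sketch of checking integrity as well as freshness, reusing the hypothetical /var/backups/database layout from backup-monitor.sh above and assuming gzip-compressed SQL dumps: the newest archive is stream-decompressed end to end, so a truncated or corrupted file fails the check even though it exists and is recent.

import gzip
import time
from pathlib import Path

BACKUP_DIR = Path("/var/backups/database")  # same assumed layout as backup-monitor.sh
MAX_AGE_HOURS = 25                          # slack for DST-length days

def check_latest_backup():
    backups = sorted(BACKUP_DIR.glob("*.sql.gz"), key=lambda p: p.stat().st_mtime)
    if not backups:
        raise SystemExit("CRITICAL: no backup files found")
    latest = backups[-1]

    age_hours = (time.time() - latest.stat().st_mtime) / 3600
    if age_hours > MAX_AGE_HOURS:
        raise SystemExit(f"CRITICAL: {latest.name} is {age_hours:.1f} hours old")

    # Integrity check: decompress the whole stream; a truncated or corrupted
    # archive raises an error here even though the file "exists"
    try:
        with gzip.open(latest, "rb") as f:
            while f.read(1024 * 1024):
                pass
    except (OSError, EOFError) as exc:
        raise SystemExit(f"CRITICAL: {latest.name} failed integrity check: {exc}")

    print(f"OK: {latest.name} is {age_hours:.1f} hours old and decompresses cleanly")

if __name__ == "__main__":
    check_latest_backup()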

4. Test DST Scenarios

  • Simulate DST transitions in test environments (a test sketch follows this list)
  • Include DST testing in deployment checklists
  • Document expected behavior during transitions
  • Run disaster recovery drills during DST weeks
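
One way to simulate a transition in tests, sketched with only the standard library and the 2018 US spring-forward week: enumerate a week of daily local run times and assert that each wall-clock time actually exists. Run against the naive 2:30 AM schedule it fails on 2018-03-11, which is exactly the kind of failure a test should find before an audit does.

from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

def local_time_exists(naive: datetime, tz: ZoneInfo) -> bool:
    """True if the naive wall-clock time exists in tz (i.e. is not in a DST gap)."""
    aware = naive.replace(tzinfo=tz)
    roundtrip = aware.astimezone(timezone.utc).astimezone(tz)
    return roundtrip.replace(tzinfo=None) == naive

def test_backup_schedule_survives_spring_forward():
    tz = ZoneInfo("America/New_York")
    start = datetime(2018, 3, 8)          # week spanning the 2018-03-11 transition
    for day in range(7):
        scheduled = (start + timedelta(days=day)).replace(hour=2, minute=30)
        assert local_time_exists(scheduled, tz), f"{scheduled} vanishes in the DST gap"

if __name__ == "__main__":
    # Raises AssertionError for 2018-03-11, demonstrating the catch
    test_backup_schedule_survives_spring_forward()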

🎓 Lessons Learned

1. Silent Failures Are the Worst Failures

Systems that fail silently during DST transitions are more dangerous than systems that crash loudly. At least crashes get noticed.

2. Monitor Outcomes, Not Just Processes

Don't just monitor whether the backup job ran - monitor whether the backup actually exists and is valid.

3. UTC Is Your Friend

When in doubt, use UTC. It never changes, never has DST, and never lies to you about what time it is.

4. Test the Edge Cases

DST transitions happen twice a year, every year. They're not edge cases - they're regular, predictable events that should be tested.

📢 Share Your Backup Horror Story

Have you lost data due to DST backup failures? Share your story to help others avoid the same fate. The best submissions get featured in our hall of shame.