UESPWiki:Backup Plan

The UESPWiki – Your source for The Elder Scrolls since 1995

Note: The old backup plan has been archived for reference purposes.

This plan changes frequently as I work out the best method for site backups. See also the Mirror Plan for information on the site mirror setup (essentially a live backup server).

Backup Strategy

Database Backup

The overall database backup strategy involves four areas:

  1. Live database mirroring
  2. Backups of database master
  3. Backups of slaves
  4. Copying of backups to offsite servers

Database Mirroring

Currently the master database server (db1) is mirrored by two slaves: content3 and backup1. The current database load is relatively light, so the typical lag between the slaves and the master is only a few seconds. The slaves should be monitored periodically (with the Special:UespSiteStats extension, for example) to ensure no replication errors have occurred.

The relevant master database settings in MySQL include:

  • server-id = 1
  • All databases are being replicated in the bin-log (no binlog-do-db or binlog-ignore-db lines are set).

The relevant slave database settings on content3/backup1 include:

  • server-id = 201 (content3) / 2 (backup1)
  • report-host = content3 / backup1.uesp.net
  • replicate-wild-ignore-table = mysql.% (i.e., do not mirror the mysql database)

Database Master Backups

Backups on the master database server are composed of three things:

  1. Full snapshots (mysqldump)
  2. Incremental backups (mysqldump)
  3. Bin logs

As the database grows, it becomes impractical to perform full snapshots daily or even weekly. Instead, a full snapshot is done infrequently, with incremental backups done daily. The master bin-logs are also kept for a minimum of one month and can be used for restore purposes if needed (they are not backed up offsite, however).

Note that all database backups performed on the master currently interfere with the site's operation, as the database tables are locked during the backup. This typically makes the site unavailable for the duration of the backup. For this reason, full snapshots of the database are performed only when required, as each one currently needs 5-10 minutes of site downtime (which will grow as the database grows).

The current master data backup schedule and rotation is as follows:

  • Weekly full snapshots
      • Weekly rotation (5 weeks)
  • Daily incremental
      • Weekly rotation (5 weeks)
  • Minimum of 1 month of bin-logs kept (old logs deleted manually to free disk space as needed)

Database Slave Backups

Backups on the slave databases are simpler as they do not directly interfere with the operation of the site. The current slave backup schedule and rotation is:

  • Daily full snapshot
      • Daily rotation (7 days)
      • Weekly rotation (5 weeks)
      • Monthly rotation (6 months) (not on content3 for space reasons)
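
As a rough illustration of the "Daily rotation (7 days)" idea, the sketch below keeps only the newest N dumps in a directory and removes the rest. It is not the site's actual rotation mechanism: the function name prune_keep_newest and the date-stamped file naming (alldbs-YYYY-MM-DD.sql.gz) are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: keep only the newest $2 files matching $1/*.sql.gz.
# Assumes file names embed a sortable date (e.g. alldbs-2009-08-01.sql.gz),
# so that the shell's sorted glob expansion lists the oldest file first.
prune_keep_newest() {
    dir=$1
    keep=$2
    set -- "$dir"/*.sql.gz
    # If the glob matched nothing, "$1" is the literal pattern: nothing to do.
    [ -e "$1" ] || return 0
    # Remove the oldest files until only $keep remain.
    while [ "$#" -gt "$keep" ]; do
        rm -f -- "$1"
        shift
    done
}
```

For a 7 day rotation this might be called as prune_keep_newest /home/backup/alldbs 7 after each daily snapshot.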

Offsite Backup Copies

The backups performed on the database master are copied offsite to backup1 and content3 daily with the following rotation:

  • Daily copy of master backups to content3/backup1 (both incremental and full snapshot as needed)
      • Daily rotation (7 days)
      • Weekly rotation (5 weeks)
      • Monthly rotation (6 months) (not on content3 for space reasons)

Misc

Backup Directory Structure

The basic backup directory structure is as follows:

  • /home/backup/ -- Parent directory for all backup files
      • alldbs/ -- Contains the local database snapshots and any daily rotations
      • daily/ -- Contains a copy of the last daily local backups
      • dailycopy/ -- Contains a copy of any copied backups from other servers
      • weekly/ -- Any weekly rotated backup files
      • weeklycopy/ -- All weekly rotated backup files from other servers (files are usually rotated locally)
      • monthly/ -- All monthly rotated backup files
      • monthlycopy/ -- All monthly rotated backup files from other servers (files are usually rotated locally)
      • uespcopy/ -- A copy of the UESP content directory (typically synced from content1)

Note that this is the general backup directory structure; specific servers may not contain all of these sub-directories and may contain other sub-directories not listed.

Fixing Database Replication Errors

Sometimes database slave replication will stop due to an error. Typically this is due to something causing a mismatch between the master and slave database contents (e.g., you manually deleted a table on the slave or ran a command on the master that wasn't replicated).

To show the slave error run the command:

  show slave status\G

on the database slave which should result in output like:

 *************************** 1. row ***************************
           Master_Host: 70.38.12.115
           Master_User: slave
           Master_Port: 3306
         Connect_retry: 60
       Master_Log_File: uesp-mysql-bin.034
   Read_Master_Log_Pos: 90283017
        Relay_Log_File: cl-t169-500cl-relay-bin.008
         Relay_Log_Pos: 16956105
 Relay_Master_Log_File: uesp-mysql-bin.030
      Slave_IO_Running: Yes
     Slave_SQL_Running: Yes
       Replicate_do_db: 
   Replicate_ignore_db: 
            Last_errno: 1146
            Last_error: Error 'Table 'uesp_net_blog.hitcounter' doesn't exist' on query. Default database: . Query: 'DELETE FROM 'uesp_net_blog'.'hitcounter';'
          Skip_counter: 0
   Exec_master_log_pos: 16952195
       Relay_log_space: 547078746

The relevant fields are Last_errno and Last_error, which give a fairly simple explanation of what caused replication to stop. Relay_Master_Log_File and Exec_master_log_pos show the master bin-log position of the last event the slave executed, which is where the error occurred. Comparing them with Master_Log_File and Read_Master_Log_Pos (how far the slave has read from the master) gives an idea of how long ago the error occurred.
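
The same fields can be checked by a script, which is one possible starting point for automatic slave monitoring. The following is only a sketch under stated assumptions: the function name check_slave_status is hypothetical, it reads the text of "show slave status\G" on standard input, and it compares field names case-insensitively because their capitalization differs between MySQL versions.

```shell
#!/bin/sh
# Hypothetical helper: reads "show slave status\G" output on stdin.
# Prints "OK" and returns 0 when both replication threads are running and
# no error is recorded; otherwise prints a short summary and returns 1.
check_slave_status() {
    awk '
        {
            # Split each "Field: value" line at the first colon.
            line = $0
            sub(/^[ \t]+/, "", line)
            idx = index(line, ":")
            if (idx == 0) next
            key = tolower(substr(line, 1, idx - 1))
            val = substr(line, idx + 1)
            sub(/^[ \t]+/, "", val)
            field[key] = val
        }
        END {
            if (field["slave_io_running"] != "Yes") {
                print "IO thread not running"; exit 1
            }
            if (field["slave_sql_running"] != "Yes") {
                print "SQL thread not running"; exit 1
            }
            if (field["last_errno"] != "" && field["last_errno"] != "0") {
                print "Replication error " field["last_errno"] ": " field["last_error"]
                exit 1
            }
            print "OK"
        }
    '
}
```

A cron job on a slave could then run something like mysql -e 'show slave status\G' | check_slave_status and e-mail the output whenever the function returns a non-zero status.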

There are three ways to fix slave replication errors:

  1. Fix the original cause of the error
  2. Skip the replication command(s) causing the error
  3. Completely restore the slave from a master snapshot

Fix the Original Error Cause

This is the preferred method of correcting replication errors when possible, as it has the lowest chance of causing a slave-master data mismatch (which will eventually lead to more replication errors). The idea is to find the original cause of the replication error and fix it before restarting the database slave. For example, if the error message was due to a missing table (like the above example), the solution would involve figuring out why that table doesn't exist on the slave in the first place. If it turns out to be due to previous replication errors or settings, a full slave restore will probably have to be done.

Skip Replication Commands

Replication commands on the slave can be skipped if the offending commands can be confirmed to be unnecessary. Be careful not to skip commands that contain important data, as this will cause future replication errors and leave the slave database content out of sync with the master. Replication commands can be skipped on the slave with the MySQL commands:

  stop slave;
  set GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
  start slave;
  show slave status\G

It is recommended to skip replication commands one at a time to ensure you don't accidentally skip a command containing real data.

Restore Slave from Master

The ultimate solution to slave replication errors is to restore a slave from the most recent complete snapshot from the master database.


Rotation Notes

Backup rotations are given in the form:

  • Daily rotation (7 days)
  • Weekly rotation (5 weeks)
  • Monthly rotation (6 months)

This means that there is a copy of the backup each day for the past 7 days, another copy kept each week for the past 5 weeks, and another copy kept each month for the past 6 months. In this case there will be 18 copies of the backup (7+5+6) kept spanning the past 6 months. Longer or shorter rotation periods can be used depending on the backup size and amount of available space. For example, backing up the complete wiki with the above scheme would take close to 200GB of space (~10GB/backup * 18 backups = 180GB).
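
The arithmetic above can be reproduced for other rotation periods with a small helper. This is only an illustrative sketch; the function names rotation_copies and rotation_space_gb are made up for the example, and the sizes are whole gigabytes.

```shell
#!/bin/sh
# Number of backup copies kept by a daily/weekly/monthly rotation.
# Arguments: days weeks months
rotation_copies() {
    echo $(( $1 + $2 + $3 ))
}

# Approximate disk space in GB needed by a rotation.
# Arguments: days weeks months size_per_backup_gb
rotation_space_gb() {
    copies=$(rotation_copies "$1" "$2" "$3")
    echo $(( copies * $4 ))
}
```

With the figures from the text, rotation_copies 7 5 6 prints 18 and rotation_space_gb 7 5 6 10 prints 180.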

Backups are kept for this length of time to guard against the worst-case situation where database data has been corrupted without being noticed for several weeks or months or, similarly, where an error during the database backup corrupts the backups themselves, making them partially or completely unusable.


Accessing Backup1

The backup1 server is on an ADSL line and as such has a dynamic IP address. UESP's current domain registrar (Register.com) does not provide any dynamic DNS services, so to work around this, scripts have been set up to update content1 hourly with backup1's current IP. The file /home/sshbackup/backup1-ip on content1 is updated hourly (from a script on backup1) and simply contains the IP address of backup1. Another hourly script on content1 updates the local hosts file entries for backup1 and backup1.uesp.net.
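
The hosts file update step could look roughly like the sketch below. It is not the actual script used on content1: the function name update_backup1_hosts is hypothetical, the hosts file path is passed in as an argument (normally it would be /etc/hosts, which is an assumption here), and only IPv4 dotted-quad addresses are accepted.

```shell
#!/bin/sh
# Hypothetical helper: replace the backup1 entries in a hosts file with the
# address stored in an IP file.
# Arguments: ip_file hosts_file
update_backup1_hosts() {
    ip_file=$1
    hosts_file=$2
    # Read the address, dropping whitespace and line endings.
    ip=$(tr -d ' \t\r\n' < "$ip_file") || return 1
    # Refuse anything that does not look like a dotted-quad address.
    echo "$ip" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$' || return 1
    tmp=$(mktemp) || return 1
    # Copy every line that does not name backup1 or backup1.uesp.net.
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i == "backup1" || $i == "backup1.uesp.net") next
        print
    }' "$hosts_file" > "$tmp" || return 1
    # Append the fresh entry, then overwrite the hosts file in place so its
    # ownership and permissions are preserved.
    printf '%s\tbackup1 backup1.uesp.net\n' "$ip" >> "$tmp"
    cat "$tmp" > "$hosts_file"
    rm -f "$tmp"
}
```

An hourly cron entry might call update_backup1_hosts /home/sshbackup/backup1-ip /etc/hosts. Validating the address first means a truncated or empty IP file leaves the existing hosts entries untouched.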

Backup ToDo List

  • Automatic monitoring of database slaves with e-mailing of any errors.
  • Update backup1 IP across all servers.
  • Unified keys across all servers.

Old Overall Plan

Currently the site's backup plan is:

  • The Wiki database is backed up daily with a cron job.
  • The forum database is similarly backed up daily via cron.
  • The Wiki images path is backed up solely with a logrotate script over 7 days.
  • The database backups are similarly rotated with logrotate to keep the last 7 days.
  • Backups are downloaded manually from the site as time permits.
  • Weekly copies of all backups are kept on the server for some time.

Old Detailed Plan

Database

Databases Being Backed Up

Currently any database that needs to be backed up has to be set up manually in the various backup scripts. The databases currently being backed up include:

  • UESP Wiki
  • UESP Forums
  • UESP Map Data (Oblivion, SI, and Morrowind)
  • EQ Wiki
  • DHack Wiki
  • DHack Forums
  • Old UESP page counter

Schedule

  • DBs are live mirrored on backup1
  • Daily database dumps on db1 (soon to be weekly?)
  • Daily database dumps on backup1 (7 day/5 week/12 month rotations)
  • Weekly copies of database dumps from db1 to backup1 (5 week/12 month rotations)


Wiki Images

There are three wiki image/upload directories being backed up (UESP, EQ, and DHack).

Schedule

  • Hourly rsyncs on backup1
  • Daily full backups on content1 (7 day rotation)
  • Daily full backups on backup1 (7 day rotation)

Other Data

  • The DHack DATA directory is rsync'd hourly on backup1 and backed up daily on content1 and backup1.
  • The UESP web directory is rsync'd daily on backup1 and backed up weekly on content1 and backup1.


Old Scripts

Database Backup Script

Originally, an attempt was made to back up the databases solely via a logrotate script. Unfortunately, the backup files were never created and no error messages were ever logged. Because of this, all database backups were moved to a cron.daily script and are rotated separately via logrotate.

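  #!/bin/sh
  # Dump the database; --defaults-extra-file must be the first option given.
  /virtual/mysql/bin/mysqldump --defaults-extra-file=backup.cred --opt database_name > dbbackup.sql
  # Compress as a separate step so the database lock is released sooner.
  gzip -f dbbackup.sql
  # Keep a copy of the latest backup in the daily directory.
  cp -f dbbackup.sql.gz daily/dbbackup.sql.gz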

This is used for all databases needing to be backed up. Simply change the database name and the destination filename. gzip is used in preference to bzip2 because the latter takes a very long time to compress the 1GB raw backup file. The backup and compression are done in two separate steps in order to minimize the time that the database is locked.

The only issue with using both cron and logrotate for backups is that, depending on which runs first, the current version of the rotated backup may be zero-sized, e.g.:

    wiki.sql.gz    0 bytes   21 Jan 2007
    wiki.sql.gz.1  200 MB    21 Jan 2007
    wiki.sql.gz.2  200 MB    20 Jan 2007

Thus, the '1' version of the rotated log file should be referenced as the latest file.


Wiki Image Backup

The backup script for the Wiki images path seems to work well entirely as a logrotate script.

 wikiimages.tar.gz {
     daily
     rotate 7
     missingok
     postrotate
           # Archive the images directory, then compress it and copy it to daily/
           tar -cf imagebackup.tar /path_to_wiki_images
           gzip -f imagebackup.tar
           cp -f imagebackup.tar.gz daily/imagebackup.tar.gz
     endscript
 }


Logrotate Script

The database backups created via cron.daily must be rotated separately:

  backupfile.sql.gz {
       daily
       rotate 7
       missingok
       postrotate
       endscript
  }


Weekly Logrotation

In addition to the daily logrotation, a weekly rotation will be set up to keep several older backups. This may be useful in case the database or file system is corrupted in some way that is not noticed immediately. By keeping older backups we can more easily trace the origin of such a problem. While this type of thing should rarely, if ever, occur, disk space is cheap and it is better to err on the safe side.

   /weekly/somefile {
        weekly
        rotate 10
        missingok
        postrotate
              # Copy the latest daily backup into the weekly set
              cp -f daily/somefile weekly/somefile
        endscript
   }

This script can be used for all backups occurring on the server.


Backup Sizes and Times

While the size needed by the backups will slowly increase over time, the following are the rough backup sizes at various dates. Note that the given times are typically measured during low-load periods and may vary significantly.

Size (Compressed), with the corresponding time on the line below:

Backup                     July 2007         Aug 2009
Wiki DB                    800MB (250MB)     4.75GB (1.35GB)
                           30sec (2min)      5min (10min)
Wiki Images                400MB (300MB)     6GB (5.7GB)
                           -                 35min (26min)
UESP Forum DB              -                 130MB (35MB)
                                             10sec (15sec)
All DBs                    -                 4.9GB (1.40GB)
                                             5min (10min)
All DBs - No Wiki Text     -                 540MB (108MB)
                                             25sec (30sec)
UESP New Wiki Text         -                 144MB (40MB)
                                             10sec (11sec)
UESP Content Directory     -                 1.68GB (1.05GB)
                                             15min (7min)

Database Binlog Backups

An alternative to backing up with mysqldump is to use the MySQL "binlogs". As the database size increases, the time required to back up using mysqldump likewise increases. For example, at the current size of 4GB the uesp_net_wiki5 database takes around 10 minutes to back up. This represents essentially complete downtime for the site (technically it should only prevent database writes, but it slows read-only queries so much that the site is basically unusable).

MySQL binlogs are recorded automatically for the purpose of database replication but can also be used as a backup strategy. Rather than run mysqldump daily we can switch to weekly or even monthly dumps and make sure to keep all the binlogs after the dump. To restore the database to a certain point the last dump is loaded followed by all the backup binlogs.
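
To make the restore order concrete, the sketch below lists the binlog files that would need to be replayed after loading a dump, starting from the first binlog created after that dump. It is an illustration only: the function name binlogs_since is hypothetical, the uesp-mysql-bin prefix is taken from the slave status output shown earlier, and the comparison assumes all binlog suffixes are zero-padded to the same width.

```shell
#!/bin/sh
# Hypothetical helper: print, oldest first, the binlog files in a directory
# from the given starting file name onward.
# Arguments: binlog_dir first_binlog_name
binlogs_since() {
    dir=$1
    first=$2
    for f in "$dir"/uesp-mysql-bin.[0-9]*; do
        # Skip the literal pattern when nothing matches.
        if [ -e "$f" ]; then printf '%s\n' "${f##*/}"; fi
    done | LC_ALL=C sort | awk -v first="$first" -v dir="$dir" '
        # Equal-width, zero-padded names compare correctly as strings.
        $0 >= first { print dir "/" $0 }
    '
}

# Possible use (paths and file names are assumptions, not the site's setup):
#   mysql < last_full_dump.sql
#   mysqlbinlog $(binlogs_since /var/lib/mysql uesp-mysql-bin.031) | mysql
```

Passing all the files to a single mysqlbinlog invocation, rather than one invocation per file, keeps the replay in one mysql session.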

Binlog Backup Plan

  • Switch to weekly/monthly mysqldumps (at least for the larger databases like the forums and wiki)
  • Flush logs at each dump. This causes a new binlog file to be created after the backup making it easier to find the log starting point.
  • Make sure binlogs are backed up both on and off site.
  • ** Details Todo **
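
One way the "flush logs at each dump" step might be done is with mysqldump's own --flush-logs option, combined with --master-data=2 so that the binlog position is recorded as a comment in the dump. This is a sketch rather than the plan's final details: the function name full_dump_with_flush and the MYSQLDUMP override variable are hypothetical, and whether these options suit the installed MySQL version and table types is an assumption to verify.

```shell
#!/bin/sh
# Hypothetical wrapper: full dump of all databases that also starts a new
# binlog file, so the binlogs needed for a later restore begin at a file
# boundary. Set MYSQLDUMP to another program (e.g. echo) for a dry run.
# Arguments: credentials_file output_file
full_dump_with_flush() {
    cred_file=$1
    out_file=$2
    # --defaults-extra-file must come before the other options.
    "${MYSQLDUMP:-mysqldump}" --defaults-extra-file="$cred_file" \
        --all-databases --flush-logs --master-data=2 > "$out_file"
}
```

Running MYSQLDUMP=echo full_dump_with_flush backup.cred test.sql writes the command's arguments to test.sql instead of a dump, which is a cheap way to check the options before touching the live server.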

Restoring From Binlogs

** Todo **

Things ToDo

  • Use a daily iterative backup for the Wiki images instead of full (most of it does not change from day to day).
  • Look into gzip options to increase its compression speed (e.g., --fast).
  • Look into better options for the mysqldump command (currently just using a default setting).
  • Once site mirror is online move daily backups to the mirror only and perhaps only full weekly backups on the primary server.
  • Automatic downloads of full weekly backups once in place.