CS Disaster Recovery Plan

This document is a DocBook article and should only be edited as such, at ~sysadmin/docs/RecoveryBible/backups.xml

$Revision $ $Date: 2010/01/25 20:34:49 $ $Author: leblancd $

Revision history:
1.4  2010/01/25 12:30:24  leblancd  updated for new NetApp filer, tape library, TiNa version
1.3  2008/12/24 23:41:17  leblancd  resized screenshot images
1.2  2008/12/24 03:37:28  leblancd  replaced SAFE.cs with FILER.cs
1.1  2006/04/07 23:41:28  leblancd  marked up in DocBook XML (from DocBook SGML <- LyX/LaTeX)
1.0  April 7, 2006        LeBlanc   initial
Introduction

The CS support model for servers and workstations is that no permanent or semi-permanent data that is to be kept should be stored locally. All permanent data is stored centrally on the CS file server, filer.cs.caltech.edu (filer). This allows any computer to be installed, rebuilt, or upgraded quickly, since there is no persistent data to migrate, and it makes disaster recovery easy, since there is only one device to back up. Backups are important, but restoration matters most, since being able to restore data is the entire point of backing it up in the first place. This document covers both backup and restoration of user data, as well as procedures for a total disaster.
Software

The backup software is Time Navigator v4.2 (TiNa), made by Atempo, installed locally on safe.cs.caltech.edu. A 140GB disk is dedicated to storing TiNa's catalog of tapes and data.
Files specific to filer:

    /opt/tina4/*
    /etc/init.d/tina.tina4
    /etc/services:
        # Time Navigator (TiNa)
        tina        2527/tcp
        tina-msg    2528/udp
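When a host is rebuilt, it is easy to forget the /etc/services entries. The following is a minimal sketch, assuming a bash shell, that checks a services file for the two TiNa entries listed above; the helper name check_tina_services is hypothetical, and only the service names and ports come from this document.

```bash
#!/usr/bin/env bash
# Sketch: confirm the TiNa service/port entries are present in a services file.
# check_tina_services is a hypothetical helper; the names and ports
# (tina 2527/tcp, tina-msg 2528/udp) are taken from this document.
check_tina_services() {
    local services_file="${1:-/etc/services}"
    local missing=0 entry name port
    for entry in "tina 2527/tcp" "tina-msg 2528/udp"; do
        name=${entry% *}    # text before the space: service name
        port=${entry#* }    # text after the space: port/protocol
        # Match "name<whitespace>port" at the start of a line
        if ! grep -Eq "^${name}[[:space:]]+${port}([[:space:]]|$)" "$services_file"; then
            echo "missing: $name $port"
            missing=1
        fi
    done
    return "$missing"
}
```

Typical use is `check_tina_services && echo "TiNa ports OK"`; pass another path as the first argument to check a copy of the file instead.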
Files specific to safe: everything is located in /opt/tina4, including the "catalog" (see Catalog).
Hardware

Other than safe (the host running the TiNa software), there is the SpectraLogic T50e tape library, which is directly attached to filer via a Fibre Channel interface. The T50e also has a web interface, protected by an administrative password, at http://spectrat50e.cs.caltech.edu, which allows remote tape changes and other utilities.
Operational Overview

As the "orchestrator" of the backup and restoration process, safe sends commands to the TiNa client (filer) for data and tape manipulation; safe itself does not perform any actual data backup or restoration. During a backup, safe sends commands over the network to the TiNa daemon process on filer to:

1. load a tape with the appropriate label,
2. begin backup of a specific "strategy" to the tape,
3. send the output of the process (status messages) back to safe, and
4. output a list of the files that were backed up, so that safe can process the metadata.

As a result, safe knows which data was backed up, which tapes store which files, and which tapes are loaded in the tape library. This metadata becomes the "TiNa catalog" (see Catalog).
Classes

TiNa employs the concept of "classes" in a backup strategy. Classes divide the data to be backed up into sections, so that backup does not become an all-or-nothing process. Classes are declared on safe in /opt/tina4/catalog0/vol_description:

    DIR vol (
        DIR infosys("/vol/infosys")
        DIR install("/vol/install")
        DIR software("/vol/software")
        DIR staff("/vol/staff")
        DIR students("/vol/students")
        DIR courses("/vol/courses")
        DIR complexity(
            DIR users("/vol/complexity/users")
        )
        DIR gg(
            DIR users("/vol/gg/users")
        )
        DIR geometry(
            DIR users("/vol/geometry/users")
        )
        DIR infospheres(
            DIR users("/vol/infospheres/users")
        )
        DIR iqi(
            DIR users("/vol/iqi/users")
        )
        DIR mls(
            DIR users("/vol/mls/users")
        )
        DIR multires(
            DIR users("/vol/multires/users")
        )
        DIR networks(
            DIR users("/vol/networks/users")
        )
        DIR perflab(
            DIR users("/vol/perflab/users")
        )
        DIR theory(
            DIR users("/vol/theory/users")
        )
    )

Each entry, or DIR, corresponds to a directory on the system to be backed up. Using this scheme, we can choose to back up ic_users, or instruction, or even the entire vol tree.
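Because every leaf DIR carries a quoted path, the class list can be pulled out of vol_description mechanically, for example to confirm after a rebuild that each declared directory exists. This is a minimal bash sketch, assuming one double-quoted /vol/... path per leaf DIR as in the listing above, and assuming it runs on a host where filer's /vol tree is visible at the same paths. The helper names list_class_paths and check_class_paths are hypothetical; this is not a TiNa tool.

```bash
#!/usr/bin/env bash
# Sketch: list the directory paths declared in a TiNa vol_description file
# and report any that do not exist on this host.
# list_class_paths / check_class_paths are hypothetical helper names.
list_class_paths() {
    local desc="${1:-/opt/tina4/catalog0/vol_description}"
    # Print every double-quoted absolute path, one per line, without quotes
    grep -o '"/[^"]*"' "$desc" | tr -d '"'
}

check_class_paths() {
    local path rc=0
    while IFS= read -r path; do
        if [ ! -d "$path" ]; then
            echo "not found: $path"
            rc=1
        fi
    done < <(list_class_paths "$1")
    return "$rc"
}
```

Run `list_class_paths` on safe to print the declared paths, or `check_class_paths` where the /vol tree is mounted to list any that are missing.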
Backup Strategies

For our purposes, we have divided all the data that needs to be backed up into three backup strategies: instruction, staff/infrastructure, and research, named "A", "B", and "D" respectively. Although TiNa allows more than one class to be backed up in a strategy, our strategies "A", "B", and "D" each back up a single class, to keep things as simple as possible; e.g., when backup strategy "A" launches, only files in /vol/courses and /vol/students are backed up.
Tape Cartridge Pools

Cartridge pools in TiNa allow administrators to easily distinguish tapes from one another, and also to specify which tapes are written by which backup strategies. In keeping with our simple implementation, we have pools named "INS-L0", "INS-L1", "RES-L0", "RES-L1", "STINF-L0", and "STINF-L1". For each tape in a cartridge pool, TiNa writes a label at the beginning of the tape declaring which pool it belongs to. This label takes the form <pool name>-<tape number>; e.g., RES-L0-00001 for the first tape in the RES-L0 cartridge pool.

You may already see a pattern emerging with classes, strategies, and cartridge pools. Of course there's a pattern! We like to keep things as simple as possible! Here is a basic matrix of what gets backed up and where it goes:

Strategy | Backup Level | Class                  | Directory                                                                          | Cartridge Pool
A        | 0 (Full)     | instruction            | filer:/vol/courses & students                                                      | INS-L0
A        | 1 (Incr)     | instruction            | filer:/vol/courses & students                                                      | INS-L1
B        | 0 (Full)     | staff & infrastructure | filer:/vol/staff, infosys, install, software                                       | STINF-L0
B        | 1 (Incr)     | staff & infrastructure | filer:/vol/staff, infosys, install, software                                       | STINF-L1
D        | 0 (Full)     | research               | filer:/vol/complexity, geometry, gg, iqi, mls, multires, networks, perflab, theory | RES-L0
D        | 1 (Incr)     | research               | filer:/vol/complexity, geometry, gg, iqi, mls, multires, networks, perflab, theory | RES-L1
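The strategy/level-to-pool matrix and the label convention are regular enough to express as a small lookup, which can help when checking by hand which tape a given run should have used. This is a minimal bash sketch. The helper names pool_for and tape_label are hypothetical, and the five-digit zero-padded tape number is inferred from the single example RES-L0-00001 rather than from TiNa documentation.

```bash
#!/usr/bin/env bash
# Sketch of the strategy/level -> cartridge pool matrix and tape label format.
# pool_for and tape_label are hypothetical helpers; the 5-digit padding is
# inferred from the example label RES-L0-00001.
pool_for() {
    local strategy="$1" level="$2" prefix
    case "$strategy" in
        A) prefix=INS ;;     # instruction
        B) prefix=STINF ;;   # staff & infrastructure
        D) prefix=RES ;;     # research
        *) echo "unknown strategy: $strategy" >&2; return 1 ;;
    esac
    case "$level" in
        0|1) echo "${prefix}-L${level}" ;;
        *) echo "unknown level: $level" >&2; return 1 ;;
    esac
}

tape_label() {
    # $1 = cartridge pool, $2 = tape number within that pool (plain integer)
    printf '%s-%05d\n' "$1" "$2"
}
```

For example, `tape_label "$(pool_for D 0)" 1` prints RES-L0-00001, matching the first tape of the RES-L0 pool described above.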
Catalog

It is important to keep a current catalog for TiNa's operation: without the organized metadata from previous backups, the system cannot locate the files needed for restoration in the event of a disaster. Hence, TiNa also backs up the catalog to tape. A catalog on tape also comes in handy when performing a total system restore after a catastrophic disaster in which all data is destroyed, including the TiNa software.

Humans with magnetic fingers: System administrators won't know which files are stored on which tapes unless they can read the labels in the first 64KB of every backup tape; hence the need for the TiNa backup catalog.
Schedule

The individual strategies are scheduled to launch as follows:

A: Full backup => monthly, first Friday of the month @ 19:00
A: Incremental => weekly, every Friday @ 19:00
B: Full backup => monthly, first Saturday of the month @ 01:00
B: Incremental => weekly, every Saturday @ 01:00
D: Full backup => monthly, first Sunday of the month @ 01:00
D: Incremental => weekly, every Sunday @ 01:00

A word about schedules: Sometimes it is necessary to reschedule backups by hand, since the schedule cannot always be tied to specific calendar dates, yet the aim is to perform the full backups at the beginning of the month.
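Because the first Friday, Saturday, or Sunday of any month always falls on day 1 through 7, whether a given run should be the full or the incremental depends only on the day of the month. This is a minimal bash sketch of that rule, useful when checking a manually rescheduled run. The helper names backup_level and strategy_weekday are hypothetical; the weekday numbers follow `date +%u` (1 = Monday ... 7 = Sunday).

```bash
#!/usr/bin/env bash
# Sketch: pick the backup level for a scheduled run from the day of month.
# backup_level and strategy_weekday are hypothetical helpers.

# ISO weekday (date +%u: 1=Mon ... 7=Sun) on which each strategy runs
strategy_weekday() {
    case "$1" in
        A) echo 5 ;;   # Friday @ 19:00
        B) echo 6 ;;   # Saturday @ 01:00
        D) echo 7 ;;   # Sunday @ 01:00
        *) return 1 ;;
    esac
}

# First Fri/Sat/Sun of a month is always day 1-7 -> full (0), else incremental (1)
backup_level() {
    local day=$((10#$1))   # force base 10 so "08"/"09" are not read as octal
    if [ "$day" -le 7 ]; then
        echo 0
    else
        echo 1
    fi
}
```

A check for today could read `[ "$(date +%u)" = "$(strategy_weekday A)" ] && backup_level "$(date +%d)"`, printing 0 on the first Friday of the month and 1 on every other Friday.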
Data/Tape Retention

Tapes are retained according to the level of backup performed and when the backup was performed. Full (level 0) backups are retained for 3 months, with an annual full backup kept for as long as possible. Incremental (level 1) backups are retained for a minimum of 3 weeks, or until the next full (level 0) backup is performed, or possibly longer depending on the current availability of blank tapes. This backup scheme provides files on tape:

- for the last 3 weeks, on a weekly basis;
- for the last 3 months, on a monthly basis;
- for the last few years, on an annual basis.

When all goes as planned: A typical retention window will contain 3 weekly (level 1) sets, 3 monthly (level 0) sets, and 1 or 2 annual (level 0) sets of backup tapes.
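The retention rules above can be read as a yes/no test for whether a tape set may be recycled. This is a minimal bash sketch under stated assumptions that are not TiNa settings: 3 weeks is taken as 21 days, 3 months is approximated as 92 days, annual sets are never recycled, and an incremental set is recycled only once it is older than 21 days and a newer full backup exists. The helper name can_recycle is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the retention rules as a "may this tape set be recycled?" check.
# Assumptions (not TiNa settings): 3 weeks = 21 days, 3 months ~ 92 days,
# annual sets are kept indefinitely, and an incremental set also waits for
# a newer full backup. can_recycle is a hypothetical helper.
# Usage: can_recycle LEVEL AGE_DAYS [ANNUAL yes|no] [NEWER_FULL yes|no]
can_recycle() {
    local level="$1" age_days="$2" annual="${3:-no}" newer_full="${4:-yes}"
    if [ "$annual" = yes ]; then
        return 1                     # annual sets: keep as long as possible
    fi
    case "$level" in
        0) [ "$age_days" -gt 92 ] ;;                               # full
        1) [ "$age_days" -gt 21 ] && [ "$newer_full" = yes ] ;;   # incremental
        *) return 1 ;;
    esac
}
```

For instance, `can_recycle 1 28 no yes` succeeds (a 4-week-old incremental with a newer full), while `can_recycle 0 400 yes` fails because annual sets are kept. Blank-tape availability, which may extend retention, is left to the operator.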
Operation

To describe the process of backup and restoration, the following sections cover backup according to the above configuration and use scenarios to describe the restoration process.
Backup Operation (central file server data)

1. Log into safe via ssh:
       ssh root@safe
2. Source the TiNa environment:
       source /opt/tina4/tina4/.tina.sh
3. Launch the TiNa Administrative Interface:
       tina_adm &
4. A graphical (remote X) login prompt will appear. Log in using the sysadmin's credentials.
   [Figure: Administrative login for tina_adm]
5. A remote X session with the Time Navigator tools will appear on the desktop.
   [Figure: TiNa Administrative Interface]

In the upper left corner is terminus..., with A, B, and D under the icon. Right-clicking any of these strategies lets the operator perform a full backup or an incremental backup, or edit the strategy itself.

The schedule can be viewed by clicking Monitoring -> Task Viewer..., and adjusting the options to view the backups/timeframe as needed.
[Figure: TiNa Schedule Interface]

The event log can be viewed by clicking Monitoring -> Event Viewer..., or Job Manager... for backup operations currently in progress.
[Figure: TiNa Event Log Interface]

Tape library operations are performed in the Library Manager window, accessible by right-clicking the library icon and choosing Operations.
[Figure: TiNa Library Manager]
Backup Operation (catalog data)

The catalog backup operation is essentially the same as outlined above in Backup Operation (central file server data), except that the data backed up is different: in a catalog backup, metadata is written to tape rather than the previously backed-up data itself. The catalog is backed up daily at 18:00. A "boot catalog" is also written to /opt/tina4/tina4/Data.catalog0/Boot, with 7 historical copies kept. The boot catalog is used when the system itself needs to be recovered and a catalog recreated (see Scenario: no catalog exists and catalog backup does not exist).
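Since the boot catalog is the last line of defence, it is worth checking occasionally that recent copies are actually being written. This is a minimal bash sketch, assuming the copies are named boot*.cod (as in the boot-restore scenario) and assuming a fresh copy should appear after each daily catalog backup. The helper name check_boot_catalog and the two-day freshness threshold are hypothetical choices, not TiNa behaviour.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check the on-disk boot catalog copies on safe.
# Assumes copies are named boot*.cod; check_boot_catalog and the
# 2-day threshold are hypothetical.
check_boot_catalog() {
    local dir="${1:-/opt/tina4/tina4/Data.catalog0/Boot}"
    local total recent
    total=$(find "$dir" -maxdepth 1 -type f -name 'boot*.cod' | wc -l | tr -d ' ')
    # -mtime -2: modified less than 2 days ago
    recent=$(find "$dir" -maxdepth 1 -type f -name 'boot*.cod' -mtime -2 | wc -l | tr -d ' ')
    echo "boot catalog copies: $total (recent: $recent)"
    [ "$recent" -ge 1 ]   # fail if no copy is newer than the threshold
}
```

Run on safe as `check_boot_catalog || echo "no recent boot catalog copy"`; the total can be compared against the 7 historical copies mentioned above.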
Recovery Operation
Scenario: a file (or set of files) accidentally deleted

1. In the TiNa Administration screen, right-click the safe.cs.caltech.edu... icon and choose Restore & Archive Manager... to display the restoration login prompt.
   [Figure: TiNa Admin Login]
2. Log in using 'root' as the username (with the filesystem-level password for that user).
3. Using the Date Control on the left and the file browser, select a specific version of a file, or simply the last backed-up version of the file or directory.
   [Figure: TiNa Restoration Interface]
   If you inadvertently attempt to restore a file that already exists, TiNa will warn you that Depth of Field should be used on files that have been deleted.
   [Figure: TiNa Depth of Field Warning]
4. After selecting the appropriate file/directory and version, right-click and choose Restore. A separate window will appear, prompting for the restoration options: you can restore the file to its original location or to a different location, with or without metadata, permissions, etc.
   [Figure: TiNa Restoration Options Dialog]
5. TiNa will display the progress of the restoration.
   [Figure: TiNa Restoration Progress]
6. TiNa will display an "operation complete" status when the restoration is complete.
   [Figure: TiNa Restoration Complete Dialog]
Scenario: no user files exist on the filer server

Follow the same steps as in Scenario: a file (or set of files) accidentally deleted, but instead of selecting individual files (in step 3), select the directories/volumes to restore. This may take considerable time, depending on how much data needs to be restored and how many incremental changes there have been since the last full (level 0) backup.
Scenario: no catalog exists (TiNa is not aware of any backup data of user files)

A manual restoration of TiNa's catalog needs to be performed. Follow the same procedure as in Scenario: a file (or set of files) accidentally deleted, but in step 1 right-click catalog0.cat, rather than the safe.cs.caltech.edu... icon, and choose Restore & Archive Manager...
Scenario: no catalog exists and catalog backup does not exist

A manual Boot restoration of TiNa's "boot catalog" needs to be performed. After SSH'ing into safe as root and sourcing /opt/tina4/tina4/.tina.sh, enter on the command line:

    tina_init -boot /opt/tina4/tina4/Data.catalog0/Boot/bootxxxx.cod

where xxxx represents the last boot-catalog file written.
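Finding "the last boot-catalog file written" by eye is error-prone under pressure. This is a minimal bash sketch that picks the newest boot*.cod file by modification time, assuming "last written" means newest mtime and that the file names contain no spaces. The helper name latest_boot_catalog is hypothetical; the script only prints the tina_init command given above, so the operator can review it before running it.

```bash
#!/usr/bin/env bash
# Sketch: locate the most recently written boot catalog file.
# latest_boot_catalog is a hypothetical helper; "most recent" is taken to
# mean newest modification time, and names are assumed to contain no spaces.
latest_boot_catalog() {
    local dir="${1:-/opt/tina4/tina4/Data.catalog0/Boot}"
    local newest
    # ls -t sorts newest first; keep only the first entry
    newest=$(ls -1t "$dir"/boot*.cod 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no boot catalog files in $dir" >&2
        return 1
    fi
    echo "$newest"
}

# Usage (prints the command from this section; it does not run it):
#   boot=$(latest_boot_catalog) && echo "tina_init -boot $boot"
```

Printing rather than executing is deliberate: a boot restore rebuilds the catalog, so it is safer to confirm the chosen file name first.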
Scenario: no data exists on file server AND backup server

In the unlikely event that all data is destroyed, the backup server (safe), which runs the TiNa software, also needs to be restored.

1. Re-install the operating system on safe.
2. Re-install the TiNa software on safe.
3. Restore the catalog, following the applicable catalog scenario above.
4. Restore all files, following Scenario: no user files exist on the filer server.