Backups at SPU
- What is Backed Up?
Backups fall into three categories: virtual machines (hereafter VMs); Distributed File Systems (DFS); and Banner 10th Day snapshots.
- Examples of VMs include Banner, Talisma, PacketFence and AD Domain Controllers. Some of these VMs are deemed "critical," meaning that they run core institutional data systems or integral infrastructure systems; other VMs are not critical but are useful to some level of business operations. Critical and non-critical VMs are backed up differently, as detailed below.
- Examples of DFS resources include departmental file shares (aka Matthew) and documents stored in faculty and staff "My Documents" folders.
- Banner is backed up on the same schedule as other systems (see VMs above); however, additional snapshots of the Banner database are taken on the 10th Day of the Fall, Winter, and Spring quarters. These snapshots are retained for a minimum of 7 years to preserve academic and financial records and history.
- Note that certain elements of the institutional record are not stored on campus but are hosted off-site by cloud-based service providers. Examples include O365 (email, SharePoint, etc.) and Canvas (online learning system).
- Where Are Data Backed Up?
Our backup strategy first calls for local backups on redundant hardware housed in physically separate locations on campus. These local backups allow fast and economical recovery from limited incidents that affect either the building/room or the hardware platform on which the primary data resides. Local backups (for both VMs and DFS) are kept on premises in a building separate from their production location, which is typically the Central Server Room in Marston Hall. At present, that separate location is the Demaray Hall MDF. This way, if a problem occurs on the primary data host, affected systems can be recovered in-house quickly and economically.
In addition to local backups, critical data are also backed up to cloud (Internet-based) storage facilities located far from the SPU campus. Cloud facilities offer levels of redundancy, scalability, and uptime that often exceed what we can provide locally. Data "pushed to the cloud" are intended for worst-case recovery scenarios, in which all local data (both primary and secondary backups) are destroyed or unrecoverable. Retrieving these data is considerably more expensive and takes longer to restore hosts and data, so cloud storage is treated as "last resort recovery."
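The recovery order described above (local backups first, cloud only as a last resort) can be sketched as a simple fallback. This is an illustrative model only, not a CIS tool: the tier labels and the function name `choose_recovery_source` are hypothetical, and in practice whether a copy is usable is judged by administrators rather than by a boolean flag.

```python
from typing import Optional

# Hypothetical tier labels, listed in the order recovery should be attempted:
# fast, economical local copies first; costly cloud retrieval last.
LOCAL = "local backup (Demaray Hall)"
CLOUD = "cloud backup (last resort)"
RECOVERY_ORDER = [LOCAL, CLOUD]


def choose_recovery_source(available: dict[str, bool]) -> Optional[str]:
    """Return the first tier that still holds a usable copy, or None if none do.

    `available` maps a tier label to whether that tier's copy survived the
    incident. Tiers missing from the dict are treated as unavailable (for
    example, non-critical VMs, which have no cloud copy).
    """
    for tier in RECOVERY_ORDER:
        if available.get(tier, False):
            return tier
    return None


if __name__ == "__main__":
    # Primary host failed, Demaray copy intact: restore locally.
    print(choose_recovery_source({LOCAL: True, CLOUD: True}))
    # Local copies destroyed, critical VM still in the cloud: last-resort recovery.
    print(choose_recovery_source({LOCAL: False, CLOUD: True}))
    # Non-critical VM whose local backup was lost: nothing left to restore from.
    print(choose_recovery_source({LOCAL: False}))
```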
- How Are Data Backed Up?
Four methods are currently used in our data backup process: Veeam Enterprise Manager (VEM); automated scripts; CloudBerry; and a manual process for Banner 10th Day snapshots.
1. Veeam Enterprise Manager provides a robust, managed process for scheduling and coordinating backup jobs both locally (Demaray) and to cloud service locations. Veeam backups run nightly and automatically, with reporting that alerts system administrators when errors occur. Veeam encrypts backup data (both at rest and in transit) to protect its confidentiality and integrity. Two Veeam servers are currently involved in this process:
- Veeam03 (VM) transfers VMs (and their application data) across three storage areas: the primary host, the backup host in Demaray Hall (Storinator), and cloud storage (currently OffsiteDataSync). The Veeam Cloud Connect service handles the transfer to the cloud using block-level transfer. Note that only critical VMs are sent to the cloud; non-critical VMs are backed up to Demaray but are not stored off-site.
- Veeam04 (VM) is responsible for replicating DFS data between primary (Marston) and secondary (Demaray) data stores. No cloud replication of DFS data is done by VEM.
2. Automated Scripts: DFS shares are copied from their hosted location in the CIS server room to dedicated hardware in Demaray Hall by automated scripts written and maintained by CIS staff.
3. CloudBerry: A third backup method, CloudBerry, transfers DFS files from the Demaray backup site to cloud storage (currently hosted by BackBlaze). BackBlaze is simple file-level storage, which helps contain the costs of retrieval and recovery from the cloud. BackBlaze allows recovery of individual (discrete) files, whereas Veeam Cloud Connect uses block storage and is therefore more costly (at present) to recover from.
4. Banner 10th Day snapshots are taken via a manual process and then transmitted to the cloud for long-term off-site preservation. As the name implies, snapshots are taken on the 10th day of the Fall, Winter and Spring quarters. The University currently uses the BackBlaze cloud service for long-term storage of 10th Day snapshots.
The architecture of our on-premises and cloud backups is illustrated in the diagram attached to this article.
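As a compact summary of the four methods above, the lookup below encodes which copies exist for each category of data. It is a hypothetical sketch written for illustration (the names `BackupRoute`, `backup_route`, and the category keys are invented here), not a configuration used by Veeam or CloudBerry, and the providers named reflect only what the text says is current.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackupRoute:
    method: str                  # process that performs the copy
    local_target: Optional[str]  # on-campus backup location (None = not described)
    cloud_target: Optional[str]  # off-site provider (None = no off-site copy)


def backup_route(category: str, critical: bool = False) -> BackupRoute:
    """Summarize where one (hypothetical) category of data is copied."""
    if category == "vm":
        return BackupRoute(
            method="Veeam03 (Veeam Cloud Connect for the off-site copy)",
            local_target="Demaray Hall (Storinator)",
            # Only critical VMs are sent off-site.
            cloud_target="OffsiteDataSync" if critical else None,
        )
    if category == "dfs":
        return BackupRoute(
            method="Veeam04 and automated scripts locally; CloudBerry off-site",
            local_target="Demaray Hall",
            cloud_target="BackBlaze",
        )
    if category == "banner_10th_day":
        return BackupRoute(
            method="manual snapshot",
            local_target=None,  # the article does not name a local location
            cloud_target="BackBlaze",
        )
    raise ValueError(f"unknown data category: {category!r}")
```

Modeling the cloud target as `None` for non-critical VMs makes the key policy distinction (critical VMs only go off-site) explicit in one place.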
- When Are Data Backed Up?
VMs and DFS shares are backed up nightly. As the name implies, Banner 10th Day stats are backed up separately on or shortly after the 10th day of the Fall, Winter and Spring quarters.
Local backups of DFS shares occur nightly via scripted processes between hosts in Marston and Demaray. Lower-cost storage (Dell) is currently used for local DFS backups.
Off-site transfers: When a cloud service is first set up, the cloud storage undergoes an initial "seeding." This process may demand considerable bandwidth and other resources, because a full copy of the data must be transferred to the cloud. Similarly, the initial seeding of new local hosts may be time- and resource-intensive, depending on the application and data involved.
Once the initial seeding is complete, nightly backups run incrementally; that is, only changed data is transmitted, which reduces the transfer and processing costs of the backup process.
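Incremental backup can be illustrated with a simplified, file-level sketch: fingerprint each file, compare against the fingerprints recorded the previous night, and transmit only what is new or changed. This is an assumption-laden illustration, not how Veeam or CloudBerry work internally (Veeam Cloud Connect, for example, transfers at the block level); the function `plan_incremental` and the in-memory dict of file contents are hypothetical stand-ins for real files. Seeding corresponds to the first run, when there is no previous manifest and every file counts as changed.

```python
import hashlib


def sha256(data: bytes) -> str:
    """Content fingerprint used to decide whether a file changed."""
    return hashlib.sha256(data).hexdigest()


def plan_incremental(manifest: dict[str, str], files: dict[str, bytes]):
    """Compare current files against the previous backup's manifest.

    Returns (changed, deleted, new_manifest):
      changed       paths that are new or whose contents differ (to transmit)
      deleted       paths present last night but gone now
      new_manifest  fingerprints to keep for the next night's comparison
    """
    new_manifest = {path: sha256(data) for path, data in files.items()}
    changed = sorted(p for p, h in new_manifest.items() if manifest.get(p) != h)
    deleted = sorted(p for p in manifest if p not in new_manifest)
    return changed, deleted, new_manifest


if __name__ == "__main__":
    # Night 1 ("seeding"): no manifest yet, so every file is transmitted.
    changed, deleted, manifest = plan_incremental({}, {"a.txt": b"alpha", "b.txt": b"beta"})
    print(changed, deleted)
    # Night 2: b.txt was edited and c.txt added; a.txt is skipped.
    files = {"a.txt": b"alpha", "b.txt": b"beta v2", "c.txt": b"gamma"}
    changed, deleted, manifest = plan_incremental(manifest, files)
    print(changed, deleted)
```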
- How Long? Backup Retention Schedule
- Local backups of VMs and DFS data are retained for up to 60 days. This means that at any given time we can restore backed-up VMs or DFS files from multiple points within the 60-day window. There are no restore points older than 60 days.
- For "critical" VMs backed up to the cloud, only the most recent version is preserved off-site; no incremental restore points are kept in the cloud. Each night the latest VM image replaces the previous one.
- DFS shares backed up to the cloud include incremental backups for 30 days, so at any given time we can restore backed-up DFS files from multiple points within the 30-day window. We currently use the BackBlaze cloud storage facility for off-site DFS backups.
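The three retention rules above can be expressed as filters over a list of nightly restore points. This is a minimal sketch under stated assumptions: one restore point per night, and a point exactly 60 (or 30) days old is treated as still inside the window, which the article does not specify. The function and constant names are hypothetical.

```python
from datetime import date, timedelta

# Retention windows from the schedule above, in days.
LOCAL_DAYS = 60      # local VM and DFS backups
CLOUD_DFS_DAYS = 30  # DFS backups in BackBlaze


def within_window(points: list[date], today: date, days: int) -> list[date]:
    """Restore points no older than `days` days (inclusive cutoff is an assumption)."""
    cutoff = today - timedelta(days=days)
    return sorted(p for p in points if cutoff <= p <= today)


def latest_only(points: list[date], today: date) -> list[date]:
    """Cloud copies of critical VMs: each night's image replaces the previous one."""
    past = [p for p in points if p <= today]
    return [max(past)] if past else []


if __name__ == "__main__":
    today = date(2024, 3, 31)  # arbitrary example date
    nightly = [today - timedelta(days=n) for n in range(90)]  # 90 nights of backups
    print(len(within_window(nightly, today, LOCAL_DAYS)))      # 61 local restore points
    print(len(within_window(nightly, today, CLOUD_DFS_DAYS)))  # 31 DFS points in the cloud
    print(latest_only(nightly, today))                         # only tonight's VM image
```

Because the cutoff is inclusive and tonight's backup is counted, a 60-day window yields 61 nightly points; an exclusive cutoff would yield 60.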
- Banner Backup
Banner is backed up on the same schedule as other systems; however, an additional snapshot of the Banner database is taken on the 10th Day of the Fall, Winter, and Spring quarters. These snapshots are retained for a minimum of 7 years to preserve academic and financial records and history.
- What Constitutes "Critical" VM Designation?
As noted above, only VMs that are classified as "critical" are backed up to our cloud archive repository. In the simplest terms, "critical" institutional data is anything that is more expensive to recreate than to back up and maintain. While the determination of "critical" is made on a case-by-case basis, the following criteria are to be considered in the designation process:
- Servers/resources that contain data involving the University's formal business, academic, or financial record that need to persist indefinitely as part of ongoing operations and that would be either impossible or excessively expensive to recreate.
- Servers/resources that hold data for which the University is legally obligated to follow statutory requirements for retention and auditing.
- Servers/resources supporting critical infrastructure services that have complex system configurations or that depend on detailed transactional records to function.
- Other services/resources supporting campus departmental operations for which off-site replication is a requirement set forth and agreed upon in the negotiated service level agreement (SLA).
Students, faculty and staff are responsible for keeping backups of personal files and data stored outside of designated University resources.
- Individual Recovery
Users needing individual files restored should contact the CIS Help Desk during business hours. Requests must include the specific file name, location, and restore date. Files will be restored on a best-effort basis.
- Disaster Recovery
When a server's operating system and large portions of its data are destroyed, or the server is physically damaged, the following disaster-recovery procedures are activated to ensure an orderly and timely recovery.
- Computer and Information Systems (CIS) makes disaster recovery, including the preservation and restoration of institutional data, its highest priority.
- If more than one server is down, priorities are set based on the number of people using each server and the type of information it contains. First priority is given to servers containing financial or student records. Because it is not feasible to enumerate every combination of servers in this policy, priorities are set by the CIO or, in the CIO's absence, the most senior director or administrator available at the time of the disaster.
- As the recovery process begins, the CIO is responsible for communicating the plan of action to users, along with the channels that will be used to share updated information.
With the exception of the CIS Systems Administrators who administer the backup system, only the individuals who own institutional data stored on University resources have access to it. CIS Systems Administrators must have access to the data in order to operate the service; however, they are bound by strict confidentiality agreements (see the Privileged System Access Policy) and by University policy to protect the security of the data. All backups are kept in a secure facility on campus, and those replicated off-site are held by cloud-hosted service providers who are contractually obligated to ensure the confidentiality, availability and integrity of University data.