One of the things I've come to appreciate most about cloud software-as-a-service solutions is that they can reduce work. Although there's still a need for IT, some tasks -- such as performing backups -- are handled by the vendor. If you've ever tried to back up a Team Foundation Server by hand, you can appreciate not having to do that task in Visual Studio Team Services (VSTS). Unfortunately, there are often compelling reasons to keep some of your servers on-premises. In these situations, it's important to have a good backup and recovery strategy.
Surprisingly, I've noticed that many companies don't understand how to correctly back up their TFS instances. They know that TFS stores its data in SQL Server, so they rely on regular backups of the SQL Server databases for disaster recovery. When the system fails, they are shocked to find out that the data is not recoverable because TFS won't recognize the restored databases.
That's the wrong time to discover that you may not be able to successfully recover your code.
Why It Goes Wrong
TFS uses SQL Server for data storage, so it's natural for IT to believe that they just need to back up the databases. The truth is more complicated. TFS relies on synchronizing multiple databases so that the data is consistent at points in time. To do this, it uses marked transactions. If you're not familiar with marked transactions, they provide a mechanism for keeping multiple SQL Server databases synchronized. You can learn more on Microsoft Docs. A marked transaction writes a named mark into the transaction logs of two or more databases, recording a logically consistent point. When the databases are restored to that mark, they are consistent to the same moment in time. Transactions committed after the mark you restore to are lost.
TFS relies on these transaction marks to keep the databases synchronized. A typical SQL Server backup job backs up each database independently, so the resulting files don't share a common transaction mark. Consequently, when the backups are restored, TFS is unable to verify that the records are logically consistent, and it will refuse to load the collection.
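To make the idea concrete, here is a toy Python model of why a shared mark matters. It is not how SQL Server implements marked transactions; it only assumes that a cross-database transaction leaves a record in the log of every database it touches, and that a mark is one such transaction touching all of them. The database names are illustrative and the transaction IDs are made up.

```python
# Toy model of marked transactions (NOT SQL Server's implementation):
# each cross-database transaction writes a record into the log of every
# database it touches; a mark is a transaction that touches all of them.


def build_logs(transactions, databases):
    """Return {database: [(txn_id, mark_name_or_None), ...]} in commit order."""
    logs = {db: [] for db in databases}
    for txn_id, touched, mark in transactions:
        for db in touched:
            logs[db].append((txn_id, mark))
    return logs


def restore_to_mark(log, mark):
    """Replay one database's log up to and including the named mark."""
    restored = []
    for txn_id, record_mark in log:
        restored.append(txn_id)
        if record_mark == mark:
            return restored
    raise ValueError(f"mark {mark!r} not found in log")


def is_consistent(restored, transactions):
    """True if every restored transaction is present in every database it touched."""
    for txn_id, touched, _ in transactions:
        present = [txn_id in restored[db] for db in touched]
        if any(present) and not all(present):
            return False
    return True


DBS = ["Tfs_Configuration", "Tfs_DefaultCollection"]
TXNS = [
    ("t1", DBS, None),
    ("t2", ["Tfs_DefaultCollection"], None),
    ("nightly", DBS, "nightly"),  # the marked transaction
    ("t3", DBS, None),            # committed after the mark
]
logs = build_logs(TXNS, DBS)

# Restore both databases to the same mark: consistent, and t3 is lost.
marked = {db: restore_to_mark(logs[db], "nightly") for db in DBS}
print(is_consistent(marked, TXNS))  # True

# Independent backups taken at different moments: the configuration
# backup includes t3, but the collection backup was taken before t3.
independent = {
    "Tfs_Configuration": [t for t, _ in logs["Tfs_Configuration"]],
    "Tfs_DefaultCollection": [t for t, _ in logs["Tfs_DefaultCollection"]][:3],
}
print(is_consistent(independent, TXNS))  # False
```

The second case is the situation described above: each file is a perfectly good backup of its own database, but together they describe a state that never existed.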
A second problem is a process issue. Companies often fail to realize that the backups are not working because they do not test the recovery process. The team only notices the problem when they are forced to recover from an actual failure. Thus, they don't realize the problem until it's uncorrectable.
The Backup Options
Naturally, one of the first options to consider is moving to the cloud. If you're on TFS 2017, it's possible with some effort to move your existing system into the cloud. Microsoft will then handle backing up the system and ensuring that everything is recoverable. If this is an option, make sure you understand the security that's in place to protect your data. This will make it easier to discuss the transition to the cloud.
Backing up an on-premises TFS properly is not difficult, but it can be tedious if the IT team insists on doing manual backups. Microsoft has documented the manual steps in a guide broken down into 14 sections that cover the entire process. It offers guidance on backing up the encryption keys, marking the tables, and automating the process using stored procedures and scheduled jobs. If you are doing the backups manually, follow those steps to ensure the databases can be recovered.
If you're running TFS 2012 or later, there is a built-in wizard which automates the backup process and creates a recoverable archive of your entire system. It stores the backups and the details needed for recovery on a file share. If you're one of the many companies still running TFS 2010, you will need to install the Team Foundation Server Power Tools on your TFS application server to take advantage of this functionality. If you're running 2008 or earlier -- this may be your reason to upgrade!
Starting the Backup Wizard
Let's walk through the steps for using the wizard. I'm using TFS 2017, but these steps also work for earlier versions back to TFS 2012 (and for TFS 2010, if the Power Tools are installed on the TFS application server).
On your TFS server, open the TFS Administration Console and navigate to Scheduled Backups.
Next, click Create Scheduled Backup.
This opens the wizard that guides you through the setup process. You will also need a network share that is accessible to the TFS service account; the backups are stored there as they are created. So ... how much space do you need?
Estimating Your Storage Requirements
As you walk through the backup wizard, it will prompt you to define a retention policy. This in turn dictates how much storage you require. The files created on this share should generally not be altered manually: the contents represent everything that TFS needs to restore your system to full health. To estimate storage requirements, you'll want to know the current size of your databases. Look for every database with "TFS_", "WSS_", or "Report" in its name. A typical single-collection installation includes the configuration database (Tfs_Configuration), the collection database (for example, Tfs_DefaultCollection), and, depending on which features you installed, the warehouse, Reporting Services, and SharePoint databases.
You will want enough space to hold a copy of each database. Additionally, the process keeps transaction log backups, and the amount of code churn each week determines how much space those require. For a typical 30-day backup strategy with a weekly full backup and daily differential backups, you'll need about 4x the total size of the databases. If you're planning to use the defaults, you'll need enough space for approximately 31 copies of the data. You can reduce the storage requirements by 50% or more if you configure the database server to create compressed backups.
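The rules of thumb above are simple enough to script. The sketch below is only an estimate under stated assumptions: it selects databases by the name patterns mentioned earlier, applies the 4x and 31-copy multipliers from the text, assumes exactly 50% savings when compression is on, and ignores transaction log backups (which depend on your churn). The plan names and database sizes are hypothetical.

```python
# Rough storage estimate for the backup share, using the rules of thumb
# from the text. Multipliers cover full and differential backups only;
# transaction log backups depend on churn and must be added separately.

MULTIPLIERS = {
    "weekly_full_daily_diff_30_days": 4,  # about 4x the total database size
    "wizard_defaults": 31,                # about 31 copies of the data
}


def is_tfs_related(name):
    """Match database names containing TFS_, WSS_, or Report (case-insensitive)."""
    lowered = name.lower()
    return any(token in lowered for token in ("tfs_", "wss_", "report"))


def estimate_backup_storage_gb(db_sizes_gb, plan, compressed=False):
    """Sum the TFS-related database sizes and scale by the plan's multiplier."""
    total = sum(size for name, size in db_sizes_gb.items() if is_tfs_related(name))
    estimate = total * MULTIPLIERS[plan]
    if compressed:
        # Compression can save 50% or more; assume exactly 50% to stay conservative.
        estimate *= 0.5
    return estimate


# Hypothetical database sizes in GB.
sizes = {
    "Tfs_Configuration": 2,
    "Tfs_DefaultCollection": 40,
    "Tfs_Warehouse": 6,
    "ReportServer": 1,
    "ReportServerTempDB": 1,
    "master": 0.5,  # not TFS-related, ignored
}
print(estimate_backup_storage_gb(sizes, "weekly_full_daily_diff_30_days"))  # 200
print(estimate_backup_storage_gb(sizes, "wizard_defaults"))                 # 1550
print(estimate_backup_storage_gb(sizes, "weekly_full_daily_diff_30_days", compressed=True))  # 100.0
```

In practice you would read the sizes from SQL Server Management Studio or your monitoring tools rather than typing them in, then add headroom for transaction log backups.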
This gives you a starting point. The retention policy will define how frequently backups occur and how long older backups are retained. Longer retention plans require more space, so plan appropriately. If you're not doing it already, make sure to keep a second copy off-site (or at least on another physical machine). If you have an Azure subscription, you might consider installing Microsoft Azure Backup to automatically copy your backup files to the cloud.
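If Microsoft Azure Backup isn't an option, even a simple scheduled copy to another machine is better than a single copy. The following is a minimal sketch, assuming you can reach both locations as ordinary file paths; `mirror_backups` is a hypothetical helper, not part of TFS. It copies files that are missing (or differ in size) at the destination and deliberately never deletes anything.

```python
import shutil
from pathlib import Path


def mirror_backups(source, destination):
    """Copy backup files that are missing (or differ in size) at the destination.

    Never deletes anything at the destination, so it does not propagate
    the wizard's retention cleanup; prune the second copy separately.
    """
    source, destination = Path(source), Path(destination)
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        dest_file = destination / src_file.relative_to(source)
        if dest_file.exists() and dest_file.stat().st_size == src_file.stat().st_size:
            continue  # already mirrored
        dest_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dest_file)  # copy2 also preserves timestamps
        copied.append(dest_file)
    return copied
```

For example, a nightly scheduled task could call `mirror_backups(r"\\backupserver\TfsBackups", r"\\offsite\TfsBackupsCopy")`, where both share names are placeholders for your own locations. Comparing only file sizes is a cheap heuristic; a stricter version could compare hashes.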
We're Off to See the Wizard ...
Now that we understand what the file share needs to provide, let's walk through the wizard and configure our backup.
On the first screen, specify a file share where the TFS service can store the backups (Network Backup Path). Next, configure the number of days the backups will be retained (Backup Retention Days). Press Next.
On the Alerts screen, you can choose to send emails to let you know the outcome of a backup process. At a minimum, consider getting an email if a backup job fails. If the service is failing to create valid backups, you'll want to be able to correct the situation quickly. Press Next.
The next step is to define a schedule, which controls how often backups run and what kind of backup is created. A full backup requires the most space. It contains complete copies of the databases and everything necessary to restore the data tier. The default schedule runs a full backup each night. If you need to conserve space, make sure you get a full backup at least once a week.
A differential backup captures everything that has changed since the most recent full backup. It requires significantly less space, but restoring it also requires the full backup it is based on. If you are not running a nightly full backup, run a differential backup on every night that does not have a scheduled full backup.
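Those two rules (at least one full backup per week, and every other night covered by a differential) can be checked mechanically if you keep your intended schedule in a script or runbook. This is a small sketch; the day-to-backup-type dictionary is a hypothetical format, not something the wizard exports.

```python
# Check a weekly backup plan against the two rules above. The plan format
# (day name -> "full" / "diff" / None) is hypothetical.

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def schedule_problems(plan):
    """Return a list of human-readable problems; an empty list means the plan is OK."""
    problems = []
    if not any(plan.get(day) == "full" for day in DAYS):
        problems.append("no full backup during the week")
    for day in DAYS:
        if plan.get(day) not in ("full", "diff"):
            problems.append(f"{day}: no full or differential backup")
    return problems


weekly_full = {"Sun": "full", **{d: "diff" for d in DAYS if d != "Sun"}}
print(schedule_problems(weekly_full))  # []

gaps = {"Sun": "full", "Wed": "diff"}
print(schedule_problems(gaps))  # Mon, Tue, Thu, Fri, and Sat are uncovered
```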
Transaction log backups capture the changes that happen throughout the day. They allow point-in-time recovery, covering the time between full and differential backups, and they mitigate the risk of losing check-ins made during the workday. The default interval is 15 minutes.
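To see how the three backup types work together during a restore, here is a simplified sketch of choosing a restore chain for a point in time: the newest full backup before the target, the newest differential taken after that full backup, and then every log backup after that base up to the one that covers the target. It is an illustration under a stated simplification: real SQL Server restores follow log sequence numbers (LSNs), not wall-clock times, and the restore wizard does this selection for you.

```python
from datetime import datetime, timedelta


def restore_chain(backups, target):
    """Pick the backups needed to restore to `target`.

    `backups` is a list of (kind, finished_at) tuples with kind in
    {"full", "diff", "log"}. Simplified: real SQL Server restores follow
    log sequence numbers (LSNs), not wall-clock times.
    """
    ordered = sorted(backups, key=lambda b: b[1])
    fulls = [b for b in ordered if b[0] == "full" and b[1] <= target]
    if not fulls:
        raise ValueError("no full backup before the target time")
    chain = [fulls[-1]]
    # Only differentials taken after the chosen full backup can sit on top of it.
    diffs = [b for b in ordered if b[0] == "diff" and chain[0][1] < b[1] <= target]
    if diffs:
        chain.append(diffs[-1])  # differentials are cumulative; the newest suffices
    base_time = chain[-1][1]
    # Every log backup after the base is needed, up to the one covering the target.
    for b in ordered:
        if b[0] == "log" and b[1] > base_time:
            chain.append(b)
            if b[1] >= target:
                break
    return chain


# Hypothetical week: Sunday full, Monday and Tuesday differentials,
# and 15-minute log backups throughout Tuesday.
sun = datetime(2017, 6, 4)
tue = sun + timedelta(days=2)
backups = [("full", sun), ("diff", sun + timedelta(days=1)), ("diff", tue)]
backups += [("log", tue + timedelta(minutes=15 * i)) for i in range(1, 96)]

chain = restore_chain(backups, tue + timedelta(hours=10, minutes=7))
print(len(chain))  # 43: the full, Tuesday's differential, and 41 log backups
```

The example also shows why a differential is useless on its own: the chain always starts with a full backup.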
After setting the schedule, you can use the Review and Readiness Checks to ensure the backups have the proper permissions. It will verify that the TFS service account has all the necessary permissions to access the file share and to create database backups. Ensure that you've handled all warnings before you leave this screen. Press Next to do a final check and to configure the scheduled task.
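If the readiness checks flag the file share, it can help to confirm from the command line that the share is reachable and writable before rerunning the wizard. The sketch below is a rough pre-check, not a replacement for the wizard: it runs as whoever executes it, not as the TFS service account, so a passing result is only a hint. The share path is a placeholder.

```python
import tempfile


def share_is_writable(path):
    """Try to create (and automatically delete) a temporary file in `path`.

    Runs as the current user, NOT the TFS service account, so True is only
    a hint; the wizard's readiness checks remain authoritative.
    """
    try:
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False


print(share_is_writable(r"\\backupserver\TfsBackups"))  # placeholder share path
```

Running the script under the service account's credentials (for example, with `runas`) gives a closer approximation of what the wizard checks.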
Restoring our Data
The Scheduled Backups page on the TFS application server has a second option. If you click Restore Databases, a wizard launches to guide you through the recovery process. It uses the backups stored on the file share to restore your environment to a specific point in time. You will want free space of at least twice the size of the databases. This ensures there is room both to restore the backups and to create the transaction logs.
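The "twice the size of the databases" guideline is easy to check before you start a restore. This minimal sketch applies that rule of thumb using `shutil.disk_usage`; point `check_restore_target` at the volume that will hold the restored databases. The sizes in the example are hypothetical.

```python
import shutil


def has_restore_headroom(free_bytes, database_bytes, factor=2):
    """Rule of thumb from the text: free space of at least 2x the databases."""
    return free_bytes >= factor * database_bytes


def check_restore_target(path, database_bytes):
    """Apply the rule to the free space currently available at `path`."""
    free = shutil.disk_usage(path).free
    return has_restore_headroom(free, database_bytes)


GB = 1024 ** 3
print(has_restore_headroom(free_bytes=120 * GB, database_bytes=50 * GB))  # True
print(has_restore_headroom(free_bytes=80 * GB, database_bytes=50 * GB))   # False
```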
A complete walkthrough of the restore process is available on the Visual Studio documentation site at https://www.visualstudio.com/en-us/docs/setup-admin/tfs/admin/backup/tut-single-svr-home
Knowing You Know
This should be obvious, but you really must make plans to periodically test your backups. If you aren't doing this, you don't know that you have a working recovery strategy. You also won't have any idea of how long it will take to recover from a disaster. Don't wait until you have a disaster situation to start assessing the quality of your backups. Verify!
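One way to answer "how long will it take?" is to time each step of a restore drill. The sketch below is a generic harness under the assumption that you wrap each step of your own procedure in a Python callable; the step names and the failing step are hypothetical, and the real work would be whatever you do (by hand or scripted) to restore and verify TFS.

```python
import time


def run_restore_drill(steps):
    """Run (name, callable) steps in order, timing each one.

    Stops at the first step that raises, so a failed drill still reports
    how far it got and how long each completed step took.
    """
    results = []
    for name, step in steps:
        start = time.perf_counter()
        try:
            step()
        except Exception as exc:  # record the failure and stop the drill
            results.append((name, time.perf_counter() - start, f"failed: {exc}"))
            break
        results.append((name, time.perf_counter() - start, "ok"))
    total = sum(duration for _, duration, _ in results)
    return results, total


def attach_collection():
    # Placeholder step that fails, to show how a broken drill is reported.
    raise RuntimeError("collection did not attach")


drill = [
    ("restore databases", lambda: time.sleep(0.01)),  # placeholder work
    ("attach collection", attach_collection),
    ("spot-check work items", lambda: None),
]
results, total = run_restore_drill(drill)
for name, seconds, status in results:
    print(f"{name}: {seconds:.2f}s {status}")
print(f"total: {total:.2f}s")
```

Keeping the timings from each drill gives you a realistic recovery-time estimate to share with the business, rather than a guess.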
There are a few ways you can (mostly) painlessly test the recoverability of your backups. While you can use a dedicated physical machine for your tests, consider using a virtual machine. If you are using Microsoft Azure, spin up a Windows virtual machine with SQL Server and TFS and use it to test restoring your backups. Once you have verified that the restored data works, you can delete or stop the virtual machine.
If you're using on-premises virtualization, consider keeping a virtual machine for this task in that environment. Make sure that the name of the sandbox machine is different from the existing TFS server to avoid potential conflicts. Alternatively, use the virtual machines provided by Microsoft here: https://almvm.azurewebsites.net/labs/tfs/. These virtual machines make it easy to spin up a pre-configured, network-isolated virtual machine running TFS and Visual Studio. They can be an ideal environment to test recovering backups and to evaluate data integrity.
If you're relying on SQL Server backups -- perhaps it's time to consider updating your process to ensure you have recoverable data. It is not difficult to create a recoverable TFS environment. You have several options available for how to implement your backup strategy. If you aren't quite ready to move to the cloud and use VSTS, consider using the backup wizard to schedule backups of your TFS environment. Then, enjoy a good night's rest knowing that you aren't one failure away from losing all of your development team's hard work!