Perforce Server Backup Procedure Scripts

This article describes a pair of scripts I have written that implement the same Perforce server backup procedure. One (PerforceBackupProcedureLibrary.bat) is intended for use with Microsoft Windows. The other (PerforceBackupProcedureLibrary) is intended for use with Unix-based systems (including Mac OS X). These scripts perform the essential steps needed to back up a Perforce server. They save their output in XML format, which can be easily consumed by other programs.

XSLT and CSS style sheets are provided for displaying the backup procedure results as HTML. A PowerShell script is provided for generating an RSS feed of the backup procedure results on Microsoft Windows systems. Unix users can use a bash script to do the same thing. Both of these scripts use an XSLT style sheet when generating the RSS feed.

These scripts are distributed under the MIT license.

Perforce Server Backup Procedure

The Perforce server stores information in two ways. It stores metadata such as change descriptions in a relational database. It stores the actual file modifications in versioned file trees.

The metadata database is stored under the server root. The server root can be specified by the P4ROOT variable or the -r option to p4d or p4s (the Perforce server daemon and service executables). The working database consists of a group of files whose names begin with “db.”. These correspond to the tables in the database. The Perforce server also stores transaction information in a journal file, which can be used to restore the “db.*” files if needed. The location of the journal file can be specified by the P4JOURNAL variable or the -J option to p4d or p4s.

There is one versioned file tree per depot. The location of each versioned file tree is defined by the Map field of its depot specification. There are three types of depots: local, spec and remote. Remote depots are depots controlled by other Perforce servers. Spec depots store the changes to various specifications such as branch, client, depot, group, job, label, protect, typemap, and user. Local depots contain the files that have been submitted to the Perforce server.

The versioned file trees can be archived directly. However, you should not back up the “db.*” files. Instead, you create a checkpoint file that contains all the information needed to re-create the metadata database (“db.*” files). When you create the checkpoint, the information from the journal file is included and the journal file is truncated. Thus, regular checkpoints help to keep the journal file from becoming too large. It is this checkpoint file that you should back up with the versioned file trees.
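As a rough illustration of the checkpoint step, the sketch below builds the p4d command line that takes a checkpoint and truncates the journal. The -r, -J, -jc and -z options are standard p4d flags, but the function names and the example server root are hypothetical, and the actual scripts may invoke p4d differently.

```python
import subprocess

def checkpoint_command(server_root, journal=None, compress=True):
    """Build the p4d command line that creates a checkpoint.

    server_root and journal are illustrative values supplied by the caller.
    """
    cmd = ["p4d", "-r", server_root]
    if journal:
        cmd += ["-J", journal]      # explicit journal location
    if compress:
        cmd.append("-z")            # gzip the checkpoint (checkpoint.N.gz)
    cmd.append("-jc")               # take a checkpoint and truncate the journal
    return cmd

def run_checkpoint(server_root, journal=None):
    """Run the checkpoint; raises CalledProcessError if p4d fails."""
    return subprocess.run(checkpoint_command(server_root, journal), check=True)

# Hypothetical server root, shown only to display the resulting command.
print(" ".join(checkpoint_command("/p4/root")))
```

Keeping command construction separate from execution makes it easy to log the exact command before running it.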

The Perforce server license file is located under the server root directory. You should back this up too.

Before backing up anything, you should ensure the integrity of your data by running p4 verify and checking its output for files that are missing or whose stored MD5 digest does not match the one computed by the p4 verify command. If you have errors at this point, you should read this Knowledge Base article and contact Perforce support.
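Checking the verify output can be automated. The following is a minimal sketch, assuming that problem revisions are reported on lines ending in BAD! or MISSING! (which is how p4 verify flags them in the versions I am aware of). The function name and the sample lines are made up for illustration.

```python
def find_verify_errors(verify_output):
    """Return the p4 verify output lines flagged as BAD! or MISSING!."""
    errors = []
    for line in verify_output.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("BAD!") or stripped.endswith("MISSING!"):
            errors.append(stripped)
    return errors

# Made-up sample lines in the usual p4 verify layout.
sample = """\
//depot/main/a.c#1 - add change 10 (text) 9A0364B9E99BB480DD25E1F0284C8555
//depot/main/b.c#2 - edit change 12 (text) 0CC175B9C0F1B6A831C399E269772661 BAD!
//depot/main/c.c#1 - add change 13 (text) 92EB5FFEE6AE2FEC3AD71C777531578F MISSING!
"""

for error in find_verify_errors(sample):
    print(error)
```

An empty result from such a check is a reasonable precondition for continuing with the backup.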

You should also test your backup by performing a test restore. This type of test is needed to ensure the integrity of your backups. You must do it. If you don’t, you don’t know whether your backups are working. Don’t find out the hard way.

Script Usage

Configuration information is passed to these scripts via environment variables. I recommend setting up the environment in one configuration script which then runs PerforceBackupProcedureLibrary.bat or PerforceBackupProcedureLibrary. Advantages of this approach are:

The items that must be set in the configuration file are:

These values must be set if you are going to generate an RSS feed. If you do not wish to generate an RSS feed, remove from the configuration script the line which runs PerforceBackupProcedureRSS.

You can also use this configuration file to set up any other environment information such as your PATH and Perforce environment variables.

I've provided template versions of these configuration scripts for you to use. They are named MyServerPerforceBackupProcedure.bat and MyServerPerforceBackupProcedure. Copy the appropriate one of these and replace “MyServer” with the value of the serverShortName variable. This will help keep things organized.

The template configuration scripts provided vary the backupType by the day of the week. You can modify this to fit your needs. See Possible Modifications for more information on changes you might want to make.
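The idea of varying backupType by day can be sketched as follows. The values "normal" and "incremental" match the backup file names shown later in this article, but the schedule (a normal backup on Sunday, incremental otherwise) and the function name are assumptions for illustration, not necessarily what the templates do.

```python
import datetime

def backup_type_for(day):
    """Return 'normal' on Sundays and 'incremental' otherwise (assumed schedule)."""
    # date.weekday() numbers Monday as 0 and Sunday as 6.
    return "normal" if day.weekday() == 6 else "incremental"

print(backup_type_for(datetime.date.today()))
```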

Script Output

Log file names include the serverShortName (set in the configuration script) and the checkpoint number to aid in traceability. The following items correspond to checkpoint.23.gz.

Exemplics.23.BackupProcedure.xml — the backup procedure results
Exemplics.23.Backup.txt — the backup program output
Exemplics.23.Restore — the test restore directory
Exemplics.23.Verify.txt — the p4 verify output
Exemplics.23.bks — Microsoft Windows only, contains the list of files and folders to be backed up

The *.*.BackupProcedure.xml files contain the ShortServerName and CheckpointNumber as attributes of the root node (BackupProcedure). No information is lost if these files are renamed although it might be easier to interoperate with other scripts if you do not rename the files.
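Because the identifying information is stored in the root node, a downstream tool does not need to rely on the file name. Here is a minimal sketch that reads those two attributes with Python's standard library. The function name and the one-line sample document are hypothetical, and the sketch assumes the results file does not use XML namespaces.

```python
import xml.etree.ElementTree as ET

def read_identity(xml_text):
    """Return (ShortServerName, CheckpointNumber) from a results document."""
    root = ET.fromstring(xml_text)
    if root.tag != "BackupProcedure":
        raise ValueError("unexpected root element: " + root.tag)
    return root.get("ShortServerName"), int(root.get("CheckpointNumber"))

# Stripped-down stand-in for a real results file.
sample = '<BackupProcedure ShortServerName="Exemplics" CheckpointNumber="23"/>'
print(read_identity(sample))
```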

The following list shows what the backup directory would look like if it contained Microsoft Windows backup files.

Exemplics.21.normal.bkf
Exemplics.22.incremental.bkf
Exemplics.23.incremental.bkf

The following list shows what the backup directory would look like if it contained Unix backup files.

Exemplics.21.normal.tgz
Exemplics.22.incremental.tgz
Exemplics.23.incremental.tgz
Exemplics.snar — snapshot file used by tar to create incremental backups, not needed during restore

Viewing Results as HTML

Copy PerforceBackupProcedureHTML.xsl and PerforceBackupProcedure.css to your log directory. The backup procedure scripts have been modified so that this style sheet can be linked directly from the XML results file. Modify your configuration file to set includeStyleSheet to a non-empty value. When you view the XML file in your browser, the transform will automatically be applied. If you want to view the raw XML source, choose the “View Source” menu item from the appropriate browser menu.

You can also generate the HTML using an external processor such as msxsl or xsltproc. In this case, you will need to navigate to the generated HTML file in your browser.

msxsl MyServer.54.BackupProcedure.xml PerforceBackupProcedureHTML.xsl -o MyServer.54.BackupProcedure.htm

xsltproc -o MyServer.54.BackupProcedure.html PerforceBackupProcedureHTML.xsl MyServer.54.BackupProcedure.xml

Compatibility

Linking the style sheet directly from the XML results file works fine for recent versions of Internet Explorer, Safari, Opera and Chrome, but Firefox does not properly render the HTML. The generated HTML contains calls to JavaScript functions, and Firefox generates a new document for the document.write() calls instead of using the generated HTML document. The workaround for this is to apply the transformation using an external program such as msxsl or xsltproc as previously described. Firefox has no problem when the HTML is generated this way.

Generating an RSS feed

If you want to transform the XML results into an RSS feed, you need to copy PerforceBackupProcedureRSS.xsl to your log directory.

Design Guidelines

Here are some guidelines I used when writing these scripts.

Use tools already available on the target operating systems. Don't require installation of other tools. The Microsoft Windows XP version of these scripts is written for cmd.exe. The backup application is ntbackup.exe. The Unix version is written for bash. The backup application is tar. The bash script used to generate the RSS feed uses xsltproc to transform the results of each backup procedure into an RSS item. I've chosen to use PowerShell to transform the results for Microsoft Windows users. I did this, in part, to demonstrate how easy it is to utilize the .NET framework with PowerShell. If you do not wish to use PowerShell and .NET, you could write a cmd.exe script that uses an external XSLT processor like msxsl.

Provide enough information so you can modify these scripts to fit your needs.

Provide a test mode that produces the XML output but does not run the commands associated with the steps.

The inclusion of the checkpoint restore and test steps contradicts this guideline. The checkpoint restore and test results are logged but the backup is done even if these steps fail. These steps could be moved to separate scripts because their results are not needed here. This move would enable off-line restore and test. I've put these steps here because they are relatively fast and simple and I really want this information. Perhaps these steps will be split off in a later version of these scripts.

Keep the scripts as simple as possible. Defer non-essential tasks to downstream tools which could be run on a system other than the server (perhaps using a different tool set).

Keep the scripts structurally similar so parallel changes can be made more easily. This screenshot shows portions of the Microsoft Windows script on the left and the Unix script on the right.

Backup Procedure source comparison

The XML output structure also reflects the structure of the scripts. For example:

Backup Procedure XML multiple step example

Don't vary XML output. As suggested in the preceding screenshots, the dates and times are saved in the same format on both platforms.

Ensure integrity of the configuration by deriving as much information as possible from that already supplied in the environment. The Perforce client settings are used to specify the Perforce server port. From this, the server root directory, current checkpoint number and depot versioned file tree locations are obtained from the output of the p4 info, p4 counter and p4 depots commands.
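To make the derivation concrete, here is a small sketch of pulling the server root out of p4 info output, which is printed as "Name: value" lines. The function name and the sample values are hypothetical. The scripts themselves do this in cmd.exe and bash rather than Python.

```python
def parse_p4_info(info_output):
    """Parse 'Name: value' lines from p4 info into a dictionary."""
    fields = {}
    for line in info_output.splitlines():
        # Split on the first ': ' only, so values such as host:port
        # or Windows drive paths are kept intact.
        key, separator, value = line.partition(": ")
        if separator:
            fields[key.strip()] = value.strip()
    return fields

# Made-up sample output.
sample = """\
User name: backup
Server address: perforce.example.com:1666
Server root: /p4/root
"""

print(parse_p4_info(sample)["Server root"])
```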

Gather useful data as we go. File sizes and start and end times are saved in the XML log file. Full verification output is saved (the -q switch is not used) so that we can extract the most information. I want to make the most of running this command since it is one of the longer running commands in the script.

Script Details

A key feature of these scripts is that the results of the backup procedure are saved in XML format. These results include information about the Perforce connection, environment, the backup procedure steps and commands, the output of those commands, sizes of files created, start and end times of steps and the backup procedure as a whole. This information can then be used by other scripts or web servers to extract and display information about the health of the backup procedure and Perforce server.

Backup Procedure XML single step example

This results file and other output is saved in the log directory. If errors occur when running these scripts, the files in the log directory can be used to help diagnose the problems. The backup files (archives) can, and should, be saved in a separate backup directory.

The steps in the procedure are:

If any of the previous steps do not succeed, the remaining steps are skipped. If they all succeed, the scripts continue as follows.

What These Scripts Don't Do

These scripts do not post notifications via email or any other means. The XML output of these scripts can be communicated by some other means such as a web dashboard or RSS feed.

These scripts do not analyze the verification output. Information such as the number of files, changes and versions can be obtained by this analysis. This information can then be used when estimating hardware requirements.
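A downstream tool could do this analysis on the saved Verify.txt file. The following is a rough sketch, assuming each revision line has the usual "//path#rev - action change N (type) digest" layout. The regular expression, the function name and the sample lines are my own illustration and are not part of the scripts.

```python
import re

# Matches lines such as: //depot/main/a.c#2 - edit change 12 (text) ...
REVISION_LINE = re.compile(r"^(//.+)#(\d+) - \S+ change (\d+)")

def summarize_verify(verify_output):
    """Return (file count, revision count, change count) for verify output."""
    files = set()
    changes = set()
    revisions = 0
    for line in verify_output.splitlines():
        match = REVISION_LINE.match(line)
        if match:
            files.add(match.group(1))
            changes.add(int(match.group(3)))
            revisions += 1
    return len(files), revisions, len(changes)

# Made-up sample lines.
sample = """\
//depot/main/a.c#1 - add change 10 (text) 9A0364B9E99BB480DD25E1F0284C8555
//depot/main/a.c#2 - edit change 12 (text) 0CC175B9C0F1B6A831C399E269772661
//depot/main/b.c#1 - add change 12 (text) 92EB5FFEE6AE2FEC3AD71C777531578F
"""

print(summarize_verify(sample))
```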

These scripts do not analyze the backup program output. The backup log file is saved in the log directory so you can conveniently inspect it.

These scripts do not perform a test restore from backup. This type of test is needed to ensure the integrity of your backups. You must do it. If you don't, you don't know whether your backups are working.

These scripts do not clean up old files that they have created. Old checkpoint, journal, log and backup files will need to be removed on a regular basis.

These scripts do not replace the Perforce metadata database with one created from the restored checkpoints. This would require restarting the Perforce server. Newer Perforce server versions typically do not require the tree rebalancing obtained by restoring from the checkpoint.

These scripts do not implement off-line checkpoints.

Possible Modifications

You'll probably want to modify the Backup step to use your own backup utility.

You might want to modify how p4 verify is run since it takes a long time to run and its output can be quite large. You can add the -q flag (p4 verify -q //...) to make the p4 verify output shorter. When run this way, p4 verify only outputs file revisions with errors. Another desirable change would be to run p4 verify only on certain days, or perhaps on different parts of the depot on different days. See p4 help verify for other verification strategies.
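One way to spread verification across the week is to rotate through a list of depot paths. This is only a sketch of the idea. The depot paths and function names are hypothetical, and only the p4 verify command and its -q flag come from Perforce itself.

```python
def verify_path_for(day_index, depot_paths):
    """Pick the depot path to verify on a given day (0 = Monday)."""
    return depot_paths[day_index % len(depot_paths)]

def verify_command(path, quiet=True):
    """Build the p4 verify command line for one depot path."""
    cmd = ["p4", "verify"]
    if quiet:
        cmd.append("-q")    # report only revisions with errors
    cmd.append(path)
    return cmd

# Hypothetical depot layout.
paths = ["//depot/main/...", "//depot/release/...", "//spec/..."]
print(" ".join(verify_command(verify_path_for(4, paths))))
```

With three paths, each part of the depot is verified at least twice a week under this rotation.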

The test of the restore is trivial. This could be improved, especially if the restore and test steps are moved to a separate script.

Share Results

Here are some ways you can share this information with team members.

Viewing the Results as HTML

I’ve written XSLT and CSS style sheets that transform the backup procedure XML results to HTML. The generated HTML page displays the:


My goal when creating this transform was to make it easier to understand the backup procedure results. To achieve this goal, I’ve tried to follow the principles described by Edward Tufte in his books The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations and Beautiful Evidence.

The tabular information displayed is organized in topical paragraphs as delimited by the section labels, column headings and horizontal rules. The data-ink ratio and data density are maximized, allowing (in most cases) all the information to be viewed without scrolling. Visual parallelism is enhanced by placement of error text just below the step where the error occurred, server directory free space just below server item sizes and archive directory free space just below archive item sizes. Numeric columns are right aligned and rendered with a monospace font. They are multifunctioning, serving as both data values and data measures (like the bars on a bar chart).

I’ve avoided chartjunk and excessive grids which hinder comprehension. Thin gray lines are used for rules and these are used sparingly. An implicit grid is formed by the typography. Color is used sparingly to differentiate between normal results and those where an error occurred.

This is what the backup results look like when no error has occurred.

HTML backup procedure result with no errors

This is what the backup results look like when an error has occurred.

HTML backup procedure result with an error

Implementation Notes

These scripts demonstrate that it’s not too hard to output script results in XML format. CDATA sections are used to capture commands and the output of those commands. Command text is a child of the Command element. Any command output that is not explicitly redirected to a file is saved as a child of the ConsoleText element.
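The CDATA approach can be sketched in a few lines. The Command and ConsoleText element names come from the results format described above. The enclosing Step element, the function names and the sample text are assumptions for illustration. The one subtlety is that the text itself must not contain "]]>", so the sketch splits that sequence across two CDATA sections.

```python
import xml.etree.ElementTree as ET

def cdata(text):
    """Wrap text in CDATA, splitting any embedded ']]>' to keep the XML well-formed."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def step_xml(command, console_text):
    """Return one step as XML; 'Step' is a hypothetical wrapper element."""
    return (
        "<Step>\n"
        "  <Command>" + cdata(command) + "</Command>\n"
        "  <ConsoleText>" + cdata(console_text) + "</ConsoleText>\n"
        "</Step>"
    )

fragment = step_xml("p4 verify -q //...", "a < b && c ]]> d")
parsed = ET.fromstring(fragment)   # parses cleanly despite <, & and ]]>
print(parsed.find("ConsoleText").text)
```

Because the text is inside CDATA, characters such as < and & in command output need no escaping.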

Some output that is not useful to these scripts, like the output of the pushd and popd commands in the Unix version of these scripts, is ignored. That is, it is redirected to /dev/null in the Unix version and nul in the Microsoft Windows version.

Functions are defined in the order they are used within each calling depth. Bash requires that functions be defined before they are used. The main procedure must be at the bottom of the script. This is not a requirement for the Microsoft Windows version, but I put the main procedure at the end so that the structure of the scripts is similar.

Indentation of the script follows the indentation of the XML output. This is an attempt to make it easier to correlate the script commands with the XML output.

Each step sets and restores the working directory if needed (using pushd and popd).

tar strips the leading / from paths so that the archives can be restored to an alternate location. The scripts have to use full paths because depot versioned file trees can be located anywhere. They do not need to be located under the server root (where the metadata database is). Ntbackup can also restore to an alternate location.

You cannot directly specify files to be backed up on the ntbackup command line. You can only specify directories. Since we want to back up the checkpoint and license files, we’ll have to create a backup selection file that contains the items we want to back up. The backup selection file is a Unicode encoded file with paths listed one per line. Directory paths end with “\”. We can then use this backup selection file on the ntbackup command line. The backup selection file should not have a byte order mark.
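For illustration, here is how such a selection file could be produced from Python. This sketch assumes that "Unicode" means UTF-16 little-endian (the usual meaning on Windows) and that lines end with CRLF. The paths and the function name are hypothetical, and the actual script builds the file from cmd.exe.

```python
import codecs

def selection_file_bytes(paths):
    """Encode selection entries, one per line, as UTF-16LE with no byte order mark."""
    text = "".join(path + "\r\n" for path in paths)
    return text.encode("utf-16-le")    # this codec never writes a BOM

# Hypothetical entries; directory paths end with a backslash.
entries = [
    "C:\\Perforce\\depot\\",
    "C:\\Perforce\\checkpoint.23.gz",
    "C:\\Perforce\\license",
]

data = selection_file_bytes(entries)
has_bom = data.startswith(codecs.BOM_UTF16_LE)
print(len(data), has_bom)
```

Using the plain "utf-16" codec instead would prepend a byte order mark, which is exactly what the selection file must avoid.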

If you get result code 141 from tar, it probably means that the disk containing the backup directory is full.

If you get result code 31 from ntbackup, it probably means an error occurred when writing the archive file.

Viewing the Results as HTML

JavaScript functions are used in the generated HTML to compute elapsed times and trim leading and trailing whitespace from error text. This approach does not require XSLT extensions.

The overall start and end times (from the /BackupProcedure/Summary element) are used to compute the total elapsed time. This eliminates the accumulated rounding errors that would occur if the elapsed times of each step were summed. Therefore, the total elapsed time displayed in the Steps table might not equal the sum of the elapsed times for each step.
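The rounding effect is easy to reproduce. The sketch below uses made-up step boundaries 1.4 seconds apart and rounds each elapsed time to whole seconds. The generated HTML does the equivalent in JavaScript, and the timestamp precision shown here is an assumption.

```python
from datetime import datetime, timedelta

def rounded_seconds(start, end):
    """Elapsed time between two datetimes, rounded to whole seconds."""
    return round((end - start).total_seconds())

# Made-up boundaries for three steps, each lasting 1.4 seconds.
t0 = datetime(2009, 1, 1, 2, 0, 0)
boundaries = [t0 + timedelta(milliseconds=1400 * i) for i in range(4)]

step_sum = sum(rounded_seconds(a, b) for a, b in zip(boundaries, boundaries[1:]))
total = rounded_seconds(boundaries[0], boundaries[-1])

print(step_sum, total)   # summed step times versus overall elapsed time
```

Each step rounds down to 1 second, so the steps sum to 3, while the overall elapsed time of 4.2 seconds rounds to 4.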

The RSS feeds do not use any embedded HTML. Different web browsers and RSS readers vary in their support for embedded HTML.

To enable greater portability, the scripts used to generate the RSS feeds do not use any XSLT extensions.