As is especially the case when developing software, the data that you maintain under version control is often closely related to, or perhaps dependent upon, someone else's data. Generally, the needs of your project will dictate that you stay as up-to-date as possible with the data provided by that external entity without sacrificing the stability of your own project. This scenario plays itself out all the time—anywhere that the information generated by one group of people has a direct effect on that which is generated by another group.
For example, software developers might be working on an application which makes use of a third-party library. Subversion has just such a relationship with the Apache Portable Runtime library (see the section called “The Apache Portable Runtime Library”). The Subversion source code depends on the APR library for all its portability needs. In earlier stages of Subversion's development, the project closely tracked APR's changing API, always sticking to the “bleeding edge” of the library's code churn. Now that both APR and Subversion have matured, Subversion attempts to synchronize with APR's library API only at well-tested, stable release points.
Now, if your project depends on someone else's information, there are several ways that you could attempt to synchronize that information with your own. Most painfully, you could issue oral or written instructions to all the contributors of your project, telling them to make sure that they have the specific versions of that third-party information that your project needs. If the third-party information is maintained in a Subversion repository, you could also use Subversion's externals definitions to effectively “pin down” specific versions of that information to some location in your own working copy directory (see the section called “Externals Definitions”).
But sometimes you want to maintain custom modifications to third-party data in your own version control system. Returning to the software development example, programmers might need to make modifications to that third-party library for their own purposes. These modifications might include new functionality or bug fixes, maintained internally only until they become part of an official release of the third-party library. Or the changes might never be relayed back to the library maintainers, existing solely as custom tweaks to make the library further suit the needs of the software developers.
Now you face an interesting situation. Your project could house its custom modifications to the third-party data in some disjointed fashion, such as using patch files or full-fledged alternate versions of files and directories. But these quickly become maintenance headaches, requiring some mechanism by which to apply your custom changes to the third-party data, and necessitating regeneration of those changes with each successive version of the third-party data that you track.
The solution to this problem is to use vendor branches. A vendor branch is a directory tree in your own version control system that contains information provided by a third-party entity, or vendor. Each version of the vendor's data that you decide to absorb into your project is called a vendor drop.
Vendor branches provide two key benefits. First, by storing the currently supported vendor drop in your own version control system, the members of your project never need to question whether they have the right version of the vendor's data. They simply receive that correct version as part of their regular working copy updates. Second, because the data lives in your own Subversion repository, you can store your custom changes to it in place; you no longer need an automated (or worse, manual) method for swapping in your customizations.
Managing vendor branches generally works like this. You create a top-level directory (such as /vendor) to hold the vendor branches. Then you import the third-party code into a subdirectory of that top-level directory. You then copy that subdirectory into your main development branch (for example, /trunk) at the appropriate location. You always make your local changes in the main development branch. With each new release of the code you are tracking, you bring it into the vendor branch and merge the changes into /trunk, resolving whatever conflicts occur between your local changes and the upstream changes.
Perhaps an example will help to clarify this algorithm. We'll use a scenario where your development team is creating a calculator program that links against a third-party complex number arithmetic library, libcomplex. We'll begin with the initial creation of the vendor branch and the import of the first vendor drop. We'll call our vendor branch directory libcomplex, and our code drops will go into a subdirectory of our vendor branch called current. And since svn import creates all the intermediate parent directories it needs, we can actually accomplish both of these steps with a single command.
$ svn import /path/to/libcomplex-1.0 \
             http://svn.example.com/repos/vendor/libcomplex/current \
             -m 'importing initial 1.0 vendor drop'
…
We now have the current version of the libcomplex source code in /vendor/libcomplex/current. Now, we tag that version (see the section called “Tags”) and then copy it into the main development branch. Our copy will create a new directory called libcomplex in our existing calc project directory. It is in this copied version of the vendor data that we will make our customizations.
$ svn copy http://svn.example.com/repos/vendor/libcomplex/current  \
           http://svn.example.com/repos/vendor/libcomplex/1.0      \
           -m 'tagging libcomplex-1.0'
…
$ svn copy http://svn.example.com/repos/vendor/libcomplex/1.0  \
           http://svn.example.com/repos/calc/libcomplex        \
           -m 'bringing libcomplex-1.0 into the main branch'
…
We check out our project's main branch, which now includes a copy of the first vendor drop, and we get to work customizing the libcomplex code. Before we know it, our modified version of libcomplex is completely integrated into our calculator program.
A few weeks later, the developers of libcomplex release a new version of their library—version 1.1—which contains some features and functionality that we really want. We'd like to upgrade to this new version, but without losing the customizations we made to the existing version. What we essentially would like to do is to replace our current baseline version of libcomplex 1.0 with a copy of libcomplex 1.1, and then re-apply the custom modifications we previously made to that library to the new version. But we actually approach the problem from the other direction, applying the changes made to libcomplex between versions 1.0 and 1.1 to our modified copy of it.
To perform this upgrade, we check out a copy of our vendor branch and replace the code in the current directory with the new libcomplex 1.1 source code. We quite literally copy new files on top of existing files, perhaps exploding the libcomplex 1.1 release tarball atop our existing files and directories. The goal here is to make our current directory contain only the libcomplex 1.1 code, and to ensure that all that code is under version control. Oh, and we want to do this with as little version control history disturbance as possible.
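As a rough sketch of the "exploding the tarball" step, the helper below unpacks a release archive over an existing working-copy directory. The function name overlay_vendor_drop, the archive name, and the assumption that the archive wraps everything in a single top-level directory (such as libcomplex-1.1/) are illustrative only and are not taken from the libcomplex distribution.

```shell
# Sketch only: unpack a gzip-compressed release tarball on top of an
# existing directory, overwriting files of the same name.
#   $1 = path to the tarball (hypothetical, e.g. libcomplex-1.1.tar.gz)
#   $2 = directory to unpack into (e.g. a checkout of .../current)
overlay_vendor_drop() {
    tarball=$1
    dest=$2
    # --strip-components=1 drops the archive's single top-level directory
    # so the files land directly in $dest (supported by GNU tar and bsdtar).
    tar -xzf "$tarball" -C "$dest" --strip-components=1
}

# Example with hypothetical paths (not executed here):
#   svn checkout http://svn.example.com/repos/vendor/libcomplex/current current
#   overlay_vendor_drop /path/to/libcomplex-1.1.tar.gz current
```

Unpacking this way overwrites and adds files but never removes anything, so any file that upstream dropped in 1.1 still has to be identified and removed with svn delete.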
After replacing the 1.0 code with 1.1 code, svn status will show files with local modifications as well as, perhaps, some unversioned or missing files. If we did what we were supposed to do, the unversioned files are only those new files introduced in the 1.1 release of libcomplex; we run svn add on those to get them under version control. The missing files are files that were in 1.0 but not in 1.1, and on those paths we run svn delete. Finally, once our current working copy contains only the libcomplex 1.1 code, we commit the changes we made to get it looking that way.
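The add and delete steps can be partly scripted. The sketch below filters svn status output by its first-column status code, where ? marks an unversioned item and ! marks a missing one. The helper name paths_with_status is hypothetical, and the parsing assumes the status code is followed only by whitespace and then the path, so paths that begin with whitespace are not handled.

```shell
# Sketch only: read `svn status` output on stdin and print the paths
# whose first-column status code equals $1 ('?' or '!').
paths_with_status() {
    code=$1
    while IFS= read -r line; do
        case $line in
            "$code"*)
                # Drop the status character, then any leading whitespace.
                rest=${line#?}
                rest=${rest#"${rest%%[![:space:]]*}"}
                printf '%s\n' "$rest"
                ;;
        esac
    done
}

# Example use inside the vendor working copy (not executed here):
#   svn status | paths_with_status '?' | while IFS= read -r p; do svn add "$p"; done
#   svn status | paths_with_status '!' | while IFS= read -r p; do svn delete "$p"; done
```

Reviewing the two lists before acting on them is still worthwhile, since a file that was renamed upstream shows up as one missing path and one unversioned path.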
Our current branch now contains the new vendor drop. We tag the new version (in the same way we previously tagged the version 1.0 vendor drop), and then merge the differences between the tag of the previous version and the new current version into our main development branch.
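The tagging step is not spelled out in the commands that follow. Assuming the same repository layout as the 1.0 example, it might look like the sketch below. The function name tag_vendor_drop and the SVN variable (which lets you substitute echo for a dry run) are hypothetical conveniences, not part of Subversion.

```shell
# Sketch only: tag the current vendor drop by copying it to a
# version-named sibling directory, mirroring the earlier 1.0 tag.
#   $1 = URL of the vendor branch (the parent of "current")
#   $2 = version label to use as the tag name
tag_vendor_drop() {
    base=$1
    version=$2
    # ${SVN:-svn} runs the real client unless SVN is set to something else.
    "${SVN:-svn}" copy "$base/current" "$base/$version" \
        -m "tagging libcomplex-$version"
}

# Example (not executed here):
#   tag_vendor_drop http://svn.example.com/repos/vendor/libcomplex 1.1
```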
$ cd working-copies/calc
$ svn merge http://svn.example.com/repos/vendor/libcomplex/1.0      \
            http://svn.example.com/repos/vendor/libcomplex/current  \
            libcomplex
… # resolve all the conflicts between their changes and our changes
$ svn commit -m 'merging libcomplex-1.1 into the main branch'
…
In the trivial use case, the new version of our third-party tool would look, from a files-and-directories point of view, just like the previous version. None of the libcomplex source files would have been deleted, renamed or moved to different locations—the new version would contain only textual modifications against the previous one. In a perfect world, our modifications would apply cleanly to the new version of the library, with absolutely no complications or conflicts.
But things aren't always that simple, and in fact it is quite common for source files to get moved around between releases of software. This complicates the process of ensuring that our modifications are still valid for the new version of code, and can quickly degrade into a situation where we have to manually recreate our customizations in the new version. Once Subversion knows about the history of a given source file—including all its previous locations—the process of merging in the new version of the library is pretty simple. But we are responsible for telling Subversion how the source file layout changed from vendor drop to vendor drop.
Vendor drops that contain more than a few deletes, additions and moves complicate the process of upgrading to each successive version of the third-party data. So Subversion supplies the svn_load_dirs.pl script to assist with this process. This script automates the importing steps we mentioned in the general vendor branch management procedure to make sure that mistakes are minimized. You will still be responsible for using the merge commands to merge the new versions of the third-party data into your main development branch, but svn_load_dirs.pl can help you more quickly and easily arrive at that stage.
In short, svn_load_dirs.pl is an enhancement to svn import that has several important characteristics:
It can be run at any point in time to bring an existing directory in the repository to exactly match an external directory, performing all the necessary adds and deletes, and optionally performing moves, too.
It takes care of complicated series of operations between which Subversion requires an intermediate commit—such as before renaming a file or directory twice.
It will optionally tag the newly imported directory.
It will optionally add arbitrary properties to files and directories that match a regular expression.
svn_load_dirs.pl takes three mandatory arguments. The first argument is the URL to the base Subversion directory to work in. This argument is followed by the URL—relative to the first argument—into which the current vendor drop will be imported. Finally, the third argument is the local directory to import. Using our previous example, a typical run of svn_load_dirs.pl might look like:
$ svn_load_dirs.pl http://svn.example.com/repos/vendor/libcomplex \
                   current                                        \
                   /path/to/libcomplex-1.1
…
You can indicate that you'd like svn_load_dirs.pl to tag the new vendor drop by passing the -t command-line option and specifying a tag name. This tag is another URL relative to the first program argument.
$ svn_load_dirs.pl -t libcomplex-1.1                              \
                   http://svn.example.com/repos/vendor/libcomplex \
                   current                                        \
                   /path/to/libcomplex-1.1
…
When you run svn_load_dirs.pl, it examines the contents of your existing “current” vendor drop, and compares them with the proposed new vendor drop. In the trivial case, there will be no files that are in one version and not the other, and the script will perform the new import without incident. If, however, there are discrepancies in the file layouts between versions, svn_load_dirs.pl will prompt you for how you would like to resolve those differences. For example, you will have the opportunity to tell the script that you know that the file math.c in version 1.0 of libcomplex was renamed to arithmetic.c in libcomplex 1.1. Any discrepancies not explained by moves are treated as regular additions and deletions.
The script also accepts a separate configuration file for setting properties on files and directories matching a regular expression that are added to the repository. This configuration file is specified to svn_load_dirs.pl using the -p command-line option. Each line of the configuration file is a whitespace-delimited set of two or four values: a Perl-style regular expression to match the added path against, a control keyword (either break or cont), and then optionally a property name and value.
\.png$         break   svn:mime-type   image/png
\.jpe?g$       break   svn:mime-type   image/jpeg
\.m3u$         cont    svn:mime-type   audio/x-mpegurl
\.m3u$         break   svn:eol-style   LF
.*             break   svn:eol-style   native
For each added path, the configuration file is read from top to bottom, and the property change from every line whose regular expression matches the path is applied in order. If a matching line's control specification is break, that line's property change is applied and no further lines are considered for that path. If the control specification is cont (an abbreviation for continue), matching continues with the next line of the configuration file.
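To make the break and cont behavior concrete, the sketch below walks a rule list the way the preceding paragraph describes and prints the properties a given path would receive. It is an illustration, not the script's actual code: the function name props_for_path is hypothetical, it uses grep -E rather than Perl regular expressions, and it ignores the quoting and backslash rules for fields that contain whitespace.

```shell
# Sketch only: read rules ("REGEX CONTROL [PROPNAME PROPVALUE]") on stdin
# and print NAME=VALUE for each property that would apply to path $1.
props_for_path() {
    target=$1
    while read -r regex control name value; do
        # Skip blank lines.
        if [ -z "$regex" ]; then
            continue
        fi
        if printf '%s\n' "$target" | grep -Eq -- "$regex"; then
            # A matching rule contributes its property, if it names one.
            if [ -n "$name" ]; then
                printf '%s=%s\n' "$name" "$value"
            fi
            # "break" stops matching; "cont" falls through to the next rule.
            if [ "$control" = break ]; then
                break
            fi
        fi
    done
    return 0
}

# Example (not executed here; props.conf is a hypothetical file name):
#   props_for_path playlist.m3u < props.conf
```

With the sample configuration shown earlier, a path ending in .m3u picks up both svn:mime-type and svn:eol-style LF, and never reaches the catch-all .* rule.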
Any whitespace in the regular expression, property name, or property value must be surrounded by either single or double quote characters. You can escape quote characters that are not used for wrapping whitespace by preceding them with a backslash (\) character. The backslash escapes only quotes when parsing the configuration file, so do not protect any other characters beyond what is necessary for the regular expression.