Migrate from SVN to Git with Full History (Step-by-Step)

Migrating from Subversion (SVN) to Git is a common task for teams seeking the benefits of distributed version control, including improved branching, merging, offline work, and overall performance. However, simply copying files isn’t enough; preserving the complete history of changes is crucial for maintaining context, auditing, and the ability to revert to previous states. This guide provides a detailed, step-by-step approach to migrating from SVN to Git, ensuring full history preservation.

I. Why Migrate from SVN to Git?

Before diving into the technical details, let’s briefly recap the key motivations for migrating:

  • Distributed Version Control: Git’s distributed nature allows every developer to have a complete local copy of the repository, enabling offline work and reducing reliance on a central server.
  • Branching and Merging: Git’s branching and merging model is significantly more flexible and efficient than SVN’s. Creating and merging branches is fast and lightweight, encouraging experimentation and feature development in isolation.
  • Performance: Git generally offers better performance for most operations, especially branching, merging, and history browsing.
  • Community and Ecosystem: Git has a vast and active community, providing extensive documentation, tools, and support. The Git ecosystem is rich with integrations and extensions.
  • Modern Workflow: Git aligns better with modern development workflows, such as Continuous Integration/Continuous Delivery (CI/CD) and DevOps practices.
  • Staging Area: The staging area (or index) lets developers choose exactly which changes go into each commit.
  • Smaller Repository Size: Even with full history, Git repositories are often smaller than their SVN counterparts because Git compresses and deduplicates stored objects. Repositories dominated by large binary files can be an exception.

II. Pre-Migration Planning and Preparation

Successful migration requires careful planning. Here’s what you need to do before starting the technical process:

  1. Choose a Migration Strategy:

    • One-Time Migration (Dump and Load): The most common approach. The entire SVN repository is converted to Git in a single operation. This is suitable for most projects.
    • Incremental Migration (Mirroring): Less common, but useful for very large repositories or situations where downtime needs to be minimized. Changes are mirrored from SVN to Git over time. This is more complex to set up and maintain.
    • Phased Migration (By Branch/Directory): Migrate specific branches or directories from SVN to Git incrementally. Useful if you only need to preserve the history of certain parts of the project.

    This guide will primarily focus on the one-time migration strategy, as it’s the most straightforward and applicable to the majority of cases.

  2. Assess Your SVN Repository:

    • Repository Size: Large repositories will take longer to migrate. Estimate the size to anticipate the time required.
    • Branching and Tagging Structure: Understand how branches and tags are used in your SVN repository. Git and SVN handle these differently, so you’ll need a plan to map them. Specifically:
      • Standard Layout: Does your repository follow the standard trunk, branches, tags structure? This makes migration easier.
      • Non-Standard Layout: If you have a custom layout, you’ll need to specify how to map these directories to Git branches and tags.
    • External Dependencies (svn:externals): Identify any svn:externals properties. These need to be handled separately, either by converting them to Git submodules or by incorporating the external projects directly into the new Git repository.
    • Binary Files: Large binary files can bloat the Git repository. Consider using Git LFS (Large File Storage) to manage them efficiently (more on this later).
    • Empty Directories: SVN tracks empty directories, while Git does not. You’ll need a strategy to handle these (usually by adding a .gitkeep file).
    • Commit Messages: Review your SVN commit messages. You might want to clean them up or add more context before migrating.
    • Usernames: Map SVN usernames to Git author names and email addresses. This is crucial for preserving authorship information correctly.
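The layout check above can be sketched as a small shell helper. This is a minimal sketch, not part of git-svn: the function name detect_layout is hypothetical, and it assumes the output of `svn ls` on the project root is piped in, with directories listed with a trailing slash.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a repository layout from `svn ls` output on stdin.
# Prints "standard" when trunk/, branches/ and tags/ all appear at the top level,
# otherwise prints "non-standard".
detect_layout() {
  local listing dir
  listing="$(cat)"
  for dir in trunk branches tags; do
    # -x matches the whole line, so "old-trunk/" does not count as "trunk/"
    if ! printf '%s\n' "$listing" | grep -qx "${dir}/"; then
      echo "non-standard"
      return 0
    fi
  done
  echo "standard"
}

# Usage (assumes the svn client is installed; the URL is a placeholder):
#   svn ls https://svn.example.com/repo/project | detect_layout
```

A "non-standard" result means the trunk, branches, and tags paths have to be mapped explicitly during the clone.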
  3. Prepare Your Environment:

    • Install Git: Ensure Git is installed on the machine you’ll be using for the migration. Use your operating system’s package manager or download it from the official Git website (https://git-scm.com/).
    • Install Subversion: You’ll need the SVN command-line tools to interact with your SVN repository.
    • Install git-svn: This is the crucial tool that bridges Git and SVN. It’s usually included with Git, but you might need to install it separately on some systems. Verify its installation by running git svn --version.
    • Install a Java Runtime Environment (JRE) (if using SubGit): SubGit, a powerful alternative to git-svn, requires a JRE.
    • Sufficient Disk Space: Ensure you have enough disk space for both the SVN checkout and the new Git repository. The Git repository will often be smaller, but it’s best to be prepared.
  4. Communicate with Your Team:

    • Inform your team about the migration plan and timeline.
    • Establish a “code freeze” period during the migration to avoid conflicts. This is essential to ensure a clean and consistent migration.
    • Provide training on Git for team members who are unfamiliar with it.
  5. Create an Authors Mapping File:

    • SVN stores only a username for each commit, whereas Git stores a name and an email address.
    • To link SVN users to Git users, we need to create an authors mapping file.
    • Create a text file (e.g., authors.txt) with the following format:
      svn_username1 = Git Name 1 <user1@example.com>
      svn_username2 = Git Name 2 <user2@example.com>
      ...
    • You can often get a list of SVN usernames using: svn log -q | awk -F '|' '/^r/ {gsub(/ /, "", $2); print $2}' | sort | uniq. You’ll then need to manually map these to Git names and email addresses.
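As a rough sketch of how the mapping file could be bootstrapped and checked, the two helpers below work on plain text only. The function names (authors_template, missing_authors) and the example.com addresses are placeholders invented for illustration; real names and emails still have to be filled in by hand.

```bash
#!/usr/bin/env bash
# Reads `svn log -q` output on stdin and prints one template line per username.
# The name and the example.com address are placeholders to be edited manually.
authors_template() {
  awk -F '|' '/^r[0-9]+ / { gsub(/^ +| +$/, "", $2); print $2 }' \
    | sort -u \
    | awk '{ printf "%s = %s <%s@example.com>\n", $0, $0, $0 }'
}

# Reads usernames (one per line) on stdin and prints those that have no
# entry in the authors file passed as the first argument.
missing_authors() {
  local authors_file="$1" user
  while IFS= read -r user; do
    [ -n "$user" ] || continue
    # Compare against the text left of "=" as an exact, fixed-string match
    cut -d '=' -f 1 "$authors_file" | sed 's/[[:space:]]*$//' | grep -qxF -- "$user" \
      || printf '%s\n' "$user"
  done
}

# Usage (the URL is a placeholder):
#   svn log -q https://svn.example.com/repo/project | authors_template > authors.txt
```

Running missing_authors against the edited file before the clone helps catch usernames that would otherwise stop git svn clone partway through.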

III. Migration Methods: git-svn vs. SubGit

There are two primary tools for migrating from SVN to Git with history:

  • git-svn: A built-in Git command that provides bidirectional communication between Git and SVN repositories. It’s readily available, but can be slower for large repositories and may have limitations with complex SVN setups.
  • SubGit: A commercial tool (with a free option for open-source projects and small teams) that offers a more robust and often faster migration process. It handles complex scenarios more reliably than git-svn.

This guide will cover both methods, starting with git-svn and then exploring SubGit.

IV. Migration Using git-svn (Step-by-Step)

This section provides a detailed walkthrough of the migration process using git-svn.

  1. Clone the SVN Repository (with History):

    The core command is git svn clone. Here’s a breakdown of the options:

    bash
    git svn clone <SVN_REPOSITORY_URL> <DESTINATION_DIRECTORY> \
    --stdlayout \
    --authors-file=authors.txt \
    --no-metadata

    • <SVN_REPOSITORY_URL>: The URL of your SVN repository (e.g., https://svn.example.com/repo/project).
    • <DESTINATION_DIRECTORY>: The local directory where the Git repository will be created (e.g., my-git-repo).
    • --stdlayout: Tells git-svn that your SVN repository follows the standard trunk, branches, tags layout. This is highly recommended.
    • --authors-file=authors.txt: Specifies the path to your authors mapping file. This is essential for correct authorship.
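    • --no-metadata: Prevents git-svn from appending a git-svn-id: line to every commit message, which gives a cleaner Git history. The git-svn documentation describes this option as intended for one-shot imports, which matches the one-time migration covered here.
    • --prefix=origin/ (used in the examples below): Adds a prefix to the remote-tracking branches created by git-svn, so SVN branches appear as origin/<name>. This has been the default since Git 2.0; stating it explicitly keeps the conversion commands in the next step predictable.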

    Example (Standard Layout):

    bash
    git svn clone https://svn.example.com/repo/project my-git-repo \
    --stdlayout \
    --authors-file=authors.txt \
    --no-metadata --prefix=origin/

    Handling Non-Standard Layouts:

    If your SVN repository doesn’t use the standard layout, you’ll need to specify the paths to your trunk, branches, and tags using the following options:

    • -T <trunk_path>: Path to the trunk (e.g., -T main).
    • -b <branches_path>: Path to the branches directory (e.g., -b features).
    • -t <tags_path>: Path to the tags directory (e.g., -t releases).

    Example (Non-Standard Layout):

    bash
    git svn clone https://svn.example.com/repo/project my-git-repo \
    -T main \
    -b features \
    -t releases \
    --authors-file=authors.txt \
    --no-metadata --prefix=origin/

    Important Considerations:

    • This process can take a very long time, especially for large repositories. Be patient and monitor the progress.
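    • If the process is interrupted, you can often resume it by running git svn fetch within the newly created Git repository. With --no-metadata, resuming relies on the revision map files that git-svn keeps under .git/svn, so leave that directory in place until the import has finished.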
    • git-svn fetches revisions in batches. You’ll see output indicating the progress.
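For long imports over an unreliable connection, the resume step can be wrapped in a retry loop. The sketch below is a generic helper, not a git-svn feature; the function name retry and the RETRY_DELAY variable are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Runs a command until it succeeds or the attempt limit is reached.
# Arguments: <max_attempts> <command> [args...]
# RETRY_DELAY (seconds between attempts) defaults to 30.
retry() {
  local max_attempts="$1" attempt=1
  shift
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-30}"
  done
}

# Usage, from inside the partially cloned repository:
#   cd my-git-repo && retry 5 git svn fetch
```

Capping the number of attempts keeps a persistent failure, such as bad credentials, from looping forever.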
  2. Convert SVN Branches and Tags to Git Branches and Tags:

    After the initial clone, git-svn creates remote-tracking branches for all SVN branches and tags. We need to convert these to native Git branches and tags.

    bash
    cd my-git-repo

    # Convert SVN branches to local Git branches (skip trunk and the tag refs)
    for branch in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin \
        | grep -v -e '^origin/trunk$' -e '^origin/tags/' | sed 's|^origin/||'); do
      git branch "$branch" "origin/$branch"
    done

    # Convert SVN tags to Git tags
    for tag in $(git for-each-ref --format='%(refname:short)' refs/remotes/origin/tags \
        | sed 's|^origin/tags/||'); do
      git tag "$tag" "origin/tags/$tag"
    done

    # Clean up the remote-tracking branches
    git for-each-ref --format='%(refname:short)' refs/remotes/origin \
      | xargs -n 1 git branch -r -d

    Explanation:

    • The first for loop iterates through all remote-tracking branches except origin/trunk and the refs under origin/tags/, and creates a local Git branch with the same name, pointing to the corresponding remote-tracking branch. Excluding the tag refs here prevents SVN tags from also being created as branches named tags/<name>.
    • The second for loop iterates through all remote-tracking branches that start with origin/tags/, and creates a lightweight Git tag with the same name, pointing to the corresponding remote-tracking branch.
    • The last command removes all remote-tracking branches with git branch -r -d.
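Before running the conversion for real, it can be useful to preview what it would do. The helper below is a hypothetical dry-run sketch: plan_ref_conversion is an invented name, and it assumes the origin/ prefix used in the clone examples. It also skips refs with an @revision suffix, which git-svn sometimes creates for branches that were deleted and re-created in SVN.

```bash
#!/usr/bin/env bash
# Reads remote-tracking ref names on stdin (one per line, e.g. "origin/feature-x")
# and prints the git commands a conversion would run, without executing them.
plan_ref_conversion() {
  local ref name
  while IFS= read -r ref; do
    case "$ref" in
      origin/trunk)
        ;;  # trunk is the main line, so no extra branch is planned for it
      *@*)
        echo "# skipped peg-revision ref: $ref" ;;
      origin/tags/*)
        name="${ref#origin/tags/}"
        echo "git tag \"$name\" \"$ref\"" ;;
      origin/*)
        name="${ref#origin/}"
        echo "git branch \"$name\" \"$ref\"" ;;
    esac
  done
}

# Usage inside the cloned repository:
#   git for-each-ref --format='%(refname:short)' refs/remotes/origin | plan_ref_conversion
```

Reviewing the printed commands first makes it easier to spot unexpected refs before any branch or tag is created.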

  3. Clean Up the Repository (Optional but Recommended):

    • Remove SVN-Specific Metadata: If the clone was made without --no-metadata, every commit message ends with a git-svn-id: line. You can strip these lines with git filter-branch or with the more modern and safer git filter-repo. This is a more advanced topic, but it results in a cleaner Git history.

      • Using git filter-repo (Recommended): This is a safer and more user-friendly alternative to git filter-branch. You might need to install it separately (e.g., pip install git-filter-repo).

        bash
        git filter-repo --force --prune-empty always --commit-callback '
        if b"git-svn-id:" in commit.message:
            commit.message = commit.message.split(b"git-svn-id:")[0].rstrip() + b"\n"
        '

        This command cuts each commit message off at its git-svn-id line. The --prune-empty always option additionally drops commits that change no files. The body of the callback is Python, so the indentation of the second line matters.

    • Garbage Collection: Run git gc --prune=now to optimize the repository and remove any unreachable objects.

    bash
    git gc --prune=now

  4. Handle svn:externals (Important):

    svn:externals are pointers to other SVN repositories. git-svn doesn’t automatically handle these. You have a few options:

    • Convert to Git Submodules: This is the recommended approach if you want to maintain separate repositories for the external projects. You’ll need to manually create Git repositories for each external project, migrate them, and then add them as submodules to your main Git repository.
    • Incorporate into the Main Repository: If the external projects are small or tightly coupled with your main project, you can copy their contents directly into your main Git repository. This simplifies the structure but loses the ability to track changes in the external projects separately.
    • Ignore: If the externals are not essential, you can simply ignore them.

    Example (Converting to Git Submodules – Conceptual):

    1. Migrate each external SVN repository to a separate Git repository.
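    2. Record each external’s URL and target path from the SVN side (for example with svn propget svn:externals -R in an SVN working copy). The property itself does not carry over to Git, so there is nothing to remove in the Git repository.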
    3. Add each external project as a Git submodule using git submodule add <repository_url> <path>.
    4. Commit the changes.
  5. Handle empty directories:
    As mentioned before, SVN tracks empty directories, while Git does not. There are several methods to handle this:

    • Add a .gitkeep file: The most common solution is to add an empty file named .gitkeep (or any other name, but .gitkeep is conventional) to each empty directory that you want to preserve.
    • Add a .gitignore file: If you want to track the directory itself, but not its contents (e.g., a directory for temporary files), you can create a .gitignore file within the directory with the following contents:
      # Ignore everything in this directory
      *
      # Except this file
      !.gitignore
    • Ignore empty directories: If you don’t need the empty directories at all, simply do nothing. Git will ignore them.

    Here’s how you might find and add .gitkeep files (the -prune clause keeps find out of the .git directory, which contains empty directories of its own):
    bash
    find . -path ./.git -prune -o -type d -empty -print0 | xargs -0 -I {} touch {}/.gitkeep
    git add .
    git commit -m "Add .gitkeep files to preserve empty directories"

  6. Verify the Migration:

    • Browse the Git History: Use git log and gitk to examine the commit history and ensure it’s complete and accurate.
    • Check Out Different Branches and Tags: Verify that all branches and tags were correctly converted.
    • Compare the Working Directory: Compare the contents of the Git working directory with a fresh checkout from SVN to ensure that all files were migrated. You can use diff or a similar tool for this.
    • Run Tests: Execute your project’s test suite to ensure that everything is working as expected.
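The working-directory comparison can be sketched with diff. This is a minimal sketch: compare_trees is a hypothetical helper name, the URL and paths in the usage comment are placeholders, and the -x exclusions assume GNU or BSD diff.

```bash
#!/usr/bin/env bash
# Compares two directory trees while ignoring version-control metadata and
# .gitkeep placeholders. Follows diff's convention: exit status 0 when the
# trees match, 1 when they differ.
compare_trees() {
  local git_dir="$1" svn_dir="$2"
  diff -r -q -x .git -x .svn -x .gitkeep "$git_dir" "$svn_dir"
}

# Usage (URL and directories are placeholders):
#   svn export https://svn.example.com/repo/project/trunk /tmp/svn-export
#   compare_trees my-git-repo /tmp/svn-export && echo "trees match"
```

Some differences are expected and harmless, such as empty directories that exist only in the SVN export or files where SVN expands keywords like $Id$.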
  7. Push to a Remote Git Repository (e.g., GitHub, GitLab, Bitbucket):

    bash
    git remote add origin <GIT_REPOSITORY_URL>
    git push -u origin --all # Push all branches
    git push origin --tags # Push all tags

    • <GIT_REPOSITORY_URL>: The URL of your new Git repository (e.g., git@github.com:your-username/your-repo.git).

V. Migration Using SubGit (Step-by-Step)

SubGit provides a more streamlined and often faster migration experience. Here’s how to use it:

  1. Install SubGit:

    Download SubGit from the official website (https://subgit.com/). You’ll need a Java Runtime Environment (JRE) installed. Follow the installation instructions provided by SubGit.

  2. Configure SubGit:

    bash
    subgit configure <SVN_REPOSITORY_URL> <GIT_REPOSITORY_PATH>

    • <SVN_REPOSITORY_URL>: The URL of your SVN repository.
    • <GIT_REPOSITORY_PATH>: The local path to the new, empty Git repository that SubGit will create. This is different from git-svn, where the Git repository is created during the clone process. SubGit requires you to specify an existing empty path.

    This command will create a configuration file (subgit/config) within the Git repository directory.

  3. Edit the Configuration File (subgit/config):

    Open the subgit/config file in a text editor. You’ll need to make several adjustments:

    • trunk, branches, tags: If your SVN repository uses a non-standard layout, adjust these paths accordingly. The syntax is slightly different from git-svn.

      [svn]
      url = <SVN_REPOSITORY_URL>
      trunk = trunk:refs/heads/main ; Use 'main' as the main branch
      branches = branches/*:refs/heads/*
      tags = tags/*:refs/tags/*

    • authorsFile: Specify the path to your authors mapping file.

      [core]
      authorsFile = /path/to/authors.txt

    • excludePath: You can exclude specific paths from the migration.
    • svn:externals handling: SubGit can handle svn:externals in one of three ways, selected in the subgit/config file. Option names and defaults can differ between SubGit versions, so confirm them against the SubGit documentation for your release:
      • externalsImport = none: Ignore svn:externals.
      • externalsImport = commit: Import the external as a regular commit (default behavior).
      • externalsImport = submodule: Import the external as a submodule (requires a configuration file).

  4. Import the SVN Repository:

    bash
    subgit import <GIT_REPOSITORY_PATH>

    This command performs the actual migration. SubGit will fetch the SVN history and convert it to Git. This process can also take time, but it’s generally faster than git-svn.

  5. Configure a Mirror (Optional):
    SubGit can also set up a continuous mirror between the SVN and Git repositories, which allows for incremental migration and a smoother transition period. This is done with the subgit install command. We won’t cover that here, as we’re focusing on a one-time migration.

  6. Verify and Push (Same as with git-svn):
    After the import, follow the same verification steps as described in the git-svn section. Push the repository to your remote Git hosting service.

VI. Handling Large Files with Git LFS

If your SVN repository contains large binary files (e.g., images, videos, compiled binaries), it’s highly recommended to use Git LFS (Large File Storage) to manage them efficiently. Git LFS replaces large files with text pointers in the Git repository, while storing the actual file content on a separate server. This keeps the Git repository small and fast.

  1. Install Git LFS: Download and install Git LFS from https://git-lfs.github.com/.
  2. Initialize Git LFS in Your Repository:

    bash
    git lfs install

  3. Track Large Files:

    Specify the file types you want to track with Git LFS using git lfs track.

    bash
    git lfs track "*.psd" # Track Photoshop files
    git lfs track "*.zip" # Track ZIP archives
    git lfs track "path/to/large/files/*" # Track all files in a directory

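    These patterns are stored in a .gitattributes file, which you should commit to your repository. Tracking only affects commits made from this point on: files that are already part of the migrated history remain ordinary Git objects unless the history is rewritten, for example with git lfs migrate import.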

  4. Commit and Push:

    bash
    git add .gitattributes
    git add . # Add any new or modified large files
    git commit -m "Add large files with Git LFS"
    git push origin --all
    git push origin --tags

    When you push, Git LFS will upload the large files to the LFS server.
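To decide which patterns are worth tracking, it helps to list the largest files in the migrated history. The sketch below is one way to do that with standard Git plumbing; large_blobs is a hypothetical helper name and the 10 MiB threshold is only an example.

```bash
#!/usr/bin/env bash
# Reads lines of the form "<type> <sha> <size> <path>" on stdin and prints
# "<size> <path>" for blobs at or above the threshold (in bytes), largest first.
large_blobs() {
  local threshold="$1"
  awk -v min="$threshold" '
    $1 == "blob" && $3 + 0 >= min + 0 {
      size = $3
      $1 = $2 = $3 = ""     # drop type, sha and size, keeping the path
      sub(/^ +/, "")        # (runs of spaces inside a path collapse to one)
      print size, $0
    }' | sort -rn
}

# Usage inside the migrated repository:
#   git rev-list --objects --all \
#     | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' \
#     | large_blobs 10485760
```

If large files turn up in old commits, one option is to convert them before the first push with a command such as git lfs migrate import --include="*.psd,*.zip" --everything. This rewrites history, so run it on a backup copy first.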

VII. Troubleshooting Common Issues

  • git svn clone Fails:

    • Network Connectivity: Ensure you have a stable internet connection and can access the SVN server.
    • Authentication: Verify your SVN credentials. You might need to provide your username and password interactively or use an authentication helper.
    • Repository URL: Double-check the SVN repository URL for typos.
    • Server-Side Issues: The SVN server might be down or experiencing problems.
    • Non-Empty Destination: Make sure the destination directory either does not exist yet or is empty, and that it doesn’t already contain a Git repository.
    • Out of Memory: Very large repositories can cause git-svn to run out of memory. Try increasing the available memory or using SubGit.
  • Incorrect Authors:

    • Missing or Incorrect authors.txt: Ensure the authors.txt file is correctly formatted and specified in the git svn clone command.
    • Username Mismatches: Double-check that the usernames in the authors.txt file match the usernames in the SVN repository.
  • Slow Performance:

    • Large Repository: git-svn can be slow for large repositories. Consider using SubGit.
    • Network Speed: A slow network connection can significantly impact performance.
    • Server Load: A heavily loaded SVN server can also slow down the migration.
  • Problems with branches or tags:

    • Ensure you use the correct layout options (--stdlayout or -T, -b, -t).
    • Carefully review the conversion steps for branches and tags.
  • Errors during git filter-repo (or git filter-branch):

    • Syntax Errors: Double-check the command syntax and any regular expressions used.
    • File Not Found: Ensure that any files referenced in the command exist.
    • Backup: Always have a backup of your repository before using git filter-branch or git filter-repo, as these commands rewrite history. git filter-repo is safer, but still requires caution.
  • SubGit Installation or Configuration Issues:

    • Java Version: Ensure you have a compatible Java Runtime Environment (JRE) installed.
    • Configuration File: Carefully review the subgit/config file for any errors.
    • Permissions: Make sure you have the necessary permissions to write to the Git repository directory.

VIII. Post-Migration Steps

  1. Update Workflows: Adapt your development workflows to use Git’s features, such as branching, merging, and pull requests.
  2. Train Your Team: Provide training and resources to help your team members learn Git.
  3. Decommission the SVN Server: Once you’re confident that the migration is successful and your team is comfortable with Git, you can decommission the SVN server. Keep a backup of the SVN repository for archival purposes.
  4. Update CI/CD Pipelines: Update any CI/CD pipelines to point to the new Git repository.
  5. Update Documentation: Update any project documentation that references the SVN repository.
  6. Communicate: Inform all stakeholders that the migration is complete and that they should now use the Git repository.

IX. Conclusion

Migrating from SVN to Git with full history preservation is a significant but achievable undertaking. By carefully planning the migration, choosing the right tools (either git-svn or SubGit), and following the steps outlined in this guide, you can successfully transition to Git and enjoy its many benefits. Remember to thoroughly test and verify the migration before decommissioning your SVN server. The effort invested in a well-executed migration will pay off in the long run with improved developer productivity, collaboration, and a more modern version control system.
