Backing up our Bitbucket Git repositories

by | Jan 8, 2020 | Programming, Tech

We are using the cloud version of Bitbucket for our internal projects. A while back I implemented a solution for backing up our Bitbucket Git repositories, but after some API changes the tool needed adapting. That gave me a good reason to review and clean up the solution. Since I went through a few hoops to get it working, I thought writing it up would be useful for others.


How it works

The solution was designed to be automated: newly added projects and repositories should be picked up and backed up without any changes to the tool. To that end, I came up with the following flow.

  • List all repositories using the BB API
  • Iterate through the repositories
    • Check if the state of the repository has already been backed up
    • Clone the Git repository
    • Make sure LFS data is downloaded
  • Report success or failure
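
The decision in the "has it been backed up" step can be sketched as a small pure function. This is a minimal illustration, not the post's actual code: the `RepoSummary` shape, the folder-name format and the function names are assumptions for the example, matching the time-stamped folder scheme described later in the post.

```typescript
// Hypothetical sketch: given repositories from the API and the timestamped
// backup folders already on disk, pick the repos whose latest state is not
// yet backed up.
interface RepoSummary {
  slug: string;
  updated_on: string; // ISO timestamp from the Bitbucket API
}

// Assumed folder-name format: <slug>/<YYYY-MM-DD-hh-mm-ss>
function backupFolderName(repo: RepoSummary): string {
  // Reduce the ISO timestamp to a filesystem-safe form
  const stamp = repo.updated_on.replace(/[:T]/g, "-").slice(0, 19);
  return `${repo.slug}/${stamp}`;
}

function reposNeedingBackup(repos: RepoSummary[], existingFolders: Set<string>): RepoSummary[] {
  // A repo needs backing up when no folder for its latest timestamp exists
  return repos.filter(r => !existingFolders.has(backupFolderName(r)));
}
```

Everything else in the flow hangs off this check: repos that already have a folder for their latest `updated_on` value are skipped entirely.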

The Code

My preferred framework and language for a task like this are Node.js and TypeScript. I considered using a Git wrapper library, but when I looked into what the application actually needed, it was simpler to run git commands directly.

I also made sure to set up linting to enforce our coding standards. Visual Studio Code has a neat lint extension with automatic code fixes, which makes a coder’s life easier when identifying and fixing code style issues.

I added basic functions that wrap HTTP requests (using request.js) and return the results via promises. This allows using the async keyword, which keeps the code clean and readable. I am aware there are modules that do this, but the code I needed amounted to about four functions; pulling in a library for that felt like overkill.
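
The post wraps request.js; as a dependency-free sketch of the same idea, here is a typed GET helper built on Node’s built-in http/https modules. The `getJson` name and the auth-parameter shape are assumptions for illustration, not the post’s actual API.

```typescript
// Sketch: a promise-returning, typed JSON GET so callers can use async/await.
import * as http from "http";
import * as https from "https";

export function getJson<T>(uri: string, auth?: { user: string; pass: string }): Promise<T> {
  const lib = uri.startsWith("https") ? https : http;
  const headers: http.OutgoingHttpHeaders = {};
  if (auth) {
    // Basic auth header, equivalent to embedding credentials in the URL
    headers.Authorization =
      "Basic " + Buffer.from(`${auth.user}:${auth.pass}`).toString("base64");
  }
  return new Promise<T>((resolve, reject) => {
    lib.get(uri, { headers }, (res) => {
      let data = "";
      res.on("data", (chunk) => (data += chunk));
      res.on("end", () => {
        if (!res.statusCode || res.statusCode < 200 || res.statusCode >= 300) {
          return reject(new Error(`HTTP ${res.statusCode} for ${uri}`));
        }
        try {
          resolve(JSON.parse(data) as T); // parse the body into the expected type
        } catch (err) {
          reject(err);
        }
      });
    }).on("error", reject);
  });
}
```

With a wrapper like this, a caller can simply `await getJson<RepositoryListResponse>(url, credentials)`.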

To quickly generate placeholder classes and interfaces for the Bitbucket response data, I used quicktype.io to generate the TypeScript code from JSON. It is an extremely useful tool to have on your list.

Accessing the Bitbucket API

To access the Bitbucket API for private repositories, requests must be authenticated. Version 2.0 of the API supports OAuth, tokens and app passwords. The simplest option is an app password. To set one up, navigate to your account’s settings and create one. Make sure to write it down, as you can only view it once.

To test it, simply make a GET request to the following URL, substituting your username, app password and either the team name or your username. I used Postman for this.

https://<USERNAME>:<APP_PASSWORD>@api.bitbucket.org/2.0/teams/<YOUR_TEAM>/repositories

Please note the username is not the email address you log in to Bitbucket with! You can find it under your account settings.

If you have repositories, it should respond with a JSON structure containing the first page of your repository list, which can be represented using the following class.

export class RepositoryListResponse {
    public pagelen: number = 0;
    public values: Repository[] = [];
    public next: string = "";
}

Listing Git repositories

The API response is paged, so you may need to perform multiple requests to fetch all repositories. This is straightforward, as each response contains the URL of the next data page. The following code loads the repository list using authenticated GET requests.

const repositoryList: Repository[] = [];
try {
    let repoListUri = `https://api.bitbucket.org/2.0/teams/${config.bitBucketTeam}/repositories`;
    let loadMore = true;
    // The response is paged and contains the link to the next page,
    // so iterate through the pages and collect the repos in the collection.
    while (loadMore) {
        const pl = await HttpRequest.get<RepositoryListResponse>(repoListUri, {
            auth: {
                user: config.bitBucketUser,
                pass: config.bitBucketAppPassword,
            }
        });
        if (pl.values != null && pl.values.length > 0) {
            // Only add the repositories that are using git
            repositoryList.push(...pl.values.filter(x => x.scm === "git"));
            repoListUri = pl.next;
        }
        if (pl.next == null) {
            loadMore = false;
        }
    }
} catch (err) {
    console.error(`Unable to get the list of repositories. Error: ${err.message}`);
}

The repository data is quite extensive, so for the sake of readability I cut it down to the bits relevant for this post.

export class Repository {
    public scm: string = "";
    public name: string = "";
    public links: RepositoryLinks = new RepositoryLinks();
    public updated_on: string = "";
    public slug: string = "";
}

export class RepositoryLinks {
    public clone: CloneLink[] = [];
}

export interface CloneLink {
    href: string;
    name: string;
}

I use the last-modified timestamp to determine whether the repo has already been backed up, and fish out the clone URL from the repo.links.clone collection.

function getHttpsCloneUrl(repository: Repository): string {
    // find() returns the matching CloneLink (or undefined), so extract the href
    const link = repository.links.clone.find(x => x.name === "https");
    return link != null ? link.href : "";
}

At this point, I check if the repository has already been backed up. Each version is simply stored in a folder time-stamped with the repo’s last updated_on value.

I then reconstruct the URL, embed the credentials for basic auth and call the git command line to perform a bare clone.

async function cloneRepo(repo: Repository, cloneLink: string, logger: Logger) {
    // Reconstruct the URL
    const regex = /.*@(.*)$/gm;
    const match = regex.exec(cloneLink);
    const bitBucketUri = `https://${config.bitBucketUser}:${config.bitBucketAppPassword}@${match[1]}`;
    const updatedString = moment(repo.updated_on).format("YYYY-MM-DD-hh-mm-ss");
    const pathEndSeparator = config.backupRoot.endsWith("/") || config.backupRoot.endsWith("\\") ? "" : "/";
    const backupLocation = `${config.backupRoot}${pathEndSeparator}${repo.owner.username}/${repo.slug}/${updatedString}`;
    const jsonfile = `${backupLocation}/__backup_info.json`;
    try {
        // Make sure the target folder exists
        logger.log(`  Ensuring folder ${backupLocation}`);
        await fs.ensureDir(backupLocation);
        // Exit if the repo is backed up (marker json file is present),
        // otherwise empty the folder and perform the backup
        if (fs.existsSync(backupLocation)) {
            if (fs.existsSync(jsonfile)) {
                logger.log(`Repo state is already backed up (json file present). ${repo.owner.username}/${repo.slug}/${updatedString}`);
                return;
            }
            logger.log(`Emptying folder ${backupLocation}`);
            await fs.emptyDir(backupLocation);
        }
        // Call git clone...
        logger.log(`Backing up ${repo.slug}`);
        logger.log(`  Cloning ${cloneLink}`);
        await Git.bareClone(bitBucketUri, backupLocation);
        logger.log("Adding backup info file");
        // Write the backup marker file
        const repoString = JSON.stringify(repo, null, 2);
        fs.writeFileSync(jsonfile, repoString, { encoding: "utf-8" });
        logger.log("  DONE");
    } catch (ex) {
        logger.error(`Failed to clone ${bitBucketUri} to ${backupLocation}. Error: ${ex.message}`);
    }
}

Cloning and LFS support

Cloning is done with a git bare clone.

git clone --bare <CLONE_URL> <TARGET_FOLDER>

I use an async wrapper function around child_process.spawn to handle this.
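
Such a wrapper can be sketched as follows. This is an assumption of what the helper might look like, not the post’s actual code; the `run` name is made up for the example, and a `Git.bareClone` as mentioned in the post could be a thin layer on top of it.

```typescript
// Sketch: run an external command via child_process.spawn and resolve or
// reject based on the exit code, so callers can use async/await.
import { spawn } from "child_process";

export function run(command: string, args: string[], cwd?: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { cwd });
    let stdout = "";
    let stderr = "";
    child.stdout.on("data", (d) => (stdout += d));
    child.stderr.on("data", (d) => (stderr += d));
    child.on("error", reject); // e.g. command not found
    child.on("close", (code) => {
      if (code === 0) resolve(stdout);
      else reject(new Error(`${command} exited with code ${code}: ${stderr}`));
    });
  });
}

// e.g. await run("git", ["clone", "--bare", cloneUrl, targetFolder]);
```

Rejecting on a non-zero exit code means a failed clone surfaces as a thrown error inside the `try`/`catch` of the backup loop, which keeps the error handling in one place.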

One thing that was not clear to me is that cloning with Git will not fetch the LFS data. This tends to be important for repositories that contain, for example, Unity projects.

My first thought was to check whether the repo has LFS support enabled in the REST call result, but that is not supported yet (Atlassian support created a ticket for it). It turns out, however, that calling git lfs fetch fails gracefully for repositories without LFS support.

cd <TARGET_FOLDER>
git lfs fetch --all

Summary

It was a great exercise to put together a quick script to back up our data. Authentication and handling LFS data caused a bit of a headache, but with a bit of reading, experimenting and help from Atlassian Support I solved it. And now we have a better backup solution.
