This post is co-authored by Theodor Marcu, Kyle Conroy and Jess Lin.

Over the past year, we’ve built and refined a feature called Source Control that currently integrates directly with GitHub (with support for more platforms to come).

Along the way, we hit a few gotchas while working with Git and the GitHub API, in particular in the way we built Git trees and made API calls concurrently. In this post we’ll share those lessons. We hope this helps you steer clear of the same pitfalls and expands your understanding of the internals of Git.

Building a version control system

Before we dive into the details of Git and GitHub, we’ll say a few words about Retool Source Control as background.

When we built Retool, we envisioned a tool that makes it much faster for developers to build and ship front-end applications without having to worry about frameworks, UI libraries, and cobbling multiple data sources together. However, as increasingly larger teams started building applications that grew more complex, we realized that users were starting to have trouble with versioning their apps.

In particular, our users started versioning apps manually by duplicating them and appending different numbers and strings to them (e.g., Dashboard-V4-Jane). However, as all programmers know, this does not scale well, and our users started accidentally forking the wrong apps and missing changes from each other.

To alleviate this, we built Retool Source Control, a feature that allows users to make changes to applications using Git branches, commits, and pull requests—all from the Retool UI. Source Control has the added benefit of letting users sync Retool apps from one Retool instance to another, for example from a dev to a staging environment, or a staging to a production environment. This feature is often used in multi-instance on-premise deployments to enable all instances to stay consistent.

Diagram of Retool's Source Control feature

The gotchas

Building a feature like Source Control may sound straightforward. We thought so too! Because Git and GitHub are so ingrained in our everyday workflows, they seem deceptively well-understood. However, as we implemented Source Control, we hit several gotchas that caused bugs and slowness.

In this post, we’ll first cover two related issues and how we resolved them:

  • Hitting a Git tree object size limit in the GitHub API, and
  • Building Git trees in a (less than) optimal way

We’ll close with a third standalone issue: hitting a secondary rate limit in GitHub’s API.

A motivating incident

As engineers, we know that sometimes, something going very wrong can help us grow. Both of the Git-related issues we’ll talk about in this post came to light following an incident that affected a customer who runs an on-premise Retool instance.

The immediate bug our customer hit was an error every time they tried to commit a change to an app using the Source Control feature. They would click “Commit” in the UI, wait 10+ seconds, and then see a generic failure message.

Debugging with the customer, we ruled out a few hypotheses:

  • The logs showed this was likely not related to hitting an API rate limit in GitHub.
  • We double-checked their configured GitHub credentials and confirmed this was not a problem authenticating to GitHub.
  • We double-checked the environment variables set on the Retool instance and confirmed that Retool and Source Control were properly configured.
  • We made a change directly to the underlying GitHub repository, to check there was not an issue with the repository itself.

On the other hand, we discovered a few enlightening details:

  • The issue was happening to all Retool apps that were opted into Source Control.
  • While the customer only had a total of 67 apps, running find apps -type f | wc -l revealed that there were a total of ~5000 files within the underlying Git repository. This was a fairly high number, and it pointed to the fact that the apps contained many components—since Source Control writes a representation of each component to a single YAML file. (We made this design choice for the same reason you might split up regular code into smaller files: to make the code easier to navigate.)

Together, these details made us suspect that the size of the contents we were trying to commit was to blame. But what were we doing wrong?

Crash course: Git trees

Before we explain further, let’s take a detour to dive into relevant Git internals: in particular, how Git stores information as blobs within trees.

If you use Git, you likely know that a Git repository is a collection of commits. Each commit is a snapshot of what your code looked like at a certain point. This includes the file contents as well as the directory structure (plus other metadata, like who made the commit and when).

What might be less obvious is that under the hood, a commit is composed of two lower-level objects: 1) blobs, arranged in a 2) tree.

  • File contents: A blob in Git stores the contents of a file within a commit. A blob is identified by a SHA-1 hash computed over the file’s size and contents. This means that, for the same file contents, on any computer, Git will always assign the same hash ID to the blob. (See the short sketch after this list.)
  • Directory structure: A tree in Git represents the relationships between blobs, that is, the directory structure of your files. A tree is identified by a SHA-1 hash of the blobs and any sub-trees underneath it. This means that, for the same set of blobs and sub-trees, on any computer, Git will always assign the same hash ID to the tree.
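
To make this concrete, here’s a minimal sketch (our illustration, not code from the Retool codebase) of how Git derives a blob’s ID: it hashes a short header containing the object type and byte length, followed by the raw contents.

import { createHash } from 'crypto'

// Sketch: compute the ID Git would assign to a blob, i.e. a SHA-1 over
// the header "blob <byte length>\0" followed by the file contents.
function gitBlobSha(contents: string): string {
    const body = Buffer.from(contents, 'utf8')
    const header = Buffer.from(`blob ${body.length}\0`, 'utf8')
    return createHash('sha1').update(Buffer.concat([header, body])).digest('hex')
}

// The same contents always produce the same blob ID, on any machine.
// Matches the output of: echo "hello world" | git hash-object --stdin
console.log(gitBlobSha('hello world\n'))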

The diagram below shows what we described above. A few notes:

  • A blob may remain unchanged between commits—this happens whenever you have a file that isn’t edited in a commit. (For example, test.txt remains unchanged between the second and third commit.) When this happens, Git does not create a new blob, and the tree remains pointing to the same blob between commits.
  • In contrast, test.txt does change between the first commit and the second commit. Accordingly, Git updates the tree to point to a new blob that represents the changed file.
  • This diagram shows a “flat” tree of one level. Note that trees can be nested, with child blobs and subtrees.
Diagram: structure of Git internals (commits, trees, and blobs)
diagram source: docs.github.com/en/rest/guides/getting-started-with-the-git-database-api

Crash course: commits in the GitHub API

Now that we’ve looked at how Git stores information—as commits backed by trees of blobs—let’s take a look at how to create a commit using the GitHub API. Together, these pieces will help us understand why our first Source Control implementation led to an incident.

GitHub’s “Getting started with the Git Database API” guide provides a great introduction to how to make a commit with the API. The guide gives a thorough example of the steps involved:

As an example, if you wanted to commit a change to a file in your repository, you would:
* Get the current commit object
* Retrieve the tree it points to
* Retrieve the content of the blob object that tree has for that particular file path
* Change the content somehow and post a new blob object with that new content, getting a blob SHA back
* Post a new tree object with that file path pointer replaced with your new blob SHA getting a tree SHA back
* Create a new commit object with the current commit SHA as the parent and the new tree SHA, getting a commit SHA back
* Update the reference of your branch to point to the new commit SHA

If we look at the “Create a tree” API docs, we see that this description matches up with the API. We’ll call out one additional detail that we can take advantage of: this API allows you to pass a file’s contents directly as a content string instead of a blob SHA. This field is useful because it replaces a separate call to the “Create a blob” API.
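
For illustration, here’s a hedged sketch of what such a call might look like with an Octokit client; the octokitClient, owner, repo, and baseTreeSha values are assumed to be set up elsewhere, and the path and content are made up.

const newTree = await octokitClient.git.createTree({
    owner,
    repo,
    base_tree: baseTreeSha, // SHA of the tree behind the current commit
    tree: [
        {
            path: 'apps/dashboard/app.yml', // hypothetical path
            mode: '100644', // a regular file
            type: 'blob',
            content: 'name: dashboard\n', // inline content replaces a "Create a blob" call
        },
    ],
})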

Gotcha 1: Git tree object size limit

Let’s go back to where we left off in our debugging story. Our suspicion was that the size of the contents the customer was trying to commit was causing the commits to fail. As a result, we looked at the size of the Git tree that we were sending to the GitHub “Create a tree” API.

Originally, we made a single call to this API, which looked like this:

const resultTree = await this.octokitClient.git.createTree({
    owner: this.owner,
    repo: this.repo,
    tree: [ array of Git tree objects ],
})

As context, the resultTree is then used in the call to create the actual commit:

const gitCommit = await this.octokitClient.git.createCommit({
    owner: this.owner,
    repo: this.repo,
    message: `${options.commitSubject}\n\n${options.commitMessage}`,
    tree: resultTree.data.sha,
    author: author,
    committer: author,
    parents: [options.baseTree],
})

Notably, the [array of Git tree objects] in the call to createTree consisted of the entire Git tree, and we found that when this array got too big, the GitHub API call failed. This failure was unexpected because the Git Database API documentation does not specify a hard limit on tree size.

To illustrate what this array of Git tree objects looks like, and to help you understand why it grows with the complexity of a Retool app, here’s how we assemble this array in the Retool codebase:

First, we translate the state of the Retool app that is being committed into an array of “action” objects. These action objects differ depending on whether the Retool app is being created, updated, deleted, or moved from one Retool directory path to another. For example, if an app is being created, the actions generated will look like this:

[
    {
      action: 'create',
      filePath: '/the/file/path/app.yml',
      content: '<YAML representation of the app>',
    },
    ...appComponents.map((appComponent) => ({
      action: 'create',
      filePath: '/the/file/path/<component name>.yml', // one file per component
      content: '<YAML representation of the component>',
    })),
]

If an app is being deleted, the actions generated will look like this:

[
    {
      action: 'delete',
      filePath: '/the/file/path/app.yml',
    },
    ...appComponents.map((appComponent) => ({
      action: 'delete',
      filePath: '/the/file/path/<component name>.yml', // one file per component
    })),
]

As you can see, in both cases, the array scales with the number of components in an app.

Second, we translate this array of action objects into the objects that the GitHub “Create a tree” API expects in the tree field. The logic for this step can be explained by this switch statement:

    switch (action.action) {
      case 'create':
      case 'update':
        treeObjects.push({
          path: action.filePath,
          content: action.content,
          mode: FileModeBlob,
          type: TypeBlob,
        })
        break
      case 'delete':
        treeObjects.push({
          path: action.filePath,
          sha: null,
          mode: FileModeBlob,
          type: TypeBlob,
        })
        break
      case 'move':
        treeObjects.push({
          path: action.previousPath!,
          sha: null,
          mode: FileModeBlob,
          type: TypeBlob,
        })
        treeObjects.push({
          path: action.filePath,
          content: action.content,
          mode: FileModeBlob,
          type: TypeBlob,
        })
        break
      default:
        throw new Error(`Unknown commit action type ${action.action}`)
    }

Note: In the GitHub API, a deletion is marked by setting the sha field to null. We missed this detail in our first reading of the docs. (This was quite an oversight, as we’ll see a bit later.)

Resolution

Now that you’ve seen how we originally constructed the createTree request and why that request increased in size with the number of components in a Retool app, let’s look at how we implemented an immediate fix to avoid hitting the createTree API request size limit.

Instead of making one createTree request, we changed our code to make several createTree requests, each with a smaller chunk of the Git tree.

As we saw earlier, the original single request looked like this:

const resultTree = await this.octokitClient.git.createTree({
    owner: this.owner,
    repo: this.repo,
    tree: [array of Git tree objects],
})

We replaced this with a for-loop that sends partial chunks of the Git tree:

const CHUNK_SIZE = 1000
let tree
const newTree = await generateTree({ ...relevant parameters... })

// chunk() splits the array of tree objects into slices of at most CHUNK_SIZE
// entries (for example, lodash's chunk); each slice extends the tree built so far.
for (const treeChunk of chunk(newTree, CHUNK_SIZE)) {
    const freshTree = await this.octokitClient.git.createTree({
        owner: this.owner,
        repo: this.repo,
        tree: treeChunk,
        base_tree: tree ? tree.data.sha : undefined,
    })
    tree = freshTree
}

Notably, in the new for-loop:

  • We take advantage of the base_tree parameter in the createTree API to define the baseline to extend as we progressively add tree objects.
  • Retool will now send chunks of at most 1000 tree objects, which keeps the total size of the request below the limit we observed.

Gotcha 2: Building Git trees (less than) optimally

While the first resolution enabled the customer to make commits, there was a second problem: Git operations in Retool could still be quite slow, taking up to 15 seconds in larger Git repositories. In fact, we expected our first resolution to make Git operations slower, because we were now making several createTree API calls instead of just one. We needed to find ways to speed things up.

When trying to improve the performance of a system, you want to understand the individual steps that are happening. So, we inspected how we were building the Git trees for these operations.

Our investigation showed we were doing much more work building Git trees than we needed to.

As we noted earlier, we originally missed the fact that the GitHub API lets us delete a file by setting the sha field of the corresponding tree object to null. Thus, we mistakenly thought that in order to delete files, we had to recreate an entire new Git tree that did not include the files we wanted to delete—then overwrite the entire old tree.

This led us to write code that built the Git tree for the entire repository from scratch whenever someone tried to run “Commit” on a single Retool app. In other words, if you had 50 Source-Controlled apps and were editing one app, our code would build the Git tree consisting of the 50 apps from scratch every time you tried to commit changes to a single app.

This diagnosis explained why commit operations got slower as a customer developed more Source-Controlled apps, thereby growing the size of their overall Git tree.

Resolution

We now realized that the GitHub API makes it possible to create commits by building a Git tree for only the files in the one app being edited, as opposed to files from (cough) all the apps in the repository. This would significantly speed up commit operations.

To implement this fix, we considered the three categories of operations that generate distinct patterns of tree-building:

  • Adding Source Control versioning to an app: this means all files for this app will be added for the first time to the repository
  • Removing Source Control versioning from an app: this means all files for this app will be removed from the repository
  • Pushing and committing changes to a Source-Controlled app: this means some files for this app may be added, some may be removed, and some may be edited

All three of these categories can be handled by one algorithm for building the new Git tree (sketched in code after the list):

  1. Pull down the most recently committed Git tree. This will be the base_tree for the upcoming call to the “Create a tree” GitHub API. Note: This Git tree represents the state of the entire Git repository as of the last commit.
  2. Build a Git tree from only the files for the updated, not-yet-committed version of the one Retool app being edited. This step handles new and updated files in said Retool app.
  3. Then, to handle files that are deleted from the Retool app:
  • Take the list of file names for the app in the most recent commit.
  • Compare this with the list of file names for the app in the current, edited version, and look for the files that no longer exist.
  • Add the files that no longer exist to the Git tree from Step 2, but mark these as deleted by setting the sha field to null in the tree object.
  4. Call the “Create a tree” API with the tree field set to the new tree built in Steps 2 and 3, and base_tree set to the tree from Step 1.
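
To make these steps concrete, here is a hedged sketch of what the algorithm can look like against the GitHub API using Octokit. This is not Retool’s actual implementation: the commitOneApp function, the appDir and currentFiles parameters, and the commit message are illustrative, and error handling is omitted.

import { Octokit } from '@octokit/rest'

// Illustrative sketch of the four steps above; not Retool's production code.
async function commitOneApp(
    octokit: Octokit,
    owner: string,
    repo: string,
    branch: string,
    appDir: string,                    // e.g. 'apps/dashboard' (hypothetical layout)
    currentFiles: Map<string, string>, // file path -> YAML content for the edited app
) {
    // Step 1: fetch the latest commit and its tree; the tree becomes base_tree.
    const ref = await octokit.git.getRef({ owner, repo, ref: `heads/${branch}` })
    const baseCommit = await octokit.git.getCommit({ owner, repo, commit_sha: ref.data.object.sha })

    // Step 2: tree entries for new and updated files in the one app being edited.
    const tree: { path: string; mode: '100644'; type: 'blob'; content?: string; sha?: string | null }[] =
        [...currentFiles].map(([path, content]) => ({ path, mode: '100644', type: 'blob', content }))

    // Step 3: files present in the last commit but missing from the edited app are
    // marked as deleted by setting sha to null.
    const baseTree = await octokit.git.getTree({ owner, repo, tree_sha: baseCommit.data.tree.sha, recursive: 'true' })
    for (const entry of baseTree.data.tree) {
        const path = entry.path
        if (entry.type === 'blob' && path && path.startsWith(`${appDir}/`) && !currentFiles.has(path)) {
            tree.push({ path, mode: '100644', type: 'blob', sha: null })
        }
    }

    // Step 4: create the new tree on top of base_tree, create the commit, and move the branch.
    const newTree = await octokit.git.createTree({ owner, repo, tree, base_tree: baseCommit.data.tree.sha })
    const commit = await octokit.git.createCommit({
        owner, repo, message: 'Commit a single app', tree: newTree.data.sha, parents: [ref.data.object.sha],
    })
    await octokit.git.updateRef({ owner, repo, ref: `heads/${branch}`, sha: commit.data.sha })
}

Because base_tree carries over every unchanged path from the last commit, the request only needs to describe the files belonging to the app being edited.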

Together, these changes sped up commit operations by up to 3x in our testing on large repositories.

Gotcha 3: Secondary rate limit in the GitHub API

The third gotcha we ran into was more straightforward to fix but trickier to track down.

We discovered this issue when a different customer reported that their on-premise Retool instance was not pulling in the latest changes from their Source Control GitHub repository. We observed that the instance would start to sync down the latest changes but suddenly fail before it finished. Based on this behavior, we knew it was not an authentication problem, and we suspected we were hitting some sort of an API rate limit.

Debugging this issue was challenging because we didn’t have direct access to the Retool instance. A further complication was that Retool will retry the Source Control sync if it fails, which by itself created a head-fake: the code retried enough times to trigger GitHub’s primary API rate limit, which did appear in the logs but was not ultimately the correct root cause. Adding insult to injury and further impeding our ability to iterate quickly, we had to wait an hour for the primary API limit to reset each time we retried the sync.

Resolution

We found the fix by once again digging into the GitHub API docs. We discovered that making “multiple concurrent requests” can result in hitting a secondary rate limit. Further digging led us to a specific section on “Dealing with secondary rate limits” that states, “Do not make requests for a single user or client ID concurrently.”

We originally thought we were protected from hitting rate limits because we were using the official GitHub Octokit API clients, which have built-in protection against hitting primary API limits. Our mistake was in wrapping these client API calls in a Promise.all(), thereby firing requests concurrently.

Concretely, our code used to look like this (WRONG!):


await Promise.all(
    paths.map((appPath) => {
        return syncPage({
            githubClient,
            organization,
            commitObject: latestCommitObject,
            ...etc...
        })
    }),
)

The fix was to swap the Promise.all() for a simple for-loop.


for (const appPath of paths) {
    await syncPage({
        githubClient,
        organization,
        commitObject: latestCommitObject,
        ...etc...
    })
}

Summary: faster, more resilient Git operations

We’ve been fortunate to debug, fix issues, and learn alongside our customers, and together improve the performance of Git operations in Retool. We’ve also definitely learned to closely read the GitHub API docs.

To recap, three specific highlights you can take away from our experience are:

  • You can use the base_tree parameter in the “Create a tree” GitHub API to progressively build a new Git tree, and thus avoid hitting a request size limit
  • You can set the sha field of a file to null in the “Create a tree” GitHub API, to mark it as deleted
  • Calling GitHub APIs concurrently can cause you to hit secondary rate limits. Thus, avoid wrapping multiple requests to the GitHub API in a Promise.all()

With all this learning in tow, we've been able to build a performant and effective source control system for Retool apps. We hope the lessons we learned can help you sidestep the same problems when working with Git and the GitHub API.

Questions? Comments? Tweet to us @retool (or @theomarcu @kyleconroy @jesstyping)!