To Squash or Not to Squash?

That Is the Question. And Divya has the answers.

The process of integrating work into the main code base differs across companies and projects. In the debate over best practices, two camps emerge. On the one hand, there are those who squash multiple commits down to a handful of commits, and on the other hand, there are those who merge commits as is. The former involves re-writing history while the latter does not. While both options are equally useful depending on the situation and project, most teams tend to prefer one practice over the other.

Rebase

A rebase changes the base commit of a branch. This is usually needed when your branch was created from a commit that has since become outdated because other team members have been actively contributing to the main branch of development. During a rebase, your commits are temporarily set aside so that the base can be updated with upstream changes; your commits are then reapplied on top of the updated base commit. Because the reapplied commits are new commits with new hashes, a rebase rewrites the history of your branch. The diagram below visualizes this process. The main purpose of a rebase is to maintain a linear history while integrating upstream changes into your local branch.

Diagram 1

(Source: https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase)
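To make the base change concrete, here is a minimal sketch that runs entirely in a throwaway repository. The branch name feature, the file names, and the demo identity are hypothetical and chosen only for illustration. It compares the merge base of the two branches before and after the rebase.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Work on a hypothetical feature branch.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature work"

# Meanwhile, master moves ahead (standing in for teammates' changes).
git checkout -q master
echo "upstream work" > upstream.txt
git add upstream.txt
git commit -q -m "Add upstream work"

# The feature branch now sits on an outdated base; rebase it.
git checkout -q feature
base_before=$(git merge-base master feature)
git rebase -q master
base_after=$(git merge-base master feature)
master_tip=$(git rev-parse master)

echo "base before rebase: $base_before"
echo "base after rebase:  $base_after"
echo "tip of master:      $master_tip"
git log --oneline --graph feature
```

After the rebase, the base of feature should be the tip of master, and the graph printed at the end should be a single straight line.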

Rebases are also handy for cleaning up commits. Here at Sparkbox, one of the workflows we adopt is to “commit early and often.” As we incrementally make changes to the codebase, we create work-in-progress (WIP) commits. This covers our bases in case we unintentionally nuke a branch or overwrite local changes without remembering to stash them. The unfavorable side effect is that a single branch can end up with as many as 20 commits. To avoid polluting the codebase with the cruft of extraneous commits, it is good practice to squash commits down. Rebasing before merging allows us to “hide the sausage making”; in other words, commits are condensed into concise feature or fix commits to keep history tidy. Each remaining commit then maps to a particular code change, so the role of each commit and the trajectory of the project become clear. This gives your team members more clarity about the intent and purpose of individual commits when wading through history in a git bisect or when testing specific features in isolation via a cherry-pick.

git rebase -i origin/master
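Normally the command above opens an editor with a todo list in which you mark commits to squash. The sketch below is a non-interactive stand-in for that step, assuming a throwaway local repository with no remote (so it rebases onto master rather than origin/master). It uses the GIT_SEQUENCE_EDITOR environment variable to rewrite the todo list with sed; the branch name, commit messages, and demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Three WIP commits on a hypothetical feature branch.
git checkout -q -b feature
for step in 1 2 3; do
  echo "step $step" >> app.txt
  git commit -q -am "WIP: step $step"
done

# Stand-in for editing the todo list by hand: keep the first "pick" and
# turn every later "pick" into "fixup" (fold into the previous commit and
# discard that commit's message).
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$ s/^pick/fixup/'" git rebase -q -i master

# Replace the leftover WIP message with a descriptive one.
git commit -q --amend -m "Add steps one through three"

git log --oneline master..feature
```

In day-to-day use you would skip the GIT_SEQUENCE_EDITOR override and edit the todo list yourself, choosing squash instead of fixup when you want to combine the commit messages too.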

Merge

Merging a branch involves pulling one branch into another branch while keeping the original branching structure intact. In the name of verbose history, a separate merge commit is recorded (**assuming the source branch has changed since you branched off or you appended your merge command with a --no-ff flag). Commits are thereby grouped as part of an encompassing feature or fix branch, and that grouping remains visible in history long after the branch itself has been deleted. This is especially useful when visibility and traceability are valued by your team in the long run.

**Note: Ordinarily, when you’re merging a branch back into an unchanged source branch (no commits have been made since branching), git adds your commits on top of the source branch without creating a merge commit or preserving the branch being merged. This is called a fast-forward merge [Refer to Diagram 1]. Some teams prefer to keep branch histories intact and add a --no-ff (no fast forward) flag to the merge command, which forces git to create a merge commit even when a fast-forward is possible. When the two branches have diverged, a fast-forward is not possible; git then performs a three-way merge, which always records a merge commit.

                    

(Source: https://www.atlassian.com/pt/git/tutorial/git-branches#!merge)
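The difference between a fast-forward merge and a --no-ff merge is easy to see by counting merge commits. The following is a small sketch in a throwaway repository; the branch names feature-a and feature-b, the commit messages, and the demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Case 1: master has not changed since branching, so git can fast-forward.
git checkout -q -b feature-a
echo "feature a" >> app.txt
git commit -q -am "Add feature A"
git checkout -q master
git merge -q --ff-only feature-a
merges_after_ff=$(git rev-list --merges --count master)

# Case 2: same situation, but --no-ff forces a merge commit.
git checkout -q -b feature-b
echo "feature b" >> app.txt
git commit -q -am "Add feature B"
git checkout -q master
git merge -q --no-ff -m "Merge branch 'feature-b'" feature-b
merges_after_no_ff=$(git rev-list --merges --count master)

echo "merge commits after fast-forward: $merges_after_ff"
echo "merge commits after --no-ff:      $merges_after_no_ff"
git log --oneline --graph master
```

In the printed graph, feature-a leaves no trace of ever having been a branch, while feature-b shows up as its own line joined back by a merge commit.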

Disclaimer: History may have been re-written in the making of this commit

“When history is rewritten in a shared branch touched by multiple developers breakage happens.”

- Nicola Paolucci

In theory, if done correctly, rebase works well when working alone. When working on a team with shared branches, however, it becomes much more complicated. One must be extremely careful when choosing to rebase while on a shared branch, or else the branch may break in a manner that can be maddening and time consuming to fix. This is especially the case when rebasing while a co-worker is reviewing a PR. Rebasing and force pushing in the middle of a PR rewrites public history, can cause commits to vanish unexpectedly, and can leave reviewers with conflicts mid-review. It also adds extra work for the reviewer, who has to delete their local copy of the branch and fetch yours again every time you rebase. Rebasing while a co-worker is actively reviewing your PR should be avoided at all costs. A better workflow is to push review changes as separate, clearly marked commits mid-PR and squash them down into your existing commits once your code has been fully reviewed and is ready to be merged. Something to be doubly wary of when rebasing is erasing contributors from branch history in the name of clean history. Just as you would (and should) be thoughtful about giving positive feedback in code reviews, be mindful of your fellow authors’ contributions, and remember to give credit where credit is due.

(Source: https://www.atlassian.com/git/articles/git-team-workflows-merge-or-rebase/)
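One possible way to mark review changes and fold them in later is git's own fixup mechanism: git commit --fixup records a commit whose message starts with "fixup!", and git rebase -i --autosquash later folds it into the commit it targets. This is only a sketch of that option, not necessarily the convention any particular team uses. It runs in a throwaway repository, and the branch name, messages, and demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Throwaway repository so nothing real is touched (demo only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article
git config user.name "Demo User"           # hypothetical identity
git config user.email "demo@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# The commit that is under review.
git checkout -q -b feature
echo "feature" >> app.txt
git commit -q -am "Add feature"

# Mid-review: record the requested change as a separate fixup commit.
# Existing commits are left alone, so a plain push is enough at this point.
echo "change requested in review" >> app.txt
git commit -q -a --fixup=HEAD
commits_during_review=$(git rev-list --count master..feature)

# After approval: fold the fixup into the commit it targets.
# GIT_SEQUENCE_EDITOR=true accepts the generated todo list unchanged.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash master
commits_after_squash=$(git rev-list --count master..feature)

echo "commits during review: $commits_during_review"
echo "commits after squash:  $commits_after_squash"
git log --oneline master..feature
```

The history is rewritten only once, at the very end, so reviewers never have their local copy of the branch pulled out from under them.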

Only you can prevent messed up history!

“ Both of these commands are designed to integrate changes from one branch into another branch—they just do it in very different ways.”

- Atlassian Docs

There are obvious benefits and consequences when choosing to do either a rebase or a merge; the unifying question is whether to rewrite history. The decision to do one over the other is neither absolute nor universal and differs on a case-by-case basis. Say, for example, that you’re working independently on a branch that has a shared branch as its source. The source branch is continuously getting updated with work from the other members of your team, and you constantly have to pull from the source in order to keep your branch up-to-date. In this case, you would probably want to prevent unnecessary merge commits from polluting and bloating your branch history. A rebase while pulling in changes (git pull --rebase) might make more sense here, since the base commit is continually being updated and your commit graph is kept clean and tidy. In another instance, such as a single branch that represents work for a significant feature request, a merge may be more appropriate so as to ensure full and complete visibility of core work.
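The git pull --rebase scenario can be simulated locally. The sketch below assumes a bare repository standing in for the shared remote and two clones standing in for you and a teammate; the directory names, file names, and demo identity are hypothetical.

```shell
#!/bin/sh
set -e

# Hypothetical identity for every commit in this demo.
export GIT_AUTHOR_NAME="Demo User"
export GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User"
export GIT_COMMITTER_EMAIL="demo@example.com"

# Throwaway workspace: a seed repo, a bare "remote", and two clones.
work=$(mktemp -d)
cd "$work"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master   # use the branch name from this article
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"
cd ..
git clone -q --bare seed origin.git
git clone -q origin.git teammate
git clone -q origin.git mine

# A teammate pushes work to the shared branch.
cd teammate
echo "teammate work" > teammate.txt
git add teammate.txt
git commit -q -m "Add teammate work"
git push -q origin master

# Meanwhile you have committed locally on top of the old base.
cd ../mine
echo "my work" > mine.txt
git add mine.txt
git commit -q -m "Add my work"

# Bring in the upstream change by rebasing instead of merging.
git pull -q --rebase origin master
merge_commits=$(git rev-list --merges --count HEAD)

echo "merge commits in history: $merge_commits"
git log --oneline --graph
```

A plain git pull in the same situation would have recorded a merge commit; with --rebase your commit is simply replayed on top of the teammate's.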

So which comes first? Traceability or Linearity?

There is ultimately no correct answer to the question of what the best git workflow is. The key question to ask yourselves and your teams is which you value more: traceable branch history or tidy, linear history? Regardless of which option is chosen, git protects us from our own missteps while also giving us the flexibility to control our own workflows entirely. Git, be all [our] sins remember’d.

Further Reading: