Understand git's behavior across multiple remote nodes

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the git category.

Last Updated: 2024-11-23

At work, someone resolved a merge conflict in a way that clobbered another person's recent changes, then pushed this out. I undid it on my computer by moving HEAD back to before this change then force pushing this to GitHub.

Now, the other people's git instances had issues like "your branch is ahead of origin/master by X commits" and this persisted rather stubbornly. Here's a deeper dive on the issues.

The general case for 'your branch is ahead of origin/master'

This means you have commits locally that are ahead of (your computer's) view of origin/master. This usually means it's time for you to push.

There is useful information available by inspecting the ALL_CAPS filenames in the .git/ folder of your repo:

$ ls .git

# gives (for example):

COMMIT_EDITMSG
HEAD

config
hooks
index
logs
objects
refs
...

Then looking inside one of these ALL_CAPS files:

$ cat .git/HEAD

# FYI: refactor_types was the name of the branch I was on at this time.
ref: refs/heads/refactor_types

Watch what happens when I change to the master branch (I was on the refactor_types branch):

$ git checkout master
$ cat .git/HEAD

# This ref has changed to reflect the branch name.
ref: refs/heads/master

What is refs/heads/master? Just another file, this time one that contains an actual SHA:

$ cat .git/refs/heads/master

f80510d563a0ed823a1f0fbdc5bb88be1a366d16

FYI: HEAD will be different for each developer, i.e. the current HEAD is local to each repo.
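
Reading these files directly is fine for exploring, but git may pack refs into .git/packed-refs, in which case .git/refs/heads/master can be missing. As a rough sketch, the same lookups can be done with plumbing commands. The throwaway repo, the demo identity, and the variable names below are assumptions made up for illustration:

```shell
#!/bin/sh
set -eu
# Throwaway repo in a temp dir (hypothetical, for illustration only).
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # pin the branch name regardless of init.defaultBranch
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"

head_file=$(cat .git/HEAD)                     # raw file contents
head_ref=$(git symbolic-ref HEAD)              # the ref that HEAD points at
head_sha=$(git rev-parse HEAD)                 # the SHA that ref resolves to
branch_sha=$(git rev-parse refs/heads/master)  # the branch's SHA, looked up directly

echo "$head_file"
echo "$head_ref -> $head_sha"
```

git symbolic-ref and git rev-parse keep working whether or not the ref has been packed, which is why scripts tend to prefer them over cat.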

Detached head

At first it seems (wrongly, it will turn out), that HEAD is a pointer that points to the file corresponding to the current branch - which itself points to an SHA. But this denies the existence of a detached head.

What then is a "detached head"?

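HEAD is detached when it points directly at a commit rather than at a branch. This happens when you check out a SHA, a tag, or a remote-tracking branch, whether or not that commit happens to be the tip of some branch. In this case, doing $ cat .git/HEAD gives an SHA instead of a reference to a file containing an SHA: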

$ cat .git/HEAD
75ea9be7a3e20fe494b931a1758da0d3bcfcfb7a
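
What makes a HEAD detached is that .git/HEAD holds a SHA instead of a ref, and that can be true even when the commit is the newest one on a branch. A small sketch to check this, using a made-up temp repo and demo identity:

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)                         # hypothetical scratch repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"

tip=$(git rev-parse master)
git checkout -q --detach master           # detach HEAD at the tip commit of master

# symbolic-ref fails when HEAD is not a symbolic ref, i.e. when it is detached.
if git symbolic-ref -q HEAD >/dev/null; then state=attached; else state=detached; fi
detached_head=$(cat .git/HEAD)            # now a bare SHA

git checkout -q master                    # re-attach to the branch
reattached=$(git symbolic-ref HEAD)
echo "$state: $detached_head"
```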

A repo has many heads but only one HEAD

Each branch has a named head, and HEAD points to one of these: specifically, the one you last checked out into your working directory. That commit will become the parent of your next commit (if you stay on that branch).

What is ORIG_HEAD?

That's what HEAD was before a rebase/merge or reset started. You can go back to this starting point with: $ git reset --hard ORIG_HEAD
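
A minimal sketch of that round trip, assuming a scratch repo with two empty commits (all names here are invented for the demo):

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)                         # hypothetical scratch repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
git commit -q --allow-empty -m "first"
git commit -q --allow-empty -m "second"

before=$(git rev-parse HEAD)
git reset -q --hard HEAD~1                # move back one commit; git records the old HEAD
orig=$(git rev-parse ORIG_HEAD)           # ...in ORIG_HEAD

git reset -q --hard ORIG_HEAD             # undo the reset
after=$(git rev-parse HEAD)
echo "before=$before after=$after"
```

Note that the second reset overwrites ORIG_HEAD again, so it only ever remembers the most recent such operation.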

What is FETCH_HEAD?

This is an (often changing) reference to what has just been fetched from remote. If you do a git pull, internally it does a fetch and sets FETCH_HEAD to the tip of this branch. Git pull then uses git merge to merge FETCH_HEAD into the local branch.

While on master:

$ cat .git/FETCH_HEAD

eeae3dc6ea7f4ada8ab9b759c375cfb7c5b2723c        branch 'master' of github.com:zzzlo/thingie_driver_app
a0bd3f25167d0ded0498dcad4db7a3c54cbee1e2    not-for-merge   branch 'new_design' of github.com:zzzlo/thingie_driver_app
62e159f46933793b99643a392c4d7a70d3178cb0    not-for-merge   branch 'new_flow' of github.com:zzzlo/thingie_driver_app

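not-for-merge marks the branches that git pull will not merge into the branch you are on. Only the line without that marker (here master, because I am on master and it tracks the remote master) is a candidate for merging.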

If I switch to the branch new_design, I get the same output. However if I run git fetch again (but from this branch) and inspect FETCH_HEAD, I get slightly different output (showing git's awareness of what branch is to be merged in):

$ cat .git/FETCH_HEAD

a0bd3f25167d0ded0498dcad4db7a3c54cbee1e2        branch 'new_design' of github.com:zzzlo/thingie_driver_app
eeae3dc6ea7f4ada8ab9b759c375cfb7c5b2723c    not-for-merge   branch 'master' of github.com:zzzlo/thingie_driver_app
...

PSA: Git status does not check remote

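It compares your branch against the remote-tracking ref (e.g. origin/master) as it stood at your last fetch, so its output about being X commits behind/ahead of remote can be stale. Run git fetch first if you need an accurate answer.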
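
To see the staleness concretely, here is a sketch with two clones of one local "remote". The repo names (seed, remote.git, alice, bob) and the demo identity are assumptions invented for the example. The behind count is what git status would report:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)                         # hypothetical scratch directory
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "first"
git clone -q --bare seed remote.git       # stands in for GitHub
git clone -q remote.git alice
git clone -q remote.git bob

# Bob pushes a new commit; Alice has not fetched yet.
git -C bob commit -q --allow-empty -m "bob's work"
git -C bob push -q origin master

behind_before=$(git -C alice rev-list --count master..origin/master)
git -C alice fetch -q origin
behind_after=$(git -C alice rev-list --count master..origin/master)
echo "behind before fetch: $behind_before, after fetch: $behind_after"
```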

Git fetch (optionally with dry run)

$ git fetch gives you a local, separate copy of the remote (e.g. origin/master), without affecting your own local branch corresponding to that work. How does it make this possible?

Because git stores remote and local heads separately:

.git/refs/remotes/origin

# vs.

.git/refs/heads

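In fact, running git branch will usually list the same names as ls .git/refs/heads. (The exceptions are refs that git has packed into .git/packed-refs, and branch names containing a slash, which show up as subdirectories.)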

These heads/branches can be checked out: git checkout origin/new_design.

However this only puts you in a detached head state. Think of it as basically a read-only branch. Nevertheless, it can be useful to inspect/test this locally before later merging into your local version of the branch. Compare this to the more aggressive situation of git pull, which also does a merge at the same time.

To fetch all of the branches from a named remote, do git fetch remote_name. To fetch from all remotes, do git fetch --all. Afterwards, git branch -r lists the remote-tracking branches you have.

# From the `new_design` branch..
$ git fetch --dry-run

=> From github.com:xyz/myapp
   8d6c61b..f454a62  new_flow -> origin/new_flow

Here, even though I started on the new_design branch, the output shows that a real fetch would update origin/new_flow. Fetch acts on all of the remote's branches, not just the one you have checked out.

To get a list of the commits that were added by remote, run a git fetch then use git log in combination with BRANCH_NAME..origin/BRANCH_NAME:

$ git log --oneline development..origin/development

$ git fetch --dry-run indicates what will happen, and can be useful to avoid making a mess in future.

How to get a specific (writable) branch from a remote that didn't previously exist locally?

One way is git fetch remotename, git checkout remotename/branchname, then git checkout -b branchname. The second command gives a detached head, but the third one gives us a local head so we can actually work on it.

How to undo the last commit but keep things staged?

$ git reset --soft HEAD~

# i.e. reset to the parent commit (HEAD~)

What if you want to keep the changes in the working tree (but un-staged)?

$ git reset --mixed HEAD~

# or (since mixed is the default)

$ git reset HEAD~
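
The difference between the two modes shows up in what the index contains afterwards. A small sketch to compare them, with a made-up file.txt in a scratch repo:

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)                         # hypothetical scratch repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two >> file.txt
git commit -q -am "second"

git reset -q --soft HEAD~                           # undo commit, keep change staged
staged_after_soft=$(git diff --cached --name-only)

git commit -q -m "second again"
git reset -q HEAD~                                  # --mixed: undo commit, unstage change
staged_after_mixed=$(git diff --cached --name-only)
unstaged_after_mixed=$(git diff --name-only)

echo "soft: staged=[$staged_after_soft]"
echo "mixed: staged=[$staged_after_mixed] unstaged=[$unstaged_after_mixed]"
```

In both cases the working tree still holds the edit; only the staging differs.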

How to update all local branches which track remote branches?

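You might think git fetch --all, but all this does is update your remote-tracking branches (origin/master and friends) for every remote. It does not update local branches, even if they track remote ones. (Nor does it create local branches.)

You might then reach for this:

$ git pull --all

But that only fetches from all remotes and then merges into the branch you currently have checked out. To update the other tracking branches, you have to check out each one and pull (or fast-forward it to its upstream) in turn.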

Finally, if you want to start tracking all remote branches, you can use this loop:

git branch -r | grep -v '\->' | while read remote; do git branch --track "${remote#origin/}" "$remote"; done
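
Since git pull --all merges only into the checked-out branch, bringing every tracking branch up to date takes a loop of its own. The following is one possible sketch, not a built-in git feature: it fast-forwards each local branch to its upstream and deliberately refuses anything that has diverged. The repo layout (seed, clone, a dev branch) is invented for the demo:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)                         # hypothetical scratch directory
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "first"
git -C seed branch dev
git clone -q seed clone
git -C clone branch -q --track dev origin/dev

# The remote's dev branch moves on while we sit on master.
git -C seed checkout -q dev
git -C seed commit -q --allow-empty -m "dev work"
git -C seed checkout -q master

cd clone
dev_before=$(git rev-parse dev)
git fetch -q --all
current=$(git symbolic-ref --short HEAD)

git for-each-ref --format='%(refname:short) %(upstream:short)' refs/heads |
while read -r branch upstream; do
    [ -n "$upstream" ] || continue               # skip branches without an upstream
    if [ "$branch" = "$current" ]; then
        git merge -q --ff-only "$upstream"       # checked-out branch: fast-forward in place
    else
        git fetch -q . "$upstream:$branch"       # other branches: fast-forward the ref only
    fi
done

dev_after=$(git rev-parse dev)
dev_remote=$(git rev-parse origin/dev)
echo "dev: $dev_before -> $dev_after"
```

Using fast-forward-only updates means local commits are never merged or discarded behind your back; a diverged branch simply makes the loop stop with an error.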

What does a colon do when used in a git fetch reference?

e.g.

$ git fetch origin master:tmp

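The colon separates source from destination in a refspec (source:destination). Here it downloads the master branch from origin and stores it in a local branch called tmp, creating tmp if needed. If tmp already exists, it is only updated when that is a fast-forward.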
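
A sketch of this in a scratch setup (the seed and clone repos are hypothetical). It also shows that the local master is left alone:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)                         # hypothetical scratch directory
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "first"
git clone -q seed clone
git -C seed commit -q --allow-empty -m "second"   # remote master moves ahead

git -C clone fetch -q origin master:tmp           # remote master -> local branch tmp

tmp_sha=$(git -C clone rev-parse tmp)
remote_sha=$(git -C seed rev-parse master)
local_master=$(git -C clone rev-parse master)
echo "tmp=$tmp_sha local master=$local_master"
```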

How to get a local branch to EXACTLY match a remote one?

While on the branch you want to reset to be exactly like its remote counterpart:

git fetch origin
git reset --hard origin/master

# OR
git reset --hard FETCH_HEAD

# OR (better)
git reset --hard @{upstream}

# OR (masterpiece)
git reset --hard @{u}

Explanation:

@{upstream} (or @{u} for short) is the upstream version of the current branch. This is much more dynamic than typing out origin/master.

How does this differ from a git pull? Git pull performs a merge after fetching. And it is this merge that can cause issues, the likes of which prompted me to carry out this investigation.

Warning though: this is a danger zone. Any uncommitted changes are destroyed outright, and any local commits that have not been pushed disappear from the branch. Those commits normally stay recoverable through git reflog for a limited time (until the entries expire and are garbage-collected), but don't rely on that. To be safe, put those extra commits on a separate branch before resetting from upstream:

# For example:

$ git checkout -b old_master
# only needed if you also have uncommitted changes to tracked files:
$ git commit -am "keeping for later"
$ git checkout master
# ...now reset from upstream
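
On the reflog question, here is a sketch that can be run to check what happens to an unpushed commit after a hard reset to upstream. In this setup (a made-up seed repo and clone) the commit is still reachable as HEAD@{1} straight after the reset and can be put back on a branch. Uncommitted changes would not be recoverable this way:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)                         # hypothetical scratch directory
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "first"
git clone -q seed clone
cd clone

git commit -q --allow-empty -m "local only"       # never pushed
lost=$(git rev-parse HEAD)

git reset -q --hard '@{u}'                        # branch now matches origin/master
after_reset=$(git rev-parse HEAD)

previous=$(git rev-parse 'HEAD@{1}')              # where HEAD was before the reset
git branch rescued "$previous"                    # give the commit a name again
rescued=$(git rev-parse rescued)
echo "lost=$lost rescued=$rescued"
```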

What is a tracking branch in git?

A local branch that has a direct relationship to a remote one. If you're on a tracking branch, git pull knows which server to fetch from and which branch to merge in. When you clone a repo, master is usually created and it tracks origin/master.

Here's how to get a tracking branch for the remote branch new-feature2:

git fetch origin
git checkout --track origin/new-feature2

# Branch 'new-feature2' set up to track remote branch 'new-feature2' from 'origin'.

# And now you are on this new-feature2 branch

A second way, which also lets you choose the local branch name:

git checkout -b new-feature2 origin/new-feature2

Ok, what if I want to push a local branch to a remote (e.g. GitHub) for the 1st time? Just pushing fails...

$ git checkout -b test1
$ git push

fatal: The current branch test1 has no upstream branch.
To push the current branch and set the remote as upstream, use git push --set-upstream origin test1

Therefore do as the error message commands (or use the shorthand -u instead):

$ git push -u origin test1
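
A quick way to confirm what -u changed is to ask git for the branch's upstream before and after the push. The sketch below uses a local bare repo (remote.git, a made-up name) in place of GitHub:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)                         # hypothetical scratch directory
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "first"
git clone -q --bare seed remote.git       # stands in for GitHub
git clone -q remote.git clone
cd clone

git checkout -q -b test1
git commit -q --allow-empty -m "work on test1"

# A brand-new local branch has no upstream configured yet.
if git rev-parse -q --verify 'test1@{upstream}' >/dev/null 2>&1; then
    had_upstream=yes
else
    had_upstream=no
fi

git push -q -u origin test1               # push and record origin/test1 as upstream
upstream=$(git rev-parse --abbrev-ref 'test1@{upstream}')
echo "upstream before: $had_upstream, after: $upstream"
```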

Does moving your head and force-pushing truly remove a commit from remote?

No - it is still accessible via its SHA and could be in the remote's caches (e.g. on GitHub). Therefore, to truly get rid of something sensitive (e.g. a password), you need to go further: change the password or, in extreme cases, even delete the whole repo.

How to undo the last 3 commits remotely

# assumes your local HEAD is at the same commit as the remote master
git push -f origin HEAD^^^:master

# or (equivalently)
git push -f origin HEAD~3:master
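
Note that this moves only the remote branch; your local branch keeps the three commits, which is exactly the "ahead of origin/master" situation from the start of this post. A sketch demonstrating that, with invented repo names and four empty commits:

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)                         # hypothetical scratch directory
cd "$work"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
for n in 1 2 3 4; do
    git -C seed commit -q --allow-empty -m "commit $n"
done
git clone -q --bare seed remote.git       # stands in for GitHub
git clone -q remote.git clone
cd clone

target=$(git rev-parse HEAD~3)
git push -q -f origin HEAD~3:master       # rewind the remote master by three commits

remote_master=$(git -C ../remote.git rev-parse master)
ahead=$(git rev-list --count origin/master..master)
echo "remote master=$remote_master, local master is ahead by $ahead"
```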

Are tools like filter-branch or BFG Repo-Cleaner safe in collaborative environments?

Not without careful coordination.

Because they rewrite history, they change the SHAs for existing commits that you alter (and any dependent commits). Therefore it's important to merge or close all open pull requests before running these commands.

How to choose --theirs for JUST ONE FILE during a merge?

$ git checkout master
$ git merge feature

CONFLICT (content): Merge conflict in file.js

Say you know that you want their version of file.js. How would you do this?

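# The fix
$ git checkout --theirs file.js
$ git add file.js

This works when rebasing too (just remember to continue with git rebase --continue). Beware that the meaning flips during a rebase: --ours refers to the branch you are rebasing onto, and --theirs to your own commits being replayed.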
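
Putting the merge case together end to end, as a sketch in a scratch repo (the file contents and commit messages are invented for the demo):

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)                         # hypothetical scratch repo
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name demo
git config user.email demo@example.com

echo "base" > file.js
git add file.js
git commit -q -m "base"

git checkout -q -b feature
echo "feature version" > file.js
git commit -q -am "feature change"

git checkout -q master
echo "master version" > file.js
git commit -q -am "master change"

# Both sides changed the same line, so the merge stops with a conflict.
if git merge feature >/dev/null 2>&1; then merge_status=clean; else merge_status=conflict; fi

git checkout --theirs file.js             # take the version from the branch being merged in
git add file.js                           # mark the conflict as resolved
git commit -q -m "merge feature, taking their file.js"

resolved=$(cat file.js)
unmerged=$(git ls-files -u)               # empty once nothing is left unresolved
echo "$merge_status -> $resolved"
```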