As a software developer, I use Git every day at work. However, I realized that while I knew how to use Git’s commands and features, I did not fully understand how it works under the hood. So I asked myself: what would be the best way to understand all of this?
The answer was simple: build my own minimal version of Git, which I called Gitgo. So I did!
Goals
There were three main things I wanted to grasp:
- How Git stores data on disk and manages it (blobs, trees, commits)
- How branching works in Git
- The algorithms and data structures that power Git
Takeaways
I would have been happy just to grasp the three concepts above. What I didn’t expect was how much more I would learn along the way. So let’s go through it.
Getting comfortable writing Go
Building Gitgo pushed me to better understand Go’s strengths. Working with bytes, binary data, and file systems revealed Go’s powerful standard library and way of working. I became comfortable with readers, writers, error handling and structuring Go projects.
Content-addressable storage
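Those encrypted-looking files in the .git directory became super familiar (they are compressed, not encrypted). Now I know why the objects directory contains so many subdirectories with two-character names: each object is stored under the first two hex characters of its hash, and the rest of the hash becomes the file name.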
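To make the object layout concrete, here is a minimal sketch in Go of how Git derives an object's name and on-disk location. It assumes SHA-1 (Git's default hash) and loose objects only. The function names `hashObject` and `objectPath` are my own for illustration and are not necessarily what Gitgo uses.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"path/filepath"
)

// hashObject returns the hex SHA-1 that Git assigns to an object:
// the hash of the header "<type> <size>\x00" followed by the raw content.
func hashObject(objType string, content []byte) string {
	header := fmt.Sprintf("%s %d\x00", objType, len(content))
	sum := sha1.Sum(append([]byte(header), content...))
	return hex.EncodeToString(sum[:])
}

// objectPath maps a hash to its loose-object location: the first two
// hex characters name the directory, the remaining 38 name the file.
func objectPath(gitDir, hash string) string {
	return filepath.Join(gitDir, "objects", hash[:2], hash[2:])
}

func main() {
	hash := hashObject("blob", []byte("hello\n"))
	fmt.Println(hash)
	fmt.Println(objectPath(".git", hash))
}
```

Splitting on the first two characters keeps any single directory from holding a huge number of files, which is why the objects directory fans out into up to 256 small subdirectories.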
Creating branches, jumping to commits and other operations became clear. I knew what was happening under the hood, which was the goal of this project.
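As a rough illustration of why branches are so cheap, the sketch below models the relevant part of a .git directory as an in-memory map from file path to content. It reflects Git's usual layout, where HEAD holds either `ref: refs/heads/<branch>` or a raw commit hash, and ignores packed refs. The helper names and the commit hash are placeholders I made up.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveHead follows HEAD to a commit hash. files is a hypothetical
// in-memory view of .git (relative path -> file content).
func resolveHead(files map[string]string) (string, error) {
	head, ok := files["HEAD"]
	if !ok {
		return "", fmt.Errorf("HEAD not found")
	}
	head = strings.TrimSpace(head)
	if !strings.HasPrefix(head, "ref: ") {
		return head, nil // detached HEAD: the file holds a commit hash directly
	}
	ref := strings.TrimPrefix(head, "ref: ")
	hash, ok := files[ref]
	if !ok {
		return "", fmt.Errorf("ref %s does not exist", ref)
	}
	return strings.TrimSpace(hash), nil
}

// createBranch makes a new branch by writing the current commit hash
// into a new file under refs/heads. No objects are copied.
func createBranch(files map[string]string, name string) error {
	hash, err := resolveHead(files)
	if err != nil {
		return err
	}
	files["refs/heads/"+name] = hash + "\n"
	return nil
}

func main() {
	files := map[string]string{
		"HEAD":            "ref: refs/heads/main\n",
		"refs/heads/main": "1111111111111111111111111111111111111111\n", // placeholder hash
	}
	if err := createBranch(files, "feature"); err != nil {
		panic(err)
	}
	files["HEAD"] = "ref: refs/heads/feature\n" // switching branches rewrites HEAD
	hash, err := resolveHead(files)
	if err != nil {
		panic(err)
	}
	fmt.Println("feature points at", hash)
}
```

In this model, creating a branch is a single small file write, and switching branches only changes what HEAD points to (a real checkout would also update the working directory).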
And one question I had always wondered about finally got answered:
Why doesn’t the .git directory become huge when it keeps every change?
It turns out that deriving each blob’s SHA hash from the file’s content lets the same blob be reused across many commits, which saves a lot of disk space. In Git, the hash depends only on the content (not on the file name or modification time), so a new blob is created only when a file actually changes.
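The deduplication can be sketched with a tiny in-memory object store. This is an illustration under assumptions, not Gitgo's actual code: `objectStore` and `writeBlob` are hypothetical names, the map stands in for the objects directory, and objects are zlib-compressed as in Git's loose-object format.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// objectStore is a hypothetical in-memory stand-in for .git/objects.
type objectStore struct {
	objects map[string][]byte // hash -> zlib-compressed object
}

func newObjectStore() *objectStore {
	return &objectStore{objects: make(map[string][]byte)}
}

// writeBlob stores content under its content hash and returns the hash.
// If an object with that hash already exists, nothing new is written.
func (s *objectStore) writeBlob(content []byte) (string, error) {
	data := append([]byte(fmt.Sprintf("blob %d\x00", len(content))), content...)
	sum := sha1.Sum(data)
	hash := hex.EncodeToString(sum[:])
	if _, exists := s.objects[hash]; exists {
		return hash, nil // unchanged file: reuse the existing blob
	}
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	s.objects[hash] = buf.Bytes()
	return hash, nil
}

func main() {
	store := newObjectStore()
	first, _ := store.writeBlob([]byte("package main\n"))
	second, _ := store.writeBlob([]byte("package main\n")) // same file in a later commit
	third, _ := store.writeBlob([]byte("package main\n\nfunc main() {}\n"))
	fmt.Println(first == second, first == third, len(store.objects)) // true false 2
}
```

Three writes produce only two stored objects, because the unchanged file hashes to a name that already exists.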
Binary file formats
This was a big shift. Before, when I opened binary-encoded files, all I saw were intimidating hex dumps.
The project gave me confidence in working with binary formats. Now when I look at file format specifications, I see structured data waiting to be parsed, not just mysterious bytes. This knowledge extends beyond Git - whether it’s parsing Git files or PDF files, the principles remain the same.
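As one small example of "structured data waiting to be parsed", here is a sketch that reads the 12-byte header of Git's index file: a 4-byte `DIRC` signature, a 4-byte version, and a 4-byte entry count, all big-endian. The index is my choice of example (the format is documented by Git). The type and function names are hypothetical, and the sample bytes are hand-written rather than taken from a real repository.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// indexHeader mirrors the fixed-size header at the start of .git/index.
// Fields are exported so encoding/binary can fill them in.
type indexHeader struct {
	Signature  [4]byte
	Version    uint32
	EntryCount uint32
}

// parseIndexHeader decodes the first 12 bytes of raw as an index header.
func parseIndexHeader(raw []byte) (indexHeader, error) {
	var h indexHeader
	// binary.Read fills the struct field by field in big-endian order.
	if err := binary.Read(bytes.NewReader(raw), binary.BigEndian, &h); err != nil {
		return h, fmt.Errorf("reading index header: %w", err)
	}
	if string(h.Signature[:]) != "DIRC" {
		return h, errors.New("bad signature: not a Git index file")
	}
	return h, nil
}

func main() {
	// Hand-written sample: signature "DIRC", version 2, 3 entries.
	raw := []byte{'D', 'I', 'R', 'C', 0, 0, 0, 2, 0, 0, 0, 3}
	h, err := parseIndexHeader(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("version=%d entries=%d\n", h.Version, h.EntryCount)
}
```

Once the header is read, the rest of the file is just more of the same: fixed-size fields, lengths, and padding, decoded one after another.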
Trees are awesome
Seeing how a simple tree structure could represent entire directory states was a joy. The recursive nature of trees (they can store other trees) helped me create an elegant solution for storing directory structure.
The magic moment was implementing checkout functionality - watching a tree object rebuild an entire working directory from seemingly simple hash references.
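The recursion behind checkout can be sketched in a few lines. This is a simplified model under stated assumptions, not Gitgo's implementation: objects live in in-memory maps instead of on disk, the IDs (`t-root`, `b1`, and so on) are made-up stand-ins for real hashes, and file modes and symlinks are ignored.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// entry is one row of a tree: a name plus the ID of a blob or another tree.
type entry struct {
	Name   string
	Hash   string
	IsTree bool
}

// store is a hypothetical in-memory object database.
type store struct {
	blobs map[string][]byte
	trees map[string][]entry
}

// checkout recreates the tree identified by hash under dir,
// recursing into subtrees and writing blobs out as files.
func checkout(s *store, hash, dir string) error {
	entries, ok := s.trees[hash]
	if !ok {
		return fmt.Errorf("tree %s not found", hash)
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	for _, e := range entries {
		target := filepath.Join(dir, e.Name)
		if e.IsTree {
			if err := checkout(s, e.Hash, target); err != nil {
				return err
			}
			continue
		}
		data, ok := s.blobs[e.Hash]
		if !ok {
			return fmt.Errorf("blob %s not found", e.Hash)
		}
		if err := os.WriteFile(target, data, 0o644); err != nil {
			return err
		}
	}
	return nil
}

// runDemo checks a two-level tree out into a temp dir and lists the files.
func runDemo() ([]string, error) {
	s := &store{
		blobs: map[string][]byte{"b1": []byte("# Gitgo\n"), "b2": []byte("package main\n")},
		trees: map[string][]entry{
			"t-root": {{Name: "README.md", Hash: "b1"}, {Name: "cmd", Hash: "t-cmd", IsTree: true}},
			"t-cmd":  {{Name: "main.go", Hash: "b2"}},
		},
	}
	dir, err := os.MkdirTemp("", "checkout-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := checkout(s, "t-root", dir); err != nil {
		return nil, err
	}
	var files []string
	err = filepath.WalkDir(dir, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		rel, err := filepath.Rel(dir, p)
		files = append(files, filepath.ToSlash(rel))
		return err
	})
	return files, err
}

func main() {
	files, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(files) // [README.md cmd/main.go]
}
```

The whole working directory falls out of a single root ID: each tree only needs to know the names and IDs of its direct children.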
Conclusion
This project was incredibly eye-opening. Git went from a tool I used to a tool I truly understand. The project pushed me to learn way more than I expected: file systems, binary formats, cryptographic hashing, trees, and more.
To any developer wondering how Git works, I’d say: Build your own. The journey will teach you not just about Git, but about fundamental computer science concepts that make you a better programmer. The satisfaction of seeing your first commit, branch or checkout work is just unmatched. Most importantly, it reminded me why I love programming - the joy of understanding complex systems by building them yourself.