If you have ever wondered how Git works under the hood, this is the blog for you. In this series, we’ll demystify Git by building our own version control system in Golang. It won’t be a full Git implementation, but it will be a working system that handles the core concepts: blobs, trees, commits, and basic commands.
You can also follow this tutorial in any other language; just keep the logic the same. I chose Golang for my own learning purposes.
By the end, you will have your own minimal Git tool with the following commands:
- init (creating our git repository)
- add (adding files to the staging area)
- remove (removing files from the staging area)
- commit (committing the staged files)
- branch (listing, creating and deleting branches)
- checkout (switching between branches)
- log (displaying commit history)
So let’s get started!
In this part, we will:
- Set up the project in Golang
- Get the needed dependencies
- Create the directory structure that we will use
- Implement the repository
NOTE: I assume you already have Golang configured on your machine. If you don’t, please check this before moving on.
Project setup
Let’s start by making a directory where we will work:
mkdir gitgo && cd gitgo
Now that we have our project directory, let’s initialize our Go module (replace HalilFocic with your username):
go mod init github.com/HalilFocic/gitgo
Now let’s lay out the directory structure we will use in this part.
├── go.mod
└── internal
    ├── config
    │   └── config.go
    └── repository
        ├── repository.go
        └── repository_test.go
Config
We will use the config package to define constants that are shared across the app. This way, if we want to change a value later, we change it in one place instead of many.
// internal/config/config.go
package config

const (
	GitDirName = ".gitgo"
)
Repository
This is our starting point! In this section, we will work on the repository.go and repository_test.go files.
So what is a repository? It’s a specialized directory structure that will act as a database for tracking our file changes. Picture it as a self-contained filesystem within your project’s filesystem.
In our gitgo tool, when we initialize a repository using gitgo init, we create a .gitgo directory (.git in real Git) that serves as the backbone of our version control system.
This directory houses several critical components:
.gitgo/              # Version control root
├── HEAD             # Points to the current branch
├── index            # Staging area file
├── objects/         # Content storage (will store blobs, trees, commits)
└── refs/            # References directory
    └── heads/       # Branch references
        └── main     # Main branch reference
How Repository Connects to Git Components
The .gitgo directory structure we just described will serve as the foundation for our entire version control system. Each file and directory plays a specific role:
objects/
This directory will store all our content in a content-addressable filesystem:
- Blobs: Raw file contents (like a snapshot of your files)
- Trees: Directory structures linking to blobs and other trees
- Commits: Snapshots of your project, linking to trees and parent commits
Think of it like a database where:
- Files are stored as blobs (file content)
- Directories are stored as trees (structure)
- Commits are stored as pointers to these trees (snapshots)
refs/
The refs directory tracks different versions of your project:
- refs/heads/: Contains branch references
- Each branch file contains a hash pointing to the latest commit on that branch
- This lets us track multiple lines of development
HEAD
Acts like a pointer to your current location:
- Usually points to a branch (e.g., “ref: refs/heads/main”)
- Tells us which commit we’re currently working on
- Will be used when creating new commits or switching branches
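As a rough illustration of how the HEAD content can be interpreted, the sketch below parses a string such as "ref: refs/heads/main". The function parseHead is a hypothetical helper, not part of the code we write in this part, and treating any content without the "ref: " prefix as a direct value (for example a raw commit hash) is an assumption borrowed from how real Git behaves.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHead interprets the content of a HEAD file.
// If HEAD is symbolic ("ref: <path>"), it returns the referenced path
// and true. Otherwise it returns the trimmed content and false.
func parseHead(content string) (string, bool) {
	trimmed := strings.TrimSpace(content)
	const prefix = "ref: "
	if strings.HasPrefix(trimmed, prefix) {
		return strings.TrimPrefix(trimmed, prefix), true
	}
	return trimmed, false
}

func main() {
	ref, symbolic := parseHead("ref: refs/heads/main\n")
	fmt.Println("target:", ref, "symbolic:", symbolic)

	// The branch name is whatever follows refs/heads/.
	fmt.Println("branch:", strings.TrimPrefix(ref, "refs/heads/"))
}
```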
index
The staging area file:
- Tracks what will go into your next commit
- Records file paths, modes, and content hashes
- Bridge between your working directory and repository
This structure allows us to:
- Store file contents efficiently (objects)
- Track project history (commits)
- Maintain different versions (branches)
- Stage changes (index)
Tests first!
We will start with some tests that ensure our repository’s Init function behaves as it should and creates all the necessary files.
package repository

import (
	"github.com/HalilFocic/gitgo/internal/config"
	"os"
	"path/filepath"
	"testing"
)

func TestInitRepository(t *testing.T) {
	cwd, err := os.Getwd()
	if err != nil {
		t.Fatalf("Failed to get current working directory: %v", err)
	}
	os.RemoveAll(filepath.Join(cwd, config.GitDirName))
	t.Run("1.1: Initialize new repository", func(t *testing.T) {
		_, err := Init(".")
		if err != nil {
			t.Fatalf("Failed to initialize repository: %v", err)
		}
		// Check if .gitgo directory exists
		if _, err := os.Stat(config.GitDirName); os.IsNotExist(err) {
			t.Errorf("%s directory was not created", config.GitDirName)
		}
		// Check essential directories
		dirs := []string{
			config.GitDirName,
			filepath.Join(config.GitDirName, "objects"),
			filepath.Join(config.GitDirName, "refs"),
			filepath.Join(config.GitDirName, "refs/heads"),
		}
		for _, dir := range dirs {
			if _, err := os.Stat(dir); os.IsNotExist(err) {
				t.Errorf("Required directory not created: %s", dir)
			}
		}
		// Repository should not initialize if already exists
		_, err = Init(".")
		if err == nil {
			t.Errorf("Should not initialize repository in existing %s directory", config.GitDirName)
		}
	})
	t.Run("1.2: IsRepository validation", func(t *testing.T) {
		// Should return true for valid repository
		if !IsRepository(".") {
			t.Error("IsRepository() returned false for valid repository")
		}
		// Should return false for non-existent path
		if IsRepository("./non-existent-path") {
			t.Error("IsRepository() returned true for non-existent path")
		}
		// Should return false if missing critical directories
		// Remove objects directory
		os.RemoveAll(filepath.Join(config.GitDirName, "objects"))
		if IsRepository(".") {
			t.Error("IsRepository() returned true for repository with missing objects directory")
		}
		// Cleanup and create new repository for next test
		os.RemoveAll(config.GitDirName)
		Init(".")
		// Remove refs directory
		os.RemoveAll(filepath.Join(config.GitDirName, "refs"))
		if IsRepository(".") {
			t.Error("IsRepository() returned true for repository with missing refs directory")
		}
	})
}

func TestRepositoryPaths(t *testing.T) {
	// Get current directory
	cwd, err := os.Getwd()
	if err != nil {
		t.Fatalf("Failed to get current working directory: %v", err)
	}
	// Clean up and create new repository
	os.RemoveAll(filepath.Join(cwd, config.GitDirName))
	repo, err := Init(".")
	if err != nil {
		t.Fatalf("Failed to initialize test repository: %v", err)
	}
	t.Run("2.1: Test ObjectPath", func(t *testing.T) {
		expected := filepath.Join(repo.GitgoDir, "objects")
		if repo.ObjectPath() != expected {
			t.Errorf("ObjectPath() = %v, want %v", repo.ObjectPath(), expected)
		}
	})
	t.Run("2.2: Test RefsPath", func(t *testing.T) {
		expected := filepath.Join(repo.GitgoDir, "refs")
		if repo.RefsPath() != expected {
			t.Errorf("RefsPath() = %v, want %v", repo.RefsPath(), expected)
		}
	})
}

func TestIsRepository(t *testing.T) {
	cwd, err := os.Getwd()
	if err != nil {
		t.Fatalf("Failed to get current working directory: %v", err)
	}
	t.Run("3.1: Valid repository detection", func(t *testing.T) {
		// Clean up and create new repository
		os.RemoveAll(filepath.Join(cwd, config.GitDirName))
		_, err := Init(".")
		if err != nil {
			t.Fatalf("Failed to initialize test repository: %v", err)
		}
		if !IsRepository(".") {
			t.Error("IsRepository() = false, want true for valid repository")
		}
	})
	t.Run("3.2: Invalid repository detection", func(t *testing.T) {
		// Test non-existent directory
		if IsRepository("./non-existent") {
			t.Error("IsRepository() = true, want false for non-existent directory")
		}
		// Test incomplete repository structure
		os.RemoveAll(filepath.Join(cwd, config.GitDirName, "objects"))
		if IsRepository(".") {
			t.Error("IsRepository() = true, want false for repository with missing objects directory")
		}
	})
}
Lots of code, I know. But let’s go through each function and explain what it does:
TestInitRepository
- Initialize new repository (1.1)
  - Creates a new .gitgo repository in the current directory
  - Verifies all required directories exist (objects, refs, heads)
  - Checks that you can’t initialize a repository where one already exists
- IsRepository validation (1.2)
  - Verifies that IsRepository() correctly identifies valid repositories
  - Tests behavior with non-existent paths
  - Ensures it detects missing critical directories (objects, refs)
TestRepositoryPaths
- Test ObjectPath (2.1)
  - Verifies correct path construction for the objects directory
  - Ensures ObjectPath() returns the expected full path
- Test RefsPath (2.2)
  - Verifies correct path construction for the refs directory
  - Ensures RefsPath() returns the expected full path
TestIsRepository
- Valid repository detection (3.1)
  - Confirms IsRepository() returns true for properly initialized repositories
  - Tests with a freshly created repository
- Invalid repository detection (3.2)
  - Verifies IsRepository() returns false for non-existent directories
  - Ensures it detects incomplete repository structure (missing objects directory)
Let’s Code!
This is the fun part. We will now implement the Init and IsRepository functions. Besides that, we will implement the ObjectPath and RefsPath helper methods, which just return the paths of the objects and refs directories.
Init
Here is our function signature. If everything goes right, it returns a pointer to the Repository. If not, it returns a non-nil error.
func Init(path string) (*Repository, error) {}
Now our init function is going to do the following things:
- Get the absolute path of a given directory
- Check if the repository already exists
- Create a directory structure for the repository
- Create empty files
- Initialize HEAD file
- Return repository object
Here is pseudo-code of functionality we are implementing. Try to do it yourself before looking at the solution:
function init(path):
    # Get the absolute path and check if repository already exists
    absolutePath = getAbsolutePath(path)
    gitgoPath = joinPaths(absolutePath, config.GitDirName)
    if directoryExists(gitgoPath):
        return error("repository already exists")

    # Define all required paths
    objectsPath = joinPaths(gitgoPath, "objects")
    refsPath = joinPaths(gitgoPath, "refs")
    headsPath = joinPaths(gitgoPath, "refs/heads")
    mainPath = joinPaths(gitgoPath, "refs/heads/main")
    indexPath = joinPaths(gitgoPath, "index")
    headPath = joinPaths(gitgoPath, "HEAD")

    # Create directory structure
    createDirectories:
        - gitgoPath
        - objectsPath
        - refsPath
        - headsPath

    # Create empty files
    createEmptyFiles:
        - indexPath
        - mainPath

    # Initialize HEAD file
    writeToFile(headPath, "ref: refs/heads/main\n")

    # Return repository object
    return newRepository(
        path: absolutePath,
        gitgoDir: gitgoPath
    )

    # Note: If any operation fails, return an appropriate error
If you are unfamiliar with the Golang standard library, here are the functions that we will use:
filepath.Abs(path)                 // Retrieving the absolute path
filepath.Join(path1, path2)        // Joining two paths
os.Stat(path)                      // Getting data about a file or directory to check if it exists
os.MkdirAll(path, mode)            // Creating a directory along with any missing parents
os.Create(path)                    // Creating a file
os.WriteFile(file, content, mode)  // Writing content to a file
Now let’s start writing in our repository.go file. First, we will import the needed packages and define our Repository struct:
// internal/repository/repository.go
package repository

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/HalilFocic/gitgo/internal/config"
)

type Repository struct {
	Path     string
	GitgoDir string
}
Then we will implement our Init function:
func Init(path string) (*Repository, error) {
	// Resolve the given path to an absolute one
	absPath, err := filepath.Abs(path)
	if err != nil {
		return nil, err
	}
	gitGoPath := filepath.Join(absPath, config.GitDirName)
	// Refuse to initialize if the .gitgo directory is already there
	file, err := os.Stat(gitGoPath)
	if err == nil && file != nil {
		return nil, fmt.Errorf("repository already exists in this directory")
	}
	objectsPath := filepath.Join(gitGoPath, "objects")
	refsPath := filepath.Join(gitGoPath, "refs")
	headsPath := filepath.Join(gitGoPath, "refs/heads")
	mainPath := filepath.Join(gitGoPath, "refs/heads/main")
	indexPath := filepath.Join(gitGoPath, "index")
	// Create the directory structure
	err = os.MkdirAll(gitGoPath, 0755)
	if err != nil {
		return nil, err
	}
	err = os.MkdirAll(objectsPath, 0755)
	if err != nil {
		return nil, err
	}
	err = os.MkdirAll(refsPath, 0755)
	if err != nil {
		return nil, err
	}
	err = os.MkdirAll(headsPath, 0755)
	if err != nil {
		return nil, err
	}
	// Create the empty index and main branch files
	indexFile, err := os.Create(indexPath)
	if err != nil {
		return nil, err
	}
	refMainFile, err := os.Create(mainPath)
	if err != nil {
		return nil, err
	}
	indexFile.Close()
	refMainFile.Close()
	// Point HEAD at the main branch
	headPath := filepath.Join(gitGoPath, "HEAD")
	err = os.WriteFile(headPath, []byte("ref: refs/heads/main\n"), 0644)
	if err != nil {
		os.RemoveAll(gitGoPath)
		return nil, fmt.Errorf("failed to create HEAD file: %v", err)
	}
	return &Repository{
		Path:     absPath,
		GitgoDir: gitGoPath,
	}, nil
}
Understanding File Permissions
In our implementation, we use two permission modes:
- 0755 for directories: The owner can read/write/execute, and others can read/execute
- 0644 for files: The owner can read/write, others can read only
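Why Absolute Paths?
We convert the given path to an absolute path so that the repository behaves the same regardless of:
- Where the command is run from
- How a relative path was written (for example . or ../project)
This prevents inconsistencies when accessing the repository from different locations. Note that filepath.Abs does not resolve symbolic links; that would require an additional call to filepath.EvalSymlinks.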
Since we haven’t implemented the IsRepository, ObjectPath, and RefsPath functions yet, our tests will not compile.
We could just provide function signatures to make the tests compile, but let’s implement them since they are much easier than the Init function.
Our IsRepository function will check whether the given path contains the following directories:
.gitgo
.gitgo/objects
.gitgo/refs
.gitgo/refs/heads
Here is the pseudo-code for the IsRepository function:
function isRepository(path):
    # Get absolute path of directory
    absolutePath = getAbsolutePath(path)
    if error:
        return false

    # Build path to .gitgo directory
    gitgoPath = joinPaths(absolutePath, config.GitDirName)

    # Required directories to check
    requiredDirs = [
        ".",
        "./objects",
        "./refs",
        "./refs/heads"
    ]

    # Check each required directory exists
    for each dir in requiredDirs:
        fullPath = joinPaths(gitgoPath, dir)
        if not directoryExists(fullPath):
            return false

    # All directories exist
    return true
Since we already covered all the standard library functions we need while writing Init, this one should be easier to implement. Here is the code:
func IsRepository(path string) bool {
	absPath, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	gitGoPath := filepath.Join(absPath, config.GitDirName)
	dirs := []string{
		".",
		"./objects",
		"./refs",
		"./refs/heads",
	}
	for _, d := range dirs {
		p := filepath.Join(gitGoPath, d)
		file, err := os.Stat(p)
		if file == nil || err != nil {
			return false
		}
	}
	return true
}
Now, the extra easy part. We will implement the ObjectPath and RefsPath methods. They just return the paths to the objects and refs directories for a given repository:
func (r *Repository) ObjectPath() string {
	return filepath.Join(r.GitgoDir, "objects")
}

func (r *Repository) RefsPath() string {
	return filepath.Join(r.GitgoDir, "refs")
}
If you are unfamiliar with the (r *Repository) syntax, it is called a receiver. It is used to define methods on types. In this case, we are defining methods on the Repository type.
So when we call repo.ObjectPath(), it returns the path to the objects directory of that repository.
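For readers new to Go, here is a small standalone sketch of receivers that is unrelated to our repository code. Counter is a hypothetical type invented for the example. It shows the practical difference between a pointer receiver, which can modify the value it is called on, and a value receiver, which works on a copy.

```go
package main

import "fmt"

// Counter is a hypothetical type used only to illustrate receivers.
type Counter struct {
	n int
}

// Inc has a pointer receiver, so it modifies the caller's Counter.
func (c *Counter) Inc() {
	c.n++
}

// Value has a value receiver, so it operates on a copy of the Counter.
func (c Counter) Value() int {
	return c.n
}

func main() {
	c := &Counter{}
	c.Inc()
	c.Inc()
	fmt.Println("count:", c.Value())
}
```

ObjectPath and RefsPath only read from the Repository, so a value receiver would also work for them; a pointer receiver avoids copying the struct and stays consistent if methods that modify the repository are added later.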
With that done we can run our tests and see if everything is working as expected.
go test ./internal/repository
If you see all tests passing, congratulations! You have completed part 1 of this series. If not, please double-check your code and see if you missed something.
Since I already provided the full code for the repository_test.go file, here is the full content of the repository.go file:
package repository

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/HalilFocic/gitgo/internal/config"
)

type Repository struct {
	Path     string
	GitgoDir string
}

func Init(path string) (*Repository, error) {
	absPath, err := filepath.Abs(path)
	if err != nil {
		return nil, err
	}
	gitGoPath := filepath.Join(absPath, config.GitDirName)
	file, err := os.Stat(gitGoPath)
	if err == nil && file != nil {
		return nil, fmt.Errorf("repository already exists in this directory")
	}
	objectsPath := filepath.Join(gitGoPath, "objects")
	refsPath := filepath.Join(gitGoPath, "refs")
	headsPath := filepath.Join(gitGoPath, "refs/heads")
	mainPath := filepath.Join(gitGoPath, "refs/heads/main")
	indexPath := filepath.Join(gitGoPath, "index")
	err = os.MkdirAll(gitGoPath, 0755)
	if err != nil {
		return nil, err
	}
	err = os.MkdirAll(objectsPath, 0755)
	if err != nil {
		return nil, err
	}
	err = os.MkdirAll(refsPath, 0755)
	if err != nil {
		return nil, err
	}
	err = os.MkdirAll(headsPath, 0755)
	if err != nil {
		return nil, err
	}
	indexFile, err := os.Create(indexPath)
	if err != nil {
		return nil, err
	}
	refMainFile, err := os.Create(mainPath)
	if err != nil {
		return nil, err
	}
	indexFile.Close()
	refMainFile.Close()
	headPath := filepath.Join(gitGoPath, "HEAD")
	err = os.WriteFile(headPath, []byte("ref: refs/heads/main\n"), 0644)
	if err != nil {
		os.RemoveAll(gitGoPath)
		return nil, fmt.Errorf("failed to create HEAD file: %v", err)
	}
	return &Repository{
		Path:     absPath,
		GitgoDir: gitGoPath,
	}, nil
}

func IsRepository(path string) bool {
	absPath, err := filepath.Abs(path)
	if err != nil {
		return false
	}
	gitGoPath := filepath.Join(absPath, config.GitDirName)
	dirs := []string{
		".",
		"./objects",
		"./refs",
		"./refs/heads",
	}
	for _, d := range dirs {
		p := filepath.Join(gitGoPath, d)
		file, err := os.Stat(p)
		if file == nil || err != nil {
			return false
		}
	}
	return true
}

func (r *Repository) ObjectPath() string {
	return filepath.Join(r.GitgoDir, "objects")
}

func (r *Repository) RefsPath() string {
	return filepath.Join(r.GitgoDir, "refs")
}
What’s Next?
In Part 2, we’ll implement blob storage:
- Learn how Git stores file contents
- Implement content-addressable storage
- Utilize SHA-1 for hashing
- Use compression and decompression
This will form the foundation for storing all our repository content.