Write your git - Part 7: Command Line Interface

First of all, I just wanna share one thing that popped into my mind. This blog series is kinda like the Harry Potter books. It started super simple and evolved, with each chapter building on top of the previous one. Now we are in the endgame, combining everything we have built so far so we can test the tool.

In our previous chapter, we implemented Git’s references system, allowing us to create human-readable names for commits and track branches. Now, it’s time to bring everything together by building a command-line interface that makes our Git clone usable in everyday scenarios.

By the end of this chapter, we’ll have a fully functional gitgo command that supports:

  • Initializing repositories (init)
  • Adding files to the staging area (add)
  • Removing files from staging (remove)
  • Creating commits (commit)
  • Managing branches (branch)
  • Switching between branches (checkout)
  • Viewing commit history (log)

Let’s transform our collection of libraries into a complete version control tool!

Command Line Interface Design

Before we dive into implementation, let’s think about how our CLI should work. Command-line tools typically follow a pattern like:

command [subcommand] [options] [arguments]

For our gitgo tool, this pattern would look like:

gitgo init # initializes repository
gitgo add file.txt # adds file to staging area
gitgo commit -m "Initial commit" # commits the files in the staging area
gitgo branch -c feature # creates a new branch called 'feature'
gitgo checkout feature # switches to that branch
gitgo log # logs commit history

Each subcommand corresponds to a different Git operation and may have its own options. We’ll need a way to parse command-line arguments, validate them, and call the appropriate functions from our existing packages.

Project Structure

Let’s update our project structure to include the command-line interface:

gitgo/
├── cmd/                 # NEW DIRECTORY
│   └── gitgo/           # CLI entry point
│       └── main.go      # Main package
├── go.mod
└── internal/
    ├── blob/            # From part 2
    │   ├── blob.go
    │   └── blob_test.go
    ├── config/          # From part 1
    │   └── config.go
    ├── repository/      # From part 1
    │   ├── repository.go
    │   └── repository_test.go
    ├── staging/         # From part 3
    │   ├── staging.go
    │   └── staging_test.go
    ├── tree/            # From part 4
    │   ├── tree.go
    │   └── tree_test.go
    ├── commit/          # From part 5
    │   ├── commit.go
    │   └── commit_test.go
    ├── refs/            # From part 6
    │   ├── refs.go
    │   └── refs_test.go
    └── commands/        # NEW DIRECTORY
        ├── add.go
        ├── branch.go
        ├── checkout.go
        ├── commit.go
        ├── log.go
        └── remove.go

The main additions here are the cmd directory, which contains our entry point main.go, and the internal/commands directory, which contains the individual command implementations.

Implementation Approach

For our CLI implementation, we’ll follow this approach:

  1. Create a command pattern using Go’s standard flag package
  2. Implement each command as a separate struct with an Execute() method
  3. Create a main function that parses arguments and calls the appropriate command
  4. Connect each command to our existing functionality from previous chapters

This design will keep our code modular and make it easy to add new commands in the future.
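As a rough illustration of steps 2 and 3, here is a minimal sketch of what a shared command shape could look like. The Command interface, greetCommand, dispatch and newRegistry are hypothetical names invented for this example; the series itself dispatches with a plain switch statement, shown in the next section.

```go
package main

import (
	"errors"
	"fmt"
)

// Command is a hypothetical interface: anything with Execute() can be dispatched.
type Command interface {
	Execute() error
}

// greetCommand is a stand-in for a real command such as add or commit.
type greetCommand struct {
	name string
}

func (c *greetCommand) Execute() error {
	if c.name == "" {
		return errors.New("name required")
	}
	fmt.Printf("hello, %s\n", c.name)
	return nil
}

// dispatch looks up a constructor by subcommand name and runs the command.
func dispatch(registry map[string]func(args []string) Command, args []string) error {
	if len(args) == 0 {
		return errors.New("usage: tool <command> [<args>]")
	}
	build, ok := registry[args[0]]
	if !ok {
		return fmt.Errorf("unknown command: %s", args[0])
	}
	return build(args[1:]).Execute()
}

// newRegistry maps subcommand names to constructors (illustrative only).
func newRegistry() map[string]func(args []string) Command {
	return map[string]func(args []string) Command{
		"greet": func(args []string) Command {
			name := ""
			if len(args) > 0 {
				name = args[0]
			}
			return &greetCommand{name: name}
		},
	}
}

func main() {
	if err := dispatch(newRegistry(), []string{"greet", "gitgo"}); err != nil {
		fmt.Println("error:", err)
	}
}
```

With a shape like this, adding a command means adding one struct and one registry entry, which is the modularity this approach is after.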

Command Pattern

Let’s start by implementing our command pattern. Each command will have a consistent structure, which makes it easy to call from our main function.

First, let’s look at how our main function will work:

// cmd/gitgo/main.go
package main

import (
	"flag"
	"fmt"
	"github.com/HalilFocic/gitgo/internal/commands"
	"github.com/HalilFocic/gitgo/internal/repository"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: gitgo <command> [<args>]")
		os.Exit(1)
	}

	cwd, err := os.Getwd()
	if err != nil {
		fmt.Printf("error: %v\n", err)
		os.Exit(1)
	}

	switch os.Args[1] {
	case "init":
		initCmd := flag.NewFlagSet("init", flag.ExitOnError)
		initCmd.Parse(os.Args[2:])
		_, err := repository.Init(cwd)
		if err != nil {
			fmt.Printf("error: %v\n", err)
			os.Exit(1)
		}
		fmt.Printf("Initialized empty gitgo repository in %s\n", cwd)

	case "add":
		addCmd := flag.NewFlagSet("add", flag.ExitOnError)
		addCmd.Parse(os.Args[2:])
		if addCmd.NArg() < 1 {
			fmt.Println("error: path required for 'add'")
			os.Exit(1)
		}
		for _, path := range addCmd.Args() {
			cmd := commands.NewAddCommand(cwd, path)
			if err := cmd.Execute(); err != nil {
				fmt.Printf("error: %v\n", err)
				os.Exit(1)
			}
		}
		fmt.Printf("Successfully added %d files to index.\n", len(addCmd.Args()))

	case "remove":
		rmCmd := flag.NewFlagSet("remove", flag.ExitOnError)
		rmCmd.Parse(os.Args[2:])
		if rmCmd.NArg() < 1 {
			fmt.Println("error: path required for 'remove'")
			os.Exit(1)
		}
		for _, path := range rmCmd.Args() {
			cmd := commands.NewRemoveCommand(cwd, path)
			if err := cmd.Execute(); err != nil {
				fmt.Printf("error: %v\n", err)
				os.Exit(1)
			}
		}
		fmt.Printf("Successfully removed %d files from index.\n", len(rmCmd.Args()))

	case "commit":
		commitCmd := flag.NewFlagSet("commit", flag.ExitOnError)
		message := commitCmd.String("m", "", "commit message")
		commitCmd.Parse(os.Args[2:])
		if *message == "" {
			fmt.Println("error: -m flag required")
			os.Exit(1)
		}
		cmd := commands.NewCommitCommand(cwd, *message, "User <user@example.com>")
		if err := cmd.Execute(); err != nil {
			fmt.Printf("error: %v\n", err)
			os.Exit(1)
		}
		fmt.Printf("Commit added successfully\n")

	case "branch":
		branchCmd := flag.NewFlagSet("branch", flag.ExitOnError)
		create := branchCmd.Bool("c", false, "create new branch")
		// Named del rather than delete to avoid shadowing Go's builtin delete.
		del := branchCmd.Bool("d", false, "delete branch")
		branchCmd.Parse(os.Args[2:])

		// Default to listing; -c or -d with exactly one name switches the action.
		name, action := "", "list"
		if branchCmd.NArg() == 1 {
			if *create {
				name, action = branchCmd.Arg(0), "create"
			} else if *del {
				name, action = branchCmd.Arg(0), "delete"
			}
		}
		cmd := commands.NewBranchCommand(cwd, name, action)
		if err := cmd.Execute(); err != nil {
			fmt.Printf("error: %v\n", err)
			os.Exit(1)
		}

	case "checkout":
		checkoutCmd := flag.NewFlagSet("checkout", flag.ExitOnError)
		checkoutCmd.Parse(os.Args[2:])
		if checkoutCmd.NArg() != 1 {
			fmt.Println("error: branch name or commit hash required")
			os.Exit(1)
		}
		cmd := commands.NewCheckoutCommand(cwd, checkoutCmd.Arg(0))
		if err := cmd.Execute(); err != nil {
			fmt.Printf("error: %v\n", err)
			os.Exit(1)
		}

	case "log":
		logCmd := flag.NewFlagSet("log", flag.ExitOnError)
		maxCount := logCmd.Int("n", -1, "limit number of commits")
		logCmd.Parse(os.Args[2:])

		cmd := commands.NewLogCommand(cwd, *maxCount)
		if err := cmd.Execute(); err != nil {
			fmt.Printf("error: %v\n", err)
			os.Exit(1)
		}

	default:
		fmt.Printf("Unknown command: %s\n", os.Args[1])
		os.Exit(1)
	}
}

Let’s break down this main function:

  1. Command-line Argument Handling:

    • Checks if at least one argument is provided
    • Retrieves the current working directory
    • Uses a switch statement to handle different subcommands
  2. Flag Parsing:

    • For each command, creates a new FlagSet to parse command-specific flags
    • Handles both flags (like -m for commit messages) and arguments (like file paths)
  3. Command Execution:

    • Creates command objects using constructors like NewAddCommand
    • Calls the Execute() method on each command
    • Handles errors and provides user feedback

The flag package is part of Go’s standard library and provides a simple way to define and parse command-line flags.
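To see a FlagSet in isolation, here is a small sketch that parses the arguments following a commit subcommand. The parseCommit helper is a name made up for this example, and it uses flag.ContinueOnError (instead of the flag.ExitOnError used in main.go) so that parse errors are returned rather than ending the process.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// parseCommit parses the arguments that follow "commit" on the command line.
// ContinueOnError returns the error instead of exiting, which is easier to test.
func parseCommit(args []string) (message string, rest []string, err error) {
	fs := flag.NewFlagSet("commit", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the output in this sketch
	msg := fs.String("m", "", "commit message")
	if err := fs.Parse(args); err != nil {
		return "", nil, err
	}
	// fs.Args() holds whatever is left after the flags.
	return *msg, fs.Args(), nil
}

func main() {
	msg, rest, err := parseCommit([]string{"-m", "Initial commit", "extra"})
	fmt.Println(msg, rest, err)
}
```

One detail worth knowing: the flag package stops parsing at the first non-flag argument, so flags have to come before positional arguments. With the main function above, gitgo branch feature -c would fall through to listing branches, because -c is never parsed as a flag.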

Command Implementation

Now let’s implement each command. Each one lives in its own file inside the commands package under internal/commands.

The Add Command

The add command adds files to the staging area. Let’s implement it:

// internal/commands/add.go
package commands

import (
	"fmt"
	"github.com/HalilFocic/gitgo/internal/staging"
)

type AddCommand struct {
	rootPath string
	path     string
}

func NewAddCommand(rootPath, path string) *AddCommand {
	return &AddCommand{
		rootPath: rootPath,
		path:     path,
	}
}

func (c *AddCommand) Execute() error {
	index, err := staging.New(c.rootPath)
	if err != nil {
		return fmt.Errorf("failed to read staging area: %v", err)
	}

	if err := index.Add(c.path); err != nil {
		return fmt.Errorf("failed to add %s: %v", c.path, err)
	}

	return nil
}

This implementation:

  1. Creates a command struct that holds the repository path and the file path
  2. Provides a constructor function NewAddCommand
  3. Implements an Execute() method that:
    • Creates a new staging index
    • Adds the specified file to the index
    • Returns any errors encountered

The staging.New and index.Add functions are the ones we implemented in Chapter 3. To refresh our memory:

  • staging.New: creates the index file if it is not present, otherwise reads the current state of the staging area, and returns the index with its data.
  • index.Add: adds an entry to the index and writes the new state of the index to disk.

The Remove Command

The remove command removes files from the staging area:

// internal/commands/remove.go
package commands

import (
	"fmt"

	"github.com/HalilFocic/gitgo/internal/staging"
)

type RemoveCommand struct {
	rootPath string
	path     string
}

func NewRemoveCommand(rootPath, path string) *RemoveCommand {
	return &RemoveCommand{
		rootPath: rootPath,
		path:     path,
	}
}

func (c *RemoveCommand) Execute() error {
	index, err := staging.New(c.rootPath)
	if err != nil {
		return fmt.Errorf("failed to read staging area: %v", err)
	}
	if err := index.Remove(c.path); err != nil {
		return fmt.Errorf("failed to remove file %s from index: %v", c.path, err)
	}
	return nil
}

This follows the same pattern as the add command but calls index.Remove instead of index.Add. There is not much else to explain here, so we will move on to commit.

The Commit Command

The commit command creates a new commit from the staged files:

// internal/commands/commit.go
package commands

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"

	"github.com/HalilFocic/gitgo/internal/commit"
	"github.com/HalilFocic/gitgo/internal/config"
	"github.com/HalilFocic/gitgo/internal/staging"
	"github.com/HalilFocic/gitgo/internal/tree"
)

type CommitCommand struct {
	rootPath string
	message  string
	author   string
}

func NewCommitCommand(rootPath, message, author string) *CommitCommand {
	return &CommitCommand{
		rootPath: rootPath,
		message:  message,
		author:   author,
	}
}

func (c *CommitCommand) Execute() error {
	index, err := staging.New(c.rootPath)
	if err != nil {
		return fmt.Errorf("failed to read staging area: %v", err)
	}

	entries := index.Entries()
	if len(entries) == 0 {
		return fmt.Errorf("nothing to commit, staging area is empty")
	}
	headContent, err := os.ReadFile(filepath.Join(c.rootPath, config.GitDirName, "HEAD"))
	if err != nil {
		return fmt.Errorf("failed to read HEAD: %v", err)
	}

	headRef := strings.TrimSpace(string(headContent))
	if !strings.HasPrefix(headRef, "ref: refs/heads/") {
		return fmt.Errorf("invalid HEAD format")
	}

	branchName := strings.TrimPrefix(headRef, "ref: refs/heads/")
	branchPath := filepath.Join(c.rootPath, config.GitDirName, "refs", "heads", branchName)

	var previousTreeHash string
	parentHash := ""

	if previousCommitHash, err := os.ReadFile(branchPath); err == nil {
		parentHash = strings.TrimSpace(string(previousCommitHash))

		if parentHash != "" {
			previousCommit, err := commit.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), parentHash)
			if err != nil {
				return fmt.Errorf("failed to read previous commit: %v", err)
			}
			previousTreeHash = previousCommit.TreeHash

		}
	}
	objectsPath := filepath.Join(c.rootPath, config.GitDirName, "objects")
	combinedRoot := c.combineTreeWithStaged(previousTreeHash, entries, objectsPath)
	treeHash, err := c.createTreeFromNode(combinedRoot, objectsPath)
	if err != nil {
		return fmt.Errorf("failed to create tree: %v", err)
	}

	// parentHash was already read from the branch file above, so it is reused here.

	newCommit, err := commit.New(treeHash, parentHash, c.author, c.message)
	if err != nil {
		return fmt.Errorf("failed to create commit: %v", err)
	}

	commitHash, err := newCommit.Write(objectsPath)
	if err != nil {
		return fmt.Errorf("failed to write commit: %v", err)
	}

	if err := os.WriteFile(branchPath, []byte(commitHash), 0644); err != nil {
		return fmt.Errorf("failed to update branch reference: %v", err)
	}
	index.Clear()
	return nil
}

type pathNode struct {
	files    map[string]staging.Entry
	children map[string]*pathNode
}

func NewPathNode() *pathNode {
	return &pathNode{
		files:    make(map[string]staging.Entry),
		children: make(map[string]*pathNode),
	}
}


func getFileMode(mode fs.FileMode) int {
	if mode&fs.ModeDir != 0 {
		return tree.DirectoryMode
	}
	if mode&0111 != 0 {
		return tree.ExecutableMode
	}
	return tree.RegularFileMode
}

func (c *CommitCommand) createTreeFromNode(node *pathNode, objectsPath string) (string, error) {
	t := tree.New()

	for dirName, childNode := range node.children {
		childHash, err := c.createTreeFromNode(childNode, objectsPath)
		if err != nil {
			return "", fmt.Errorf("failed to create tree for %s: %v", dirName, err)
		}

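		// Check the error instead of silently dropping it.
		if err := t.AddEntry(dirName, childHash, tree.DirectoryMode); err != nil {
			return "", fmt.Errorf("failed to add directory %s: %v", dirName, err)
		}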
	}
	for fileName, entry := range node.files {
		err := t.AddEntry(fileName, entry.Hash, getFileMode(entry.Mode))
		if err != nil {
			return "", fmt.Errorf("failed to add entry %s: %v", fileName, err)
		}
	}

	hash, err := t.Write(objectsPath)
	if err != nil {
		return "", fmt.Errorf("failed to write tree: %v", err)
	}
	return hash, nil
}

func (c *CommitCommand) combineTreeWithStaged(previousTreeHash string, stagedEntries []*staging.Entry, objectsPath string) *pathNode {
	root := NewPathNode()

	if previousTreeHash != "" {
		previousTree, err := tree.Read(objectsPath, previousTreeHash)
		if err != nil {
			fmt.Printf("Warning: could not read previous tree: %v\n", err)
		} else {
			for _, entry := range previousTree.Entries() {
				if entry.Mode == tree.DirectoryMode {
					c.addTreeEntriesToPathNode(root, entry.Name, entry.Hash, objectsPath)
				} else {
					root.files[entry.Name] = staging.Entry{
						Path: entry.Name,
						Hash: entry.Hash,
						Mode: fs.FileMode(entry.Mode),
					}
				}
			}
		}
	}

	for _, entry := range stagedEntries {
		parts := strings.Split(entry.Path, "/")
		current := root

		for i := 0; i < len(parts)-1; i++ {
			dirName := parts[i]
			if _, exists := current.children[dirName]; !exists {
				current.children[dirName] = NewPathNode()
			}
			current = current.children[dirName]
		}

		filename := parts[len(parts)-1]
		current.files[filename] = *entry
	}

	return root
}

func (c *CommitCommand) addTreeEntriesToPathNode(root *pathNode, prefix string, treeHash string, objectsPath string) {
	subtree, err := tree.Read(objectsPath, treeHash)
	if err != nil {
		fmt.Printf("Warning: could not read subtree %s: %v\n", treeHash, err)
		return
	}

	for _, entry := range subtree.Entries() {
		fullPath := filepath.Join(prefix, entry.Name)

		if entry.Mode == tree.DirectoryMode {
			c.addTreeEntriesToPathNode(root, fullPath, entry.Hash, objectsPath)
		} else {
			root.files[fullPath] = staging.Entry{
				Path: fullPath,
				Hash: entry.Hash,
				Mode: fs.FileMode(entry.Mode),
			}
		}
	}
}

Breaking Down the Execute Method

The Execute() method performs several critical operations that mirror Git’s internal commit process:

  1. Reading the Staging Area:
index, err := staging.New(c.rootPath)
entries := index.Entries()

First, we load the current staging area (index), which contains every file staged with gitgo add. These are the changes that go into our commit.

  2. Validation Checks:
if len(entries) == 0 {
    return fmt.Errorf("nothing to commit, staging area is empty")
}

We verify there’s actually something to commit. Git behaves the same way: by default it refuses to create a commit that contains no changes.

  3. Determining the Current Branch:
headContent, err := os.ReadFile(filepath.Join(c.rootPath, config.GitDirName, "HEAD"))
headRef := strings.TrimSpace(string(headContent))
branchName := strings.TrimPrefix(headRef, "ref: refs/heads/")

Git records the current branch in the HEAD file. We read this file to determine which branch we’re committing to.

  4. Finding the Parent Commit:
if previousCommitHash, err := os.ReadFile(branchPath); err == nil {
    parentHash = strings.TrimSpace(string(previousCommitHash))

    if parentHash != "" {
        previousCommit, err := commit.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), parentHash)
        previousTreeHash = previousCommit.TreeHash
    }
}

For every commit after the first, we need to record the parent commit (this link is what forms the chain of Git history). We also need the previous tree so that files that haven’t changed are carried over.

  5. Building the Tree Structure:
combinedRoot := c.combineTreeWithStaged(previousTreeHash, entries, objectsPath)
treeHash, err := c.createTreeFromNode(combinedRoot, objectsPath)

We take the staged files and the previous tree structure and merge them to create a new tree structure representing the complete state of the repository at commit time.

  6. Creating the Commit:
newCommit, err := commit.New(treeHash, parentHash, c.author, c.message)
commitHash, err := newCommit.Write(objectsPath)

We create a commit object that points to our new tree structure, records the author information, commit message, and links back to the parent commit.

  7. Updating the Branch Reference:
if err := os.WriteFile(branchPath, []byte(commitHash), 0644); err != nil {
    return fmt.Errorf("failed to update branch reference: %v", err)
}

The final step is to update the branch to point to our new commit, effectively advancing the branch forward.

  8. Cleaning up:
index.Clear()

Since the changes are now committed, we clear the staging area, mimicking how Git’s state transitions from “staged” to “committed”.
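The branch lookup from HEAD is easy to try in isolation. Below is a minimal sketch that works on the file content as a string; currentBranch and headPrefix are names made up for this example and are not part of the gitgo packages.

```go
package main

import (
	"fmt"
	"strings"
)

// headPrefix is what a symbolic HEAD starts with when it points at a branch.
const headPrefix = "ref: refs/heads/"

// currentBranch returns the branch named in HEAD. ok is false when HEAD does
// not hold a symbolic ref (for example a raw commit hash in detached HEAD).
func currentBranch(headContent string) (branch string, ok bool) {
	ref := strings.TrimSpace(headContent)
	if !strings.HasPrefix(ref, headPrefix) {
		return "", false
	}
	branch = strings.TrimPrefix(ref, headPrefix)
	return branch, branch != ""
}

func main() {
	fmt.Println(currentBranch("ref: refs/heads/main\n"))
	fmt.Println(currentBranch("4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"))
}
```

The second call shows why Execute() returns "invalid HEAD format" after checking out a raw commit hash: in that state HEAD holds no branch name, so there is no branch file to advance.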

The Path Node Structure

type pathNode struct {
	files    map[string]staging.Entry
	children map[string]*pathNode
}

func NewPathNode() *pathNode {
	return &pathNode{
		files:    make(map[string]staging.Entry),
		children: make(map[string]*pathNode),
	}
}

The pathNode structure is a fundamental component that creates a virtual file system in memory, representing the repository’s directory structure:

  • files: Maps filenames to their staging entries, representing files in the current directory
  • children: Maps directory names to their corresponding pathNode, representing subdirectories

This tree-like structure efficiently mirrors the hierarchical nature of a filesystem, which is critical for Git’s tree objects that represent directories.
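To make the shape concrete, here is a small self-contained sketch that inserts a few flat paths into such a structure. The node type is a simplified stand-in for pathNode (it stores a blob hash string instead of a staging.Entry), and the insert method is a name invented for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a simplified stand-in for pathNode: files map a name to a blob hash.
type node struct {
	files    map[string]string
	children map[string]*node
}

func newNode() *node {
	return &node{files: map[string]string{}, children: map[string]*node{}}
}

// insert walks (and creates) one child per directory component,
// then stores the file in the final directory.
func (n *node) insert(path, hash string) {
	parts := strings.Split(path, "/")
	current := n
	for _, dir := range parts[:len(parts)-1] {
		child, ok := current.children[dir]
		if !ok {
			child = newNode()
			current.children[dir] = child
		}
		current = child
	}
	current.files[parts[len(parts)-1]] = hash
}

func main() {
	root := newNode()
	root.insert("README.md", "aaa")
	root.insert("src/main.go", "bbb")
	root.insert("src/util/strings.go", "ccc")

	fmt.Println("root files:", len(root.files))
	fmt.Println("src files:", len(root.children["src"].files))
	fmt.Println("src/util files:", len(root.children["src"].children["util"].files))
}
```

Each directory ends up as exactly one node, which is what later allows one tree object to be written per directory.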

Function getFileMode

func getFileMode(mode fs.FileMode) int {
	if mode&fs.ModeDir != 0 {
		return tree.DirectoryMode
	}
	if mode&0111 != 0 {
		return tree.ExecutableMode
	}
	return tree.RegularFileMode
}

This utility function translates Go’s filesystem mode flags into Git’s simplified mode constants:

  • If the file is a directory, it returns DirectoryMode (040000 in octal)
  • If any execute bit is set (mode & 0111 is non-zero), it returns ExecutableMode (100755 in octal)
  • Otherwise, it returns RegularFileMode (100644 in octal)

Git uses these specific mode values in its tree objects to track file types and permissions, which is essential for properly restoring files when checking out a commit.

Function createTreeFromNode

func (c *CommitCommand) createTreeFromNode(node *pathNode, objectsPath string) (string, error) {
	// Create a new tree object
	t := tree.New()

	// Process all subdirectories first
	for dirName, childNode := range node.children {
		// Recursively create tree for the subdirectory
		childHash, err := c.createTreeFromNode(childNode, objectsPath)
		if err != nil {
			return "", fmt.Errorf("failed to create tree for %s: %v", dirName, err)
		}

		// Add the subdirectory to the current tree
		if err := t.AddEntry(dirName, childHash, tree.DirectoryMode); err != nil {
			return "", fmt.Errorf("failed to add directory %s: %v", dirName, err)
		}
	}

	// Process all files in the current directory
	for fileName, entry := range node.files {
		// Add each file to the tree with the appropriate mode
		err := t.AddEntry(fileName, entry.Hash, getFileMode(entry.Mode))
		if err != nil {
			return "", fmt.Errorf("failed to add entry %s: %v", fileName, err)
		}
	}

	// Write the tree to disk and return its hash
	hash, err := t.Write(objectsPath)
	if err != nil {
		return "", fmt.Errorf("failed to write tree: %v", err)
	}
	return hash, nil
}

This function is responsible for converting our virtual filesystem into Git’s tree objects. It works recursively to handle nested directories:

  1. It creates a new tree object for the current directory

  2. For each subdirectory:

    • It recursively calls itself to create a tree for that subdirectory
    • It adds the resulting tree as an entry in the current tree
  3. For each file in the current directory:

    • It adds the file as an entry in the current tree, using its blob hash
  4. It writes the completed tree to disk and returns its hash

This recursive approach handles arbitrarily nested directory structures, mirroring Git’s own tree implementation.
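The bottom-up recursion can be sketched without touching the disk. The example below is deliberately simplified and is not Git's on-disk tree format: dir, hashDir and sample are hypothetical names, and each directory is hashed as a sorted list of text lines. It also illustrates one point to keep in mind with map-based nodes: Go does not specify map iteration order, so entries should be sorted somewhere before hashing (this sketch does not assume that the tree package already does so).

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"sort"
	"strings"
)

// dir is a simplified in-memory directory (hypothetical, for illustration).
type dir struct {
	files    map[string]string // file name -> blob hash
	children map[string]*dir   // directory name -> subdirectory
}

// hashDir hashes every subdirectory first, then the directory itself.
// Lines are sorted before hashing so the result does not depend on map order.
func hashDir(d *dir) string {
	lines := make([]string, 0, len(d.files)+len(d.children))
	for name, child := range d.children {
		lines = append(lines, name+" tree "+hashDir(child))
	}
	for name, blobHash := range d.files {
		lines = append(lines, name+" blob "+blobHash)
	}
	sort.Strings(lines)
	sum := sha1.Sum([]byte(strings.Join(lines, "\n")))
	return fmt.Sprintf("%x", sum)
}

// sample builds a small tree; nestedHash is the blob hash of src/util/strings.go.
func sample(nestedHash string) *dir {
	util := &dir{files: map[string]string{"strings.go": nestedHash}}
	src := &dir{
		files:    map[string]string{"main.go": "bbb"},
		children: map[string]*dir{"util": util},
	}
	return &dir{
		files:    map[string]string{"README.md": "aaa"},
		children: map[string]*dir{"src": src},
	}
}

func main() {
	fmt.Println(hashDir(sample("ccc")))
	fmt.Println(hashDir(sample("ddd"))) // a nested change alters the root hash
}
```

Because a parent's hash is computed from its children's hashes, changing one deeply nested file changes every tree on the path up to the root, while untouched sibling directories keep their hashes.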

Function combineTreeWithStaged

func (c *CommitCommand) combineTreeWithStaged(previousTreeHash string, stagedEntries []*staging.Entry, objectsPath string) *pathNode {
	// Create a new virtual filesystem root
	root := NewPathNode()

	// If there's a previous tree, read its contents
	if previousTreeHash != "" {
		previousTree, err := tree.Read(objectsPath, previousTreeHash)
		if err != nil {
			fmt.Printf("Warning: could not read previous tree: %v\n", err)
		} else {
			// Add previous tree entries to our virtual filesystem
			for _, entry := range previousTree.Entries() {
				if entry.Mode == tree.DirectoryMode {
					// Process subdirectories recursively
					c.addTreeEntriesToPathNode(root, entry.Name, entry.Hash, objectsPath)
				} else {
					// Add files directly to the root
					root.files[entry.Name] = staging.Entry{
						Path: entry.Name,
						Hash: entry.Hash,
						Mode: fs.FileMode(entry.Mode),
					}
				}
			}
		}
	}

	// Add all staged entries to our virtual filesystem, potentially overwriting
	// files from the previous tree
	for _, entry := range stagedEntries {
		parts := strings.Split(entry.Path, "/")
		current := root

		// Navigate/create the directory structure
		for i := 0; i < len(parts)-1; i++ {
			dirName := parts[i]
			if _, exists := current.children[dirName]; !exists {
				current.children[dirName] = NewPathNode()
			}
			current = current.children[dirName]
		}

		// Add the file in its proper location
		filename := parts[len(parts)-1]
		current.files[filename] = *entry
	}

	return root
}

This critical function merges the previous repository state with the newly staged changes:

  1. It creates a new virtual filesystem

  2. If a previous commit exists:

    • It reads the previous tree structure
    • It adds all files and directories from the previous tree to our virtual filesystem
  3. It then adds all the staged entries to the virtual filesystem:

    • These will overwrite any existing files with the same path
    • New files are simply added
    • Files in the previous tree that aren’t in the staging area are preserved
  4. The result is a complete snapshot of the repository as it should appear in the new commit

This merge approach lets gitgo handle partial commits without restaging unchanged files: anything you did not stage is carried over from the previous commit’s tree.
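Stripped of the tree handling, the merge boils down to an overlay of two maps. Here is a minimal sketch under that simplification; overlay is a name made up for this example, and both maps go from a repository-relative path to a blob hash.

```go
package main

import (
	"fmt"
	"sort"
)

// overlay returns a copy of previous with every staged path added or replaced.
func overlay(previous, staged map[string]string) map[string]string {
	snapshot := make(map[string]string, len(previous)+len(staged))
	for path, hash := range previous {
		snapshot[path] = hash
	}
	for path, hash := range staged {
		snapshot[path] = hash // staged entries win over the previous commit
	}
	return snapshot
}

func main() {
	previous := map[string]string{"README.md": "aaa", "src/main.go": "bbb"}
	staged := map[string]string{"src/main.go": "bbb2", "src/util.go": "ccc"}

	snapshot := overlay(previous, staged)
	paths := make([]string, 0, len(snapshot))
	for p := range snapshot {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	for _, p := range paths {
		fmt.Println(p, snapshot[p])
	}
}
```

One consequence of a pure overlay is that it can only add or replace paths. A file that existed in the previous commit stays in every later snapshot unless deletions are tracked separately, which this sketch does not attempt.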

Function addTreeEntriesToPathNode

func (c *CommitCommand) addTreeEntriesToPathNode(root *pathNode, prefix string, treeHash string, objectsPath string) {
	// Read the tree object from disk
	subtree, err := tree.Read(objectsPath, treeHash)
	if err != nil {
		fmt.Printf("Warning: could not read subtree %s: %v\n", treeHash, err)
		return
	}

	// Process each entry in the tree
	for _, entry := range subtree.Entries() {
		// Build the full path for this entry
		fullPath := filepath.Join(prefix, entry.Name)

		if entry.Mode == tree.DirectoryMode {
			// Recursively process subdirectories
			c.addTreeEntriesToPathNode(root, fullPath, entry.Hash, objectsPath)
		} else {
			// Add files to our virtual filesystem with their full path
			root.files[fullPath] = staging.Entry{
				Path: fullPath,
				Hash: entry.Hash,
				Mode: fs.FileMode(entry.Mode),
			}
		}
	}
}

This helper function recursively reads Git tree objects and adds their entries to our virtual filesystem:

  1. It reads a tree object from disk

  2. For each entry in the tree:

    • If it’s a directory, it recursively processes that directory’s tree
    • If it’s a file, it adds it to our virtual filesystem with its full path
  3. The prefix parameter accumulates the path as we recurse through the tree structure

This function is crucial for handling deeply nested directory structures from previous commits, allowing us to preserve the complete repository state.
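The prefix accumulation is easiest to follow with an in-memory object store. In the sketch below, treeEntry, flatten and sampleStore are hypothetical names and the store is a plain map standing in for the objects directory. It uses path.Join rather than filepath.Join so that paths are always separated by "/", which matches how staged paths are split; filepath.Join would produce backslashes on Windows.

```go
package main

import (
	"fmt"
	"path"
	"sort"
)

// treeEntry is a simplified tree entry: hash is a blob hash for files
// or the id of another tree for directories.
type treeEntry struct {
	name  string
	hash  string
	isDir bool
}

// flatten collects every file below treeID into out, keyed by its full path.
// store is an in-memory stand-in for the object database.
func flatten(store map[string][]treeEntry, treeID, prefix string, out map[string]string) error {
	entries, ok := store[treeID]
	if !ok {
		return fmt.Errorf("tree %s not found", treeID)
	}
	for _, e := range entries {
		full := path.Join(prefix, e.name) // always "/"-separated
		if e.isDir {
			if err := flatten(store, e.hash, full, out); err != nil {
				return err
			}
			continue
		}
		out[full] = e.hash
	}
	return nil
}

// sampleStore holds three trees: the root, src and src/util.
func sampleStore() map[string][]treeEntry {
	return map[string][]treeEntry{
		"root": {{"README.md", "aaa", false}, {"src", "t1", true}},
		"t1":   {{"main.go", "bbb", false}, {"util", "t2", true}},
		"t2":   {{"strings.go", "ccc", false}},
	}
}

func main() {
	out := map[string]string{}
	if err := flatten(sampleStore(), "root", "", out); err != nil {
		fmt.Println("error:", err)
		return
	}
	paths := make([]string, 0, len(out))
	for p := range out {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	fmt.Println(paths)
}
```

Unlike the helper above, this version returns an error for a missing tree instead of printing a warning, so a damaged object cannot silently drop files from the next commit. The flat paths it produces could then be split on "/" and placed into nested nodes, the same way staged entries are.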

Together, these functions implement Git’s commit creation process, transforming a set of staged files into a permanent snapshot of your repository. The process elegantly handles nested directories, file mode preservation, and merging new changes with the existing repository state.

To recap, the commit command is more complex than the others because it needs to:

  1. Read the staging area
  2. Check if there are staged files
  3. Read the current branch from HEAD
  4. Retrieve the previous commit (if any)
  5. Build a tree structure from the previous tree and the staged files
  6. Create a new commit
  7. Update the branch reference
  8. Clear the staging area

The helper functions handle the conversion between the staging area’s flat list of files and Git’s tree structure.

The Branch Command

The branch command manages branches:

// internal/commands/branch.go
package commands

import (
	"fmt"
	"github.com/HalilFocic/gitgo/internal/refs"
)

type BranchCommand struct {
	rootPath string
	name     string
	action   string
}

func NewBranchCommand(rootPath, name, action string) *BranchCommand {
	return &BranchCommand{
		rootPath: rootPath,
		name:     name,
		action:   action,
	}
}

func (c *BranchCommand) Execute() error {
	switch c.action {
	case "create":
		head, err := refs.ReadHead(c.rootPath)
		if err != nil {
			return fmt.Errorf("failed to read HEAD: %v", err)
		}
		if head.Type == refs.RefTypeSymbolic {
			ref, err := refs.ReadRef(c.rootPath, head.Target)
			if err != nil {
				return fmt.Errorf("failed to read current branch: %v", err)
			}
			head = ref
		}
		if err := refs.CreateBranch(c.rootPath, c.name, head.Target); err != nil {
			return fmt.Errorf("failed to create branch: %v", err)
		}

	case "delete":
		if err := refs.DeleteBranch(c.rootPath, c.name); err != nil {
			return fmt.Errorf("failed to delete branch: %v", err)
		}

	case "list":
		branches, err := refs.ListBranches(c.rootPath)
		if err != nil {
			return fmt.Errorf("failed to list branches: %v", err)
		}

		head, err := refs.ReadHead(c.rootPath)
		if err != nil {
			return fmt.Errorf("failed to read HEAD: %v", err)
		}

		currentBranch := ""
		if head.Type == refs.RefTypeSymbolic {
			currentBranch = head.Target[len("refs/heads/"):]
		}

		for _, branch := range branches {
			if branch == currentBranch {
				fmt.Printf("* %s\n", branch)
			} else {
				fmt.Printf("  %s\n", branch)
			}
		}

	default:
		return fmt.Errorf("unknown branch action: %s", c.action)
	}

	return nil
}

This command supports three actions:

  1. create: Creates a new branch pointing to the current commit
  2. delete: Deletes an existing branch
  3. list: Lists all branches, marking the current one with an asterisk (*)

It uses the reference functions we implemented in Chapter 6. Since this command is much simpler than commit and the code speaks for itself, I will skip the detailed explanation.
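For readers who want to poke at the list action anyway, here is a small sketch of the marking logic working on plain strings. formatBranches is a name invented for this example; it takes the symbolic HEAD target (such as refs/heads/main) and returns the lines instead of printing them. It uses strings.TrimPrefix, which simply returns its input when the prefix is missing, whereas slicing with head.Target[len("refs/heads/"):] would panic on a target shorter than the prefix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatBranches renders one line per branch and marks the one HEAD points to.
// headTarget is the symbolic target such as "refs/heads/main";
// pass "" for detached HEAD, in which case no branch is marked.
func formatBranches(branches []string, headTarget string) []string {
	current := strings.TrimPrefix(headTarget, "refs/heads/")
	sorted := append([]string(nil), branches...) // copy so the input is untouched
	sort.Strings(sorted)

	lines := make([]string, 0, len(sorted))
	for _, b := range sorted {
		marker := "  "
		if headTarget != "" && b == current {
			marker = "* "
		}
		lines = append(lines, marker+b)
	}
	return lines
}

func main() {
	for _, line := range formatBranches([]string{"main", "feature"}, "refs/heads/main") {
		fmt.Println(line)
	}
}
```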

The Checkout Command

The checkout command allows users to switch between branches or restore the working directory to a specific commit. This powerful feature enables parallel development workflows and time travel through your project’s history.

// internal/commands/checkout.go
package commands

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"

	"github.com/HalilFocic/gitgo/internal/blob"
	"github.com/HalilFocic/gitgo/internal/commit"
	"github.com/HalilFocic/gitgo/internal/config"
	"github.com/HalilFocic/gitgo/internal/refs"
	"github.com/HalilFocic/gitgo/internal/tree"
)

type CheckoutCommand struct {
	rootPath string  // Repository root path
	target   string  // Target branch name or commit hash
}

func NewCheckoutCommand(rootPath, target string) *CheckoutCommand {
	return &CheckoutCommand{
		rootPath: rootPath,
		target:   target,
	}
}

The core of the checkout functionality is implemented in the Execute() method, which orchestrates the entire checkout process:

func (c *CheckoutCommand) Execute() error {
	// Construct the branch reference path
	branchRef := filepath.Join("refs", "heads", c.target)

	// Try to read the target as a branch reference
	ref, err := refs.ReadRef(c.rootPath, branchRef)

	var commitHash string
	if err == nil {
		// If target is a valid branch, use its commit hash
		commitHash = ref.Target

		// Update HEAD to point to the branch (symbolic reference)
		if err := refs.WriteHead(c.rootPath, branchRef, true); err != nil {
			return fmt.Errorf("failed to update HEAD: %v", err)
		}
	} else {
		// If target is not a branch, check if it's a valid commit hash
		if len(c.target) != 40 {
			return fmt.Errorf("invalid reference: %s", c.target)
		}

		// Use the target directly as a commit hash
		commitHash = c.target

		// Update HEAD to point directly to the commit (detached HEAD)
		if err := refs.WriteHead(c.rootPath, commitHash, false); err != nil {
			return fmt.Errorf("failed to update HEAD: %v", err)
		}
	}

	// Read the commit object from the repository
	com, err := commit.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), commitHash)
	if err != nil {
		return fmt.Errorf("failed to read commit: %v", err)
	}

	// Read the root tree that the commit points to
	rootTree, err := tree.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), com.TreeHash)
	if err != nil {
		return fmt.Errorf("failed to read tree: %v", err)
	}

	// Clean the working directory (remove everything except .gitgo)
	files, err := filepath.Glob(filepath.Join(c.rootPath, "*"))
	if err != nil {
		return fmt.Errorf("failed to list files: %v", err)
	}
	for _, f := range files {
		if filepath.Base(f) != config.GitDirName {
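			// Stop if a file cannot be removed, rather than restoring on top of leftovers.
			if err := os.RemoveAll(f); err != nil {
				return fmt.Errorf("failed to remove %s: %v", f, err)
			}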
		}
	}

	// Recreate the working directory from the commit's tree
	if err := c.writeTree(rootTree, c.rootPath); err != nil {
		return fmt.Errorf("failed to write files: %v", err)
	}

	return nil
}

Breaking Down the Execute() Method

The Execute() method performs several key operations that mirror Git’s checkout process:

  1. Identify the Target:

    branchRef := filepath.Join("refs", "heads", c.target)
    ref, err := refs.ReadRef(c.rootPath, branchRef)

    First, we attempt to interpret the target as a branch name by constructing the branch reference path and trying to read it. This is what happens when you run git checkout main.

  2. Handle Branch or Commit Checkout:

    if err == nil {
        commitHash = ref.Target
        if err := refs.WriteHead(c.rootPath, branchRef, true); err != nil {
            return fmt.Errorf("failed to update HEAD: %v", err)
        }
    } else {
        if len(c.target) != 40 {
            return fmt.Errorf("invalid reference: %s", c.target)
        }
        commitHash = c.target
        if err := refs.WriteHead(c.rootPath, commitHash, false); err != nil {
            return fmt.Errorf("failed to update HEAD: %v", err)
        }
    }

    This section handles two different checkout scenarios:

    • Branch Checkout: If the target is a valid branch, we update HEAD to point to that branch as a symbolic reference (ref: refs/heads/main). This is the normal Git checkout behavior.
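    • Commit Checkout: If the target isn’t a valid branch but is 40 characters long (the length of a full SHA-1 hash in hex), we treat it as a commit hash and enter “detached HEAD” state by making HEAD point directly to that commit instead of to a branch. The length is the only check at this point, so a hash that does not exist in the object store only surfaces in the next step, after HEAD has already been rewritten.

  3. Read the Target Commit and Tree: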

    com, err := commit.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), commitHash)
    rootTree, err := tree.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), com.TreeHash)

    Next, we read the target commit object and its associated root tree. The commit object contains metadata (author, message, etc.), while the tree object contains the actual file structure that we want to restore.

  4. Clean the Working Directory:

    files, err := filepath.Glob(filepath.Join(c.rootPath, "*"))
    for _, f := range files {
        if filepath.Base(f) != config.GitDirName {
            os.RemoveAll(f)
        }
    }

    This is a critical step: we remove all files and directories from the working directory, except for the .gitgo directory itself. This ensures that the working directory will exactly match the commit’s state without any extra files. Be aware that this also deletes untracked files and uncommitted changes. Real Git is more careful here: it leaves untracked files alone and refuses to overwrite local changes that would be lost. For a learning project, the simple approach is fine.

  5. Restore the Files:

    if err := c.writeTree(rootTree, c.rootPath); err != nil {
        return fmt.Errorf("failed to write files: %v", err)
    }

    Finally, we restore all files from the commit’s tree structure into the working directory. This process recursively recreates the entire file structure as it was at the time of the commit.
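Steps 1 and 2 can also be tried without touching the disk. The sketch below is a standalone illustration, not gitgo's code: resolveTarget is a hypothetical helper, the branches map stands in for the refs stored under .gitgo, and the hash check is deliberately stricter than a length check (it also requires hexadecimal characters).

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// resolveTarget is a hypothetical, in-memory version of steps 1 and 2.
// branches maps branch names to commit hashes and stands in for the refs
// stored on disk. It returns the commit hash and whether HEAD should be
// written as a symbolic reference.
func resolveTarget(branches map[string]string, target string) (hash string, symbolic bool, err error) {
	if h, ok := branches[target]; ok {
		return h, true, nil // branch checkout
	}
	// Assumption: a full hash is 40 hexadecimal characters.
	if _, decErr := hex.DecodeString(target); len(target) != 40 || decErr != nil {
		return "", false, fmt.Errorf("invalid reference: %s", target)
	}
	return target, false, nil // detached HEAD
}

func main() {
	tip := "1f7a7a472abf3dd9643fd615f6da379c4acb3e3a" // made-up hash for the demo
	branches := map[string]string{"main": tip}

	for _, target := range []string{"main", tip, "not-a-branch"} {
		hash, symbolic, err := resolveTarget(branches, target)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s (symbolic HEAD: %v)\n", target, hash, symbolic)
	}
}
```

Resolving and validating the target before writing HEAD means a bad argument can be rejected without changing any repository state.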

Function: writeTree

The writeTree function is responsible for recursively recreating the file structure from a Git tree object:

func (c *CheckoutCommand) writeTree(t *tree.Tree, path string) error {
	// Process each entry in the tree
	for _, entry := range t.Entries() {
		// Construct the full path for this entry
		fullPath := filepath.Join(path, entry.Name)

		if entry.Mode == tree.DirectoryMode {
			// If the entry is a directory:
			// 1. Create the directory
			if err := os.MkdirAll(fullPath, 0755); err != nil {
				return fmt.Errorf("failed to create directory: %v", err)
			}

			// 2. Read the subtree object
			subTree, err := tree.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), entry.Hash)
			if err != nil {
				return fmt.Errorf("failed to read subtree: %v", err)
			}

			// 3. Recursively process the subtree
			if err := c.writeTree(subTree, fullPath); err != nil {
				return err
			}
		} else {
			// If the entry is a file:
			// 1. Read the blob object
			b, err := blob.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), entry.Hash)
			if err != nil {
				return fmt.Errorf("failed to read blob: %v", err)
			}

			// 2. Write the file contents to disk with proper permissions
			if err := os.WriteFile(fullPath, b.Content(), fs.FileMode(entry.Mode)); err != nil {
				return fmt.Errorf("failed to write file: %v", err)
			}
		}
	}
	return nil
}

Breaking Down the writeTree() Method

This recursive function effectively transforms Git’s tree and blob objects back into actual files and directories on disk:

  1. Iterate Through Tree Entries:

    for _, entry := range t.Entries() {
        fullPath := filepath.Join(path, entry.Name)
        // ...
    }

    The function processes each entry in the current tree, calculating its full path in the working directory.

  2. Directory Handling:

    if entry.Mode == tree.DirectoryMode {
        if err := os.MkdirAll(fullPath, 0755); err != nil {
            return fmt.Errorf("failed to create directory: %v", err)
        }
        subTree, err := tree.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), entry.Hash)
        if err != nil {
            return fmt.Errorf("failed to read subtree: %v", err)
        }
        if err := c.writeTree(subTree, fullPath); err != nil {
            return err
        }
    }

    When the entry is a directory:

    • Create the directory with appropriate permissions (0755)
    • Read the subtree object from the Git object database
    • Recursively call writeTree on the subtree, using the directory as the new base path

    This elegant recursive approach can handle arbitrarily deep directory structures.

  3. File Handling:

    else {
        b, err := blob.Read(filepath.Join(c.rootPath, config.GitDirName, "objects"), entry.Hash)
        if err != nil {
            return fmt.Errorf("failed to read blob: %v", err)
        }
        if err := os.WriteFile(fullPath, b.Content(), fs.FileMode(entry.Mode)); err != nil {
            return fmt.Errorf("failed to write file: %v", err)
        }
    }

    When the entry is a file:

    • Read the blob object from the Git object database
    • Write the blob content to a file in the working directory
    • Set the file permissions according to the mode stored in the tree entry

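    This restores each file with its exact content. The permission bits passed to os.WriteFile are still filtered by the process umask, so the resulting mode can be narrower than the one stored in the tree entry.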

The entire process mirrors Git’s internal checkout mechanism, which needs to efficiently translate its content-addressable storage format (trees and blobs) back into a traditional file system structure.

This checkout implementation demonstrates how Git can efficiently “time travel” through your repository’s history. The elegant separation between references (branches), commits, trees, and blobs allows Git to restore any historical state of your project with a simple command.

The Log Command

The log command is an essential tool in any version control system, allowing developers to explore the project’s history. It traverses the commit graph and displays information about each commit, helping users understand how their project evolved over time.

// internal/commands/log.go
package commands

import (
	"fmt"
	"path/filepath"

	"github.com/HalilFocic/gitgo/internal/commit"
	"github.com/HalilFocic/gitgo/internal/refs"
)

type LogCommand struct {
	rootPath string  // Repository root path
	maxCount int     // Maximum number of commits to display (-1 for unlimited)
}

func NewLogCommand(rootPath string, maxCount int) *LogCommand {
	if maxCount <= 0 {
		maxCount = -1 // No limit if zero or negative value provided
	}
	return &LogCommand{
		rootPath: rootPath,
		maxCount: maxCount,
	}
}

The core of the log functionality is implemented in the Execute() method:

func (c *LogCommand) Execute() error {
	// Set up path to Git objects
	objectsPath := filepath.Join(c.rootPath, ".gitgo", "objects")

	// Read the HEAD reference to determine where we are
	headRef, err := refs.ReadHead(c.rootPath)
	if err != nil {
		return fmt.Errorf("failed to read HEAD: %v", err)
	}

	// Determine the current commit hash
	var currentCommitHash string
	if headRef.Type == refs.RefTypeCommit {
		// If HEAD points directly to a commit (detached HEAD)
		currentCommitHash = headRef.Target
	} else {
		// If HEAD points to a branch reference
		newRef, err := refs.ReadRef(c.rootPath, headRef.Target)
		if err != nil {
			return fmt.Errorf("failed to read link inside head: %v", err)
		}
		currentCommitHash = newRef.Target
	}

	// Track how many commits we've displayed
	commitCount := 0

	// Traverse the commit chain
	for currentCommitHash != "" {
		// Respect the max count limit if specified
		if c.maxCount != -1 && commitCount >= c.maxCount {
			break
		}

		// Read the current commit
		currentCommit, err := commit.Read(objectsPath, currentCommitHash)
		if err != nil {
			return fmt.Errorf("failed to read commit %s: %v", currentCommitHash, err)
		}

		// Display commit information
		fmt.Printf("commit %s\n", currentCommitHash)
		fmt.Printf("Author: %s\n", currentCommit.Author)
		fmt.Printf("Date: %v\n", currentCommit.AuthorDate.Format("Mon Jan 2 15:04:05 2006 -0700"))
		fmt.Printf("\n    %s\n\n", currentCommit.Message)

		// Move to the parent commit for the next iteration
		currentCommitHash = currentCommit.ParentHash
		commitCount++
	}

	// Notify if there were no commits
	if commitCount == 0 {
		fmt.Println("No commits found")
	}

	return nil
}

Breaking Down the Execute Method

The Execute() method implements a straightforward but powerful algorithm for traversing and displaying commit history:

  1. Find the Starting Point:

    headRef, err := refs.ReadHead(c.rootPath)
    // Determine currentCommitHash from headRef

    We first need to determine where to start our history traversal. This is the commit that HEAD resolves to, which could be:

    • The commit that the current branch points to (normal case)
    • A specific commit if in detached HEAD state

  2. Handle Different HEAD States:

    if headRef.Type == refs.RefTypeCommit {
        currentCommitHash = headRef.Target
    } else {
        newRef, err := refs.ReadRef(c.rootPath, headRef.Target) // error check omitted for brevity
        currentCommitHash = newRef.Target
    }

    This logic handles two different HEAD scenarios:

    • Detached HEAD: HEAD directly contains a commit hash
    • Normal HEAD: HEAD points to a branch reference, which we need to follow to get the commit hash

  3. Traverse the Commit Chain:

    for currentCommitHash != "" {
        // Process commit
        // ...
        currentCommitHash = currentCommit.ParentHash
    }

    The key to the log command is this loop that follows the parent chain:

    • Start at the current commit
    • Display its information
    • Move to its parent
    • Repeat until we reach the end of history (a commit with no parent)

    This simple approach works because each gitgo commit stores a single ParentHash, so the history forms a linked list. Real Git commits can have several parents (merge commits), which turns the history into a directed acyclic graph.

  4. Display Commit Information:

    fmt.Printf("commit %s\n", currentCommitHash)
    fmt.Printf("Author: %s\n", currentCommit.Author)
    fmt.Printf("Date: %v\n", currentCommit.AuthorDate.Format(...))
    fmt.Printf("\n    %s\n\n", currentCommit.Message)

    For each commit, we display:

    • The commit hash
    • The author information
    • The date in a human-readable format
    • The commit message

  5. Respect Pagination Limits:

    if c.maxCount != -1 && commitCount >= c.maxCount {
        break
    }

    This handles the case where the user only wants to see a limited number of commits (like git log -n 5), preventing the display of the entire history for large repositories.

Because commits form a linked list through their parent references, traversing the project history becomes a straightforward operation of following these links.
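The traversal can be exercised on its own with an in-memory store. This is a minimal sketch, assuming a simplified commit record: commitInfo, walkHistory, and the short made-up hashes are hypothetical and only mirror the loop shape of Execute(), with -1 meaning "no limit" as in NewLogCommand.

```go
package main

import "fmt"

// commitInfo is a hypothetical, trimmed-down commit record.
type commitInfo struct {
	message string
	parent  string // empty for the root commit
}

// walkHistory follows parent links starting at start and returns the
// visited hashes in order. It stops after maxCount commits, or at the
// root commit when maxCount is -1.
func walkHistory(store map[string]commitInfo, start string, maxCount int) ([]string, error) {
	var visited []string
	for hash := start; hash != ""; {
		if maxCount != -1 && len(visited) >= maxCount {
			break
		}
		c, ok := store[hash]
		if !ok {
			return nil, fmt.Errorf("commit %s not found", hash)
		}
		visited = append(visited, hash)
		hash = c.parent // move to the parent for the next iteration
	}
	return visited, nil
}

func main() {
	store := map[string]commitInfo{
		"c3": {message: "Add log command", parent: "c2"},
		"c2": {message: "Add checkout", parent: "c1"},
		"c1": {message: "Initial commit"},
	}

	// Equivalent in spirit to limiting the log to two commits.
	hashes, err := walkHistory(store, "c3", 2)
	if err != nil {
		panic(err)
	}
	for _, h := range hashes {
		fmt.Printf("commit %s\n\n    %s\n\n", h, store[h].message)
	}
}
```

Returning the visited hashes instead of printing inside the loop keeps the traversal easy to test separately from the output format.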

Conclusion: Bringing It All Together

We’ve now completed our journey of building a Git clone from scratch! Starting from basic repository initialization in Chapter 1, we’ve progressively implemented each core component of Git’s architecture and now tied everything together with a functional command-line interface.

Our gitgo tool now provides the essential commands needed for basic version control:

  • init: Creates a new repository with the necessary directory structure
  • add: Stores file contents as blobs and updates the staging area
  • remove: Removes files from the staging area
  • commit: Creates snapshots combining trees, blobs, and metadata
  • branch: Manages different development lines
  • checkout: Switches between branches or specific commits
  • log: Shows the commit history

This journey has given us unique insights into Git’s internal design:

  1. Content-addressable storage: How Git uses SHA-1 hashes to identify and store content
  2. Object model: The blob, tree, and commit objects that form Git’s foundation
  3. Staging mechanism: How Git prepares changes before committing
  4. Reference system: How branches and HEAD work to track the repository state

While our implementation is simpler than the actual Git, it demonstrates the core principles that make Git such a powerful version control system. By understanding these internals, you’ll have a deeper appreciation for Git’s design and be better equipped to use it effectively in your projects.

Building Gitgo

To build our tool, run:

go build -o gitgo ./cmd/gitgo/main.go

Using Gitgo

To make your gitgo tool more convenient to use, you can add it to your system’s PATH. This allows you to run it from any directory by simply typing gitgo instead of providing the full path to the executable. Once it is on your PATH, you can use it just like the standard git command:

# Initialize a repository
gitgo init

# Stage some files
gitgo add file.txt

# Create a commit
gitgo commit -m "Initial commit"

# View history
gitgo log

Wrapping Up: The Git Journey

We’ve reached the end of our journey building a functional Git implementation from scratch. Looking back at what we’ve accomplished over these seven parts:

  1. Repository Structure: We laid the groundwork with Git’s directory structure and initialization
  2. Blob Storage: We implemented content-addressable storage for file contents
  3. Staging Area: We created the bridge between working directory and repository
  4. Trees: We built Git’s mechanism for representing directories and file hierarchies
  5. Commits: We developed the snapshot system that forms Git’s history
  6. References: We implemented branches and HEAD to make navigation user-friendly
  7. Command Line: We tied everything together into a usable tool

Building gitgo has given us insight into Git’s brilliant design choices. We’ve seen how Git’s content-addressable storage enables efficient deduplication, how its staging area provides flexibility in commit creation, and how its simple references system enables powerful branching workflows.

Beyond just understanding Git, we’ve explored fundamental computer science concepts: cryptographic hashing, directed acyclic graphs, tree structures, and filesystem operations. These principles extend far beyond version control and appear throughout software engineering.

I hope this series has transformed Git from a mysterious black box into a comprehensible system built on elegant principles. The next time you run git commit, you’ll have a mental model of exactly what’s happening beneath the command.

What aspect of Git’s design did you find most interesting? What features might you add to gitgo? I’d love to hear your thoughts in the comments below!

Potential Extensions

For those who want to go the extra mile, the two easiest things you can add to this project are:

  • A status command that lists the files added to staging
  • A gitgo config command that holds committer info

If you are up for something harder, you could implement:

  1. Diff functionality: Show changes between commits or working directory
  2. Remote operations: Add support for push, pull, and fetch
  3. Merging: Implement branch merging functionality
  4. Interactive staging: Allow staging parts of files