12 min read
Write your git - Part 4: Trees

Introduction

In our previous chapters, we’ve built a solid foundation for our Git clone by implementing repository initialization, blob storage, and the staging area. We can now create repositories, store file contents as blobs, and track which files are staged for a commit. But we’re missing a crucial component to make actual commits: Git’s tree structure.

While blobs store individual file contents, they don’t carry any information about filenames, directories, or the overall structure of your project.

This is where trees come in. Think of trees as Git’s way of representing directories – they track file names, permissions. A tree is essentially a snapshot of your repository’s file hierarchy at a specific point in time.

In this chapter, we’ll implement Git’s tree structure, which will allow us to:

  • Represent directory structures in our repository
  • Link filenames to their content blobs
  • Track file permissions and types
  • Create snapshots of the entire project structure
  • Support nested directories

This is a crucial step toward implementing commits, as each commit will point to a tree that represents the repository’s state at commit time

What is a Tree in Git?

In Git, a tree is an object that represents a directory. It contains list of entries, where each entry consits of:

  • filename
  • mode (file permissions and type)
  • hash (pointing to blob or another tree)

Think of it like its a table of contents:

Mode    | Type  | Hash                     | Name
--------|-------|--------------------------|-------
100644  | blob  | a45bd...                 | README.md
100755  | blob  | b78ef...                 | run.sh
040000  | tree  | c913a...                 | src

In this example, we have:

  • Regular files (README.md, run.sh) pointing to blobs
  • A directory (src) pointing to another tree The mode values have specific meanings:
  • 100644: Regular file
  • 100755: Executable file
  • 040000: Directory (tree)

Trees and Project Structure

To understand how trees work in Git, let’s look at a simple project structure:

project/
├── README.md
├── main.go
└── lib/
    ├── util.go
    └── helper.go

Git would represent this project state with combination of trees and blobs. At the top, we always have root tree which can contain other trees and blobs.

In this case we have:

  1. A root tree containing:
    • README.MD (blob)
    • main.go (blob)
    • lib (tree)
  2. A lib tree containing:
    • util.go (blob)
    • helper.go (blob)

Each tree is identified by its SHA-1 hash, just like blobs. This forms a tree structure where directories reference other trees, and files reference blobs.

Recursive structure

The beauty of Git’s design is its recursive nature. A tree can reference other trees, allowing Git to efficiently represent nested directory structures of any depth. This is similar to how filesystems work, with directories containing files and other directories.

Project Structure

gitgo/
├── go.mod
└── internal/
    ├── blob/              # From part 2
    │   ├── blob.go
    │   └── blob_test.go
    ├── config/            # From part 1
    │   └── config.go
    ├── repository/        # From part 1
    │   ├── repository.go
    │   └── repository_test.go
    ├── staging/           # From part 3
    │   ├── staging.go
    │   └── staging_test.go
    └── tree/              # NEW DIRECTORY
        ├── tree.go
        └── tree_test.go

Tests

As always, we will write some tests for trees, explain them and move on the actual implementation.

// internal/tree/tree_test.go
package tree

import (
	"os"
	"path/filepath"
	"testing"
	"github.com/HalilFocic/gitgo/internal/config"
)

func TestTreeOperations(t *testing.T) {
	t.Run("1.1: Create new tree and add entries", func(t *testing.T) {
		tree := New()
		fileHash := "1234567890123456789012345678901234567890"
		dirHash := "abcdefabcdefabcdefabcdefabcdefabcdefabcd"
		err := tree.AddEntry("main.go", fileHash, RegularFileMode)
		if err != nil {
			t.Fatalf("Failed to add file entry: %v", err)
		}

		err = tree.AddEntry("lib", dirHash, DirectoryMode)
		if err != nil {
			t.Fatalf("Failed to add directory entry: %v", err)
		}

		entries := tree.Entries()
		if len(entries) != 2 {
			t.Fatalf("Expected 2 entries, got %d", len(entries))
		}

		// 1 was used instead of zero because m is after l alphabetically
		// since we sorted the entries, this makes impact
		if entries[1].Name != "main.go" {
			t.Errorf("Expected name 'main.go', got %s", entries[0].Name)
		}
		if entries[1].Hash != fileHash {
			t.Errorf("Expected hash 'abc123', got %s", entries[0].Hash)
		}
		if entries[1].Mode != RegularFileMode {
			t.Errorf("Expected mode %o, got %o", RegularFileMode, entries[0].Mode)
		}
	})

	t.Run("1.2: Invalid entry handling", func(t *testing.T) {
		tree := New()

		err := tree.AddEntry("", "abc123", RegularFileMode)
		if err == nil {
			t.Errorf("Expected error for empty name, got none")
		}

		err = tree.AddEntry("test.txt", "", RegularFileMode)
		if err == nil {
			t.Errorf("Expected error for empty hash, got none")
		}

		err = tree.AddEntry("test.txt", "abc123", 0777)
		if err == nil {
			t.Error("Expected error for invalid mode, got none")
		}
	})

	t.Run("1.3: Storage and retrieval", func(t *testing.T) {
		tree := New()
		tree.AddEntry("file1.txt", "abc123", RegularFileMode)
		tree.AddEntry("dir", "def456", DirectoryMode)

		cwd, err := os.Getwd()
		if err != nil {
			t.Fatalf("Failed to get current directory: %v", err)
		}

		testDir := filepath.Join(cwd, "testdata")
		os.RemoveAll(testDir)
		os.MkdirAll(filepath.Join(testDir, config.GitDirName, "objects"), 0755)
		defer os.RemoveAll(testDir)

		hash, err := tree.Write(filepath.Join(testDir, config.GitDirName, "objects"))
		if err != nil {
			t.Fatalf("Failed to write tree: %v", err)
		}

		readTree, err := Read(filepath.Join(testDir, config.GitDirName, "objects"), hash)
		if err != nil {
			t.Fatalf("Failed to read tree: %v", err)
		}

		originalEntries := tree.Entries()
		readEntries := readTree.Entries()

		if len(originalEntries) != len(readEntries) {
			t.Errorf("Entry count mismatch: got %d, want %d",
				len(readEntries), len(originalEntries))
		}

		for i, original := range originalEntries {
			read := readEntries[i]
			if original.Name != read.Name {
				t.Errorf("Name mismatch: got %s, want %s",
					read.Name, original.Name)
			}
			if original.Hash != read.Hash {
				t.Errorf("Hash mismatch: got %s, want %s",
					read.Hash, original.Hash)
			}
			if original.Mode != read.Mode {
				t.Errorf("Mode mismatch: got %o, want %o",
					read.Mode, original.Mode)
			}
		}
	})

}

func TestTreeEdgeCases(t *testing.T) {
	t.Run("2.1: Special characters in filename", func(t *testing.T) {
		tree := New()
		hash := "1234567890123456789012345678901234567890"
		// This test tries to add files with special characters
		specialNames := []string{
			"hello world.txt",
			"!@#$%.txt",
			"über.txt",
			"file名前.txt",
		}

		for _, name := range specialNames {
			err := tree.AddEntry(name, hash, RegularFileMode)
			if err != nil {
				t.Errorf("Failed to add file with special name '%s': %v", name, err)
			}
		}

		// Test write/read with special characters
		cwd, _ := os.Getwd()
		testDir := filepath.Join(cwd, "testdata")
		os.MkdirAll(filepath.Join(testDir, config.GitDirName, "objects"), 0755)
		defer os.RemoveAll(testDir)

		treeHash, err := tree.Write(filepath.Join(testDir, config.GitDirName, "objects"))
		if err != nil {
			t.Fatalf("Failed to write tree: %v", err)
		}

		readTree, err := Read(filepath.Join(testDir, config.GitDirName, "objects"), treeHash)
		if err != nil {
			t.Fatalf("Failed to read tree: %v", err)
		}

		for _, name := range specialNames {
			found := false
			for _, entry := range readTree.Entries() {
				if entry.Name == name {
					found = true
					break
				}
			}
			if !found {
				t.Errorf("Failed to find entry '%s' after read", name)
			}
		}
	})

	t.Run("2.2: Empty tree", func(t *testing.T) {
		tree := New()

		cwd, _ := os.Getwd()
		testDir := filepath.Join(cwd, "testdata")
		os.MkdirAll(filepath.Join(testDir, config.GitDirName, "objects"), 0755)
		defer os.RemoveAll(testDir)

		hash, err := tree.Write(filepath.Join(testDir, config.GitDirName, "objects"))
		if err != nil {
			t.Fatalf("Failed to write empty tree: %v", err)
		}

		readTree, err := Read(filepath.Join(testDir, config.GitDirName, "objects"), hash)
		if err != nil {
			t.Fatalf("Failed to read empty tree: %v", err)
		}

		if len(readTree.Entries()) != 0 {
			t.Errorf("Expected empty tree, got %d entries", len(readTree.Entries()))
		}
	})

	t.Run("2.3: Invalid UTF-8", func(t *testing.T) {
		tree := New()
		hash := "1234567890123456789012345678901234567890"

		// Create invalid UTF-8 string
		invalidName := string([]byte{0xFF, 0xFE, 0xFD})

		err := tree.AddEntry(invalidName, hash, RegularFileMode)
		if err == nil {
			t.Error("Expected error for invalid UTF-8 filename")
		}
	})
}

In short, our tests do the following:

1.1: Create new tree and add entries:

  • Verifies that a new tree can be created
  • Tests adding both file entries and directory entries
  • Confirms entries are correctly stored with proper name, hash, and mode
  • Implicitly tests that entries are sorted alphabetically

1.2: Invalid entry handling:

  • Tests validation by attempting to add invalid entries:
    • Empty name (should reject)
    • Empty hash (should reject)
    • Invalid file mode (should reject)
  • Confirms proper error handling for invalid inputs

1.3: Storage and retrieval:

  • Tests the Write() and Read() functionality
  • Creates a tree with multiple entries
  • Writes it to disk in Git’s format
  • Reads it back and confirms all entries match the original
  • Verifies the serialization/deserialization roundtrip works correctly

2.1: Special characters in filename:

  • Tests support for filenames with special characters
  • Includes spaces, symbols, non-ASCII Unicode characters
  • Ensures these names survive the write/read cycle
  • Verifies proper UTF-8 handling

2.2: Empty tree:

  • Tests handling of trees with no entries
  • Confirms an empty tree can be written and read back
  • Verifies empty trees are handled consistently

2.3: Invalid UTF-8:

  • Tests rejection of invalid UTF-8 sequences in filenames
  • Creates a filename with invalid byte sequences
  • Confirms the implementation rejects these invalid names

Implementation Overview

Before diving into the detailed explaination, let’s look at what our tree.go should look like. Our implementation will need to:

  1. Represent a directory structure with files and other directories
  2. Store and retrieve tree objects from disk
  3. Handle file permissions and types
  4. Support validation of entries
  5. Use Git’s binary format for trees

Let’s start with a skeleton of our implementation including necessary imports, constants, and function signatures:

package tree

import (
	"bytes"
	"compress/zlib"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
	"unicode/utf8"
)

// File mode constants as used by Git
const (
	RegularFileMode = 0100644 // Regular file
	ExecutableMode  = 0100755 // Executable file
	DirectoryMode   = 0040000 // Directory
)

// TreeEntry represents a single entry in a tree (file or directory)
type TreeEntry struct {
	Name string
	Hash string
	Mode int
}

// TreeEntries is a slice of TreeEntry with sorting capabilities
type TreeEntries []TreeEntry

func (te TreeEntries) Len() int {
	return len(te)
}
func (te TreeEntries) Swap(i, j int)      { te[i], te[j] = te[j], te[i] }
func (te TreeEntries) Less(i, j int) bool { return te[i].Name < te[j].Name }


// Tree represents a directory in Git
type Tree struct {
	entries []TreeEntry
}


func New() *Tree {}  // Creates new empty tree
func (tree *Tree) AddEntry(name, hash string, filemode int) error {} // Adds a file or directory to the tree
func (tree *Tree) Entries() []TreeEntry {} // Returns all entries in the tree
func (tree *Tree) Write(objectsPath string) (string, error) {} // Writes the tree to disk
func Read(objectsPath, hash string) (*Tree, error) {} // Reads a tree from disk

New and Entries Methods

Since New and Entries methods are prety simple and we have implemented similar methods, I will just provide a code snippet for them.

// creates new tree struct and returns it
func New() *Tree {
	return &Tree{
		entries: []TreeEntry{},
	}
}

// returns array of entries for that tree
func (tree *Tree) Entries() []TreeEntry {
	return tree.entries
}

AddEntries method

The AddEntry method adds a new file or directory entry to our tree. This method meeds to perform several validation checks to ensure the entry is valid before adding it to the tree.

Here is the implementation:

func (tree *Tree) AddEntry(name, hash string, filemode int) error {
	if len(name) == 0 {
		return fmt.Errorf("entry name cannot be empty")
	}
	if len(hash) == 0 {
		return fmt.Errorf("entry hash cannot be empty")
	}
	if len(hash) != 40 {
		return fmt.Errorf("invalid hash length: expected 40 characters, got %d", len(hash))
	}
	if !utf8.ValidString(name) {
		return fmt.Errorf("invalid UTF-8 in filename")
	}
	if filemode != RegularFileMode && filemode != ExecutableMode && filemode != DirectoryMode {
		return fmt.Errorf("Invalid file mode.")
	}
	if strings.Contains(name, "/") {
		return fmt.Errorf("Entry name cannot contain '/', this should be handled by seperate tree")
	}
	for _, tEntry := range tree.entries {
		if tEntry.Name == name {
			return fmt.Errorf("Name %s already exists inside this tree", tEntry.Name)
		}
	}
	tree.entries = append(tree.entries, TreeEntry{
		Name: name,
		Hash: hash,
		Mode: filemode,
	})
	sort.Sort(TreeEntries(tree.entries))
	return nil
}

Let’s break down what this method does:

  1. Name Validation:
  • Checks if the name is empty, which isn’t allowed in Git
  • Verifies the name contains valid UTF-8 characters using Go’s utf8.ValidString function
  • Ensures the name doesn’t contain slash characters (/), as these should be represented by separate trees
  1. Hash Validation:
  • Ensures the hash isn’t empty
  • Validates that the hash is exactly 40 characters (the length of a SHA-1 hash in hexadecimal)
  1. File Mode Validation:
  • Checks that the provided mode is one of our supported constants:
    • RegularFileMode (0100644) for normal files
    • ExecutableMode (0100755) for executable files
    • DirectoryMode (0040000) for directories
  • Rejects any other modes to maintain compatibility with Git
  1. Duplicate Prevention:
  • Iterates through existing entries to ensure no duplicate filenames
  • Git trees cannot contain multiple entries with the same name (just like directories in a filesystem)
  1. Entry Addition and Sorting:
  • Creates a new TreeEntry with the provided name, hash, and mode
  • Adds it to the tree’s entries slice
  • Sorts all entries alphabetically by name using our sort.Sort implementation
  • This ensures consistent hashing regardless of the order entries were added

The sorting is particularly important because it guarantees that the same set of files will always produce the same tree hash, regardless of the order in which they were added. This is essential for Git’s content-addressed storage model.

Implementing the Write Method

The Write method is responsible for serializing our tree structure into Git’s binary format and storing it in the objects database. This is a operation that involves several steps.

How the Write Method Should Work

To write a tree to disk, we need to:

  1. Format the Header: Create a header that includes the object type (“tree”) and content size
  2. Format Each Entry: For each entry in the tree, we need to:
    • Format the mode as an octal string followed by a space
    • Write the name followed by a null byte (0x00)
    • Write the hash as binary data (not hex string)
  3. Calculate Hash: generate a SHA-1 hash of the formatted content
  4. Compress Data: Compress the entire content using zlib
  5. Determine File Path: Use the first two characters of the hash as a directory name and the rest as the filename
  6. Write to Disk: Create the directory if needed and write the compressed data to the file
func (tree *Tree) Write(objectsPath string) (string, error) {
	var buffer bytes.Buffer

	// Calculate the size of content for the header
	contentSize := 0
	for _, entry := range tree.entries {
		contentSize += len(strconv.FormatInt(int64(entry.Mode), 8)) + 1 + len(entry.Name) + 1 + 20
	}

	// Write the header: "tree {size}\0"
	fmt.Fprintf(&buffer, "tree %d\x00", contentSize)

	// Write each entry
	for _, entry := range tree.entries {
		// Write mode in octal + space
		fmt.Fprintf(&buffer, "%o ", entry.Mode)

		// Write name + null byte
		buffer.WriteString(entry.Name)
		buffer.WriteByte(0)

		// Convert hash from hex to binary and write
		hashBytes, err := hex.DecodeString(entry.Hash)
		if err != nil {
			return "", fmt.Errorf("invalid hash %s: %v", entry.Hash, err)
		}
		buffer.Write(hashBytes)
	}

	// Compress the content
	var compressed bytes.Buffer
	zWriter := zlib.NewWriter(&compressed)
	if _, err := zWriter.Write(buffer.Bytes()); err != nil {
		return "", fmt.Errorf("compression failed: %v", err)
	}
	zWriter.Close()

	// Generate the SHA-1 hash
	hash := sha1.Sum(compressed.Bytes())
	hashString := hex.EncodeToString(hash[:])

	// Create file path from hash
	hashPath := filepath.Join(objectsPath, hashString[:2], hashString[2:])
	if err := os.MkdirAll(filepath.Dir(hashPath), 0755); err != nil {
		return "", fmt.Errorf("failed to create object directory: %v", err)
	}

	// Write to file
	if err := os.WriteFile(hashPath, compressed.Bytes(), 0644); err != nil {
		return "", fmt.Errorf("failed to write object file: %v", err)
	}

	return hashString, nil
}

Let’s break down each step in detail:

1. Calculating content Size

contentSize := 0
for _, entry := range tree.entries {
    contentSize += len(strconv.FormatInt(int64(entry.Mode), 8)) + 1 + len(entry.Name) + 1 + 20
}

This calculates the exact size of the tree’s content (excluding the header):

  • len(strconv.FormatInt(int64(entry.Mode), 8)): Length of the mode in octal format
  • + 1: Space after the mode
  • + len(entry.Name): Length of the entry name
  • + 1: Null byte after the name
  • + 20: Length of the binary hash (20 bytes for SHA-1)

Git requires this size in the header for validation and parsing.

2. Writing the Header

fmt.Fprintf(&buffer, "tree %d\x00", contentSize)

The header consists of:

  • The word “tree” to identify the object type
  • A space
  • The size of the content in decimal
  • A null byte (0x00) as a delimiter between header and content

3. Writing Each Entry

for _, entry := range tree.entries {
    // Write mode in octal + space
    fmt.Fprintf(&buffer, "%o ", entry.Mode)

    // Write name + null byte
    buffer.WriteString(entry.Name)
    buffer.WriteByte(0)

    // Convert hash from hex to binary and write
    hashBytes, err := hex.DecodeString(entry.Hash)
    if err != nil {
        return "", fmt.Errorf("invalid hash %s: %v", entry.Hash, err)
    }
    buffer.Write(hashBytes)
}

For each entry, we write:

  • The mode in octal format (like 100644 for regular files) followed by a space
  • The entry name as a string
  • A null byte (0x00) to terminate the name
  • The hash in binary format (converting from hex string to raw bytes)

The binary format for the hash is critical - Git stores hashes as 20 raw bytes, not as 40-character hex strings.

4. Compressing the Content


var compressed bytes.Buffer
zWriter := zlib.NewWriter(&compressed)
if _, err := zWriter.Write(buffer.Bytes()); err != nil {
    return "", fmt.Errorf("compression failed: %v", err)
}
zWriter.Close()

Git compresses all objects with zlib to save space:

  • We create a new buffer for the compressed output
  • Set up a zlib writer that outputs to this buffer
  • Write our formatted content through the zlib writer
  • Close the writer to ensure all data is flushed

5. Generating and Using the Hash

hash := sha1.Sum(compressed.Bytes())
hashString := hex.EncodeToString(hash[:])

hashPath := filepath.Join(objectsPath, hashString[:2], hashString[2:])

I would assume you are familiar with this from the blobs part. We construct the path by taing first 2 characters as directory name and rest of them as filename.

6. Writing to Disk

if err := os.MkdirAll(filepath.Dir(hashPath), 0755); err != nil {
    return "", fmt.Errorf("failed to create object directory: %v", err)
}

if err := os.WriteFile(hashPath, compressed.Bytes(), 0644); err != nil {
    return "", fmt.Errorf("failed to write object file: %v", err)
}

Finally, we:

  • Create the directory if it doesn’t exist yet (using 0755 permissions)
  • Write the compressed data to the file (using 0644 permissions)
  • Return the hash string, which serves as the identifier for this tree

By returning the hash, we allow callers to reference this tree in other objects, such as commits or parent trees, completing Git’s object graph structure.

Implementing the Read Method

The Read method is responsible for retrieving a tree object from the Git object database(from disk), decompressing it, and parsing its contents into our Tree structure. This is essentially the reverse of the Write operation we just implemented.

How the Read Method Should Work

To read a tree from disk, we need to:

  1. Locate the File: Use the hash to find the file in the objects directory
  2. Read and Decompress: Read the compressed data and decompress it using zlib
  3. Parse the Header: Extract and validate the object type and content size
  4. Parse Each Entry: For each entry in the tree content:
    • Extract the mode (permissions)
    • Extract the filename
    • Extract the 20-byte hash and convert to hex string
  5. Build the Tree: Create a new Tree object with all the parsed entries

Here’s the implementation of our Read function:

func Read(objectsPath, hash string) (*Tree, error) {
	hashPath := filepath.Join(objectsPath, hash[:2], hash[2:])
	compressed, err := os.ReadFile(hashPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read object file: %v", err)
	}
	zReader, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, fmt.Errorf("failed to create zlib reader: %v", err)
	}
	defer zReader.Close()

	var buffer bytes.Buffer
	if _, err := io.Copy(&buffer, zReader); err != nil {
		return nil, fmt.Errorf("failed to decompress data: %v", err)
	}

	data := buffer.Bytes()
	header := bytes.SplitN(data, []byte{0}, 2)
	if len(header) != 2 {
		return nil, fmt.Errorf("invalid object format")
	}

	headerParts := bytes.Fields(header[0])
	if len(headerParts) != 2 || string(headerParts[0]) != "tree" {
		return nil, fmt.Errorf("not a tree object")
	}
	tree := New()
	content := header[1]
	for len(content) > 0 {

		spaceIndex := bytes.IndexByte(content, ' ')
		if spaceIndex == -1 {
			return nil, fmt.Errorf("invalid entry format")
		}

		nullIdx := bytes.IndexByte(content[spaceIndex+1:], 0)

		nullIdx += spaceIndex + 1

		mode, err := strconv.ParseInt(string(content[:spaceIndex]), 8, 32)
		if err != nil {
			return nil, fmt.Errorf("invalid mode: %v", err)
		}

		name := string(content[spaceIndex+1 : nullIdx])

		if len(content[nullIdx+1:]) < 20 {
			return nil, fmt.Errorf("truncated entry")
		}

		hash := hex.EncodeToString(content[nullIdx+1 : nullIdx+21])

		if err := tree.AddEntry(name, hash, int(mode)); err != nil {
			return nil, fmt.Errorf("failed to add entry: %v", err)
		}
		content = content[nullIdx+21:]
	}
	return tree, nil
}

Let’s break down the implementation one last time in this chapter:

  1. Locating and Reading the Compressed Object
hashPath := filepath.Join(objectsPath, hash[:2], hash[2:])
compressed, err := os.ReadFile(hashPath)
if err != nil {
    return nil, fmt.Errorf("failed to read object file: %v", err)
}

Just like with our blob implementation, we:

  • Construct the file path using the first two characters of the hash for the directory name
  • Use the remaining characters as the filename
  • Read the entire file into memory
  • Handle any errors that might occur if the file doesn’t exist or can’t be read
  1. Decompressing the Content
zReader, err := zlib.NewReader(bytes.NewReader(compressed))
if err != nil {
    return nil, fmt.Errorf("failed to create zlib reader: %v", err)
}
defer zReader.Close()

var buffer bytes.Buffer
if _, err := io.Copy(&buffer, zReader); err != nil {
    return nil, fmt.Errorf("failed to decompress data: %v", err)
}

Here we:

  • Create a zlib reader to decompress the data
  • Set up a deferred close to clean up resources
  • Create a buffer for the decompressed content
  • Use io.Copy to efficiently decompress and copy all data to our buffer
  • Handle any decompression errors that might occur
  1. Parsing The Header
data := buffer.Bytes()
header := bytes.SplitN(data, []byte{0}, 2)
if len(header) != 2 {
    return nil, fmt.Errorf("invalid object format")
}

headerParts := bytes.Fields(header[0])
if len(headerParts) != 2 || string(headerParts[0]) != "tree" {
    return nil, fmt.Errorf("not a tree object")
}

The header parsing:

  • Splits the decompressed data at the first null byte, separating header from content
  • Checks that we have both header and content parts
  • Splits the header into fields (type and size)
  • Verifies that the object is indeed a tree object
  • Ensures the header has the correct format

This validation is crucial as it prevents us from misinterpreting data from non-tree objects.

  1. Parsing Tree Entries
tree := New()
content := header[1]
for len(content) > 0 {
    spaceIndex := bytes.IndexByte(content, ' ')
    if spaceIndex == -1 {
        return nil, fmt.Errorf("invalid entry format")
    }

    nullIdx := bytes.IndexByte(content[spaceIndex+1:], 0)
    nullIdx += spaceIndex + 1

    mode, err := strconv.ParseInt(string(content[:spaceIndex]), 8, 32)
    if err != nil {
        return nil, fmt.Errorf("invalid mode: %v", err)
    }

    name := string(content[spaceIndex+1 : nullIdx])

    if len(content[nullIdx+1:]) < 20 {
        return nil, fmt.Errorf("truncated entry")
    }

    hash := hex.EncodeToString(content[nullIdx+1 : nullIdx+21])

    if err := tree.AddEntry(name, hash, int(mode)); err != nil {
        return nil, fmt.Errorf("failed to add entry: %v", err)
    }
    content = content[nullIdx+21:]
}

This is the most intricate part of the implementation, where we parse each tree entry:

  1. Find Mode and Name Boundaries:
  • Locate the space that separates mode from name
  • Find the null byte that terminates the name
  • Return an error if these delimiters aren’t found
  1. Parse Mode:
  • Extract the mode string (up to the space)
  • Convert it from octal string to integer
  • Validate it’s a valid number
  1. Extract Name:
  • Extract the characters between the space and null byte
  • Convert to a Go string
  1. Extract Hash:
  • Ensure there are at least 20 bytes remaining after the null byte
  • Take exactly 20 bytes (the binary SHA-1 hash)
  • Convert to a hex string for our internal representation
  1. Add Entry to Tree:
  • Use our AddEntry method to add this file/directory to the tree
  • This also validates the entry (mode, name, hash)
  1. Advance the Pointer:
  • Move our position pointer forward to the next entry
  • Continue until we’ve processed all content

The loop structure ensures we process each entry in sequence, correctly handling the binary format Git uses for tree objects.

  1. Returning the Constructed Tree
return tree, nil

Once we’ve successfully parsed all entries, we return the fully constructed tree object and a nil error to indicate success.

Testing Our Tree Implementation

Now that we’ve implemented both the Write and Read methods, we can test our tree implementation to ensure it’s working correctly:

go test ./internal/tree

If all tests pass, congratulations! You’ve successfully implemented Git’s tree structure, which is a fundamental building block for representing directory hierarchies in Git.

Summary

In this chapter, we’ve implemented Git’s tree structure, which allows us to represent directory hierarchies in our version control system. We’ve learned how Git stores file and directory information, how it links to blob objects, and how it handles permissions.

Our tree implementation:

  • Represents directory structures with files and subdirectories
  • Stores each entry with its name, hash, and file mode
  • Validates entries to ensure they meet Git’s requirements
  • Serializes and deserializes trees to/from Git’s binary format
  • Supports empty trees and special characters in filenames

This tree structure serves as a crucial foundation for our next chapter, where we’ll implement commits. Each commit will point to a tree representing the repository’s state at that point in time.

What’s Next?

In the next chapter, we’ll implement commit functionality, which will allow us to:

  • Create snapshots of our repository
  • Track the history of changes
  • Build a chain of commits that forms the project history
  • Store commit metadata like author, date, and message
  • Support branching and merging

The commit structure will build directly on the tree implementation we’ve just completed, using trees to represent the state of the repository at each commit. With trees and commits, we’ll have all the core components needed for a functioning Git-like version control system!