Write your git - Part 3: Staging Area

In the previous chapters, we laid the groundwork for our Git clone by creating a repository structure and implementing blob storage. Now we’ll dive into the next Git component: the staging area, also known as the index.

What is the Staging Area?

Think of the staging area as a preparation room for your commits: a place where you carefully select and arrange the files you want to include in the next snapshot of the project.

Imagine you’re packing a suitcase for a trip:

  • Your working directory is your entire wardrobe
  • The staging area is the collection of clothes you’ve carefully selected and laid out on the bed
  • The commit is when you actually pack those clothes into the suitcase

In Git terms:

  • Working directory = All your project files
  • Staging area = Files you’ve intentionally chosen to be part of the next commit
  • Commit = Saving a snapshot of those selected files

Project Structure

Let’s update our project structure to include the staging area implementation:

gitgo/
├── go.mod
└── internal/
    ├── blob/
    │   ├── blob.go
    │   └── blob_test.go
    ├── config/
    │   └── config.go
    ├── repository/
    │   ├── repository.go
    │   └── repository_test.go
    └── staging/           // NEW DIRECTORY
        ├── staging.go
        └── staging_test.go

The Index File Format

Git stores the staging area information in a binary file called index, typically located in .git/index. This file is more complex than you might expect. It contains:

  • A header with metadata
  • A list of staged entries
  • Each entry includes:
    • File path
    • File hash
    • File mode
    • Timestamp
    • Other metadata

Implementation Approach

As usual, we’ll start by writing tests that define the behavior of our staging area before implementing it.

Our staging area will support:

  1. Adding files to the staging area
  2. Removing files from the staging area
  3. Listing staged files
  4. Checking if a file is staged
  5. Writing and reading the index file

Writing Tests First

Let’s start with our test file to define the expected behavior:

//internal/staging/staging_test.go
package staging

import (
	"fmt"
	"os"
	"path/filepath"
	"testing"

	"github.com/HalilFocic/gitgo/internal/config"
	"github.com/HalilFocic/gitgo/internal/repository"
)

func TestStagingArea(t *testing.T) {
	// Get the current working directory to set up a clean test environment
	cwd, err := os.Getwd()
	if err != nil {
		t.Fatalf("Failed to get working directory %v", err)
	}
	// Create a temporary test directory
	testDir := filepath.Join(cwd, "testdata")
	os.RemoveAll(testDir)

	// Create the test directory with proper permissions
	if err := os.MkdirAll(testDir, 0755); err != nil {
		t.Fatalf("Failed to create test directory %v", err)
	}
	// Ensure test directory is cleaned up after tests
	defer os.RemoveAll(testDir)

	// Change current directory to the test directory
	if err := os.Chdir(testDir); err != nil {
		t.Fatalf("Failed to change to test directory %v", err)
	}
	// Change back to original directory after tests
	defer os.Chdir(cwd)

	// Initialize a new git repository for testing
	_, err = repository.Init(".")
	if err != nil {
		t.Fatalf("Failed to initialize repository: %v", err)
	}

	// Test case 1.1: Verify adding a single file to staging
	t.Run("1.1: Add file to staging", func(t *testing.T) {
		// Create a test file with some content
		content := []byte("test content")
		if err := os.WriteFile("test.txt", content, 0644); err != nil {
			t.Fatalf("Failed to create test.txt file: %v", err)
		}

		// Create a new index (staging area)
		index, err := New(".")
		if err != nil {
			t.Fatalf("Failed to create index: %v", err)
		}
		// Ensure index is cleared after the test
		defer index.Clear()

		// Add the file to the staging area
		err = index.Add("test.txt")
		if err != nil {
			t.Fatalf("Failed to add file: %v", err)
		}

		// Verify the file was added correctly
		entries := index.Entries()
		if len(entries) != 1 {
			t.Fatalf("Expected 1 entry, got %d", len(entries))
		}
		if entries[0].Path != "test.txt" {
			t.Fatalf("Wrong path, expected test.txt, got %s", entries[0].Path)
		}
	})

	// Test case 1.2: Verify staging multiple files, including those in subdirectories
	t.Run("1.2: Multiple file staging", func(t *testing.T) {
		// Remove existing index to start fresh
		os.Remove(filepath.Join(config.GitDirName, "index"))

		// Create a new index
		index, err := New(".")
		if err != nil {
			t.Fatalf("failed to create index: %v", err)
		}
		defer index.Clear()

		// List of files to create, including a file in a subdirectory
		files := []string{"a.txt", "b.txt", "dir/c.txt"}

		// Create directories and files
		for _, f := range files {
			dir := filepath.Dir(f)
			if dir != "." {
				// Create subdirectories if needed
				if err = os.MkdirAll(dir, 0755); err != nil {
					t.Fatalf("Failed to create directory for %s: %v", f, err)
				}
			}

			// Write content to each file
			if err := os.WriteFile(f, []byte("content"), 0644); err != nil {
				t.Fatalf("Failed to create file %s: %v", f, err)
			}
		}

		// Add all files to the staging area
		for _, f := range files {
			err := index.Add(f)
			if err != nil {
				t.Fatalf("Failed to add %s: %v", f, err)
			}
		}

		// Verify all files were staged
		if len(index.Entries()) != len(files) {
			t.Fatalf("Expected %d entries, got %d", len(files), len(index.Entries()))
		}
	})

	// Test case 1.3: Verify that updating a staged file changes its hash
	t.Run("1.3 Update staged file", func(t *testing.T) {
		os.Remove(filepath.Join(config.GitDirName, "index"))
		index, err := New(".")
		if err != nil {
			t.Fatalf("Failed to create index %v", err)
		}
		defer index.Clear()

		// Create initial file
		if err = os.WriteFile("update.txt", []byte("initial"), 0644); err != nil {
			t.Fatalf("Failed to create update.txt file: %v", err)
		}
		err = index.Add("update.txt")
		if err != nil {
			t.Fatalf("Failed to add file: %v", err)
		}
		// Store the initial hash
		initialHash := index.Entries()[0].Hash

		// Update the file content
		if err := os.WriteFile("update.txt", []byte("updated"), 0644); err != nil {
			t.Fatalf("Failed to update the update.txt file:%v", err)
		}
		// Re-add the updated file
		err = index.Add("update.txt")
		if err != nil {
			t.Fatalf("Failed to add updated file:%v", err)
		}

		// Verify the hash has changed
		updatedHash := index.Entries()[0].Hash
		if initialHash == updatedHash {
			t.Fatalf("Hash should change when file is updated")
		}
	})

	// Test case 1.4: Verify removing a file from staging
	t.Run("1.4 Remove file from staging", func(t *testing.T) {
		os.Remove(filepath.Join(config.GitDirName, "index"))
		index, err := New(".")
		if err != nil {
			t.Fatalf("Failed to create index %v", err)
		}
		defer index.Clear()

		// Create a test file
		if err := os.WriteFile("remove.txt", []byte("content"), 0644); err != nil {
			t.Fatalf("Failed to create test file: %v", err)
		}

		// Add file to staging
		err = index.Add("remove.txt")
		if err != nil {
			t.Fatalf("Failed to add remove.txt to index: %v", err)
		}
		// Verify file is staged
		if !index.IsStaged("remove.txt") {
			t.Fatalf("remove.txt should have been staged")
		}
		// Remove file from staging
		err = index.Remove("remove.txt")
		if err != nil {
			t.Fatalf("Failed to remove file from staging: %v", err)
		}
		// Verify file is no longer staged
		if index.IsStaged("remove.txt") {
			t.Fatalf("File should not be staged after removal")
		}
	})

	// Test case 1.5: Verify writing and reading index
	t.Run("1.5: Write and read index", func(t *testing.T) {
		os.Remove(filepath.Join(config.GitDirName, "index"))
		index, err := New(".")
		if err != nil {
			t.Fatalf("Failed to create index: %v", err)
		}

		// Create test files
		files := []string{"write.txt", "read.txt"}

		// Create and stage files
		for _, f := range files {
			if err := os.WriteFile(f, []byte("content"), 0644); err != nil {
				t.Fatalf("Failed to create file %s: %v", f, err)
			}
			if err := index.Add(f); err != nil {
				t.Fatalf("Failed to add file %s: %v", f, err)
			} else {
				fmt.Printf("added to index : %v\n", f)
			}
		}
		// Write index to disk
		if err := index.Write(); err != nil {
			t.Fatalf("Failed to write to index :%v", err)
		}
		// Store original entries
		originalEntries := index.Entries()

		// Create a new index and read from disk
		newIndex, err := New(".")
		if err != nil {
			t.Fatalf("Failed to create second index:%v", err)
		}

		// Verify entries match
		newEntries := newIndex.Entries()
		if len(originalEntries) != len(newEntries) {
			t.Fatalf("Entries count mismatch: expected %d, got %d", len(originalEntries), len(newEntries))
		}
	})

	// Test case 1.6: Verify clearing the index
	t.Run("1.6: Clear index", func(t *testing.T) {
		os.Remove(filepath.Join(config.GitDirName, "index"))
		index, err := New(".")
		if err != nil {
			t.Fatalf("Failed to create index: %v", err)
		}

		// Create and stage a test file
		if err := os.WriteFile("clear.txt", []byte("content"), 0644); err != nil {
			t.Fatalf("Failed to create test file: %v", err)
		}

		err = index.Add("clear.txt")
		if err != nil {
			t.Fatalf("Failed to add file: %v", err)
		}

		// Verify index has entries before clearing
		if len(index.Entries()) == 0 {
			t.Error("Index should have entries before clear")
		}
		// Clear the index
		index.Clear()

		// Verify index is now empty
		if len(index.Entries()) != 0 {
			t.Error("Index should have no entries after clear")
		}
	})
}

Here is a brief explanation of each test case in the TestStagingArea function:

  1. Add file to staging (1.1)

    • Verifies that a single file can be successfully added to the staging area
    • Checks that the index correctly captures the file’s path
    • Ensures only one entry is added when a single file is staged
  2. Multiple file staging (1.2)

    • Tests staging multiple files, including files in subdirectories
    • Confirms that files in different locations can be staged
    • Validates that the correct number of files are added to the index
    • Checks handling of nested directory structures
  3. Update staged file (1.3)

    • Demonstrates that modifying a file and re-adding it changes its hash
    • Ensures the index captures content changes
    • Validates that file updates are properly tracked in the staging area
  4. Remove file from staging (1.4)

    • Verifies the ability to remove a file from the staging area
    • Checks the IsStaged() method works correctly
    • Confirms that removing a file clears it from the index
  5. Write and read index (1.5)

    • Tests writing the index to disk
    • Validates that index entries can be written and read back
    • Ensures data persistence of the staging area
    • Checks that the number of entries remains consistent after writing and reading
  6. Clear index (1.6)

    • Verifies the ability to clear all entries from the index
    • Confirms that the Clear() method completely empties the staging area
    • Checks that an index can be reset to an empty state

Implementing the Staging Area

Struct definitions

Let’s start by defining the core structures that will power our staging area:

// internal/staging/staging.go
package staging

import (
	// Needed right away for the struct definitions below
	"os"
	"time"

	// The rest of the imports are used by the functions we implement later
	"bufio"
	"crypto/sha1"
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"io"
	"path/filepath"
	"sort"
	"strings"

	"github.com/HalilFocic/gitgo/internal/blob"
	"github.com/HalilFocic/gitgo/internal/config"
	"github.com/HalilFocic/gitgo/internal/repository"
)

type Entry struct {
    Path     string        // Relative path of the file in the repository
    Hash     string        // SHA-1 hash of the file's content
    Mode     os.FileMode   // File permissions and type
    Size     int64         // Size of the file
    Modified time.Time     // Last modification time
}

type Index struct {
    entries map[string]*Entry  // Map of staged files, keyed by their path
    root    string             // Root directory of the repository
}

  1. Entry struct

    • Captures all essential metadata about a staged file
    • Provides a complete snapshot of a file’s state at the time of staging
    • Uses os.FileMode to track file permissions
    • Stores the blob’s hash for content identification
  2. Index Struct

    • Manages the collection of staged files
    • Uses a map for efficient file tracking
    • Maintains the repository’s root path for relative path calculations
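
To see why a map keyed by path is a convenient fit here, consider what happens when the same file is staged twice. The sketch below uses a trimmed-down, hypothetical entry type and stage helper (neither is part of the gitgo code) to show that re-staging a path replaces the old entry instead of creating a duplicate:

```go
package main

import "fmt"

// entry is a trimmed-down stand-in for the Entry struct above (illustration only).
type entry struct {
	Path string
	Hash string
}

// stage inserts or replaces the entry stored under e.Path.
func stage(entries map[string]*entry, e entry) {
	entries[e.Path] = &e
}

func main() {
	entries := make(map[string]*entry)
	stage(entries, entry{Path: "a.txt", Hash: "hash-of-first-version"})
	stage(entries, entry{Path: "a.txt", Hash: "hash-of-second-version"})
	stage(entries, entry{Path: "dir/b.txt", Hash: "hash-of-b"})

	fmt.Println(len(entries))          // 2
	fmt.Println(entries["a.txt"].Hash) // hash-of-second-version
}
```

This is the behavior test 1.3 relies on: after the second Add there is still a single entry for the file, carrying the new hash.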

Implementing the New function:

Pseudo-code first as always

Before looking at the implementation, try to write a function that looks something like this:

function New(root_directory):
    # Validate the repository path
    convert path to absolute path

    # Check if it's a valid repository
    verify repository exists and is valid repository

    # Initialize the index
    create empty index structure
    - set root directory
    - create empty entries map

    # Try to read existing index (if any)
    attempt to read existing index file

    # Return initialized index
    return index or error

Actual Implementation

As usual, I will provide the function body and then explain it:

func New(root string) (*Index, error) {
	absPath, err := filepath.Abs(root)
	if err != nil {
		return nil, err
	}
	if _, err := os.Stat(absPath); os.IsNotExist(err) {
		return nil, errors.New("path does not exist")
	}
	if !repository.IsRepository(absPath) {
		return nil, fmt.Errorf("not a %s repository", config.GitDirName)
	}
	entries := make(map[string]*Entry)
	idx := &Index{
		root:    root,
		entries: entries,
	}
	err = idx.Read()
	if err != nil {
		return nil, fmt.Errorf("error while reading index: %v", err)
	}
	return idx, nil
}

Breakdown

Let’s break it down step by step:

  1. Convert the given path to an absolute path
  2. Verify the path exists
  3. Confirm it’s a valid GitGo repository
  4. Create an empty index structure
  5. Try to read the existing index
  6. Return the initialized index

The Read() method (which we’ll implement later) loads any existing staged files from the index file. For now, just understand that New attempts to read the existing index, preserving any previously staged files.
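
The repository check in step 3 comes from the repository package built in an earlier chapter, so its body is not shown here. As a rough idea of what such a check can look like, here is a minimal sketch. The hasRepoDir and demo helpers are hypothetical, and the real repository.IsRepository may verify more than the presence of the directory:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hasRepoDir reports whether root contains a directory named gitDir.
// Hypothetical helper: the real repository.IsRepository may check more.
func hasRepoDir(root, gitDir string) bool {
	info, err := os.Stat(filepath.Join(root, gitDir))
	return err == nil && info.IsDir()
}

// demo checks a fresh temporary directory before and after creating gitDir in it.
func demo(gitDir string) (before, after bool, err error) {
	root, err := os.MkdirTemp("", "gitgo-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(root)

	before = hasRepoDir(root, gitDir)
	if err := os.Mkdir(filepath.Join(root, gitDir), 0o755); err != nil {
		return before, false, err
	}
	after = hasRepoDir(root, gitDir)
	return before, after, nil
}

func main() {
	before, after, err := demo(".gitgo")
	fmt.Println(before, after, err)
}
```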

Implementing the Add Function

Our Add function adds a file to the staging area so it can be committed later.

Pseudo-code first as always

Before looking at the implementation, try to write a function that looks something like this:

function Add(path):
    # Prepare full and relative paths
    create absolute input path
    create objects path
    calculate relative path

    # Validate path
    ensure path is within repository
    check path is not a symlink

    # Read file contents
    read file contents

    # Create and store blob
    create blob from contents
    store blob in objects directory

    # Create index entry
    create entry with:
    - relative path
    - blob hash
    - file mode
    - file size
    - modification time

    # Update index
    add entry to index
    write index to disk

Actual Implementation

Here is the function body:

func (idx *Index) Add(path string) error {
	absInputPath := filepath.Join(idx.root, filepath.Clean(path))
	objectsPath := filepath.Join(idx.root, config.GitDirName, "objects")
	relPath, err := filepath.Rel(idx.root, absInputPath)
	if err != nil {
		return fmt.Errorf("failed to get relative path: %v", err)
	}
	if strings.HasPrefix(relPath, "..") {
		return fmt.Errorf("path %s is outside repository", path)
	}
	// Lstat, unlike Stat, does not follow symlinks, so the symlink check below can actually trigger
	fileStat, err := os.Lstat(absInputPath)
	if err != nil {
		return err
	}
	if fileStat.Mode()&os.ModeSymlink != 0 {
		return fmt.Errorf("symlinks are not supported")
	}
	content, err := os.ReadFile(absInputPath)
	if err != nil {
		return err
	}
	b, err := blob.New(content)
	if err != nil {
		return err
	}
	err = b.Store(objectsPath)
	if err != nil {
		return err
	}
	entry := Entry{
		Path:     relPath,
		Hash:     b.Hash(),
		Mode:     fileStat.Mode(),
		Size:     fileStat.Size(),
		Modified: fileStat.ModTime(),
	}
	idx.entries[relPath] = &entry
	err = idx.Write()
	if err != nil {
		return fmt.Errorf("failed to write index: %v", err)
	}
	return nil
}

Breakdown

  1. Path Preparation

absInputPath := filepath.Join(idx.root, filepath.Clean(path))
objectsPath := filepath.Join(idx.root, config.GitDirName, "objects")

  • Cleans the path to remove any redundant separators
  • Joins the cleaned path onto the repository root to get the file’s full path
  • Prepares the path to the objects directory

  2. Path Validation

relPath, err := filepath.Rel(idx.root, absInputPath)
if strings.HasPrefix(relPath, "..") {
    return fmt.Errorf("path %s is outside repository", path)
}

  • Calculates the relative path from the repository root
  • Prevents adding files outside the repository

  3. File Checks

fileStat, err := os.Lstat(absInputPath)
if fileStat.Mode()&os.ModeSymlink != 0 {
    return fmt.Errorf("symlinks are not supported")
}

  • Retrieves file metadata with os.Lstat, which does not follow symlinks (os.Stat does, so with it the symlink check could never trigger)
  • Checks that the file is not a symbolic link

  4. Blob Creation

content, err := os.ReadFile(absInputPath)
b, err := blob.New(content)
err = b.Store(objectsPath)

  • Reads file contents
  • Creates a blob from the contents
  • Stores the blob in the objects directory
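
To make the path validation step more concrete, the sketch below shows what filepath.Rel returns for a few inputs. The relInside helper is hypothetical and is a slightly stricter variant of the check in Add: a plain strings.HasPrefix(relPath, "..") test also rejects a file that is literally named something like "..notes", so this version only rejects ".." itself or paths that begin with ".." followed by a separator.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// relInside returns path relative to root and whether it stays inside root.
// Hypothetical helper for illustration; it is not part of the gitgo code.
func relInside(root, path string) (string, bool) {
	joined := filepath.Join(root, filepath.Clean(path))
	rel, err := filepath.Rel(root, joined)
	if err != nil {
		return "", false
	}
	// Reject ".." itself and anything below it, but not names such as "..notes".
	parentPrefix := ".." + string(filepath.Separator)
	if rel == ".." || strings.HasPrefix(rel, parentPrefix) {
		return rel, false
	}
	return rel, true
}

func main() {
	for _, p := range []string{"dir/c.txt", "./a.txt", "../secret.txt", "..notes"} {
		rel, ok := relInside("repo", p)
		fmt.Printf("%-15s rel=%-15s inside=%v\n", p, rel, ok)
	}
}
```

Whether the stricter check is worth it depends on whether you expect file names that start with two dots. For a learning project, the simpler prefix test is a reasonable choice.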

Implementing Remove Function

After adding files to our staging area, we of course need a way to remove them if needed. Let’s implement the Remove function which will unstage a file without affecting the actual file in the working directory.

Pseudo code


function Remove(path):
    # Prepare paths
    create absolute input path
    calculate relative path

    # Validate entry exists in index
    check if file is in staging area
    if not, return error

    # Remove from index
    delete entry from index map

    # Update index file
    write updated index to disk

Actual implementation

func (idx *Index) Remove(path string) error {
	absInputPath := filepath.Join(idx.root, filepath.Clean(path))
	relPath, err := filepath.Rel(idx.root, absInputPath)
	if err != nil {
		return err
	}
	_, ok := idx.entries[relPath]
	if !ok {
		return fmt.Errorf("File is not in index entries")
	}
	delete(idx.entries, relPath)
	err = idx.Write()
	if err != nil {
		return fmt.Errorf("failed to write to index: %v", err)
	}
	return nil
}

Breakdown

  1. Path Preparation

absInputPath := filepath.Join(idx.root, filepath.Clean(path))
relPath, err := filepath.Rel(idx.root, absInputPath)

  • Converts the input path to an absolute path
  • Calculates the relative path from the repository root

This ensures we’re using consistent path formats in our index.

  2. Entry Validation

_, ok := idx.entries[relPath]
if !ok {
   return fmt.Errorf("File is not in index entries")
}

  • Checks if the file exists in our staging area
  • Returns an error if trying to remove a file that isn’t staged

This prevents unnecessary operations and provides clear feedback.

  3. Index Update

delete(idx.entries, relPath)

  • Removes the entry from our in-memory index map

  4. Persistence

err = idx.Write()
if err != nil {
   return fmt.Errorf("failed to write to index: %v", err)
}

  • Writes the updated index to disk
  • Ensures the changes are persisted for future operations

The beauty of this implementation is its simplicity. Unlike the Add function which needed to create and store blobs, Remove only needs to update the index itself. We don’t delete the blob objects because they might be referenced by other parts of the repository (like commits we’ll implement later).

Implementing Entries And IsStaged Functions

These two functions are simple enough that I’ll provide them without a step-by-step explanation.

  • Entries returns a slice of all entries in the index.
  • IsStaged returns a boolean indicating whether the file is already staged.

func (idx *Index) Entries() []*Entry {
	slice := make([]*Entry, 0, len(idx.entries))
	for _, entry := range idx.entries {
		slice = append(slice, entry)
	}
	return slice
}

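func (idx *Index) IsStaged(path string) bool {
	// Resolve the path the same way Add and Remove do, so lookups use the same map keys
	absInputPath := filepath.Join(idx.root, filepath.Clean(path))
	relPath, err := filepath.Rel(idx.root, absInputPath)
	if err != nil {
		return false
	}
	_, ok := idx.entries[relPath]
	return ok
}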
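
One detail worth knowing about Entries: Go does not guarantee any iteration order for maps, so the slice it returns can come back in a different order on each call. That is fine for our tests, which either count entries or index a single-entry slice. If a caller needs a stable order, a sorted variant could look like the sketch below. The sortedEntries helper and the trimmed-down entry type are hypothetical and not part of the gitgo code:

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a trimmed-down stand-in for the Entry struct (illustration only).
type entry struct{ Path string }

// sortedEntries returns the entries ordered by path, so callers see a stable order.
func sortedEntries(entries map[string]*entry) []*entry {
	out := make([]*entry, 0, len(entries))
	for _, e := range entries {
		out = append(out, e)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out
}

func main() {
	entries := map[string]*entry{
		"dir/c.txt": {Path: "dir/c.txt"},
		"a.txt":     {Path: "a.txt"},
		"b.txt":     {Path: "b.txt"},
	}
	for _, e := range sortedEntries(entries) {
		fmt.Println(e.Path)
	}
}
```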

Explaining Read and Write Functions

Since the following two functions are more complex than the previous ones, I will first give a bit of overview and some background information we should know before implementing them.

The staging area’s persistence relies on two methods: Write() for saving the index to disk and Read() for loading it back. These functions deal with Git’s binary index format.

Git’s Index Format

Let’s look at the structure of Git’s index file:

  1. Header: Contains a signature (“DIRC”), version number, and entry count
  2. Entries: A sorted list of file entries with metadata
  3. Checksum: A SHA-1 hash of the entire index content for integrity verification

Why binary format? It is more space-efficient than text-based alternatives and allows for faster parsing and writing, especially with large repositories.

The Index Header Structure

type IndexHeader struct {
    signature  [4]byte  // Magic bytes "DIRC" (DIRectory Cache)
    version    uint32   // Format version (we use 2)
    numEntries uint32   // Number of entries in the index
}

The signature “DIRC” helps Git identify that this is a valid index file. The version (2) specifies the format version we’re using, and numEntries tells us how many file entries are stored.
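
To get a feel for what the header looks like on disk, here is a small standalone sketch that serializes it into a byte slice. The encodeHeader helper is hypothetical and writes to an in-memory buffer instead of a file, but it uses the same binary.Write calls we rely on later:

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// encodeHeader serializes the 12-byte header: "DIRC", version, entry count.
func encodeHeader(version, numEntries uint32) ([]byte, error) {
	var buf bytes.Buffer
	buf.WriteString("DIRC")
	if err := binary.Write(&buf, binary.BigEndian, version); err != nil {
		return nil, err
	}
	if err := binary.Write(&buf, binary.BigEndian, numEntries); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	header, err := encodeHeader(2, 3)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", header) // 44 49 52 43 00 00 00 02 00 00 00 03
}
```

The first four bytes are the ASCII codes for D, I, R and C, followed by the version and the entry count as 4-byte big-endian integers.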

The Index Entry Structure

type IndexEntry struct {
    Ctimesec  uint32    // ctime seconds: last time the file's metadata changed
    Ctimenano uint32    // ctime nanosecond fraction
    Mtimesec  uint32    // Modification time seconds
    Mtimenano uint32    // Modification time nanoseconds
    Dev       uint32    // Device ID
    Ino       uint32    // Inode number
    Mode      uint32    // File mode and permissions
    Size      uint32    // File size
    Hash      [20]byte  // SHA-1 hash of blob content
    Flags     uint16    // Entry flags, we use it for path length
    Path      []byte    // File path (variable length)
}

Each entry contains detailed metadata about a staged file, including:

  • Timestamps for the last metadata change (ctime) and the last content modification (mtime)
  • File system information (device, inode)
  • File mode (permissions)
  • Content identifier (SHA-1 hash)
  • Path information
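
It can help to know how many bytes the fixed-size part of an entry takes before the variable-length path starts. The sketch below measures two hypothetical structs with binary.Size: one that mirrors our simplified IndexEntry without Path, and one that also carries the uid and gid fields that Git’s documented on-disk entry stores. Both struct names are made up for this illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// gitFixedFields mirrors the fixed-size part of an entry in Git's index
// format, which also records uid and gid (illustration only).
type gitFixedFields struct {
	CtimeSec, CtimeNano uint32
	MtimeSec, MtimeNano uint32
	Dev, Ino, Mode      uint32
	UID, GID, Size      uint32
	Hash                [20]byte
	Flags               uint16
}

// simplifiedFixedFields mirrors the IndexEntry above without its Path.
type simplifiedFixedFields struct {
	CtimeSec, CtimeNano uint32
	MtimeSec, MtimeNano uint32
	Dev, Ino, Mode      uint32
	Size                uint32
	Hash                [20]byte
	Flags               uint16
}

// fixedSizes returns the encoded size in bytes of both layouts.
func fixedSizes() (git, simplified int) {
	return binary.Size(gitFixedFields{}), binary.Size(simplifiedFixedFields{})
}

func main() {
	git, simplified := fixedSizes()
	fmt.Println(git, simplified) // 62 54
}
```

The two sizes differ by exactly 8 bytes, so they leave the same remainder when divided by 8. That matters for the padding calculation discussed further down.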

Understanding Big-Endian Byte Order

When working with binary data across different computer systems, byte order becomes critically important. In our implementation, we will use binary.BigEndian for reading and writing data, but what exactly does this mean?

Endianness Explained

Endianness refers to the order in which bytes are stored in memory for multi-byte values (like uint32). There are two primary types:

  1. Big-Endian: Stores the most significant byte first (at the lowest memory address)

    • Example: The number 0x12345678 would be stored as [12, 34, 56, 78]
  2. Little-Endian: Stores the least significant byte first

    • Example: The number 0x12345678 would be stored as [78, 56, 34, 12]
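
The two byte orders are easy to compare in Go, because encoding/binary exposes both. A small sketch (the encodeBoth helper is only for illustration):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeBoth returns the big-endian and little-endian encodings of v.
func encodeBoth(v uint32) (big, little [4]byte) {
	binary.BigEndian.PutUint32(big[:], v)
	binary.LittleEndian.PutUint32(little[:], v)
	return big, little
}

func main() {
	big, little := encodeBoth(0x12345678)
	fmt.Printf("big-endian:    % x\n", big[:])    // 12 34 56 78
	fmt.Printf("little-endian: % x\n", little[:]) // 78 56 34 12
}
```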

8-Byte Alignment and Padding

Another important aspect of Git’s index format is its 8-byte alignment requirement. This might seem like a technical detail, but it has important implications for performance and compatibility.

What is alignment?

If you have mostly worked in garbage-collected languages (JavaScript, Python, C#, Go, etc.), you may never have had to think about memory alignment and padding. In memory systems, alignment means placing data at memory addresses that are multiples of some value (in this case, 8 bytes). Properly aligned data can be accessed more efficiently by CPUs.

How Padding Works in Our Implementation

padding := 8 - ((62 + len(indexEntry.Path) + 1) % 8)
if padding < 8 {
    zeros := make([]byte, padding)
    writer.Write(zeros)
}

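Let’s break this down:

  • 62 is the size of the fixed fields in Git’s real entry layout: ten 32-bit fields, a 20-byte hash, and 2 bytes of flags. Our simplified IndexEntry leaves out uid and gid and therefore writes 54 bytes, but 54 and 62 leave the same remainder when divided by 8, so the padding comes out the same
  • len(indexEntry.Path) is the variable path length
  • + 1 accounts for the null terminator byte
  • % 8 finds how many bytes spill into the last partial 8-byte block
  • 8 - remainder calculates how many padding bytes we need to add; when the result is 8 the entry is already aligned, which is why padding is only written if padding < 8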

For example, if the total size including path is 73 bytes:

  • 73 % 8 = 1 (1 byte into the next 8-byte block)
  • 8 - 1 = 7 (need 7 more padding bytes to reach the next 8-byte boundary)

The padding makes every entry’s length a multiple of 8 bytes, so consecutive entries stay aligned relative to each other, which improves memory access efficiency.
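
If you want to play with the numbers, the padding rule can be pulled out into a tiny function. This is a minimal sketch: paddingFor and fixedFieldsSize are hypothetical names, and the constant 62 is the one used in the formula above.

```go
package main

import "fmt"

// fixedFieldsSize is the constant used in the padding formula above.
const fixedFieldsSize = 62

// paddingFor returns how many zero bytes follow the path's NUL terminator.
func paddingFor(pathLen int) int {
	padding := 8 - ((fixedFieldsSize + pathLen + 1) % 8)
	if padding == 8 {
		return 0 // already on an 8-byte boundary, nothing to add
	}
	return padding
}

func main() {
	for _, path := range []string{"a", "a.txt", "update.txt", "dir/c.txt"} {
		unpadded := fixedFieldsSize + len(path) + 1
		pad := paddingFor(len(path))
		fmt.Printf("%-10s unpadded=%d padding=%d total=%d\n", path, unpadded, pad, unpadded+pad)
	}
}
```

For update.txt (10 characters) the unpadded size is 73 bytes, which matches the worked example: 7 padding bytes bring it to 80.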

Omitting Integrity Verification

In the original Git implementation, when reading the index file, Git verifies the SHA-1 checksum at the end to ensure the file hasn’t been corrupted. In our implementation, we write the checksum but don’t verify it when reading.

Why We Omitted It

We chose to omit integrity verification for several reasons:

  1. Simplicity: It reduces code complexity for our learning implementation
  2. Focus: It keeps the focus on the core concepts of the staging area
  3. Low Risk: For an educational implementation, the risk of corruption is minimal

How Git’s Integrity Verification Works

In production Git, integrity verification follows these steps:

  1. Read all index content except the last 20 bytes (the checksum)
  2. Calculate a SHA-1 hash of this content
  3. Compare with the stored checksum at the end of the file
  4. If they don’t match, the index is considered corrupted

The trade-off we make by omitting this verification is that our implementation won’t detect corrupted index files, but it significantly simplifies the code while still demonstrating the core concepts of Git’s staging area.
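
If you want to add the check to your own version later, the four steps above translate into only a few lines. The following is a sketch under the assumption that the whole index file has already been read into memory. The verifyChecksum and appendChecksum helpers are hypothetical and not part of the gitgo code:

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"errors"
	"fmt"
)

// appendChecksum returns content followed by its 20-byte SHA-1 checksum.
func appendChecksum(content []byte) []byte {
	sum := sha1.Sum(content)
	return append(append([]byte{}, content...), sum[:]...)
}

// verifyChecksum checks that the trailing 20 bytes are the SHA-1 of the rest.
func verifyChecksum(data []byte) error {
	if len(data) < sha1.Size {
		return errors.New("index too short to contain a checksum")
	}
	split := len(data) - sha1.Size
	content, stored := data[:split], data[split:]
	sum := sha1.Sum(content)
	if !bytes.Equal(sum[:], stored) {
		return errors.New("index checksum mismatch")
	}
	return nil
}

func main() {
	data := appendChecksum([]byte("DIRC plus the rest of the index"))
	fmt.Println(verifyChecksum(data)) // <nil>

	data[0] ^= 0xff // flip bits to simulate corruption
	fmt.Println(verifyChecksum(data)) // index checksum mismatch
}
```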

Implementing Write Function

For Read and Write, I won’t give pseudo-code, just the implementation and an explanation that builds on the concepts covered above.

Write implementation code

func (idx *Index) Write() error {
	indexPath := filepath.Join(idx.root, config.GitDirName, "index")
	file, err := os.Create(indexPath)
	if err != nil {
		return fmt.Errorf("Failed to create index file: %v", err)
	}
	defer file.Close()

	hash := sha1.New()
	writer := bufio.NewWriter(io.MultiWriter(file, hash))

	header := IndexHeader{
		signature:  [4]byte{'D', 'I', 'R', 'C'},
		version:    2,
		numEntries: uint32(len(idx.entries)),
	}

	if err := binary.Write(writer, binary.BigEndian, header.signature); err != nil {
		return fmt.Errorf("failed to write signature: %v", err)
	}

	if err := binary.Write(writer, binary.BigEndian, header.version); err != nil {
		return fmt.Errorf("failed to write version: %v", err)
	}

	if err := binary.Write(writer, binary.BigEndian, header.numEntries); err != nil {
		return fmt.Errorf("failed to write entry count: %v", err)
	}

	paths := []string{}
	for path := range idx.entries {
		paths = append(paths, path)
	}
	sort.Strings(paths)

	for _, path := range paths {
		entry := idx.entries[path]

		indexEntry := IndexEntry{
			Ctimesec:  uint32(entry.Modified.Unix()),
			Ctimenano: uint32(entry.Modified.Nanosecond()),
			Mtimesec:  uint32(entry.Modified.Unix()),
			Mtimenano: uint32(entry.Modified.Nanosecond()),
			Dev:       0,
			Ino:       0,
			Path:      []byte(path),
			Mode:      uint32(entry.Mode),
			Size:      uint32(entry.Size),
			Flags:     uint16(len(entry.Path)),
		}

		hashBytes, err := hex.DecodeString(entry.Hash)
		if err != nil {
			return fmt.Errorf("failed to decode hash %v", err)
		}

		copy(indexEntry.Hash[:], hashBytes)

		if err := binary.Write(writer, binary.BigEndian, indexEntry.Ctimesec); err != nil {
			return fmt.Errorf("failed to write ctime sec: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Ctimenano); err != nil {
			return fmt.Errorf("failed to write ctime nano: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Mtimesec); err != nil {
			return fmt.Errorf("failed to write mtime sec: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Mtimenano); err != nil {
			return fmt.Errorf("failed to write mtime nano: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Dev); err != nil {
			return fmt.Errorf("failed to write dev: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Ino); err != nil {
			return fmt.Errorf("failed to write ino: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Mode); err != nil {
			return fmt.Errorf("failed to write mode: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Size); err != nil {
			return fmt.Errorf("failed to write size: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Hash); err != nil {
			return fmt.Errorf("failed to write hash: %v", err)
		}
		if err := binary.Write(writer, binary.BigEndian, indexEntry.Flags); err != nil {
			return fmt.Errorf("failed to write flags: %v", err)
		}
		if _, err := writer.Write(indexEntry.Path); err != nil {
			return fmt.Errorf("failed to write path: %v", err)
		}

		if err := writer.WriteByte(0); err != nil {
			return fmt.Errorf("failed to write path terminator: %v", err)
		}

		padding := 8 - ((62 + len(indexEntry.Path) + 1) % 8)
		if padding < 8 {
			zeros := make([]byte, padding)
			if _, err := writer.Write(zeros); err != nil {
				return fmt.Errorf("failed to write padding: %v", err)
			}
		}
	}
	if err := writer.Flush(); err != nil {
		return fmt.Errorf("failed to flush writer: %v", err)
	}
	if _, err := file.Write(hash.Sum(nil)); err != nil {
		return fmt.Errorf("failed to write checksum: %v", err)
	}

	return nil
}
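
The part of Write that tends to raise questions is how the checksum gets computed while the file is being written. The sketch below isolates that pattern, using an in-memory buffer as a stand-in for the index file. The writeWithChecksum and checksumMatches helpers are hypothetical and exist only for this illustration:

```go
package main

import (
	"bufio"
	"bytes"
	"crypto/sha1"
	"fmt"
	"io"
)

// writeWithChecksum writes payload to a buffer while hashing it,
// then appends the SHA-1 of everything written so far.
func writeWithChecksum(payload []byte) ([]byte, error) {
	var out bytes.Buffer // stands in for the index file
	hash := sha1.New()
	writer := bufio.NewWriter(io.MultiWriter(&out, hash))

	if _, err := writer.Write(payload); err != nil {
		return nil, err
	}
	// Flush before reading the hash: buffered bytes have not been hashed yet.
	if err := writer.Flush(); err != nil {
		return nil, err
	}
	// The checksum bypasses the MultiWriter so it does not get hashed itself.
	out.Write(hash.Sum(nil))
	return out.Bytes(), nil
}

// checksumMatches reports whether the trailing 20 bytes equal the SHA-1 of the rest.
func checksumMatches(data []byte) bool {
	if len(data) < sha1.Size {
		return false
	}
	split := len(data) - sha1.Size
	sum := sha1.Sum(data[:split])
	return bytes.Equal(sum[:], data[split:])
}

func main() {
	data, err := writeWithChecksum([]byte("DIRC and the entries would go here"))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), checksumMatches(data))
}
```

Two details mirror the Write function above: the Flush call has to happen before hash.Sum, and the checksum is written straight to the destination instead of through the writer that feeds the hash.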

Breakdown

Let’s break down the Write() function to understand exactly how we store our staging area to disk in Git’s binary format:

  1. File setup and Initialization
indexPath := filepath.Join(idx.root, config.GitDirName, "index")
file, err := os.Create(indexPath)
if err != nil {
    return fmt.Errorf("Failed to create index file: %v", err)
}
defer file.Close()

This part:

  • Constructs the full path to the index file (.gitgo/index)
  • Creates or truncates the file at that location, making it ready for writing
  • Sets up a deferred close operation to ensure the file gets closed even if errors occur
  • Returns an error if the file cannot be created
  2. Checksum and Writer
hash := sha1.New()
writer := bufio.NewWriter(io.MultiWriter(file, hash))
  • Create a new SHA-1 hash calculator
  • Set up a special writer that can write to multiple destinations simultaneously:
    • The file itself for storage
    • The hash calculator for integrity verification
  • Use a buffered writer to improve performance by minimizing system calls
  3. Creating and Writing the Header
header := IndexHeader{
    signature:  [4]byte{'D', 'I', 'R', 'C'},
    version:    2,
    numEntries: uint32(len(idx.entries)),
}
if err := binary.Write(writer, binary.BigEndian, header.signature); err != nil {
    return fmt.Errorf("failed to write signature: %v", err)
}
if err := binary.Write(writer, binary.BigEndian, header.version); err != nil {
    return fmt.Errorf("failed to write version: %v", err)
}
if err := binary.Write(writer, binary.BigEndian, header.numEntries); err != nil {
    return fmt.Errorf("failed to write entry count: %v", err)
}

In this section, we:

  • Create a header structure with:
    • The magic signature “DIRC” as bytes (stands for DIRectory Cache)
    • Version 2 of the index format
    • The number of entries in our index
  • Write each component in big-endian byte order to ensure compatibility
  • Return detailed errors if any of these writes fail
  4. Sorting Paths for Entries
paths := []string{}
for path := range idx.entries {
    paths = append(paths, path)
}
sort.Strings(paths)
  • Extracts all file paths from our map data structure
  • Sorts them alphabetically
  • Ensures our entries are written in a consistent order, as required by Git’s format
  5. Processing Each Entry

For each file path, we process the corresponding entry:
for _, path := range paths {
    entry := idx.entries[path]
    indexEntry := IndexEntry{
        Ctimesec:  uint32(entry.Modified.Unix()),
        Ctimenano: uint32(entry.Modified.Nanosecond()),
        Mtimesec:  uint32(entry.Modified.Unix()),
        Mtimenano: uint32(entry.Modified.Nanosecond()),
        Dev:       0,
        Ino:       0,
        Path:      []byte(path),
        Mode:      uint32(entry.Mode),
        Size:      uint32(entry.Size),
        Flags:     uint16(len(entry.Path)),
    }

Here, we:

  • Retrieve the entry from our map using the path
  • Convert our internal Entry structure to the binary IndexEntry format:
    • Store both ctime (last metadata change) and mtime (last content modification) as seconds and nanoseconds, filling both from the entry's Modified timestamp
    • Set device and inode values to 0 (simplified for our implementation)
    • Convert the path to bytes for binary storage
    • Include file mode (permissions) and size
    • Use flags to store the path length
  1. Processing Entry Hash
hashBytes, err := hex.DecodeString(entry.Hash)
if err != nil {
    return fmt.Errorf("failed to decode hash: %v", err)
}
copy(indexEntry.Hash[:], hashBytes)

This section:

  • Converts our hex string hash (like “a1b2c3…”) to raw bytes
  • Copies these bytes into the fixed-size hash field
  • Returns an error if the hash cannot be decoded (e.g., if it’s not valid hex)
  1. Writing Entry Fields
if err := binary.Write(writer, binary.BigEndian, indexEntry.Ctimesec); err != nil {
    return fmt.Errorf("failed to write ctime sec: %v", err)
}
if err := binary.Write(writer, binary.BigEndian, indexEntry.Flags); err != nil {
    return fmt.Errorf("failed to write flags: %v", err)
}
// ... [REST OF THE FIELDS ARE ALSO WRITTEN] ...

For each entry, we:

  • Write all the fixed-size fields in sequence
  • Use the same big-endian byte order for consistency
  • Include specific error messages to identify where any write failure occurs
  1. Writing Variable-Length Path

if _, err := writer.Write(indexEntry.Path); err != nil {
    return fmt.Errorf("failed to write path: %v", err)
}
if err := writer.WriteByte(0); err != nil {
    return fmt.Errorf("failed to write path terminator: %v", err)
}

After the fixed fields, we:

  • Write the variable-length path as bytes
  • Add a null byte (0) as a terminator, following C-style string conventions
  • Return errors if either of these writes fails
  1. Finalizing the Write Operation
if err := writer.Flush(); err != nil {
    return fmt.Errorf("failed to flush writer: %v", err)
}
if _, err := file.Write(hash.Sum(nil)); err != nil {
    return fmt.Errorf("failed to write checksum: %v", err)
}
return nil

After writing all entries, we:

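  • Flush the buffered writer so that all pending data reaches both the file and the hash
  • Compute the final SHA-1 over everything written so far and append it directly to the file, bypassing the hash so the checksum is not hashed itself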
  • Return nil to indicate success or an error if either operation fails

Padding

We skipped explaining the padding part of the code since that was already covered before.

Read Function Implementation

The Read function is essentially the inverse of the Write function we just examined, so rather than explaining it line-by-line, let’s focus on the key differences and challenges.

Read code:

func (idx *Index) Read() error {
	indexPath := filepath.Join(idx.root, config.GitDirName, "index")
	stat, err := os.Stat(indexPath)
	if err != nil {
		if os.IsNotExist(err) {
			idx.Clear()
			return nil
		}
		return fmt.Errorf("failed to stat index file: %v", err)
	}
	if stat.Size() == 0 {
		idx.Clear()
		return nil
	}

	file, err := os.Open(indexPath)
	if err != nil {
		return fmt.Errorf("failed to open index file: %v", err)
	}
	defer file.Close()

	reader := bufio.NewReader(file)

	header := IndexHeader{}

	if err := binary.Read(reader, binary.BigEndian, &header.signature); err != nil {
		return fmt.Errorf("failed to read signature %v", err)
	}

	if err := binary.Read(reader, binary.BigEndian, &header.version); err != nil {
		return fmt.Errorf("failed to read version %v", err)
	}

	if err := binary.Read(reader, binary.BigEndian, &header.numEntries); err != nil {
		return fmt.Errorf("failed to read num entries %v", err)
	}
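	// Reset only the in-memory entries here. Calling Clear() at this point would
	// also call Write(), which rewrites the index file while we are still reading it.
	idx.entries = make(map[string]*Entry)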

	for i := uint32(0); i < header.numEntries; i++ {
		indexEntry := IndexEntry{}

		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Ctimesec); err != nil {
			return fmt.Errorf("failed to read ctimesec %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Ctimenano); err != nil {
			return fmt.Errorf("failed to read ctimenano %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Mtimesec); err != nil {
			return fmt.Errorf("failed to read mtimesec %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Mtimenano); err != nil {
			return fmt.Errorf("failed to read mtimenano %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Dev); err != nil {
			return fmt.Errorf("failed to read dev %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Ino); err != nil {
			return fmt.Errorf("failed to read ino %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Mode); err != nil {
			return fmt.Errorf("failed to read mode %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Size); err != nil {
			return fmt.Errorf("failed to read size %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Hash); err != nil {
			return fmt.Errorf("failed to read hash %v", err)
		}
		if err := binary.Read(reader, binary.BigEndian, &indexEntry.Flags); err != nil {
			return fmt.Errorf("failed to read flags %v", err)
		}

		path, err := reader.ReadBytes(0)
		if err != nil {
			return fmt.Errorf("failed to read path: %v", err)
		}
		path = path[:len(path)-1]

		padding := 8 - ((62 + len(path) + 1) % 8)
		if padding < 8 {
			if _, err := reader.Discard(padding); err != nil {
				return fmt.Errorf("failed to skip padding %v", err)
			}
		}

		entry := &Entry{
			Path:     string(path),
			Hash:     hex.EncodeToString(indexEntry.Hash[:]),
			Mode:     os.FileMode(indexEntry.Mode),
			Size:     int64(indexEntry.Size),
			Modified: time.Unix(int64(indexEntry.Mtimesec), int64(indexEntry.Mtimenano)),
		}
		idx.entries[entry.Path] = entry
	}
	return nil
}

Differences from Write

  1. Empty File Handling: Unlike Write, Read needs to handle the case of an empty or non-existent index file gracefully:
if stat.Size() == 0 {
    idx.Clear()
    return nil
}
  1. Direction of Operations: Instead of serializing and writing, we’re reading and deserializing:
  • Write: Go data structure → binary format → disk
  • Read: Disk → binary format → Go data structure
  1. Initialization: Read clears the existing in-memory entries before populating the index with data from disk.

One Last Method

I swear, we are almost done. We have just one super simple method left to implement: Clear. It resets the index so that it contains 0 entries and stores that empty index on disk.

Code

func (idx *Index) Clear() {
	idx.entries = make(map[string]*Entry)
	err := idx.Write()
	if err != nil {
		fmt.Printf("failed to clear index: %v\n", err)
	}
}

Testing Our Implementation

To run the tests:

go test ./internal/staging


What We’ve Built

Congrats 🎉! You have come a long way to get here. I promise that the next chapters will not introduce much more complexity.

In this chapter, we’ve implemented:

  • A staging area (index) mechanism
  • Ability to add files to the staging area
  • Basic index file management
  • Integration with our blob storage system

Why This Matters

The staging area is crucial because it:

  • Allows selective committing of changes
  • Provides a preparation step before creating a commit
  • Enables granular control over version tracking

What’s Next

In the next chapter, we’ll implement Git’s tree functionality, the missing link between our individual file blobs and the complete snapshots that a Git commit represents.

We’ll discover:

  • How Git efficiently represents entire directory structures
  • Why trees are crucial for Git’s content-addressable storage system
  • How Git handles nested directories through recursive design

If you’ve ever wondered how Git tracks not just file contents but entire directory structures, the next chapter is for you!