Write your git - Part 5: Commits

In previous chapters, we’ve built several foundational components of our Git clone. We’ve implemented repository initialization, blob storage for file contents, a staging area to track changes, and tree structures to represent directories. Now it’s time to implement one of Git’s most important features: commits.

Commits represent snapshots of your entire repository at specific points in time, creating a historical record of your project’s evolution. Each commit links to its parent commit, if there is one. This enables features like branching, merging, and time travel through the codebase.

In this chapter, we’ll implement Git’s commit functionality, which will allow us to:

  • Create repository snapshots at a point in time
  • Store metadata like author, timestamp, and commit message
  • Link commits to form a history chain through parent references
  • Store and retrieve commits from the object database

By the end of this chapter, we’ll have a functional commit system that records changes to our repository and maintains a history of those changes.

What is a Commit in Git?

In Git, a commit is an object that represents a snapshot of your repository at a specific point in time. More specifically, a commit contains:

  1. Tree Hash: A reference to the root tree object that represents the state of your repository’s files and directories
  2. Parent Hash: A reference to the previous commit (or commits in case of a merge)
  3. Author: Information about who created the commit, typically name and email
  4. Timestamp: When the commit was created
  5. Message: A description of the changes made in the commit

If we have to visualise a single commit, it would look something like this:

Commit 67e0119
├── Tree: a1b2c3... (Points to the root directory tree)
├── Parent: 2757411 (Points to the previous commit)
├── Author: John Doe <john@example.com>
├── Date: Sat Mar 29 14:32:19 2025 -0700
└── Message: feat: implemented some feature

The first commit in a repository (often called the “root commit”) doesn’t have a parent.

Commit Object Format

In Git’s object database, a commit is stored in a specific format. Here’s what the raw content of a commit object looks like:

commit [size]\0tree a1b2c3d4e5f60718293a4b5c6d7e8f9012345678
parent b1c2d3e4f5a60718293a4b5c6d7e8f9012345679
author John Doe <john@example.com> 1716577234 -0700

This is the commit message.
It can span multiple lines.

The format consists of:

  • An object header: commit [size]\0 where [size] is the length of the content and \0 is a null byte
  • Content lines, including:
    • A tree reference
    • Optional parent reference(s)
    • Author information with timestamp and timezone
    • A blank line separator
    • The commit message

In our implementation, we’re using a simplified format that includes only the author field, while the full Git implementation also includes a separate “committer” field that can differ from the author. For our purposes this simplification works well, but keep in mind that real Git expects the committer line, so our commit objects are a close approximation of Git’s format rather than an exact match.

Like blobs and trees, this content is compressed with zlib and stored in the objects directory, identified by its SHA-1 hash.
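
One detail worth spelling out: the [size] in the header counts bytes, not characters. The following is a minimal sketch of building that header, assuming a hypothetical helper named withHeader that is not part of the gitgo code:

```go
package main

import (
	"fmt"
	"strconv"
)

// withHeader is a hypothetical helper (not part of gitgo). It prefixes
// content with the object header "<kind> <size>\0", where size is the
// byte length of the content.
func withHeader(kind string, content []byte) []byte {
	header := kind + " " + strconv.Itoa(len(content)) + "\x00"
	return append([]byte(header), content...)
}

func main() {
	// "é" takes two bytes in UTF-8, so this 5-character message is 6 bytes long.
	obj := withHeader("commit", []byte("héllo"))
	fmt.Printf("%q\n", obj)
}
```

Go’s len reports the byte length of a string or byte slice, which is why passing len(content) to the header works even for non-ASCII messages.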

The Commit Chain

Commits in Git form a directed acyclic graph (DAG), where each commit points to its parent(s):

A <--- B <--- C <--- D (HEAD)
       ^
        \
         --- E <--- F (feature)

In this diagram:

  • D is the latest commit on the main branch
  • F is the latest commit on a feature branch
  • B is a common ancestor of both branches
  • A is the root commit with no parent

This structure allows Git to efficiently track multiple development paths and merge them together.
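
To make the idea of walking parent links concrete, here is a small in-memory sketch of the diagram above. The parents map, history, and commonAncestor are illustrative names only, the letters stand in for real hashes, and the sketch assumes every commit has at most one parent (so it does not cover merge commits):

```go
package main

import "fmt"

// parents maps each commit to its single parent; "" marks the root commit.
// The letters mirror the diagram and are placeholders, not real hashes.
var parents = map[string]string{
	"A": "", "B": "A", "C": "B", "D": "C", "E": "B", "F": "E",
}

// history returns the commits reachable from start, newest first.
func history(start string) []string {
	var out []string
	for c := start; c != ""; c = parents[c] {
		out = append(out, c)
	}
	return out
}

// commonAncestor returns the nearest commit reachable from both a and b,
// or "" if the two histories share nothing.
func commonAncestor(a, b string) string {
	seen := make(map[string]bool)
	for _, c := range history(a) {
		seen[c] = true
	}
	for _, c := range history(b) {
		if seen[c] {
			return c
		}
	}
	return ""
}

func main() {
	fmt.Println(history("D"))             // [D C B A]
	fmt.Println(commonAncestor("D", "F")) // B
}
```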

Project structure

Let’s update our project structure to include the commit implementation:

gitgo/
├── go.mod
└── internal/
    ├── blob/                # From part 2
    │   ├── blob.go
    │   └── blob_test.go
    ├── config/              # From part 1
    │   └── config.go
    ├── repository/          # From part 1
    │   ├── repository.go
    │   └── repository_test.go
    ├── staging/             # From part 3
    │   ├── staging.go
    │   └── staging_test.go
    ├── tree/                # From part 4
    │   ├── tree.go
    │   └── tree_test.go
    └── commit/              # NEW DIRECTORY
        ├── commit.go
        └── commit_test.go

Tests first

As with our previous components, we’ll start by writing tests to define the expected behavior of our commit implementation.

// internal/commit/commit_test.go
package commit

import (
	"os"
	"path/filepath"
	"testing"
)

func TestCommitCreation(t *testing.T) {
	t.Run("1.1: valid commit creation", func(t *testing.T) {
		treeHash := "1234567890123456789012345678901234567890"
		parentHash := "abcdef1234567890abcdef1234567890abcdef12"
		author := "John Doe <john@example.com>"
		message := "Initial commit"

		commit, err := New(treeHash, parentHash, author, message)
		if err != nil {
			t.Fatalf("Failed to create valid commit: %v", err)
		}

		if commit.TreeHash != treeHash {
			t.Errorf("TreeHash = %s; want %s", commit.TreeHash, treeHash)
		}
		if commit.ParentHash != parentHash {
			t.Errorf("ParentHash = %s; want %s", commit.ParentHash, parentHash)
		}
		if commit.Author != author {
			t.Errorf("Author = %s; want %s", commit.Author, author)
		}
		if commit.Message != message {
			t.Errorf("Message = %s; want %s", commit.Message, message)
		}
		if commit.AuthorDate.IsZero() {
			t.Error("AuthorDate should not be zero")
		}
	})
	t.Run("1.2: Test first commit", func(t *testing.T) {
		treeHash := "1234567890123456789012345678901234567890"
		author := "John Doe <john@example.com>"
		message := "First commit"

		commit, err := New(treeHash, "", author, message)
		if err != nil {
			t.Fatalf("Failed to create first commit: %v", err)
		}

		if commit.ParentHash != "" {
			t.Errorf("First commit should have empty ParentHash, got %s", commit.ParentHash)
		}
	})
	t.Run("1.3: Invalid tree hash", func(t *testing.T) {
		cases := []struct {
			hash string
			desc string
		}{
			{"123", "too short"},
			{"1234567890123456789012345678901234567890extra", "too long"},
			{"123456789012345678901234567890123456789g", "non-hex character"},
			{"", "empty"},
		}

		for _, tc := range cases {
			_, err := New(tc.hash, "", "John Doe <john@example.com>", "test")
			if err == nil {
				t.Errorf("Expected error for invalid tree hash (%s)", tc.desc)
			}
		}
	})
	t.Run("1.4: Invalid parent hash", func(t *testing.T) {
		validTree := "1234567890123456789012345678901234567890"
		cases := []struct {
			hash string
			desc string
		}{
			{"123", "too short"},
			{"1234567890123456789012345678901234567890extra", "too long"},
			{"123456789012345678901234567890123456789g", "non-hex character"},
		}

		for _, tc := range cases {
			_, err := New(validTree, tc.hash, "John Doe <john@example.com>", "test")
			if err == nil {
				t.Errorf("Expected error for invalid parent hash (%s)", tc.desc)
			}
		}
	})
	t.Run("1.5: Invalid author format", func(t *testing.T) {
		validTree := "1234567890123456789012345678901234567890"
		cases := []struct {
			author string
			desc   string
		}{
			{"John Doe", "missing email"},
			{"<john@example.com>", "missing name"},
			{"John Doe john@example.com", "missing brackets"},
			{"", "empty"},
		}

		for _, tc := range cases {
			_, err := New(validTree, "", tc.author, "test")
			if err == nil {
				t.Errorf("Expected error for invalid author format (%s)", tc.desc)
			}
		}
	})

	t.Run("1.6: Invalid message", func(t *testing.T) {
		validTree := "1234567890123456789012345678901234567890"
		validAuthor := "John Doe <john@example.com>"

		_, err := New(validTree, "", validAuthor, "")
		if err == nil {
			t.Error("Expected error for empty message")
		}
	})

	t.Run("1.7: Multi-line message", func(t *testing.T) {
		validTree := "1234567890123456789012345678901234567890"
		validAuthor := "John Doe <john@example.com>"
		message := "First line\nSecond line\nThird line"

		commit, err := New(validTree, "", validAuthor, message)
		if err != nil {
			t.Fatalf("Failed to create commit with multi-line message: %v", err)
		}

		if commit.Message != message {
			t.Errorf("Message not preserved exactly. Got %q, want %q", commit.Message, message)
		}
	})

}

func TestCommitStorage(t *testing.T) {
	t.Run("2.1: Write and read commit", func(t *testing.T) {
		cwd, err := os.Getwd()
		if err != nil {
			t.Fatalf("Failed to get current directory: %v", err)
		}

		testDir := filepath.Join(cwd, "testdata")
		os.RemoveAll(testDir)
		os.MkdirAll(filepath.Join(testDir, ".gitgo", "objects"), 0755)
		defer os.RemoveAll(testDir)

		treeHash := "1234567890123456789012345678901234567890"
		parentHash := "abcdef1234567890abcdef1234567890abcdef12"
		author := "John Doe <john@example.com>"
		message := "Test commit message"

		commit, err := New(treeHash, parentHash, author, message)
		if err != nil {
			t.Fatalf("Failed to create commit: %v", err)
		}

		hash, err := commit.Write(filepath.Join(testDir, ".gitgo", "objects"))
		if err != nil {
			t.Fatalf("Failed to write commit: %v", err)
		}

		readCommit, err := Read(filepath.Join(testDir, ".gitgo", "objects"), hash)
		if err != nil {
			t.Fatalf("Failed to read commit: %v", err)
		}

		if readCommit.TreeHash != commit.TreeHash {
			t.Errorf("Tree hash mismatch: got %s, want %s", readCommit.TreeHash, commit.TreeHash)
		}
		if readCommit.ParentHash != commit.ParentHash {
			t.Errorf("Parent hash mismatch: got %s, want %s", readCommit.ParentHash, commit.ParentHash)
		}
		if readCommit.Author != commit.Author {
			t.Errorf("Author mismatch: got %s, want %s", readCommit.Author, commit.Author)
		}
		if readCommit.Message != commit.Message {
			t.Errorf("Message mismatch: got %s, want %s", readCommit.Message, commit.Message)
		}

		// Only whole seconds are stored on disk, so the dates cannot be
		// compared directly. Instead we check that the written and read
		// timestamps differ by no more than one second in either direction.
		timeDiff := readCommit.AuthorDate.Sub(commit.AuthorDate)
		if timeDiff.Seconds() > 1 || timeDiff.Seconds() < -1 {
			t.Errorf("Dates differ by more than 1 second: got %v, want %v",
				readCommit.AuthorDate, commit.AuthorDate)
		}
	})

	t.Run("2.2: Multi-line commit message", func(t *testing.T) {
		cwd, _ := os.Getwd()
		testDir := filepath.Join(cwd, "testdata")
		os.RemoveAll(testDir)
		os.MkdirAll(filepath.Join(testDir, ".gitgo", "objects"), 0755)
		defer os.RemoveAll(testDir)

		message := "First line\nSecond line\nThird line"
		commit, _ := New(
			"1234567890123456789012345678901234567890",
			"",
			"John Doe <john@example.com>",
			message,
		)

		hash, err := commit.Write(filepath.Join(testDir, ".gitgo", "objects"))
		if err != nil {
			t.Fatalf("Failed to write commit: %v", err)
		}

		readCommit, err := Read(filepath.Join(testDir, ".gitgo", "objects"), hash)
		if err != nil {
			t.Fatalf("Failed to read commit: %v", err)
		}

		if readCommit.Message != message {
			t.Errorf("Multi-line message not preserved.\nGot:\n%s\nWant:\n%s",
				readCommit.Message, message)
		}
	})

	t.Run("2.3: First commit (no parent)", func(t *testing.T) {
		cwd, _ := os.Getwd()
		testDir := filepath.Join(cwd, "testdata")
		os.RemoveAll(testDir)
		os.MkdirAll(filepath.Join(testDir, ".gitgo", "objects"), 0755)
		defer os.RemoveAll(testDir)

		commit, _ := New(
			"1234567890123456789012345678901234567890",
			"",
			"John Doe <john@example.com>",
			"First commit",
		)

		hash, err := commit.Write(filepath.Join(testDir, ".gitgo", "objects"))
		if err != nil {
			t.Fatalf("Failed to write commit: %v", err)
		}

		readCommit, err := Read(filepath.Join(testDir, ".gitgo", "objects"), hash)
		if err != nil {
			t.Fatalf("Failed to read commit: %v", err)
		}

		if readCommit.ParentHash != "" {
			t.Errorf("Expected empty parent hash, got %s", readCommit.ParentHash)
		}
	})
}

Our tests cover several important scenarios:

  1. Basic Commit Creation (1.1 - 1.7):
    • Valid commit with tree hash, parent hash, author, and message
    • First commit with no parent (empty parent hash)
    • Input validation for tree hash, parent hash, author format, and message
    • Support for multi-line commit messages
  2. Commit Storage (2.1 - 2.3):
    • Writing a commit to disk and reading it back
    • Preserving multi-line messages through write/read cycles
    • Properly handling first commits with no parent

Implementation overview

Now that we have our tests, let’s implement the commit functionality. Here’s what our commit.go file will contain:


// internal/commit/commit.go
package commit
import (
	"bytes"
	"compress/zlib"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"regexp"
	"strconv"
	"time"
)

type Commit struct {
	TreeHash   string
	ParentHash string
	Author     string
	AuthorDate time.Time
	Message    string
}

func New(treeHash string, parentHash string, author string, message string) (*Commit, error) {}
func (c *Commit) Write(objectsPath string) (string, error) {}
func Read(objectsPath, hash string) (*Commit, error) {}

Our key components are:

  1. Commit struct: Holds all information about a commit
  2. New function: Creates and validates a new commit
  3. Write method: Serializes and stores a commit to disk
  4. Read function: Retrieves and parses a commit from disk

Let’s implement each of these functions one by one.

Implementing the New Function

The New function is responsible for creating a new Commit struct and validating its inputs. Let’s see how it looks:

func New(treeHash string, parentHash string, author string, message string) (*Commit, error) {
	if len(treeHash) != 40 {
		return nil, fmt.Errorf("expected treehash to have length 40, got %d", len(treeHash))
	}
	if !regexp.MustCompile(`^[0-9a-f]{40}$`).MatchString(treeHash) {
		return nil, fmt.Errorf("tree hash must contain only hex characters")
	}
	if parentHash != "" && len(parentHash) != 40 {
		return nil, fmt.Errorf("if present, parent hash must be 40 characters, got %d", len(parentHash))
	}
	if parentHash != "" && !regexp.MustCompile(`^[0-9a-f]{40}$`).MatchString(parentHash) {
		return nil, fmt.Errorf("parent hash must contain only hex characters")
	}
	if len(message) == 0 {
		return nil, fmt.Errorf("commit message cannot be empty")
	}
	authorRegex := regexp.MustCompile(`^([^<]+)\s+<([^>]+)>$`)
	if !authorRegex.MatchString(author) {
		return nil, fmt.Errorf("invalid author format, must be 'Name <email>'")
	}
	commit := Commit{
		TreeHash:   treeHash,
		ParentHash: parentHash,
		Author:     author,
		AuthorDate: time.Now(),
		Message:    message,
	}
	return &commit, nil
}

Let’s analyze this implementation:

  1. Tree Hash Validation:
    • Ensures the tree hash is exactly 40 characters (the length of a SHA-1 hash in hex)
    • Verifies it contains only valid hexadecimal characters
  2. Parent Hash Validation:
    • If present (not empty), ensures it’s also 40 characters
    • If present, verifies it contains only valid hexadecimal characters
    • Allows empty parent hash for the first commit in a repository
  3. Message Validation:
    • Ensures the commit message is not empty
    • Preserves any formatting, including multiple lines
  4. Author Format Validation:
    • Uses a regular expression to verify the author string follows the format “Name <email>”
    • This matches Git’s standard author format
  5. Commit Creation:
    • Creates a new Commit struct with the provided values
    • Sets the author date to the current time
    • Returns a pointer to the commit and nil error if validation passes
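
One possible refinement, shown here only as a sketch: New compiles its regular expressions on every call, and the same patterns could instead be compiled once at package level and wrapped in small helpers. The names hashRe, authorRe, validateHash, and validateAuthor below are hypothetical and not part of the code above:

```go
package main

import (
	"fmt"
	"regexp"
)

// Compiled once at package initialisation rather than on every call.
var (
	hashRe   = regexp.MustCompile(`^[0-9a-f]{40}$`)
	authorRe = regexp.MustCompile(`^([^<]+)\s+<([^>]+)>$`)
)

// validateHash reports whether h is a 40-character lowercase hex string.
// The name argument only makes the error message more specific.
func validateHash(name, h string) error {
	if !hashRe.MatchString(h) {
		return fmt.Errorf("%s must be 40 lowercase hex characters, got %q", name, h)
	}
	return nil
}

// validateAuthor checks the "Name <email>" shape that New requires.
func validateAuthor(a string) error {
	if !authorRe.MatchString(a) {
		return fmt.Errorf("invalid author format %q, must be 'Name <email>'", a)
	}
	return nil
}

func main() {
	fmt.Println(validateHash("tree hash", "123"))
	fmt.Println(validateAuthor("John Doe <john@example.com>"))
}
```

Because the pattern already requires exactly 40 characters, a single match covers both the length check and the hex check; the separate length test in New mainly buys a more specific error message.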

Implementing the Write Method

The Write method serializes a commit to Git’s object format and stores it in the objects directory. Here’s the implementation:

func (c *Commit) Write(objectsPath string) (string, error) {
	timestamp := c.AuthorDate.Unix()
	timezone := c.AuthorDate.Format("-0700")

	content := fmt.Sprintf("tree %s\n", c.TreeHash)
	if c.ParentHash != "" {
		content += fmt.Sprintf("parent %s\n", c.ParentHash)
	}
	content += fmt.Sprintf("author %s %d %s\n\n%s",
		c.Author,
		timestamp,
		timezone,
		c.Message)
	data := fmt.Sprintf("commit %d\x00%s", len(content), content)
	var compressed bytes.Buffer
	zw := zlib.NewWriter(&compressed)
	if _, err := zw.Write([]byte(data)); err != nil {
		return "", fmt.Errorf("failed to compress data: %v", err)
	}
	zw.Close()

	hash := sha1.Sum(compressed.Bytes())
	hashStr := hex.EncodeToString(hash[:])
	hashPath := filepath.Join(objectsPath, hashStr[:2], hashStr[2:])
	if err := os.MkdirAll(filepath.Dir(hashPath), 0755); err != nil {
		return "", fmt.Errorf("failed to create object directory: %v", err)
	}

	if err := os.WriteFile(hashPath, compressed.Bytes(), 0644); err != nil {
		return "", fmt.Errorf("failed to write object file: %v", err)
	}

	return hashStr, nil
}

Let’s break down this implementation:

  1. Format the Timestamp and Timezone:
timestamp := c.AuthorDate.Unix()
timezone := c.AuthorDate.Format("-0700")
    • Converts the author date to Unix timestamp (seconds since epoch)
    • Formats the timezone as a string like “-0700” (representing UTC offset)
  2. Build the Commit Content:
content := fmt.Sprintf("tree %s\n", c.TreeHash)
if c.ParentHash != "" {
    content += fmt.Sprintf("parent %s\n", c.ParentHash)
}
content += fmt.Sprintf("author %s %d %s\n\n%s",
    c.Author,
    timestamp,
    timezone,
    c.Message)
    • Creates the core commit content in Git’s format
    • Includes the tree hash reference
    • Adds parent hash only if one exists (skipped for first commits)
    • Adds author information with timestamp and timezone
    • Includes the commit message after a blank line
  3. Create the Full Object Data:
data := fmt.Sprintf("commit %d\x00%s", len(content), content)
    • Prefixes the content with object type (“commit”), size, and null byte
    • This follows Git’s standard object format
  4. Compress the Data:
var compressed bytes.Buffer
zw := zlib.NewWriter(&compressed)
if _, err := zw.Write([]byte(data)); err != nil {
    return "", fmt.Errorf("failed to compress data: %v", err)
}
zw.Close()
    • Compresses the object data using zlib
    • Stores the compressed content in a buffer
  5. Calculate the Hash and Path:
hash := sha1.Sum(compressed.Bytes())
hashStr := hex.EncodeToString(hash[:])
hashPath := filepath.Join(objectsPath, hashStr[:2], hashStr[2:])
    • Calculates the SHA-1 hash of the compressed data
    • Converts the hash to a hex string
    • Constructs the file path using the first two characters as directory name
  6. Store the Commit:
if err := os.MkdirAll(filepath.Dir(hashPath), 0755); err != nil {
    return "", fmt.Errorf("failed to create object directory: %v", err)
}
if err := os.WriteFile(hashPath, compressed.Bytes(), 0644); err != nil {
    return "", fmt.Errorf("failed to write object file: %v", err)
}
    • Creates the directory if it doesn’t exist
    • Writes the compressed data to the file
    • Returns the commit hash as identifier

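This implementation follows the layout of Git’s commit objects. Two differences from real Git remain: Git computes the object ID from the uncompressed header and content rather than from the compressed bytes, and it expects a committer line after the author line. As written, our objects are therefore not interchangeable with those in a real Git repository.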
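
For comparison, real Git derives an object’s ID from the uncompressed bytes (header plus content) rather than from the compressed data. Here is a minimal sketch of that calculation, using a hypothetical objectID helper that is not part of our commit package:

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// objectID is a hypothetical helper. It returns the ID real Git assigns to
// an object: the SHA-1 of the uncompressed bytes "<kind> <size>\0<content>".
func objectID(kind string, content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s %d\x00", kind, len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// An empty blob has the same well-known ID in every Git repository.
	fmt.Println(objectID("blob", nil)) // e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
}
```

Hashing before compression means the ID depends only on the object’s content, not on the zlib compression level or implementation used to store it.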

Implementing the Read Function

Finally, let’s implement the Read function to retrieve and parse a commit from the object database:

func Read(objectsPath, hash string) (*Commit, error) {
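	// Guard against hashes too short to split into directory and file name.
	if len(hash) < 3 {
		return nil, fmt.Errorf("invalid object hash %q", hash)
	}
	hashPath := filepath.Join(objectsPath, hash[:2], hash[2:])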
	compressed, err := os.ReadFile(hashPath)
	if err != nil {
		return nil, fmt.Errorf("failed to read object file: %v", err)
	}

	zr, err := zlib.NewReader(bytes.NewReader(compressed))
	if err != nil {
		return nil, fmt.Errorf("failed to create zlib reader: %v", err)
	}
	defer zr.Close()

	var buffer bytes.Buffer
	if _, err := io.Copy(&buffer, zr); err != nil {
		return nil, fmt.Errorf("failed to decompress data: %v", err)
	}

	data := buffer.Bytes()
	parts := bytes.SplitN(data, []byte{0}, 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("invalid format")
	}

	header := bytes.Fields(parts[0])
	if len(header) != 2 || string(header[0]) != "commit" {
		return nil, fmt.Errorf("not a commit object")
	}

	content := parts[1]
	lines := bytes.Split(content, []byte{'\n'})

	var treeHash, parentHash, author string
	var authorTime time.Time
	var message string

	messageStart := 0
	for i, line := range lines {
		if len(line) == 0 {
			messageStart = i + 1
			break
		}

		fields := bytes.Fields(line)
		if len(fields) < 2 {
			return nil, fmt.Errorf("invalid line format")
		}

		switch string(fields[0]) {
		case "tree":
			treeHash = string(fields[1])
		case "parent":
			parentHash = string(fields[1])
		case "author":
			if len(fields) < 4 {
				return nil, fmt.Errorf("invalid author line")
			}
			authorEnd := len(fields) - 2
			author = string(bytes.Join(fields[1:authorEnd], []byte(" ")))

			timestamp, err := strconv.ParseInt(string(fields[authorEnd]), 10, 64)
			if err != nil {
				return nil, fmt.Errorf("invalid timestamp: %v", err)
			}

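			timezone := string(fields[authorEnd+1])
			// A valid offset is a sign plus four digits, e.g. "-0700";
			// checking the length avoids a slice-bounds panic below.
			if len(timezone) != 5 {
				return nil, fmt.Errorf("invalid timezone %q", timezone)
			}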
			tzHours, err := strconv.Atoi(timezone[1:3])
			if err != nil {
				return nil, fmt.Errorf("invalid timezone hours: %v", err)
			}
			tzMinutes, err := strconv.Atoi(timezone[3:])
			if err != nil {
				return nil, fmt.Errorf("invalid timezone minutes: %v", err)
			}
			tzOffset := (tzHours*60 + tzMinutes) * 60
			if timezone[0] == '-' {
				tzOffset = -tzOffset
			}
			authorTime = time.Unix(timestamp, 0).In(time.FixedZone("", tzOffset))
		}
	}

	message = string(bytes.Join(lines[messageStart:], []byte{'\n'}))

	commit := &Commit{
		TreeHash:   treeHash,
		ParentHash: parentHash,
		Author:     author,
		AuthorDate: authorTime,
		Message:    message,
	}

	return commit, nil
}

Let’s break down this more complex function:

  1. Read and Decompress the Object:
hashPath := filepath.Join(objectsPath, hash[:2], hash[2:])
compressed, err := os.ReadFile(hashPath)
zr, err := zlib.NewReader(bytes.NewReader(compressed))
var buffer bytes.Buffer
io.Copy(&buffer, zr)
    • Locates the commit object using its hash
    • Reads the compressed data from disk
    • Decompresses it using zlib
  2. Parse the Header:
data := buffer.Bytes()
parts := bytes.SplitN(data, []byte{0}, 2)
header := bytes.Fields(parts[0])
if len(header) != 2 || string(header[0]) != "commit" {
    return nil, fmt.Errorf("not a commit object")
}
    • Splits the data at the null byte to separate header from content
    • Verifies it’s a commit object by checking the header type
  3. Parse the Content:
content := parts[1]
lines := bytes.Split(content, []byte{'\n'})
    • Splits the content into lines for easier parsing
    • Prepares to extract commit information
  4. Extract Commit Information:
messageStart := 0
for i, line := range lines {
    if len(line) == 0 {
        messageStart = i + 1
        break
    }

    fields := bytes.Fields(line)
    switch string(fields[0]) {
    case "tree":
        treeHash = string(fields[1])
    case "parent":
        parentHash = string(fields[1])
    case "author":
        // Extract author, timestamp, and timezone ...
    }
}
    • Iterates through each line before the blank line
    • Extracts tree hash, parent hash, and author information
    • Determines where the commit message starts
  5. Parse Author Information:
authorEnd := len(fields) - 2
author = string(bytes.Join(fields[1:authorEnd], []byte(" ")))

timestamp, err := strconv.ParseInt(string(fields[authorEnd]), 10, 64)

timezone := string(fields[authorEnd+1])
// Parse timezone offset and create time with zone
    • Extracts the author name and email
    • Parses the Unix timestamp
    • Handles the timezone information to recreate the precise time
  6. Extract the Message:
message = string(bytes.Join(lines[messageStart:], []byte{'\n'}))
    • Joins all lines after the blank line as the commit message
    • Preserves original formatting and line breaks
  7. Create and Return the Commit:
commit := &Commit{
    TreeHash:   treeHash,
    ParentHash: parentHash,
    Author:     author,
    AuthorDate: authorTime,
    Message:    message,
}
return commit, nil
    • Constructs a new Commit struct with all parsed information
    • Returns it to the caller

This implementation handles the parts of Git’s commit format that we write: multi-line messages, timezone information, and an optional parent reference. Merge commits with several parent lines are not supported, since the struct keeps a single ParentHash.
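
The timezone handling inside Read could also be pulled out into its own function, which makes it easier to validate and test. The following is a sketch only; parseTZ is a hypothetical helper and is stricter than the inline code, since it rejects anything that is not a sign followed by exactly four digits:

```go
package main

import "fmt"

// parseTZ is a hypothetical helper. It converts a Git-style offset such as
// "-0700" into seconds east of UTC.
func parseTZ(tz string) (int, error) {
	if len(tz) != 5 || (tz[0] != '+' && tz[0] != '-') {
		return 0, fmt.Errorf("invalid timezone %q", tz)
	}
	for i := 1; i < 5; i++ {
		if tz[i] < '0' || tz[i] > '9' {
			return 0, fmt.Errorf("invalid timezone %q", tz)
		}
	}
	hours := int(tz[1]-'0')*10 + int(tz[2]-'0')
	minutes := int(tz[3]-'0')*10 + int(tz[4]-'0')
	offset := (hours*60 + minutes) * 60
	if tz[0] == '-' {
		offset = -offset
	}
	return offset, nil
}

func main() {
	offset, err := parseTZ("-0700")
	fmt.Println(offset, err) // -25200 <nil>
}
```

The returned offset can be passed straight to time.FixedZone, just as tzOffset is in Read.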

Testing Our Implementation

Now that we’ve implemented all components of our commit functionality, let’s test it to ensure everything works as expected:

go test ./internal/commit

If all tests pass, congratulations! You’ve successfully implemented Git’s commit functionality, which is a cornerstone of any version control system.

Summary

In this chapter, we’ve implemented Git’s commit functionality, which enables us to create snapshot records of our repository. The key achievements include:

  1. Commit Creation and Validation: We’ve built a robust system to create commits with proper validation of tree hash, parent hash, author format, and message content.
  2. Object Serialization: We’ve implemented the serialization and deserialization of commit objects according to Git’s binary format, including proper content formatting and zlib compression.
  3. Metadata Tracking: Our commit implementation stores essential metadata including the author, timestamp, and commit message.
  4. History Tracking: By linking commits through parent references, we’ve created the foundation for Git’s historical tracking capabilities.
  5. Time Representation: We’ve handled timestamp formatting according to Git’s standards, storing time in Unix seconds with timezone information.

These components work together to create a system that can track changes to files, store snapshots of the entire repository, and maintain a chronological history of those changes.

What’s Next?

In the next chapter, we’ll implement Git’s references system, which will allow us to work with branches and track the HEAD of our repository. References are Git’s way of creating user-friendly pointers to specific commits.

We’ll implement:

  1. Reference Types: Both symbolic references (pointing to other references) and direct references (pointing to commits)
  2. HEAD Management: Tracking the current state of the repository with the HEAD file
  3. Branch Operations: Creating, reading, and deleting branches
  4. Branch Listing: Displaying all branches in the repository

Our refs implementation will build on top of the commit system we’ve just created, providing a convenient way to navigate and manipulate the commit history. With references, we’ll complete the core functionality needed for a basic Git implementation, enabling users to branch their development and track different lines of work.

The references system is the final piece that makes Git’s history truly usable, providing named access points to the commit graph.