The Insides Of Git by Konstantin Nazarov
Page 1: The internals of git

The Insides Of Git by Konstantin Nazarov

Page 2: The internals of git

Why?

Page 3: The internals of git

Because It’s Simpler Than You Think

Page 4: The internals of git

It’s Like A Filesystem

In many ways you can just see git as a filesystem — it is content-addressable, and it has a notion of versioning.

Page 5: The internals of git

Which Boils Down To A Simple DAG (Directed Acyclic Graph)

Page 6: The internals of git

Let’s Start With Building Blocks

Page 7: The internals of git

• Blobs
• Trees
• Commits

Page 8: The internals of git

Blobs (Files)

This is where the data from files goes.

Page 9: The internals of git

But instead of names, git puts the emphasis on the data itself, pointing to it by its SHA1 hash.

Page 10: The internals of git

When you put a file in git, it compresses the data, puts it in objects/, and names it after the SHA1 hash of the uncompressed data (prefixed with a short "blob <size>" header).
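The naming scheme can be sketched in a few lines of Python. This is a rough illustration, assuming the loose-object layout shown later in this deck: a `blob <size>\0` header is hashed together with the file contents, and the first two hex digits of the hash become a directory name under objects/. The function names `blob_id`, `loose_path` and `compressed_blob` are made up for this example and are not part of git.

```python
import hashlib
import zlib


def blob_id(data: bytes) -> str:
    """Return the SHA1 id a blob with this content would get."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def loose_path(object_id: str) -> str:
    """Path of a loose object, relative to the .git directory."""
    return "objects/%s/%s" % (object_id[:2], object_id[2:])


def compressed_blob(data: bytes) -> bytes:
    """Header plus data, zlib-compressed, as it would sit in the object file."""
    return zlib.compress(b"blob %d\x00" % len(data) + data)


oid = blob_id(b"test\n")
print(oid)
print(loose_path(oid))
```

Note that the name depends only on the content, never on the file name, which is why the file name has to be stored somewhere else.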

Page 11: The internals of git

• Blobs
• Trees
• Commits

Page 12: The internals of git

Trees are used to name things. They are objects too.

Page 13: The internals of git

Trees can point to other trees, which makes them look like a directory structure.

Page 14: The internals of git

Two trees can reference the same data; it's not stored twice.
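Why the data is not stored twice follows directly from naming objects by their hash. Here is a toy in-memory sketch of that idea; `ObjectStore` is a hypothetical class invented for this example (not git's API), and the two "trees" are modelled as plain dicts rather than real tree objects.

```python
import hashlib


class ObjectStore:
    """Toy content-addressable store: the key is derived from the content."""

    def __init__(self):
        self.objects = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
        # Same content gives the same key, so this never creates a second copy.
        self.objects[key] = data
        return key


store = ObjectStore()

# Two "trees" that give different names to identical content.
tree_a = {"README": store.put(b"hello\n")}
tree_b = {"docs/README": store.put(b"hello\n")}

print(tree_a["README"] == tree_b["docs/README"])  # both names point at one id
print(len(store.objects))                         # number of stored objects
```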

Page 15: The internals of git

• Blobs
• Trees
• Commits

Page 16: The internals of git

Commits represent a “snapshot” of the top-level tree

Like trees, they are objects too

Page 17: The internals of git

Commits form history by referring to other commits
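As a rough illustration of how history falls out of parent references, the sketch below walks a toy commit graph. The commit ids are shortened placeholders rather than real SHA1 values, and `history` is a hypothetical helper written for this example.

```python
# Toy commit graph: each commit id maps to the ids of its parents.
commits = {
    "c3": ["c2"],
    "c2": ["c1"],
    "c1": [],  # the root commit has no parent
}


def history(commits, start):
    """Return every commit reachable from `start`, in depth-first order."""
    seen, order, stack = set(), [], [start]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue  # a commit can be reachable through several paths
        seen.add(cid)
        order.append(cid)
        stack.extend(commits[cid])
    return order


print(history(commits, "c3"))
```

A merge commit would simply list more than one parent; the `seen` set keeps shared ancestors from being visited twice.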

Page 18: The internals of git

• Blobs
• Trees
• Commits

Page 19: The internals of git

That’s it!

Page 20: The internals of git

That’s it! Git is just a bunch of zlib-compressed text files.

Page 21: The internals of git

I’ll show you how it looks inside

Page 22: The internals of git

The following section is in shell, so you may try it yourself. Just remember to use your actual SHA1 values, not mine.

Page 23: The internals of git

$ mkdir gittest
$ cd gittest

# initialize the empty git repository
$ git init

Page 24: The internals of git

Reading/Writing Blobs

Page 27: The internals of git

# write a simple file to the object database
$ echo 'homer' | git hash-object -w --stdin
4aa0bfa07f1680c50a1567ecc37bc3b6aa567b8f

# -w means actually write the data, not just hash it

$ find .git/objects -type f
.git/objects/4a/a0bfa07f1680c50a1567ecc37bc3b6aa567b8f

$ git cat-file -p 4aa0b
homer

$ git cat-file -t 4aa0b
blob

Page 28: The internals of git

As I told you, blobs are just compressed data. Let’s check that.

Page 29: The internals of git

$ python3
>>> import zlib
>>> # object files are binary, so open them in 'rb' mode
>>> f = open('.git/objects/4a/a0bfa07f1680c50a1567ecc37bc3b6aa567b8f', 'rb')
>>> print(zlib.decompress(f.read()))
b'blob 6\x00homer\n'

Page 30: The internals of git

Reading/Writing Trees

Page 35: The internals of git

# Create directory structure
$ mkdir foo
$ echo "test" > foo/bar
$ echo "test2" > baz

# This is just a way to create the tree
$ git update-index --add foo/bar baz
$ git write-tree
af6c7364afaa4488d8c6edd44306b91b20dcba93

# This is what a plain tree file looks like
$ git cat-file -p af6c7
100644 blob 180cf8328022becee9aaa2577a8f84ea2b9f3827 baz
100644 blob 4200aa606ead5dd5777a0b391f085cc4f4690d04 bigfile.dat
040000 tree 701ce0a12c61f997c092d30121a256d17144766a foo

# And the child tree
$ git cat-file -p 701ce0
100644 blob 9daeafb9864cf43055ae93beb0afd6c7d144bfa4 bar

# And the data file
$ git cat-file -p 9daea
test

Page 36: The internals of git

In general, the plain tree structure is like this:

# format:
tree [content size]\0
[mode] [file/folder name]\0[20-byte binary SHA-1 of referenced blob or tree]
...
[mode] [file/folder name]\0[20-byte binary SHA-1 of referenced blob or tree]
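The layout above can be exercised with a small round-trip sketch in Python. It assumes the format exactly as described on this slide, with entries concatenated back to back and each SHA-1 stored as 20 raw bytes; `build_tree` and `parse_tree` are hypothetical helper names, not git commands.

```python
def build_tree(entries) -> bytes:
    """Serialize (mode, name, sha1_hex) tuples into an uncompressed tree object."""
    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\x00" + bytes.fromhex(sha)
        for mode, name, sha in entries
    )
    return b"tree %d\x00" % len(body) + body


def parse_tree(raw: bytes):
    """Parse an uncompressed tree object back into (mode, name, sha1_hex) tuples."""
    header, _, body = raw.partition(b"\x00")
    kind, size = header.split()
    if kind != b"tree" or int(size) != len(body):
        raise ValueError("not a well-formed tree object")
    entries = []
    while body:
        mode_name, _, rest = body.partition(b"\x00")
        mode, _, name = mode_name.partition(b" ")
        # The 20 bytes after the NUL are the raw SHA-1 of the entry.
        entries.append((mode.decode(), name.decode(), rest[:20].hex()))
        body = rest[20:]
    return entries


entries = [("100644", "bar", "9daeafb9864cf43055ae93beb0afd6c7d144bfa4")]
raw = build_tree(entries)
print(parse_tree(raw))
```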

Page 37: The internals of git

Let’s try the same trick with Python.

Since some of the data is binary, I’ve done a bit of pretty-printing.

Page 38: The internals of git

$ python3
>>> import zlib
>>> f = open('.git/objects/46/c826e9c8119915961f6acb01f6f842fb1e444a', 'rb')
>>> d = zlib.decompress(f.read())
>>> # split the "tree <size>" header from the entries at the first NUL byte
>>> head, _, tail = d.partition(b'\x00')
>>> print(head.decode())
>>> while tail:
...     pos = tail.find(b'\x00')
...     # "<mode> <name>", then the 20 raw SHA-1 bytes printed as hex
...     print(tail[:pos].decode() + " " + tail[pos+1:pos+21].hex())
...     tail = tail[pos+21:]
...

Result:
tree 100
100644 baz df6b0d2bcc76e6ec0fca20c227104a4f28bac41b
100644 bigfile.dat 4200aa606ead5dd5777a0b391f085cc4f4690d04
40000 foo 701ce0a12c61f997c092d30121a256d17144766a

Page 39: The internals of git

Reading/Writing Commits

Page 42: The internals of git

# Get the last tree we've created
$ git write-tree
46c826e9c8119915961f6acb01f6f842fb1e444a

# actually do the commit
$ echo '1st commit' | git commit-tree 46c82
afa322a9790619a18ec6e751469008551b3a5c77

# and read back the raw commit file
$ git cat-file -p afa32
tree 46c826e9c8119915961f6acb01f6f842fb1e444a
author Konstantin Nazarov <[email protected]> 1421934034 +0300
committer Konstantin Nazarov <[email protected]> 1421934034 +0300

1st commit

Page 43: The internals of git

Let’s try a commit hierarchy

Page 46: The internals of git

# change the tree
$ echo "test4" > baz
$ git update-index --add baz
$ git write-tree
fb74bbb3f99afed23612d2f03e5cd80775bd2f8a

# commit it (also specify a parent)
$ echo '2nd commit' | git commit-tree fb74b -p afa32
224dde75daa6879629304840aa1fd3a76187aaba

# see how it's changed
$ git cat-file -p 224dd
tree fb74bbb3f99afed23612d2f03e5cd80775bd2f8a
parent afa322a9790619a18ec6e751469008551b3a5c77
author Konstantin Nazarov <[email protected]> 1421934840 +0300
committer Konstantin Nazarov <[email protected]> 1421934840 +0300

2nd commit

Page 47: The internals of git

Now let’s dump the commit with Python, just to prove there is no magic.

Page 48: The internals of git

$ python3
>>> import zlib
>>> f = open('.git/objects/af/a322a9790619a18ec6e751469008551b3a5c77', 'rb')
>>> d = zlib.decompress(f.read())
>>> # the NUL byte separates the "commit <size>" header from the text
>>> print(d.replace(b'\x00', b'\n').decode())

Result:
commit 197
tree 46c826e9c8119915961f6acb01f6f842fb1e444a
author Konstantin Nazarov <[email protected]> 1421934034 +0300
committer Konstantin Nazarov <[email protected]> 1421934034 +0300

1st commit
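Going the other way, a commit object with the same shape can be assembled by hand. This is only a sketch based on the dump above: the author string and timestamp below are placeholders, `commit_object` is a hypothetical helper, and real commits can carry extra headers that this example ignores.

```python
import hashlib


def commit_object(tree, message, author, timestamp, tz, parents=()):
    """Build the uncompressed bytes of a commit object."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines.append("author %s %d %s" % (author, timestamp, tz))
    lines.append("committer %s %d %s" % (author, timestamp, tz))
    # A blank line separates the headers from the commit message.
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    return b"commit %d\x00" % len(body) + body


raw = commit_object(
    tree="46c826e9c8119915961f6acb01f6f842fb1e444a",
    message="1st commit",
    author="A U Thor <author@example.com>",  # placeholder identity
    timestamp=1421934034,
    tz="+0300",
)
print(raw.replace(b"\x00", b"\n").decode())
print(hashlib.sha1(raw).hexdigest())  # the id this commit would be stored under
```

Because the parent ids are part of the hashed text, changing any ancestor changes the id of every commit built on top of it.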

Page 49: The internals of git

As you see, no magic. Just plain text files, hashed with SHA1 and formed into a graph.

Page 50: The internals of git

So, what are branches then?

Page 51: The internals of git

Just references to the top commit!

$ cat .git/refs/heads/master
6566bfcd3a111ea6a1cf594301c39c7c4b1baf3c

$ git cat-file -t 6566bf
commit
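Reading a branch is therefore just reading a small file. The sketch below builds a fake .git layout in a temporary directory and resolves HEAD through it. It assumes loose ref files only (packed refs are ignored), and `resolve_ref` is a hypothetical helper, not a git command.

```python
import os
import tempfile


def resolve_ref(git_dir: str, ref: str = "HEAD") -> str:
    """Follow 'ref: ...' indirections until a commit id is found."""
    with open(os.path.join(git_dir, ref)) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        # Symbolic ref: HEAD usually just names the current branch.
        return resolve_ref(git_dir, value[len("ref: "):])
    return value


# Build a minimal fake .git directory to try it out.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
with open(os.path.join(git_dir, "refs", "heads", "master"), "w") as f:
    f.write("6566bfcd3a111ea6a1cf594301c39c7c4b1baf3c\n")
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/master\n")

print(resolve_ref(git_dir))
```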

Page 52: The internals of git

Questions?