How the GIT Was Tempered
P.S.V.R and related enlightenments
“I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'.”
—TORVALDS
The internals of git…
“ In many ways you can just see git as a filesystem — it is content-addressable, and it has a notion of versioning, but I really really designed it coming at the problem from the viewpoint of a filesystem person (hey, kernels is what I do), and I actually have absolutely zero interest in creating a traditional SCM system.
”—Torvalds
git is an append-only object database
When you make a commit into git, new objects appear under .git/objects:
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/81/fa49b077972391ad58037050f2a75f74e3671e
.git/objects/82/0155eb4229851634a0f03eb265b69f5a2d56f3
What each object is:
1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a: blob, with the file content inside
83/baae61804e65cc73a7201a7252750c76066a30: tree, pointing to the blob
81/fa49b077972391ad58037050f2a75f74e3671e: yet another tree, pointing to the above tree
82/0155eb4229851634a0f03eb265b69f5a2d56f3: commit, referring to the root tree
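The object names above are content hashes. A minimal sketch of how a blob gets its name and its path, assuming the default SHA-1 object format (the function names here are illustrative, not part of git):

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Hash a blob the way git does: SHA-1 over 'blob <size>\\0' + content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters name the directory, the remaining 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_blob_id(b"hello\n")
print(oid)                     # ce013625030ba8dba906f756967f9e9ca394464a
print(loose_object_path(oid))  # .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
```

Because the name depends only on the content, identical files in different commits share one blob, which is what makes the database append-only in practice.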
packs:
storing blobs as their changes relative to other blobs
Ex. git log:
    commit a7cf7abc07c3bdfab20777a53beb87a78f321e71
    Author: P.S.V.R <[email protected]>
    Date:   Wed May 28 20:03:35 2014 +0800

        finish stat graphs on index of new ctrl
Ex. commit file
git cat-file -p a7cf7abc07c3bdfab20777a53beb87a78f321e71:
    tree b6c9d5d2350dc2a2688f77c44e1111079ae3ddde
    parent c8b1333b2f0178b83da1105bf4f007042bac9559
    author P.S.V.R <[email protected]> 1401278615 +0800
    committer P.S.V.R <[email protected]> 1401278615 +0800

    finish stat graphs on index of new ctrl
Ex. tree file
git cat-file -p b6c9d5d2350dc2a2688f77c44e1111079ae3ddde:
    100644 blob d3753df93bef131b986e048d80aa256ebe057a9a    .gitignore
    100644 blob 37aaa62f0ab02262ef72d8655bcb08cd398d5250    Capfile
    100644 blob 7e8283ab544615aec4cbf7016373835082b9bdd0    Gemfile
    100644 blob 32ae7784034f9c699ba39db38429a7406784b3c4    Gemfile.lock
    100644 blob 95fe48313ca4c379ae626e2a2c7de1048e8852ad    README.rdoc
    100644 blob ba6b733dd2358d858f00445ebd91c214f0f5d2e5    Rakefile
    040000 tree 9ad60f8df1b3a4157eef0eee595f126f85aa60ec    app
    040000 tree a88c0ffcff9a34c0305dad029b1ecdda911febd0    bin
    100644 blob 5bc2a619e83ea182b17e2507c5e0f2f07f7cf18c    config.ru
    040000 tree 83dddc313f6dd604b7ac8563fa8a98a0a431a240    config
    040000 tree 02f4fb8f1ade3daa988358fbb38c2482c01d9de3    db
    040000 tree 15b47d00d07eaca133cdee405b93f19c5cea445d    doc
    040000 tree e6e911ceea9f405f20d350575d3df81688a7d9ed    lib
    040000 tree 29a422c19251aeaeb907175e9b3219a9bed6c616    log
    100644 blob 0b909558925d0d53054ca5860a6d4f0a6dd2caaf    routes.txt
    100644 blob 26967d52c9dd1b2a194be97789ba59927c5c26b6    rubocop.txt
    040000 tree c6483a352b8dc06a3f13de50f4bf4d0cdb64c1e7    test
    040000 tree 8084e8e04d510cc28321f30a9646477cc50c235c    vendor
Ex. plain file
git cat-file -p d3753df93bef131b986e048d80aa256ebe057a9a:
    # See https://help.github.com/articles/ignoring-files for more about ignoring files.
    #
    # If you find yourself ignoring temporary files generated by your text editor
    # or operating system, you probably want to add a global ignore instead:
    #   git config --global core.excludesfile '~/.gitignore_global'

    # Ignore bundler config.
    /.bundle
    …
Demo time
cd .git
ls objects
find objects stored at [sha][0..1]/[sha][2..-1] (first two hex characters are the directory, the rest is the file name)
git cat-file -p [sha]
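For readers without a repository at hand, here is a rough sketch of what the demo shows for a loose blob: the file under objects/ is a zlib-compressed "<type> <size>\0<content>" record. The helper names are hypothetical, the sketch writes into a temporary directory rather than a real .git, and it does not handle packed objects.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(git_dir: str, kind: str, content: bytes) -> str:
    """Store content as a loose object and return its SHA-1 name."""
    record = kind.encode() + b" %d\x00" % len(content) + content
    oid = hashlib.sha1(record).hexdigest()
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(zlib.compress(record))
    return oid


def read_loose_object(git_dir: str, oid: str):
    """Inflate a loose object and split it into (type, content)."""
    path = os.path.join(git_dir, "objects", oid[:2], oid[2:])
    with open(path, "rb") as handle:
        record = zlib.decompress(handle.read())
    header, _, content = record.partition(b"\x00")
    kind, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return kind.decode(), content


with tempfile.TemporaryDirectory() as fake_git_dir:
    blob_id = write_loose_object(fake_git_dir, "blob", b"hello\n")
    kind, content = read_loose_object(fake_git_dir, blob_id)
    print(blob_id, kind, content)
```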
Enlightenments
X-Men: Days of Future Past
Can we do that in information systems?
The git model might be extended to external systems. These external systems could submit transactions, revert transactions, and easily check out versions to go back in time…
An analogy
General ledger accounts  <->  Git repositories
Account balance          <->  Current working directory
Receipts                 <->  Commits
An analogy
Information systems      <->  Git repositories
Read                     <->  Current working directory
Create, Update, Delete   <->  Commits
Traditional sandbox-involved workflows
I have some Excel data that I want to import.
I created a DB sandbox and copied the dump files.
Dump DB, restore DB -> import & review -> restore DB, re-import -> review & fix errors -> restore DB, re-import -> review & fix errors -> restore DB, re-import -> ...
Time is wasted on DB dumps and DB restores ☹
reversible actions
There are scenarios where a user may want to persist potential future changes but does not want to trigger the next activity in the process or to be subject to validation.
Transactions that were not accepted, or were never committed, could later be deleted if desired, since they were never an official part of the record.
How to implement this?
Information system, based on: relational database
One (perhaps naive) way of implementation
Information system, based on: relational database + git repository
Git, the old way
.git + working directory
RDaaWD, the next big thing ;P
.git + relational database as a working directory
How to RDaaWD: insert
Insert record #1 into table people
4 objects will be created in .git:
.git/objects/1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a
.git/objects/83/baae61804e65cc73a7201a7252750c76066a30
.git/objects/81/fa49b077972391ad58037050f2a75f74e3671e
.git/objects/82/0155eb4229851634a0f03eb265b69f5a2d56f3
What each object is:
1f/7a7a472abf3dd9643fd615f6da379c4acb3e3a: blob, serialized version of record #1
83/baae61804e65cc73a7201a7252750c76066a30: tree, pointing to the ID of record #1
81/fa49b077972391ad58037050f2a75f74e3671e: tree, pointing to another tree named “people”
82/0155eb4229851634a0f03eb265b69f5a2d56f3: commit, referring to the change
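The four objects can be sketched with a toy content-addressed store. This is an illustration under loud assumptions: the store is an in-memory dict, records are serialized as JSON, and `put` and `insert_record` are hypothetical names; real git uses its own binary object format.

```python
import hashlib
import json


def put(store, kind, payload):
    """Save payload under the SHA-1 of its serialized form; return that id."""
    body = json.dumps([kind, payload], sort_keys=True).encode()
    object_id = hashlib.sha1(body).hexdigest()
    store[object_id] = (kind, payload)
    return object_id


def insert_record(store, table, record_id, record, message):
    """Write blob -> table tree -> root tree -> commit for one new record."""
    blob = put(store, "blob", record)                         # serialized record
    table_tree = put(store, "tree", {str(record_id): blob})   # record id -> blob
    root_tree = put(store, "tree", {table: table_tree})       # table name -> tree
    return put(store, "commit",
               {"tree": root_tree, "parent": None, "message": message})


store = {}
head = insert_record(store, "people", 1, {"name": "Alice"}, "insert record #1")
print(len(store))                       # 4
print(sorted(k for k, _ in store.values()))  # ['blob', 'commit', 'tree', 'tree']
```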
How to RDaaWD: update
Update record ID=1 with new values
1. Re-serialize record #1, write a new blob object for it
2. Write a new tree object pointing to the new blob
3. Write a new tree object pointing to the new tree with the table name
4. Write a new commit object
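A hedged sketch of the four update steps, using a toy in-memory store with JSON-serialized records (`put`, `write_record` and `rows_at` are hypothetical helpers, not git APIs). It also shows that records untouched by the update keep pointing at their old blobs.

```python
import hashlib
import json


def put(store, kind, payload):
    """Save payload under the SHA-1 of its serialized form; return that id."""
    body = json.dumps([kind, payload], sort_keys=True).encode()
    object_id = hashlib.sha1(body).hexdigest()
    store[object_id] = (kind, payload)
    return object_id


def rows_at(store, commit_id, table):
    """Return the {record id: blob id} mapping of a table at a commit."""
    root = store[store[commit_id][1]["tree"]][1]
    return store[root[table]][1]


def write_record(store, parent, table, record_id, record, message):
    """Insert or update one record on top of the parent commit."""
    root = dict(store[store[parent][1]["tree"]][1]) if parent else {}
    rows = dict(store[root[table]][1]) if table in root else {}
    rows[str(record_id)] = put(store, "blob", record)   # step 1: new blob
    root[table] = put(store, "tree", rows)              # step 2: new table tree
    root_tree = put(store, "tree", root)                # step 3: new root tree
    return put(store, "commit",                         # step 4: new commit
               {"tree": root_tree, "parent": parent, "message": message})


store = {}
v1 = write_record(store, None, "people", 1, {"name": "Alice"}, "insert #1")
v2 = write_record(store, v1, "people", 2, {"name": "Bob"}, "insert #2")
v3 = write_record(store, v2, "people", 1, {"name": "Alicia"}, "update #1")

print(len(store))  # 12: every write adds one blob, two trees and one commit
print(rows_at(store, v2, "people")["2"] == rows_at(store, v3, "people")["2"])  # True
```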
How to RDaaWD: view objects at version x
Get object with ID=1
1. Find the “people” tree of version X
2. Find the serialized blob for ID=1 on that tree
3. Deserialize in memory, overwrite values
4. Show it to the user
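The lookup above can be sketched as a walk from commit to root tree to table tree to blob. The object store below is hand-built for illustration: the short ids stand in for real SHA-1 names, JSON is an assumed serialization, and `view_record_at` is a hypothetical helper.

```python
import json

# Toy history: commit c1 holds the original record, c2 the edited one.
objects = {
    "b1": json.dumps({"id": 1, "name": "Alice"}),
    "b2": json.dumps({"id": 1, "name": "Alicia"}),
    "t1": {"1": "b1"},
    "t2": {"1": "b2"},
    "r1": {"people": "t1"},
    "r2": {"people": "t2"},
    "c1": {"tree": "r1", "parent": None},
    "c2": {"tree": "r2", "parent": "c1"},
}


def view_record_at(objects, commit_id, table, record_id, live_record):
    """Return a copy of live_record with its values as of commit_id."""
    root = objects[objects[commit_id]["tree"]]   # commit -> root tree
    rows = objects[root[table]]                  # root tree -> table tree
    old_values = json.loads(objects[rows[str(record_id)]])  # tree -> blob
    snapshot = dict(live_record)                 # work on an in-memory copy
    snapshot.update(old_values)                  # overwrite with old values
    return snapshot


live = {"id": 1, "name": "Alicia"}
print(view_record_at(objects, "c1", "people", 1, live))  # {'id': 1, 'name': 'Alice'}
print(live)                                              # {'id': 1, 'name': 'Alicia'}
```

Nothing is persisted here: the live record is left untouched and only the in-memory copy carries the old values.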
How to RDaaWD: rollback to version x
Found records with new values
1. Find the “people” tree of version X
2. Find the serialized blob for record 1
3. Deserialize in memory, overwrite values
4. Persist the overwritten version to the DB
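A minimal sketch of the rollback, with SQLite standing in for the relational database. Assumptions: the history is a hand-built dict whose short ids stand in for SHA-1 names, records are JSON, the table has an `id` primary key, and `rollback_record` is a hypothetical helper.

```python
import json
import sqlite3

# Toy history: c1 is the version to roll back to, c2 is the current state.
objects = {
    "b1": json.dumps({"id": 1, "name": "Alice", "city": "Paris"}),
    "b2": json.dumps({"id": 1, "name": "Alicia", "city": "Rome"}),
    "t1": {"1": "b1"},
    "t2": {"1": "b2"},
    "r1": {"people": "t1"},
    "r2": {"people": "t2"},
    "c1": {"tree": "r1", "parent": None},
    "c2": {"tree": "r2", "parent": "c1"},
}


def rollback_record(db, objects, commit_id, table, record_id):
    """Overwrite one row in the database with its values as of commit_id."""
    rows = objects[objects[objects[commit_id]["tree"]][table]]
    old = json.loads(objects[rows[str(record_id)]])
    columns = [name for name in old if name != "id"]
    # Identifiers cannot be bound as parameters, so reject anything unusual.
    if not all(name.isidentifier() for name in [table, *columns]):
        raise ValueError("unexpected table or column name")
    assignments = ", ".join(f"{name} = ?" for name in columns)
    db.execute(f"UPDATE {table} SET {assignments} WHERE id = ?",
               [old[name] for name in columns] + [record_id])
    db.commit()
    return old


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
db.execute("INSERT INTO people VALUES (1, 'Alicia', 'Rome')")
rollback_record(db, objects, "c1", "people", 1)
print(db.execute("SELECT name, city FROM people WHERE id = 1").fetchone())
# ('Alice', 'Paris')
```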
How to RDaaWD: optimized branching
Create new branch X
Create new DB schemas to cache the “working directory”
Ex. PostgreSQL schemas
It is a way to namespace tables, and this can become quite powerful when we combine it with search paths. We can set a search path in Postgres to specify which schemas the database should look in for the data.
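One way this could look in practice, sketched as a function that only builds the SQL text and does not connect to a server. Assumptions not in the slides: the main branch lives in the `public` schema, each branch gets a schema named `branch_<name>`, tables are copied wholesale, and `branch_schema_sql` is a hypothetical helper.

```python
import re

IDENTIFIER = re.compile(r"[a-z_][a-z0-9_]*")


def branch_schema_sql(branch, tables):
    """Return PostgreSQL statements that cache a branch in its own schema."""
    for name in [branch, *tables]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    schema = f"branch_{branch}"
    statements = [f"CREATE SCHEMA {schema};"]
    for table in tables:
        # Copy structure, then data, from the assumed main schema "public".
        statements.append(
            f"CREATE TABLE {schema}.{table} (LIKE public.{table} INCLUDING ALL);")
        statements.append(
            f"INSERT INTO {schema}.{table} SELECT * FROM public.{table};")
    # Unqualified table names now resolve to the branch schema first.
    statements.append(f"SET search_path TO {schema}, public;")
    return statements


for statement in branch_schema_sql("x", ["people"]):
    print(statement)
```

With the search path set this way, the application can keep issuing unqualified queries such as SELECT * FROM people and transparently read the branch's copy.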