Version control with GIT

For zombies… by Zeeshan Khan

A method for centrally storing files

Keeping a record of changes

Who did what, when in the system

Covering yourself when things inevitably go wrong

Another “trendy” word combination

Something that every software developer should deal with

You can avoid using version control

But it can’t last long

You will need to collaborate eventually

It might be tricky sometimes

But you can avoid most problems

Recommendations:

Stick to the basic working cycle

Learn the basic working cycle commands

Practice on a sandbox project

Allows a team to share code

Maintains separate “production” versions of code that are always deployable

Allows simultaneous development of different features on the same codebase

Keeps track of all old versions of files

Prevents work being overwritten

There are version control tools even for designers:

• Adobe Version Cue

• PixelNovel Timeline

There is version control functionality embedded in:

• Microsoft Word

• OpenOffice.org Writer

Branch - a copy of a set of files under version control which may be developed at different speeds or in different ways

Checkout - to copy the latest version of (a file in) the repository to your working copy

Commit - to copy (a file in) your working copy back into the repository as a new version

Merge - to combine multiple changes made to different working copies of the same files in the repository

Repository - a (shared) database with the complete revision history of all files under version control

Trunk - the unique line of development that is not a branch

Update - to retrieve and integrate changes made in the repository since your last update

Working copy - your local copies of the files under version control you want to edit

Centralized (client-server model)

• CVS

• Subversion

• VSS, TFS, Vault

• ClearCase

• AccuRev

Distributed

• Git

• Mercurial

• Bazaar

• Perforce

• BitKeeper

Users commit changes to the central repository, and a new version is created that other users can check out

CENTRALIZED WORKFLOW

• Branch by Release: each release gets its own branch (Release 1: V1.0, V1.1, V1.2; Release 2: V2.0, V2.1, V2.2)

• Branch by Feature / Task: Feature 1 and Feature 2 are branched off the main trunk and merged back into it

CENTRALIZED WORKFLOW

• Access the central server and ‘pull’ down the changes others have made

• Make your changes, and test them

• Commit (*) your changes to the central server, so other programmers can see them

(*) Work out any merge conflicts first (WinDiff, built-in tools, etc.)

Canonical repository and local repository:

1. Changes are committed to the local repository

2. Changes are pushed to the canonical repository

3. The local repository is updated from the canonical repository

4. The working copy is updated from the local repository

Each user has a full local copy of the repository. Users commit changes locally, and when they want to share them, they push them to the shared repository.

DISTRIBUTED WORKFLOW

• Simple: init/clone, add, commit, push

• Branch by Member / Features: Developer 1 and Developer 2 each work on their own branches alongside a development trunk and the main trunk (V1.0, V1.1)

DISTRIBUTED WORKFLOW

Each developer ‘clones’ a copy of a repository to their own machine.

The full history of the project is on their own hard drive.

Two-step commits: you first commit to your local repository, and then push to the shared repository.

A central repository is not mandatory, but you usually have one.

Examples of distributed source control systems

Git, Mercurial, Bazaar

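Centralized vs. distributed:

• Repositories: centralized uses a single repository; distributed uses multiple repositories

• Committing: centralized requires a connection to the server (there is no local repository); distributed does not, because commits go to your local repository first

• Sharing: centralized cannot commit changes to another user directly; distributed can exchange changes directly with another user

• History: centralized keeps all history in one place; distributed has no single place guaranteed to hold all history (unpushed commits exist only in individual clones)

• Branching: reintegrating a branch can be a pain in centralized systems; distributed systems make branch management (especially reintegration) easier

• Speed: centralized systems are considered slower than DVCS; distributed systems are considered faster than CVCS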

• Speed

• Simple design

• Strong support for thousands of parallel branches

• Fully distributed

• Able to handle large projects like the Linux kernel efficiently

• Ensures integrity

Each commit saves a snapshot of the project's files rather than the differences from the previous version (unchanged files are not stored again, just referenced).

• Fetch or clone (create a copy of the remote repository) (compare to cvs checkout)

• Modify the files in the local branch

• Stage the files (no cvs equivalent)

• Commit the files locally (no cvs equivalent)

• Push changes to the remote repository (compare to cvs commit)
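The cycle above can be tried end to end without a real server. This is a minimal sketch, assuming Git and a POSIX shell are installed; a local bare repository (central.git, a hypothetical name) stands in for the remote, and "Alice" is a placeholder identity.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"              # throwaway sandbox directory

git init -q --bare central.git             # stand-in for the remote/central repository
git clone -q central.git alice             # 1. clone (compare to cvs checkout)
cd alice
git config user.name "Alice"               # placeholder identity, local to this clone
git config user.email alice@example.com

echo "print('hello')" > app.py             # 2. modify files in the working copy
git add app.py                             # 3. stage
git commit -q -m "Add app.py"              # 4. commit locally (no server involved)
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"               # 5. push (compare to cvs commit)

# The shared repository now holds exactly the commit made locally.
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$tmp/central.git" rev-parse "refs/heads/$branch")
echo "local:  $local_head"
echo "remote: $remote_head"
```

Only step 5 needs the remote; steps 2 to 4 work offline.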

• Git directory: stores the metadata and object database for your project.

• Working directory: a single checkout of one version of the project.

• Staging area (Index): a file in your Git directory that stores information about what will go into the next commit.

Untracked: files in your working directory that were not in the last snapshot and are not in the staging area.

Unmodified: tracked but not modified (e.g. right after the initial clone).

Modified: tracked and modified.

Staged: marked for the next commit.
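The four states can be produced on purpose and read back with git status. A minimal sketch in a throwaway repository; the file names and the identity are hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

echo one > tracked.txt
git add tracked.txt
git commit -q -m "Initial commit"          # tracked.txt is now unmodified

echo scratch > notes.txt                   # untracked: not in the last snapshot, not staged
echo two >> tracked.txt                    # modified: tracked, changed since the last commit
echo ready > staged.txt
git add staged.txt                         # staged: marked for the next commit

# Short format codes: "??" untracked, " M" modified but unstaged, "A " newly staged
git status --short
```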

There are four elementary object types in Git:

blob - a file.

tree - a directory.

commit - a particular state of the working directory.

tag - an annotated tag (we will ignore this one for now).

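A blob is simply the content of a particular file, prefixed with a small header (the object type and size). The file name is not part of the blob; it is stored in the tree.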

A tree is a list of blobs and/or trees with their corresponding file modes and names (git cat-file -p shows it as text; on disk the entries are stored in a compact binary form).

A commit is a plain text object containing the author and committer, timestamps, the commit message, and references to the parent commit(s) and the corresponding tree.

All objects are compressed with the DEFLATE algorithm and stored in the git object database under .git/objects.
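The plumbing command git cat-file makes the object types visible: -t prints an object's type, -p pretty-prints its content. A minimal sketch in a throwaway repository; the directory and file names are hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

mkdir src
echo 'print("hi")' > src/main.py
git add src/main.py
git commit -q -m "Add main.py"

commit_type=$(git cat-file -t HEAD)             # the commit object
tree_type=$(git cat-file -t 'HEAD^{tree}')      # the root tree it points to
blob_type=$(git cat-file -t HEAD:src/main.py)   # the file contents

git cat-file -p HEAD              # tree <hash>, author, committer, blank line, message
git cat-file -p 'HEAD^{tree}'     # one entry per line: mode, type, hash, name (src)
git cat-file -p HEAD:src/main.py  # the raw contents: print("hi")

# Loose, compressed objects under .git/objects (here: 1 blob, 2 trees, 1 commit)
find .git/objects -type f
```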

Everything is check-summed before it is stored

Everything is referred to by that checksum.

The checksum is a SHA-1 hash.

Every commit is referred to by that SHA-1 hash.

You cannot change the contents of any file or directory without Git knowing about it.

SHA-1 (Secure Hash Algorithm 1) is a 160-bit cryptographic hash function, also used in TLS, SSH, PGP, ... Its output is written as 40 hexadecimal digits.

Every object is identified and referenced by its SHA-1 hash.

Every time Git accesses an object, it validates the hash.

Linus Torvalds: “Git uses SHA-1 in a way which has nothing at all to do with security. [...] It’s about the ability to trust your data.”

If you change only a single character in a single file, the hashes of that file's blob, of every tree above it, and of the commit all change!
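The hashing can be checked by hand: a blob's ID is the SHA-1 of a short header ("blob", a space, the size in bytes, a NUL byte) followed by the file contents. A minimal sketch, assuming a default (SHA-1) Git setup and GNU coreutils' sha1sum (on macOS, shasum plays the same role).

```shell
#!/bin/sh
set -e
# What Git computes for a file containing "hello\n" (6 bytes)
git_id=$(printf 'hello\n' | git hash-object --stdin)

# The same hash built manually: "blob 6" + NUL byte + content
manual_id=$(printf 'blob 6\000hello\n' | sha1sum | cut -d' ' -f1)

# Changing a single character yields an unrelated ID
changed_id=$(printf 'hellO\n' | git hash-object --stdin)

echo "git hash-object: $git_id"   # ce013625030ba8dba906f756967f9e9ca394464a
echo "manual sha1sum:  $manual_id"
echo "one char change: $changed_id"
```

Because a tree stores the IDs of its entries and a commit stores the ID of its tree, a new blob ID forces new tree and commit IDs above it.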

Creating a new repository:

$ git init

Cloning from an existing repository:

$ git clone https://github.com/dbrgn/fahrplan

SPECIFIC CHANGES:

$ git add *.py

$ git add README.rst

$ git commit -m 'First commit'

ALL CHANGES TO TRACKED FILES:

$ git commit -am 'First commit'
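One detail worth seeing in practice: git commit -a stages changes to files Git already tracks, but it does not pick up brand-new files. A minimal sketch in a throwaway repository; the file names are hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

printf 'import sys\n' > main.py
printf 'Demo\n' > README.rst
git add *.py                               # stage specific changes...
git add README.rst
git commit -q -m 'First commit'            # ...and commit them

echo '# edited' >> main.py                 # change to a tracked file
echo 'scratch' > notes.txt                 # brand-new, untracked file
git commit -q -am 'Second commit'          # stages and commits main.py only

git status --short                         # ?? notes.txt  (still untracked)
```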

FROM STAGING AREA ONLY (FILE STAYS ON DISK)

$ git rm --cached file.py

FROM INDEX AND FILE SYSTEM

$ git rm file.py

Git tracks content, not files. Although there is a move command...

$ git mv file1 file2

...this is the same as...

$ mv file1 file2

$ git rm file1

$ git add file2
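The equivalence can be checked by comparing the tree each approach would commit: git write-tree writes the current staging area as a tree object and prints its ID. A minimal sketch in a throwaway repository; file1/file2 follow the slide, everything else is hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

echo content > file1
git add file1 && git commit -q -m "Add file1"

git mv file1 file2                         # approach 1: git mv
tree_mv=$(git write-tree)
git reset -q --hard HEAD                   # undo: back to file1

mv file1 file2                             # approach 2: mv + rm + add
git rm -q file1
git add file2
tree_manual=$(git write-tree)

git commit -q -m "Rename file1 to file2"
git rm -q --cached file2                   # stop tracking file2 but keep it on disk
git status --short                         # D  file2 (staged deletion), ?? file2 (untracked)
```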

SHOWING STATUS:

$ git status

SHOWING LOG (ENTIRE HISTORY, PAGED)

$ git log

SHOWING LOG (DATE FILTERING)

$ git log --since=2.weeks

$ git log --since="2 years 1 day 3 minutes ago"

LAST COMMIT

$ git show

SPECIFIC COMMIT

$ git show 1776f5

$ git show HEAD^

UNSTAGED CHANGES

$ git diff

STAGED CHANGES

$ git diff --cached

RELATIVE TO SPECIFIC REVISION

$ git diff 1776f5

$ git diff HEAD^
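The diff variants compare different pairs of states. A minimal sketch in a throwaway repository where one file has a committed, a staged and an unstaged version (the contents v1/v2/v3 are hypothetical).

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

echo v1 > file.py
git add file.py && git commit -q -m "v1"   # HEAD holds v1
echo v2 > file.py && git add file.py       # staging area holds v2
echo v3 > file.py                          # working directory holds v3

unstaged=$(git diff)                       # working directory vs. staging area: v2 -> v3
staged=$(git diff --cached)                # staging area vs. HEAD: v1 -> v2
vs_head=$(git diff HEAD)                   # working directory vs. HEAD: v1 -> v3

printf '%s\n' "$staged"
git log --oneline --since=2.weeks          # the "v1" commit, made just now
```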

CHANGE LAST COMMIT

$ git commit --amend

UNSTAGE STAGED FILE

$ git reset HEAD file.py

UNMODIFY MODIFIED FILE

$ git checkout -- file.py

REVERT A COMMIT

$ git revert 1776f5
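A minimal sketch of the four undo operations in a throwaway repository; file contents and commit messages are hypothetical. Since Git 2.23, git restore --staged file.py and git restore file.py are alternatives to the reset and checkout forms above.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

echo a > file.py
git add file.py && git commit -q -m "Add fiel.py"   # typo in the message
git commit -q --amend -m "Add file.py"     # replaces the last commit (new hash)

echo oops >> file.py
git add file.py
git reset -q HEAD file.py                  # unstage: the change stays in the working directory
git checkout -- file.py                    # discard it: file.py is back to "a" (cannot be undone!)

echo b >> file.py
git commit -q -am "Append b"
git revert --no-edit HEAD >/dev/null       # new commit that undoes "Append b"

git log --oneline                          # Revert "Append b" / Append b / Add file.py
```

Unlike amend, revert does not rewrite existing commits, so it is the safe choice for commits that have already been pushed.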

A .gitignore file describes the files that Git should ignore (leave untracked); it has no effect on files that are already tracked.

Blank lines or lines starting with # are ignored

Standard glob patterns work

End pattern with slash (/) to specify a directory

Negate pattern with exclamation point (!)

$ cat .gitignore

/doc/[abc]*.txt

.pypirc
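git check-ignore reports whether (and with -v, by which rule) a path is ignored, which makes it handy for testing patterns. A minimal sketch reusing the two patterns above plus a few hypothetical ones: a directory rule, a glob, and a negation.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo

cat > .gitignore <<'EOF'
# comment lines (and blank lines) are ignored
/doc/[abc]*.txt
.pypirc

# hypothetical extra rules
build/
*.log
!keep.log
EOF

mkdir doc build
touch doc/a1.txt doc/d1.txt .pypirc build/out.o debug.log keep.log

# -v prints the source file, line number and pattern that matched each path
git check-ignore -v doc/a1.txt .pypirc build/out.o debug.log
```

Here doc/d1.txt stays visible because "d" is not in [abc], and keep.log stays visible because the later !keep.log rule re-includes it.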

Remotes: other clones of the same repository

Can be local (another checkout) or remote (coworker, central server)

There are default remotes for push and pull

$ git remote -v

origin git://github.com/schacon/ticgit.git (fetch)

origin git://github.com/schacon/ticgit.git (push)

WITHOUT DEFAULT

$ git push <remote> <branch>

SETTING A DEFAULT

$ git push -u <remote> <branch>

THEN...

$ git push

FETCH & MERGE

$ git pull [<remote> <branch>]

FETCH & REBASE

$ git pull --rebase [<remote> <branch>]

-> Rebasing should be done cautiously!
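A minimal sketch of the push/pull cycle between two clones. It assumes a local bare repository (central.git, hypothetical) as the shared remote and placeholder users alice and bob; setting pull.rebase to false just makes the default fetch-and-merge behaviour explicit.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q --bare central.git             # stand-in for the shared server
git clone -q central.git alice
git clone -q central.git bob
for who in alice bob; do
  git -C "$who" config user.name "$who"    # placeholder identities
  git -C "$who" config user.email "$who@example.com"
  git -C "$who" config pull.rebase false   # pull = fetch & merge
done

cd alice
echo v1 > file.txt
git add file.txt && git commit -q -m "Alice: add file.txt"
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"            # push and set origin/<branch> as the default
echo v2 >> file.txt
git commit -q -am "Alice: edit file.txt"
git push -q                                # the default remote/branch is now known

cd ../bob
git remote -v                              # origin <path to central.git> (fetch) / (push)
git pull -q origin "$branch"               # fetch & merge Alice's commits
```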

Like most VCSs, Git has the ability to tag specific points in history as being important. Generally, people use this functionality to mark release points (v1.0, and so on).

Git uses two main types of tags: lightweight and annotated. A lightweight tag is very much like a branch that doesn’t change: it’s just a pointer to a specific commit. Annotated tags, however, are stored as full objects in the Git database: they are checksummed; contain the tagger name, e-mail, and date; have a tagging message; and can be signed and verified with GNU Privacy Guard (GPG).

LIGHTWEIGHT TAGS

$ git tag v0.1.0

ANNOTATED TAGS

$ git tag -a v0.1.0 -m 'Version 0.1.0'
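The difference between the two kinds of tag shows up in the object database. A minimal sketch in a throwaway repository; because the slide uses v0.1.0 for both commands, the lightweight tag here gets the hypothetical name v0.1.0-light.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity (also used as tagger)
git config user.email demo@example.com
echo 1 > version.txt
git add version.txt && git commit -q -m "Release 0.1.0"

git tag v0.1.0-light                       # lightweight: just a name for the commit
git tag -a v0.1.0 -m 'Version 0.1.0'       # annotated: a tag object of its own

light_type=$(git cat-file -t v0.1.0-light) # "commit": the ref points straight at the commit
annot_type=$(git cat-file -t v0.1.0)       # "tag": a separate object (tagger, date, message)

git cat-file -p v0.1.0                     # object <commit hash>, type, tag, tagger, message
git tag -l                                 # v0.1.0, v0.1.0-light
```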

Branches are "Pointers" to commits.

A branch reference is usually a small text file which contains nothing more than the hash of the latest commit made on the branch:

$ cat .git/refs/heads/master

57be35615e5782705321e5025577828a0ebed13d

HEAD is also a text file; normally it contains a reference to the branch that is currently checked out (or a bare commit hash in the "detached HEAD" state):

$ cat .git/HEAD

ref: refs/heads/master
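A minimal sketch showing that creating a branch only writes a new pointer. It assumes the default "files" ref storage, where each branch is a loose file under .git/refs/heads (refs may later be packed into .git/packed-refs, so git rev-parse is the robust way to read them); the default branch may be called master or main depending on configuration.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com
echo 1 > a.txt
git add a.txt && git commit -q -m "First commit"

branch=$(git symbolic-ref --short HEAD)    # e.g. master or main
cat .git/HEAD                              # ref: refs/heads/<branch>
cat ".git/refs/heads/$branch"              # the hash of the latest commit on the branch

git branch iss53                           # a new branch is just a new pointer...
git rev-parse iss53 HEAD                   # ...to the same commit (both lines identical)

git checkout -q iss53
echo 2 >> a.txt && git commit -q -am "Work on iss53"   # only iss53 moves forward
```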

Scenario 1 – Interrupted workflow

You’re finished with part 1 of a new feature but you can’t continue with part 2 before part 1 is released and tested

Scenario 2 – Quick fixes

While you’re busy implementing some feature suddenly you’re being told to drop everything and fix a newly discovered bug

Branches can diverge.

Branches can be merged.

Git can merge automatically in several ways, such as a fast-forward or a three-way merge.

If the automatic merge fails, fix the conflicts by hand…

$ git merge <branch>

Auto-merging index.html

CONFLICT (content): Merge conflict in index.html

Automatic merge failed; fix conflicts and then commit the result.

Then mark as resolved and trigger merge commit

$ git add index.html

$ git commit
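The conflict above can be reproduced and resolved in a throwaway repository. A minimal sketch: two branches change the same line of index.html differently; the contents and commit messages are hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

echo '<h1>Hello</h1>' > index.html
git add index.html && git commit -q -m "Add index.html"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b iss53                   # one branch changes the heading one way...
echo '<h1>Hello, issue 53</h1>' > index.html
git commit -q -am "Heading for iss53"

git checkout -q "$base"                    # ...and the base branch changes the same line
echo '<h1>Hello, world</h1>' > index.html
git commit -q -am "Heading for $base"

git merge iss53 || true                    # CONFLICT (content): Merge conflict in index.html
cat index.html                             # both versions between <<<<<<< / ======= / >>>>>>>

echo '<h1>Hello, world (issue 53)</h1>' > index.html   # resolve by hand
git add index.html                         # mark as resolved
git commit -q --no-edit                    # merge commit with the default message
```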

Rebasing: a linear alternative to merging

Rewrites history (the rebased commits get new hashes)! Never rebase published code!
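A minimal sketch of a rebase in a throwaway repository: an unpublished feature branch (hypothetical name "feature") is replayed on top of newer work on the base branch, giving a straight history instead of a merge commit.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com

echo base > base.txt
git add base.txt && git commit -q -m "Base"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b feature                 # unpublished feature branch
echo f > feature.txt
git add feature.txt && git commit -q -m "Feature work"
before=$(git rev-parse HEAD)

git checkout -q "$base"                    # meanwhile the base branch moves on
echo m > main.txt
git add main.txt && git commit -q -m "Mainline work"

git checkout -q feature
git rebase -q "$base"                      # replay "Feature work" on top of "Mainline work"
after=$(git rev-parse HEAD)

git log --oneline --graph                  # a straight line: Feature work, Mainline work, Base
```

The feature commit gets a new hash because its parent changed; anyone holding the old commit would now have a diverging copy, which is why published code should not be rebased.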

Often, when you’ve been working on part of your project, things are in a messy state and you want to switch branches for a bit to work on something else. The problem is, you don’t want to do a commit of half-done work just so you can get back to this point later. The answer to this issue is

$ git stash

Stashing takes the dirty state of your working directory — that is, your modified tracked files and staged changes — and saves it on a stack of unfinished changes that you can reapply at any time.
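A minimal sketch of the stash round trip in a throwaway repository; the file contents are hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com
echo v1 > file.txt
git add file.txt && git commit -q -m "v1"

echo 'half-done work' >> file.txt          # messy, uncommitted state
git stash -q                               # save it on the stash stack
after_stash=$(git status --porcelain)      # empty: the working directory is clean again
stash_listing=$(git stash list)            # one entry: stash@{0}

# ... switch branches, fix the urgent bug, switch back ...

git stash pop -q                           # reapply the changes and drop the stash entry
```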

CREATE NEW BRANCH

$ git branch iss53

CREATE NEW BRANCH FROM master AND SWITCH TO IT

$ git checkout -b iss53 master

SWITCH BRANCH

$ git checkout iss53

DELETE BRANCH

$ git branch -d iss53

SHOW ALL BRANCHES

$ git branch

* master

  testing

SHOW LAST BRANCH COMMITS

$ git branch -v

  iss53   93b412c fix javascript issue

* master  7a98805 Merge branch 'iss53'

  testing 782fd34 add scott to the author list in the readmes

SHOW MERGED BRANCHES

$ git branch --merged

* master

SHOW UNMERGED BRANCHES

$ git branch --no-merged

testing
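The branch listings above can be reproduced in a throwaway repository. A minimal sketch reusing the slide's branch names and commit messages (file names are hypothetical); the default branch may be called master or main depending on configuration.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com
echo base > README
git add README && git commit -q -m "Initial commit"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b iss53 "$base"           # create + switch in one step
echo fix > app.js
git add app.js && git commit -q -m "fix javascript issue"

git checkout -q -b testing "$base"
echo scott >> README
git commit -q -am "add scott to the author list in the readmes"

git checkout -q "$base"
git merge -q --no-ff -m "Merge branch 'iss53'" iss53

git branch -v                              # each branch with its last commit
merged=$(git branch --merged)              # iss53 and the current branch
unmerged=$(git branch --no-merged)         # testing
git branch -d iss53                        # allowed: iss53 is fully merged
# git branch -d testing would be refused (unmerged); -D forces it
```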

AKA feature branches

For each feature, create a branch

Merge early, merge often

If desired, squash commits
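One way to squash is git merge --squash, which stages the combined changes of a branch without committing, so the whole feature lands as a single commit. A minimal sketch in a throwaway repository; the feature name and commit messages are hypothetical.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git config user.name "Demo User"           # placeholder identity
git config user.email demo@example.com
echo base > README
git add README && git commit -q -m "Initial commit"
base=$(git symbolic-ref --short HEAD)

git checkout -q -b feature-login           # one branch per feature
for step in 1 2 3; do                      # several small work-in-progress commits
  echo "step $step" >> login.txt
  git add login.txt
  git commit -q -m "WIP: login step $step"
done

git checkout -q "$base"
git merge -q --squash feature-login        # stage the combined result; no commit yet
git commit -q -m "Add login feature"       # one commit on the base branch
git branch -D feature-login                # -D: a squashed branch does not count as merged

git log --oneline                          # Add login feature / Initial commit
```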
