Version control with Git

2.2 Version control with Git

Git is a version control system. It tracks changes in all the files contained within a project. The Git term for a project is a repository, but the terminology is a bit confusing; the overall structure looks like this (Figure 2.1).

The working copy, staging area and the local repository are generally collectively referred to as a repository (sometimes just repo) or sometimes a project (I tend to use both in this document). All three of these are stored locally on the machine you are using.

The remote repository is a copy of the local repository that is stored elsewhere, either on a server or, in the case of this publication, on the GitHub website.

A note on terminology
The terminology is a bit confusing with Git. There is a repository, which generally refers to the whole structure (the working copy, the staging area and the local repository). Used in this sense, repository means the whole project: a project is always held in a folder on your machine, and that folder contains everything (all the files and folders that make up the project, the local repository and all the other bits that go with it).
Then there is the local repository. This is a database (sort of) of all the versions of every tracked file, all the metadata (such as change logs, change statements &c.) and everything else the VCS needs. The local repository lives in its own folder within the main project folder (with Git it lives in a .git folder; note the leading full stop).
Then there is the remote repository. This is a copy (sort of) of the local repository on some remote server; in our case it is on the GitHub website, but it could be a server in your office. I say sort of because, if other people are working on the same project, it might hold more than just what is in your local repository. The remote repository is generally considered to be the master repository.
To avoid confusion I always refer to:
the repository or project	When referring to the whole project
the local repository	When referring to just the tracked files that have been committed to the local repository itself (not working copies or staged area files)
the remote repository	When referring to just the remote repository stored on GitHub (or at least a repository not on the local machine)

Let’s forget the remote repository for the time being and concentrate on just the local machine; we’re left with Figure 2.2:

Figure 2.2 - Git repository structure on a local machine

So how does this all work? Well, that’s next.

2.2.1 Working, staging and the local repository—how it works

Let’s start with a very simple example, a single page website with a picture; I’m going to call it lab-01-website.

On my machine I keep all my Git repositories under a single directory; that directory is on my D: drive and is called 2500 Git Projects. Like this:

D:\2500 Git Projects

Yes I number my directories, yes it’s embarrassing, but I am an engineer—we like to number things—I discuss this peccadillo further here.

Next we need a directory to keep the repository we’re creating in; this will be called lab-01-website and will live under the 2500 Git Projects directory, thus:

D:\2500 Git Projects\lab-01-website

So far so good, we’ve created a new directory and it’s completely empty.

Now we have to tell Git that it is a new repository.

I’m not going to explain exactly how to do this yet; this is a high-level discussion of how Git works, and I want to explain it from the point of view of doing it through Brackets, which we haven’t covered yet. The terms I use, like initialise, are valid though, and you will see them in Brackets when we come to look at it there. I show some of the more common Git commands in the sidebar.

We do this by initialising the repository. Once initialised, the repository will contain a new hidden folder called .git. It is this folder (created by Git itself) that holds the local repository. On my system it looks like this, Figure 2.3:

Git init

	$	cd <path-to-folder>
	$	git init

Figure 2.3 - Git repository directory structure

This is now a Git repository. It hasn’t got much in it, but it is ready to go.
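For anyone who wants to see this from the command line, here is a minimal sketch of the initialisation step. The throwaway directory created with mktemp is an assumption for illustration (on my machine it would be the lab-01-website folder); the only thing that matters is that git init creates the hidden .git folder.

```shell
# Work in a throwaway directory so nothing real is touched (an assumption for this sketch).
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo"
cd "$repo"

# Tell Git that this folder is a new repository; this creates the hidden .git folder.
git init --quiet

# List everything, including hidden entries: .git is the only thing there.
contents="$(ls -A)"
echo "$contents"
```

At this point the folder holds nothing but .git; everything we add alongside it becomes the working copy.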

The .git folder—a golden rule
The .git folder is a hidden folder in the root directory of the repository. It contains all that is important: the local repository, any staged files, all the metadata associated with the repository (change records, logs &c.) and all the important bits for a tracked project.
There is one golden rule concerning the .git folder: DON’T FUCK WITH IT
Best thing is, don’t even look inside it. If you delete it, you delete the entire local repository (every commit and all the history); if you change it, everything gets screwed up.

If we ignore the .git folder (and this is always the best thing to do, see above), everything else in the lab-01-website folder is our working copy. We can do what we like here: create files and directories, delete them, modify them, rename and move them. Anything we like, and at the minute Git will ignore everything we do.

So let’s start. Let’s create a folder structure for our project and add some files to it. I want it to look like this:

Figure 2.4 - lab-01-website repository structure

Create the files and folders in whatever manner you like (Notepad and Windows Explorer will do), or download the finished article here.

Having done all this, we are free to modify these files as much as we like. Git is ignoring the lot of them. It knows they’re there, but it isn’t tracking them in any way. In Git parlance, these are untracked files; it also knows that there are three of them:

index.html
11-resources\01-css\style.css
11-resources\02-images\logo.png
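A rough sketch of how Git reports these untracked files from the command line. The temporary directory and the empty placeholder files are assumptions for illustration, and the folder names follow the paths used in index.html.

```shell
# Throwaway repository with the same layout (empty placeholder files stand in for the real ones).
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo/11-resources/01-css" "$repo/11-resources/02-images"
cd "$repo"
git init --quiet
touch index.html 11-resources/01-css/style.css 11-resources/02-images/logo.png

# --short gives one line per file; "??" marks a file as untracked.
# --untracked-files=all lists each file rather than collapsing whole directories.
status="$(git status --short --untracked-files=all)"
echo "$status"
```

Each of the three files appears on its own line with a ?? prefix; Git knows they exist but is not tracking them.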

Let’s modify one of these files, index.html. I’ve added some very basic code (Code 2.1).

index.html

<html lang="en">                       <!-- Declare language -->
    <head>                             <!-- Start of head section -->
        <meta charset="utf-8">         <!-- Use Unicode character set  -->

<link rel="stylesheet" type="text/css" href="11-resources/01-css/style.css">

        <title>PracticalSeries: Git Lab</title>   
    </head>

    <body>
        <h1>A Practical Series Website</h1>
       
        <figure class="cover-fig">
            <img src="11-resources/02-images/logo.png" alt="cover logo">
        </figure>
       
        <h3>A note by the author</h3>
       
<p>This is my second Practical Series publication—this one happened by accident too. The first publication is all about building a website, you can see it here. This publication came about because I wanted some sort of version control mechanism for the first publication.</p>

<p>There are lots of different version control systems (VCS) out there; some are free, some are commercial applications just google it. If you do, you will find that Git and GitHub show up again and again.</p>
       
    </body>
</html>

Code 2.1 index.html

Now let’s say that we’ve finished index.html and we want to start tracking it. The first thing to do is add it to the staging area; we do this with the Git command add.

Nothing has actually happened to the file; it’s still there in the working area. However, a copy of the file, exactly as it was at the time of the add, has been placed in the staging area (we can’t see this; it’s inside the .git folder and we don’t go there).

Once we add a file to the staging area, Git begins to track it; in Git parlance, it is now a file to be committed.

If we continued to modify the file in the working area, nothing would happen to the staged version of the file (files in the staging area are said to be staged). If we wanted to overwrite the file in the staging area with a modified working copy, we would need to add it again.

The index.html file still isn’t under full version control. Git knows we’ve staged it, but we haven’t yet told it to put the file in its repository.
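The behaviour of the staging area can be sketched on the command line like this. The temporary directory and the one-line file contents are assumptions for illustration; the status codes are those printed by git status --short, where the first column describes the staged copy and the second the working copy.

```shell
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo"
cd "$repo"
git init --quiet

echo '<h1>A Practical Series Website</h1>' > index.html

# Stage a copy of the file exactly as it is now.
git add index.html
after_add="$(git status --short)"      # "A  index.html": added to the staging area

# Keep editing the working copy; the staged copy is not affected.
echo '<p>More text</p>' >> index.html
after_edit="$(git status --short)"     # "AM index.html": staged, but modified since

# Add again to overwrite the staged copy with the newer working copy.
git add index.html
after_readd="$(git status --short)"    # back to "A  index.html"

printf '%s\n' "$after_add" "$after_edit" "$after_readd"
```

The middle line is the interesting one: the M in the second column is Git saying that the working copy has moved on from what was staged.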

Now let’s modify the style.css file; add the following to it:

style.css

* {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
    position: relative;
}

html {
    background-color: #fbfaf6;       /* Set cream page bkgrd */
    color: #404030;
    font-family: serif;
    font-size: 26px;
    text-rendering: optimizeLegibility;
}

body {
    max-width: 1276px;
    margin: 0 auto;
    background-color: #fff;         /* make content area bkgrd white */
    border-left: 1px solid #ededed;
    border-right: 1px solid #ededed;
}

h1, h2, h3, h4, h5, h6 {            /* set standard headings */
    font-family: sans-serif;
    font-weight: normal;
    font-size: 3rem;
    padding: 2rem 5rem 2rem 5rem;
}
h3 { font-size: 2.5rem; }

.cover-fig {                         /* holder for cover image */
    width: 50%;
    margin: 2rem auto;
    padding: 0;
}
.cover-fig img {width: 100%;}       /* format cover image */

p {                                 /* TEXT STYLE - paragraph */
    margin-bottom: 1.2rem;          /* THIS SETS PARAGRAPH SPACING */
    padding: 0 5rem;
    line-height: 135%;
}

Code 2.2 style.css

The website actually looks like this (Figure 2.5); not bad for two minutes’ work.

Now, the index.html file is already in the staging area; the next thing to do is add style.css and logo.png. This is done with another add.

Our repository now looks like this:

Figure 2.6 - Repository with staged files

All the files are now in the staging area. The staging area acts as a collection area for files that we want to put into the local repository: it allows multiple files to be collected together and added to the local repository in one go. It also means we can have just one message for the whole thing (we don’t have to enter separate messages for each file).

Files are sent from the staging area to the local repository with a commit instruction. When a commit is executed, a message must be entered; in this case, the message is initial commit (with the Git command line, the message can be entered as part of the command itself, as shown in the sidebar).

Now we have this:

Figure 2.7 - Repository with committed files

The staging area is empty (all its files have been committed to the local repository).

The working copy and the local repository now contain exactly the same versions of the files; Git would report nothing to commit, working tree clean (older versions say working directory clean)†1.


†1		See, geeks — what it means is there is no difference between the working copy and the files in the repository, so there is nothing to commit.
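The stage-then-commit sequence described above can be sketched as follows. The temporary directory, the empty placeholder files and the user name and e-mail are all hypothetical values chosen for illustration (Git needs an identity before it will record a commit).

```shell
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo/11-resources/01-css" "$repo/11-resources/02-images"
cd "$repo"
git init --quiet
# Hypothetical identity, set for this repository only.
git config user.name "Example Author"
git config user.email "author@example.com"

touch index.html 11-resources/01-css/style.css 11-resources/02-images/logo.png

# Stage all three files, then commit them together under one message.
git add .
git commit --quiet -m "initial commit"

# Nothing is printed by the short status: the working copy matches the commit.
status="$(git status --short)"
echo "status: [$status]"

# One entry in the history: an abbreviated commit number followed by the message.
git log --oneline
```

One commit, one message, three files; the staging area is empty again afterwards.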

Let’s make another modification to index.html. We’ll delete the second paragraph:

index.html

<html lang="en">                       <!-- Declare language -->
    <head>                             <!-- Start of head section -->
        <meta charset="utf-8">         <!-- Use Unicode character set  -->

<link rel="stylesheet" type="text/css" href="11-resources/01-css/style.css">

        <title>PracticalSeries: Git Lab</title>   
    </head>

    <body>
        <h1>A Practical Series Website</h1>
       
        <figure class="cover-fig">
            <img src="11-resources/02-images/logo.png" alt="cover logo">
        </figure>
       
        <h3>A note by the author</h3>
       
<p>This is my second Practical Series publication—this one happened by accident too. The first publication is all about building a website, you can see it here. This publication came about because I wanted some sort of version control mechanism for the first publication.</p>
       
    </body>
</html>

Code 2.3 Second modification to index.html

I’ve deleted lines 20 and 21 from the original file.

The website now looks like this:

Let’s also say that this is the only change we want to make. The next thing is to add index.html to the staging area:

Git would now report the status of index.html as modified and staged.

And then we commit it with the message index.html proof reading correction.

And this time we have Figure 2.10:

Figure 2.10 - Second modification committed

There are now two changes stored in the local repository.
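Here is a sketch of the same modify, add and commit cycle on the command line. The temporary directory, the two-line stand-in for index.html and the identity values are assumptions for illustration.

```shell
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# First commit: a file with two paragraphs.
printf '<p>First paragraph</p>\n<p>Second paragraph</p>\n' > index.html
git add index.html
git commit --quiet -m "initial commit"

# Delete the second paragraph from the working copy.
printf '<p>First paragraph</p>\n' > index.html
before="$(git status --short)"     # " M index.html": modified, not yet staged

git add index.html
staged="$(git status --short)"     # "M  index.html": modified and staged

git commit --quiet -m "index.html proof reading correction"

# Newest first: each line is an abbreviated commit number and its message.
git log --oneline
```

The position of the M in the status output shows where the change currently sits: second column for the working copy, first column for the staging area.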

Get the idea?

2.2.2 Commit version numbers

You can see from this that we can keep changing things and we just get more entries in the local repository. Git knows what the whole project looks like at any point in time. In the above example, Git knows that there are three files in the project, that the latest versions of style.css and logo.png are from the first entry in the repository, and that the latest version of index.html is from the second.

Calling these revisions first revision and second revision is ok if there is one person working on the project in a linear fashion. But Git is designed to cater for much more complicated arrangements—and it does it by numbering the changes in a very different way.

Look again at the local repository shown in Figure 2.10; I’ve reproduced it below:

Figure 2.11 - Local repository with two commits

It shows two entries, and each entry has a funny seven-digit hexadecimal number (shown in green). The first commit has [37bb05a] and the second has [88934e8]. These are effectively the version numbers. They are unique, and at first glance they look like random numbers.

A note on commit numbers

These commit numbers are of course not random numbers. They are a checksum calculated from all the files in a commit, plus a header that contains other information (the commit numbers that immediately preceded this commit, plus some information about directory structures &c.).

A checksum is basically a function applied to the binary value of every byte in a file that gives a reproducible figure; this figure can be used to check whether two files are the same or to identify data corruption within a file.

The commit number used by Git is a checksum calculated using the SHA-1 algorithm (Secure Hash Algorithm 1). This produces a 20-byte (40-digit) hexadecimal number that uniquely identifies a commit. The commit number shown is just the first seven digits of the full commit number. This is usually enough to uniquely identify a commit (even on very large projects).

The first seven digits of a commit number give about 268 million unique values (16⁷); the full 20-byte number has about 1.5×10⁴⁸ unique values (16⁴⁰, within a couple of orders of magnitude of the number of atoms that make up the Earth). These values also only apply within a repository (two different repositories can have the same commit number; they don’t interfere with each other).

The chance of a duplicate 20-byte commit number is vanishingly small; even with just the first seven digits, a clash is very unlikely on any project you are likely to be working on.

These commit numbers are referred to as either hash numbers or SHA (pronounced shar to rhyme with bar) numbers.
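To show that there is nothing magic about these numbers, here is a small sketch that reproduces one of Git’s SHA-1 values by hand. It uses the simplest kind of object Git stores (the content of a single file, which Git calls a blob) rather than a full commit, and it assumes the sha1sum tool from GNU coreutils is available; commits are named by the same method, using a different header.

```shell
# Git's name for a piece of file content: SHA-1 over the header "blob <size>" plus a
# zero byte, followed by the content itself (here the six bytes "hello" + newline).
by_git="$(printf 'hello\n' | git hash-object --stdin)"
by_hand="$(printf 'blob 6\0hello\n' | sha1sum | cut -d' ' -f1)"

echo "$by_git"
echo "$by_hand"      # the same 40 hexadecimal digits

# The short form shown in the figures is just the first seven digits.
short="$(echo "$by_git" | cut -c1-7)"
echo "$short"

# Seven hexadecimal digits give 16^7 possible values (about 268 million).
echo $((16*16*16*16*16*16*16))
```

Because the number is calculated from the content, the same content always gives the same number, on any machine.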

2.2.3 How to view commits

When a commit is made, there is effectively a snapshot of the complete project at that point. All the files in the project are available exactly as they were when the commit was made.

Any file in the project can be examined or reloaded from any commit (if it existed at the time of the commit). A series of commits forms a regression path back through time: anything from the entire project to just a single file (or even part of a file) can be restored to an earlier commit; similarly, once restored, it can be moved forward in time to a later commit.

To explain further: when I say that each commit is a snapshot of the entire project, I don’t mean that each commit stores a fresh copy of every file in the project; it doesn’t. The new information stored by a particular commit is the files that were added or modified by it. The commit also contains a link to any preceding commits and information about the directory structure at the time of the commit. This allows Git to determine exactly what the state of the entire project was at the time of the commit.

We don’t need to know the exact ins and outs of how Git manages its commits—as far as we are concerned each commit is a snapshot of the entire project.
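As a rough sketch of what moving a file back and forward through time looks like on the command line (the temporary directory, the two-line stand-in for index.html and the identity values are assumptions for illustration; HEAD is Git’s name for the latest commit and HEAD~1 for the one before it):

```shell
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# Two commits: the second removes a line from index.html.
printf 'first\nsecond\n' > index.html
git add index.html
git commit --quiet -m "initial commit"
printf 'first\n' > index.html
git add index.html
git commit --quiet -m "index.html proof reading correction"

# Examine the file as it was one commit ago, without changing anything.
git show HEAD~1:index.html

# Restore the working copy of the file to that earlier commit...
git checkout HEAD~1 -- index.html
old="$(cat index.html)"

# ...and move it forward in time again to the latest commit.
git checkout HEAD -- index.html
new="$(cat index.html)"
```

Only index.html is touched here; the history itself is unchanged, which is why the file can be moved forward again afterwards.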

2.2.4 Committing changes: best practice

The best practice is to commit often.

Some guidelines suggest you should only commit work that has reached some defined state (i.e. don’t commit half done work). I don’t agree with this view.

There is no harm in committing unfinished work (in fact there are benefits, it keeps it safe). Committing work because it’s home time is as good a reason as any for making a commit.

Where work is in progress and you make a commit, it’s often best to start your commit message with something that indicates this. I use incremental build; this tells me that I made a commit, but that the particular commit was a work in progress.
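A minimal sketch of this convention on the command line. The text after the incremental build prefix is a made-up description, and the temporary directory and identity values are likewise assumptions for illustration.

```shell
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.com"

# Commit half-done work, flagged as such at the start of the message.
echo '<h1>Half-finished page</h1>' > index.html
git add index.html
git commit --quiet -m "incremental build: home page heading only"

# Later, pick out the work-in-progress commits by their message prefix.
wip="$(git log --oneline --grep='^incremental build')"
echo "$wip"
```

A consistent prefix costs nothing to type and makes the work-in-progress commits easy to find afterwards.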

Don’t be afraid to make commits (don’t be afraid of commitment my son—well, just a little bit—like my old Dad used to say “don’t get married until you’re over thirty”. He also said “stay away from leggy blonds”—I screwed up on both counts). The GitHub mantra is commit early commit often—not sure what they have to say about leggy blonds, it doesn’t seem to feature in their documentation.
