2

Git, the concept

2.2

Version control with Git

Git is a version control system. It tracks changes in all the files contained within a project. The Git terminology for a project is a repository, but this terminology is itself a bit confusing; the structure looks like this (Figure 2.1).

Figure 2.1   Git repository structure

The work­ing copy, stag­ing area and the local repos­i­tory are gen­er­ally col­lec­tively re­ferred to as a repos­i­tory (some­times just repo) or some­times a pro­ject (I tend to use both in this doc­u­ment). All three of these are stored lo­cally on the ma­chine you are using.

The remote repository is a copy of the local repository that is stored elsewhere, either on a server or, in the case of this publication, on the GitHub website.

A note on terminology

Git terminology is a bit confusing. There is the repository, which generally refers to the whole structure (the working copy, the staging area and the local repository). Used in this sense, repository means the whole project (a project is always held in a folder on your machine and it contains everything: all the files and folders that make up the project, the local repository and all the other bits that go with it).

Then there is the local repository. This is a database (sort of) of all the versions of every tracked file, all the metadata (such as change logs, change statements &c.) and everything else the VCS needs. The local repository lives in its own folder within the main project folder (with Git it lives in a .git folder; note the leading full stop).

Then there is the remote repository. This is a copy (sort of) of the local repository on some remote server; in our case it is on the GitHub website, but it could be a server in your office. I say sort of when calling it a copy because, if other people are working on the same project, it might hold more than just what is in your local repository. The remote repository is generally considered to be the master repository.

To avoid confusion I always refer to:

the repository or project

When referring to the whole project

the local repository

When referring to just the tracked files that have been committed to the local repository itself (not working copies or staged area files)

the remote repository

When referring to just the remote repository stored on GitHub (or at least a repository not on the local machine)

Let’s for­get the re­mote repos­i­tory for the time being and con­cen­trate on just the local ma­chine; we’re left with Fig­ure 2.2:

Figure 2.2   Git repository structure on a local machine

So how does this all work? Well, that’s next.

2.2.1

Working, staging and the local repository—how it works

Let’s start with a very sim­ple ex­am­ple, a sin­gle page web­site with a pic­ture; I’m going to call it lab-01-web­site.

On my ma­chine I keep all my Git repos­i­to­ries under a sin­gle di­rec­tory, that di­rec­tory is on my D: drive and is called 2500 Git Pro­jects. Like this:

D:\2500 Git Pro­jects

Yes I num­ber my di­rec­to­ries, yes it’s em­bar­rass­ing, but I am an en­gi­neer—we like to num­ber things—I dis­cuss this pec­ca­dillo fur­ther here.

Next we need a di­rec­tory to keep the repos­i­tory we’re cre­at­ing in; this will be called lab-01-web­site and will live under the 2500 Git Pro­jects di­rec­tory, thus:

D:\2500 Git Pro­jects\lab-01-web­site

So far so good, we’ve cre­ated a new di­rec­tory and it’s com­pletely empty.

Now we have to tell Git that it is a new repos­i­tory.

  • I’m not going to explain exactly how to do this yet, this is a high level discussion of how Git works and I want to explain it from the point of view of doing it through Brackets and we haven’t covered this yet. The terms I use, like initialise, are valid though and you will see them in Brackets when we come to look at it there. I show some of the more common Git commands in the sidebar.

We do this by ini­tial­is­ing the repos­i­tory. Once ini­tialised, the repos­i­tory will con­tain a new hid­den folder called .git. It is this folder (cre­ated by Git it­self) that holds the local repos­i­tory. On my sys­tem it looks like this, Fig­ure 2.3:

Figure 2.3   Git repository directory structure

This is now a Git repos­i­tory, it hasn’t got much in it, but it is ready to go.
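For anyone curious about what initialising looks like outside Brackets, here is a minimal command-line sketch. It assumes Git is installed and a POSIX-style shell (Git Bash on Windows would do); the scratch directory created by mktemp is my stand-in for D:\2500 Git Projects and is not part of the original example.

```shell
# Scratch location (an assumption); the text uses D:\2500 Git Projects\lab-01-website
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo"
cd "$repo"

# Initialise: this creates the hidden .git folder that holds the local repository
git init -q

# List everything, hidden entries included; .git should be the only thing here
ls -A
```

At this point the directory is a repository in the sense used above: empty apart from the .git folder.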

The .git folder—a golden rule

The .git folder is a hidden folder in the root directory of the repository. It contains all that is important: the local repository, any staged files, all the metadata associated with the repository (change records, logs &c.) and all the important bits for a tracked project.

there is one golden rule concerning the .git folder
DON’T FUCK WITH IT

Best thing is, don’t even look inside it. If you delete it, you delete everything,
if you change it, everything gets screwed up

If we ignore the .git folder (and this is always the best thing to do, see above), everything else in the lab-01-website folder is our working copy. We can do what we like here: create files and directories, delete them, modify them, rename and move them. Anything we like, and at the minute Git will ignore everything we do.

So let’s start, let’s cre­ate a folder struc­ture for our pro­ject and add some files to it. I want it to look like this:

Figure 2.4   lab-01-website repository structure

Create the files and folders in whatever manner you like (Notepad and Windows Explorer will do), or download the finished article here.

Hav­ing done all this, we are free to mod­ify these files as much as we like. Git is ig­nor­ing the lot of them. It knows they’re there, but it isn’t track­ing them in any way. In Git par­lance, these are un­tracked files; it also knows that there are three of them:

  • index.​html

  • 11-re­sources\01-css\style.​css

  • 11-resources\02-images\logo.png
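The untracked state described above can be seen from the command line. This is only a sketch under a few assumptions: a scratch directory stands in for the real project folder, the files are empty placeholders (logo.png is not a real image), and the image folder name is taken from the img src used in Code 2.1.

```shell
# Scratch repository with the same layout as Figure 2.4 (location is an assumption)
repo="$(mktemp -d)/lab-01-website"
mkdir -p "$repo/11-resources/01-css" "$repo/11-resources/02-images"
cd "$repo"
git init -q

# Empty placeholder files; their content does not matter for this check
: > index.html
: > 11-resources/01-css/style.css
: > 11-resources/02-images/logo.png

# Short status; --untracked-files=all lists each file rather than just the top folder
git status --short --untracked-files=all
```

Each of the three files should be listed with a ?? marker, which is how the short status format shows an untracked file.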

Let’s modify one of these files, index.html. I’ve added some very basic code (Code 2.1).

index.html
  1. <html lang="en">                      <!-- Declare language -->
  2.     <head>                            <!-- Start of head section -->
  3.         <meta charset="utf-8">         <!-- Use Unicode character set  -->
  4.  
  5. <link rel="stylesheet" type="text/css" href="11-resources/01-css/style.css">
  6.  
  7.         <title>PracticalSeries: Git Lab</title>   
  8.     </head>
  9.  
  10.     <body>
  11.         <h1>A Practical Series Website</h1>
  12.        
  13.         <figure class="cover-fig">
  14.             <img src="11-resources/02-images/logo.png" alt="cover logo">
  15.         </figure>
  16.        
  17.         <h3>A note by the author</h3>
  18.        
  19. <p>This is my second Practical Series publication—this one happened by accident too. The first publication is all about building a website, you can see it here. This publication came about because I wanted some sort of version control mechanism for the first publication.</p>
  20.  
  21. <p>There are lots of different version control systems (VCS) out there; some are free, some are commercial applications just google it. If you do, you will find that Git and GitHub show up again and again.</p>
  22.        
  23.     </body>
  24. </html>
Code 2.1   index.html

Now let’s say that we’ve finished index.html and we want to start tracking it. The first thing to do is move it to the staging area; we do this with the Git command add.

Noth­ing has ac­tu­ally hap­pened to the file, it’s still there in the work­ing area, how­ever, a copy of the file ex­actly as it was at the time of the add has been placed in the stag­ing area (we can’t see this, it’s in­side the .git folder and we don’t go there).

Once we add a file to the staging area, Git begins to track it; in Git parlance, it is now a file to be committed.

If we continued to modify the file in the working area, nothing would happen to the staged version of the file (files in the staging area are said to be staged). If we wanted to overwrite the file in the staging area with a modified working copy, we would need to add it again.

Git still isn’t prop­erly track­ing the index.​html file; it knows we’ve done some­thing to it, but we still haven’t told it to put the file in its repos­i­tory under full ver­sion con­trol.
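The way a staged copy stays frozen while the working copy moves on can be sketched on the command line. The scratch directory and the one-line file contents are placeholders of mine, not the real index.html.

```shell
repo="$(mktemp -d)/lab-01-website"   # scratch location (an assumption)
mkdir -p "$repo"
cd "$repo"
git init -q

echo '<h1>A Practical Series Website</h1>' > index.html
git add index.html        # stage a copy of index.html exactly as it is right now

# Carry on editing the working copy after the add
echo '<h3>A note by the author</h3>' >> index.html

# Two status columns: the first is the staging area, the second the working copy
git status --short        # "AM index.html": staged, then modified again

git add index.html        # add again to overwrite the staged copy
git status --short        # "A  index.html": staged copy matches the working copy
```

The first status shows that the later edit has not reached the staging area; only the second add brings the staged version up to date.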

Now let’s mod­ify the style.​css file, add the fol­low­ing to it:

style.css
* {
    margin: 0;
    padding: 0;
    box-sizing: border-box;
    position: relative;
}

html {
    background-color: #fbfaf6;      /* Set cream page bkgrd */
    color: #404030;
    font-family: serif;
    font-size: 26px;
    text-rendering: optimizeLegibility;
}

body {
    max-width: 1276px;
    margin: 0 auto;
    background-color: #fff;         /* make content area bkgrd white */
    border-left: 1px solid #ededed;
    border-right: 1px solid #ededed;
}

h1, h2, h3, h4, h5, h6 {            /* set standard headings */
    font-family: sans-serif;
    font-weight: normal;
    font-size: 3rem;
    padding: 2rem 5rem 2rem 5rem;
}
h3 { font-size: 2.5rem; }

.cover-fig {                        /* holder for cover image */
    width: 50%;
    margin: 2rem auto;
    padding: 0;
}
.cover-fig img { width: 100%; }     /* format cover image */

p {                                 /* TEXT STYLE - paragraph */
    margin-bottom: 1.2rem;          /* THIS SETS PARAGRAPH SPACING */
    padding: 0 5rem;
    line-height: 135%;
}
Code 2.2   style.css

The website actually looks like this, Figure 2.5 (not bad for two minutes’ work).

Figure 2.5   lab-01-website

Now, the index.html file is already in the staging area; the next thing to do is add the style.css and the logo.png. This is done with another add.

Our repository now looks like this:

Figure 2.6   Repository with staged files

All the files are now in the staging area. The staging area acts as a collection area for files that we want to put into the local repository. It allows multiple files to be collected together and added to the local repository in one go. It means we can just have one message for the whole thing (we don’t have to enter separate messages for each file).

Files are sent from the staging area to the local repository with a commit instruction. When a commit is executed, a specific message must be entered; in this case, the message is initial commit (with the Git command line, the message can be entered as part of the command line syntax, as shown in the sidebar).
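Collecting several files and committing them under one message might look like the sketch below. The scratch directory, the placeholder file contents and the author identity passed with -c are all assumptions, there only so the example runs on a machine with no Git configuration.

```shell
repo="$(mktemp -d)/lab-01-website"   # scratch location (an assumption)
mkdir -p "$repo/11-resources/01-css" "$repo/11-resources/02-images"
cd "$repo"
git init -q

# Placeholder content, not the real files
echo '<h1>A Practical Series Website</h1>' > index.html
echo 'html { font-family: serif; }' > 11-resources/01-css/style.css
: > 11-resources/02-images/logo.png

git add index.html
git add 11-resources      # a directory path stages everything beneath it

# One commit, one message, for everything in the staging area
git -c user.name="Example Author" -c user.email="author@example.com" \
    commit -q -m "initial commit"

git status                # should report that there is nothing to commit
```

Recent versions of Git word the final report as working tree clean rather than working directory clean; the meaning is the same.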

Now we have this:

Figure 2.7   Repository with committed files

The stag­ing area is empty (all its files have been com­mit­ted to the local repos­i­tory).

The working copy and the local repository now contain exactly the same versions of the files; in Git parlance, there is nothing to commit, working directory is clean†1.

†1 See, geeks — what it means is there is no difference between the working copy and the files in the repository, so there is nothing to commit.

Let’s make an­other mod­i­fi­ca­tion to index.​html. We’ll delete the sec­ond para­graph:

index.html
  1. <html lang="en">                      <!-- Declare language -->
  2.     <head>                            <!-- Start of head section -->
  3.         <meta charset="utf-8">         <!-- Use Unicode character set  -->
  4.  
  5. <link rel="stylesheet" type="text/css" href="11-resources/01-css/style.css">
  6.  
  7.         <title>PracticalSeries: Git Lab</title>   
  8.     </head>
  9.  
  10.     <body>
  11.         <h1>A Practical Series Website</h1>
  12.        
  13.         <figure class="cover-fig">
  14.             <img src="11-resources/02-images/logo.png" alt="cover logo">
  15.         </figure>
  16.        
  17.         <h3>A note by the author</h3>
  18.        
  19. <p>This is my second Practical Series publication—this one happened by accident too. The first publication is all about building a website, you can see it here. This publication came about because I wanted some sort of version control mechanism for the first publication.</p>
  20.        
  21.     </body>
  22. </html>
Code 2.3   Second modification to index.html

I’ve deleted lines 20 and 21 from the orig­i­nal file.

The web­site now looks like this:

Figure 2.8   Modified website

Let’s also say that this is the only change we want to make. The next thing is to add index.​html to the stag­ing area:

Figure 2.9   Modified files staged

Git would now re­port the sta­tus of index.​html as mod­i­fied and staged.

And then we com­mit it with the mes­sage index.​html proof read­ing cor­rec­tion.

And this time we have Fig­ure 2.10:

Figure 2.10   Second modification committed

There are now two changes stored in the local repos­i­tory.
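The whole modify, add, commit cycle for the second change can be sketched as follows. Again the scratch directory, the two-paragraph placeholder file and the author identity are assumptions of mine; the commit messages are the ones used in the text.

```shell
repo="$(mktemp -d)/lab-01-website"   # scratch location (an assumption)
mkdir -p "$repo"
cd "$repo"
git init -q

# Helper so the sketch runs without any Git configuration (placeholder identity)
commit() { git -c user.name="Example Author" -c user.email="author@example.com" commit -q -m "$1"; }

printf '<p>First paragraph</p>\n<p>Second paragraph</p>\n' > index.html
git add index.html
commit "initial commit"

printf '<p>First paragraph</p>\n' > index.html   # delete the second paragraph
git status --short        # " M index.html": modified, not yet staged
git add index.html
git status --short        # "M  index.html": modified and staged
commit "index.html proof reading correction"

git log --oneline         # newest first, one line per commit
```

The log should show two lines, each starting with a short commit number followed by its message.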

Get the idea?

2.2.2

Commit version numbers

You can see from this that we can keep changing things and we just get more entries in the local repository. Git knows what the whole project looks like at any point in time. In the above example, Git knows that there are three files in the project; it knows that the latest versions of style.css and logo.png are from the first entry in the repository and that the latest version of index.html is in the second revision.

Call­ing these re­vi­sions first re­vi­sion and sec­ond re­vi­sion is ok if there is one per­son work­ing on the pro­ject in a lin­ear fash­ion. But Git is de­signed to cater for much more com­pli­cated arrange­ments—and it does it by num­ber­ing the changes in a very dif­fer­ent way.

Look again at the local repos­i­tory shown in Fig­ure 2.10, I’ve re­pro­duced it below:

Figure 2.11   Local repository with two commits

It shows two entries, and each entry has a funny seven-digit number (shown in green). The first commit has [37bb05a] and the second has [88934e8]. These are effectively the version numbers. They are unique, but they are essentially just random numbers.

A note on commit numbers

These commit numbers are of course not random numbers. They are a checksum calculated over all the files in a commit, plus a header that contains other information (the numbers of the commits that immediately preceded this one, plus some information about directory structures &c.).

A check­sum is ba­si­cally a func­tion ap­plied to the bi­nary value of every byte in a file that gives a re­pro­ducible fig­ure that can be used to check to see if two files are the same or to iden­tify data cor­rup­tion within a file.

The commit number used by Git is a checksum encoded using the SHA-1 algorithm (Secure Hash Algorithm 1). This produces a 20-byte (40-digit) hexadecimal number that uniquely identifies a commit. The commit number shown is just the first seven digits of the full commit number. This is usually enough to uniquely identify a commit (even on very large projects).

The first seven digits of a commit number give 268 million unique values; the full 20-byte number has 1.5×10⁴⁸ unique values (a similar number to the quantity of atoms that make up the Earth). These values also only apply within a repository (two different repositories can have the same commit number; they don’t interfere with each other).

The chance of a du­pli­cate 20 byte com­mit num­ber is van­ish­ingly small, even with just the first seven dig­its it won’t hap­pen on any pro­ject you are likely to be work­ing on.
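The full and shortened commit numbers, and the arithmetic behind the 268 million figure, can be checked with a short sketch. It assumes a default Git setup (which uses SHA-1); the scratch directory, file content and author identity are placeholders.

```shell
repo="$(mktemp -d)/lab-01-website"   # scratch location (an assumption)
mkdir -p "$repo"
cd "$repo"
git init -q
echo '<h1>A Practical Series Website</h1>' > index.html
git add index.html
git -c user.name="Example Author" -c user.email="author@example.com" \
    commit -q -m "initial commit"

full=$(git rev-parse HEAD)               # the full commit number
short=$(git rev-parse --short=7 HEAD)    # its first seven digits

echo "full:  $full (${#full} hexadecimal digits)"
echo "short: $short"

# Seven hexadecimal digits give 16^7 possible values
echo "16^7 = $((16*16*16*16*16*16*16))"
```

The last line works out to 268435456, which is where the 268 million comes from; the actual commit numbers will differ on every run because the commit time is part of what gets hashed.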

These com­mit num­bers are re­ferred to as ei­ther hash num­bers or SHA (pro­nounced shar to rhyme with bar) num­bers.

2.2.3

How to view commits

When a com­mit is made, there is ef­fec­tively a snap­shot of the com­plete pro­ject at that point. All the files in the pro­ject are avail­able ex­actly as they were when the com­mit was made.

Any file in the project can be examined or reloaded from any commit (if it existed at the time of the commit). A series of commits forms a regression path back through time. Anything from the entire project to just a single file (or even part of a file) can be restored to an earlier commit; similarly, once restored, it can be moved forward in time to a later commit.

To explain further: when I say that each commit is a snapshot of the entire project, I don’t mean that each commit stores a fresh copy of every file in the project; it doesn’t. The new information stored for a particular commit is the files that were added or modified by that commit. The commit also contains a link to any preceding commits and information about the directory structure at the time of the commit. This allows Git to determine exactly what the state of the entire project was at the time of the commit.

We don’t need to know the exact ins and outs of how Git man­ages its com­mits—as far as we are con­cerned each com­mit is a snap­shot of the en­tire pro­ject.
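Examining a file from an earlier commit, restoring it, and then moving it forward again can be sketched on the command line. The scratch directory, the one-line file contents and the author identity are assumptions; git show and git checkout are used here as one way of doing it, not necessarily the way Brackets does it.

```shell
repo="$(mktemp -d)/lab-01-website"   # scratch location (an assumption)
mkdir -p "$repo"
cd "$repo"
git init -q

# Helper so the sketch runs without any Git configuration (placeholder identity)
commit() { git -c user.name="Example Author" -c user.email="author@example.com" commit -q -m "$1"; }

echo 'version one' > index.html
git add index.html
commit "initial commit"
first=$(git rev-parse --short HEAD)

echo 'version two' > index.html
git add index.html
commit "second commit"
second=$(git rev-parse --short HEAD)

# Examine the file as it was at the first commit; the working copy is untouched
git show "$first:index.html"

# Restore just this one file to the first commit
git checkout -q "$first" -- index.html
cat index.html

# Move it forward in time again to the later commit
git checkout -q "$second" -- index.html
cat index.html
```

The three outputs should read version one, version one and version two in that order.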

2.2.4

Committing changes: best practice

The best prac­tice is to com­mit often.

Some guide­lines sug­gest you should only com­mit work that has reached some de­fined state (i.e. don’t com­mit half done work). I don’t agree with this view.

There is no harm in com­mit­ting un­fin­ished work (in fact there are ben­e­fits, it keeps it safe). Com­mit­ting work be­cause it’s home time is as good a rea­son as any for mak­ing a com­mit.

Where work is in progress and you make a commit, it’s often best to start your commit message with something that indicates this. I use incremental build; this tells me that I made a commit, but that the particular commit was a work in progress.

Don’t be afraid to make com­mits (don’t be afraid of com­mit­ment my son—well, just a lit­tle bit—like my old Dad used to say “don’t get mar­ried until you’re over thirty”. He also said “stay away from leggy blonds”—I screwed up on both counts). The GitHub mantra is com­mit early com­mit often—not sure what they have to say about leggy blonds, it doesn’t seem to fea­ture in their doc­u­men­ta­tion.


