% afterhours3.tex - After Hours Week 3
\chapter{After Hours Week 3}
\section{``A Closer Look At Diffs and Tags''}
\subsection{The Diff Utility}
\index{diff!algorithm}We learnt in \emph{Week 3} how to work with a diff, and what a diff actually represents.
It is interesting to note how old the \texttt{diff} utility actually is and how it works.
The diff algorithm was developed in the early 1970s by Douglas McIlroy, who wrote the original \texttt{diff} utility, and James Hunt; their research was published in 1976.
That algorithm has become known as Hunt--McIlroy, after the research paper's authors. Modern tools, Git included, default to a later algorithm by Eugene Myers, but the problem being solved is the same.
In essence, calculating a diff means finding the differences between two files on a line-by-line basis.
Mathematically, this can be described as the LCS or Longest Common Subsequence problem, which is a classic computer science problem.
Though we are not going to go into the problem in great detail, it is useful to know what actually happens at this level.
Essentially you have two sequences; for now we are going to simplify the problem and work on a string of letters.
Old string:
\begin{code}
a b c d e j k l m p r s
\end{code}
New string:
\begin{code}
a c d f g h i j n o p t u
\end{code}
The challenge is to find the longest sequence of items that is present, in the same order, in both of the strings above.
This new sequence is found by deleting items from the first and second strings until all that remains is a sequence of common items.
In our example case, this is as follows:
LCS string:
\begin{code}
a c d j p
\end{code}
Comparing this to each of our strings above, it is easy to generate a diff.
If the items are present in the \emph{Old string}, but not in the LCS string then they must have been deleted.
Conversely if they are present in the \emph{New string}, but not the LCS, then they must have been additions.
Putting this into practice in our example and marking deletions with \texttt{-} and additions with \texttt{+}, we get the following:
Diff string:
\begin{code}
b e fghi klm no rs tu
- - ++++ --- ++ -- ++
\end{code}
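The rule just described is mechanical enough to sketch in a few lines of Python. This is only an illustration and assumes the LCS is already known; the function name \texttt{diff\_from\_lcs} is our own and is not part of Git or \texttt{diff}.

```python
def diff_from_lcs(old, new, lcs):
    """Walk old and new in step with their LCS, returning (marker, item) pairs."""
    ops = []
    i = j = 0
    for common in lcs:
        # Items in old that come before the next common item were deleted.
        while old[i] != common:
            ops.append(("-", old[i]))
            i += 1
        # Items in new that come before the next common item were added.
        while new[j] != common:
            ops.append(("+", new[j]))
            j += 1
        # The common item itself is unchanged.
        ops.append((" ", common))
        i += 1
        j += 1
    # Whatever is left after the last common item is a deletion or an addition.
    ops.extend(("-", item) for item in old[i:])
    ops.extend(("+", item) for item in new[j:])
    return ops


old = "a b c d e j k l m p r s".split()
new = "a c d f g h i j n o p t u".split()
lcs = "a c d j p".split()

for marker, item in diff_from_lcs(old, new, lcs):
    if marker != " ":
        print(marker, item)
```

Each printed line is one change: a \texttt{-} for an item that only the \emph{Old string} has, and a \texttt{+} for an item that only the \emph{New string} has.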
The actual algorithm for generating the LCS and the subsequent diff is too complex to describe here and is out of the scope of this book.
This section was included to give you some idea of how Git performs some of its actions internally.
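For the curious, the textbook way to find an LCS is to fill in a small table of sub-problem answers. The sketch below is that classroom method, not the Hunt--McIlroy algorithm and not what Git actually runs; it would be far too slow for large files, and the function name \texttt{lcs} is our own.

```python
def lcs(old, new):
    """Return one longest common subsequence of two sequences."""
    rows, cols = len(old), len(new)
    # table[i][j] holds the length of the LCS of old[i:] and new[j:].
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows - 1, -1, -1):
        for j in range(cols - 1, -1, -1):
            if old[i] == new[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the items themselves.
    result = []
    i = j = 0
    while i < rows and j < cols:
        if old[i] == new[j]:
            result.append(old[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping an item of old loses nothing
        else:
            j += 1  # skipping an item of new loses nothing
    return result


old = "a b c d e j k l m p r s".split()
new = "a c d f g h i j n o p t u".split()
print(" ".join(lcs(old, new)))  # prints: a c d j p
```

The table needs one cell for every pair of items, which is exactly why real tools use smarter algorithms.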
\subsection{More about tags}
\index{tagging!internals}Tags can actually do a little more than just hold a single identifier to a specific commit.
A tag can also have a log message with it, similar to the commit objects we discussed earlier.
To do this, we pass the \texttt{-m 'message'} option to \texttt{git tag}.
This will allow us to supply a message to be stored along with the tag.
Let us see how this works in practice.
Firstly we can use the \texttt{git tag} command to show all tags that are currently in the repository.
\begin{code}
john@satsuki:~/coderepo$ git tag
v0.9
v1.0a
john@satsuki:~/coderepo$
\end{code}
Now let us create a new tag and give it some extra information.
\index{tagging!annotated}
\begin{code}
john@satsuki:~/coderepo$ git tag v1.0b -m 'This is an annotated tag'
john@satsuki:~/coderepo$
\end{code}
Unfortunately, Git was not particularly forthcoming with information on the creation of the tag.
That said, it is not difficult for us to use the \texttt{git show} command to see what we have done.
\begin{code}
john@satsuki:~/coderepo$ git show v1.0b
tag v1.0b
Tagger: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 23:55:50 2011 +0100
This is an annotated tag
commit a022d4d1edc69970b4e8b3fe1da3dccd943a55e4
Author: John Haskins <john.haskins@tamagoyakiinc.koala>
Date: Thu Mar 31 22:05:55 2011 +0100
Messed with a few files
diff --git a/my_second_committed_file b/my_second_committed_file
index 095b9cd..c9887f8 100644
--- a/my_second_committed_file
+++ b/my_second_committed_file
@@ -1,2 +1 @@
-Change1
-Change2
+Changed this file completely
diff --git a/my_third_committed_file b/my_third_committed_file
new file mode 100644
index 0000000..5d27866
--- /dev/null
+++ b/my_third_committed_file
@@ -0,0 +1 @@
+Addition to the line
john@satsuki:~/coderepo$
\end{code}
Notice that as well as showing us the tag itself, the \texttt{git show} command also gave us a diff of what exactly changed in this commit.
Small details like this are what make Git in particular a joy to use for developers.
If we found that we had actually created the tag incorrectly, we have two options: we could use the \texttt{-d} option to delete the tag, or we could use the \texttt{-f} option to forcibly replace an existing tag of the same name with different information.
However, please remember the warning about tags in \emph{Week 2}.
It is very dangerous to go changing tags, especially if you have already pushed them out somewhere other people can grab them from.
Now that we have learnt a little about how to play with tags, we should probably take a look under the hood.
This is an After Hours section after all.
The implementation of tags in Git is very simple indeed.
We are going to jump into the \texttt{.git} directory and take a look at a simple output from some commands.
\begin{code}
john@satsuki:~/coderepo$ cd .git/
john@satsuki:~/coderepo/.git$ ls
branches config HEAD index logs refs
COMMIT_EDITMSG description hooks info objects
john@satsuki:~/coderepo/.git$ cd refs/tags/
john@satsuki:~/coderepo/.git/refs/tags$ ls
v0.9 v1.0a v1.0b
john@satsuki:~/coderepo/.git/refs/tags$ cat v1.0a
a022d4d1edc69970b4e8b3fe1da3dccd943a55e4
john@satsuki:~/coderepo/.git/refs/tags$
\end{code}
Inside the \texttt{.git/refs/tags} folder, there is a file for every tag in our repository.
Each file contains a single string of characters.
That string looks oddly like a SHA-1 commit ID to me.
Using the tricks that we learnt in the last After Hours section, we can interrogate the Git repository, by throwing a few wrenches at it.
\begin{code}
john@satsuki:~/coderepo$ git cat-file -p a022d4
tree 96551f45496232c0ec6b389731d55fa3d7e1c8fd
parent 9938a0c30940dccaeddce4bb2eb151fba3a21ae5
author John Haskins <john.haskins@tamagoyakiinc.koala> 1301605555 +0100
committer John Haskins <john.haskins@tamagoyakiinc.koala> 1301605555 +0100
Messed with a few files
john@satsuki:~/coderepo$ git cat-file -t a022d4
commit
john@satsuki:~/coderepo$
\end{code}
Excellent! Just as we expected.
So Git stores the SHA-1 hash of the commit we are referring to in a file which has the same name as the tag.
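Reading such a file takes nothing more than ordinary file access, as the following Python sketch suggests. It is a simplification: it only looks for a tag stored as its own file under \texttt{refs/tags}, whereas Git can also keep refs in a single \texttt{packed-refs} file, which this sketch ignores. The helper name \texttt{read\_loose\_tag} is our own, and the directory is a throwaway stand-in rather than a real repository.

```python
from pathlib import Path
import tempfile


def read_loose_tag(git_dir, name):
    """Return the object ID stored in <git_dir>/refs/tags/<name>."""
    ref_file = Path(git_dir) / "refs" / "tags" / name
    return ref_file.read_text().strip()


# Mimic the layout in a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as fake_git_dir:
    tags_dir = Path(fake_git_dir) / "refs" / "tags"
    tags_dir.mkdir(parents=True)
    (tags_dir / "v1.0a").write_text("a022d4d1edc69970b4e8b3fe1da3dccd943a55e4\n")
    stored_id = read_loose_tag(fake_git_dir, "v1.0a")

print(stored_id)
```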
Just for clarity, let us run the same set of commands against our newly created annotated tag.
\begin{code}
john@satsuki:~/coderepo$ cd .git/refs/tags/
john@satsuki:~/coderepo/.git/refs/tags$ cat v1.0b
6cbcf47957589bf4b84cc934a26731636d021574
john@satsuki:~/coderepo/.git/refs/tags$
\end{code}
Hang on a minute! Shouldn't the contents of the \texttt{v1.0a} and \texttt{v1.0b} files be the same?
They were both created at the same point in the repository's history.
They were both supposed to be pointing to the same commit.
Let us use a little plumbing and see what is going on.
\begin{code}
john@satsuki:~/coderepo$ git cat-file -t 6cbcf4
tag
john@satsuki:~/coderepo$
\end{code}
Interesting.
So it seems as if there is another type of object that can be added to the list: the \emph{tag} object.
So what exactly is the difference here? To answer that we need to take a look at the difference between the two tags.
\texttt{v1.0a} was a simple tag, whereas \texttt{v1.0b} is an annotated tag.
When the tag was merely a pointer to a commit object, the repository needed to store nothing more in the file than the ID of the object it referred to.
Now that we have more information, Git treats it just as it would any other data, by creating an object.
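One way to picture the difference is as one extra hop on the way to the commit. The Python sketch below is a toy model only: the two dictionaries stand in for \texttt{.git/refs/tags} and the object store, the IDs are the ones from the transcripts in this section, and the function name \texttt{resolve\_to\_commit} is our own rather than anything Git provides.

```python
COMMIT_ID = "a022d4d1edc69970b4e8b3fe1da3dccd943a55e4"
TAG_OBJECT_ID = "6cbcf47957589bf4b84cc934a26731636d021574"

# Toy stand-in for .git/refs/tags: tag name -> object ID stored in the file.
tags = {
    "v1.0a": COMMIT_ID,      # simple tag: points straight at the commit
    "v1.0b": TAG_OBJECT_ID,  # annotated tag: points at a tag object
}

# Toy stand-in for the object store: object ID -> (type, what it refers to).
objects = {
    COMMIT_ID: ("commit", None),
    TAG_OBJECT_ID: ("tag", COMMIT_ID),
}


def resolve_to_commit(tag_name):
    """Follow a tag, through any tag objects, until a commit is reached."""
    object_id = tags[tag_name]
    kind, target = objects[object_id]
    while kind == "tag":
        object_id = target
        kind, target = objects[object_id]
    return object_id


print(resolve_to_commit("v1.0a"))
print(resolve_to_commit("v1.0b"))
```

Both names end up at the same commit; the annotated tag simply takes a detour through its own object.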
We can investigate this further and use the \texttt{-p} parameter to the \texttt{git cat-file} command to see exactly what is stored within this object.
\begin{code}
john@satsuki:~/coderepo$ git cat-file -p 6cbcf4
object a022d4d1edc69970b4e8b3fe1da3dccd943a55e4
type commit
tag v1.0b
tagger John Haskins <john.haskins@tamagoyakiinc.koala>
Thu Mar 31 23:55:50 2011 +0100
This is an annotated tag
john@satsuki:~/coderepo$
\end{code}
So an annotated tag contains more information than just the commit ID we are referring to.
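Because the tag object is plain text, it is easy to pick apart. The Python sketch below assumes the stored layout is a block of header lines, then a blank line, then the message. The sample text is reconstructed from the output above, with the tagger date written as a numeric timestamp that we converted by hand, so treat it as illustrative. The function name \texttt{parse\_tag\_object} is our own.

```python
def parse_tag_object(text):
    """Split a tag object's text into a dict of headers and the message."""
    header_block, _, message = text.partition("\n\n")
    headers = {}
    for line in header_block.splitlines():
        # Each header is a keyword, a space, then the value.
        key, _, value = line.partition(" ")
        headers[key] = value
    return headers, message.strip()


# Reconstructed sample; the timestamp is our own conversion of the date shown.
raw = (
    "object a022d4d1edc69970b4e8b3fe1da3dccd943a55e4\n"
    "type commit\n"
    "tag v1.0b\n"
    "tagger John Haskins <john.haskins@tamagoyakiinc.koala> 1301612150 +0100\n"
    "\n"
    "This is an annotated tag\n"
)

headers, message = parse_tag_object(raw)
print(headers["tag"], "->", headers["type"], headers["object"])
print(message)
```

The \texttt{object} and \texttt{type} headers are what tie the tag back to the commit, while the remaining lines carry the extra information a simple tag cannot hold.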
This is pretty handy and later on in the book we will start talking about topics like signed tags, which are required if you wish to verify the identity of the person claiming to have created the tag.
A shorter After Hours section this time.
Next time we will move on to taking a deeper look at what happens behind the scenes with branches and merging.
Stay tuned.