-
Notifications
You must be signed in to change notification settings - Fork 0
/
fulltrees.mlt
474 lines (363 loc) · 30.7 KB
/
fulltrees.mlt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
(* -*- compile-command: "ocamlbuild -classic-display fulltrees.pdf" -*- *)
open Prelude
open Ocamlmode
##plugin coqdoc
##verbatim '@' = coqdoc
##verbatim '#' = ocaml_code
(*** labels ***)
let s_algo = label ()
let s_ocaml = label ()
let s_coq = label ()
(*** doc ***)
let abstract = "Starting with an algorithm to turn lists into full trees which uses non-obvious invariants and partial functions, we progressively encode the invariants in the types of the data, removing most of the burden of a correctness proof.
The invariants are encoded using non-uniform inductive types which parallel numerical representations in a style advertised by Okasaki, and a small amount of dependent types."
let intro = "{section"Introduction"}
Starting with a list <#lst#>, we want to turn it into a binary tree <#tr#> of the following form (in Ocaml):
{display"<#type 'a tree =
| Node of 'a tree * 'a * 'a tree
| Leaf#>"}
With the constraints that <#lst#> must be the infix traversal of <#tr#> and that <#tr#> must be {emph"full"}, in the sense that every level except the last are required to be completely filled. Such a function turns, in particular, sorted lists into balanced binary search trees.
There are a number of folklore algorithms to achieve this result in linear time. Here we consider one of these algorithms, presented in Section~{ref_ s_algo}, which repeatedly pairs up trees of height $h$ in a list to form a list of trees of height $h+1$. Our interest in this algorithm sprouts from the fact that its correctness is not obvious; in particular the invariants are complex: the main loop operates on a list of length $2^k-1$ whose elements are alternately of two distinct forms.
In Sections~{ref_ s_ocaml} and~{ref_ s_coq} we show refinements of the algorithm where the invariants are pushed into the types, leading to a complete and short proof of correctness in Coq.
"
let algo = "{section"A balancing algorithm" ~label:s_algo}
We start by giving a first, simple, implementation of the balancing algorithm. The heart of the algorithm relies on using an alternating list of length ${exponent $2$ $k$}-1$, where odd-position elements are trees and even-position elements are labels, of type <#'a#> (indices starting from 1). A full tree of height $k$ can be decomposed into the first $k-1$ levels, containing ${exponent $2$ $k-1$} -1$ internal nodes, and the $k$th level, which contains both nodes and leaves. Thus, the ${exponent $2$ $k-1$} - 1$ labels in the alternating list will be used to label the internal nodes in the $k-1$ first levels of the balanced tree, while the ${exponent $2$ $k-1$}$ trees, all of height at most one at first, will constitute the level $k$.
Though we could encode labels as trees of height one in the alternating list, we rather use an appropriate type for the sake of readability:
{display"<#type 'a tree_or_elt =
| Elt of 'a
| Tree of 'a tree#>"}
We decompose the problem into two parts: computing an alternating list of length ${exponent $2$ $k$} - 1$ from an arbitrary list of labels, and then transforming this alternating list into a balanced tree. We first show how to solve the second part: turning an alternating list into a full tree.
Given an alternating list <#lst#>, by pairing the trees in <#lst#> using only one traversal of the list, we obtain an alternating list with exactly half as many trees. Each pairing requires two trees and one label used as a root. In order to build a list that is alternated, we also need a second label, that is kept as a single element. This explains why we consider at each step the four first elements of the list.
A single traversal, encoded by <#pass : 'a tree_or_elt list -> 'a tree_or_elt_list#>, reduces an alternating list of length ${exponent $2$ $k$} - 1 {geqslant} 3$ to an alternating list of length ${exponent $2$ $k-1$} -1$. By iterating this process using <#loop : 'a tree_or_elt list -> 'a tree#>, we reduce the original list to a list of length one, whose one element is a balanced tree <#t#> such that the infix traversal of <#t#> is the initial list.
{display"<#let join left node right = Tree (Node (left, node, right))
let rec pass = function
| Tree left :: Elt root :: Tree right :: Elt e :: others ->
join left root right :: Elt e :: pass others
| [Tree left; Elt root; Tree right] -> [join left root right]
| _ -> assert false
let rec loop = function
| [] -> Leaf
| [Tree t] -> t
| list -> loop (pass list)#>"}
Notice how the invariant that alternating lists have length $2^k-1$ is maintained: this is because, for $k {geqslant} 2$, we have $2^k - 1 = 4 {times} ({exponent $2$ $k-2$} - 1) + 3$, hence we obtain an alternating list of length $2 {times} ({exponent $2$ $k-2$} - 1) + 1 = {exponent $2$ $k-1$} - 1$.
It remains to show how to transform a list of labels of length $n$ into an alternating list of trees and labels. Each of the original trees has height zero or one: they are leaves or contain only one label. Because we want a list of length precisely $2^k -1$, for $k = 1 + {between `Floor ${index log_ $2$} n$}$, it means we need $2^k - 1 - n$ leaves. This quantity is computed as the variable <#missing#>. The function <#pad#> computes the alternating list by creating as many leaves as needed, alternating them with elements, and once enough leaves are created, promotes all the odd-position labels into trees.
{display"<#let complete list =
let n = List.length list in
let rec pow2 i = if i <= n then pow2 (2*i) else i in
let missing = (pow2 1) - n - 1 in
let rec pad missing = function
| head::tail when missing <> 0 ->
Tree Leaf :: Elt head :: pad (missing - 1) tail
| odd::even::others -> join Leaf odd Leaf :: Elt even :: pad 0 others
| [single] -> [join Leaf single Leaf]
| [] -> []
in
pad missing list
#>"}
The balancing algorithm <#balance: 'a list -> 'a tree #> is thus given by the composition of <#loop#> with <#complete#>:
{display"<#let balance list = loop (complete list)#>"}
As for the complexity of this algorithm, notice that <#pass#> and <#complete#> are both clearly in linear-time in the length of the lists on which they work, while <#loop#> recurses on lists whose length are halved at each recursive step. Hence <#balance#> is a linear-time algorithm.
"
let ocaml = "{section"Removing partial functions" ~label:s_ocaml}
The <#loop#> function of Section~{ref_ s_algo} relies on the invariant that the <#list#> argument has length $2^k-1$. Additionally, all the odd-position values must be of the form <#Tree t#>, whereas all the even-position values must be of the form <#Elt x#>. If either of these invariants is broken, we would run into the <#assert false#> of <#pass#>.
It is not immediately apparent that these properties hold. If it does not take a tremendous effort to convince oneself that the <#balance#> function of Section~{ref_ s_algo} is indeed correct, a direct mechanically checked proof would not be very practical.
{subsection"Length invariants"}
Our goal in this section is to avoid resorting to <#assert false#>. In addition to making sure that <#balance#> indeed terminates with a value, it will make it considerably simpler to implement the balancing algorithm in Coq in Section~{ref_ s_coq}.
To achieve this goal, it is necessary to have more precise types.
Let us focus first on the length invariants: we will need to define a type which contains exactly the non-empty lists of length $2^k-1$.
A data structure which holds $2^k-1$ elements brings complete binary trees to mind. Even if it is possible -- though not necessary convenient -- to represent complete binary trees in Ocaml, they are not the appropriate structure. First, because complete binary trees are full trees and are, hence, unlikely to serve as a useful intermediate data structure to build a full tree. Second because there is a simpler -- albeit more exotic -- alternative.
Indeed, lists can be seen as decorated unary numbers: there is an element at each successor. Different kinds of lists can be obtained, more or less systematically, by varying the numerical representation. This idea goes back to Guibas \& al. in~{cite"Guibas1977"} and a fairly thorough exploration can be found in Okasaki~{cite"Okasaki1999"~extra:"Chapters 9\&10"}. In the simplest cases, the analogous list structure corresponds to a structurally recursive exponentiation algorithm. For regular lists, a list of size $n$ whose elements have type $a$ can be recursively defined with the following equations:
{displaymath begin just_left `Brace (array [`L;`L;`L] [
array_line [${exponent $a$ $0$}$;$=$;$1$];
array_line [${exponent $a$ $n+1$}$;$=$;$a{times}a^n$];
]) end}
Replacing unary numbers with binary numbers, we obtain the binary exponentiation algorithm:
{displaymath begin just_left `Brace (array [`L;`L;`L] [
array_line [exponent $a$ $2^0-1$; $=$ ; $1$];
array_line [exponent $a$ $2n$; $=$; ${exponent $(a^2)$ $n$}$];
array_line [exponent $a$ $2n+1$; $=$; $a{times}{exponent $(a^2)$ $n$}$];
]) end}
Okasaki~{cite"Okasaki1999"~extra:"Chapter 10"} uses a non-uniform inductive type to encode the latter exponentiation algorithm into a type of lists he calls {emph"binary lists"}. We are only interested in lists of length $2^k-1$, that is a length written only with the digit $1$ in binary representation. So following Okasaki, but skipping the second equation above (which corresponds to the digit $0$) we define the following non-uniform inductive type, which we call {emph"power lists"}:
{display"<#module PowerList = struct
type 'a t =
| Zero
| TwicePlusOne of 'a * ('a*'a) t
end#>"}
This type actually appears in Okasaki~{cite"Okasaki1999"~extra:"Chapter 10"} as an introduction to non-uniform binary lists. Relatedly, Okasaki~{cite"Okasaki1999a"} leverages a tail-recursive binary exponentiation algorithm to define a type capturing precisely square matrices; on the other hand, Myers~{cite"Myers1983"} introduced a flavour of list based on {emph"skew binary numbers"} which are not easily captured as exponentiation.
Although the power lists may look like some sort of trees, it is not a very accurate depiction. The easiest way to picture how power lists works is to see <#TwicePlusOne#> as a fancy <#(::)#>, then the lists with, respectively, $1$, $3$, $7$, and $15$ elements are as follows:
{itemize [
"<#[1]#>";
"<#[1;(2,3)]#>";
"<#[1;(2,3);((4,5),(6,7))]#>";
"<#[1;(2,3);((4,5),(6,7));(((8,9),(10,11)),((12,13),(14,15)))]#>";
]}
Elements appear in order, like in a regular list, but they are packed twice as tightly after each <#TwicePlusOne#>.
Just like with regular lists, there is a {emph"map"} function for power lists. Due to the non-uniformity it is a little trickier{footnote"The type annotation on <#PowerList.map#> informs Ocaml that <#map#> is a non-uniform recursive function. Without the type annotation, Ocaml simply assumes that <#map#> is uniformly recursive and fails to typecheck since <#f#> and <#f'#> have different types."} than the regular list map: in the recursive steps, the argument function <#f#> needs to process two consecutive elements at a time.
{display"<#module PowerList = struct
#{module_elipsis}#
let rec map : 'a 'b. ('a->'b) -> 'a t -> 'b t = fun f -> function
| Zero -> Zero
| TwicePlusOne (elt,lst) ->
let f' (x,y) = f x , f y in
TwicePlusOne (f elt , map f' lst)
end#>"}
{subsection"Alternation"}
In Section~{ref_ s_algo}, labels are separated from trees dynamically. The <#pass#> function verifies that trees and labels are interleaved properly, and fails if they are not.
In this section, instead, we consider a variant of <#'a PowerList.t#> where every odd position contains a tree, and every even position contains an element. More generally, we define a type <#('odd,'even) AlternatingPowerList.t#> where odd positions have type <#'odd#>, and even positions have type <#'even#>. Such a list should have the following pattern:
{itemize [
"<#['odd;('even,'odd);(('even,'odd),('even,'odd))]#>";
]}
After the first element, which must have type <#'odd#>, there is no difference between even and odd positions: indeed, excluding the first element, we are actually building an <#('even*'odd) PowerList.t#>. Hence the definition:
{display"<#module AlternatingPowerList = struct
type ('odd,'even) t =
| Zero
| TwicePlusOne of 'odd * ('even*'odd) PowerList.t
end#>"}
For brevity, let us write <#PL#> and <#APL#> for <#PowerList#> and <#AlternatingPowerList#> respectively.
Using these alternating power lists, we can define a version of the <#pass#> function free of <#assert false#>. Indeed, consider an alternating power list of length at least $3$: it is of the form <#APL.TwicePlusOne (a, PL.TwicePlusOne ((b,c), lst))#>, where <#lst#> has type <#(('even*'odd)*('even*'odd)) PowerList.t#>. The <#pass#> function of Section~{ref_ s_algo}, as it happens, manipulates its arguments by groups of four elements: basically, <#pass#> is simply a <#map#> over <#lst#>.
(* <#pass:'a tree -> ('a*'a tree) -> (('a*'a tree)*('a*'a tree)) PL.t -> ('a tree,'a) APL.t#> *)
We hence define the function <#pass#>
which joins the trees in a list of length ${exponent $2$ $k+2$}-1${footnote"To ensure that its argument list has at least three elements, <#pass#> takes the three first elements as extra arguments. In other words <#pass t (x,s) l#> is meant to be read as <#pass (APL.TwicePlusOne (t , PL.TwicePlusOne ((x,s),l)))#>."}, producing a list of length ${exponent $2$$k+1$}-1$. The function <#loop#> is virtually unchanged from Section~{ref_ s_algo}, except it acts on power lists:
{display"<#
let pass left (root,right) apl =
let join_up ((single,left),(root,right)) =
single, Node (left,root,right)
in
APL.TwicePlusOne ( Node (left,root,right) , PL.map join_up apl)
let rec loop : 'e. ('e tree,'e) APL.t -> 'e tree = function
| APL.Zero -> Leaf
| APL.TwicePlusOne (tree,PL.Zero) -> tree
| APL.TwicePlusOne (tree,PL.TwicePlusOne (pair,apl)) ->
loop (pass tree pair apl)#>"}
{subsection"Padding"}
Now that there is no more <#assert false#> in the code of <#loop#>, we need to change the <#complete#> function of Section~{ref_ s_algo} so that it returns an <#('a tree,'a) APL.t#> rather than a list. The heart of this section is a function which turns an <#'a list#> into an <#('a*'a tree) PL.t#>. The final function, which produces an <#('a tree,'a) APL.t#> is a simple wrapper around the former.
We want to turn a list <#lst#> of length $n+1$ into a pair of its first element, converted into a tree, plus a power list of length $2{times}(2^k-1){geqslant} n$ representing its tail <#tail#>. Each element of the power list is a pair, whose first term is an element, and its second term is a tree of height at most one. In particular, the length of the returned power list is always even, so if <#tail#> has odd length, we will need to insert at least a <#Leaf#>. This suggests that we may inspect the parity of the length of <#tail#>, and insert an extra element precisely when it is odd. This leads to a slightly different padding procedure than that of Section~{ref_ s_algo}, in particular the leaves are not inserted at the same position, but it is inconsequential.
An <#'a list#> of even length can be turned into an <#('a*'a) list#> whose length is halved. This turns out to be interesting for our recursion, since it mimics the inductive step of power lists. Also, in the case of even length, we need to distinguish the empty case from the non-empty case: the former will be turned into the empty power list <#APL.Zero#> while the latter will be turned into a power list of the form <#APL.TwicePlusOne((x,y),l)#>, where <#x#> and <#y#> correspond to the two first elements of <#tail#>. These different cases are represented in the following view:
{display"<#type 'a parity =
| Empty
| Odd of 'a * ('a*'a) list
| Even of ('a*'a) * ('a*'a) list
let pair_up lst =
let succ elt = function
| Empty -> Odd (elt, [])
| Odd (b,pairs) -> Even ((elt,b), pairs)
| Even (bc,pairs) -> Odd (elt, bc::pairs)
in
List.fold_right succ lst Empty#>"}
The padding function itself, <#of_list#>, is at first sight far from intuitive. Let us recall that we want to turn a list of labels of arbitrary length, into a power list of pairs. A label can be thought of as a bit of weight $2^0$, while a pair of labels would be a bit of weight $2^1$, and so on. At first, all our bits have weight $2^0$ and consists in one label each. We can build bits of higher weight by pairing up two bits of the same weight. A bit made up only of labels is called {emph"pure"}. We can also double the weight of a bit by interlacing leaves with it (with the function <#pad#>), but this gives a bit made of pairs of labels and trees, call them {emph"impure"}. Lastly, we can also transmute a pure bit into an impure bit of the same weight (with the function <#coerce#>), by replacing odd-position labels by trees of height one.
Each recursive step consists in taking a list of pure bits of the same weight $2^k$, and outputing exactly one impure bit of size ${exponent $2$ $k+1$}$, plus a list of pure bits of weight ${exponent $2$ $k+1$}$, which is converted recursively. We thus obtain, successively, one bit of each weight from $2^1$ to $2^l$, for some $l$, encoding a list of length ${exponent $2$ $l+1$} - 2$, as expected.
At any recursive step, suppose first that the number of bits of weight $2^k$ is odd. As we need to compute only bits of weight ${exponent $2$ $k+1$}$, one of them impure, we are forced to use <#pad#> on one bit, and to pair up the others. Suppose now that the number of bits of weight $2^k$ is even. In that case, we can pair them all into bits of weight ${exponent $2$ $k+1$}$, and then use <#coerce#> on one of them to make the impure bit.
The last difficulty is that <#pad#> and <#coerce#> both depend on the current weight of the bits, hence we need to update them at each recursive step. <#pad#> must add leaves between every two consecutive labels, in even positions, while <#coerce#> must upgrade every even-position label into a tree of height one. This leads to the following definition:
{display"<#module PowerList = struct
#{module_elipsis}#
let rec of_list : 'a 'b. ('a->'b) -> ('a*'a->'b) -> 'a list -> 'b t =
fun pad coerce bits ->
let pad' (x,y) = (pad x, pad y) in
let coerce' (x,y) = (coerce x, coerce y) in
match pair_up bits with
| Empty -> Zero
| Odd (a,pures) -> TwicePlusOne (pad a, of_list pad' coerce' pures)
| Even (ab,pures) ->
TwicePlusOne (coerce ab, of_list pad' coerce' pures)
end#>"}
With that function, we can conclude our implementation. Again writing <#PL#> and <#APL#> for <#PowerList#> and <#AlternatingPowerList#> respectively:
{display"<#module AlternatingPowerList = struct
#{module_elipsis}#
let of_list leaf up id = function
| [] -> Zero
| a::l ->
let pad x = id x , leaf in
let coerce (x,y) = id x , up y in
TwicePlusOne (up a, PowerList.of_list pad coerce l)
end
let singleton x = Node(Leaf,x,Leaf)
let balance l =
loop (APL.of_list Leaf singleton (fun e->e) l)#>"}
The final function, <#balance:'a list -> 'a tree#>, implements the same algorithm as Section~{ref_ s_algo} without any partial functions.
What we may have lost in this section, compared to the simple algorithm, is the simplicity of the complexity analysis of the algorithm. The subtleties of the main functions require a finer analysis. Consider first the function <#PL.map#>: clearly the number of recursive calls depends only logarithmically on the number of elements in the power list. But each recursive call uses as its first argument a function twice as complex than the previous one. This leads to the following inequation over the complexity $C(n,m,f)$ of <#map#>, where $n$ is the number of elements in the power list, $m$ is the size of the elements in the power list, and $f$ is the complexity of the mapped function:
{displaymath
$C(n,m,f) {leq} f(m) + C{between `Paren ${frac $n-1$ $2$}, 2m, k {mapsto} 2 {times} f(k/2) + O(1)$} + O(1)$
}
From there, it is easy to prove that $C(n,m,f) = n.f(m) + O(n)$, so that <#PL.map#> runs indeed in linear-time, and so is <#loop#>. Similarly the complexity of <#PL.of_list#> can be described by a higher-order recursive inequation (almost the same as above, except that the complexity depends on two functions and the constant term is replaced by a linear term), whose solution gives also a linear-time complexity.
"
let coq = "{section"Turning to Coq" ~label:s_coq}
There is still a property of the algorithm that the implementation of Section~{ref_ s_ocaml} does not make obvious: that the algorithm actually does build {emph"full"} trees. In this section we shall build into the type of <@balance@> that its output is indeed full.
To that effect, we will use Coq rather than Ocaml. Even if it is possible, with some effort, to represent full trees and implement the algorithm in Ocaml -- and relatively easy in Haskell -- a Coq implementation also gives us termination by construction. Coq forces every recursion to be structural, which will prove to be rather entertaining.
At a superficial level, a visible difference with the Ocaml implementation is that <#Powerlist.t#> and <#AlternatingPowerList.t#> must be decorated with the $k$ such that the length is $2^k-1$: it is the structural recursion parameter of the <@balance_powerlist@> function. Because it makes the code simpler, we will use a recursive definition rather than an inductive one:
{display"<@Module PowerList.
Fixpoint T (A:Type) (k:nat) :=
match k with
| 0 => unit:Type
| S k' => A * T (A*A) k'
end.
End PowerList.@>"}
We will also need a version where <@k@> can be arbitrary. For that purpose we use Coq's type of dependent pairs <@{ n:nat & F n}@>. The constructor for dependent pairs is written <@⟨ n , x ⟩@>. The implicit version comes with constructors -- <@tpo@> stands for ``twice plus one'':
{display"<@Module PowerList.
@{module_elipsis}@
Definition U (A:Type) := { k:nat & T A k }.
Definition zero {A:Type} : U A := ⟨ 0 , tt ⟩.
Definition tpo {A:Type} (a:A) (l:U (A*A)) : U A :=
let '⟨k,l⟩ := l in
⟨ S k , (a,l) ⟩.
End PowerList@>"}
The definition of <@AlternatingPowerList.T@> and <@AlternatingPowerList.U@> are similar.
{subsection"Full trees"}
To code full trees, we index trees by their height, and specify that leaves can happen only at height $0$ or $1$:
{display"<@Inductive FullTree (A:Type) : nat -> Type :=
| Leaf₀ : FullTree A 0
| Leaf₁ : FullTree A 1
| Node {k:nat} : FullTree A k -> A -> FullTree A k -> FullTree A (S k).@>"}
If we omitted the constructor <@Leaf₁@>, we would have a definition of complete binary trees: both subtrees of a node are complete binary trees of the same height. We allow the full trees to be incomplete by letting <@Leaf₁@> take the place of nodes on the last level.
Using the type <@FullTree A k@> in place of the type <#'a tree#>, the functions <@pass@> and <@balance_powerlist@> are virtually unmodified{footnote
"In fact, as can be seen from its type, <@loop@> only handles non-empty alternating power lists. This is due to a small technicality: the recursive step of <@loop@> is the case <@S (S k)@>, but Coq does not recognise <@S k@> as a structural subterm of <@S (S k)@>, so the definition from Section~{ref_ s_ocaml} does not verifies Coq's structural recursion criterion. As a workaround, the empty case is moved to the <@balance@> function."
} with respect to Section~{ref_ s_ocaml}. Only their types change to reflect the extra information:
{display"<@Definition pass {A k p} : APL.T (FullTree A (S p)) A (S (S k)) ->
APL.T (FullTree A (S (S p))) A (S k).
Fixpoint loop {A k p} : APL.T (FullTree A (S p)) A (S k) ->
FullTree A (plus k (S p)) {struct k}.@>"}
The algorithm indeed builds only full trees.
{subsection"Structural initialisation"}
The padding conversion from lists to power lists, in Section~{ref_ s_ocaml}, is not structural due to the use of <#pair_up#> in the recursive call. To tackle this recursion, we shall make use of another intermediate structure. What we need, essentially, is that all the calls to <#pair_up#> are pre-calculated, so the intermediate structure will be like <#parity#> except that the calls to <#('a*'a) list#> are replaced by inductive calls.
As it turns out, this is another non-uniform datatype which corresponds to a numerical representation. Indeed, any natural number can be written in binary with digits $1$ and $2$ (but not $0$). In this system, for example, $8=1{times}2^2+1{times}2^1+2{times}2^0$ is represented as $112$. Here is the definition, where <@tpo@> reads ``twice plus one'' and <@tpt@> ``twice plus two'':
{display"<@Module BinaryList.
Inductive T (A:Type) : Type :=
| zero
| tpo (a:A) (l:T (A*A))
| tpt (a b: A) (l:T (A*A)).
End BinaryList.@>"}
To turn a non-empty list into a <@BinaryList.T@>, all we need is a function <@cons@> of type <@A -> T A -> T A@> to add an element in front of the list. On the numerical representation side, it corresponds to adding $1$. It behaves like adding $1$ in the usual binary representation, except that $1$-s are turned into $2$-s without a carry and $2$-s into $1$-s while producing a carry:
{display"<@Module BinaryList.
@{module_elipsis}@
Fixpoint cons {A} (a:A) (l:T A) : T A :=
match l with
| zero => tpo a zero
| tpo b l => tpt a b l
| tpt b c l => tpo a (cons (b,c) l)
end.
Definition of_list {A} (l:list A) : T A :=
List.fold_right cons zero l.
End BinaryList.@>"}
Note that while <@cons@> takes, in the worst case, logarithmic time with respect to the length of the list, building a list by repeatedly using <@cons@> is still linear. Indeed, as previously mentioned, <#cons#> mimics the successor algorithm for binary numbers, whose amortized complexity is well-known to be constant.
We also need a function which turns a <@T (A*A)@> into a <@T A@>. This is effectively multiplication by $2$. The lack of $0$ among the digits{footnote"The constructor <@zero@> represents an empty list of digits."} makes this process recursive. A simple presentation of the doubling algorithm consists in adding a $0$ at the end of the number, then eliminating the $0$ using the following equalities:
{displaymath begin just_left `Brace (array [`C;`L;`C] [
array_line [$0$ ; $=$ ; cdot ];
array_line [$x20$ ; $=$ ; $x12$];
array_line [$x10$ ; $=$ ; $x02$];
]) end}
In terms of binary lists:
{display"<@Module BinaryList.
@{module_elipsis}@
Fixpoint twice {A} (l:T (A*A)) : T A :=
match l with
| zero => zero
| tpo (a,b) l => tpt a b (twice l)
| tpt (a,b) cd l => tpt a b (tpo cd l)
end.
End BinaryList.@>"}
We can now write a structurally recursive padding function, using binary lists as the structural argument. As we do not know in advance the length of the produced list, a <@PowerList.U@> is returned. We write <@BL@> as a shorthand for <@BinaryList@>:
{display"<@Module PowerList.
@{module_elipsis}@
Fixpoint of_binary_list {A X} (d:A->X) (f:A*A->X) (l:BL.T A) : U X :=
match l with
| BL.zero => zero
| BL.tpo a l =>
tpo (d a) (of_binary_list (d×d) (f×f) l)
| BL.tpt a b l =>
tpo (f (a,b)) (of_binary_list (d×d) (f×f) l)
end.
End PowerList.@>"}
Where <@g×f@> is the function which maps <@(x,y)@> to <@(g x,f y)@>.
The rest follows straightforwardly, and we can define the following functions which conclude the algorithm (<@BL@>, <@PL@>, and <@APL@> stand for <@BinaryList@>, <@PowerList@>, and <@AlternatingPowerList@> respectively):
{display"<@Module AlternatingPowerList.
@{module_elipsis}@
Definition of_binary_list {A Odd Even}
(d:Odd) (f:A->Odd) (g:A->Even) (l:BL.T A) : U Odd Even :=
match l with
| BL.zero => zero
| BL.tpo a l =>
let d' x := ( g x , d ) in
tpo (f a) (PL.of_binary_list d' (g×f) (BL.twice l))
| BL.tpt a b l =>
let d' x := ( g x , d ) in
tpo (f a) (PL.of_binary_list d' (g×f) (BL.tpo b l))
end.
Definition of_list {A Odd Even}
(d:Odd) (f:A->Odd) (g:A->Even) (l:list A) : U Odd Even :=
of_binary_list d f g (BL.of_list l).
End AlternatingPowerList.
Definition singleton {A:Type} (x:A) : FullTree A 1 :=
Node Leaf₀ x Leaf₀.
Definition balance {A:Type} (l:list A) : { k:nat & FullTree A k } :=
let '⟨k,l⟩ := APL.of_list Leaf₁ singleton (fun x=>x) l in
match k with
| 0 => fun _ => ⟨ 0 , Leaf₀ ⟩
| S k => fun l => ⟨ plus k 1 , loop l ⟩
end l.@>"}
"
let concl = "{section"Conclusion"}
The <@balance@> function of Section~{ref_ s_coq} is, by virtue of its type alone, a total function which turns lists into full binary trees. Yet, to the cost of using intermediary data-structures, it effectively implements the algorithm of Section~{ref_ s_algo}.
The missing piece is to prove that the infix traversal of <@balance l@> is indeed <@l@>. The infix traversal of a (full) tree is represented in Coq with the functions
{display"<@Fixpoint list_of_full_tree_n {A n} (t:FullTree A n) : list A :=
match t with
| Leaf₀ => []
| Leaf₁ => []
| Node _ t₁ x t₂ =>
list_of_full_tree_n t₁ ++ [x] ++ list_of_full_tree_n t₂
end.
Definition list_of_full_tree {A} (t:{ k:nat & FullTree A k }) : list A :=
list_of_full_tree_n (projT2 t).@>"}
We can then state the theorem:
{display"<@Theorem balance_preserves_order A (l:list A) :
list_of_full_tree (balance l) = l.@>"}
The proof is short and straightforward: we define a traversal function for each intermediate structure; and state a variant of <@balance_preserves_order@> for each intermediate function. Proving the intermediate lemmas is not difficult and can be mostly automatised: we use a very simple generic automated tactic, which discharges most goals. This theorem concludes our easy formal proof of the balancing algorithm.
"
let d = concat [
intro;
algo;
ocaml;
coq;
concl;
command \"bibliography\" [A,"library"] A;
]
(*** metadata ***)
open Llncs
let inria = new_institution "Inria Paris-Rocquencourt\\{textsc"Ens"}, Paris, France"
let amu = new_institution "Aix Marseille Université, CNRS, LIF UMR 7279, 13288, Marseille, France"
let funding =
"This research has received funding from the European Research Council under the FP7 grant agreement 278673, Project MemCAD"
let title = {
title = "Balancing lists: a proof pearl{command\"thanks\" [A,funding] A}";
running_title = Some"Balancing lists"
}
let authors = [
{ name = "Guyslain Naves";
email = "guyslain.naves@lif.univ-mrs.fr";
institution = amu;
running_name = None
};
{ name = "Arnaud Spiwack";
email = "arnaud@spiwack.net";
institution = inria;
running_name = None
};
]
let packages = [
"inputenc" , "utf8" ;
"fontenc" , "T1" ;
"textcomp", "";
"microtype" , "" ;
]
let prelude = concat_with_sep [
usepackage "hyperref";
text\"\\\\let\\\\oldampersand\\\\&\";
text\"\\\\renewcommand*\\\\&{{\\\\itshape\\\\oldampersand}}\";
command \"bibliographystyle\" [T,"splncs"] T;
] par
let file = \"fulltrees.tex\"
let _ = emit ~file (document
~title
~authors
~abstract
~prelude
~packages
d)