forked from git/git
Commit
pack-objects: create new name-hash algorithm (#5157)
This is an updated version of gitgitgadget#1785, intended for early consumption into Git for Windows.

The idea here is to add a new `--full-name-hash` option to `git pack-objects` and `git repack`. This changes the name-hash value used for finding delta bases so that it is computed from the full path name, with a lower likelihood of collisions than the default name-hash algorithm. In many repositories with name-hash collisions and many versions of those paths, this can significantly reduce the size of a full repack.

It can also help in certain cases of `git push`, but only if the pack is already artificially inflated by name-hash collisions; cases that find "sibling" deltas as better choices become worse with `--full-name-hash`. Thus, this option is currently recommended for full repacks of large repos, and on client machines without reachability bitmaps.

Some care is taken to ignore this option when using bitmaps, either when writing bitmaps or when using a bitmap walk during reads. The bitmap file format contains name-hash values but has no way to indicate which hash function was used, so compatibility is a concern for bitmaps. Future work could explore this idea.

After this PR is merged, the more-involved `--path-walk` option may be considered.
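To make the collision argument above concrete, here is a minimal shell sketch, not the code from this commit. `sketch_name_hash` is modeled on the default name-hash as commonly described (in effect only the last sixteen non-whitespace characters matter), while `sketch_full_name_hash` is a hypothetical rotate-and-add mix over every byte of the path. The function names, the constant `2654435761`, and the restriction to ASCII paths with 64-bit shell arithmetic are all assumptions made for illustration.

```shell
MASK=4294967295   # 2^32 - 1; assumes the shell does 64-bit arithmetic

# Print the character code of each character of $1, one per line (ASCII only).
char_codes() {
	s=$1
	while [ -n "$s" ]; do
		rest=${s#?}
		printf '%d\n' "'${s%"$rest"}"
		s=$rest
	done
}

# Default-style hash (sketch): shift the state right by two bits and add the
# new byte at the top, skipping whitespace. Older bytes are shifted out, so
# the directory part of a long path barely influences the result.
sketch_name_hash() {
	hash=0
	for c in $(char_codes "$1"); do
		case $c in 9|10|11|12|13|32) continue ;; esac   # whitespace bytes
		hash=$(( ((hash >> 2) + (c << 24)) & MASK ))
	done
	echo "$hash"
}

# Hypothetical full-path hash (assumption, not the commit's function):
# add a multiple of an odd constant, then rotate right by five bits.
# Every byte of the path influences the final value.
sketch_full_name_hash() {
	mult=2654435761
	hash=$mult
	for c in $(char_codes "$1"); do
		hash=$(( (hash + c * mult) & MASK ))
		hash=$(( ((hash >> 5) | (hash << 27)) & MASK ))
	done
	echo "$hash"
}

# Two "sibling" files that differ only in their leading directory.
a=a/src/Application.java
b=b/src/Application.java
name_a=$(sketch_name_hash "$a")
name_b=$(sketch_name_hash "$b")
full_a=$(sketch_full_name_hash "$a")
full_b=$(sketch_full_name_hash "$b")
printf '%s\t%s\t%s\n' "$name_a" "$full_a" "$a" "$name_b" "$full_b" "$b"
```

In this sketch the two paths share a default-style hash because their last sixteen characters are identical, whereas the full-path variant separates them. That separation is also why "sibling" deltas between same-named files in different directories become harder to find with a full-path hash.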
Showing 21 changed files with 310 additions and 13 deletions.
@@ -0,0 +1,24 @@
/*
 * test-name-hash.c: Read a list of paths over stdin and report on their
 * name-hash and full name-hash.
 */

#include "test-tool.h"
#include "git-compat-util.h"
#include "pack-objects.h"
#include "strbuf.h"

int cmd__name_hash(int argc UNUSED, const char **argv UNUSED)
{
	struct strbuf line = STRBUF_INIT;

	while (!strbuf_getline(&line, stdin)) {
		uint32_t name_hash = pack_name_hash(line.buf);
		uint32_t full_hash = pack_full_name_hash(line.buf);

		printf("%10"PRIu32"\t%10"PRIu32"\t%s\n", name_hash, full_hash, line.buf);
	}

	strbuf_release(&line);
	return 0;
}
@@ -0,0 +1,73 @@
#!/bin/sh

test_description='Tests pack performance with and without --full-name-hash'
. ./perf-lib.sh

GIT_TEST_PASSING_SANITIZE_LEAK=0
export GIT_TEST_PASSING_SANITIZE_LEAK

test_perf_large_repo

test_expect_success 'create rev input' '
	cat >in-thin <<-EOF &&
	$(git rev-parse HEAD)
	^$(git rev-parse HEAD~1)
	EOF
	cat >in-big <<-EOF
	$(git rev-parse HEAD)
	^$(git rev-parse HEAD~1000)
	EOF
'

test_perf 'thin pack' '
	git pack-objects --thin --stdout --revs --sparse <in-thin >out
'

test_size 'thin pack size' '
	test_file_size out
'

test_perf 'thin pack with --full-name-hash' '
	git pack-objects --thin --stdout --revs --sparse --full-name-hash <in-thin >out
'

test_size 'thin pack size with --full-name-hash' '
	test_file_size out
'

test_perf 'big pack' '
	git pack-objects --stdout --revs --sparse <in-big >out
'

test_size 'big pack size' '
	test_file_size out
'

test_perf 'big pack with --full-name-hash' '
	git pack-objects --stdout --revs --sparse --full-name-hash <in-big >out
'

test_size 'big pack size with --full-name-hash' '
	test_file_size out
'

test_perf 'repack' '
	git repack -adf
'

test_size 'repack size' '
	pack=$(ls .git/objects/pack/pack-*.pack) &&
	test_file_size "$pack"
'

test_perf 'repack with --full-name-hash' '
	git repack -adf --full-name-hash
'

test_size 'repack size with --full-name-hash' '
	pack=$(ls .git/objects/pack/pack-*.pack) &&
	test_file_size "$pack"
'

test_done
@@ -0,0 +1,41 @@
#!/bin/sh

test_description='Tests collision rates of name-hash and full-name-hash'
. ./perf-lib.sh

GIT_TEST_PASSING_SANITIZE_LEAK=0
export GIT_TEST_PASSING_SANITIZE_LEAK

test_perf_large_repo

test_size 'paths at head' '
	git ls-tree -r --name-only HEAD >path-list &&
	wc -l <path-list
'

test_size 'number of distinct name-hashes' '
	cat path-list | test-tool name-hash >name-hashes &&
	cat name-hashes | awk "{ print \$1; }" | sort -n | uniq -c >name-hash-count &&
	wc -l <name-hash-count
'

test_size 'number of distinct full-name-hashes' '
	cat name-hashes | awk "{ print \$2; }" | sort -n | uniq -c >full-name-hash-count &&
	wc -l <full-name-hash-count
'

test_size 'maximum multiplicity of name-hashes' '
	cat name-hash-count | \
		sort -nr | \
		head -n 1 | \
		awk "{ print \$1; }"
'

test_size 'maximum multiplicity of full-name-hashes' '
	cat full-name-hash-count | \
		sort -nr | \
		head -n 1 | \
		awk "{ print \$1; }"
'

test_done
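The counting pipeline used by this script can be tried without a repository or `test-tool`. The sketch below feeds it a small invented listing in the helper's `name-hash<TAB>full-name-hash<TAB>path` layout; the numbers are placeholders rather than real hash values, and the helper functions `distinct` and `max_multiplicity` are hypothetical names introduced only for this example.

```shell
# Invented sample in the three-column layout printed by the helper.
# The values are placeholders chosen so that column 1 has collisions.
sample=$(mktemp)
printf '%s\t%s\t%s\n' \
	100 11 a/Makefile \
	100 12 b/Makefile \
	100 13 c/Makefile \
	200 14 a/main.c \
	200 15 b/main.c \
	300 16 README >"$sample"

# Number of distinct values in column $1: "uniq -c" emits one row per value.
distinct() {
	awk -v col="$1" '{ print $col }' "$sample" | sort -n | uniq -c |
	wc -l | tr -d ' '
}

# Size of the largest group: the highest count reported by "uniq -c".
max_multiplicity() {
	awk -v col="$1" '{ print $col }' "$sample" | sort -n | uniq -c |
	sort -nr | head -n 1 | awk '{ print $1 }'
}

name_distinct=$(distinct 1)
name_max=$(max_multiplicity 1)
full_distinct=$(distinct 2)
full_max=$(max_multiplicity 2)
rm -f "$sample"

echo "name-hash: $name_distinct distinct, max multiplicity $name_max"
echo "full-name-hash: $full_distinct distinct, max multiplicity $full_max"
```

A hash with fewer collisions shows up here as a larger distinct count and a smaller maximum multiplicity, which are the two quantities the `test_size` entries above report for each hash.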
@@ -45,7 +45,6 @@ rebase
remote
remote-ext
remote-fd
repack
reset
restore
rev-parse