Skip to content

Commit

Permalink
pack-objects: create new name-hash algorithm (#5157)
Browse files Browse the repository at this point in the history
This is an updated version of gitgitgadget#1785, intended for early
consumption into Git for Windows.

The idea here is to add a new `--full-name-hash` option to `git
pack-objects` and `git repack`. This adjusts the name-hash value used
for finding delta bases in such a way that uses the full path name with
a lower likelihood of collisions than the default name-hash algorithm.
In many repositories with name-hash collisions and many versions of
those paths, this can significantly reduce the size of a full repack. It
can also help in certain cases of `git push`, but only if the pack is
already artificially inflated by name-hash collisions; cases that find
"sibling" deltas as better choices become worse with `--full-name-hash`.

Thus, this option is currently recommended for full repacks of large
repos, and on client machines without reachability bitmaps.

Some care is taken to ignore this option when using bitmaps, either
writing bitmaps or using a bitmap walk during reads. The bitmap file
format contains name-hash values, but no way to indicate which function
is used, so compatibility is a concern for bitmaps. Future work could
explore this idea.

After this PR is merged, then the more-involved `--path-walk` option may
be considered.
  • Loading branch information
dscho committed Oct 8, 2024
2 parents 1b9a293 + 4d481a8 commit e599fe2
Show file tree
Hide file tree
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion builtin/pack-objects.c
Original file line number Diff line number Diff line change
Expand Up @@ -4439,7 +4439,7 @@ int cmd_pack_objects(int argc,
N_("protocol"),
N_("exclude any configured uploadpack.blobpackfileuri with this protocol")),
OPT_BOOL(0, "full-name-hash", &use_full_name_hash,
N_("optimize delta compression across identical path names over time")),
N_("(EXPERIMENTAL!) optimize delta compression across identical path names over time")),
OPT_END(),
};

Expand Down
2 changes: 1 addition & 1 deletion builtin/repack.c
Original file line number Diff line number Diff line change
Expand Up @@ -1209,7 +1209,7 @@ int cmd_repack(int argc,
OPT_BOOL('F', NULL, &po_args.no_reuse_object,
N_("pass --no-reuse-object to git-pack-objects")),
OPT_BOOL(0, "full-name-hash", &po_args.full_name_hash,
N_("pass --full-name-hash to git-pack-objects")),
N_("(EXPERIMENTAL!) pass --full-name-hash to git-pack-objects")),
OPT_NEGBIT('n', NULL, &run_update_server_info,
N_("do not run git-update-server-info"), 1),
OPT__QUIET(&po_args.quiet, N_("be quiet")),
Expand Down

0 comments on commit e599fe2

Please sign in to comment.