-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce 'git backfill' to get missing blobs in a partial clone #5172
Conversation
In anticipation of implementing 'git backfill', populate the necessary files with the boilerplate of a new builtin. RFC TODO: When preparing this for a full implementation, make sure it is based on the newest standards introduced by [1]. [1] https://lore.kernel.org/git/xmqqjzfq2f0f.fsf@gitster.g/T/#m606036ea2e75a6d6819d6b5c90e729643b0ff7f7 [PATCH 1/3] builtin: add a repository parameter for builtin functions Signed-off-by: Derrick Stolee <stolee@gmail.com>
The default behavior of 'git backfill' is to fetch all missing blobs that are reachable from HEAD. Document and test this behavior. The implementation is a very simple use of the path-walk API, initializing the revision walk at HEAD to start the path-walk from all commits reachable from HEAD. Ignore the object arrays that correspond to tree entries, assuming that they are all present already. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Users may want to specify a minimum batch size for their needs. This is only a minimum: the path-walk API provides a list of OIDs that correspond to the same path, and thus it is optimal to allow delta compression across those objects in a single server request. We could consider limiting the request to have a maximum batch size in the future. Signed-off-by: Derrick Stolee <stolee@gmail.com>
One way to significantly reduce the cost of a Git clone and later fetches is to use a blobless partial clone and combine that with a sparse-checkout that reduces the paths that need to be populated in the working directory. Not only does this reduce the cost of clones and fetches, the sparse-checkout reduces the number of objects needed to download from a promisor remote. However, history investigations can be expensie as computing blob diffs will trigger promisor remote requests for one object at a time. This can be avoided by downloading the blobs needed for the given sparse-checkout using 'git backfill' and its new '--sparse' mode, at a time that the user is willing to pay that extra cost. Note that this is distinctly different from the '--filter=sparse:<oid>' option, as this assumes that the partial clone has all reachable trees and we are using client-side logic to avoid downloading blobs outside of the sparse-checkout cone. This avoids the server-side cost of walking trees while also achieving a similar goal. It also downloads in batches based on similar path names, presenting a resumable download if things are interrupted. This augments the path-walk API to have a possibly-NULL 'pl' member that may point to a 'struct pattern_list'. This could be more general than the sparse-checkout definition at HEAD, but 'git backfill --sparse' is currently the only consumer. Be sure to test this in both cone mode and not cone mode. Cone mode has the benefit that the path-walk can skip certain paths once they would expand beyond the sparse-checkout. Signed-off-by: Derrick Stolee <stolee@gmail.com>
The previous change introduced the '--[no-]sparse' option for the 'git backfill' command, but did not assume it as enabled by default. However, this is likely the behavior that users will most often want to happen. Without this default, users with a small sparse-checkout may be confused when 'git backfill' downloads every version of every object in the full history. However, this is left as a separate change so this decision can be reviewed independently of the value of the '--[no-]sparse' option. Add a test of adding the '--sparse' option to a repo without sparse-checkout to make it clear that supplying it without a sparse-checkout is an error. Signed-off-by: Derrick Stolee <stolee@gmail.com>
builtin/backfill.c
Outdated
#include "path-walk.h" | ||
|
||
static const char * const builtin_backfill_usage[] = { | ||
N_("git backfill [--batch-size=<n>] [--[no-]sparse]"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In before @dscho: this needs an (EXPERIMENTAL!)
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here: 3793d09 @derrickstolee!
This is a highly useful command, and we want it to get some testing "in the wild". However, the patches have not yet been reviewed on the Git mailing list, and are therefore subject to change. By marking the command as experimental, users will be warned to pay attention to those changes. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff! Most of the patches I saw before, in one form or other, so there were no surprises for me.
I've added one more commit (to mark the command as experimental), and plan on merging this tomorrow (so that we get a new snapshot, with a brand-new MSYS2 runtime, no less!).
Thanks, @dscho. One more big one to go! |
/add relnote feature The new, experimental The workflow run was started |
The new, experimental `git backfill` command [was added](git-for-windows/git#5172): It helps fetching relevant Git objects smartly in a partial, sparse clone. Signed-off-by: gitforwindowshelper[bot] <gitforwindowshelper-bot@users.noreply.github.com>
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
…-for-windows#5172) This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
…-for-windows#5172) This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
…-for-windows#5172) This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the `git backfill` command which uses the path walk API to download missing blobs in a blobless partial clone. By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions. These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones: `git clone --filter=blob:none` gets the commits and trees, then `git backfill` will download all missing blobs. If `git backfill` is interrupted partway through, it can be restarted and will redownload only the missing objects. When combining blobless partial clones with sparse-checkout, `git backfill` will assume its `--sparse` option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands like `git blame` or `git log -L` will not suffer from many one-by-one blob downloads. Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.
This change introduces the
git backfill
command which uses the path walk API to download missing blobs in a blobless partial clone.By downloading blobs that correspond to the same file path at the same time, we hope to maximize the potential benefits of delta compression against multiple versions.
These downloads occur in a configurable batch size, presenting a mechanism to perform "resumable" clones:
git clone --filter=blob:none
gets the commits and trees, thengit backfill
will download all missing blobs. Ifgit backfill
is interrupted partway through, it can be restarted and will redownload only the missing objects.When combining blobless partial clones with sparse-checkout,
git backfill
will assume its--sparse
option and download only the blobs within the sparse-checkout. Users may want to do this as the repo size will still be smaller than the full repo size, but commands likegit blame
orgit log -L
will not suffer from many one-by-one blob downloads.Future directions should consider adding a pathspec or file prefix to further focus which paths are being downloaded in a batch.