Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add experimental 'git survey' builtin (#5174)
This introduces `git survey` to Git for Windows ahead of upstream for the express purpose of getting the path-based analysis in the hands of more folks. The inspiration of this builtin is [`git-sizer`](https://github.com/github/git-sizer), but since that command relies on `git cat-file --batch` to get the contents of objects, it has limits to how much information it can provide. This is mostly a rewrite of the `git survey` builtin that was introduced into the `microsoft/git` fork in microsoft#667. That version had a lot more bells and whistles, including an analysis much closer to what `git-sizer` provides. The biggest difference in this version is that this one is focused on using the path-walk API in order to visit batches of objects based on a common path. This allows identifying, for instance, the path that is contributing the most to the on-disk size across all versions at that path. For example, here are the top ten paths contributing to my local Git repository (which includes `microsoft/git` and `gitster/git`): ``` TOP FILES BY DISK SIZE ============================================================================ Path | Count | Disk Size | Inflated Size -----------------------------------------+-------+-----------+-------------- whats-cooking.txt | 1373 | 11637459 | 37226854 t/helper/test-gvfs-protocol | 2 | 6847105 | 17233072 git-rebase--helper | 1 | 6027849 | 15269664 compat/mingw.c | 6111 | 5194453 | 463466970 t/helper/test-parse-options | 1 | 3420385 | 8807968 t/helper/test-pkt-line | 1 | 3408661 | 8778960 t/helper/test-dump-untracked-cache | 1 | 3408645 | 8780816 t/helper/test-dump-fsmonitor | 1 | 3406639 | 8776656 po/vi.po | 104 | 1376337 | 51441603 po/de.po | 210 | 1360112 | 71198603 ``` This kind of analysis has been helpful in identifying the reasons for growth in a few internal monorepos. Those findings motivated the changes in #5157 and #5171. With this early version in Git for Windows, we can expand the reach of the experimental tool in advance of it being contributed to the upstream project. Unfortunately, this will mean that in the next `microsoft/git` rebase, @jeffhostetler's version will need to be pulled out since there are enough conflicts. These conflicts include how tables are stored and generated, as the version in this PR is slightly more general to allow for different kinds of data.
- Loading branch information