Skip to content

Commit

Permalink
Add the experimental git survey command to analyze (large) local repo…
Browse files Browse the repository at this point in the history
…sitories (#667)

This command is inspired by [`git
sizer`](https://github.com/github/git-sizer), having the advantage of
being much closer to the internals of Git.

The intention is to provide a built-in command that can be used to
analyze large repositories for performance and scaling problems, for
growth over time, and to correlate with other measurements (in
particular with Trace2 data collected e.g. via
https://github.com/git-ecosystem/trace2receiver/).
  • Loading branch information
dscho authored Jul 2, 2024
2 parents 648c5a2 + 45c981e commit 220ee02
Show file tree
Hide file tree
Showing 11 changed files with 3,215 additions and 0 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,7 @@
/git-submodule
/git-submodule--helper
/git-subtree
/git-survey
/git-svn
/git-switch
/git-symbolic-ref
Expand Down
2 changes: 2 additions & 0 deletions Documentation/config.txt
Original file line number Diff line number Diff line change
Expand Up @@ -531,6 +531,8 @@ include::config/status.txt[]

include::config/submodule.txt[]

include::config/survey.txt[]

include::config/tag.txt[]

include::config/tar.txt[]
Expand Down
41 changes: 41 additions & 0 deletions Documentation/config/survey.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
survey.namerev::
Boolean to show/hide `git name-rev` information for
each reported commit and the containing commit of each
reported tree and blob.

survey.progress::
Boolean to show/hide progress information. Defaults to
true when interactive (stderr is bound to a TTY).

survey.showBlobSizes::
A non-negative integer value. Requests details on the <n>
largest file blobs by size in bytes. Provides a default
value for `--blob-sizes=<n>` in linkgit:git-survey[1].

survey.showCommitParents::
A non-negative integer value. Requests details on the <n>
commits with the most number of parents. Provides a default
value for `--commit-parents=<n>` in linkgit:git-survey[1].

survey.showCommitSizes::
A non-negative integer value. Requests details on the <n>
largest commits by size in bytes. Generally, these are the
commits with the largest commit messages. Provides a default
value for `--commit-sizes=<n>` in linkgit:git-survey[1].

survey.showTreeEntries::
A non-negative integer value. Requests details on the <n>
trees (directories) with the most number of entries (files
and subdirectories). Provides a default value for
`--tree-entries=<n>` in linkgit:git-survey[1].

survey.showTreeSizes::
A non-negative integer value. Requests details on the <n>
largest trees (directories) by size in bytes. This will
set will usually be equal to the `survey.showTreeEntries`
set, but may be skewed by very long file or subdirectory
entry names. Provides a default value for
`--tree-sizes=<n>` in linkgit:git-survey[1].

survey.verbose::
Boolean to show/hide verbose output. Default to false.
108 changes: 108 additions & 0 deletions Documentation/git-survey.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,108 @@
git-survey(1)
=============

NAME
----
git-survey - EXPERIMENTAL: Measure various repository dimensions of scale

SYNOPSIS
--------
[verse]
(EXPERIMENTAL!) `git survey` <options>

DESCRIPTION
-----------

Survey the repository and measure various dimensions of scale.

As repositories grow to "monorepo" size, certain data shapes can cause
performance problems. `git-survey` attempts to measure and report on
known problem areas.

Ref Selection and Reachable Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this first analysis phase, `git survey` will iterate over the set of
requested branches, tags, and other refs and treewalk over all of the
reachable commits, trees, and blobs and generate various statistics.

OPTIONS
-------

--progress::
Show progress. This is automatically enabled when interactive.

--json::
Print results in JSON rather than in a human-friendly format.

--[no-]name-rev::
Print `git name-rev` output for each commit, tree, and blob.
Defaults to true.

Ref Selection
~~~~~~~~~~~~~

The following options control the set of refs that `git survey` will examine.
By default, `git survey` will look at tags, local branches, and remote refs.
If any of the following options are given, the default set is cleared and
only refs for the given options are added.

--all-refs::
Use all refs. This includes local branches, tags, remote refs,
notes, and stashes. This option overrides all of the following.

--branches::
Add local branches (`refs/heads/`) to the set.

--tags::
Add tags (`refs/tags/`) to the set.

--remotes::
Add remote branches (`refs/remote/`) to the set.

--detached::
Add HEAD to the set.

--other::
Add notes (`refs/notes/`) and stashes (`refs/stash/`) to the set.

Large Item Selection
~~~~~~~~~~~~~~~~~~~~

The following options control the optional display of large items under
various dimensions of scale. The OID of the largest `n` objects will be
displayed in reverse sorted order. For each, `n` defaults to 10.

--commit-parents::
Shows the OIDs of the commits with the most parent commits.

--commit-sizes::
Shows the OIDs of the largest commits by size in bytes. This is
usually the ones with the largest commit messages.

--tree-entries::
Shows the OIDs of the trees with the most number of entries. These
are the directories with the most number of files or subdirectories.

--tree-sizes::
Shows the OIDs of the largest trees by size in bytes. This set
will usually be the same as the vector of number of entries unless
skewed by very long entry names.

--blob-sizes::
Shows the OIDs of the largest blobs by size in bytes.

OUTPUT
------

By default, `git survey` will print information about the repository in a
human-readable format that includes overviews and tables.

CONFIGURATION
-------------

include::config/survey.txt[]

GIT
---
Part of the linkgit:git[1] suite
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -1332,6 +1332,7 @@ BUILTIN_OBJS += builtin/sparse-checkout.o
BUILTIN_OBJS += builtin/stash.o
BUILTIN_OBJS += builtin/stripspace.o
BUILTIN_OBJS += builtin/submodule--helper.o
BUILTIN_OBJS += builtin/survey.o
BUILTIN_OBJS += builtin/symbolic-ref.o
BUILTIN_OBJS += builtin/tag.o
BUILTIN_OBJS += builtin/unpack-file.o
Expand Down
1 change: 1 addition & 0 deletions builtin.h
Original file line number Diff line number Diff line change
Expand Up @@ -229,6 +229,7 @@ int cmd_status(int argc, const char **argv, const char *prefix);
int cmd_stash(int argc, const char **argv, const char *prefix);
int cmd_stripspace(int argc, const char **argv, const char *prefix);
int cmd_submodule__helper(int argc, const char **argv, const char *prefix);
int cmd_survey(int argc, const char **argv, const char *prefix);
int cmd_switch(int argc, const char **argv, const char *prefix);
int cmd_symbolic_ref(int argc, const char **argv, const char *prefix);
int cmd_tag(int argc, const char **argv, const char *prefix);
Expand Down
Loading

0 comments on commit 220ee02

Please sign in to comment.