Add some documentation for the model zoo.

PiperOrigin-RevId: 688954850
google-research · Oct 23, 2024 · 1eeb0b7
1 parent c89d05c

Showing 2 changed files with 41 additions and 20 deletions.
diff --git a/chirp/inference/README.md b/chirp/inference/README.md
@@ -85,26 +85,10 @@ notebook file into Google Drive and open it with Colab. Then use the
 
 ## The Embedding Model Interface
 
-We provide a model wrapping interface `interface.EmbeddingModel` which can be
-implemented by a wide range of models providing some combination of
-classification logits, embeddings, and separated audio. Implementations are
-provided in `models.py`, including:
-* a `PlaceholderModel` which can be used for testing,
-* `TaxonomyModelTF`: an exported Chirp classifier SavedModel,
-* `SeparatorModelTF`: an exported Chirp separation model,
-* `BirdNet`: applies the BirdNet saved model, which can be obtained from the
-  BirdNET-Analyzer git repository.
-* `BirbSepModelTF1`: Applies the separation model described in [the Bird MixIT
-  paper](https://arxiv.org/abs/2110.03209)
-* `SeparateEmbedModel`: Combines different separation and embedding/inference
-  models, by separating the target audio and then embedding each separate
-  channel. If the embedding model produces logits, the max logits are taken
-  over the separated channels.
-
-The primary function in the `EmbeddingModel` interface is
-`EmbeddingModel.embed(audio_array)` which runs model inference on the provided
-audio array. The outputs are an `interface.InferenceOutputs` instance, which
-contains optional embeddings, logits, and separated audio.
+To allow simple substitution of different models, we provide an `EmbeddingModel`
+interface, with a variety of implementations for common models (the model zoo).
+This is described in further detail in `chirp/projects/zoo/README.md`.
+
 
 # Inference Pipeline
 

diff --git a/chirp/projects/zoo/README.md b/chirp/projects/zoo/README.md
@@ -0,0 +1,37 @@
+# Model Zoo for Bioacoustics
+
+This package handles audio embedding models. We provide a simple interface
+(`zoo_interface.EmbeddingModel`) for wrapping models which transform audio clips
+(of any length) into embeddings. The common interface allows clean comparison
+of models for evaluation purposes, and also allows users to freely choose the
+most appropriate model for their work.
+
+The most convenient way to load a predefined model is like so:
+```m = model_configs.load_model_by_name('perch_8')```
+which loads the Perch v8 model automatically from Kaggle Models. The set of
+currently implemented models can be inspected in
+`model_configs.ModelConfigName`.
+
+## The Embedding Model Interface
+
+We provide a model wrapping interface `zoo_interface.EmbeddingModel` which can
+be implemented by a wide range of models providing some combination of
+classification logits, embeddings, and separated audio. Implementations are
+mostly provided in `models.py`, including:
+
+* a `PlaceholderModel` which can be used for testing,
+* `TaxonomyModelTF`: an exported Chirp classifier SavedModel,
+* `SeparatorModelTF`: an exported Chirp separation model,
+* `BirdNet`: applies the BirdNet saved model, which can be obtained from the
+  BirdNET-Analyzer git repository.
+* `BirbSepModelTF1`: applies the separation model described in [the Bird MixIT
+  paper](https://arxiv.org/abs/2110.03209).
+* `SeparateEmbedModel`: Combines different separation and embedding/inference
+  models, by separating the target audio and then embedding each separate
+  channel. If the embedding model produces logits, the max logits are taken
+  over the separated channels.
+
+The primary function in the `EmbeddingModel` interface is
+`EmbeddingModel.embed(audio_array)` which runs model inference on the provided
+audio array. The outputs are a `zoo_interface.InferenceOutputs` instance, which
+contains optional embeddings, logits, and separated audio.
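
Reviewer note: the interface described in the new README can be illustrated with a small, self-contained sketch. Everything below is a hypothetical stand-in written for illustration, not the code in `zoo_interface.py` or `models.py`: the field names on `InferenceOutputs`, the toy models, and the use of plain Python lists instead of arrays are all assumptions. It shows an `embed(audio_array)` call returning a container with optional embeddings, logits, and separated audio, plus the `SeparateEmbedModel` behaviour described above (embed each separated channel, then take the max logits over channels).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InferenceOutputs:
    """Hypothetical stand-in: every field is optional, as the README describes."""
    embeddings: Optional[list] = None
    logits: Optional[list] = None
    separated_audio: Optional[list] = None


class EmbeddingModel:
    """Hypothetical stand-in for the common interface."""

    def embed(self, audio_array: List[float]) -> InferenceOutputs:
        raise NotImplementedError


class MeanAbsModel(EmbeddingModel):
    """Toy embedder: 1-D embedding (mean |x|) and a single logit (peak |x|)."""

    def embed(self, audio_array):
        n = max(len(audio_array), 1)
        mean_abs = sum(abs(x) for x in audio_array) / n
        peak = max((abs(x) for x in audio_array), default=0.0)
        return InferenceOutputs(embeddings=[mean_abs], logits=[peak])


class SplitSeparator(EmbeddingModel):
    """Toy separator: positive and negative samples become two 'channels'."""

    def embed(self, audio_array):
        positive = [x if x > 0 else 0.0 for x in audio_array]
        negative = [x if x < 0 else 0.0 for x in audio_array]
        return InferenceOutputs(separated_audio=[positive, negative])


class SeparateThenEmbed(EmbeddingModel):
    """Separate the audio, embed each channel, max the logits over channels."""

    def __init__(self, separator: EmbeddingModel, embedder: EmbeddingModel):
        self.separator = separator
        self.embedder = embedder

    def embed(self, audio_array):
        channels = self.separator.embed(audio_array).separated_audio
        per_channel = [self.embedder.embed(channel) for channel in channels]
        logits = None
        if all(out.logits is not None for out in per_channel):
            # Element-wise max across channels, one value per class.
            logits = [max(vals) for vals in zip(*(o.logits for o in per_channel))]
        return InferenceOutputs(
            embeddings=[out.embeddings for out in per_channel],
            logits=logits,
            separated_audio=channels,
        )


model = SeparateThenEmbed(SplitSeparator(), MeanAbsModel())
outputs = model.embed([0.5, -1.0, 0.25, -0.25])
print(outputs.embeddings, outputs.logits)
```

Because callers only depend on `embed` and the output container, any of the toy models can be swapped for another without changing downstream code, which is the substitution property the README attributes to the real interface.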