Commit

Add some documentation for the model zoo.
PiperOrigin-RevId: 688954850
sdenton4 authored and copybara-github committed Oct 23, 2024
1 parent c89d05c commit 1eeb0b7
Showing 2 changed files with 41 additions and 20 deletions.
24 changes: 4 additions & 20 deletions chirp/inference/README.md
@@ -85,26 +85,10 @@ notebook file into Google Drive and open it with Colab. Then use the

## The Embedding Model Interface

We provide a model wrapping interface `interface.EmbeddingModel` which can be
implemented by a wide range of models providing some combination of
classification logits, embeddings, and separated audio. Implementations are
provided in `models.py`, including:
* a `PlaceholderModel` which can be used for testing,
* `TaxonomyModelTF`: an exported Chirp classifier SavedModel,
* `SeparatorModelTF`: an exported Chirp separation model,
* `BirdNet`: applies the BirdNet saved model, which can be obtained from the
BirdNET-Analyzer git repository.
* `BirbSepModelTF1`: Applies the separation model described in [the Bird MixIT
paper](https://arxiv.org/abs/2110.03209)
* `SeparateEmbedModel`: Combines different separation and embedding/inference
models, by separating the target audio and then embedding each separate
channel. If the embedding model produces logits, the max logits are taken
over the separated channels.

The primary function in the `EmbeddingModel` interface is
`EmbeddingModel.embed(audio_array)` which runs model inference on the provided
audio array. The outputs are an `interface.InferenceOutputs` instance, which
contains optional embeddings, logits, and separated audio.
To allow simple substitution of different models, we provide an `EmbeddingModel`
interface, with a variety of implementations for common models (the model zoo).
This is described in further detail in `chirp/projects/zoo/README.md`.


# Inference Pipeline

37 changes: 37 additions & 0 deletions chirp/projects/zoo/README.md
@@ -0,0 +1,37 @@
# Model Zoo for Bioacoustics

This package handles audio embedding models. We provide a simple interface
(`zoo_interface.EmbeddingModel`) for wrapping models which transform audio clips
(of any length) into embeddings. The common interface allows clean comparison
of models for evaluation purposes, and also allows users to freely choose the
most appropriate model for their work.

The most convenient way to load a predefined model is by name:

```
m = model_configs.load_model_by_name('perch_8')
```

This loads the Perch v8 model automatically from Kaggle Models. The set of
currently implemented models can be inspected in
`model_configs.ModelConfigName`.

## The Embedding Model Interface

We provide a model wrapping interface `zoo_interface.EmbeddingModel` which can
be implemented by a wide range of models providing some combination of
classification logits, embeddings, and separated audio. Implementations are
mostly provided in `models.py`, including:

* a `PlaceholderModel` which can be used for testing,
* `TaxonomyModelTF`: an exported Chirp classifier SavedModel,
* `SeparatorModelTF`: an exported Chirp separation model,
* `BirdNet`: applies the BirdNET saved model, which can be obtained from the
BirdNET-Analyzer git repository,
* `BirbSepModelTF1`: applies the separation model described in [the Bird MixIT
paper](https://arxiv.org/abs/2110.03209),
* `SeparateEmbedModel`: combines different separation and embedding/inference
models by separating the target audio and then embedding each separated
channel. If the embedding model produces logits, the max logits are taken
over the separated channels.
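
The reduction described for `SeparateEmbedModel` can be illustrated with a small sketch. This is not the library's implementation: the helper name `max_logits_over_channels`, the nested-list layout (one logit vector per separated channel), and the reading of "max logits" as a per-class maximum are all assumptions made for illustration.

```python
from typing import List, Sequence


def max_logits_over_channels(
    channel_logits: Sequence[Sequence[float]],
) -> List[float]:
    """Reduce per-channel logits to one vector by taking the per-class max.

    Hypothetical helper: channel_logits[c][k] is the logit for class k
    computed from separated channel c.
    """
    if not channel_logits:
        raise ValueError("expected at least one separated channel")
    num_classes = len(channel_logits[0])
    if any(len(row) != num_classes for row in channel_logits):
        raise ValueError("all channels must have the same number of classes")
    # zip(*rows) groups the logits for each class across all channels.
    return [max(per_class) for per_class in zip(*channel_logits)]
```

Under this reading, a class counts as detected when it scores highly in any one of the separated channels, even if it scores poorly in the others.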

The primary function in the `EmbeddingModel` interface is
`EmbeddingModel.embed(audio_array)`, which runs model inference on the provided
audio array. The output is a `zoo_interface.InferenceOutputs` instance, which
contains optional embeddings, logits, and separated audio.
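
To make the shape of this interface concrete, here is a minimal, self-contained sketch. It does not import or reproduce the real `zoo_interface` classes: `InferenceOutputsSketch`, `EmbeddingModelSketch`, and `ZeroEmbeddingModel` are hypothetical stand-ins, and the field names, constructor arguments, and non-overlapping windowing are assumptions. Plain lists are used in place of arrays so the example runs without extra dependencies.

```python
import dataclasses
from typing import Dict, List, Optional, Sequence


@dataclasses.dataclass
class InferenceOutputsSketch:
    """Hypothetical outputs container; every field is optional."""

    embeddings: Optional[List[List[float]]] = None
    logits: Optional[Dict[str, List[List[float]]]] = None
    separated_audio: Optional[List[List[float]]] = None


class EmbeddingModelSketch:
    """Hypothetical base class: subclasses implement embed()."""

    def embed(self, audio_array: Sequence[float]) -> InferenceOutputsSketch:
        raise NotImplementedError


class ZeroEmbeddingModel(EmbeddingModelSketch):
    """Testing stand-in that emits one all-zero embedding per window."""

    def __init__(self, sample_rate: int, window_size_s: float, embedding_size: int):
        self.sample_rate = sample_rate
        self.window_size_s = window_size_s
        self.embedding_size = embedding_size

    def embed(self, audio_array: Sequence[float]) -> InferenceOutputsSketch:
        window = int(self.sample_rate * self.window_size_s)
        # Ceiling division, so a trailing partial window still gets a slot.
        num_windows = max(1, -(-len(audio_array) // window))
        embeddings = [[0.0] * self.embedding_size for _ in range(num_windows)]
        # Only embeddings are populated; logits and separated audio stay None.
        return InferenceOutputsSketch(embeddings=embeddings)
```

Because callers only depend on `embed` and the optional output fields, a different model can be substituted without changing downstream code, provided the caller checks which fields are populated.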
