From 1eeb0b71c2eb50fe9c61e2dbc616592da4f2325b Mon Sep 17 00:00:00 2001
From: Tom Denton
Date: Wed, 23 Oct 2024 07:25:00 -0700
Subject: [PATCH] Add some documentation for the model zoo.

PiperOrigin-RevId: 688954850
---
 chirp/inference/README.md    | 24 ++++-------------------
 chirp/projects/zoo/README.md | 93 ++++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+), 20 deletions(-)
 create mode 100644 chirp/projects/zoo/README.md

diff --git a/chirp/inference/README.md b/chirp/inference/README.md
index 873376d3..167b31f5 100644
--- a/chirp/inference/README.md
+++ b/chirp/inference/README.md
@@ -85,26 +85,10 @@ notebook file into Google Drive and open it with Colab. Then use the
 
 ## The Embedding Model Interface
 
-We provide a model wrapping interface `interface.EmbeddingModel` which can be
-implemented by a wide range of models providing some combination of
-classification logits, embeddings, and separated audio. Implementations are
-provided in `models.py`, including:
-* a `PlaceholderModel` which can be used for testing,
-* `TaxonomyModelTF`: an exported Chirp classifier SavedModel,
-* `SeparatorModelTF`: an exported Chirp separation model,
-* `BirdNet`: applies the BirdNet saved model, which can be obtained from the
-  BirdNET-Analyzer git repository.
-* `BirbSepModelTF1`: Applies the separation model described in [the Bird MixIT
-  paper](https://arxiv.org/abs/2110.03209)
-* `SeparateEmbedModel`: Combines different separation and embedding/inference
-  models, by separating the target audio and then embedding each separate
-  channel. If the embedding model produces logits, the max logits are taken
-  over the separated channels.
-
-The primary function in the `EmbeddingModel` interface is
-`EmbeddingModel.embed(audio_array)` which runs model inference on the provided
-audio array. The outputs are an `interface.InferenceOutputs` instance, which
-contains optional embeddings, logits, and separated audio.
+To allow simple substitution of different models, we provide an `EmbeddingModel`
+interface, with a variety of implementations for common models (the model zoo).
+This is described in further detail in `chirp/projects/zoo/README.md`.
+
 
 # Inference Pipeline
 
diff --git a/chirp/projects/zoo/README.md b/chirp/projects/zoo/README.md
new file mode 100644
index 00000000..91d1331a
--- /dev/null
+++ b/chirp/projects/zoo/README.md
@@ -0,0 +1,93 @@
+# Model Zoo for Bioacoustics
+
+This package handles audio embedding models. We provide a simple interface
+(`zoo_interface.EmbeddingModel`) for wrapping models which transform audio clips
+(of any length) into embeddings. The common interface allows clean comparison
+of models for evaluation purposes, and also allows users to freely choose the
+most appropriate model for their work.
+
+The most convenient way to load a predefined model is:
+
+```python
+m = model_configs.load_model_by_name('perch_8')
+```
+
+This loads the Perch v8 model automatically from Kaggle Models. The set of
+currently implemented models can be inspected in
+`model_configs.ModelConfigName`.
+
+## The Embedding Model Interface
+
+We provide a model wrapping interface `zoo_interface.EmbeddingModel` which can
+be implemented by a wide range of models providing some combination of
+classification logits, embeddings, and separated audio.
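+
+As an illustration, a minimal custom wrapper might look like the sketch below.
+This is only a sketch: the base class's fields, the `InferenceOutputs`
+constructor arguments, and the import paths are assumptions which should be
+checked against `zoo_interface.py`.
+
+```python
+import dataclasses
+
+import numpy as np
+
+from chirp.projects.zoo import zoo_interface
+
+
+@dataclasses.dataclass
+class FramewiseMeanModel(zoo_interface.EmbeddingModel):
+  """Hypothetical model which 'embeds' audio as per-frame mean amplitude."""
+
+  frame_size: int = 16000
+
+  def embed(self, audio_array: np.ndarray) -> zoo_interface.InferenceOutputs:
+    # Trim the audio to a whole number of frames, then compute one scalar
+    # 'embedding' per frame.
+    num_frames = audio_array.shape[0] // self.frame_size
+    framed = audio_array[: num_frames * self.frame_size].reshape(
+        num_frames, self.frame_size)
+    embeddings = np.abs(framed).mean(axis=-1, keepdims=True)
+    return zoo_interface.InferenceOutputs(embeddings=embeddings)
+```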
+
+Implementations are mostly provided in `models.py`, including:
+
+* `PlaceholderModel`: a stub implementation which can be used for testing.
+* `TaxonomyModelTF`: an exported Chirp classifier SavedModel.
+* `SeparatorModelTF`: an exported Chirp separation model.
+* `BirdNet`: applies the BirdNET saved model, which can be obtained from the
+  BirdNET-Analyzer git repository.
+* `BirbSepModelTF1`: applies the separation model described in [the Bird MixIT
+  paper](https://arxiv.org/abs/2110.03209).
+* `SeparateEmbedModel`: combines a separation model with an
+  embedding/inference model by separating the target audio and then embedding
+  each separated channel. If the embedding model produces logits, the max
+  logits are taken over the separated channels.
+
+The primary function in the `EmbeddingModel` interface is
+`EmbeddingModel.embed(audio_array)`, which runs model inference on the provided
+audio array. The output is a `zoo_interface.InferenceOutputs` instance, which
+contains optional embeddings, logits, and separated audio.
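+
+For example, a typical call might look like the following sketch. The import
+path, the `sample_rate` attribute, and the treatment of `logits` as a
+dictionary keyed by classifier name are assumptions based on the description
+above.
+
+```python
+import numpy as np
+
+from chirp.projects.zoo import model_configs
+
+model = model_configs.load_model_by_name('perch_8')
+
+# Five seconds of silence; `sample_rate` is assumed to be a model attribute.
+audio = np.zeros(5 * model.sample_rate, dtype=np.float32)
+
+outputs = model.embed(audio)
+if outputs.embeddings is not None:
+  print('Embeddings shape:', outputs.embeddings.shape)
+if outputs.logits is not None:
+  print('Logit sources:', list(outputs.logits))
+```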