API/DOC/backwards: rework config defaults (#125)
* API: the defaults change, and there are also some easier methods to pass arguments to every sampler.
* DOC: the documentation around the defaults is much better.
* TST: the configs are tested too!
* TST BUG: fixes CI testing!
* TST: makes sure configs are proper experiments.
* MAINT: only change docs on release (lesson learned)

See #125 for more detail.
stsievert authored Dec 21, 2021
1 parent aa9b703 commit b4ec4b1
Showing 46 changed files with 1,370 additions and 651 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/docs.yml
@@ -1,9 +1,9 @@
name: Documentation build

on: push
# on:
# release:
# types: [published]
# on: push
on:
release:
types: [published]

# Only run when release published (not created or edited, etc)
# https://docs.github.com/en/actions/reference/events-that-trigger-workflows#release
5 changes: 4 additions & 1 deletion .github/workflows/test.yml
@@ -58,4 +58,7 @@ jobs:
until curl 127.0.0.1:8421 > /dev/null 2>&1; do :; done # wait for container to start
sudo docker ps
- name: Run all tests
run: sudo /usr/share/miniconda/bin/pytest
run: |
# sudo docker-compose logs -f & # if debugging; shows logs
# sudo /usr/share/miniconda/bin/pytest -s
sudo /usr/share/miniconda/bin/pytest
2 changes: 2 additions & 0 deletions README.md
@@ -39,3 +39,5 @@ queries, scores, meta = sampler.get_queries(num=10_000)

This script allows the data scientist to score queries for an embedding they
specify.

[semver]:https://semver.org
4 changes: 2 additions & 2 deletions docs/source/adaptive.rst
@@ -6,7 +6,7 @@ The API the must conform to below:

.. autosummary::

salmon.backend.sampler.Runner
salmon.backend.sampler.Sampler

This API balances the fundamentally serial nature of adaptive algorithms with
the parallel context of web servers.
@@ -34,7 +34,7 @@ following:
async def process_answer(answer):
db.push(answer)
The :class:`~salmon.backend.sampler.Runner` API balances the two and runs the code
The :class:`~salmon.backend.sampler.Sampler` API balances the two and runs the code
to `receive answers` and `process answers` in separate processes.
`Processing the received answers` is an optimization that needs to be performed
quickly because ``model.best_query`` depends on the optimization.
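
The split between receiving and processing answers can be sketched as follows. This is a minimal illustration, not Salmon's implementation: the in-memory list `db`, the `ToyModel` class, and the `drain` helper are all hypothetical names, and both halves run in a single process here, whereas the text above describes separate processes.

```python
import asyncio

db = []  # hypothetical stand-in for the database


async def receive_answer(answer):
    # Web-server side: store the answer and return immediately.
    db.append(answer)


class ToyModel:
    """Hypothetical model that only counts the answers it has processed."""

    def __init__(self):
        self.n_processed = 0

    def process_answers(self, answers):
        # Optimization side: work on a whole batch at once.
        self.n_processed += len(answers)


def drain(model):
    # Take everything received so far and hand it to the model as one batch.
    batch = db[:]
    db.clear()
    model.process_answers(batch)
    return len(batch)


async def main():
    model = ToyModel()
    # Five answers arrive concurrently; none of them waits on the model.
    await asyncio.gather(*(receive_answer({"winner": i}) for i in range(5)))
    drained = drain(model)
    return model.n_processed, drained


n_processed, drained = asyncio.run(main())
```

Receiving stays cheap because it only appends; the expensive model update sees answers in batches, which is the property the paragraph above relies on.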
22 changes: 19 additions & 3 deletions docs/source/api.rst
@@ -5,6 +5,22 @@
API
===

Configuration
-------------

.. autosummary::
:toctree: generated/

salmon.triplets.manager.Config

This configuration has two optional components:

.. autosummary::
:toctree: generated/

salmon.triplets.manager.HTML
salmon.triplets.manager.Sampling

Offline embeddings
------------------

@@ -28,14 +44,14 @@ following API:
:toctree: generated/
:template: class.rst

salmon.backend.sampler.Runner
salmon.backend.sampler.Sampler

This class enables running a triplet embedding algorithm on Salmon: it provides
convenient hooks to the database like ``get_queries`` and ``post_answers`` if
you want to customize the running of the algorithm. By default, the algorithm
uses ``Runner.run`` to run the algorithm.
uses ``Sampler.run`` to run the algorithm.

Every class below inherits from :class:`~salmon.backend.sampler.Runner`.
Every class below inherits from :class:`~salmon.backend.sampler.Sampler`.


Passive Algorithms
3 changes: 2 additions & 1 deletion docs/source/conf.py
@@ -33,7 +33,7 @@
"github_user": "stsievert", # Username
"github_repo": "salmon", # Repo name
"github_version": "master", # Version
"conf_py_path": "/source/", # Path in the checkout to the docs root
"conf_py_path": "/docs/source/", # Path in the checkout to the docs root
}

html_theme_options = {
@@ -52,6 +52,7 @@
"sphinx.ext.autosummary",
"sphinx.ext.autodoc",
"numpydoc",
"sphinxcontrib.autodoc_pydantic",
]

# Add any paths that contain templates here, relative to this directory.
12 changes: 7 additions & 5 deletions docs/source/developers.rst
@@ -17,6 +17,11 @@ expects two functions:
* ``get_queries``, which returns a list of queries and scores. These are
saved in the database, and popped when a user requests a query.

Use of ``get_queries`` is strongly recommended. With ``get_queries``, Salmon's
backend relies on Dask, which allows for higher throughput (more concurrent
users). ``get_query`` uses a single worker process, so it may get overloaded
with a moderate number of concurrent users.

For complete documentation, see :ref:`alg-api`. In short, your algorithm should
be a class that implements ``get_query`` and ``process_answers``.

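A rough sketch of such a class is below. It is an illustration under stated assumptions, not Salmon's API: the `Sampler` defined here is a hypothetical stand-in for the real base class so the example runs on its own, and the return shape of `get_query` (a query and a score) and the argument of `process_answers` (a list of answer dicts) are assumed.

```python
import random


class Sampler:
    """Hypothetical stand-in for Salmon's Sampler base class (illustration only)."""

    def __init__(self, ident: str = ""):
        self.ident = ident


class RandomTriplets(Sampler):
    """Toy algorithm: asks random triplet queries and stores the answers."""

    def __init__(self, n: int, ident: str = "", seed: int = 0):
        super().__init__(ident=ident)  # pass ``ident`` on to the base class
        self.n = n
        self.answers = []
        self._rng = random.Random(seed)

    def get_query(self):
        # Assumed return shape: (query, score); the real signature may differ.
        head, left, right = self._rng.sample(range(self.n), 3)
        return {"head": head, "left": left, "right": right}, 0.0

    def process_answers(self, answers):
        # Assumed input: a list of answer dicts.
        self.answers.extend(answers)


alg = RandomTriplets(n=10, ident="toy")
query, score = alg.get_query()
alg.process_answers([{**query, "winner": query["left"]}])
```

With the real base class, the import would replace the stand-in definition; the subclass body is the part that carries over.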
@@ -25,7 +30,7 @@ After you have developed these functions, look at other algorithms in
to figure out inheritance details. In short, the following details are
important:

* **Inheriting from** :class:`~salmon.backend.alg.Runner`, which enables Salmon
* **Inheriting from** :class:`~salmon.backend.alg.Sampler`, which enables Salmon
to work with custom algorithms.
* **Accepting an** ``ident: str`` keyword argument in ``__init__`` **and
passing that argument to** ``super().__init__``. (``ident`` is passed to all
@@ -40,14 +45,11 @@ necessary but are highly encouraged:
* **Ensure query searches are fast enough.** The user will be waiting if
thousands of users come to Salmon and deplete all the searched queries.

It's not a strong requirement, but I would encourage both ``process_answers``
and ``get_queries`` to be quick and complete in about a second each.

Debugging
---------

Let's say you've integrated most of your algorithm into
:class:`~salmon.backend.sampler.Runner`. Now, you'd like to make sure everything is
:class:`~salmon.backend.sampler.Sampler`. Now, you'd like to make sure everything is
working properly.

This script will help:
12 changes: 11 additions & 1 deletion docs/source/faq.rst
@@ -6,6 +6,16 @@ FAQ
Also relevant is the :ref:`troubleshooting`, which goes over some (blocking)
difficulties while launching.

.. note::

Please include the version in any bug reports or feature requests. The
version number should look something like ``v0.4.1``. It can be found at
``http://[url]:8421/docs`` or in the downloaded experiment file (found at
``http://[url]:8421/download`` which has a filename like
``exp-2021-05-20T07:31-salmon-v0.4.1.rdb``).
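
If only the downloaded experiment file is at hand, the version can be read from its name. The helper below is a small sketch: `version_from_filename` is a hypothetical name, and the pattern is inferred solely from the example filename above, so other filename layouts are not handled.

```python
import re

# Pattern inferred from the example filename; this is an assumption, not a spec.
VERSION_RE = re.compile(r"salmon-(v\d+\.\d+\.\d+)\.rdb$")


def version_from_filename(filename):
    """Return the version string embedded in the filename, or None."""
    match = VERSION_RE.search(filename)
    return match.group(1) if match else None


version = version_from_filename("exp-2021-05-20T07:31-salmon-v0.4.1.rdb")
```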

.. _random_vs_active:

When should I use random/active sampling?
-----------------------------------------

@@ -25,7 +35,6 @@ By default, Salmon will produce random embeddings. This is the simplest
sampler, and doesn't require any user configuration. Tips on how to use active
samplers are in :ref:`adaptiveconfig`.


.. _faq-n_responses:

How many responses will be needed?
@@ -83,6 +92,7 @@ configuration:
sampling:
probs: {"ARR": 85, "Random": 15}
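
One way to read this setting, assuming ``probs`` gives the percentage of queries served by each sampler, is as weights for a random choice. The snippet below only illustrates that reading with the standard library; it is not how Salmon selects samplers internally.

```python
import random
from collections import Counter

probs = {"ARR": 85, "Random": 15}  # percentages from the configuration above
assert sum(probs.values()) == 100  # assumption: the percentages total 100

rng = random.Random(0)  # seeded so the illustration is repeatable
names = list(probs)
# Draw a sampler name for each of 10,000 simulated queries.
draws = rng.choices(names, weights=[probs[name] for name in names], k=10_000)
counts = Counter(draws)
arr_fraction = counts["ARR"] / len(draws)
```

Under this reading, roughly 85% of the simulated queries come from ``ARR`` and the rest from ``Random``.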
Can I choose a different machine?
---------------------------------
