API/DOC/backwards: rework config defaults (#125)

* API: the defaults change, and there's also some easier methods to pass arguments to every sampler.
* DOC: the documentation around the defaults is much better.
* TST: the config are tested too!
* TST BUG: fixes CI testing!
* TST: makes sure configs are proper experiments.
* MAINT: only change docs on release (lesson learned)

See #125 for more detail.
stsievert · Dec 21, 2021 · b4ec4b1
1 parent aa9b703
Showing 46 changed files with 1,370 additions and 651 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -1,9 +1,9 @@
 name: Documentation build
 
-on: push
-# on:
-  # release:
-    # types: [published]
+# on: push
+on:
+  release:
+    types: [published]
 
 # Only run when release published (not created or edited, etc)
 # https://docs.github.com/en/actions/reference/events-that-trigger-workflows#release

diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
@@ -58,4 +58,7 @@ jobs:
           until curl 127.0.0.1:8421 > /dev/null 2>&1; do :; done  # wait for container to start
           sudo docker ps
     - name: Run all tests
-      run: sudo /usr/share/miniconda/bin/pytest
+      run: |
+          # sudo docker-compose logs -f &  # if debugging; shows logs
+          # sudo /usr/share/miniconda/bin/pytest -s
+          sudo /usr/share/miniconda/bin/pytest
diff --git a/README.md b/README.md
@@ -39,3 +39,5 @@ queries, scores, meta = sampler.get_queries(num=10_000)
 
 This script allows the data scientist to score queries for an embedding they
 specify.
+
+[semver]:https://semver.org
diff --git a/docs/source/adaptive.rst b/docs/source/adaptive.rst
@@ -6,7 +6,7 @@ The API the must conform to below:
 
 .. autosummary::
 
-   salmon.backend.sampler.Runner
+   salmon.backend.sampler.Sampler
 
 This API balances the fundamentally serial nature of adaptive algorithms with
 the parallel context of web servers.
@@ -34,7 +34,7 @@ following:
    async def process_answer(answer):
        db.push(answer)
 
-The :class:`~salmon.backend.sampler.Runner` API balances the two and runs the code
+The :class:`~salmon.backend.sampler.Sampler` API balances the two and runs the code
 to `receive answers` and `process answers` in separate processes.
 `Processing the received answers` is an optimization that needs to be performed
 quickly because ``model.best_query`` depends on the optimization.

diff --git a/docs/source/api.rst b/docs/source/api.rst
@@ -5,6 +5,22 @@
 API
 ===
 
+Configuration
+-------------
+
+.. autosummary::
+   :toctree: generated/
+
+   salmon.triplets.manager.Config
+
+This configuration has two optional components:
+
+.. autosummary::
+   :toctree: generated/
+
+   salmon.triplets.manager.HTML
+   salmon.triplets.manager.Sampling
+
 Offline embeddings
 ------------------
 
@@ -28,14 +44,14 @@ following API:
    :toctree: generated/
    :template: class.rst
 
-   salmon.backend.sampler.Runner
+   salmon.backend.sampler.Sampler
 
 This class enables running a triplet embedding algorithm on Salmon: it provides
 convenient hooks to the database like ``get_queries`` and ``post_answers`` if
 you want to customize the running of the algorithm. By default, the algorithm
-uses ``Runner.run`` to run the algorithm.
+uses ``Sampler.run`` to run the algorithm.
 
-Every class below inherits from :class:`~salmon.backend.sampler.Runner`.
+Every class below inherits from :class:`~salmon.backend.sampler.Sampler`.
 
 
 Passive Algorithms

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -33,7 +33,7 @@
     "github_user": "stsievert",  # Username
     "github_repo": "salmon",  # Repo name
     "github_version": "master",  # Version
-    "conf_py_path": "/source/",  # Path in the checkout to the docs root
+    "conf_py_path": "/docs/source/",  # Path in the checkout to the docs root
 }
 
 html_theme_options = {
@@ -52,6 +52,7 @@
     "sphinx.ext.autosummary",
     "sphinx.ext.autodoc",
     "numpydoc",
+    "sphinxcontrib.autodoc_pydantic",
 ]
 
 # Add any paths that contain templates here, relative to this directory.

diff --git a/docs/source/developers.rst b/docs/source/developers.rst
@@ -17,6 +17,11 @@ expects two functions:
     * ``get_queries``, which returns a list of queries and scores. These are
       saved in the database, and popped when a user requests a query.
 
+Use of ``get_queries`` is strongly recommended. Then Salmon's backend relies on
+Dask, which allows for higher throughput (more concurrent users). ``get_query``
+uses a single worker process, so it may get overloaded with a moderate number
+of concurrent users.
+
 For complete documentation, see :ref:`alg-api`. In short, your algorithm should
 be a class that implement ``get_query`` and ``process_answers``.
 
@@ -25,7 +30,7 @@ After you have developed these functions, look at other algorithms in
 to figure out inheritance details. In short, the following details are
 important:
 
-* **Inheriting from** :class:`~salmon.backend.alg.Runner`, which enables Salmon
+* **Inheriting from** :class:`~salmon.backend.alg.Sampler`, which enables Salmon
   to work with custom algorithms.
 * **Accepting an** ``ident: str`` keyword argument in ``__init__`` **and
   passing that argument to** ``super().__init__``. (``ident`` is passed to all
@@ -40,14 +45,11 @@ necessary but are highly encouraged:
 * **Ensure query searches are fast enough.** The user will be waiting if
   thousands of users come to Salmon and deplete all the searched queries.
 
-It's not a strong requirement, but I would encourage both ``process_answers``
-and ``get_queries`` to be quick and complete in about a second each.
-
 Debugging
 ---------
 
 Let's say you've integrated most of your algorithm into
-:class:`~salmon.backend.sampler.Runner`. Now, you'd like to make sure everything is
+:class:`~salmon.backend.sampler.Sampler`. Now, you'd like to make sure everything is
 working properly.
 
 This script will help:
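
Note on the ``developers.rst`` hunks above: the debugging script they mention falls outside the diff context and is not reproduced here. Separately, the inheritance requirements listed in that file can be sketched as follows. This is a minimal, hypothetical illustration and not Salmon's implementation: ``SamplerStub`` stands in for ``salmon.backend.sampler.Sampler`` so the block runs without Salmon installed, and ``RandomTriplets``, the ``n`` and ``seed`` parameters, and the layout of the answer dictionary are all assumptions. The three-value return of ``get_queries`` mirrors the README usage ``queries, scores, meta = sampler.get_queries(num=10_000)``.

```python
import random
from typing import Any, Dict, List, Tuple


class SamplerStub:
    """Hypothetical stand-in for salmon.backend.sampler.Sampler.

    The real base class wires the sampler to Salmon's database; this stub
    only stores ``ident`` so the example is self-contained.
    """

    def __init__(self, ident: str = ""):
        self.ident = ident


class RandomTriplets(SamplerStub):
    """Toy sampler that proposes random triplets over ``n`` items (assumed name)."""

    def __init__(self, n: int, ident: str = "", seed: int = 0):
        # Accept ``ident`` as a keyword argument and pass it to the parent,
        # as the developer docs require.
        super().__init__(ident=ident)
        self.n = n
        self.answers: List[Dict[str, Any]] = []
        self._rng = random.Random(seed)

    def get_queries(
        self, num: int = 10
    ) -> Tuple[List[List[int]], List[float], Dict[str, Any]]:
        # Each query is three distinct item indices; each score is a random
        # priority, since this toy sampler has no model to rank queries.
        queries = [self._rng.sample(range(self.n), 3) for _ in range(num)]
        scores = [self._rng.random() for _ in queries]
        meta = {"ident": self.ident}
        return queries, scores, meta

    def process_answers(self, answers: List[Dict[str, Any]]) -> None:
        # A real sampler would update its embedding here; this one only
        # records what it received.
        self.answers.extend(answers)


sampler = RandomTriplets(n=20, ident="demo")
queries, scores, meta = sampler.get_queries(num=5)
# The answer keys below are illustrative, not Salmon's schema.
sampler.process_answers([{"head": 0, "left": 1, "right": 2, "winner": 1}])
```

Returning many queries per call, rather than one at a time, matches the recommendation in the added ``developers.rst`` text to prefer ``get_queries`` over ``get_query``.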

diff --git a/docs/source/faq.rst b/docs/source/faq.rst
@@ -6,6 +6,16 @@ FAQ
 Also relevant is the :ref:`troubleshooting`, which goes over some (blocking)
 difficulties while launching.
 
+.. note::
+
+   Please include the version in any bug reports or feature requests.  The
+   version number should look something like ``v0.4.1``. It can be found at
+   ``http://[url]:8421/docs`` or in the downloaded experiment file (found at
+   ``http://[url]:8421/download`` which has a filename like
+   ``exp-2021-05-20T07:31-salmon-v0.4.1.rdb``).
+
+.. _random_vs_active:
+
 When should I use random/active sampling?
 -----------------------------------------
 
@@ -25,7 +35,6 @@ By default, Salmon will produce random embeddings. This is the simplest
 sampler, and doesn't require any user configuration. Tips on how to use active
 samplers are in :ref:`adaptiveconfig`.
 
-
 .. _faq-n_responses:
 
 How many responses will be needed?
@@ -83,6 +92,7 @@ configuration:
    sampling:
      probs: {"ARR": 85, "Random": 15}
 
+
 Can I choose a different machine?
 ---------------------------------