Releases: tgen/jetstream
v1.7.4
Jetstream v1.7.4 Release Notes
Major changes
- Improved pipeline version parsing to use PEP 440 style versioning - `development` and `latest` have been added as aliases to the latest development and stable release, respectively.
- Pipelines and their versions now have a defined comparison format, e.g. defining `__lt__` and `__eq__` functions; this allows for a sorted pipeline list.
- Improved handling of JS_PIPELINE_PATH, both within templates via the `expand_vars` function and within the slurm_singularity backend.
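The sortable-pipeline idea above can be sketched in plain Python. This is an illustrative model, not jetstream's actual implementation: the `Pipeline` class and the naive numeric version key below are assumptions (jetstream parses versions with `packaging.version` per PEP 440, which also handles pre- and dev-releases correctly).

```python
from functools import total_ordering

@total_ordering
class Pipeline:
    """Minimal sketch: pipelines compare by (name, version key)."""
    def __init__(self, name, version):
        self.name = name
        self.version = version
        # naive numeric key; the real parser (packaging.version) also
        # understands pre/dev releases like 1.8.0rc1
        self._key = tuple(int(p) for p in version.split('.'))

    def __eq__(self, other):
        return (self.name, self._key) == (other.name, other._key)

    def __lt__(self, other):
        return (self.name, self._key) < (other.name, other._key)

pipelines = [Pipeline('phoenix', '1.10.0'),
             Pipeline('phoenix', '1.2.0'),
             Pipeline('phoenix', '1.9.1')]

# With __lt__/__eq__ defined, the pipeline list can simply be sorted;
# "latest" then aliases the newest release.
latest = sorted(pipelines)[-1]
print(latest.version)  # -> 1.10.0
```

Note that tuple comparison correctly orders `1.10.0` after `1.9.1`, which naive string comparison would not.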
Bug fixes
- The slurm_singularity backend has improved search functionality for finding cached images; previously it only found cached images if the digest was explicitly defined for the task.
- Avoid erroneously attempting to bind $JS_PIPELINE_PATH if it has not been set, e.g. if the user is simply running `jetstream run` without any pipeline context.
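The fix above can be illustrated with a small sketch: only emit a bind argument when the variable is actually set. The helper name and argument shape here are hypothetical, not the backend's real code.

```python
import os

def singularity_bind_args(env=os.environ):
    """Hypothetical helper: build bind arguments for singularity.

    Illustrates the bug fix - an unset/empty JS_PIPELINE_PATH must not
    produce a bind flag, otherwise a plain `jetstream run` with no
    pipeline context would fail.
    """
    args = []
    pipeline_path = env.get('JS_PIPELINE_PATH')
    if pipeline_path:  # only bind when the variable is set and non-empty
        args += ['--bind', pipeline_path]
    return args

print(singularity_bind_args({}))                                  # -> []
print(singularity_bind_args({'JS_PIPELINE_PATH': '/pipelines/x'}))
# -> ['--bind', '/pipelines/x']
```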
Minor changes
- Linting related adjustments to the slurm_singularity.py backend
- Limiting the networkx version range to exclude the 3.0 release for now
Full Changelog: v1.7.3...v1.7.4
v1.7.3
Jetstream v1.7.3 Release Notes
Major changes
- For slurm backends, the `sacct` pinginess has been reduced, and we request less information instead of using `--all`; this reduces load on the slurmdbd
- The slurm_singularity backend can now submit jobs without a container definition
- Added `md5` and `assignbin` filters for use in templates - resolves #101
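A minimal sketch of what filters like these can do, outside of a template engine. The `md5` function mirrors the obvious hashing behavior; `assignbin` is an illustrative guess at the idea (bucketing a value into a stable bin), not jetstream's actual signature.

```python
import hashlib

def md5(value):
    """Sketch of an `md5` template filter: hex digest of the input."""
    return hashlib.md5(str(value).encode()).hexdigest()

def assignbin(value, n_bins):
    """Illustrative `assignbin`-style filter: map a value to a stable
    bin in range 0..n_bins-1, e.g. for sharding samples across jobs.
    (Assumed semantics - see the jetstream docs for the real filter.)"""
    return int(md5(value), 16) % n_bins

print(md5('sample_A'))           # deterministic 32-char hex digest
print(assignbin('sample_A', 4))  # stable bin in range 0..3
```

In a Jinja2 template this style of filter would be used as `{{ sample.name | md5 }}`.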
Bug fixes
- Not all asyncio.Event(loop) usages were fixed in previous commits; this should fix the remaining cases impeding use of Python 3.10 #144
Minor changes
- Adjusted handling of gpu jobs for the `slurm_singularity` backend; we now set `SINGULARITYENV_CUDA_VISIBLE_DEVICES`
Ease of use updates
- A bash completion script is available under `extras/completions/jetstream.bash`. This is still in development, but can be used as a template for other users. It can be installed under `~/.bash_completion` or to your preferred user completion script dir, e.g. `~/.local/share/bash-completion/completions/jetstream.bash`
v1.7.2
Jetstream v1.7.2 Release Notes
Major changes
- This adds the LibYAML implementation of YAML parsing if available, otherwise defaulting to the PyYAML implementation - more details available in issue #143
- Handling an issue from a downstream pipeline - #10
- By using the identity of the task for the slurm_singularity backend's generated files, we avoid the potential for a sample.name or any other user-supplied variable generating a task name that is longer than 255 characters.
- Better containerization of the slurm_singularity backend; using `--contain` ensures that we don't bind /home or any other directories defined in singularity.conf unless we explicitly bind them
- We also only use `--nv` if CUDA_VISIBLE_DEVICES is defined; some users were misled into thinking that the warning thrown on a non-gpu box is a job-breaking error.
Minor changes
- Updated mash report text - #116
v1.7.1
Jetstream v1.7.1 Release Notes
Bug Fixes
- We ran into a case where we were pinging the container registry far too frequently and getting IO timeouts. Scripts generated by the slurm_singularity backend now contain more extensive bash logic to use the cached singularity image when available; it should always be available unless the cache location is cleaned up after jetstream starts. This drastically reduces the "pinginess".
Minor changes
- Reduced the complexity of the slurm_singularity backend's makedirs-based creation of output directories.
Dev notes
- Updated maintainer info in
__init__.py
v1.7
Jetstream v1.7 Release Notes
Major changes
- A large number of backends have been added supporting docker, singularity, and dnanexus.
- Slurm backend(s) settings have been moved to the overall settings config. This allows for user-level slurm backend adjustments. For example, "NODE_FAIL" is now considered an active state since the job should be requeued and potentially completed by slurm.
Bug Fixes
- Fixed a deprecation-level bug introduced in python 3.10 relating to certain asyncio functions. We currently support python 3.7+.
- Version checking has been updated to use packaging.version instead of distutils.version to be in line with PEP 440.
Dev notes
- Updated unit tests for container based backends
v1.6.2
v1.6.1
Jetstream v1.6.1 Release Notes
Major changes
- A new option `--mash-only/-m` allows users to mash two workflows prior to running a pipeline or workflow.
Dev Notes
- Added unit test for mash only feature
v1.6
Jetstream v1.6 Release Notes
Major changes
- A new option `--pipeline` will allow for pointing directly to a pipeline directory instead of looking up by name.
- New task directive `reset` is understood by the workflow class. Reset directives can be either a string or a sequence of strings. When the task is reset, it will also trigger a reset on any listed task name. The special value `predecessors` will trigger a reset for any direct predecessors of the task.
- Pipeline and project paths are now exported as environment variables by the runner. Here are the environment variables:
  - JS_PIPELINE_PATH
  - JS_PIPELINE_NAME
  - JS_PIPELINE_VERSION
  - JS_PROJECT_PATH
- Three new template global functions were added: `env`, `getenv`, `setenv` for interacting with environment variables during template rendering. Details in docs/templates.md
- Config file inputs via `-c/--config` and `-C/--config-file` have been improved. There are now options for loading plain text as a list of lines (`txt` file type), and also for loading tabular data without headers (`csv-nh`, `tsv-nh`). Tabular config data can now be used with `-C/--config-file` and will be accessible with the `__config_file__` variable inside templates (json/yaml data will still be loaded at the top level).
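The three template globals can be sketched as plain functions over the process environment. The exact semantics here (return values, defaults) are assumptions based on the names; see docs/templates.md for the real behavior.

```python
import os

def getenv(key, default=None):
    """Read an environment variable during template rendering."""
    return os.environ.get(key, default)

def setenv(key, value):
    """Set an environment variable; returns '' so a template call
    like {{ setenv('FOO', 'bar') }} renders as nothing (assumed)."""
    os.environ[key] = str(value)
    return ''

def env():
    """Snapshot of the whole environment as a dict."""
    return dict(os.environ)

setenv('JS_EXAMPLE', 'hello')
print(getenv('JS_EXAMPLE'))          # -> hello
print(getenv('JS_MISSING', 'none'))  # -> none
```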
Bug Fixes
- Fixed bug with settings not creating correct pipelines variables in new config files.
- Some arguments that worked with `jetstream run` but not `jetstream pipelines` are now working for both. Arguments like `-r/--render-only` and `-b/--build-only` will now work with the pipelines command.
- SlurmBackend slurm command checks are now silent instead of printing to terminal
- Resolved single-column tsv parsing issue
Dev notes
- Added unittests for entire pipelines and included a set of example pipelines
- Version info is hardcoded in two places, `setup.py` and `jetstream/__init__.py`. There are guides added to the dev docs for how to handle features and releases.
v1.5 Release Notes
Major notes
Pipelines command
Jetstream now includes the `jetstream pipelines` command. Pipelines are another
layer added to managing workflow templates. Since templates support
import/include/extend statements with Jinja, they can actually be modularized
across several files. The pipeline system helps organize complicated templates
with a few helpful features.
Pipelines are Jetstream templates that have been documented with version
information and added to a jetstream pipelines directory. This command allows
pipelines to be referenced by name when starting runs and automatically
includes any pipeline scripts and variables during process.
To create a pipeline:
Add the template file(s) to a directory that is in your pipelines searchpath.
The default searchpath is your user home directory, but it can be changed in the
application settings (see `jetstream settings -h`)
Create a pipeline.yaml file in the directory and be sure to include the
required fields.
Pipelines allow templates to be referenced by a name and optional version
Have a template that you use all the time? Name it, document it, and then you
can use `jetstream pipelines` to start runs with the name. It's also
version-aware, so you can reference a specific version of the pipeline, or just
let jetstream find the latest version that you have installed.
Variables can be included in the pipeline.yaml
Pipelines can include constant data used for rendering the templates. For
example, I use the `pipeline.yaml` to contain the file paths to reference data
for Phoenix. This removes the need to repeat these paths throughout the
template source code, and also brings those variables under our version control
system (they used to be stored in files outside of the pipeline code).
Additional executables/scripts can be included with the pipelines
If a `bin` property is added to the pipeline manifest, that directory will be
prepended to the user $PATH environment variable when the pipeline is started.
It's a handy way to bundle additional scripts with a pipeline and have them
all fall under the same project for version control purposes.
Tasks command updates
The tasks command was reworked internally, and there were some changes to the
cli options. The general philosophy for the command now is that a set of
filters is used to select the tasks of interest by name or status. The task
names are now given as positional arguments, for example:

```
jetstream tasks bwa_mem_sample_A haplotypecaller_sample_A ...
```

These arguments allow for glob wildcard matching with `*`:

```
jetstream tasks bwa_mem_sample_*
```

Regex is also still supported:

```
jetstream tasks --regex 'bwa_mem_sample_[^A]'
```

Finally, tasks matching the patterns can be filtered by status:

```
jetstream tasks -s complete bwa_mem_sample_*
```
By default, this command just lists any tasks matching the name/status options.
The action options can be used to perform additional actions on those tasks.
For example, --verbose prints out a ton of information about each task matching
the query.
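The filter-then-act model can be sketched with the standard library. The task data and `select` helper below are hypothetical; the real command reads tasks from the project workflow, but the glob/regex/status filtering works along these lines.

```python
import fnmatch
import re

# hypothetical task table: name -> status
tasks = {
    'bwa_mem_sample_A': 'complete',
    'bwa_mem_sample_B': 'failed',
    'haplotypecaller_sample_A': 'complete',
}

def select(patterns=None, regex=None, status=None):
    """Select task names by glob patterns, regex, and/or status."""
    names = list(tasks)
    if patterns:  # positional args allow glob wildcards
        names = [n for n in names
                 if any(fnmatch.fnmatch(n, p) for p in patterns)]
    if regex:     # --regex is still supported
        names = [n for n in names if re.search(regex, n)]
    if status:    # -s filters the matches by task status
        names = [n for n in names if tasks[n] == status]
    return sorted(names)

print(select(patterns=['bwa_mem_sample_*']))
# -> ['bwa_mem_sample_A', 'bwa_mem_sample_B']
print(select(regex=r'bwa_mem_sample_[^A]'))
# -> ['bwa_mem_sample_B']
print(select(patterns=['bwa_mem_sample_*'], status='complete'))
# -> ['bwa_mem_sample_A']
```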
Template variables from command-line arguments
These options have changed (again...sorry), but this time I think they work
really well. The reason for this change is to get rid of the awkward pattern
of having to add the extra `--` argument before listing any config args. Also,
this new format is much easier to parse, and results in more informative error
messages when there are problems. Here are some use case examples:
Note: config variables can be added when creating projects (they will be stored
in the project.yaml) or when the pipeline/template is run (but they will not
be saved in the project.yaml). I typically prefer to add variables when
creating projects, because it means you can always go back later and see what
was used to render the template (in addition to seeing the final values used
in the commands themselves)
Adding variables when creating a project:

Variables can be added one argument at a time:

```
jetstream init myproject -c reference_path /path/to/reference/file
```

or multiple:

```
jetstream init myproject -c reference_path /path/to/reference/file -c email ryan@tgen.org
```

Variables can have a type declared (string is the default if no type is
declared). The type should be included with the key parameter, colon separated:

```
jetstream init myproject -c int:threads 8 -c str:email ryan@tgen.org
```

Lots of variables can be loaded from files (note the upper-case C):

```
jetstream init myproject -C ~/myconfig.json
```

Loading variables from multiple files is also supported, but you'll need to
provide a name for them to be added under:

```
jetstream init myproject -c file:samples ~/mysamples.json -c file:patients ~/mypatients.json
```

Notice we used the lowercase `-c/--config`. Using uppercase `-C/--config-file`
overwrites the variable context entirely, and essentially adds the contents of
the file as "global" template variables. The lowercase `-c file:...` syntax
will include the variables loaded from the file under the namespace assigned
by the variable key.

JSON strings can also be loaded without saving to a file first:

```
jetstream init myproject -c json:names '["ryan", "bob", "fred"]'
```
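The `type:key value` convention above can be sketched as a small parser. This is illustrative only; jetstream's own loader supports additional types (such as `file:`) and more elaborate keys.

```python
import json

def parse_config_arg(key, value):
    """Parse a -c style argument: optional 'type:' prefix on the key.

    Illustrative sketch - string is the default when no type is given.
    """
    casts = {'str': str, 'int': int, 'float': float, 'json': json.loads}
    if ':' in key:
        type_name, key = key.split(':', 1)
        value = casts[type_name](value)
    return key, value

print(parse_config_arg('int:threads', '8'))        # -> ('threads', 8)
print(parse_config_arg('email', 'ryan@tgen.org'))  # -> ('email', 'ryan@tgen.org')
print(parse_config_arg('json:names', '["ryan", "bob", "fred"]'))
# -> ('names', ['ryan', 'bob', 'fred'])
```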
Projects
Projects have seen a couple of small improvements. The `project.yaml` and
`config.yaml` have been collapsed into a single file, `project.yaml`, with the
previous contents of that file now being listed under the field `__project__`.
This minor change has big impacts for loading variable data; any information
about the project can now be introspected in templates with the `__project__`
variable.
Re-initializing projects with the `jetstream init` command will update the
`project.yaml` and will also add a record to the project history where you can
track changes to that file over time.

`jetstream project` has been improved and will tell you what's going on with a
project.
Minor changes
- Project `jetstream/pid.lock` file is now tracking pending runs on projects.
  The `jetstream run` command will wait to acquire this file before starting.
- Template variables can no longer be stored in the user application settings
  file. See more details in jetstream.templates
- Lots of unused code was removed; this really helped reduce the dependencies
- Task identity is only computed with `cmd` and `exec` directives. This means
  changes to cpus will not automatically cause a task to be re-run. In the
  future this may be adjusted to include runtime options like container ids or
  conda envs. Related to next note:
- Workflow mash will always replace a task if the old version has failed. For
  example: if 99/100 tasks passed and one failed due to memory requirements,
  then when you update the `mem` directive only the failed task will be
  replaced; the other 99 tasks do not need to be run since the `cmd` hasn't
  changed.
- input/after/before directives cannot be mappings with an `re:` property any
  more. Instead, use the new pattern directives `after-re:`, `before-re:`, and
  `input-re:`. This fixes a number of issues when creating graphs.
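The identity rule can be sketched as a hash over only the identity-bearing directives. This is an illustrative model, not jetstream's exact hashing scheme; the function name and use of sha1 here are assumptions.

```python
import hashlib
import json

# Only these directives contribute to a task's identity (sketch).
IDENTITY_DIRECTIVES = ('cmd', 'exec')

def task_identity(directives):
    """Hash only cmd/exec, so changes to cpus, mem, etc. do not
    change the identity and do not force a re-run."""
    payload = {k: directives.get(k) for k in IDENTITY_DIRECTIVES}
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()

a = task_identity({'cmd': 'bwa mem ...', 'cpus': 4})
b = task_identity({'cmd': 'bwa mem ...', 'cpus': 16})
c = task_identity({'cmd': 'bwa mem -t 16 ...'})
print(a == b)  # -> True: a cpus change does not alter identity
print(a == c)  # -> False: a cmd change does
```

This is also why the mash behavior above works: bumping `mem` leaves the 99 passing tasks' identities untouched, while the failed task is replaced regardless.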
1.3.0-beta release for demo
1.3.0-beta Release Notes
Major notes
New application settings system
Application settings are now handled via configuration files. There is a
detailed process for loading files and more information can be found with
the `jetstream settings` command. Previously some settings were taken
from environment variables, and those must now be set in your user config
file. The path where your user config file should be saved can be found
with the `jetstream settings` command. Here is an example config file:
```yaml
# My user settings
# Find the correct location to save this file by running "jetstream settings"
backend: slurm
pipelines:
  home: /path/to/your/pipelines/dir/
constants:
  foo: bar
```
Refinements to template rendering
The process for loading data used to render templates has undergone some
minor tweaks:
- Data should be added to a project with `jetstream init`; adding data after
  the project is initialized can be accomplished by editing the `config.yaml`
  in the jetstream directory, or by rerunning the init command. Instead of
  adding files to the config folder of a project after creating it, just pass
  them as command args.
- Command args for template data must follow an empty `--` argument. This
  clearly distinguishes arguments for the application from template/project
  data arguments. For example:

  ```
  # Old style
  jetstream build template.jst --variables samples.json
  ```

  Must now be:

  ```
  # New style
  jetstream build template.jst -- --variables samples.json
  ```
- Template data will now be loaded from the following sources in order of
  descending priority (any source will override all sources below it in this
  list):
  - Command arguments (e.g. `jetstream run ... -- --str:foo bar --file:csv:samples mysamples.csv`)
  - Project config file (if working in a project)
  - Pipeline manifest `constants: ...` section (if running a pipeline)
  - User application settings config file `constants: ...` section
- Template rendering data will NO LONGER be automatically saved into the
  project config file. This allows project config data to override pipeline
  data reliably. To debug these features, enable debug logging with `-l debug`
  and look for a line like this: `templates:162 DEBUG 2019-02-05 16:44:10:Template render context:`.
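The priority order can be sketched as a layered dictionary merge: apply the lowest-priority source first so that later updates win. The function and argument names below are illustrative, not jetstream's internal API.

```python
def merged_context(command_args, project_config,
                   pipeline_constants, settings_constants):
    """Merge template data sources; higher-priority sources override
    lower ones (sketch of the descending-priority list above)."""
    ctx = {}
    # lowest priority applied first, so each update() overrides it
    for source in (settings_constants, pipeline_constants,
                   project_config, command_args):
        ctx.update(source)
    return ctx

ctx = merged_context(
    command_args={'foo': 'bar'},
    project_config={'foo': 'project', 'ref': '/proj/ref'},
    pipeline_constants={'ref': '/pipe/ref', 'threads': 4},
    settings_constants={'threads': 2, 'email': 'me@example.org'},
)
print(ctx['foo'])      # -> bar       (command args beat project config)
print(ctx['ref'])      # -> /proj/ref (project config beats pipeline)
print(ctx['threads'])  # -> 4         (pipeline beats user settings)
```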
Projects have slimmed down
Projects will only include the `jetstream` directory after initialization. All
other directories are left up to the workflow author. Projects are now
recognized by the presence of the info file `jetstream/project.yaml`. Existing
projects should still work, but you will need to re-run `jetstream init` before
they are recognized. You can run the `jetstream project` command inside a
project to get info about that project (or make sure it has been initialized
correctly)
Tasks commands can be used on any workflow file
`jetstream project tasks` has moved to `jetstream tasks` and will accept a
`-w/--workflow` argument. This makes it function with any built workflow file.
If you are working inside of a project (or the `--project` argument is given),
the project workflow file will be used.
New task directive: retry
Retry is a task directive that prevents a task from failing immediately. The
runner will resubmit the task for the given number of attempts before failing.
This state is loaded each time the task is loaded, so it will NOT be preserved
across multiple runs of the same workflow. Here is a made-up workflow that
demonstrates the directive:
```yaml
- name: fails_once
  retry: 1
  cmd: |
    if [ ! -f foo.txt ]; then
      echo "File not found!"
      touch foo.txt
      exit 1
    else
      echo "File was found!"
    fi
```
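The runner-side behavior of the directive can be sketched as a resubmission loop: a task with `retry: 1` is attempted up to twice before its failure counts. The helper below is a hypothetical model, not jetstream's actual runner code.

```python
def run_with_retry(task, retry=0):
    """Run `task` up to retry+1 times; re-raise only when the
    allotted attempts are exhausted (sketch of the retry directive)."""
    attempts = 0
    while True:
        try:
            return task()
        except Exception:
            attempts += 1
            if attempts > retry:
                raise  # retries exhausted: the task really fails

# Model of the fails_once task: first attempt fails, second succeeds.
state = {'runs': 0}
def fails_once():
    state['runs'] += 1
    if state['runs'] == 1:
        raise RuntimeError('File not found!')
    return 'File was found!'

result = run_with_retry(fails_once, retry=1)
print(result)          # -> File was found!
print(state['runs'])   # -> 2 (one resubmission)
```

Because the retry count is re-loaded with the task, the same task would get a fresh retry budget on a later run of the workflow, matching the note above.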
Other changes
- YAML error messages should be slightly more informative
- The package includes a `__main__.py` file. For developers, this means it can
  be used with `python -m jetstream`
- Tasks have a dynamic label that can be set by the runner backends. The slurm
  backend uses this feature to add the job id to logging messages regarding
  that task.
- SlurmBackend will collect some account data for all tasks. This can be
  configured with settings:backends:slurm:sacct_fields
- Tasks will include an `elapsed_time` state attribute for all runner backends.
- New subcommands added: `render`, `settings`, `tasks`