Releases: tgen/jetstream
v1.7.4
Jetstream v1.7.4 Release Notes
Major changes
- Improved pipeline version parsing to use PEP 440 style versioning - `development` and `latest` have been added as aliases to the latest development and stable release, respectively.
- Pipelines and their versions now have a defined comparison format, e.g. defining `__lt__` and `__eq__` functions; this allows for a sorted pipeline list.
- Improved handling of JS_PIPELINE_PATH, both within templates via the `expand_vars` function and within the slurm_singularity backend.
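The sortable-pipeline idea above can be sketched in plain Python. This is an illustrative model, not jetstream's actual implementation: the `Pipeline` class and the naive numeric version key below are assumptions (jetstream parses versions with `packaging.version` per PEP 440, which also handles pre- and dev-releases correctly).

```python
from functools import total_ordering

@total_ordering
class Pipeline:
    """Minimal sketch: pipelines compare by (name, version key)."""
    def __init__(self, name, version):
        self.name = name
        self.version = version
        # naive numeric key; the real parser (packaging.version) also
        # understands pre/dev releases like 1.8.0rc1
        self._key = tuple(int(p) for p in version.split('.'))

    def __eq__(self, other):
        return (self.name, self._key) == (other.name, other._key)

    def __lt__(self, other):
        return (self.name, self._key) < (other.name, other._key)

pipelines = [Pipeline('phoenix', '1.10.0'),
             Pipeline('phoenix', '1.2.0'),
             Pipeline('phoenix', '1.9.1')]

# With __lt__/__eq__ defined, the pipeline list can simply be sorted;
# "latest" then aliases the newest release.
latest = sorted(pipelines)[-1]
print(latest.version)  # -> 1.10.0
```

Note that tuple comparison correctly orders `1.10.0` after `1.9.1`, which naive string comparison would not.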
Bug fixes
- The slurm_singularity backend has improved search functionality for finding cached images; previously it only found cached images if the digest was explicitly defined for the task.
- Avoid erroneously attempting to bind $JS_PIPELINE_PATH if it has not been set, e.g. if the user is simply running `jetstream run` without any pipeline context.
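The fix above can be illustrated with a small sketch: only emit a bind argument when the variable is actually set. The helper name and argument shape here are hypothetical, not the backend's real code.

```python
import os

def singularity_bind_args(env=os.environ):
    """Hypothetical helper: build bind arguments for singularity.

    Illustrates the bug fix - an unset/empty JS_PIPELINE_PATH must not
    produce a bind flag, otherwise a plain `jetstream run` with no
    pipeline context would fail.
    """
    args = []
    pipeline_path = env.get('JS_PIPELINE_PATH')
    if pipeline_path:  # only bind when the variable is set and non-empty
        args += ['--bind', pipeline_path]
    return args

print(singularity_bind_args({}))                                  # -> []
print(singularity_bind_args({'JS_PIPELINE_PATH': '/pipelines/x'}))
# -> ['--bind', '/pipelines/x']
```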
Minor changes
- Linting related adjustments to the slurm_singularity.py backend
- Limiting the networkx version range to exclude the 3.0 release for now
Full Changelog: v1.7.3...v1.7.4
v1.7.3
Jetstream v1.7.3 Release Notes
Major changes
- For slurm backends, the `sacct` pinginess has been reduced, and we request less information instead of using `--all`; this reduces load on the slurmdbd
- The slurm_singularity backend can now submit jobs without a container definition
- Added `md5` and `assignbin` filters for use in templates - resolves #101
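A minimal sketch of what filters like these can do, outside of a template engine. The `md5` function mirrors the obvious hashing behavior; `assignbin` is an illustrative guess at the idea (bucketing a value into a stable bin), not jetstream's actual signature.

```python
import hashlib

def md5(value):
    """Sketch of an `md5` template filter: hex digest of the input."""
    return hashlib.md5(str(value).encode()).hexdigest()

def assignbin(value, n_bins):
    """Illustrative `assignbin`-style filter: map a value to a stable
    bin in range 0..n_bins-1, e.g. for sharding samples across jobs.
    (Assumed semantics - see the jetstream docs for the real filter.)"""
    return int(md5(value), 16) % n_bins

print(md5('sample_A'))           # deterministic 32-char hex digest
print(assignbin('sample_A', 4))  # stable bin in range 0..3
```

In a Jinja2 template this style of filter would be used as `{{ sample.name | md5 }}`.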
Bug fixes
- Not all asyncio.Event(loop) usages were fixed in previous commits; this should fix the remaining cases impeding use of Python 3.10 #144
Minor changes
- Adjusted handling of gpu jobs for the `slurm_singularity` backend; we now set `SINGULARITYENV_CUDA_VISIBLE_DEVICES`
Ease of use updates
- A bash completion script is available under `extras/completions/jetstream.bash`. This is still in development, but can be used as a template for other users. It can be installed under `~/.bash_completion` or to your preferred user completion script dir, e.g. `~/.local/share/bash-completion/completions/jetstream.bash`
v1.7.2
Jetstream v1.7.2 Release Notes
Major changes
- This adds the LibYAML implementation of YAML parsing if available, otherwise defaulting to the PyYAML implementation - more details available in issue #143
- Handling an issue from a downstream pipeline - #10
- By using the identity of the task for the slurm_singularity backend's generated files, we avoid the potential for a sample.name or any other user-supplied variable generating a task name that is longer than 255 characters.
- Better containerization of the slurm_singularity backend; using `--contain` ensures that we don't bind /home or any other directories defined in singularity.conf unless we explicitly bind them
- We also only use `--nv` if CUDA_VISIBLE_DEVICES is defined; some users were misled into thinking that the warning thrown on a non-gpu box is a job-breaking error.
Minor changes
- Updated mash report text - #116
v1.7.1
Jetstream v1.7.1 Release Notes
Bug Fixes
- We ran into a case where we were pinging the container registry far too frequently and getting IO timeouts. Scripts generated by the slurm_singularity backend now contain more extensive bash logic to use the cached singularity image when available; it should always be available unless the cache location is cleaned up after jetstream starts. This drastically reduces the "pinginess".
Minor changes
- Reduced the complexity of the slurm_singularity backend's makedirs-based creation of output directories.
Dev notes
- Updated maintainer info in
__init__.py
v1.7
Jetstream v1.7 Release Notes
Major changes
- A large number of backends have been added supporting docker, singularity, and dnanexus.
- Slurm backend(s) settings have been moved to the overall settings config. This allows for user-level slurm backend adjustments. For example, "NODE_FAIL" is now considered an active state since the job should be requeued and potentially completed by slurm.
Bug Fixes
- Fixed a deprecation-level bug introduced in python 3.10 relating to certain asyncio functions. We currently support python 3.7+.
- Version checking has been updated to use packaging.version instead of distutils.version to be in line with PEP 440.
Dev notes
- Updated unit tests for container based backends
v1.6.2
v1.6.1
Jetstream v1.6.1 Release Notes
Major changes
- A new option `--mash-only/-m` allows users to mash two workflows prior to running a pipeline or workflow.
Dev Notes
- Added unit test for mash only feature
v1.6
Jetstream v1.6 Release Notes
Major changes
- A new option `--pipeline` will allow for pointing directly to a pipeline directory instead of looking up by name.
- New task directive `reset` is understood by the workflow class. Reset directives can be either a string or a sequence of strings. When the task is reset, it will also trigger a reset on any listed task name. The special value `predecessors` will trigger a reset for any direct predecessors of the task.
- Pipeline and project paths are now exported as environment variables by the runner. Here are the environment variables:
  - JS_PIPELINE_PATH
  - JS_PIPELINE_NAME
  - JS_PIPELINE_VERSION
  - JS_PROJECT_PATH
- Three new template global functions were added: `env`, `getenv`, `setenv` for interacting with environment variables during template rendering. Details in docs/templates.md
- Config file inputs via `-c/--config` and `-C/--config-file` have been improved. There are now options for loading plain text as a list of lines (`txt` file type), and also for loading tabular data without headers (`csv-nh`, `tsv-nh`). Tabular config data can now be used with `-C/--config-file` and will be accessible with the `__config_file__` variable inside templates (json/yaml data will still be loaded at the top level).
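The three template globals can be sketched as plain functions over the process environment. The exact semantics here (return values, defaults) are assumptions based on the names; see docs/templates.md for the real behavior.

```python
import os

def getenv(key, default=None):
    """Read an environment variable during template rendering."""
    return os.environ.get(key, default)

def setenv(key, value):
    """Set an environment variable; returns '' so a template call
    like {{ setenv('FOO', 'bar') }} renders as nothing (assumed)."""
    os.environ[key] = str(value)
    return ''

def env():
    """Snapshot of the whole environment as a dict."""
    return dict(os.environ)

setenv('JS_EXAMPLE', 'hello')
print(getenv('JS_EXAMPLE'))          # -> hello
print(getenv('JS_MISSING', 'none'))  # -> none
```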
Bug Fixes
- Fixed bug with settings not creating correct pipelines variables in new config files.
- Some arguments that worked with `jetstream run` but not `jetstream pipelines` are now working for both. Arguments like `-r/--render-only` and `-b/--build-only` will now work with the pipelines command.
- SlurmBackend slurm command checks are now silent instead of printing to terminal
- Resolved single-column tsv parsing issue
Dev notes
- Added unittests for entire pipelines and included a set of example pipelines
- Version info is hardcoded in two places, `setup.py` and `jetstream/__init__.py`. There are guides added to the dev docs for how to handle features and releases.
v1.5 Release Notes
Major notes
Pipelines command
Jetstream now includes the `jetstream pipelines` command. Pipelines are another
layer added to managing workflow templates. Since templates support
import/include/extend statements with Jinja, they can actually be modularized
across several files. The pipeline system helps organize complicated templates
with a few helpful features.
Pipelines are Jetstream templates that have been documented with version
information and added to a jetstream pipelines directory. This command allows
pipelines to be referenced by name when starting runs and automatically
includes any pipeline scripts and variables during process.
To create a pipeline:
Add the template file(s) to a directory that is in your pipelines searchpath.
The default searchpath is your user home directory, but it can be changed in the
application settings (see `jetstream settings -h`)
Create a pipeline.yaml file in the directory and be sure to include the
required fields.
Pipelines allow templates to be referenced by a name and optional version
Have a template that you use all the time? Name it, document it, and then you
can use `jetstream pipelines` to start runs with the name. It's also
version-aware, so you can reference a specific version of the pipeline, or just
let jetstream find the latest version that you have installed.
Variables can be included in the pipeline.yaml
Pipelines can include constant data used for rendering the templates. For
example, I use the `pipeline.yaml` to contain the file paths to reference data
for Phoenix. This removes the need to repeat these paths throughout the
template source code, and also brings those variables under our version control
system (they used to be stored in files outside of the pipeline code).
Additional executables/scripts can be included with the pipelines
If a `bin` property is added to the pipeline manifest, that directory will be
prepended to the user $PATH environment variable when the pipeline is started.
It's a handy way to bundle additional scripts with a pipeline and have them
all fall under the same project for version control purposes.
Tasks command updates
The tasks command was reworked internally, and there were some changes to the
cli options. The general philosophy for the command now is that a set of
filters is used to select the tasks of interest by name or status. The task
names are now given as positional arguments, for example:

```
jetstream tasks bwa_mem_sample_A haplotypecaller_sample_A ...
```

These arguments allow for glob wildcard matching with `*`:

```
jetstream tasks bwa_mem_sample_*
```

Regex is also still supported:

```
jetstream tasks --regex 'bwa_mem_sample_[^A]'
```

Finally, tasks matching the patterns can be filtered by status:

```
jetstream tasks -s complete bwa_mem_sample_*
```
By default, this command just lists any tasks matching the name/status options.
The action options can be used to perform additional actions on those tasks.
For example, --verbose prints out a ton of information about each task matching
the query.
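The filter-then-act model can be sketched with the standard library. The task data and `select` helper below are hypothetical; the real command reads tasks from the project workflow, but the glob/regex/status filtering works along these lines.

```python
import fnmatch
import re

# hypothetical task table: name -> status
tasks = {
    'bwa_mem_sample_A': 'complete',
    'bwa_mem_sample_B': 'failed',
    'haplotypecaller_sample_A': 'complete',
}

def select(patterns=None, regex=None, status=None):
    """Select task names by glob patterns, regex, and/or status."""
    names = list(tasks)
    if patterns:  # positional args allow glob wildcards
        names = [n for n in names
                 if any(fnmatch.fnmatch(n, p) for p in patterns)]
    if regex:     # --regex is still supported
        names = [n for n in names if re.search(regex, n)]
    if status:    # -s filters the matches by task status
        names = [n for n in names if tasks[n] == status]
    return sorted(names)

print(select(patterns=['bwa_mem_sample_*']))
# -> ['bwa_mem_sample_A', 'bwa_mem_sample_B']
print(select(regex=r'bwa_mem_sample_[^A]'))
# -> ['bwa_mem_sample_B']
print(select(patterns=['bwa_mem_sample_*'], status='complete'))
# -> ['bwa_mem_sample_A']
```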
Template variables from command-line arguments
These options have changed (again...sorry), but this time I think they work
really well. The reason for this change is to get rid of the awkward pattern
of having to add the extra `--` argument before listing any config args. Also,
this new format is much easier to parse, and results in more informative error
messages when there are problems. Here are some use case examples:
Note: config variables can be added when creating projects (they will be stored
in the project.yaml) or when the pipeline/template is run (but they will not
be saved in the project.yaml). I typically prefer to add variables when
creating projects, because it means you can always go back later and see what
was used to render the template (in addition to seeing the final values used
in the commands themselves)
Adding variables when creating a project:

Variables can be added one argument at a time:

```
jetstream init myproject -c reference_path /path/to/reference/file
```

or multiple:

```
jetstream init myproject -c reference_path /path/to/reference/file -c email ryan@tgen.org
```

Variables can have a type declared (string is the default if no type is
declared). The type should be included with the key parameter, colon separated:

```
jetstream init myproject -c int:threads 8 -c str:email ryan@tgen.org
```

Lots of variables can be loaded from files (note the upper-case C):

```
jetstream init myproject -C ~/myconfig.json
```

Loading variables from multiple files is also supported, but you'll need to
provide a name for them to be added under:

```
jetstream init myproject -c file:samples ~/mysamples.json -c file:patients ~/mypatients.json
```

Notice we used the lowercase `-c/--config`. Using uppercase `-C/--config-file`
overwrites the variable context entirely, and essentially adds the contents of
the file as "global" template variables. The lowercase `-c file:...` syntax
will include the variables loaded from the file under the namespace assigned
by the variable key.

JSON strings can also be loaded without saving to a file first:

```
jetstream init myproject -c json:names '["ryan", "bob", "fred"]'
```
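The `type:key value` convention above can be sketched as a small parser. This is illustrative only; jetstream's own loader supports additional types (such as `file:`) and more elaborate keys.

```python
import json

def parse_config_arg(key, value):
    """Parse a -c style argument: optional 'type:' prefix on the key.

    Illustrative sketch - string is the default when no type is given.
    """
    casts = {'str': str, 'int': int, 'float': float, 'json': json.loads}
    if ':' in key:
        type_name, key = key.split(':', 1)
        value = casts[type_name](value)
    return key, value

print(parse_config_arg('int:threads', '8'))        # -> ('threads', 8)
print(parse_config_arg('email', 'ryan@tgen.org'))  # -> ('email', 'ryan@tgen.org')
print(parse_config_arg('json:names', '["ryan", "bob", "fred"]'))
# -> ('names', ['ryan', 'bob', 'fred'])
```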
Projects
Projects have seen a couple of small improvements. The `project.yaml` and
`config.yaml` have been collapsed into a single file, `project.yaml`, with the
previous contents of that file now being listed under the field `__project__`.
This minor change has big impacts for loading variable data; any information
about the project can now be introspected in templates with the `__project__`
variable.
Re-initializing projects with the `jetstream init` command will update the
`project.yaml` and will also add a record to the project history where you can
track changes to that file over time.

`jetstream project` has been improved and will tell you what's going on with a
project.
Minor changes
- Project `jetstream/pid.lock` file is now tracking pending runs on projects.
  The `jetstream run` command will wait to acquire this file before starting.
- Template variables can no longer be stored in the user application settings
  file. See more details in jetstream.templates
- Lots of unused code was removed; this really helped reduce the dependencies
- Task identity is only computed with `cmd` and `exec` directives. This means
  changes to cpus will not automatically cause a task to be re-run. In the
  future this may be adjusted to include runtime options like container ids or
  conda envs. Related to next note:
- Workflow mash will always replace a task if the old version has failed. For
  example: if 99/100 tasks passed and one failed due to memory requirements,
  then when you update the `mem` directive only the failed task will be
  replaced; the other 99 tasks do not need to be run since the `cmd` hasn't
  changed.
- input/after/before directives cannot be mappings with an `re:` property any
  more. Instead, use the new pattern directives `after-re:`, `before-re:`, and
  `input-re:`. This fixes a number of issues when creating graphs.
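The identity rule can be sketched as a hash over only the identity-bearing directives. This is an illustrative model, not jetstream's exact hashing scheme; the function name and use of sha1 here are assumptions.

```python
import hashlib
import json

# Only these directives contribute to a task's identity (sketch).
IDENTITY_DIRECTIVES = ('cmd', 'exec')

def task_identity(directives):
    """Hash only cmd/exec, so changes to cpus, mem, etc. do not
    change the identity and do not force a re-run."""
    payload = {k: directives.get(k) for k in IDENTITY_DIRECTIVES}
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha1(blob).hexdigest()

a = task_identity({'cmd': 'bwa mem ...', 'cpus': 4})
b = task_identity({'cmd': 'bwa mem ...', 'cpus': 16})
c = task_identity({'cmd': 'bwa mem -t 16 ...'})
print(a == b)  # -> True: a cpus change does not alter identity
print(a == c)  # -> False: a cmd change does
```

This is also why the mash behavior above works: bumping `mem` leaves the 99 passing tasks' identities untouched, while the failed task is replaced regardless.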
1.3.0-beta release for demo
1.3.0-beta Release Notes
Major notes
New application settings system
Application settings are now handled via configuration files. There is a
detailed process for loading files and more information can be found with
the `jetstream settings` command. Previously some settings were taken
from environment variables, and those must now be set in your user config
file. The path where your user config file should be saved can be found
with the `jetstream settings` command. Here is an example config file:
```yaml
# My user settings
# Find the correct location to save this file by running "jetstream settings"
backend: slurm
pipelines:
  home: /path/to/your/pipelines/dir/
constants:
  foo: bar
```
Refinements to template rendering
The process for loading data used to render templates has undergone some
minor tweaks:
- Data should be added to a project with `jetstream init`; adding data after
  the project is initialized can be accomplished by editing the `config.yaml`
  in the jetstream directory, or by rerunning the init command. Instead of
  adding files to the config folder of a project after creating it, just pass
  them as command args.
- Command args for template data must follow an empty `--` argument. This
  clearly distinguishes arguments for the application from template/project
  data arguments. For example:

  ```
  # Old style
  jetstream build template.jst --variables samples.json
  ```

  Must now be:

  ```
  # New style
  jetstream build template.jst -- --variables samples.json
  ```
- Template data will now be loaded from the following sources in order of
  descending priority (any source will override all sources below it in this
  list):
  - Command arguments (e.g. `jetstream run ... -- --str:foo bar --file:csv:samples mysamples.csv`)
  - Project config file (if working in a project)
  - Pipeline manifest `constants: ...` section (if running a pipeline)
  - User application settings config file `constants: ...` section
- Template rendering data will NO LONGER be automatically saved into the
  project config file. This allows project config data to override pipeline
  data reliably. To debug these features, enable debug logging with `-l debug`
  and look for a line like this: `templates:162 DEBUG 2019-02-05 16:44:10:Template render context:`.
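The priority order can be sketched as a layered dictionary merge: apply the lowest-priority source first so that later updates win. The function and argument names below are illustrative, not jetstream's internal API.

```python
def merged_context(command_args, project_config,
                   pipeline_constants, settings_constants):
    """Merge template data sources; higher-priority sources override
    lower ones (sketch of the descending-priority list above)."""
    ctx = {}
    # lowest priority applied first, so each update() overrides it
    for source in (settings_constants, pipeline_constants,
                   project_config, command_args):
        ctx.update(source)
    return ctx

ctx = merged_context(
    command_args={'foo': 'bar'},
    project_config={'foo': 'project', 'ref': '/proj/ref'},
    pipeline_constants={'ref': '/pipe/ref', 'threads': 4},
    settings_constants={'threads': 2, 'email': 'me@example.org'},
)
print(ctx['foo'])      # -> bar       (command args beat project config)
print(ctx['ref'])      # -> /proj/ref (project config beats pipeline)
print(ctx['threads'])  # -> 4         (pipeline beats user settings)
```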
Projects have slimmed down
Projects will only include the `jetstream` directory after initialization. All
other directories are left up to the workflow author. Projects are now
recognized by the presence of the info file `jetstream/project.yaml`. Existing
projects should still work, but you will need to re-run `jetstream init` before
they are recognized. You can run the `jetstream project` command inside a
project to get info about that project (or make sure it has been initialized
correctly)
Tasks commands can be used on any workflow file
`jetstream project tasks` has moved to `jetstream tasks` and will accept a
`-w/--workflow` argument. This makes it function with any built workflow file.
If you are working inside of a project (or the `--project` argument is given),
the project workflow file will be used.
New task directive: retry
Retry is a task directive that prevents a task from failing immediately. The
runner will resubmit the task for the given number of attempts before failing.
This state is loaded each time the task is loaded, so it will NOT be preserved
across multiple runs of the same workflow. Here is a made-up workflow that
demonstrates the directive:
```yaml
- name: fails_once
  retry: 1
  cmd: |
    if [ ! -f foo.txt ]; then
      echo "File not found!"
      touch foo.txt
      exit 1
    else
      echo "File was found!"
    fi
```
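The runner-side behavior of the directive can be sketched as a resubmission loop: a task with `retry: 1` is attempted up to twice before its failure counts. The helper below is a hypothetical model, not jetstream's actual runner code.

```python
def run_with_retry(task, retry=0):
    """Run `task` up to retry+1 times; re-raise only when the
    allotted attempts are exhausted (sketch of the retry directive)."""
    attempts = 0
    while True:
        try:
            return task()
        except Exception:
            attempts += 1
            if attempts > retry:
                raise  # retries exhausted: the task really fails

# Model of the fails_once task: first attempt fails, second succeeds.
state = {'runs': 0}
def fails_once():
    state['runs'] += 1
    if state['runs'] == 1:
        raise RuntimeError('File not found!')
    return 'File was found!'

result = run_with_retry(fails_once, retry=1)
print(result)          # -> File was found!
print(state['runs'])   # -> 2 (one resubmission)
```

Because the retry count is re-loaded with the task, the same task would get a fresh retry budget on a later run of the workflow, matching the note above.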
Other changes
- YAML error messages should be slightly more informative
- The package includes a `__main__.py` file. For developers, this means it can
  be used with `python -m jetstream`
- Tasks have a dynamic label that can be set by the runner backends. The slurm
  backend uses this feature to add the job id to logging messages regarding
  that task.
- SlurmBackend will collect some account data for all tasks. This can be
  configured with settings:backends:slurm:sacct_fields
- Tasks will include an `elapsed_time` state attribute for all runner backends.
- New subcommands added: `render`, `settings`, `tasks`