
Naming and style

Peter Ebert edited this page May 10, 2024 · 3 revisions
author date tags
PE 2022-09-16 cubi, internal, convention, rule, policy, standard
PE 2024-05-09 cubi, internal, convention, rule, policy, standard

Naming and style conventions

The following guidelines can be broken if really necessary, and discussed if perceived as unnecessary or misguided.

For Python and closely related code (such as Snakemake modules), the CUBI follows the respective PEP8 style guide. Realizing this requirement is usually possible via code-formatting tools such as black and snakefmt, but it is advisable to recognize badly formatted code when you see it. The following excerpt from the PEP8 style guide also provides a reasonable view on the subject:

[...] code is read much more often than it is written. [...]

A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.

However, know when to be inconsistent – sometimes style guide recommendations just aren’t applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!

In particular: do not break backwards compatibility just to comply with this PEP!

Naming repositories

The CUBI maintains three types of code repositories (TODO: see dev guidelines for details):

  1. workflows
  2. projects
  3. other

Naming workflow repositories: workflow-[system]-[short-desc-name]

  1. [system]: the respective (eco-)system the workflow is designed for, e.g., snakemake, nextflow, common workflow language (CWL), workflow definition language etc.
    • snakemake: smk
    • nextflow: nxf
    • common workflow language: cwl
    • workflow definition language: wdl
    • reminder: the system part must be put in the pyproject.toml file under the section [cubi.workflow.template] with the key system = "system".
  2. [short-desc-name]: a short descriptive name using only the characters a-z and - (minus). Numbers 0-9 may be used if necessary and reasonable.
    • reminder: the short-desc-name part of the name must be put in the pyproject.toml file under section [cubi.workflow] with the key name = "short-desc-name".
  3. example: workflow-smk-longread-variant-calling
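
The workflow naming scheme above can be checked mechanically. The following is a minimal sketch, not an official CUBI tool: the function name `parse_workflow_repo_name` is hypothetical, the list of system abbreviations only covers the four named above (the guideline says "etc.", so it may be incomplete), and the rule that the short name neither starts nor ends with a minus is an added assumption.

```python
import re

# System abbreviations named in the guideline; possibly not exhaustive.
WORKFLOW_SYSTEMS = ("smk", "nxf", "cwl", "wdl")

# workflow-[system]-[short-desc-name]
# Assumption: the short name consists of a-z / 0-9 groups joined by single minus signs.
WORKFLOW_REPO_PATTERN = re.compile(
    r"workflow-(?P<system>" + "|".join(WORKFLOW_SYSTEMS) + r")"
    r"-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_workflow_repo_name(repo_name):
    """Return (system, short-desc-name), or None if the name does not fit the scheme."""
    match = WORKFLOW_REPO_PATTERN.fullmatch(repo_name)
    if match is None:
        return None
    return match.group("system"), match.group("name")
```

The two returned parts correspond to the `system` and `name` values that the reminders above ask you to record in pyproject.toml.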

Naming project repositories: project-[type]-[short-desc-name]

  1. [type]: the type of the project (production, development etc.)
    • development: dev; the project was started to build a new workflow for the CUBI catalogue. The respective project repository may thus document the development while it is still in progress, or organize (preprocess) test data.
    • production (run): run; the project was started to process data with an existing workflow, and thus contains sample information (e.g., phenotypical annotation), or some specific routines for (meta-)data preprocessing.
    • benchmark: bmk; the project was started to evaluate performance aspects of an existing pipeline, e.g., as part of round robin tests in a consortium.
    • reminder: the type part must be put in the pyproject.toml file under the section [cubi.project] with the key type = "type".
  2. [short-desc-name]: a short descriptive name using only the characters a-z0-9 and - (minus).
    • Obviously, the short name of the project should not duplicate the short name of the executed pipeline(s), but be a reference to the project context.
    • reminder: the short-desc-name part of the name must be put in the pyproject.toml file under section [cubi.project] with the key name = "short-desc-name".
  3. example: project-[type]-xyz-cohort, e.g., project-run-xyz-cohort
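
As an illustration of how the parts of a project repository name fit together, the sketch below assembles a name from a project type and a short descriptive name. The function `build_project_repo_name` and the dictionary keys are hypothetical. Rejecting names with leading, trailing, or doubled minus signs is an added assumption, not part of the guideline.

```python
import re

# Project types and their abbreviations as listed above.
PROJECT_TYPES = {"development": "dev", "production": "run", "benchmark": "bmk"}

# Allowed characters: a-z, 0-9 and "-" (assumption: no leading/trailing/double minus).
SHORT_NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def build_project_repo_name(project_type, short_desc_name):
    """Assemble project-[type]-[short-desc-name] or raise ValueError on invalid input."""
    abbreviation = PROJECT_TYPES.get(project_type)
    if abbreviation is None:
        raise ValueError(f"unknown project type: {project_type!r}")
    if SHORT_NAME_PATTERN.fullmatch(short_desc_name) is None:
        raise ValueError(f"invalid short name: {short_desc_name!r}")
    return f"project-{abbreviation}-{short_desc_name}"
```

For example, `build_project_repo_name("production", "xyz-cohort")` yields `project-run-xyz-cohort`.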

Naming other repositories: use your best judgement or ask your colleagues for feedback

Naming branches in repositories

The following rules apply primarily, but not exclusively, to repositories that exist to realize a concerted and orderly code design and implementation effort, i.e., workflow and (software) tool repositories. For CUBI project repositories in particular, please refer to the respective guideline document on how to structure a CUBI project repository.

In line with current common practice, use these naming conventions in repositories:

  1. base (release) branch: main
  2. central merge (development) branch: dev
  3. feature development branch: limited to characters a-z0-9 and - (minus).
    • recommended: feat-[short-desc-name]
  4. branch to fix an issue: issueNN-[short-desc-name] where NN is the issue number (usually on GitHub). Same character set restrictions as for feature branches.
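
The branch rules above can be expressed as a small classifier, for instance for use in a pre-push check. This is only a sketch under stated assumptions: `classify_branch` and the category labels are hypothetical names, and the patterns are tried in order so that `main`, `dev`, and issue branches are recognized before the more general feature pattern.

```python
import re

# Ordered from most specific to most general; dicts keep insertion order.
BRANCH_PATTERNS = {
    "release": re.compile(r"main"),
    "development": re.compile(r"dev"),
    "issue": re.compile(r"issue[0-9]+-[a-z0-9-]+"),
    "feature": re.compile(r"[a-z0-9-]+"),
}


def classify_branch(branch_name):
    """Return the branch category, or None if the name violates the character rules."""
    for category, pattern in BRANCH_PATTERNS.items():
        if pattern.fullmatch(branch_name):
            return category
    return None
```

A name such as `feat-new-aligner` is classified as a feature branch, while `Feature/new_aligner` is rejected because it contains uppercase letters, a slash, and an underscore.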

Naming files (normalizing file names)

File names should be formed using only these characters:

  • A-Z: uppercase should be limited to names or IDs (such as sample names)
  • a-z: as-is
  • 0-9: as-is
  • -: minus, preferably used as "within-context" separator
    • example: context is specifying a date, i.e. 2022-09-16
  • _: underscore, preferably used as "between-context" separator
    • example: SAMPLE1_dataA and SAMPLE1_dataB, i.e. context one is the sample ID, and context two is the data type.
  • .: dot, preferably used to indicate file format changes
    • example: .vcf to .vcf.gz to .vcf.gz.tbi
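
The character rules and separator preferences listed above can be sketched in a few lines. Both helper names, `is_normalized` and `split_contexts`, are hypothetical, and `split_contexts` assumes the file name follows the preferred separator usage (underscore between contexts, dot before each format suffix), which the guidelines recommend but do not enforce.

```python
import re

# Allowed characters: A-Z, a-z, 0-9, minus, underscore, dot.
ALLOWED_FILE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_normalized(file_name):
    """Check that the file name uses only the allowed characters."""
    return ALLOWED_FILE_NAME.fullmatch(file_name) is not None


def split_contexts(file_name):
    """Split a file name into its contexts (underscore) and format suffixes (dot)."""
    stem, *suffixes = file_name.split(".")
    return stem.split("_"), suffixes
```

Because the minus acts as the "within-context" separator, a date such as 2022-09-16 stays intact as one context when a name like `SAMPLE1_2022-09-16_dataA.vcf.gz` is split.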

It is probably a universal truth that there is no single file naming scheme "to rule them all". Hence, think before (re-)naming files, but accept that you cannot find a perfect solution (note that it says "preferably" in the above guidelines).

Final remark: using whitespace or special characters in ordinary file names means that you have reached the antipode of perfection.