Naming and style
author | date | tags |
---|---|---|
PE | 2022-09-16 | cubi, internal, convention, rule, policy, standard |
PE | 2024-05-09 | cubi, internal, convention, rule, policy, standard |
The following guidelines can be broken if really necessary, and discussed if perceived as unnecessary or misguided.
For Python and closely related code (such as Snakemake modules),
the CUBI follows the respective PEP8 style guide.
Realizing this requirement is usually possible via code-formatting tools such as `black` and `snakefmt`, but it is advisable to recognize badly formatted code
when you see it. The following excerpt from the PEP8 style guide also provides
a reasonable view on the subject:
[...] code is read much more often than it is written. [...]
A style guide is about consistency. Consistency with this style guide is important. Consistency within a project is more important. Consistency within one module or function is the most important.
However, know when to be inconsistent – sometimes style guide recommendations just aren’t applicable. When in doubt, use your best judgment. Look at other examples and decide what looks best. And don’t hesitate to ask!
In particular: do not break backwards compatibility just to comply with this PEP!
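As a small illustration of what formatters such as `black` take care of, the sketch below compares a badly formatted function with a PEP8-style version of the same code. The function `mean_depth` and both source strings are hypothetical examples, not taken from any CUBI repository. The comparison only uses the standard-library `ast` module to show that pure formatting changes leave the program structure untouched.

```python
import ast

# Hypothetical example: the same function, once badly formatted and once
# laid out according to PEP8 (4-space indent, no padding inside parentheses,
# spaces around binary operators).
MESSY = "def  mean_depth( values ):\n  return sum(values)/len( values )\n"
CLEAN = "def mean_depth(values):\n    return sum(values) / len(values)\n"


def same_syntax_tree(source_a, source_b):
    """Return True if both sources parse to the same abstract syntax tree."""
    # ast.dump() omits line and column positions by default, so only the
    # structure of the code is compared, not its layout.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


print(same_syntax_tree(MESSY, CLEAN))
```

Both versions behave identically; only the second one is pleasant to read.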
The CUBI maintains three types of code repositories (TODO: see dev guidelines for details):
- workflows
- projects
- other
Naming workflow repositories: workflow-[system]-[short-desc-name]
- `[system]`: the respective (eco-)system the workflow is designed for, e.g., Snakemake, Nextflow, Common Workflow Language (CWL), Workflow Description Language (WDL), etc.
  - Snakemake: `smk`
  - Nextflow: `nxf`
  - Common Workflow Language: `cwl`
  - Workflow Description Language: `wdl`
  - reminder: the `system` part must be put in the `pyproject.toml` file under the section `[cubi.workflow.template]` with the key `system = "system"`.
- `[short-desc-name]`: a short descriptive name using only the characters `a-z` and `-` (minus). Numbers `0-9` may be used if necessary and reasonable.
  - reminder: the `short-desc-name` part of the name must be put in the `pyproject.toml` file under the section `[cubi.workflow]` with the key `name = "short-desc-name"`.
- example: `workflow-smk-longread-variant-calling`
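The naming scheme for workflow repositories can be checked mechanically. The following is a minimal sketch, not an official CUBI tool: the names `WORKFLOW_SYSTEMS`, `WORKFLOW_REPO_PATTERN` and `parse_workflow_repo_name` are made up for illustration. It additionally assumes that the short name neither starts nor ends with a minus and contains no doubled minus, which is stricter than the rule stated above.

```python
import re

# Abbreviations listed above for the supported workflow systems.
WORKFLOW_SYSTEMS = ("smk", "nxf", "cwl", "wdl")

# workflow-[system]-[short-desc-name]
# Digits are accepted in the short name because the guideline permits them
# "if necessary and reasonable". Assumption: no leading, trailing, or
# doubled minus in the short name.
WORKFLOW_REPO_PATTERN = re.compile(
    r"workflow-(?P<system>" + "|".join(WORKFLOW_SYSTEMS) + r")"
    r"-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def parse_workflow_repo_name(repo_name):
    """Return (system, short_desc_name), or None if the name does not conform."""
    match = WORKFLOW_REPO_PATTERN.fullmatch(repo_name)
    if match is None:
        return None
    return match.group("system"), match.group("name")


print(parse_workflow_repo_name("workflow-smk-longread-variant-calling"))
print(parse_workflow_repo_name("workflow-snakemake-longread"))
```

The first call returns the two parts that also belong into `pyproject.toml`; the second returns `None` because `snakemake` is not one of the listed abbreviations.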
Naming project repositories: project-[type]-[short-desc-name]
- `[type]`: the type of the project (production, development etc.)
  - development: `dev`; the project was started to build a new workflow for the CUBI catalogue. The respective project repository may thus document the development while it is still in progress, or organize (preprocess) test data.
  - production (run): `run`; the project was started to process data with an existing workflow, and thus contains sample information (e.g., phenotypical annotation), or some specific routines for (meta-)data preprocessing.
  - benchmark: `bmk`; the project was started to evaluate performance aspects of an existing pipeline, e.g., as part of round robin tests in a consortium.
  - reminder: the `type` part must be put in the `pyproject.toml` file under the section `[cubi.project]` with the key `type = "type"`.
- `[short-desc-name]`: a short descriptive name using only the characters `a-z0-9` and `-` (minus).
  - Obviously, the short name of the project should not duplicate the short name of the executed pipeline(s), but be a reference to the project context.
  - reminder: the `short-desc-name` part of the name must be put in the `pyproject.toml` file under the section `[cubi.project]` with the key `name = "short-desc-name"`.
- example: `project-TYPE-xyz-cohort`
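Since the repository name and the `[cubi.project]` section of `pyproject.toml` must carry the same information, the two can be compared automatically. The sketch below is one possible way to do this and rests on several assumptions: the function names are hypothetical, the repository name `project-run-xyz-cohort` is an invented instance of the example above, and the parsed `pyproject.toml` is represented as a plain nested dictionary (as returned, for instance, by the standard-library `tomllib` module in Python 3.11 and later).

```python
import re

# Project type abbreviations listed above.
PROJECT_TYPES = ("dev", "run", "bmk")

# project-[type]-[short-desc-name]
# Assumption: no leading, trailing, or doubled minus in the short name.
PROJECT_REPO_PATTERN = re.compile(
    r"project-(?P<type>" + "|".join(PROJECT_TYPES) + r")"
    r"-(?P<name>[a-z0-9]+(?:-[a-z0-9]+)*)"
)


def expected_project_metadata(repo_name):
    """Derive the [cubi.project] values implied by a project repository name."""
    match = PROJECT_REPO_PATTERN.fullmatch(repo_name)
    if match is None:
        raise ValueError(f"not a valid project repository name: {repo_name!r}")
    return {"type": match.group("type"), "name": match.group("name")}


def metadata_matches_repo_name(pyproject, repo_name):
    """Check a parsed pyproject.toml (nested dict) against the repository name."""
    declared = pyproject.get("cubi", {}).get("project", {})
    expected = expected_project_metadata(repo_name)
    return all(declared.get(key) == value for key, value in expected.items())


# Hypothetical content of a parsed pyproject.toml.
pyproject = {"cubi": {"project": {"type": "run", "name": "xyz-cohort"}}}
print(metadata_matches_repo_name(pyproject, "project-run-xyz-cohort"))
```

Note that the literal placeholder `project-TYPE-xyz-cohort` is rejected by this check, because `TYPE` has to be replaced by one of `dev`, `run` or `bmk`.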
Naming other repositories: use your best judgement or ask your colleagues for feedback
The following rules primarily, but not exclusively, apply to repositories that exist to realize a concerted and orderly code design and implementation effort, i.e., workflow and (software) tool repositories. For CUBI project repositories in particular, please refer to the respective guideline document on how to structure a CUBI project repository.
Following current conventions, branches in these repositories should be named as follows:
- base (release) branch: `main`
- central merge (development) branch: `dev`
- feature development branch: limited to characters `a-z0-9` and `-` (minus).
  - recommended: `feat-[short-desc-name]`
- branch to fix an issue: `issueNN-[short-desc-name]` where `NN` is the issue number (usually on GitHub). Same character set restrictions as for feature branches.
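The branch naming rules above translate into a few simple checks. The following sketch is only an illustration: the function `classify_branch` and its return labels are invented here, and the branch names in the usage lines are hypothetical.

```python
import re

# Character set allowed for feature and issue branches.
BRANCH_CHARSET = re.compile(r"[a-z0-9-]+")

# issueNN-[short-desc-name], where NN is the issue number.
ISSUE_BRANCH = re.compile(r"issue(?P<number>[0-9]+)-(?P<name>[a-z0-9-]+)")


def classify_branch(branch):
    """Assign a branch name to one of the categories of the naming convention."""
    if branch == "main":
        return "base"
    if branch == "dev":
        return "merge"
    if BRANCH_CHARSET.fullmatch(branch) is None:
        return "invalid"
    if ISSUE_BRANCH.fullmatch(branch) is not None:
        return "issue"
    if branch.startswith("feat-") and len(branch) > len("feat-"):
        return "feature"
    # Valid characters, but without the recommended "feat-" prefix.
    return "feature-unprefixed"


for name in ("main", "feat-bam-indexing", "issue12-fix-config-path", "Feat_X"):
    print(name, "->", classify_branch(name))
```

A check like this could run as part of a pre-push hook or a CI job, but where to enforce it is a project-level decision that the convention itself leaves open.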
File names should be formed using only these characters:
- `A-Z`: uppercase should be limited to names or IDs (such as sample names)
- `a-z`: as-is
- `0-9`: as-is
- `-`: minus, preferably used as "within-context" separator
  - example: context is specifying a date, i.e. `2022-09-16`
- `_`: underscore, preferably used as "between-context" separator
  - example: `SAMPLE1_dataA` and `SAMPLE1_dataB`, i.e. context one is the sample ID, and context two is the data type.
- `.`: dot, preferably used to indicate file format changes
  - example: `.vcf` to `.vcf.gz` to `.vcf.gz.tbi`
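One practical benefit of using the separators consistently is that file names become easy to take apart programmatically. The sketch below assumes the "preferred" usage described above (underscore between contexts, dot before each format suffix). The function `split_file_name` is hypothetical, and the file name in the usage line is constructed from the examples given in the list.

```python
import re

# Characters permitted in file names according to the list above.
ALLOWED_FILE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def split_file_name(file_name):
    """Split a file name into its contexts and its chain of format suffixes."""
    if ALLOWED_FILE_NAME.fullmatch(file_name) is None:
        raise ValueError(f"file name contains disallowed characters: {file_name!r}")
    # Everything before the first dot is the stem; each further dot-separated
    # part is treated as one file format suffix.
    stem, *formats = file_name.split(".")
    # Underscores separate contexts; a minus stays inside its context.
    contexts = stem.split("_")
    return contexts, formats


print(split_file_name("SAMPLE1_2022-09-16_dataA.vcf.gz"))
```

The date keeps its internal minus characters and is returned as a single context, while `vcf` and `gz` are reported as successive format suffixes. File names that deviate from the preferred usage (for example, a dot inside a version number) are split differently, so the result should be treated as a convenience and not as a guarantee.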
It is probably a universal truth that there is no single file naming scheme "to rule them all". Hence, think before (re-)naming files, but accept that you cannot find a perfect solution (note that it says "preferably" in the above guidelines).
Final remark: using whitespace or special characters in ordinary file names means that you have reached the antipode of perfection.
Copyright © 2022-2024 Core Unit Bioinformatics, Medical Faculty, HHU
All content in this Wiki is published under the CC BY-NC-SA 4.0 license.