Dev process
author | date | tags |
---|---|---|
PE | 2024-06-07 | update, git, commands |
PE | 2022-09-21 | cubi, internal, convention, rule, policy, standard |
The code development process in the CUBI follows certain standards for the two main types of code repositories created by CUBI team members (described in the following).
Regarding naming and style conventions, please refer to the respective wiki article.
Workflow repositories contain pipeline code to execute a series of bioinformatic tools in the same way for many different samples or batches of samples. The goal of workflow design should always be that the code can be executed by third parties.
Workflow code must not make any assumptions about the execution infrastructure or environment, except that it will be a Linux system. In particular, never hard-code or rely on any of the following (non-exhaustive list):
- file system paths or locations
- non-standard environment variables
- download of resources on-the-fly from the internet
- (input) data transformations or (meta-)data cleanup specific to a project
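As a minimal sketch of the first point in the list above, a helper script could take its input location as a command-line argument instead of embedding a path. The function name `resolve_input_dir` is hypothetical and only serves as an illustration; it is not part of any CUBI template.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: the input directory is always supplied by the caller,
# never baked into the code.
resolve_input_dir() {
    local candidate="${1:-}"
    if [[ -z "${candidate}" ]]; then
        echo "error: no input directory given" >&2
        return 1
    fi
    if [[ ! -d "${candidate}" ]]; then
        echo "error: not a directory: ${candidate}" >&2
        return 1
    fi
    # print the absolute path so later steps do not depend on the working directory
    (cd "${candidate}" && pwd)
}

# Possible usage inside a script:
#   input_dir="$(resolve_input_dir "${1:-}")"
```

The same idea applies to workflow configuration in general: locations come from the person executing the workflow, not from the repository.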
In general, a new workflow repository should be derived from the respective workflow template repository. The relation between workflow and workflow template has to be documented in the workflow's `pyproject.toml`.
Project repositories may contain metadata, small and usually hand-curated annotation files, project-specific code performing preprocessing or cleanup tasks, and should generally document project-specific decisions. A single project can make use of several workflows, potentially executed manually in a serial fashion. The relation between project and workflow(s) has to be documented in the project's `pyproject.toml`.
Other repositories, such as this knowledge base, can be organized differently. Use your best judgement or ask your colleagues for advice.
The desired state for all shared repositories (mostly workflow and project) is to have a linear git commit history in the central branches `main` and `dev`.
Maintaining that state takes some effort because it requires a consistent "rebase and merge" strategy. In other words, merging pull requests via "merge commits" or "squash commits" is forbidden.
The `git rebase` development and merging strategy cannot be applied to the two constant branches of a CUBI git repository, `main` and `dev` (with `main` as the default merge target; merging `main` into `dev` happens only for emergency fixes that were applied to `main` directly, if there were really compelling reasons to do so). The `git rebase` command always recomputes the hashes of the commits it replays (each hash depends on its parent commit, and so on). Rebasing `dev` carelessly onto `main` would therefore risk changing a substantial number of commit hashes that may already exist in some branch created from `dev`. However, for a clean and linear commit history, one can simply merge the two branches by fast-forwarding, which does not result in a merge commit. In other words, the `main` branch pointer just moves forward to the tip of `dev`, so the new commits from `dev` sit on top of the last commit in `main` with unchanged hashes.
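The effect of fast-forwarding can be checked in a throwaway repository. The following is a sketch only: the temporary directory, the demo identity and the empty commits `A` to `C` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# throwaway identity so that committing works without a global git config
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.org"

cd "$(mktemp -d)"
git init --quiet
git symbolic-ref HEAD refs/heads/main   # make sure the initial branch is named main

git commit --quiet --allow-empty -m "A"
git switch --quiet --create dev
git commit --quiet --allow-empty -m "B"
git commit --quiet --allow-empty -m "C"
dev_tip="$(git rev-parse dev)"

# fast-forward main to dev: no merge commit, no rewritten hashes
git switch --quiet main
git merge --quiet --ff-only dev

main_tip="$(git rev-parse main)"
merge_commits="$(git rev-list --merges --count main)"
echo "dev tip:  ${dev_tip}"
echo "main tip: ${main_tip}"
echo "merge commits in main: ${merge_commits}"
```

After the merge, both branch names point to the same commit, and the history of `main` contains no merge commit.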
This series of operations has to be carried out on the command line; it does not work via the GitHub web interface!
- For the non-primary branches (`feature-`, `analysis-`, `issue-` etc., so everything that is not `main`, `dev` and `prototype` [if applicable]), you may rebase onto `dev` and use GitHub's "rebase and merge" option to close an open pull request. Note that if you rebase without merging the branch into `dev`, you will have to `push --force` your local history to the remote, which will effectively break the history for all other developers working on that branch. Talk to them first!
$ git switch feature-xyz
$ git rebase --keep-base --interactive dev
# fix conflicts if necessary
$ git push --force all
- For the primary and constant branches `main`, `dev` and `prototype` [if applicable], do not merge via the `rebase` strategy for the reasons explained above. If properly implemented, there shouldn't be any commits in `main` (or `dev`) that do not exist in `dev` (or `prototype`) when you want to merge `dev` into `main`. Hence, you can execute:
# assuming all branches are up-to-date
$ git switch main
$ git merge --ff-only dev
# this merges all new commits from dev into/onto main
# and does not create an unnecessary merge commit
- If it so happens that there are unshared commits between `main` and `dev` and fast-forwarding does not work or would create duplicated code changes, resolve the conflicts by `cherry-picking` the commit(s). Cherry-picking works on a single commit or on a series of commits and applies their changes as new commits on another branch. Use this strategy to ensure that fast-forwarding works.
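Before merging, it can help to check whether a fast-forward is possible at all and how many commits are unshared. The two helper functions below are a hypothetical sketch (their names are not part of any CUBI tooling); they only wrap the standard `git merge-base --is-ancestor` and `git rev-list --left-right --count` commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds if branch <dst> can be fast-forwarded to <src>,
# i.e., if the tip of <dst> is an ancestor of the tip of <src>.
can_fast_forward() {
    local dst="$1" src="$2"
    git merge-base --is-ancestor "${dst}" "${src}"
}

# Hypothetical helper: prints "<only in dst> <only in src>" commit counts.
count_unshared() {
    local dst="$1" src="$2"
    git rev-list --left-right --count "${dst}...${src}" | tr '\t' ' '
}

# Possible usage inside a repository that has both branches:
#   if can_fast_forward main dev; then
#       git switch main && git merge --ff-only dev
#   else
#       count_unshared main dev   # a first number > 0 means main has commits missing in dev
#   fi
```

If the first number printed by `count_unshared main dev` is not zero, `main` contains commits that `dev` lacks, which is the situation described in the last bullet above.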
Please do (`main` branch):
- make production releases
- push a single, coordinated bug fix directly only in absolute emergencies
  - may be followed by a post-mortem to clarify what went wrong
- name this branch `main`
Please don't (`main` branch):
- (force) push directly
- start feature or issue branches from here
- rename the branch
- delete the branch
- merge `main` into another branch
Please do (`dev` branch):
- make development releases
- name this branch `dev`
- start feature or issue branches from here
- rebase and merge finished feature, issue or analysis branches into `dev`
Please don't (`dev` branch):
- (force) push directly
- rename the branch
- delete the branch
- merge `dev` into another branch except for fast-forwarding (`git merge --ff-only`) into `main`
Please do (`prototype` branch):
- use only in the very beginning of the dev process "when nothing works"
- name this branch `prototype`
- delete this branch as soon as main development has been moved to `dev`
Please don't (`prototype` branch):
- keep using this branch forever
- keep using this branch if several people contribute to the development
- start feature or issue branches from here
A new repository is set up as follows:
- create the `main` branch and populate it with the appropriate metadata files
  - LICENSE, CITATION info, `pyproject.toml` etc.
  - `push` to git
- from `main`, create `dev` and populate it with template files (if applicable)
  - `push` to git
- from `dev`, create `prototype` and start adding your code
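The branch setup listed above could look roughly like this on the command line. This is a sketch under assumptions: it runs in a temporary directory, the metadata files are empty placeholders (the file name `CITATION.md` is a stand-in for whatever citation file is actually used), and the `push` commands are commented out because no remote exists here.

```shell
#!/usr/bin/env bash
set -euo pipefail

# throwaway identity so that committing works without a global git config
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.org"

cd "$(mktemp -d)"                        # stand-in for the new repository
git init --quiet
git symbolic-ref HEAD refs/heads/main   # make sure the initial branch is named main

# 1) main: metadata files only (empty placeholders in this sketch)
touch LICENSE CITATION.md pyproject.toml
git add LICENSE CITATION.md pyproject.toml
git commit --quiet -m "Add repository metadata"
# git push --set-upstream <remote> main

# 2) dev: created from main; add and commit template files here if applicable
git switch --quiet --create dev main
# git push --set-upstream <remote> dev

# 3) prototype: created from dev; development starts here
git switch --quiet --create prototype dev

git branch --list
```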
Important: `feature-` branches should only exist for workflow or tool repositories. The analogous branch in standard CUBI project repositories is called `analysis-` and is more lenient in terms of the development and merge policy.
The following dos and don'ts are binding for workflow and tool repositories.
Please do:
- start a new branch for every single unit of work
- always branch off from `dev`
- follow naming conventions as described in the naming and style wiki article
- clean up your commit history every now and then via `git rebase --interactive dev` (see below)
- force push into the feature/issue branch after a rebase if necessary
- notify your colleagues if you are sharing the implementation work
Please don't:
- use a branch as a hidden fork of a repo and implement breaking changes
- keep branches alive in production and use them as pipeline run targets
- start a pull request before testing, linting and formatting your code
- forget to delete your branch everywhere (!) after it has been merged
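The life cycle of a feature branch implied by the lists above can be sketched in a throwaway repository. The file `x.txt` and the demo identity are illustration-only assumptions; the commands that touch a remote are commented out because this sketch has none.

```shell
#!/usr/bin/env bash
set -euo pipefail

# throwaway identity so that committing works without a global git config
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.org"

cd "$(mktemp -d)"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git commit --quiet --allow-empty -m "A"
git switch --quiet --create dev

# one branch per unit of work, always started from dev
git switch --quiet --create feature-xyz dev
echo "x" > x.txt
git add x.txt
git commit --quiet -m "add function X"

# bring the finished work into dev while keeping the history linear
git switch --quiet dev
git merge --quiet --ff-only feature-xyz

# delete the branch everywhere once it has been merged
git branch --quiet --delete feature-xyz
# git push <remote> --delete feature-xyz   # remove the remote copy
# git fetch --prune                        # drop stale remote-tracking references

remaining="$(git branch --list 'feature-*')"
echo "remaining feature branches: ${remaining:-none}"
```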
Remark: the Mermaid Gitgraph capabilities are still under development, and the following examples are thus not showing the full (possible) complexity.
The goal is to have a simple, linear commit history in `main` and `dev`:
gitGraph
commit id: "A"
commit id: "B"
commit id: "C"
commit id: "D"
commit id: "E"
commit id: "F"
The `dev` branch is used as the central development branch, i.e., it is the starting point for all feature or issue branches:
gitGraph
commit id: "A"
commit id: "B"
branch dev
commit id: "C"
commit id: "D"
branch feature-1
commit id: "E"
commit id: "F"
commit id: "G"
checkout dev
branch issue-1
commit id: "H"
commit id: "I"
commit id: "J"
Given the different speed of development in the various branches, the order in which branches are merged back into `dev` should be considered unpredictable. In the example below, the `issue-1` branch is merged back into `dev` before the work in `feature-1` is complete:
gitGraph
commit id: "A"
commit id: "B"
branch dev
commit id: "C"
commit id: "D"
branch feature-1
commit id: "E"
commit id: "F"
commit id: "G"
checkout dev
branch issue-1
commit id: "H"
commit id: "I"
commit id: "J"
checkout dev
merge issue-1
Remark: Mermaid does not support visualizing the `git rebase` operation (yet).
Let's assume the `issue-1` branch could be rebased and merged into `dev` without problems because it had no conflicts with `dev`. After the successful merge, the `issue-1` branch was deleted. It is very much possible, though, that the code changes merged via `issue-1` are in conflict with the `feature-1` branch, in which case it would not be possible to also simply merge `feature-1` back into `dev`.
gitGraph
commit id: "A"
commit id: "B"
branch dev
commit id: "C"
commit id: "D"
branch feature-1
commit id: "E"
commit id: "F"
commit id: "G"
checkout dev
commit id: "H"
commit id: "I"
commit id: "J" tag: "v1.0.0dev"
At this point, you could rebase `feature-1` on `dev` even if the development is still incomplete, just to resolve all conflicts and thus make `feature-1` coherent with `dev` again. Since the history of the `feature-1` branch changes during the rebase (its parent in `dev` changes from `D` to `J`), the commit hashes are recomputed. Consequently, you would need to force push (`git push --force all`) your changes to the git server, effectively rewriting history. This operation breaks the `feature-1` branch for all other developers working on that branch, hence they should be notified.
gitGraph
commit id: "A"
commit id: "B"
branch dev
commit id: "C"
commit id: "D"
commit id: "H"
commit id: "I"
commit id: "J" tag: "v1.0.0dev"
branch feature-1
commit id: "E'"
commit id: "F'"
commit id: "G'"
commit id: "K"
commit id: "L"
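That a rebase rewrites commit hashes can be verified in a throwaway repository. The sketch below mirrors the situation above in reduced form (commits `D`, `E` and `J` only); the files and the demo identity are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# throwaway identity so that committing works without a global git config
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.org"

cd "$(mktemp -d)"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git commit --quiet --allow-empty -m "A"
git switch --quiet --create dev
git commit --quiet --allow-empty -m "D"

# feature-1 starts at D
git switch --quiet --create feature-1
echo "e" > e.txt
git add e.txt
git commit --quiet -m "E"
old_tip="$(git rev-parse feature-1)"

# meanwhile, dev moves on (e.g., after issue-1 has been merged)
git switch --quiet dev
echo "j" > j.txt
git add j.txt
git commit --quiet -m "J"

# replay feature-1 on top of the new dev tip
git switch --quiet feature-1
git rebase --quiet dev
new_tip="$(git rev-parse feature-1)"

echo "E  before rebase: ${old_tip}"
echo "E' after rebase:  ${new_tip}"
# a copy of feature-1 on a remote would still hold the old hash,
# which is why the subsequent push has to be forced
```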
As soon as the development in `feature-1` is complete, it can be merged into `dev` via `merge --ff-only` or using the "rebase and merge" option from GitHub.
Remember to delete the branch afterwards:
gitGraph
commit id: "A"
commit id: "B"
branch dev
commit id: "C"
commit id: "D"
commit id: "H"
commit id: "I"
commit id: "J" tag: "v1.0.0dev"
commit id: "E'"
commit id: "F'"
commit id: "G'"
commit id: "K"
commit id: "L"
For a production release, the `dev` branch is merged into `main` via `merge --ff-only`. Do not use GitHub's "rebase and merge" and do not rebase `dev` onto `main`; there shouldn't be any conflicts if the development process was implemented in a proper manner by all developers.
The development cycle can now start again, with the parent of the `dev` branch being the last commit in `main`:
gitGraph
commit id: "A"
commit id: "B"
commit id: "C"
commit id: "D"
commit id: "H"
commit id: "I"
commit id: "J" tag: "v1.0.0dev"
commit id: "E'"
commit id: "F'"
commit id: "G'"
commit id: "K"
commit id: "L" tag: "v1.0.0"
branch dev
commit id: "M"
The point of interactively rebasing a feature or issue branch on `dev` is to reduce the commit history in the branch to the relevant commits. Consider the following example:
gitGraph
commit id: "A"
branch dev
commit id: "B"
branch feature-1
commit id: "add function X"
commit id: "add function Y"
commit id: "fix syntax error"
commit id: "fix syntax"
commit id: "add test case X"
commit id: "fix spelling"
commit id: "add test case Y"
commit id: "fix formatting"
commit id: "fix language"
commit id: "update docs"
commit id: "update docs 2"
Arguably, fixes for trivial syntax errors or spelling mistakes are not changes worth keeping in the commit history after merging the `feature-1` branch into `dev`. Hence, during an interactive rebase, you are presented with various options to edit, combine or delete commits (see section "Changing Multiple Commit Messages"). In the above case, the `fixup` option, which combines a commit with the previous one and discards its commit message, could be used to simplify the commit history as follows:
gitGraph
commit id: "A"
branch dev
commit id: "B"
branch feature-1
commit id: "add function X"
commit id: "add function Y"
commit id: "add test case X"
commit id: "add test case Y"
commit id: "update docs"
Depending on the complexity of the fixed functions and test cases, the history could be simplified even further by also rewording some commit messages in addition to the `fixup` operation:
gitGraph
commit id: "A"
branch dev
commit id: "B"
branch feature-1
commit id: "add functions X and Y"
commit id: "add test cases X and Y"
commit id: "update docs"
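The `fixup` step described above can also be prepared at commit time and then applied without editing the todo list by hand. The following sketch assumes a throwaway repository with placeholder files; it uses `git commit --fixup` together with `git rebase --interactive --autosquash`, and sets `GIT_SEQUENCE_EDITOR=true` so that the pre-arranged todo list is accepted unchanged.

```shell
#!/usr/bin/env bash
set -euo pipefail

# throwaway identity so that committing works without a global git config
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.org"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.org"

cd "$(mktemp -d)"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git commit --quiet --allow-empty -m "A"
git switch --quiet --create dev
git commit --quiet --allow-empty -m "B"
git switch --quiet --create feature-1

echo "first draft of X" > x.txt
git add x.txt
git commit --quiet -m "add function X"

# trivial fix, recorded as "fixup! add function X"
echo "corrected X" > x.txt
git add x.txt
git commit --quiet --fixup=HEAD

echo "Y" > y.txt
git add y.txt
git commit --quiet -m "add function Y"

before="$(git rev-list --count dev..feature-1)"

# --autosquash turns the "fixup!" commit into a fixup of its target
GIT_SEQUENCE_EDITOR=true git rebase --quiet --interactive --autosquash dev

after="$(git rev-list --count dev..feature-1)"
echo "commits before: ${before}, after: ${after}"
git log --format=%s dev..feature-1
```

Rewording or combining the remaining commits still requires editing the todo list in a regular interactive rebase.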
Copyright © 2022-2024 Core Unit Bioinformatics, Medical Faculty, HHU
All content in this Wiki is published under the CC BY-NC-SA 4.0 license.