Teaching materials / slides for the de.NBI cloud user meeting 2020
BiBiGrid is written in Java and needs a Java Runtime Environment (version 8 or greater). Additionally, a terminal client and an SSH client are required.
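If you are unsure about your local setup, you can quickly check the installed versions (an illustrative check; any JRE >= 8 and any common SSH client will do):
> java -version
> ssh -V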
The easiest and recommended way to use BiBiGrid is to download the latest prebuilt binary.
Alternatively, you may clone the BiBiGrid repository and build it yourself using Maven, which requires an installed Java Development Kit (>= 8) and Maven (>= 3.9).
> git clone https://github.com/BiBiServ/bibigrid.git
> cd bibigrid
> mvn -P openstack clean package
BiBiGrid needs access to the OpenStack API to work properly.
Access to the de.NBI Cloud sites' web-based user interface (OpenStack Dashboard) is provided via an SSO mechanism using Elixir AAI. An API password is not set by default.
The OpenStack RC file contains the project-specific environment variables needed to run the OpenStack command-line clients and to access the OpenStack API. After logging into the OpenStack Dashboard, you can download the OpenStack RC File (v3) by clicking on your account symbol in the upper right corner.
After downloading, open a terminal and source the downloaded file (e.g. clum2020-openrc.sh) to load the credentials into your environment.
> source FILE.sh
Note: You have to source the RC file in every new terminal to be able to access the OpenStack API.
Note: Application credentials are unfortunately not an option, because Openstack4J - the library used by BiBiGrid to talk with OpenStack - does not support them.
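For orientation, an OpenStack RC (v3) file typically looks roughly like the following excerpt; the values below are placeholders, your downloaded file contains the settings of your own project:
# excerpt of a typical OpenStack RC v3 file (placeholder values)
export OS_AUTH_URL=https://openstack.example.org:5000/v3
export OS_PROJECT_NAME="my-project"
export OS_USER_DOMAIN_NAME="Default"
export OS_USERNAME="my-user"
# the script usually prompts for your API password
read -sr OS_PASSWORD_INPUT
export OS_PASSWORD=$OS_PASSWORD_INPUT
export OS_REGION_NAME="Bielefeld"
export OS_IDENTITY_API_VERSION=3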
The prefilled configuration template below works on the de.NBI cloud site Bielefeld (verified on 2020-10-05). You have to adjust many of the values when trying this on other de.NBI cloud sites.
#use openstack
mode: openstack

#Access
sshUser: ubuntu
region: Bielefeld
availabilityZone: default

#Network
subnet: XXXXXX # REPLACE

# Master instance
masterInstance:
  type: de.NBI default + ephemeral
  image: e4ff922e-7681-411c-aa9b-6784390a904e

# Worker instances
workerInstances:
  - type: de.NBI small + ephemeral
    image: e4ff922e-7681-411c-aa9b-6784390a904e
    count: 3

useMasterWithPublicIp: yes
useMasterAsCompute: no

#services
nfs: yes
zabbix: yes
slurm: yes

ideConf:
  ide: true

zabbixConf:
  admin_password: XXXXX # REPLACE
- Download the prefilled configuration template
- Open it with an editor of your choice and replace the XXXXXX values for subnet and zabbixConf.admin_password
BiBiGrid creates a new SSH key pair (stored at ~/.bibigrid/keys) for each cluster it starts. These cluster-specific keys are used to connect to the master instance. It is possible to add additional SSH keys for communication with the remote compute system (not covered by our template above, see the BiBiGrid documentation for a precise description).
The SSH user depends on the cloud image your cluster is based on. Since we run on top of Ubuntu 18.04, the ssh-user is ubuntu.
The region can be determined easily by running the OpenStack CLI.
$ openstack region list
The availability zone determines where your instances are created. List the available zones using the CLI.
$ openstack availability zone list
If you have the permissions to create networks, BiBiGrid offers the possibility to create a new private network connected to an existing router. For our tutorial we work on an existing subnet. Please determine the subnet name or ID using the CLI.
$ openstack subnet list
We would like to use a default Ubuntu 18.04 operating system image for our tutorial. Determine its ID using the CLI ...
$ openstack image list
... and add it to the master/worker configuration.
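If the image list is long, you can narrow it down by name, for example (assuming the Ubuntu 18.04 image carries "18.04" in its name):
$ openstack image list | grep -i "18.04"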
We use a typical cluster configuration for our workshop setup. That means we have to enable a shared fs (nfs), a grid batch scheduler (slurm), a monitoring framework (zabbix) and a web IDE (theia).
To keep the cluster setup process simple you can set an alias for the BiBiGrid JAR file installed before.
The Unix command should look like the following (depending on the JAR filename):
> alias bibigrid="java -jar /path/to/bibigrid-*.jar"
You can simply check your configuration using:
> bibigrid -o configuration.yml -ch
For information about the command set, you may now check the help command:
> bibigrid -o configuration.yml --help
Now we can create the first cluster with our previously generated configuration:
> bibigrid -o configuration.yml -c -v
If no problems occur, our cluster should be ready to work within about 15 minutes ... time for a coffee break.
It is possible to have more than one BiBiGrid cluster running simultaneously. List all running clusters (within the same OpenStack project) using:
> bibigrid -o configuration.yml --list
The command returns an informative list about all your running clusters.
Slurm is an open-source, scalable cluster management and job scheduling system for large and small Linux clusters. As a cluster workload manager, Slurm has three key functions. First, it allocates access to resources on the worker nodes to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated workers. Finally, it arbitrates contention for resources by managing a queue of pending work. (See the Slurm documentation for details.)
- scontrol: View and/or modify Slurm state
- sinfo: Reports state of Slurm partitions and nodes
- squeue: Reports the state of jobs or job steps
- scancel: Cancel a pending or running job or job step
- srun: Submit a job for execution
- ...
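Once you are logged in to the master node (see below), a quick smoke test is to let Slurm run a single command on one of the workers:
srun hostname
The command prints the hostname of the worker node the job was scheduled on.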
BiBiGrid establishes a shared filesystem (NFS) between all cluster nodes. The master acts as the NFS server and all other nodes connect to it as clients.
- /vol/spool -> shared filesystem between all nodes
- /vol/scratch -> local disk space (ephemeral disk, if provided)
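Once logged in to the master node, you can convince yourself that /vol/spool is really shared by writing a file on the master and reading it from a worker via Slurm (an illustrative check only; the file name is chosen arbitrarily):
echo "hello nfs" > /vol/spool/nfs-test.txt
srun cat /vol/spool/nfs-test.txt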
After a successful setup ...
SUCCESS: Cluster has been configured.
Ok :
You might want to set the following environment variable:
export BIBIGRID_MASTER=129.70.51.XXX
You can then log on the master node with:
ssh -i /Users/jkrueger/.bibigrid/keys/bibigridther0ysuts6vo37 ubuntu@$BIBIGRID_MASTER
The cluster id of your started cluster is: ther0ysuts6vo37
You can easily terminate the cluster at any time with:
bibigrid -t ther0ysuts6vo37
... you should be able to log into the master node. Run sinfo to check if there are 3 workers available.
BiBiGrid offers a more comfortable way to work with your cloud instances using the web IDE Theia. Let's see how this works together with BiBiGrid.
If the ide option is enabled in the configuration, Theia runs as a systemd service bound to localhost only. For security reasons, Theia is not bound to a public network interface: a valid certificate and some kind of authentication would be needed to create a safe connection, which is not that easy in a dynamic cloud environment.
However, BiBiGrid can open an SSH tunnel from the local machine to BiBiGrid's master instance and open a browser window running the Theia web IDE.
> bibigrid -o configuration.yml --ide <cluster id>
To see how the cluster with Slurm works in action, we start with a typical example: Hello World!
- If not already done, connect to your cluster (via terminal or web IDE).

- Create a new shell script hello-world.sh in the spool directory (/vol/spool):

#!/bin/bash
echo Hello from $(hostname) !
sleep 10

- Open a terminal and change into the spool directory:

cd /vol/spool

- Make our hello-world.sh script executable:

chmod u+x hello-world.sh

- Submit this script as an array job 50 times:

sbatch --array=1-50 --job-name=helloworld hello-world.sh

- See the status of our cluster:

squeue

- See the output:

cat slurm-*.out
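Optionally, while the array job is being processed, you can watch the queue update live (press Ctrl+C to stop):
watch -n 1 squeue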
To get an overview of how your cluster is doing, you can use Zabbix for monitoring.
To access the Zabbix server through your local browser, you need SSH port forwarding.
Log into the cluster:
ssh -L <local-port>:localhost:80 user@ip-address
As the <local-port> you have to choose a free port on your local system (e.g. 8080). The ip-address is the public IP address of your master instance, which you received after cluster launch in the line export BIBIGRID_MASTER=<ip-address>. Alternatively, you can use the list command from above to get an overview and copy the respective IP address listed under public-ip.
After you have successfully logged into your master instance, type http://localhost:<local-port>/zabbix into your browser address bar, using the same <local-port> you have chosen before.
The public IP of your cluster should be visible with the list command bibigrid -l and is also displayed after setup.
You can log in with the admin user and the previously set admin password.
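Putting the pieces together with the example values from above (the key file name and IP address are placeholders, use the values printed for your own cluster):
ssh -i ~/.bibigrid/keys/bibigridther0ysuts6vo37 -L 8080:localhost:80 ubuntu@129.70.51.XXX
Then open http://localhost:8080/zabbix in your local browser and log in as admin.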
For detailed documentation, please visit the Getting Started README.
In some cases, you may want to scale down your cluster when you don't need all the worker instances, or scale up when you need more of them. We now scale down our previously configured first worker batch by one instance.
> bibigrid -o configuration.yml -sd <bibigrid-id> 1 1
Scaling down is quite fast, since only the master node has to be reconfigured. Check the number of worker nodes by running sinfo on the master node. Since we need three workers for the 2nd part of this workshop, we now scale up by one worker instance ...
> bibigrid -o configuration.yml -su <bibigrid-id> 1 1
... and again check the number of worker nodes using sinfo. Scaling up takes some time (a few minutes), since newly added workers are configured from scratch.
Ansible is used by BiBiGrid to configure all launched instances. It can also be used to modify an existing cluster.
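As a rough sketch only (the inventory location is an assumption, check where BiBiGrid places its Ansible files on the master node), an ad-hoc command to test connectivity to all configured hosts could look like this:
# assumption: executed on the master node, where BiBiGrid keeps its Ansible inventory
ansible all -i <path-to-bibigrid-inventory> -m ping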
To terminate a running cluster you can simply use:
bibigrid -o configuration.yml -t <clusterid>
Optionally, it is possible to terminate more than one cluster by appending the other cluster IDs as follows:
bibigrid -o configuration.yml -t <clusterid1> <clusterid2> <clusterid3> ...
Another option is to terminate all your clusters using your username:
bibigrid -o configuration.yml -t <user>