Add Check for Different Front End #883

rstyd · 2024-07-18T21:13:14Z

Currently, if a user starts BEE on one front end of an HPC cluster (call it clusterfe1) and then tries to use beeflow commands on another front end of the same cluster, the commands fail, since processes on different front ends usually can't communicate with each other.

If the user tries to run a beeflow core command they'll get the error:
Cannot connect to the beeflow daemon, is it running? Check the log at ".beeflow/logs/beeflow.log".

If the user tries to use any other beeflow command, they'll get a message like:
Submit: Could not reach WF Manager.

We should add a check in core.py and client.py to make sure the host the user is running on is the same as the host beeflow is currently running on.

One big issue is there isn't currently a clean way to get this information.
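Setting aside where the recorded hostname comes from, the comparison itself could look roughly like this. It is a minimal sketch: `check_same_front_end` is a hypothetical helper, not an existing beeflow function, and it assumes the hostname beeflow was started on has already been retrieved somehow.

```python
import socket


def check_same_front_end(recorded_host, current_host=None):
    """Return None if the hosts match, otherwise a message for the user.

    recorded_host: hostname beeflow was started on (None if unknown).
    current_host: defaults to the host this process is running on.
    """
    if current_host is None:
        current_host = socket.gethostname()
    # If we have no record, we cannot say anything useful, so stay silent.
    if recorded_host is None or recorded_host == current_host:
        return None
    return (f"beeflow was started on '{recorded_host}', but you are on "
            f"'{current_host}'. Run beeflow commands from '{recorded_host}'.")
```

Both core.py and client.py could call something like this before trying to connect, and print the message instead of the generic connection error when it returns one.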

We have several options:

The beeflow log at .beeflow/logs/beeflow.log records the front end on which beeflow was last started, in the format:

Running on cluster-fe1
Launching components in order: ['redis', 'scheduler', 'celery', 'slurmrestd', 'wf_manager', 'task_manager']

We could grep the last Running message out of the log (and verify there wasn't a Kill operation afterwards) to get this info. This is brittle, though, since it would break if we ever change the format of the beeflow log.
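For reference, the log-scraping approach might be sketched as below. The `Running on <host>` pattern comes from the log excerpt above; the `Kill` marker is an assumption, since the exact text beeflow logs on shutdown is not shown here, and `last_running_host` is a hypothetical name.

```python
import re

# Matches lines such as "Running on cluster-fe1".
RUNNING_RE = re.compile(r'Running on (\S+)')
# Assumed marker: the real shutdown message in beeflow.log may differ.
KILL_MARKER = 'Kill'


def last_running_host(log_text):
    """Return the host from the last 'Running on' line, or None.

    A kill marker seen after a 'Running on' line clears the host,
    since beeflow is then assumed to be stopped.
    """
    host = None
    for line in log_text.splitlines():
        match = RUNNING_RE.search(line)
        if match:
            host = match.group(1)
        elif KILL_MARKER in line:
            host = None
    return host
```

Both the pattern and the marker are tied to the current log wording, which is exactly the brittleness described above.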

Alternatively, we could add the hostname where beeflow is running to the workflow DB and read it from the beeflow client. Currently, only the wf_manager uses the workflow DB, so this would add another piece of code that depends on it, which breaks our modularity somewhat. Another issue is that this won't work if, in the future, we enable a client to run on a separate system from the one where the workflow manager is running. That situation wouldn't be affected by this problem, though, so we'd just need to skip the check when connecting to the workflow manager from another machine.
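A rough sketch of the DB approach, using the standard-library sqlite3 module as a stand-in. The `daemon_info` table and both function names are hypothetical; the actual workflow DB schema and access layer are not described in this issue.

```python
import socket
import sqlite3


def record_host(conn, hostname=None):
    """Store the host beeflow is starting on (single-row table)."""
    hostname = hostname or socket.gethostname()
    conn.execute(
        'CREATE TABLE IF NOT EXISTS daemon_info ('
        'id INTEGER PRIMARY KEY CHECK (id = 1), hostname TEXT NOT NULL)'
    )
    # Overwrite any previous record so only the latest start is kept.
    conn.execute(
        'INSERT OR REPLACE INTO daemon_info (id, hostname) VALUES (1, ?)',
        (hostname,),
    )
    conn.commit()


def recorded_host(conn):
    """Return the stored hostname, or None if nothing was recorded."""
    try:
        row = conn.execute(
            'SELECT hostname FROM daemon_info WHERE id = 1'
        ).fetchone()
    except sqlite3.OperationalError:
        # Table does not exist yet, so beeflow never recorded a host.
        return None
    return row[0] if row else None
```

The daemon would call `record_host` at startup and the client would call `recorded_host` before connecting, skipping the check when it returns None.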

I think option 2 (storing the hostname in the workflow DB) is the best solution at the moment.


pagrubel · 2024-08-29T16:43:54Z

@kchilleri When you check for the location that beeflow is starting from, can you also check whether the environment variable SLURMD_NODENAME exists? If it does, print it out with a warning that the user is on a compute node, and do not allow beeflow to start.
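Something along these lines might work for the compute node check. This is only a sketch: `compute_node_warning` is a hypothetical helper, and it assumes SLURMD_NODENAME is set only inside a Slurm job on a compute node.

```python
import os


def compute_node_warning(environ=None):
    """Return a warning string if SLURMD_NODENAME is set, else None."""
    environ = os.environ if environ is None else environ
    node = environ.get('SLURMD_NODENAME')
    if node is None:
        return None
    return (f'Warning: beeflow is being started on compute node "{node}". '
            'Start beeflow from a front end instead.')
```

The startup code could print the returned message and exit with a non-zero status whenever it is not None.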

kchilleri · 2024-10-01T15:58:25Z

issue #932 addresses this.

rstyd added the bug label Jul 18, 2024

pagrubel added the High Priority label Aug 6, 2024

kchilleri self-assigned this Aug 6, 2024

kchilleri linked a pull request Oct 1, 2024 that will close this issue

Issue883/add check for different front end #933

Open
