Fix reset error: addresses issue #757 #793

Closed
wants to merge 6 commits into from

Conversation

pagrubel
Collaborator

@pagrubel pagrubel commented Mar 7, 2024

- check workflow state
- do not allow reset when there are Running or Initializing workflows
- add get_workflow_list method to eliminate duplicate code
- make response text message clearer to read in code
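
A minimal sketch of the state check listed above, not the actual beeflow code. It assumes workflows are available as `(name, id, status)` tuples, mirroring the columns printed by `beeflow list`; the names `BLOCKING_STATES`, `blocking_workflows`, and `reset_allowed` are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (name, workflow id, status), as in `beeflow list`.
Workflow = Tuple[str, str, str]

# States that should block a reset, per the bullet list above.
BLOCKING_STATES = {"Running", "Initializing"}


def blocking_workflows(workflows: Iterable[Workflow]) -> List[Workflow]:
    """Return the workflows whose status should prevent a reset."""
    return [wf for wf in workflows if wf[2] in BLOCKING_STATES]


def reset_allowed(workflows: Iterable[Workflow]) -> bool:
    """A reset is allowed only when no workflow is in a blocking state."""
    return not blocking_workflows(workflows)
```

Keeping the filter in one helper means the reset path and any warning message can share the same list instead of each querying workflow state separately.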

Resolves: issue #757

@rstyd
Collaborator

rstyd commented Mar 7, 2024

Instead of stopping a reset when there are initializing workflows, I think it'd be better to just kill them outright, but ask the user first.

@pagrubel
Collaborator Author

pagrubel commented Mar 7, 2024

> Instead of stopping a reset when there are initializing workflows, I think it'd be better to just kill them outright, but ask the user first.

How should I do that? What needs to be killed?

@pagrubel pagrubel requested a review from rstyd March 7, 2024 22:31
@pagrubel
Collaborator Author

I will try to modify this to search for any Running or Initializing workflows and give the user a chance to stop the reset process. If they want to continue, I will attempt to cancel the workflows, then do the stop and delete the directory. I may put in a longer wait too, just to get around the Initializing problem.
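
The plan above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `reset_with_confirmation` is hypothetical, and the `confirm`, `cancel_workflow`, `stop_beeflow`, and `remove_workdir` callables are stand-ins passed in by the caller, not real beeflow functions.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Hypothetical record shape: (name, workflow id, status).
Workflow = Tuple[str, str, str]

ACTIVE_STATES = {"Running", "Initializing"}


def reset_with_confirmation(
    workflows: Iterable[Workflow],
    confirm: Callable[[List[Workflow]], bool],
    cancel_workflow: Callable[[str], None],
    stop_beeflow: Callable[[], None],
    remove_workdir: Callable[[], None],
    wait_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Cancel active workflows (after asking), then stop and delete.

    Returns False if the user declines, True once the reset steps have run.
    """
    active = [wf for wf in workflows if wf[2] in ACTIVE_STATES]
    # Give the user a chance to back out when active workflows exist.
    if active and not confirm(active):
        return False
    for _name, wf_id, _status in active:
        cancel_workflow(wf_id)
    stop_beeflow()
    # Optional longer wait so components can finish shutting down.
    sleep(wait_seconds)
    remove_workdir()
    return True
```

Passing the side-effecting steps in as callables keeps the ordering (ask, cancel, stop, wait, delete) testable without a running beeflow instance.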

@pagrubel pagrubel added the WIP Work in progress label Mar 11, 2024
@rstyd
Collaborator

rstyd commented Mar 11, 2024

Oh sorry, I missed this. We want to just kill all the currently initializing or running workflows, exactly as you described.

@kchilleri
Collaborator

This is what I get when I have an initializing workflow and try to run `beeflow core reset`:

(base) [kchilleri@darwin-fe3 BEE]$ git checkout issue757/fix_reset_error
branch 'issue757/fix_reset_error' set up to track 'origin/issue757/fix_reset_error'.
Switched to a new branch 'issue757/fix_reset_error'
(base) [kchilleri@darwin-fe3 BEE]$ git status
On branch issue757/fix_reset_error
Your branch is up to date with 'origin/issue757/fix_reset_error'.
(base) [kchilleri@darwin-fe3 BEE]$ cd workdir
(base) [kchilleri@darwin-fe3 workdir]$ cp /vast/home/kchilleri/BEE/examples/cat-grep-tar/lorem.txt .
(base) [kchilleri@darwin-fe3 workdir]$ poetry shell
Spawning shell within /vast/home/kchilleri/.cache/pypoetry/virtualenvs/hpc-beeflow-PIafEbRq-py3.9
. /vast/home/kchilleri/.cache/pypoetry/virtualenvs/hpc-beeflow-PIafEbRq-py3.9/bin/activate
(base) [kchilleri@darwin-fe3 workdir]$ . /vast/home/kchilleri/.cache/pypoetry/virtualenvs/hpc-beeflow-PIafEbRq-py3.9/bin/activate
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core start
Checking dependencies...

Found Charliecloud 0.37
Starting beeflow...
Run `beeflow core status` for more information.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core status
beeflow components:
redis ... RUNNING
scheduler ... RUNNING
celery ... RUNNING
slurmrestd ... RUNNING
wf_manager ... RUNNING
task_manager ... RUNNING
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
There are currently no workflows.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow package /vast/home/kchilleri/BEE/examples/cat-grep-tar .
Package cat-grep-tar.tgz created successfully
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow submit wf1 ./cat-grep-tar.tgz  workflow.cwl input.yml /vast/home/kchilleri/BEE/workdir
Package cat-grep-tar.tgz unpackaged successfully
Workflow submitted! Your workflow id is 67122d.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
Name	ID	Status
wf1	67122d	Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow query 67122d
Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
Name	ID	Status
wf1	67122d	Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core reset
There are 'Initializing' workflows. Reset may fail. Check 'beeflow list'i
A reset will remove this directory: /vast/home/kchilleri/.beeflow

Are you sure you want to reset?

Please ensure all workflows are complete before running a reset
Check the status of workflows by running 'beeflow list'

A reset will shutdown beeflow and its components.

A reset will delete the bee_workdir directory which results in:
Removing the archive of workflows executed.
Removing the archive of workflow containers.
Reset all databases associated with the beeflow app.
Removing all beeflow logs.

Beeflow configuration files from bee_cfg will remain.

Respond with yes(y)/no(n):  y
Beeflow has been shutdown.
Waiting for components to cleanly stop.
Unable to remove /vast/home/kchilleri/.beeflow.
 [Errno 39] Directory not empty: 'x86_64-linux-gnu'
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core status
Cannot connect to the beeflow daemon, is it running? Check the log at "/vast/home/kchilleri/.beeflow/logs/beeflow.log".
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core start
Checking dependencies...

Found Charliecloud 0.37
Starting beeflow...
Run `beeflow core status` for more information.
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow core status
beeflow components:
redis ... RUNNING
scheduler ... RUNNING
celery ... RUNNING
slurmrestd ... RUNNING
wf_manager ... RUNNING
task_manager ... RUNNING
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow list
Name	ID	Status
wf1	67122d	Initializing
(hpc-beeflow-py3.9) (base) [kchilleri@darwin-fe3 workdir]$ beeflow cancel 67122d
Workflow is Initializing cannot cancel.
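
The `[Errno 39] Directory not empty` failure in the log above is consistent with something still writing into the directory while it is being removed, though the log alone does not confirm that. Under that assumption, one possible mitigation is to retry the removal after a short wait. The helper name `remove_tree_with_retry` and its parameters are hypothetical.

```python
import errno
import shutil
import time


def remove_tree_with_retry(path: str, attempts: int = 5, delay: float = 1.0) -> bool:
    """Remove a directory tree, retrying when it is reported as not empty.

    Returns True once the tree is gone (or never existed). Re-raises the
    last error if the final attempt still fails, or if the error is not
    ENOTEMPTY.
    """
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return True
        except FileNotFoundError:
            # Already removed, nothing left to do.
            return True
        except OSError as err:
            last_try = attempt == attempts - 1
            if err.errno != errno.ENOTEMPTY or last_try:
                raise
            # Another process may still be writing; wait and try again.
            time.sleep(delay)
    return False
```

A retry only papers over the race; cancelling or killing the initializing workflow before deleting the directory addresses the cause more directly.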

 - Clean up code for archive if requested
 - Make unique time stamped backup for archive
 - Use variables for large text for query choices and warnings
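
A rough sketch of what a unique, time-stamped backup might look like, assuming the backup is a plain copy of a directory. The function name `backup_dir`, the timestamp format, and the counter suffix are illustrative choices, not taken from the commits.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def backup_dir(src: str, dest_parent: str, now: Optional[datetime] = None) -> Path:
    """Copy src into dest_parent under a unique, time-stamped name."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    src_path = Path(src)
    target = Path(dest_parent) / f"{src_path.name}_{stamp}"
    counter = 1
    # If a backup with the same stamp already exists, add a numeric suffix.
    while target.exists():
        target = Path(dest_parent) / f"{src_path.name}_{stamp}_{counter}"
        counter += 1
    shutil.copytree(src_path, target)
    return target
```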
@pagrubel pagrubel removed the WIP Work in progress label Mar 18, 2024
@pagrubel
Collaborator Author

I am going to close this pull request and open another; apparently I still have some rebasing problems with it. I will add some information about how I handle Running and Initializing workflows as well as other active workflows.

@pagrubel pagrubel closed this Mar 19, 2024
@pagrubel pagrubel deleted the issue757/fix_reset_error_revised branch March 20, 2024 21:20
3 participants