Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds possibility to make drop database #147

Merged
merged 25 commits into from
Oct 11, 2024
Merged

Adds possibility to make drop database #147

merged 25 commits into from
Oct 11, 2024

Conversation

Lucpen
Copy link
Collaborator

@Lucpen Lucpen commented Jul 30, 2024

PR checklist

  • Adds possibility to make drop database
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Copy link

github-actions bot commented Jul 30, 2024

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 28e537f

+| ✅ 167 tests passed       |+
#| ❔ 317 tests were ignored |#
!| ❗   9 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: .github/workflows/awstest.yml
  • files_exist - File not found: .github/workflows/awsfulltest.yml
  • nextflow_config - Config manifest.version should end in dev: 2.2.1
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.
  • pipeline_todos - TODO string in ci.yml: You can customise CI pipeline run tests as required

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.14.1
  • Run at 2024-09-23 08:48:35

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 19, 2024

I'll give this a go now! Will let you know how it goes.

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 20, 2024

A first question. Should it be possible to generate the db without assigning the three DROP annotation parameters? Seems to me that it skipped DROP entirely when running without them.

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 20, 2024

It should work, those options are now optional :)

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 22, 2024

Looks like it is passing the DROP config building step now, nice! We will see whether it passes the AE and AS steps as well, will let you know.

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 22, 2024

Great 🤞

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 22, 2024

Some updates. It initially crashed. I recognize the error from my previous testing (gagneurlab/drop#568).

When debugging before, the error came from the columns with NA. These are translated to nan in the DROP R scripts, and crashing in different places.

Also, I think it needs the COUNT_MODE and COUNT_OVERLAPS to run the counting step.

After doing the following updates, it is running for me:

  • Removing GENE_COUNTS_FILE from the output (i.e. the column with NA values)
  • Setting COUNT_MODE to IntersectionStrict
  • Setting COUNT_OVERLAPS to True

After these changes it has been running fine for 15+ minutes. We'll see how it goes.

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 22, 2024

Nice! Did you just remove the GENE_COUNTS_FILE column or left it empty?

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 22, 2024

Nice! Did you just remove the GENE_COUNTS_FILE column or left it empty?

I totally removed it. Haven't tested what happens if leaving it empty (i.e. without the 'NA' strings).

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 22, 2024

The aberrant expression run was successful! I now see the geneCounts.tsv.gz in the processed_results/exported_counts dir.

I'll try the splicing module next.

Caching did not seem to work for the AE process though - it restarted when I did -resume. Haven't investigated why though, disabled it temporarily to try the splicing now.

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 22, 2024

Happy to hear that the AE module worked 😄
The caching does not work because the order in which the samples are provided to the annotation module varies, I will try to think of a way to prevent that form happening.

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 23, 2024

For the splicing, I am running into the same DROP error as I received previously. It seems there is a bug triggered when running with no external counts. Seems to be a silly issue, due to symlink and create a folder in the same location.

See issue: gagneurlab/drop#558

Have you managed to get around this somehow? 🤔

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 23, 2024

I have run without external counts but have not encountered this issue 🤔
Probably a stupid question, but does it happen consistently if you restart the snakemake run?

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 23, 2024

I have run without external counts but have not encountered this issue 🤔 Probably a stupid question, but does it happen consistently if you restart the snakemake run?

Hmm, strange. Was it a different version of DROP?

Yes, I still get it on reruns!

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 23, 2024

No, same version

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 28, 2024

I think I figured out the issue (see notes in gagneurlab/drop#558).

Seems the splicing workflow cannot handle a relative "root" path as it will yield an invalid softlink in one step.

I am trying now a run where I have asigned the path in drop_config.py as such:

config_copy["root"] = str(Path.cwd() / "output")

Before it was:

config_copy["root"] = "output"

@Lucpen
Copy link
Collaborator Author

Lucpen commented Aug 28, 2024

Great! Let me know how it goes 😄

@Jakob37
Copy link
Contributor

Jakob37 commented Aug 29, 2024

Now it passed that step! Still crashing though :(

In the step Error in rule AberrantSplicing_pipeline_FRASER_08_extract_results_FraseR_R:

The debugging continues ...

conf/base.config Outdated Show resolved Hide resolved
--padjcutoff ${drop_padjcutoff_ae} \\
$zscorecutoff \\
--output config.yaml

snakemake aberrantExpression --cores ${task.cpus} --rerun-triggers mtime $args

if [[ !skip_export_counts_drop ]]; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work as intended.

I am testing running it as such now:

if [[ "${skip_export_counts_drop}" == "false" ]]; then

Copy link
Collaborator Author

@Lucpen Lucpen Oct 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have just finished running it and it did work for us. Let me know how it goes for you. However, there was a mistake in the output side and it only copied the file for the counts instead of the created folder. I have just corrected that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Have you tried running with the setting turned both on and off?

This is bash here, i.e. you would need to prefix the variable with $ for the variable to be recognized.

Similarly to how you are doing below:

--genome_assembly $genome_assembly \\

Or am I missing something?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to find the old error messages, but did only find my crashes resolved by mkdir -p. Seems it shouldn't work, but not sure 🤔 I'll let you know if I run into this again, that time with actual error messages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will give it another try and get back to you

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, the folder appears even when the skip = true. I have tested your suggestion and it works, so I will update the code. Thanks for noticing :D

Copy link
Contributor

@fellen31 fellen31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to leave the python scripts for someone else 😄

CHANGELOG.md Show resolved Hide resolved
bin/drop_sample_annot.py Outdated Show resolved Hide resolved
bin/drop_sample_annot.py Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
docs/usage.md Outdated Show resolved Hide resolved
modules/local/drop_config_runAE.nf Show resolved Hide resolved
nextflow.config Outdated Show resolved Hide resolved
workflows/tomte.nf Outdated Show resolved Hide resolved
bin/drop_sample_annot.py Outdated Show resolved Hide resolved
bin/drop_sample_annot.py Outdated Show resolved Hide resolved
workflows/tomte.nf Outdated Show resolved Hide resolved
Lucpen and others added 3 commits October 10, 2024 09:57
Co-authored-by: Felix Lenner <52530259+fellen31@users.noreply.github.com>
@Lucpen Lucpen requested a review from fellen31 October 10, 2024 09:53
@Lucpen Lucpen marked this pull request as ready for review October 10, 2024 11:34
@Lucpen Lucpen requested a review from a team as a code owner October 10, 2024 11:34
@Lucpen Lucpen added the Ready for review Ready for review label Oct 10, 2024
Copy link
Contributor

@fellen31 fellen31 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment above about the python still stands, rest looks good to me! 👍

Copy link
Contributor

@rannick rannick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think the python scripts are good for now. In further development, I would suggest:

  • using type hints systematically
  • cover all functions with docstrings
  • split into smaller functions

bin/drop_sample_annot.py Outdated Show resolved Hide resolved
df_samples["COUNT_OVERLAPS"] = True
df_samples.to_csv(out_file, index=False, sep="\t")
else:
df_reference: DataFrame = read_csv(ref_annot, sep="\t")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you are type hinting, it is nice and you could try using it more

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I will do so on one of the next PRs

Co-authored-by: Annick Renevey <47788523+rannick@users.noreply.github.com>
@Lucpen Lucpen merged commit 9bc110d into dev Oct 11, 2024
5 of 6 checks passed
@Lucpen Lucpen deleted the drop_db branch October 11, 2024 10:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Ready for review Ready for review
Projects
None yet
4 participants