Add support or warnings (and tests) for edge cases (extension case, single files) #431
Comments
Hi! Sounds like an erroneous sample sheet. What does it look like?
Nicholai Hensley ***@***.***> wrote on Fri., 18 Oct 2024, 11:42:
… Description of the bug
Hi all, I keep encountering the following error on diverse datasets that I
am trying to use with quantms. I have tried it multiple times on the same
HPC cluster but with different downloads from nf-core and/or bigbio,
different memory allocations on the cluster, and different datasets
including data files that I know should work because I have been able to
run them using a different proteomics pipeline on the same cluster. I was
able to run the example file just fine (test_lfq) but as soon as I have my
own data, it fails with this same error.
ERROR ~ Error executing process > 'NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (test_ants_exp_setup.sdrf_openms_design)'
Caused by:
Process `NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (test_ants_exp_setup.sdrf_openms_design)` terminated with an error exit status (6)
Command executed:
ProteomicsLFQ \
-threads 12 \
-in PD225_Block1-4.mzML \
-ids PD225_Block1-4_consensus_fdr_filter.idXML \
-design test_ants_exp_setup.sdrf_openms_design.tsv \
-fasta GCF_000001405.40_protein_decoy.fasta \
-protein_inference bayesian \
-quantification_method feature_intensity \
-targeted_only false \
-feature_with_id_min_score 0 \
-feature_without_id_min_score 0 \
-mass_recalibration false \
-Seeding:intThreshold 100 \
-protein_quantification unique_peptides \
-alignment_order star \
\
-psmFDR 0.1 \
-proteinFDR 0.05 \
-picked_proteinFDR false \
-out_cxml test_ants_exp_setup.sdrf_openms_design_openms.consensusXML \
-out test_ants_exp_setup.sdrf_openms_design_openms.mzTab \
-out_msstats test_ants_exp_setup.sdrf_openms_design_msstats_in.csv \
\
-debug 1000 \
2>&1 | tee proteomicslfq.log
cat <<-END_VERSIONS > versions.yml
"NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ":
ProteomicsLFQ: $(ProteomicsLFQ 2>&1 | grep -E '^Version(.*)' | sed 's/Version: //g' | cut -d ' ' -f 1)
END_VERSIONS
Command exit status:
6
Command output:
The OpenMS team is collecting usage statistics for quality control and funding purposes.
We will never give out your personal data, but you may disable this functionality by
setting the environmental variable OPENMS_DISABLE_UPDATE_CHECK to ON.
Connecting to REST server failed. Skipping update check.
Error: Host unreachable
TOPPBase.cpp(1588): Value of string option 'no_progress': 0
TOPPBase.cpp(1588): Value of string option 'in': PD225_Block1-4.mzML
TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4.mzML'
TOPPBase.cpp(1588): Value of string option 'out': test_ants_exp_setup.sdrf_openms_design_openms.mzTab
TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_openms.mzTab'
TOPPBase.cpp(1588): Value of string option 'out_msstats': test_ants_exp_setup.sdrf_openms_design_msstats_in.csv
TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_msstats_in.csv'
TOPPBase.cpp(1588): Value of string option 'out_triqler':
TOPPBase.cpp(1588): Value of string option 'ids': PD225_Block1-4_consensus_fdr_filter.idXML
TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4_consensus_fdr_filter.idXML'
TOPPBase.cpp(1588): Value of string option 'design': test_ants_exp_setup.sdrf_openms_design.tsv
TOPPBase.cpp(1588): Checking input file 'test_ants_exp_setup.sdrf_openms_design.tsv'
TOPPBase.cpp(1588): Value of string option 'fasta': GCF_000001405.40_protein_decoy.fasta
TOPPBase.cpp(1588): Checking input file 'GCF_000001405.40_protein_decoy.fasta'
TOPPBase.cpp(1588): Value of string option 'quantification_method': feature_intensity
Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames.
TOPPBase.cpp(1588): Error occurred in line 1629 of file /opt/conda/conda-bld/openms-meta_1716538752609/work/src/topp/ProteomicsLFQ.cpp (in function: virtual OpenMS::TOPPBase::ExitCodes ProteomicsLFQ::main_(int, const char**)) !
Command error:
The OpenMS team is collecting usage statistics for quality control and funding purposes.
We will never give out your personal data, but you may disable this functionality by
setting the environmental variable OPENMS_DISABLE_UPDATE_CHECK to ON.
Connecting to REST server failed. Skipping update check.
Error: Host unreachable
TOPPBase.cpp(1588): Value of string option 'no_progress': 0
TOPPBase.cpp(1588): Value of string option 'in': PD225_Block1-4.mzML
TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4.mzML'
TOPPBase.cpp(1588): Value of string option 'out': test_ants_exp_setup.sdrf_openms_design_openms.mzTab
TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_openms.mzTab'
TOPPBase.cpp(1588): Value of string option 'out_msstats': test_ants_exp_setup.sdrf_openms_design_msstats_in.csv
TOPPBase.cpp(1588): Checking output file 'test_ants_exp_setup.sdrf_openms_design_msstats_in.csv'
TOPPBase.cpp(1588): Value of string option 'out_triqler':
TOPPBase.cpp(1588): Value of string option 'ids': PD225_Block1-4_consensus_fdr_filter.idXML
TOPPBase.cpp(1588): Checking input file 'PD225_Block1-4_consensus_fdr_filter.idXML'
TOPPBase.cpp(1588): Value of string option 'design': test_ants_exp_setup.sdrf_openms_design.tsv
TOPPBase.cpp(1588): Checking input file 'test_ants_exp_setup.sdrf_openms_design.tsv'
TOPPBase.cpp(1588): Value of string option 'fasta': GCF_000001405.40_protein_decoy.fasta
TOPPBase.cpp(1588): Checking input file 'GCF_000001405.40_protein_decoy.fasta'
TOPPBase.cpp(1588): Value of string option 'quantification_method': feature_intensity
Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames.
TOPPBase.cpp(1588): Error occurred in line 1629 of file /opt/conda/conda-bld/openms-meta_1716538752609/work/src/topp/ProteomicsLFQ.cpp (in function: virtual OpenMS::TOPPBase::ExitCodes ProteomicsLFQ::main_(int, const char**)) !
Command used and terminal output
The command I use is:
nextflow run main.nf \
    -c /home/nicholaih/29apr2024_upperlip_expression/nf_quantms/quantms/test_ants.config \
    --input /home/nicholaih/29apr2024_upperlip_expression/dn_maxquant/test_run/test_ants_exp_setup.sdrf.tsv \
    --outdir /home/nicholaih/29apr2024_upperlip_expression/quantms_test_ants \
    --email ***@***.*** \
    --multiqc_title test_ants \
    --database /home/nicholaih/29apr2024_upperlip_expression/dn_maxquant/test_run/GCF_000001405.40_protein.fasta \
    -profile mamba
The overall run produces the following trace file:
task_id hash native_id name status exit submit duration realtime %cpu peak_rss peak_vmem rchar wchar
1 66/5d8458 64595 NFCORE_QUANTMS:QUANTMS:INPUT_CHECK:SAMPLESHEET_CHECK (test_ants_exp_setup.sdrf.tsv) COMPLETED 0 2024-10-17 07:35:12.926 2m 2s 1m 51s 33.1% 500.5 MB 7.3 GB 72.2 MB 36.1 KB
2 c8/ad3d93 2552 NFCORE_QUANTMS:QUANTMS:CREATE_INPUT_CHANNEL:SDRFPARSING (test_ants_exp_setup.sdrf.tsv) COMPLETED 0 2024-10-17 07:37:15.102 19.3s 13.7s 141.5% 181 MB 6.5 GB 28.2 MB 3.6 KB
4 74/106a48 4044 NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (PD225_Block1-4) FAILED 1 2024-10-17 07:37:34.734 5.8s 5.7s - - - - -
3 2c/a0a6a5 4063 NFCORE_QUANTMS:QUANTMS:DECOYDATABASE (1) COMPLETED 0 2024-10-17 07:37:35.071 2m 3s 1m 44s 101.5% 135.7 MB 5 GB 190.9 MB 187.5 MB
5 4b/240662 4138 NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:THERMORAWFILEPARSER (PD225_Block1-4) COMPLETED 0 2024-10-17 07:37:40.860 7m 45s 7m 39s 166.1% 1.4 GB 4.2 GB 574.2 MB 382.1 MB
6 36/6ea94d 19045 NFCORE_QUANTMS:QUANTMS:FILE_PREPARATION:MZMLSTATISTICS (PD225_Block1-4) COMPLETED 0 2024-10-17 07:45:27.462 2m 14s 1m 49s 254.7% 813.6 MB 10.1 GB 456.6 MB 1.6 MB
8 32/a13204 19065 NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINESAGE ([PD225_Block1-4]) COMPLETED 0 2024-10-17 07:45:27.879 13m 50s 13m 17s 343.8% 22.1 GB 29.2 GB 1.4 GB 159.5 MB
9 4d/632ae5 52473 NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:IDPEP (PD225_Block1-4) COMPLETED 0 2024-10-17 07:59:18.599 42.2s 28.3s 117.2% 164.6 MB 5 GB 68.6 MB 68.1 MB
7 f5/8a760e 19069 NFCORE_QUANTMS:QUANTMS:LFQ:ID:DATABASESEARCHENGINES:SEARCHENGINECOMET (PD225_Block1-4) COMPLETED 0 2024-10-17 07:45:27.949 1h 9m 50s 1h 9m 17s 773.2% 4.4 GB 9.9 GB 5.7 GB 153.9 MB
10 77/511ccd 14424 NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMRESCORING:IDPEP (PD225_Block1-4) COMPLETED 0 2024-10-17 08:55:17.970 52.7s 39.2s 117.4% 179.5 MB 4.6 GB 98.2 MB 98.2 MB
11 68/842048 16291 NFCORE_QUANTMS:QUANTMS:LFQ:ID:CONSENSUSID (PD225_Block1-4) COMPLETED 0 2024-10-17 08:56:11.244 1m 15s 1m 2s 107.3% 370.7 MB 5.2 GB 169.9 MB 69.9 MB
12 b6/5b4eeb 18638 NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:FDRCONSENSUSID (PD225_Block1-4) COMPLETED 0 2024-10-17 08:57:26.622 43.4s 30.3s 119.4% 209.3 MB 5 GB 73.5 MB 50.1 MB
13 de/879e39 20397 NFCORE_QUANTMS:QUANTMS:LFQ:ID:PSMFDRCONTROL:IDFILTER (PD225_Block1-4) COMPLETED 0 2024-10-17 08:58:10.278 29.6s 16.6s 136.8% 164.8 MB 5 GB 57.1 MB 22.5 KB
14 0c/13b61a 21850 NFCORE_QUANTMS:QUANTMS:LFQ:PROTEOMICSLFQ (test_ants_exp_setup.sdrf_openms_design) FAILED 6 2024-10-17 08:58:40.676 14.9s 14.9s - - - - -
### Relevant files
[quantms.zip](https://github.com/user-attachments/files/17432158/quantms.zip)
I'm attaching the output log, the config file, and the nextflow log in a zip file. Any help in diagnosing what is wrong would be greatly appreciated, as I'm keen to get this program working.
### System information
I'm running quantms on a Linux HPC cluster with a SLURM job scheduler, and the node has the following configuration (1 TB of memory, 32 cores):
CPUAlloc=24 CPUTot=32 CPULoad=3.08
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=node91 NodeHostName=node91 Version=20.11.2
OS=Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020
RealMemory=1 AllocMem=0 FreeMem=964132 Sockets=4 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=largemem
BootTime=2023-02-01T08:46:44 SlurmdStartTime=2023-02-01T08:50:53
CfgTRES=cpu=32,mem=1M,billing=32
AllocTRES=cpu=24
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
Comment=(null)
@NikoHensley can you share your SDRF?
Hi all! I've attached my SDRF (as a text file, although I run it as a TSV) and the parsing log from the quantms output: test_ants_exp_setup.sdrf_parsing.log
I think both using capitalized .RAW and using single files currently have undefined/untested behaviour.
So to clarify, I should change all the file names to only end in ".raw"? And I cannot do analyses that have only one technical or biological replicate with your pipeline?
Yes, to be on the safe side, you should change the extension to .raw. This is probably the main issue.
Those should all require only minor changes to handle in the pipeline. It is just not something that we have ever needed.
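As a side note for anyone hitting the same problem, a minimal sketch of the renaming step, assuming all the .RAW files sit in one directory (this helper is purely illustrative, not part of quantms):

```bash
# Rename *.RAW to *.raw so the extension matches what the pipeline currently expects.
for f in *.RAW; do
    [ -e "$f" ] || continue        # nothing to do if no .RAW files are present
    mv -- "$f" "${f%.RAW}.raw"     # strip the .RAW suffix and re-append it in lower case
done
```

The same extension also needs to be updated inside the SDRF, as discussed further down in the thread.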
Hi, I've made the changes you suggested and tried re-running the program, but there are still errors at the ProteomicsLFQ step. It does get further than before, so the change of file extensions to ".raw" helped. I tried using just one input file, but that gave an Exit 8 status code. So now I'm using three appropriately named sample files, and that produces an Exit 139 status code. It seems like it cannot find peaks in these files? Or at least match them between samples? Although I have run these same three files just fine using MaxQuant. I'm attaching the output and error logs. Maybe I can discern something specific, but any help is appreciated! It looks like there's progress towards getting the pipeline to work.
As a follow-up to this, I've tried increasing the max_memory for the whole run, as well as the memory available to the specific ProteomicsLFQ process (up to 350 GB), and that does not help. In the output file, with an exit code of 6, ProteomicsLFQ terminates with: "Invalid parameter: Spectra file basenames provided as input need to match a subset the experimental design file basenames." This has been the problem from before.
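One way to narrow down that "basenames" message is to check whether each staged spectra file is actually referenced in the generated OpenMS design; a rough sketch, using the file names from the log above (the design file is the tab-separated *_openms_design.tsv produced by the pipeline):

```bash
# For every mzML staged in the task's work directory, check that its basename
# appears somewhere in the experimental design file.
for f in *.mzML; do
    b=$(basename "$f" .mzML)
    if grep -q "$b" test_ants_exp_setup.sdrf_openms_design.tsv; then
        echo "OK      $b is referenced in the design"
    else
        echo "MISSING $b is not referenced in the design"
    fi
done
```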
The first error that you had is due to a bug in a new version of ThermoRawFileParser that leads to infinite m/z values and therefore infinite memory allocation in ProteomicsLFQ (see the linked pull request). You will probably need to try the dev version until there is a bugfix release. If you already cloned dev, you will need to pull the latest changes. The second error is strange if it went beyond the experimental design validation stage before that (with the same SDRF file).
My SDRF is here (as a .log file but only for uploading): I will redownload quantms and also use the dev version in my next run to see how it performs, as you suggest.
Sorry for this @NikoHensley, we are working with TRFP (compomics/ThermoRawFileParser#187) to solve this issue.
Hi, can you also use .raw in the SDRF, please? I think this is actually more important than renaming the file. That is where we do the case-sensitive replacement.
Sorry, I may be misunderstanding the goal here. You want me to run it with ".raw" instead of ".RAW" even though you suggested that would make it fail?
No, I think .RAW will fail, while .raw should work. The culprits are here: |
@NikoHensley if you want, rather than you having to rename the files, we can make a change in the code to tackle this issue; from my point of view that is better, and we can do a PR to dev.
@ypriverol Ideally we should do it anyway. I saw .RAW being used sometimes. We should also check if those raw files are actually Thermo Raw files. I think other vendors might also use .raw and TRFP will fail on them. We should throw a nice error in that case.
I reproduced the error. The problem code is here: quantms/modules/local/sdrfparsing/main.nf, line 33 (commit b7c9b6a).
Can you check that the branching for file types etc. works as expected too?
Yes, it works: quantms/subworkflows/local/file_preparation.nf, lines 111 to 113 (commit b7c9b6a). sed ignores case if you use the /I flag.
Awesome, I totally overlooked those details.
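For reference, a tiny illustration of the /I flag discussed just above (GNU sed; the file name is simply the one from this thread):

```bash
# Without /I only the lower-case extension matches; with /I both .raw and .RAW are rewritten.
echo "PD225_Block1-4.RAW" | sed 's/\.raw$/.mzML/'    # no match: prints PD225_Block1-4.RAW
echo "PD225_Block1-4.RAW" | sed 's/\.raw$/.mzML/I'   # prints PD225_Block1-4.mzML
```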
Ok, so I've tried running bigbio/quantms (using the mamba profile), and with ".raw" endings. The program got up to the following before throwing an error (one I have not seen before). Ignore the ThermoRawFileParser failures, as the spectra files are large and it retries them with more memory (and succeeds):
[29/e6727e] NFC…t_ants_exp_setup.sdrf.tsv) | 1 of 1 ✔
Caused by:
Command executed:
IDFilter
cat <<-END_VERSIONS > versions.yml
Command exit status:
Exit status 6 is usually "wrong parameter" in OpenMS tools. Do you have the actual log of this step? I.e., what was printed to stdout, or equivalently, what is in PD225_Block1-4_comet_feat_perc_pep_idfilter.log.
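If it is not obvious where that log ended up, a quick way to pull it out of the Nextflow work directory (assuming the default work/ location of the run):

```bash
# Locate the failed IDFilter task's log inside work/ and show its first lines.
find work -name 'PD225_Block1-4_comet_feat_perc_pep_idfilter.log' -exec head -n 20 {} \;
```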
The first few lines of the requested file read as follows: Unknown option(s) '[-score:psm]' given. Aborting! IDFilter -- Filters results from protein or peptide identification engines based on different criteria.
I am not sure if it's because I only had one search engine selected (comet), but I am re-running the program with comet and sage to see if that helps. I am also having it print more of the debugging info for the ID steps. |
I suspected this. This parameter was changed between OpenMS 3.1 and 3.2. On dev we are definitely using the compatible one:
Not sure how 3.1 ended up on your workers. Are they sharing some conda env? Do you have a preexisting env? In general, we very much recommend container profiles.
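To see what each cached Nextflow conda environment actually resolved, something along these lines could help (assuming the default cache under work/conda, as in the log quoted later in this thread):

```bash
# Print the OpenMS packages installed in each cached environment
# (conda or mamba both work here).
for env in work/conda/env-*; do
    echo "== $env"
    conda list -p "$env" 2>/dev/null | grep -i 'openms'
done
```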
I found the bug. Thanks, @NikoHensley, for reporting it. I will do a PR about it.
@jpfeuffer It is interesting: I ran version 3.2.0 of IDFilter and that parameter is there:
I know. But @NikoHensley is somehow using 3.1.0, which is not compatible with the current version of quantms.
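A quick way to confirm which IDFilter ends up on the PATH, mirroring the version grep the pipeline itself writes into versions.yml:

```bash
# Show the version line and check whether the newer option is advertised at all.
IDFilter 2>&1 | grep -E '^Version'
IDFilter --helphelp 2>&1 | grep -- '-score:psm' \
    || echo "-score:psm not listed (suggests OpenMS older than 3.2)"
```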
I updated my conda and nextflow, and re-added bioconda to my environment to see if they were grabbing the wrong instance of OpenMS when the pipeline is initiated, but that did not fix the fact that when I try to run quantms using nextflow, it automatically tries to grab openms 3.1. Here's the command and the important part of the output:
nextflow run bigbio/quantms -r dev -profile test,mamba
And then it says (but always has):
As well as the log:
This is strange, because I have done a full search of the code for 3.1.0 and found nothing. Thanks a lot for searching for this bug with us. I will try to reproduce it here.
I'm also facing this same error.
This looks like a different error @fiuzatayna
I tried remaking the conda environment for nextflow and updating packages to see if it was pulling an old version of OpenMS from somewhere else, and that did not work either. I got the same error of it trying to use openms=3.1.0.
I'm also trying to test locally, to reproduce the error. Can you see in the
I just updated my conda/mamba hoping that would also solve the issue, and have now ruined my conda install, as nextflow now throws a different error at that step (127 instead of 6). However, my nextflow.log a few posts back details where it found openms=3.1.0, which I've copied below.
Oct-30 08:31:27.045 [Actor Thread 10] DEBUG nextflow.conda.CondaCache - mamba found local env for environment=bioconda::openms-thirdparty=3.1.0; path=/home/nicholaih/29apr2024_upperlip_expression/nf_quantms/quantms/work/conda/env-9a0636a369edb4e4410fb1491ffbb8cd
I am not sure where this local environment came from. I have a local, up-to-date version of quantms, and I am also trying to use this dev version with a conda/mamba profile.
Did you delete your conda work folder? I had multiple issues with conda (the main reason why I now mainly use singularity). It may help to delete the work folder. As you can see, it is using the CondaCache with mamba; probably it doesn't update or use the new version because of that?
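A sketch of the cleanup being suggested here; note that this throws away all cached task results and cached conda environments, so the next run starts from scratch:

```bash
# Remove the run's work directory, including the cached conda envs under work/conda,
# then let Nextflow clean up its own cache metadata for previous runs.
rm -rf work/
nextflow clean -f
# Optionally also discard the cached pipeline clone that 'nextflow run bigbio/quantms' uses:
# nextflow drop bigbio/quantms
```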
This is most likely the culprit. You need to use -latest.
I deleted my work directory and old install of quantms entirely to see if that fixed the issue of using openms=3.1.0. It did not. And I've tried using -latest, which throws the error below:
(env_nf) bash-4.2$ nextflow run bigbio/quantms -r dev -profile test,mamba -latest
N E X T F L O W ~ version 24.10.0
Pulling bigbio/quantms ...
As opposed to when I use the normal command, which recapitulates the error of defaulting to openms=3.1.0:
(env_nf) bash-4.2$ nextflow run bigbio/quantms -r dev -profile test,mamba
N E X T F L O W ~ version 24.10.0
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [c70bc99]
Maybe I'm just not familiar enough with nextflow, or am doing something wrong with how I'm calling different versions of quantms. I have to use the conda/mamba versions because docker/singularity do not play nicely on the HPC I am using, as I have no root access. I just cloned the quantms dev repository and will try using main.nf (the local version) to see if that uses openms=3.2 instead.
Did you maybe accidentally pull changes in the cached clone of quantms that nextflow manages for you? You should always either:
I usually just let nextflow pull the repo, using nextflow run bigbio/quantms. But I have tried every iteration to get it to run, including having a local version. Maybe that did confuse it. I will wipe it all and start fresh.
I have restored my conda, deleted the local version of quantms, and have gotten nextflow to run quantms properly with the following test: nextflow run -latest bigbio/quantms -r dev -profile test,mamba
This has been quite a journey and I am happy that the run worked on the example data with the proper openms=3.2.0. However, I cannot now get this to work on my sample data; it returns an error about not being able to parse the pulled config file:
despite having just run successfully on the example. I am trying to use this on an HPC with a SLURM submission process. However, when I try to run it on a head node (with limited memory), the nextflow process starts but then fails at the SDRF checking step, despite my SDRF being the same as previously reported, with 3 samples. I will explore these issues more, as it seems like user error, but it is confusing why it would fail in different ways during the HPC submission versus on a head node. Thanks for all your help thus far!
The config error could have been a hiccup when connecting to or downloading from GitHub. Regarding SLURM: have you created a configuration for your cluster already? The minimum you will need to do is set process.executor = "slurm" (which you could do from the command line). Once you need more parameters, you should use a config file.
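For the SLURM side, a minimal cluster config along those lines might look like the sketch below; the partition name is taken from the node listing earlier in this issue and should be adjusted to whatever the cluster actually provides:

```bash
# Write a minimal cluster config and pass it to the run with -c.
cat > cluster.config <<'EOF'
process {
    executor = 'slurm'
    queue    = 'largemem'   // partition from the node listing above; adjust as needed
}
EOF

nextflow run -latest bigbio/quantms -r dev -profile test,mamba -c cluster.config
```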