Segmentation fault error in a subset of samples using LUMPY Express #129

Open
AAvalos82 opened this issue Jul 21, 2016 · 13 comments

Comments

@AAvalos82

AAvalos82 commented Jul 21, 2016

Hi,

I have been trying to use lumpyexpress on a set of 90 haploid genomes from honey bee drones. These genomes were aligned with BWA-MEM using the -M flag. Although 80 of these samples ran with no issues, 10 encountered a segmentation fault that led to a core dump. The samples in question do not cluster by population, and the alignments were done in separate parallel batches, so the failures do not appear to correlate with a processing pipeline error either.

The analysis does produce readable vcf files but these are corrupted with a break around the second set of scaffolds (Group10.*). The error was consistent across these 10 samples generating similar corrupted vcf files.
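
For anyone trying to pin down where such a file breaks, one rough approach is to compare the column count of every record against the #CHROM header line. The Python sketch below assumes the corruption shows up as a record with too few or too many tab-separated columns, which this thread does not confirm; the function name `find_malformed_records` is hypothetical and not part of LUMPY.

```python
def find_malformed_records(lines):
    """Return (line_number, reason) for VCF data lines that look broken.

    Assumption: a truncated or corrupted record has a different number of
    tab-separated columns than the #CHROM header line.
    """
    expected = None  # column count taken from the #CHROM header
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            expected = len(line.split("\t"))
            continue
        if expected is None:
            problems.append((number, "data line before #CHROM header"))
            continue
        found = len(line.split("\t"))
        if found != expected:
            problems.append(
                (number, "expected %d columns, found %d" % (expected, found))
            )
    return problems
```

It could be called as `find_malformed_records(open("sample.vcf"))`, where `sample.vcf` stands in for one of the affected output files.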

Any help resolving this issue would be greatly appreciated.

Example command input from lumpyexpress -v flag as follows:

Sourcing executables from /home/apps/lumpy-sv/lumpy-sv-0.2.13/scripts/lumpyexpress.config ...

Checking for required python modules (/home/apps/python/python-2.7.3/bin/python)...

create temporary directory

Warning: The index file is older than the data file: /home/a-m/aavalos/2015_12_hb_aggression_popgen/data/2016_5_11_realign_bam/W197_WHAIPI005553-18_EHB.realigned.bai
Calculating insert distributions...
Library read groups: W197_WHAIPI005553-18_EHB
Library read length: 100
Removed 1890 outliers with isize >= 586
done
0
Running LUMPY...

/home/apps/lumpy-sv/lumpy-sv-0.2.13/bin/lumpy -P
-t W197_WHAIPI005553-18_EHB.bam.vcf.g71iro19am1r/W197_WHAIPI005553-18_EHB.bam.vcf
-msw 4
-tt 0
-x /home/a-m/aavalos/2015_12_hb_aggression_popgen/data/ref/numt.bed
-pe bam_file:/home/a-m/aavalos/2015_12_hb_aggression_popgen/data/2016_7_5_discord_split/discordant_reads_bam/filtered_dis/W197_WHAIPI005553-18_EHB.filtered_disc.bam,histo_file:W197_WHAIPI005553-18_EHB.bam.vcf.g71iro19am1r/W197_WHAIPI005553-18_EHB.bam.vcf.sample1.lib1.x4.histo,mean:453.706583919,stdev:78.806904399,read_length:100,min_non_overlap:100,discordant_z:5,back_distance:10,weight:1,id:W197_WHAIPI005553-18_EHB,min_mapping_threshold:20,read_group:W197_WHAIPI005553-18_EHB
-sr bam_file:/home/a-m/aavalos/2015_12_hb_aggression_popgen/data/2016_7_5_discord_split/split_reads_bam/filtered_split/W197_WHAIPI005553-18_EHB.filtered_split.bam,back_distance:10,min_mapping_threshold:20,weight:1,id:W197_WHAIPI005553-18_EHB,min_clip:20
> W197_WHAIPI005553-18_EHB.bam.vcf
496 0
Group1.1 1000000
Group1.10 1000000
Group1.11 1000000
...
GroupUn993 1000000
GroupUn994 1000000
GroupUn995 1000000
GroupUn997 1000000
/home/apps/lumpy-sv/lumpy-sv-0.2.13/scripts/lumpyexpress: line 411: 79691 Segmentation fault (core dumped) $LUMPY $PROB_CURVE -t ${TEMP_DIR}/${OUTBASE} -msw $MIN_SAMPLE_WEIGHT -tt $TRIM_THRES $EXCLUDE_BED_FMT $LUMPY_DISC_STRING $LUMPY_SPL_STRING > $OUTPUT
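
To narrow down where the crash happens, the last `<name> <number>` line printed before the segmentation fault message can be pulled out of the verbose log. This is a minimal Python sketch that assumes those lines are progress output from lumpy, which the thread does not confirm; the function name `last_progress_before_crash` is hypothetical.

```python
import re

# Matches lines made of exactly two fields, e.g. "GroupUn997 1000000".
PROGRESS = re.compile(r"^(\S+)\s+(\d+)$")


def last_progress_before_crash(log_lines):
    """Return (name, number) from the last two-field line seen before a
    'Segmentation fault' message, or None if the log has no such message
    (or no two-field line precedes it)."""
    last = None
    for line in log_lines:
        if "Segmentation fault" in line:
            return last
        match = PROGRESS.match(line.strip())
        if match:
            last = (match.group(1), int(match.group(2)))
    return None  # the log never reported a segmentation fault
```

Applied to the log above, this would report the `GroupUn997 1000000` line, which may help decide which part of the BAM file to look at first.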

@ryanlayer
Collaborator

To dig into this we will have to extract the problematic region of the BAM
file. Please email me directly ryan dot layer at gmail to work out the
particulars.


Ryan Layer

@amaiacc

amaiacc commented May 9, 2017

Hi,

I got the same segmentation fault error as #129, leading to a core dump. It only happens on a subset of my samples (6/52, human whole genomes, aligned with BWA-MEM, -M).

The output of these samples contains readable vcf headers without any calls.

I would greatly appreciate any help solving this issue.

Thanks in advance,

amaia

lumpyexpress \
>     -B t0_397.reordered.bam \
>     -S t0_397/t0_397.splitters.bam \
>     -D t0_397/t0_397.discordants.bam \
>     -T ./tmp \
>     -v -k \
>     -o BILGIN_t0_397_k_SV_lumpy.vcf
Sourcing executables from /data/corpora/MPI_workspace/lag/shared_spaces/Resource_DB/lumpy-sv/bin/lumpyexpress.config ...

Checking for required python modules (/usr/local/apps/python-2.7.11/bin/python)...

    create temporary directory
Calculating insert distributions...
Library read groups: HKAIPI000472-70,HKAIPI000472-70.1,HKAIPI000472-70.2,HKAIPI000472-70.3
Library read length: 90
Removed 130 outliers with isize >= 557
Library read groups: HKAIPI000471-71
Library read length: 90
Removed 68 outliers with isize >= 558
done
0
0
Running LUMPY...

/data/corpora/MPI_workspace/lag/shared_spaces/Resource_DB/lumpy-sv/bin/lumpy  \
    -t ./tmp/BILGIN_t0_397_k_SV_lumpy.vcf \
    -msw 4 \
    -tt 0 \
     \
     \
     -pe bam_file:/data/corpora/sge2/lag/projects/lg-hand/working/Bordeaux/SV/lumpy/t0_397/t0_397.discordants.bam,histo_file:./tmp/BILGIN_t0_397_k_SV_lumpy.vcf.sample1.lib1.x4.histo,mean:457.712174187,stdev:52.7052111789,read_length:90,min_non_overlap:90,discordant_z:5,back_distance:10,weight:1,id:t0_397,min_mapping_threshold:20,read_group:HKAIPI000472-70,read_group:HKAIPI000472-70.1,read_group:HKAIPI000472-70.2,read_group:HKAIPI000472-70.3 -pe bam_file:/data/corpora/sge2/lag/projects/lg-hand/working/Bordeaux/SV/lumpy/t0_397/t0_397.discordants.bam,histo_file:./tmp/BILGIN_t0_397_k_SV_lumpy.vcf.sample1.lib2.x4.histo,mean:457.844237813,stdev:53.707274301,read_length:90,min_non_overlap:90,discordant_z:5,back_distance:10,weight:1,id:t0_397,min_mapping_threshold:20,read_group:HKAIPI000471-71 \
     -sr bam_file:/data/corpora/sge2/lag/projects/lg-hand/working/Bordeaux/SV/lumpy/t0_397/t0_397.splitters.bam,back_distance:10,min_mapping_threshold:20,weight:1,id:t0_397,min_clip:20 \
    > BILGIN_t0_397_k_SV_lumpy.vcf
474     0
469     0
chrM    1000000
/data/corpora/MPI_workspace/lag/shared_spaces/Resource_DB/lumpy-sv/bin/lumpyexpress: line 480: 56317 Segmentation fault      $LUMPY $PROB_CURVE -t ${TEMP_DIR}/${OUTBASE} -msw $MIN_SAMPLE_WEIGHT -tt $TRIM_THRES $LUMPY_DEPTH_STRING $EXCLUDE_BED_FMT $LUMPY_DISC_STRING $EXCLUDE_BED_FMT $LUMPY_SPL_STRING > $OUTPUT

@ryanlayer
Collaborator

ryanlayer commented Jun 6, 2017 via email

@amaiacc

amaiacc commented Jun 6, 2017 via email

@ryanlayer
Collaborator

ryanlayer commented Jun 6, 2017 via email

@amaiacc

amaiacc commented Jun 30, 2017 via email

@rbatorsky

Hello,
I'm curious whether you found a resolution for this problem. I'm getting a similar error running lumpyexpress on the synthetic tumor-normal pairs from the ICGC DREAM challenge.

I'm running it like this:
$LUMPY_EXPRESS \
    -B $TUMOR_PATH,$NORMAL_PATH \
    -S ${TUMOR}.splitters.sorted.bam,${NORMAL}.splitters.sorted.bam \
    -D ${TUMOR}.discordants.sorted.bam,${NORMAL}.discordants.sorted.bam \
    -o ${TUMOR}_vs_normal.vcf

Here is my output:

Checking for required python modules (/usr/bin/python)...
Calculating insert distributions...
Library read groups: C09DF.1,C09DF.2,D0EN0.4,D0EN0.7,D0EN0.8
Library read length: 101
Removed 1606 outliers with isize >= 525
done
1
Calculating insert distributions...
Library read groups: C09DF.1,C09DF.2,D0EN0.4,D0EN0.7,D0EN0.8
Library read length: 101
Removed 1186 outliers with isize >= 515
done
1
Running LUMPY...
434 0
424 0
1 1000000
2 1000000
3 1000000
4 1000000
5 1000000
6 1000000
7 1000000
8 1000000
9 1000000
10 1000000
11 1000000
12 1000000
13 1000000
13 2000000
13 4000000
13 8000000
13 16000000
13 32000000
14 1000000
14 2000000
14 4000000
14 8000000
14 16000000
14 32000000
15 1000000
15 2000000
15 4000000
15 8000000
15 16000000
15 32000000
16 1000000
17 1000000
18 1000000
19 1000000
20 1000000
21 1000000
21 2000000
21 4000000
21 8000000
21 16000000
22 1000000
22 2000000
22 4000000
22 8000000
22 16000000
22 32000000
X 1000000
Y 1000000
Y 2000000
Y 4000000
MT 1000000
GL000207.1 1000000
GL000226.1 1000000
GL000229.1 1000000
GL000231.1 1000000
GL000210.1 1000000
GL000239.1 1000000
GL000235.1 1000000
GL000201.1 1000000
GL000247.1 1000000
GL000245.1 1000000
GL000197.1 1000000
GL000203.1 1000000
GL000246.1 1000000
GL000249.1 1000000
GL000196.1 1000000
GL000248.1 1000000
GL000244.1 1000000
GL000238.1 1000000
GL000202.1 1000000
GL000234.1 1000000
GL000232.1 1000000
GL000206.1 1000000
GL000240.1 1000000
GL000236.1 1000000
GL000241.1 1000000
GL000243.1 1000000
GL000242.1 1000000
GL000230.1 1000000
GL000237.1 1000000
GL000233.1 1000000
GL000204.1 1000000
GL000198.1 1000000
GL000208.1 1000000
GL000191.1 1000000
GL000227.1 1000000
GL000228.1 1000000
GL000214.1 1000000
GL000221.1 1000000
GL000209.1 1000000
GL000218.1 1000000
GL000220.1 1000000
GL000213.1 1000000
GL000211.1 1000000
GL000199.1 1000000
GL000217.1 1000000
GL000216.1 1000000
GL000215.1 1000000
GL000205.1 1000000
GL000219.1 1000000
GL000224.1 1000000
GL000223.1 1000000
GL000195.1 1000000
GL000212.1 1000000
GL000222.1 1000000
GL000200.1 1000000
GL000193.1 1000000
GL000194.1 1000000
GL000225.1 1000000
GL000192.1 1000000
NC_007605 1000000
lumpyexpress: line 413: 14007 Segmentation fault $LUMPY $PROB_CURVE -t ${TEMP_DIR}/${OUTBASE} -msw $MIN_SAMPLE_WEIGHT -tt $TRIM_THRES $EXCLUDE_BED_FMT $LUMPY_DISC_STRING $LUMPY_SPL_STRING > $OUTPUT

@hepcat72

This is the same problem I'm having with lumpyexpress, but only when it is run via Galaxy. I do not get the segfault on my Mac. I want to port my analysis pipeline to Galaxy, so I'm interested in getting this resolved. My 27 BAM files (10 of which lead to a segfault) are fairly small, ranging from 1Mb to 2.7Mb.
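
When only some inputs in a batch crash, it can help to record which runs died from a segmentation fault rather than from an ordinary error. The Python sketch below relies on two POSIX conventions: `subprocess` reports a child killed by signal N as return code -N, and a shell reports it as exit status 128 + N. Whether lumpyexpress passes that status through unchanged is an assumption, and the names `died_from_segfault` and `run_samples` are hypothetical.

```python
import signal
import subprocess


def died_from_segfault(returncode):
    """True if a return code indicates termination by SIGSEGV, either
    directly (-N from subprocess) or via a shell wrapper (128 + N)."""
    segv = int(signal.SIGSEGV)
    return returncode in (-segv, 128 + segv)


def run_samples(commands):
    """Run each command and return the sorted names of samples that segfaulted.

    `commands` maps a sample name to an argument list, for example
    {"sampleA": ["lumpyexpress", "-B", "sampleA.bam", "-o", "sampleA.vcf"]}.
    """
    crashed = []
    for sample, argv in commands.items():
        completed = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if died_from_segfault(completed.returncode):
            crashed.append(sample)
    return sorted(crashed)
```

Running the same commands on both machines and comparing the returned lists would show whether the same inputs fail in each environment.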

@Sithara85

Hi,

I am getting a similar error. What is the solution for this? I cannot share data, as these are real patient samples. I would greatly appreciate your help.

Thanks in advance!

@hepcat72

I have worked on this issue some and implemented a workaround that may get around this issue in some cases. See issue #276 and my unmerged pull request #277. You might even try installing my fork with those changes to see if it works around the issue for you.

@hepcat72

My fork is likely somewhat behind the current though...

