Segmentation during scanning for optimal CRF weight #43

Open
kuangzhuoran opened this issue Mar 3, 2023 · 4 comments
Comments

@kuangzhuoran

Hello:

RFMIX v2.03-r0 - Local Ancestry and Admixture Inference
(c) 2016, 2017 Mark Koni Hamilton Wright
Bustamante Lab - Stanford University School of Medicine
Based on concepts developed in RFMIX v1 by Brian Keith Maples, et al.

This version is licensed for non-commercial academic research use only
For commercial licensing, please contact cdbadmin@stanford.edu

--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry
Inference. Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome Chr1 ... done
Mapping samples ... 29 samples combined
Scanning input VCFs for common SNPs on chromosome Chr1 ... 956161 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
setting up CRF points and random forest windows...
computing random forest window spacing overlay...
initializing apriori reference subpop across CRF...
setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
9589734 (17.3%) variant alleles 0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 154 samples from 1 randomly selected reference parents.

Scanning for optimal CRF Weight....
/slurmState/slurmSpool/slurmd/job775448/slurm_script: line 17: 10145 Segmentation fault (core dumped) ./rfmix -f sp1.chr1.vcf -r sp2.chr1.vcf -m sp2.pop -g sp1.genetic.map -o outer --chromosome=Chr1

My command is: ./rfmix -f sp1.chr1.vcf -r sp2.chr1.vcf -m sp2.pop -g sp1.genetic.map -o outer --chromosome=Chr1
What could be causing this?

@kuangzhuoran
Author

I switched to another dataset, and now the run gets two lines further before crashing:
--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry
Inference. Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome Chr1 ... done
Mapping samples ... 26 samples combined
Scanning input VCFs for common SNPs on chromosome Chr1 ... 4258431 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
setting up CRF points and random forest windows...
computing random forest window spacing overlay...
initializing apriori reference subpop across CRF...
setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
94462316 (42.7%) variant alleles 0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 185 samples from 2 randomly selected reference parents.
Growing Random Forest Trees -- (851687/851687) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ... 211/ 211 (100.0%) [1] 1897230 segmentation fault (core dumped) ./rfmix -f Mp.Chr1.vcf -r Ma.Chr1.vcf -m Ma.pop -g MpMa.all.genetic.map -o

@chibispy

chibispy commented Mar 3, 2023

I've got the exact same error. Chromosomes 1-8 worked fine, but 9 and 10 didn't. I still haven't tried the rest, but it's odd that it doesn't seem to depend on the size of the chromosome. Additionally, I tried increasing the RAM to 4x what worked for chromosomes 1-8, and tried both increasing and decreasing the number of threads, but neither solved it. It even seems to run a bit further than your output, as it prints a few ancestries, but then it crashes immediately without writing any output. Here's what I get:

rfmix -f 510k_hg38.vcf.gz -r RFmix/ALL.wgs.integrated_sv_map_v1_GRCh38.20130502.svs.genotypes.vcf.gz -g RFmix/chr10.modified -m RFmix/integrated_call_samples_v3.20130502.todos.panel -o maps510k/510k_hg38_chr10 --chromosome=10 --n-threads=4

RFMIX v2.03-r0 - Local Ancestry and Admixture Inference
(c) 2016, 2017 Mark Koni Hamilton Wright
Bustamante Lab - Stanford University School of Medicine
Based on concepts developed in RFMIX v1 by Brian Keith Maples, et al.

This version is licensed for non-commercial academic research use only
For commercial licensing, please contact cdbadmin@stanford.edu

--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry
Inference. Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome 10 ... done
Mapping samples ... 3358 samples combined
Scanning input VCFs for common SNPs on chromosome 10 ... 52 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
setting up CRF points and random forest windows...
computing random forest window spacing overlay...
initializing apriori reference subpop across CRF...
setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
10523 (3.0%) variant alleles 2 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 1132 samples from 263 randomly selected reference parents.
Growing Random Forest Trees -- (11/11) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ... 4490/ 4490 (100.0%)

Maximum scoring weight is 1 (-inf)
Simulation results...
ACB ASW BEB CDX CEU CHB CHS CLM ESN FIN GBR GIH GWD IBS ITU JPT KHV LWK MSL MXL PEL PJL PUR STU TSI YRI
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
0Segmentation fault

@jamesfifer

I switched to another dataset and now run two more rows:
--- For use in scientific publications please cite original publication ---
Brian Maples, Simon Gravel, Eimear E. Kenny, and Carlos D. Bustamante (2013).
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry
Inference. Am. J. Hum. Genet. 93, 278-288

Loading genetic map for chromosome Chr1 ... done
Mapping samples ... 26 samples combined
Scanning input VCFs for common SNPs on chromosome Chr1 ... 4258431 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
setting up CRF points and random forest windows...
computing random forest window spacing overlay...
initializing apriori reference subpop across CRF...
setting up random forest probability estimation arrays... done
Defining and initializing conditional random field... done
94462316 (42.7%) variant alleles 0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 185 samples from 2 randomly selected reference parents.
Growing Random Forest Trees -- (851687/851687) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ... 211/ 211 (100.0%) [1] 1897230 segmentation fault (core dumped) ./rfmix -f Mp.Chr1.vcf -r Ma.Chr1.vcf -m Ma.pop -g MpMa.all.genetic.map -o

It is likely a memory issue. I ran into the same problem and was unable to get it to work no matter how much memory I allocated. I solved it by downsizing my genetic map (I initially had a genetic distance for every single locus, but rfmix still runs fine with a subset).
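
For anyone who wants to try the same thing, here is a minimal sketch of one way to downsize a map. It is not the script used above, just an illustration. It assumes the map is a whitespace-delimited text file with no header row, whose first three columns are chromosome, physical position (bp) and genetic position (cM), sorted by chromosome and position. The function names, file names and the 0.01 cM spacing are all made up for the example, so pick a spacing that suits your data.

```python
def thin_genetic_map(lines, min_gap_cm=0.01):
    """Return a thinned copy of the map rows.

    A row is kept only if it lies at least min_gap_cm (in cM) beyond the
    last kept row on the same chromosome. The first and last row of every
    chromosome are always kept so the map still spans the same interval.
    Assumes rows are sorted by chromosome, then position (an assumption).
    """
    kept = []
    last_chrom = None
    last_kept_cm = None
    pending = None  # most recent skipped row on the current chromosome

    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        chrom, cm = fields[0], float(fields[2])

        if chrom != last_chrom:
            # Close the previous chromosome with its final row, if skipped.
            if pending is not None:
                kept.append(pending)
            kept.append(line)
            last_chrom, last_kept_cm, pending = chrom, cm, None
        elif cm - last_kept_cm >= min_gap_cm:
            kept.append(line)
            last_kept_cm, pending = cm, None
        else:
            pending = line

    if pending is not None:
        kept.append(pending)
    return kept


def thin_map_file(src_path, dst_path, min_gap_cm=0.01):
    """Read a map from src_path and write the thinned map to dst_path."""
    with open(src_path) as fin:
        rows = thin_genetic_map(fin, min_gap_cm)
    with open(dst_path, "w") as fout:
        fout.write("\n".join(rows) + "\n")
    return len(rows)


# Example call (hypothetical file names):
# thin_map_file("full.genetic.map", "thinned.genetic.map", min_gap_cm=0.01)
```

The thinned file would then be passed to rfmix with -g in place of the original map.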

If that doesn't work, you can also use the example dataset here as a positive control.

@bamorim-bio

bamorim-bio commented Apr 28, 2024

I get this error too!

Loading genetic map for chromosome 21 ...  done
Mapping samples ... 1274 samples combined
Scanning input VCFs for common SNPs on chromosome 21 ...   47 SNPs
Loading haplotypes... done
Defining and initializing conditional random field...
   setting up CRF points and random forest windows...
   computing random forest window spacing overlay...
   initializing apriori reference subpop across CRF...
   setting up random forest probability estimation arrays... done
Defining and initializing conditional random field...   done
16639 (13.9%) variant alleles   0 (0.0%) missing alleles

Generating internal simulation samples...
Internally simulated 400 samples from 2 randomly selected reference parents.
Growing Random Forest Trees -- (10/10) 100.0%
Scanning for optimal CRF Weight....
Conditional random field ...         1674/  1674 (100.0%)

Maximum scoring weight is 1 (-inf)
Simulation results...
        Source1       Source2
        0       1
Segmentation fault      (core dumped)

All chromosomes ran fine, except 22...
