Prepare input files for FINEMAP #232

hopedisastro · 2024-06-16T23:33:06Z

This script will prepare the z and ld files for FINEMAP, based on outputs from associaTR (meta-analysis) and corr_matrix_maker.py.

The z file contains the effect size and standard error estimates for each variant associated with a gene.
The ld file contains the correlation matrix for the variants associated with a gene.
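As a rough illustration of the two outputs described above, here is a minimal sketch that formats a z file and an ld file as text. It is not the code in this PR. The column layout (`rsid chromosome position allele1 allele2 maf beta se`, space-delimited) is an assumption about FINEMAP's expected z file header and should be checked against the FINEMAP documentation; the function names and the example values are hypothetical. The 4-decimal rounding mirrors the "round to 4 decimal places for LD matrix" commit.

```python
# Assumed FINEMAP z-file header; verify against the FINEMAP documentation.
Z_COLUMNS = ['rsid', 'chromosome', 'position', 'allele1', 'allele2', 'maf', 'beta', 'se']


def format_z(variants: list) -> str:
    """Render variants (dicts keyed by Z_COLUMNS) as space-delimited text with a header."""
    lines = [' '.join(Z_COLUMNS)]
    for variant in variants:
        lines.append(' '.join(str(variant[column]) for column in Z_COLUMNS))
    return '\n'.join(lines) + '\n'


def format_ld(matrix: list, decimals: int = 4) -> str:
    """Render a square correlation matrix as space-delimited rows, no header."""
    return ''.join(
        ' '.join(f'{value:.{decimals}f}' for value in row) + '\n'
        for row in matrix
    )


# Made-up example values, for illustration only.
example_variants = [
    {'rsid': 'var1', 'chromosome': '1', 'position': 1000, 'allele1': 'A',
     'allele2': 'G', 'maf': 0.12, 'beta': 0.05, 'se': 0.01},
    {'rsid': 'var2', 'chromosome': '1', 'position': 2000, 'allele1': 'C',
     'allele2': 'T', 'maf': 0.30, 'beta': -0.02, 'se': 0.015},
]
example_ld = [[1.0, 0.123456], [0.123456, 1.0]]

z_text = format_z(example_variants)
ld_text = format_ld(example_ld)
```

The row order of the ld matrix has to match the variant order in the z file, which is why both are built from the same ordered inputs here.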

MattWellie · 2024-06-24T01:09:42Z

str/fine-mapping/finemap_files_prep.py

+                if (
+                    to_path(
+                        output_path(f'finemap_prep/{celltype}/{chrom}/{gene_name}.ld', 'analysis'),
+                    ).exists()
+                    and to_path(output_path(f'finemap_prep/{celltype}/{chrom}/{gene_name}.z', 'analysis')).exists()
+                ):
+                    continue


Not a biggie, but be aware of how your jobs scale (cell type * chrom * gene): you're making 2 existence checks per combination. All of that happens in the driver job, so you're delaying the start of the actual work.

I'd experiment with a change here -

    # get all files in the output folder, recursively, in a single query
    all_files = list(to_path(output_path('finemap_prep', 'analysis')).glob('**'))

    # check whether your intended outputs are in that list
    ld_file = output_path(f'finemap_prep/{celltype}/{chrom}/{gene_name}.ld', 'analysis')
    z_file = output_path(f'finemap_prep/{celltype}/{chrom}/{gene_name}.z', 'analysis')
    if all(filepath in all_files for filepath in [z_file, ld_file]):
        continue

I thiiiiink this should scale a lot better, by posting one large query instead of thousands of individual ones.
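To make the idea concrete, here is a minimal local sketch of the "list once, then check membership" pattern, using `pathlib` and a temporary directory in place of cloud storage. It makes two assumptions that differ from the snippet above and would need checking against the real path library. First, it globs with `'**/*'` and filters to files, because depending on the library and version a bare `'**'` may match only directories. Second, it normalises everything to strings and stores them in a set, since comparing string paths against path objects would never match. The helper names and the example cell type and gene names are hypothetical.

```python
import tempfile
from pathlib import Path


def list_existing_outputs(root: Path) -> set:
    """Return every file under root as a string, using one recursive listing."""
    # '**/*' yields files and directories at any depth; keep files only.
    return {str(path) for path in root.glob('**/*') if path.is_file()}


def outputs_done(existing: set, *paths: str) -> bool:
    """True only if every expected output is already in the listing."""
    return all(path in existing for path in paths)


status = {}
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'finemap_prep'
    gene_dir = root / 'CD4_T' / 'chr1'
    gene_dir.mkdir(parents=True)
    (gene_dir / 'GENE_A.ld').touch()
    (gene_dir / 'GENE_A.z').touch()
    (gene_dir / 'GENE_B.ld').touch()  # GENE_B is missing its .z file

    # One listing, made before the loop rather than once per gene.
    existing = list_existing_outputs(root)

    for gene_name in ['GENE_A', 'GENE_B']:
        ld_file = str(gene_dir / f'{gene_name}.ld')
        z_file = str(gene_dir / f'{gene_name}.z')
        status[gene_name] = outputs_done(existing, ld_file, z_file)
```

Using a set keeps each membership check constant-time, which matters once the listing holds thousands of paths.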

This also builds the full output file names, so you can pass them to the relevant methods (you pass the celltype, chrom, and gene name to your methods, but you already made the full path here to check if it exists)
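One way to act on this is to build both paths once and hand them to the worker, instead of passing the cell type, chromosome and gene name and rebuilding the paths inside it. The sketch below is only an illustration: `FinemapOutputs`, `build_outputs`, `prepare_gene` and the `gs://example-bucket/analysis` prefix are all hypothetical, and in the real script the prefix would come from `output_path`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FinemapOutputs:
    """The two output paths for one (cell type, chromosome, gene) combination."""
    ld: str
    z: str


def build_outputs(prefix: str, celltype: str, chrom: str, gene_name: str) -> FinemapOutputs:
    """Build both output paths in one place so they cannot drift apart."""
    stem = f'{prefix}/finemap_prep/{celltype}/{chrom}/{gene_name}'
    return FinemapOutputs(ld=f'{stem}.ld', z=f'{stem}.z')


def prepare_gene(outputs: FinemapOutputs) -> list:
    """Hypothetical worker: receives ready-made paths and reports what it would write."""
    return [outputs.ld, outputs.z]


# Hypothetical prefix and identifiers, for illustration only.
outputs = build_outputs('gs://example-bucket/analysis', 'CD4_T', 'chr1', 'GENE_A')
planned = prepare_gene(outputs)
```

The same `outputs` object can then be used for the existence check and for the write, so the two always agree.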

hopedisastro added 5 commits June 17, 2024 09:30

Create finemap_files_prep.py (1e56db9)

lint (bd86017)

update file path (1824947)

update docu (72b8193)

round to 4 decimal places for LD matrix (74b68a4)

hopedisastro requested a review from MattWellie June 18, 2024 23:11

MattWellie reviewed Jun 24, 2024
