This repository contains the integrative model of the WDR76-SPIN1-Nucleosome complex, based on data from chemical crosslinking, X-ray crystallography, and structure prediction with AlphaFold. It includes the input data, the scripts for data preprocessing and modeling, and the results, including bead models and localization probability density maps. The modeling was performed using IMP (Integrative Modeling Platform).
The integrative structure is deposited in the PDB with accession code 9A8I (PDB-Dev accession: PDBDEV_00000382).
- data : contains the subdirectories for the input data used for modeling all the subcomplexes.
- scripts : contains all the scripts used for modeling and analysis of the models.
- results : contains the models and the localization probability densities of the top cluster of the subcomplexes.
- test : scripts for testing the sampling
- For crosslinks in sheetA of data/xlinks/original_suppmat_DataS3.xlsx, first run:
    python get_protein_uniprot_mapping.py -x /home/shreyas/Dropbox/washburn_wdr_spin/xls_sheet1.csv
  Make a file proteins_of_interest.txt and use it to run:
    python get_protein_uniprot_mapping.py -x /home/shreyas/Dropbox/washburn_wdr_spin/xls_sheet1.csv -m mapping -p proteins_of_interest.txt
  Finally, to generate the input file for modeling, do:
    python xl_preprocessing.py ~/Dropbox/washburn_wdr_spin/xls_sheet1.csv uniprot_mapping.yaml
- For crosslinks in sheetA of data/xlinks/original_suppmat_DataS3.xlsx, run the following command to generate the crosslinks input file for modeling:
    python xl_change_protnames_from_ncbi2name.py
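For the first crosslink set, the commands quoted above appear to form a three-step sequence: an initial mapping run, a second mapping run that uses a hand-made proteins_of_interest.txt, and a final preprocessing run. The sketch below only assembles those command lines so the order is explicit. The helper name `preprocessing_commands` is hypothetical, the CSV path is a placeholder rather than the authors' local path, and the step order is an assumption read from the text.

```python
import shlex


def preprocessing_commands(xl_csv,
                           proteins_file="proteins_of_interest.txt",
                           mapping_yaml="uniprot_mapping.yaml"):
    """Return the three preprocessing commands in the assumed order.

    The flags (-x, -m, -p) and script names are copied from the README;
    the helper itself is illustrative and not part of the repository.
    """
    return [
        # Step 1: initial mapping run on the crosslink spreadsheet export.
        ["python", "get_protein_uniprot_mapping.py", "-x", xl_csv],
        # Step 2: run after proteins_of_interest.txt has been created by hand.
        ["python", "get_protein_uniprot_mapping.py", "-x", xl_csv,
         "-m", "mapping", "-p", proteins_file],
        # Step 3: generate the crosslink input file for modeling.
        ["python", "xl_preprocessing.py", xl_csv, mapping_yaml],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, because step 2
    # depends on a file that has to be written manually after step 1.
    for cmd in preprocessing_commands("xls_sheet1.csv"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the manual step between the first and second command visible.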
To run the sampling, run the modeling script as follows:
for runid in `seq 1 50` ; do mpirun -np 8 $IMP python scripts/modeling.py prod $runid ; done
where $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation).
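The shell loop above launches 50 independent runs, each on 8 MPI processes. As a rough illustration, the same command lines can be generated in Python, which makes the optional $IMP prefix explicit. The function name `sampling_commands` and the setup-script path in the usage example are hypothetical; the run count, process count, script path, and `prod` argument are taken from the command above.

```python
import shlex


def sampling_commands(n_runs=50, n_procs=8, imp_setup=None):
    """Build one mpirun command per run ID (1..n_runs).

    imp_setup is the IMP setup script for a source build; leave it as
    None for a binary installation, as the README suggests.
    """
    commands = []
    for runid in range(1, n_runs + 1):
        cmd = ["mpirun", "-np", str(n_procs)]
        if imp_setup:
            cmd.append(imp_setup)
        cmd += ["python", "scripts/modeling.py", "prod", str(runid)]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical setup-script location, for illustration only.
    for cmd in sampling_commands(n_runs=2, imp_setup="/path/to/imp/setup_environment.sh"):
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run` one at a time, which mirrors the sequential behaviour of the shell loop.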
Good-scoring models were selected using pmi_analysis (please refer to the pmi_analysis tutorial for a more detailed explanation) along with our variable_filter_v1.py script. These scripts are run as described below:
- First, run run_analysis_trajectories_w_skip2.py as follows:
    $IMP run_analysis_trajectories_w_skip2.py modeling run_
  where $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation), modeling is the directory containing all the runs, and run_ is the prefix for the names of the individual run directories.
- Then run variable_filter_v1.py on the major cluster obtained, as follows:
    $IMP variable_filter_v1.py -c N -g MODEL_ANALYSIS_DIR
  where $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation), N is the cluster number of the major cluster, and MODEL_ANALYSIS_DIR is the location of the directory containing the selected_models*.csv files.
  This can also be run using the submit_variable_filter_v1.sh script from the scripts/analysis/pmi_analysis directory. Please also refer to the comments in variable_filter_v1.py for more details.
- The selected good-scoring models were then extracted using run_extract_models.py as follows:
    $IMP python run_extract_models.py modeling run_ CLUSTER_NUMBER
  where $IMP is the setup script corresponding to the IMP installation directory (omit for binary installation), modeling is the path to the directory containing all the individual runs, and CLUSTER_NUMBER is the number of the major cluster to be extracted.
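The variable_filter_v1.py step expects MODEL_ANALYSIS_DIR to contain the selected_models*.csv files. A small pre-flight check such as the one below can catch a wrong directory before the filter is launched. This is a sketch, not part of the repository: the helper names `find_selected_models` and `variable_filter_args` and the example file name are hypothetical, while the `-c` and `-g` flags are taken from the command above.

```python
import tempfile
from pathlib import Path


def find_selected_models(model_analysis_dir):
    """Return the selected_models*.csv files, or raise if none exist."""
    files = sorted(Path(model_analysis_dir).glob("selected_models*.csv"))
    if not files:
        raise FileNotFoundError(
            f"no selected_models*.csv files found in {model_analysis_dir}"
        )
    return files


def variable_filter_args(cluster, model_analysis_dir):
    """Build the variable_filter_v1.py arguments after checking the inputs.

    The result still has to be prefixed with the IMP setup script or the
    interpreter, depending on the installation.
    """
    find_selected_models(model_analysis_dir)
    return ["variable_filter_v1.py", "-c", str(cluster), "-g", str(model_analysis_dir)]


if __name__ == "__main__":
    # Demonstrate on a throwaway directory with a hypothetical file name.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "selected_models_A_cluster1.csv").write_text("placeholder\n")
        print(variable_filter_args(1, tmp))
```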
A separate directory named sampcon was created and a density.txt file was added to it. This file contains the details of the domains to be split for plotting the localization probability densities. Finally, sampling exhaustiveness tests were performed using imp-sampcon as follows:
$imp python $sampcon/pyext/src/exhaust.py -n wdr_spin -a -m cpu_omp -c 2 -d analysis/density.txt -gp -g 2.0 -sa ../model_analysis/A_gsm_clust1.txt -sb ../model_analysis/B_gsm_clust1.txt -ra ../model_analysis/A_gsm_clust1.rmf3 -rb ../model_analysis/B_gsm_clust1.rmf3
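One of the checks in a sampling exhaustiveness analysis is whether the scores of the two independent samples (here passed via -sa and -sb) come from the same distribution, which is commonly assessed with a two-sample Kolmogorov-Smirnov test. The sketch below computes only the KS statistic D in pure Python to illustrate the idea. It is not the imp-sampcon implementation, and the function name `ks_statistic` and the example scores are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(scores_a, scores_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the largest absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(scores_a)
    b = sorted(scores_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


if __name__ == "__main__":
    # Made-up scores for two samples; real values would come from the score files.
    sample_a = [101.2, 99.8, 100.5, 102.1]
    sample_b = [100.9, 99.5, 101.7, 100.2]
    print(f"KS D = {ks_statistic(sample_a, sample_b):.3f}")
```

A small D indicates similar score distributions in the two samples; the full protocol also applies further tests at the level of structures.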
- Crosslink violations were analyzed as follows:
    for xlfile in data/xlinks/modeling_xlfile_sheetA.dat data/xlinks/modeling_xlfile_sheetD.dat; do python get_xlviol_val_set_v2.py sampcon_cluster0_models.rmf3 $xlfile 35.0 & done
- Average distance maps were plotted for the models from the major cluster as follows:
    scripts/analysis/cosmic_and_distance-maps/submit_contact_maps_all_pairs_surface.py
  This script calls the scripts/analysis/cosmic_and_distance-maps/contact_maps_all_pairs_surface.py script. Please use --help with contact_maps_all_pairs_surface.py for more details.
- Plot distance versus model index for the models in the major cluster as follows:
    python scripts/analysis/plot_mdlike.py
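To make the crosslink violation analysis above more concrete, the sketch below shows one common way to define a violation over an ensemble. It assumes that the value 35.0 passed to get_xlviol_val_set_v2.py is a distance cutoff in angstroms, and that a crosslink counts as satisfied if at least one model in the cluster places the two residues within that cutoff. Both points are assumptions, and the actual script may apply a different criterion. The function name and the crosslink labels are hypothetical.

```python
def violated_crosslinks(distances, cutoff=35.0):
    """Return the crosslinks whose minimum distance over all models exceeds the cutoff.

    distances maps a crosslink label to a list of distances,
    one per model in the cluster.
    """
    violated = []
    for crosslink, per_model in distances.items():
        if not per_model:
            raise ValueError(f"no distances given for {crosslink}")
        # Satisfied if any single model brings the pair within the cutoff.
        if min(per_model) > cutoff:
            violated.append(crosslink)
    return violated


if __name__ == "__main__":
    # Made-up distances for two hypothetical crosslinks across three models.
    example = {
        "proteinA:10-proteinB:45": [41.0, 33.5, 38.2],
        "proteinA:72-proteinC:8": [52.3, 47.9, 60.1],
    }
    bad = violated_crosslinks(example)
    print(f"{len(bad)} of {len(example)} crosslinks violated: {bad}")
```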
For the simulations, the following files are in the results directory:
- cluster_center_model.rmf3 : representative bead model of the major cluster
- chimera_densities.py : to view the localization densities (.mrc files)
- xlviol : directory containing the logs for crosslink violations
- dmaps : directory containing the average distance maps computed for all protein pairs
- prism_output : directory containing the output from PrISM
Author(s): Shreyas Arvindekar, Shruthi Viswanath
License: CC BY-SA 4.0
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License.
Last known good IMP version: Not tested
Testable: Yes
Parallelizeable: Yes
Publications: Liu X, Zhang Y, Wen Z, Hao Y, Banks CAS, Cesare J, Bhattacharya S, Arvindekar S, Lange JJ, Xie Y, Garcia BA, Slaughter BD, Unruh JR, Viswanath S, Florens L, Workman JL, Washburn MP. An integrated structural model of the DNA damage-responsive H3K4me3 binding WDR76:SPIN1 complex with the nucleosome. Proc Natl Acad Sci U S A. 2024 Aug 13;121(33):e2318601121. https://doi.org/10.1073/pnas.2318601121