diff --git a/README.rst b/README.rst
index bc8c347..3326aa7 100644
--- a/README.rst
+++ b/README.rst
@@ -7,16 +7,7 @@
 +-----------------------+-+
 
-TADdyn is a Python library that allows to model and explore single or time-series 3C-based data.
-These datasets are constituted by interaction matrices that describe distinct stages of naturally
-occurring or induced cellular process such as the cell trans-differentiation, or reprogramming.
-With TADdyn the user can load at once the raw and normalised interaction binned matrices (Hi-C like matrices)
-at each of the experimental stages, build 4D models, and finally, extract structural properties from the models.
-The 4D models reproduce the expected interaction patterns at the experimental time-points,
-and also describe the structural modifications at intermediate moments (between stages) under the hypothesis
-that the changes occurring between consecutive experimental time-points are smooth. To do this,
-TADdyn is designed as a combination of restraint-based modelling, and steered Langevin dynamics of Physics-based
-chromatin models.
+TADphys is a Python library to model and explore single or time-series 3C-based data.
 
 Documentation
 *************
 
@@ -25,22 +16,11 @@ Documentation
  | 1 - Download lammps
  | git clone -b stable https://github.com/lammps/lammps.git mylammps
 
- | 2 - Download the colvar modified version
- | git clone https://github.com/david-castillo/colvars
-
- | 3 - Update the user-defined colvars library
- | ./colvars/update-colvars-code.sh ./mylammps/
-
- | 4 - Compile colvars library
- | cd ./mylammps/lib/colvars
- | make -f Makefile.g++
-
- | 5 - Install lammps as a shared library
+ | 2 - Install lammps as a shared library
- | cd ../../src/
+ | cd mylammps/src/
- | include "-DLAMMPS_EXCEPTIONS" in the LMP_INC line in src/MAKE/Makefile.serial
- | make yes-user-colvars
+ | include "-DLAMMPS_EXCEPTIONS" in the LMP_INC line in src/MAKE/Makefile.mpi
  | make yes-molecule
- | make serial mode=shlib
+ | make mpi mode=shlib
  | make install-python
 
  | cd ../../
 
@@ -49,38 +29,35 @@ Documentation
  | conda install -y scipy # scientific computing in python
  | conda install -y numpy # scientific computing in python
  | conda install -y matplotlib # to produce plots
- | conda install -y jupyter # this notebook :)
  | conda install -y -c https://conda.anaconda.org/bcbio pysam # to deal with SAM/BAM files
- | conda install -y -c https://conda.anaconda.org/salilab imp # for 3D modeling
- | conda install -y -c bioconda mcl # for clustering
 
-**Install TADdyn**
- | 1 - Download TADdyn from the Github repository
- | git clone https://github.com/david-castillo/TADbit.git -b TADdyn TADdyn
+**Install TADphys**
+ | 1 - Download TADphys from the Github repository
+ | git clone https://github.com/MarcoDiS/TADphys.git -b TADphys TADphys
 
- | 2 - Install TADdyn
- | cd TADdyn
+ | 2 - Install TADphys
+ | cd TADphys
  | python setup.py install
  | cd ..
 
-**Try TADdyn**
- | cd test/Sox2
- | python test_TADdyn_on_Sox2.py
+**Try TADphys**
+ | cd test/
+ | python test_TADphys.py
 
 Citation
 ********
-Please, cite this article if you use TADdyn.
+Please cite this article if you use TADphys.
 
 Marco Di Stefano, Ralph Stadhouders, Irene Farabella, David Castillo, François Serra, Thomas Graf, Marc A. Marti-Renom.
 **Dynamic simulations of transcriptional control during cell reprogramming reveal spatial chromatin caging.**
 *bioRxiv* 642009; `doi: https://doi.org/10.1101/642009`_
 
-Methods implemented in TADdyn
------------------------------
-In the actual implementation, TADdyn relies on TADbit for the preparation of the 3C-based datasets from mapping to normalization,
+Methods implemented in TADphys
+------------------------------
+In the current implementation, TADphys relies on TADbit for model analysis
 and on LAMMPS [Plimpton]_ for the implementation of the simulations.
 
 Bibliography
 ************
 
-.. [Plimpton] Plimpton, S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J Comp Phys 117, 1-19 (1995) and Fiorin, G., Klein, M.L. & Hénin, J. Using collective variables to drive molecular dynamics simulations. Molecular Physics 111, 3345-3362 (2013).
\ No newline at end of file
+.. [Plimpton] Plimpton, S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J Comp Phys 117, 1-19 (1995) and Fiorin, G., Klein, M.L. & Hénin, J. Using collective variables to drive molecular dynamics simulations. Molecular Physics 111, 3345-3362 (2013).
diff --git a/TADphys.egg-info/PKG-INFO b/TADphys.egg-info/PKG-INFO
new file mode 100644
index 0000000..51910e4
--- /dev/null
+++ b/TADphys.egg-info/PKG-INFO
@@ -0,0 +1,73 @@
+Metadata-Version: 1.0
+Name: TADphys
+Version: 0.1
+Summary: TADphys is a Python library to model and explore single or time-series 3C-based data.
+Home-page: UNKNOWN
+Author: Marco Di Stefano
+Author-email: marco.di.distefano.1985@gmail.com
+License: GPLv3
+Description: 
+
+        +-----------------------+-+
+        | | |
+        | Current version: pipeline_v0.2.722 |
+        | | |
+        +-----------------------+-+
+
+
+        TADphys is a Python library to model and explore single or time-series 3C-based data.
+
+        Documentation
+        *************
+
+        **Install LAMMPS as a shared library**
+        | 1 - Download lammps
+        | git clone -b stable https://github.com/lammps/lammps.git mylammps
+
+        | 2 - Install lammps as a shared library
+        | cd mylammps/src/
+        | include "-DLAMMPS_EXCEPTIONS" in the LMP_INC line in src/MAKE/Makefile.mpi
+        | make yes-molecule
+        | make mpi mode=shlib
+        | make install-python
+
+        | cd ../../
+
+        **Install packages**
+        | conda install -y scipy # scientific computing in python
+        | conda install -y numpy # scientific computing in python
+        | conda install -y matplotlib # to produce plots
+        | conda install -y -c https://conda.anaconda.org/bcbio pysam # to deal with SAM/BAM files
+
+        **Install TADphys**
+        | 1 - Download TADphys from the Github repository
+        | git clone https://github.com/MarcoDiS/TADphys.git -b TADphys TADphys
+
+        | 2 - Install TADphys
+        | cd TADphys
+        | python setup.py install
+        | cd ..
+
+        **Try TADphys**
+        | cd test/
+        | python test_TADphys.py
+
+        Citation
+        ********
+        Please cite this article if you use TADphys.
+
+        Marco Di Stefano, Ralph Stadhouders, Irene Farabella, David Castillo, François Serra, Thomas Graf, Marc A. Marti-Renom.
+        **Dynamic simulations of transcriptional control during cell reprogramming reveal spatial chromatin caging.**
+        *bioRxiv* 642009; `doi: https://doi.org/10.1101/642009`_
+
+        Methods implemented in TADphys
+        ------------------------------
+        In the current implementation, TADphys relies on TADbit for model analysis
+        and on LAMMPS [Plimpton]_ for the implementation of the simulations.
+
+        Bibliography
+        ************
+
+        .. [Plimpton] Plimpton, S. Fast Parallel Algorithms for Short-Range Molecular Dynamics. J Comp Phys 117, 1-19 (1995) and Fiorin, G., Klein, M.L. & Hénin, J. Using collective variables to drive molecular dynamics simulations.
+        Molecular Physics 111, 3345-3362 (2013).
+
+Platform: OS Independent
diff --git a/TADphys.egg-info/SOURCES.txt b/TADphys.egg-info/SOURCES.txt
new file mode 100644
index 0000000..8e025c7
--- /dev/null
+++ b/TADphys.egg-info/SOURCES.txt
@@ -0,0 +1,29 @@
+README.rst
+setup.py
+TADphys.egg-info/PKG-INFO
+TADphys.egg-info/SOURCES.txt
+TADphys.egg-info/dependency_links.txt
+TADphys.egg-info/top_level.txt
+src/3d-lib/squared_distance_matrix_calculation_py.c
+tadphys/Chromosome_region.py
+tadphys/__init__.py
+tadphys/_version.py
+tadphys/chromosome.py
+tadphys/experiment.py
+tadphys/hic_data.py
+tadphys/modelling/HIC_CONFIG.py
+tadphys/modelling/LAMMPS_CONFIG.py
+tadphys/modelling/__init__.py
+tadphys/modelling/impoptimizer.py
+tadphys/modelling/lammps_modelling.py
+tadphys/modelling/lammpsmodel.py
+tadphys/modelling/restraints.py
+tadphys/modelling/structuralmodel.py
+tadphys/utils/__init__.py
+tadphys/utils/extraviews.py
+tadphys/utils/hic_filtering.py
+tadphys/utils/hic_parser.py
+tadphys/utils/maths.py
+tadphys/utils/modelAnalysis.py
+tadphys/utils/tadmaths.py
+test/test_TADdyn_on_Sox2.py
\ No newline at end of file
diff --git a/TADphys.egg-info/dependency_links.txt b/TADphys.egg-info/dependency_links.txt
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/TADphys.egg-info/dependency_links.txt
@@ -0,0 +1 @@
+
diff --git a/TADphys.egg-info/top_level.txt b/TADphys.egg-info/top_level.txt
new file mode 100644
index 0000000..1599f3f
--- /dev/null
+++ b/TADphys.egg-info/top_level.txt
@@ -0,0 +1 @@
+tadphys
diff --git a/taddyn/Chromosome_region.py b/build/lib.linux-x86_64-3.6/tadphys/Chromosome_region.py
similarity index 100%
rename from taddyn/Chromosome_region.py
rename to build/lib.linux-x86_64-3.6/tadphys/Chromosome_region.py
diff --git a/taddyn/__init__.py b/build/lib.linux-x86_64-3.6/tadphys/__init__.py
similarity index 83%
rename from taddyn/__init__.py
rename to build/lib.linux-x86_64-3.6/tadphys/__init__.py
index b66bc93..e19de3e 100644
--- a/taddyn/__init__.py
+++ b/build/lib.linux-x86_64-3.6/tadphys/__init__.py
@@ -3,7 +3,7 @@
 from os import environ
 from subprocess import Popen, PIPE, check_call, CalledProcessError
 
-from taddyn._version import __version__
+from tadphys._version import __version__
 
 # ## Check if we have X display http://stackoverflow.com/questions/8257385/automatic-detection-of-display-availability-with-matplotlib
 # if not "DISPLAY" in environ:
@@ -19,11 +19,11 @@
 
 def get_dependencies_version(dico=False):
     """
-    Check versions of TADdyn and all dependencies, as well and retrieves system
+    Check versions of TADphys and all dependencies, as well as system
     info. May be used to ensure reproducibility.
 
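     Example (a minimal usage sketch, assuming the installed package imports cleanly):
 
         from tadphys import get_dependencies_version
         print(get_dependencies_version())
 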
:returns: string with description of versions installed """ - versions = {' TADdyn': __version__ + '\n\n'} + versions = {' TADphys': __version__ + '\n\n'} try: import scipy @@ -62,11 +62,11 @@ def get_dependencies_version(dico=False): sorted(versions.keys())]) -from taddyn.chromosome import Chromosome -from taddyn.experiment import Experiment, load_experiment_from_reads -from taddyn.chromosome import load_chromosome +from tadphys.chromosome import Chromosome +from tadphys.experiment import Experiment, load_experiment_from_reads +from tadphys.chromosome import load_chromosome # from taddyn.modelling.structuralmodels import StructuralModels # from taddyn.modelling.structuralmodels import load_structuralmodels # from taddyn.utils.hic_parser import load_hic_data_from_reads # from taddyn.utils.hic_parser import load_hic_data_from_bam -from taddyn.utils.hic_parser import read_matrix +from tadphys.utils.hic_parser import read_matrix diff --git a/taddyn/_version.py b/build/lib.linux-x86_64-3.6/tadphys/_version.py similarity index 100% rename from taddyn/_version.py rename to build/lib.linux-x86_64-3.6/tadphys/_version.py diff --git a/taddyn/chromosome.py b/build/lib.linux-x86_64-3.6/tadphys/chromosome.py similarity index 98% rename from taddyn/chromosome.py rename to build/lib.linux-x86_64-3.6/tadphys/chromosome.py index 96e01c5..186fd28 100644 --- a/taddyn/chromosome.py +++ b/build/lib.linux-x86_64-3.6/tadphys/chromosome.py @@ -5,14 +5,14 @@ from string import ascii_lowercase as letters from copy import deepcopy as copy -from cPickle import load, dump +from pickle import load, dump from random import random from math import sqrt from sys import stderr from os.path import exists -import taddyn -from taddyn.experiment import Experiment -# from taddyn.alignment import Alignment, randomization_test +import tadphys +from tadphys.experiment import Experiment +# from tadphys.alignment import Alignment, randomization_test try: import matplotlib.pyplot as plt @@ -276,7 +276,7 @@ def get_experiment(self, name): This can also be done directly with Chromosome.experiments[name]. 
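 
     Example (a hypothetical call on an existing Chromosome instance ``crm``):
 
         exp = crm.get_experiment('exp1')
         # equivalent to: exp = crm.experiments['exp1']
 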
:param name: name of the experiment to select - :returns: :class:`taddyn.Experiment` + :returns: :class:`tadphys.Experiment` """ for exp in self.experiments: if exp.name == name: @@ -379,9 +379,9 @@ def save_chromosome(self, out_f, fast=True, divide=True, force=False): # simply shuffled # :param 1000 rnd_num: number of randomizations to do # :param reciprocal method: if global, Needleman-Wunsch is used to align - # (see :func:`taddyn.boundary_aligner.globally.needleman_wunsch`); + # (see :func:`tadphys.boundary_aligner.globally.needleman_wunsch`); # if reciprocal, a method based on reciprocal closest boundaries is - # used (see :func:`taddyn.boundary_aligner.reciprocally.reciprocal`) + # used (see :func:`tadphys.boundary_aligner.reciprocally.reciprocal`) # :returns: an alignment object or, if the randomizattion was invoked, # an alignment object, and a list of statistics that are, the alignment @@ -485,7 +485,7 @@ def add_experiment(self, name, resolution=None, tad_def=None, hic_data=None, # verbose=True, max_tad_size="max", heuristic=True, # batch_mode=False, **kwargs): # """ - # Call the :func:`taddyn.tadbit.tadbit` function to calculate the + # Call the :func:`tadphys.tadbit.tadbit` function to calculate the # position of Topologically Associated Domain boundaries # :param experiment: A square matrix of interaction counts of Hi-C @@ -639,7 +639,7 @@ def __update_size(self, xpr): # :param True normalized: show the normalized data (weights might have # been calculated previously). *Note: white rows/columns may appear in # the matrix displayed; these rows correspond to filtered rows (see* - # :func:`taddyn.utils.hic_filtering.hic_filtering_for_modelling` *)* + # :func:`tadphys.utils.hic_filtering.hic_filtering_for_modelling` *)* # :param True relative: color scale is relative to the whole matrix of # data, not only to the region displayed # :param True decorate: draws color bar, title and axes labels @@ -903,14 +903,14 @@ def __update_size(self, xpr): class ExperimentList(list): """ Inherited from python built in :py:func:`list`, modified for TADbit - :class:`taddyn.Experiment`. + :class:`tadphys.Experiment`. Mainly, `getitem`, `setitem`, and `append` were modified in order to be able to search for experiments by index or by name, and to add experiments simply using Chromosome.experiments.append(Experiment). The whole ExperimentList object is linked to a Chromosome instance - (:class:`taddyn.Chromosome`). + (:class:`tadphys.Chromosome`). """ def __init__(self, thing, crm): @@ -971,12 +971,12 @@ def append(self, exp): class AlignmentDict(dict): """ - :py:func:`dict` of :class:`taddyn.Alignment` + :py:func:`dict` of :class:`tadphys.Alignment` Modified getitem, setitem, and append in order to be able to search alignments by index or by name. 
- linked to a :class:`taddyn.Chromosome` + linked to a :class:`tadphys.Chromosome` """ def __getitem__(self, nam): diff --git a/taddyn/experiment.py b/build/lib.linux-x86_64-3.6/tadphys/experiment.py similarity index 97% rename from taddyn/experiment.py rename to build/lib.linux-x86_64-3.6/tadphys/experiment.py index 78788d5..1d5307a 100644 --- a/taddyn/experiment.py +++ b/build/lib.linux-x86_64-3.6/tadphys/experiment.py @@ -4,26 +4,22 @@ """ -from copy import deepcopy as copy -from sys import stderr -from warnings import warn -from math import isnan -from numpy import log2, array -#from taddyn import HiC_data -from taddyn.modelling.HIC_CONFIG import CONFIG -from taddyn.utils.hic_parser import read_matrix -from taddyn.utils.extraviews import nicer -from taddyn.utils.tadmaths import zscore, nozero_log_matrix -from taddyn.utils.hic_filtering import hic_filtering_for_modelling -# from taddyn.modelling.structuralmodels import StructuralModels +from copy import deepcopy as copy +from sys import stderr +from warnings import warn +from math import isnan +from numpy import log2, array +from tadphys.modelling.HIC_CONFIG import CONFIG +from tadphys.utils.hic_parser import read_matrix +from tadphys.utils.extraviews import nicer +from tadphys.utils.tadmaths import zscore, nozero_log_matrix +from tadphys.utils.hic_filtering import hic_filtering_for_modelling from collections import OrderedDict try: - from taddyn.modelling.impoptimizer import IMPoptimizer - # from taddyn.modelling.imp_modelling import generate_3d_models - from taddyn.modelling.lammps_modelling import generate_lammps_models + from tadphys.modelling.lammps_modelling import generate_lammps_models except ImportError: - stderr.write('IMP not found, check PYTHONPATH\n') + pass try: import matplotlib.pyplot as plt @@ -420,7 +416,7 @@ def set_resolution(self, resolution, keep_original=True): Set a new value for the resolution. Copy the original data into Experiment._ori_hic and replace the Experiment.hic_data with the data corresponding to new data - (:func:`taddyn.Chromosome.compare_condition`). + (:func:`tadphys.Chromosome.compare_condition`). :param resolution: an integer representing the resolution. This number must be a multiple of the original resolution, and higher than it @@ -581,7 +577,7 @@ def load_hic_data(self, hic_data, parser=None, wanted_resolution=None, :param None resolution: resolution of the experiment in the file; it will be adjusted to the resolution of the experiment. By default the file is expected to contain a Hi-C experiment with the same resolution - as the :class:`taddyn.Experiment` created, and no change is made + as the :class:`tadphys.Experiment` created, and no change is made :param True filter_columns: filter the columns with unexpectedly high content of low values :param False silent: does not warn for removed columns @@ -627,7 +623,7 @@ def load_norm_data(self, norm_data, parser=None, resolution=None, :param None resolution: resolution of the experiment in the file; it will be adjusted to the resolution of the experiment. 
By default the file is expected to contain a Hi-C experiment with the same resolution - as the :class:`taddyn.Experiment` created, and no change is made + as the :class:`tadphys.Experiment` created, and no change is made :param True filter_columns: filter the columns with unexpectedly high content of low values :param False silent: does not warn for removed columns @@ -731,7 +727,7 @@ def load_norm_data(self, norm_data, parser=None, resolution=None, # rowsums=None, index=0): # """ # Normalize the Hi-C data. This normalization step does the same of - # the :func:`taddyn.tadbit.tadbit` function (default parameters), + # the :func:`tadphys.tadbit.tadbit` function (default parameters), # It fills the Experiment.norm variable with the Hi-C values divided by # the calculated weight. @@ -895,7 +891,7 @@ def model_region(self, start=1, end=None, n_models=5000, n_keep=1000, :: - from taddyn.imp.CONFIG import CONFIG + from tadphys.imp.CONFIG import CONFIG where CONFIG is a dictionarry of dictionnaries to be passed to this function: @@ -948,7 +944,7 @@ def model_region(self, start=1, end=None, n_models=5000, n_keep=1000, restart_file != False :param False useColvars: True if you want the restrains to be loaded by colvars - :returns: a :class:`taddyn.imp.structuralmodels.StructuralModels` object. + :returns: a :class:`tadphys.imp.structuralmodels.StructuralModels` object. """ if isinstance(stages, list) and tool == 'imp': @@ -1079,7 +1075,7 @@ def optimal_imp_parameters(self, start=1, end=None, n_models=500, n_keep=100, 0.002, 0.003, 0.004 and 0.005 - :returns: an :class:`taddyn.imp.impoptimizer.IMPoptimizer` object + :returns: an :class:`tadphys.imp.impoptimizer.IMPoptimizer` object """ if not self._normalization: @@ -1127,7 +1123,7 @@ def _sub_experiment_zscore(self, start, end, index=0): stderr.write('WARNING: normalizing according to visibility method\n') for i in idx: self.normalize_hic(index=i) - from taddyn import Chromosome + from tadphys import Chromosome if start < 1: raise ValueError('ERROR: start should be higher than 0\n') start -= 1 # things starts at 0 for python. we keep the end coordinate @@ -1368,7 +1364,7 @@ def print_hic_matrix(self, print_it=True, normalized=False, zeros=False, index=0 for i in xrange(siz)]) for j in xrange(siz)]) if print_it: - print out + print(out) else: return out + '\n' @@ -1402,7 +1398,7 @@ def print_hic_matrix(self, print_it=True, normalized=False, zeros=False, index=0 # :param True normalized: show the normalized data (weights might have # been calculated previously). 
*Note: white rows/columns may appear in # the matrix displayed; these rows correspond to filtered rows (see* - # :func:`taddyn.utils.hic_filtering.hic_filtering_for_modelling` *)* + # :func:`tadphys.utils.hic_filtering.hic_filtering_for_modelling` *)* # :param True relative: color scale is relative to the whole matrix of # data, not only to the region displayed # :param True decorate: draws color bar, title and axes labels diff --git a/taddyn/hic_data.py b/build/lib.linux-x86_64-3.6/tadphys/hic_data.py similarity index 100% rename from taddyn/hic_data.py rename to build/lib.linux-x86_64-3.6/tadphys/hic_data.py diff --git a/taddyn/modelling/HIC_CONFIG.py b/build/lib.linux-x86_64-3.6/tadphys/modelling/HIC_CONFIG.py similarity index 100% rename from taddyn/modelling/HIC_CONFIG.py rename to build/lib.linux-x86_64-3.6/tadphys/modelling/HIC_CONFIG.py diff --git a/build/lib.linux-x86_64-3.6/tadphys/modelling/LAMMPS_CONFIG.py b/build/lib.linux-x86_64-3.6/tadphys/modelling/LAMMPS_CONFIG.py new file mode 100644 index 0000000..3c0fd76 --- /dev/null +++ b/build/lib.linux-x86_64-3.6/tadphys/modelling/LAMMPS_CONFIG.py @@ -0,0 +1,103 @@ +""" +25 Oct 2016 + + +""" +############################################################################### +# Parameters to implement Kremer&Grest polymer model # +# Reference paper: # +# K. Kremer and G. S. Grest # +# Dynamics of entangled linear polymer melts: A molecular-dynamics simulation # +# J Chem Phys 92, 5057 (1990) # +############################################################################### + + +# units http://lammps.sandia.gov/doc/units.html +units = "lj" + +# atom_style http://lammps.sandia.gov/doc/atom_style.html +atom_style = "angle" + +# boundary conditions http://lammps.sandia.gov/doc/boundary.html +boundary = "p p p" +#boundary = "f f f" + +# mass http://lammps.sandia.gov/doc/mass.html +mass = "* 1.0" + +# neighbor http://lammps.sandia.gov/doc/neighbor.html +#neighbor = "3.0 nsq" # Optional for small and low density systems +neighbor = "0.3 bin" # Standard choice for large (> 10,000 particles) systems +#neigh_modify = "every 1 delay 1 check yes page 200000 one 20000" +neigh_modify = "every 1 delay 1 check yes" + +# thermo +run = 100 +thermo = 1000 #int(float(run)/100) +#thermo_style custom step temp epair emol press pxx pyy pzz pxy pxz pyz vol + +# Excluded volume term: Purely repulsive Lennard-Jones or Truncated and Shifted Lennard-Jones +################################################################### +# Lennard-Jones 12-6 potential with cutoff (=truncated): # +# potential E=4epsilon[ (sigma/r)^12 - (sigma/r)^6] for r 1: + time_dependent_steering_pairs = { + 'colvar_input' : HiCRestraints, + 'colvar_output' : colvars, + 'chrlength' : nloci, + 'binsize' : resolution, + 'timesteps_per_k_change' : [float(timesteps_per_k)]*6, + 'k_factor' : kfactor, + 'perc_enfor_contacts' : 100., + 'colvar_dump_freq' : int(timesteps_per_k/100), + 'adaptation_step' : adaptation_step, + } + if not initial_conformation: + initial_conformation = 'random' + else: + steering_pairs = { + 'colvar_input': HiCRestraints[0], + 'colvar_output': colvars, + 'binsize': resolution, + 'timesteps_per_k' : timesteps_per_k, + 'k_factor' : kfactor, + 'colvar_dump_freq' : int(timesteps_per_k/100), + 'timesteps_relaxation' : int(timesteps_per_k*10) + } + if not initial_conformation: + initial_conformation = 'random' + + if not container: + container = ['cube',1000.0] # http://lammps.sandia.gov/threads/msg48683.html + + ini_sm_model = None + sm_diameter = 1 + if initial_conformation != 
'random': + if isinstance(initial_conformation, dict): + sm = [initial_conformation] + sm[0]['x'] = sm[0]['x'][0:nloci] + sm[0]['y'] = sm[0]['y'][0:nloci] + sm[0]['z'] = sm[0]['z'][0:nloci] + sm_diameter = float(resolution * CONFIG.HiC['scale']) + for single_m in sm: + for i in range(len(single_m['x'])): + single_m['x'][i] /= sm_diameter + single_m['y'][i] /= sm_diameter + single_m['z'][i] /= sm_diameter + cm0 = single_m.center_of_mass() + for i in range(len(single_m['x'])): + single_m['x'][i] -= cm0['x'] + single_m['y'][i] -= cm0['y'] + single_m['z'][i] -= cm0['z'] + ini_sm_model = [[single_sm.copy()] for single_sm in sm] + + models, ini_model = lammps_simulate(lammps_folder=tmp_folder, + run_time=run_time, + initial_conformation=ini_sm_model, + connectivity=connectivity, + steering_pairs=steering_pairs, + time_dependent_steering_pairs=time_dependent_steering_pairs, + initial_seed=initial_seed, + n_models=n_keep, + n_keep=n_keep, + n_cpus=n_cpus, + keep_restart_out_dir=keep_restart_out_dir, + confining_environment=container, + timeout_job=timeout_job, + cleanup=cleanup, to_dump=int(timesteps_per_k/100.), + hide_log=hide_log, + chromosome_particle_numbers=chromosome_particle_numbers, + restart_path=restart_path, + store_n_steps=store_n_steps, + useColvars=useColvars) +# for i, m in enumerate(models.values()): +# m['index'] = i + if outfile: + if exists(outfile): + old_models, _ = load(open(outfile)) + else: + old_models, _ = {}, {} + models.update(old_models) + out = open(outfile, 'w') + dump((models), out) + out.close() + else: + stages = {} + trajectories = {} + timepoints = None + if len(HiCRestraints)>1: + timepoints = time_dependent_steering_pairs['colvar_dump_freq'] + nbr_produced_models = len(models)//(timepoints*(len(HiCRestraints)-1)) + stages[0] = [i for i in range(nbr_produced_models)] + + for sm_id, single_m in enumerate(ini_model): + for i in range(len(single_m['x'])): + single_m['x'][i] *= sm_diameter + single_m['y'][i] *= sm_diameter + single_m['z'][i] *= sm_diameter + + lammps_model = LAMMPSmodel({ 'x' : single_m['x'], + 'y' : single_m['y'], + 'z' : single_m['z'], + 'index' : sm_id+1+initial_seed, + 'cluster' : 'Singleton', + 'objfun' : single_m['objfun'] if 'objfun' in single_m else 0, + 'log_objfun' : single_m['log_objfun'] if 'log_objfun' in single_m else [], + 'radius' : float(CONFIG.HiC['resolution'] * \ + CONFIG.HiC['scale'])/2, + 'rand_init' : str(sm_id+1+initial_seed)}) + + models[sm_id] = lammps_model + for timepoint in range((len(HiCRestraints)-1)*timepoints): + stages[timepoint+1] = [(t+nbr_produced_models+timepoint*nbr_produced_models) + for t in range(nbr_produced_models)] + for traj in range(nbr_produced_models): + trajectories[traj] = [stages[t][traj] for t in range(timepoints+1)] + + model_ensemble = { + 'loci': len(LOCI), + 'models': models, + 'resolution': resolution, + 'original_data': values if len(HiCRestraints)>1 else values[0], + 'zscores': zscores, + 'config': CONFIG.HiC, + 'zeros': zeros, + 'restraints': HiCRestraints[0]._get_restraints(), + 'stages': stages, + 'trajectories': trajectories, + 'models_per_step': timepoints + } + + return model_ensemble +# Initialize the lammps simulation with standard polymer physics based +# interactions: chain connectivity (FENE) ; excluded volume (WLC) ; and +# bending rigidity (KP) +def init_lammps_run(lmp, initial_conformation, + neighbor=CONFIG.neighbor, + hide_log=True, + chromosome_particle_numbers=None, + connectivity="FENE", + restart_file=False): + + """ + Initialise the parameters for the computation in 
lammps job + + :param lmp: lammps instance object. + :param initial_conformation: lammps input data file with the particles initial conformation. + :param CONFIG.neighbor neighbor: see LAMMPS_CONFIG.py. + :param True hide_log: do not generate lammps log information + :param FENE connectivity: use FENE for a fene bond or harmonic for harmonic + potential for neighbours + :param False restart_file: path to file to restore LAMMPs session (binary) + + """ + + if hide_log: + lmp.command("log none") + #os.remove("log.lammps") + + ####################################################### + # Box and units (use LJ units and period boundaries) # + ####################################################### + lmp.command("units %s" % CONFIG.units) + lmp.command("atom_style %s" % CONFIG.atom_style) #with stiffness + lmp.command("boundary %s" % CONFIG.boundary) + """ + try: + lmp.command("communicate multi") + except: + pass + """ + + ########################## + # READ "start" data file # + ########################## + if restart_file == False : + lmp.command("read_data %s" % initial_conformation) + else: + restart_time = int(restart_file.split('/')[-1].split('_')[4][:-8]) + print('Previous unfinished LAMMPS steps found') + print('Loaded %s file' %restart_file) + lmp.command("read_restart %s" % restart_file) + lmp.command("reset_timestep %i" % restart_time) + + lmp.command("mass %s" % CONFIG.mass) + + ################################################################## + # Pair interactions require lists of neighbours to be calculated # + ################################################################## + lmp.command("neighbor %s" % neighbor) + lmp.command("neigh_modify %s" % CONFIG.neigh_modify) + + ############################################################## + # Sample thermodynamic info (temperature, energy, pressure) # + ############################################################## + lmp.command("thermo %i" % CONFIG.thermo) + + ############################### + # Stiffness term # + # E = K * (1+cos(theta)), K>0 # + ############################### + lmp.command("angle_style %s" % CONFIG.angle_style) # Write function for kinks + lmp.command("angle_coeff * %f" % CONFIG.persistence_length) + + ################################################################### + # Pair interaction between non-bonded atoms # + # # + # Lennard-Jones 12-6 potential with cutoff: # + # potential E=4epsilon[ (sigma/r)^12 - (sigma/r)^6] for r timepoints) for ks in kseeds]): + # kseeds.append(rnd) + + #pool = ProcessPool(max_workers=n_cpus, max_tasks=n_cpus) + pool = multiprocessing.Pool(processes=n_cpus, maxtasksperchild=n_cpus) + + results = [] + def collect_result(result): + results.append((result[0], result[1], result[2])) + + initial_models = initial_conformation + if not initial_models: + initial_models = [] + + jobs = {} + for k_id, k in enumerate(kseeds): + k_folder = lammps_folder + 'lammps_' + str(k) + '/' + failedSeedLog = None + # First we check if the modelling fails with this seed + if restart_path != False: + restart_file = restart_path + 'lammps_' + str(k) + '/' + failedSeedLog = restart_file + 'runLog.txt' + if os.path.exists(failedSeedLog): + with open(failedSeedLog, 'r') as f: + for line in f: + prevRun = line.split() + # add number of models done so dont repeat same seed + if prevRun[1] == 'Failed': + k = int(prevRun[0]) + n_models + k_folder = lammps_folder + 'lammps_' + str(k) + '/' + + #print "#RandomSeed: %s" % k + keep_restart_out_dir2 = None + if keep_restart_out_dir != None: + keep_restart_out_dir2 = 
keep_restart_out_dir + 'lammps_' + str(k) + '/' + if not os.path.exists(keep_restart_out_dir2): + os.makedirs(keep_restart_out_dir2) + model_path = False + if restart_path != False: + # check presence of previously finished jobs + model_path = restart_path + 'lammps_' + str(k) + '/finishedModel_%s.pickle' %k + # define restart file by checking for finished jobs or last step + if model_path != False and os.path.exists(model_path): + with open(model_path, "rb") as input_file: + m = load(input_file) + results.append((m[0], m[1])) + else: + if restart_path != False: + restart_file = restart_path + 'lammps_' + str(k) + '/' + dirfiles = os.listdir(restart_file) + # check for last k and step + maxi = (0, 0, '') + for f in dirfiles: + if f.startswith('restart_kincrease_'): + kincrease = int(f.split('_')[2]) + step = int(f.split('_')[-1][:-8]) + if kincrease > maxi[0]: + maxi = (kincrease, step, f) + elif kincrease == maxi[0] and step > maxi[1]: + maxi = (kincrease, step, f) + # In case there is no restart file at all + if maxi[2] == '': + #print('Could not find a LAMMPS restart file') + # will check later if we have a path or a file + getIniConf = True + #restart_file = False + else: + restart_file = restart_file + maxi[2] + getIniConf = False + else: + restart_file = False + getIniConf = True + + ini_conf = None + if not os.path.exists(k_folder): + os.makedirs(k_folder) + if initial_conformation and getIniConf == True: + ini_conf = '%sinitial_conformation.dat' % k_folder + write_initial_conformation_file(initial_conformation[k_id], + chromosome_particle_numbers, + confining_environment, + out_file=ini_conf) + # jobs[k] = run_lammps(k, k_folder, run_time, + # initial_conformation, connectivity, + # neighbor, + # tethering, minimize, + # compress_with_pbc, compress_without_pbc, + # confining_environment, + # steering_pairs, + # time_dependent_steering_pairs, + # loop_extrusion_dynamics, + # to_dump, pbc,) + # jobs[k] = pool.schedule(run_lammps, + jobs[k] = partial(abortable_worker, run_lammps, timeout=timeout_job, + failedSeedLog=[failedSeedLog, k]) + pool.apply_async(jobs[k], + args=(k, k_folder, run_time, + ini_conf, connectivity, + neighbor, + tethering, minimize, + compress_with_pbc, compress_without_pbc, + initial_relaxation, + confining_environment, + steering_pairs, + time_dependent_steering_pairs, + compartmentalization, + loop_extrusion_dynamics, + to_dump, pbc, hide_log, + chromosome_particle_numbers, + keep_restart_out_dir2, + restart_file, + model_path, + store_n_steps, + useColvars,), callback=collect_result) + # , timeout=timeout_job) + pool.close() + pool.join() + +# for k in kseeds: +# try: +# #results.append((k, jobs[k])) +# results.append((k, jobs[k].result())) +# except TimeoutError: +# print "Model took more than %s seconds to complete ... 
canceling" % str(timeout_job) +# jobs[k].cancel() +# except Exception as error: +# print "Function raised %s" % error +# jobs[k].cancel() + + models = {} + initial_models = [] + ############ WARNING ############ + # PENDING TO ADD THE STORAGE OF INITIAL MODELS # + if timepoints > 1: + for t in range(timepoints): + time_models = [] + for res in results: + (k,restarr,init_conf) = res + time_models.append(restarr[t]) + for i, m in enumerate(time_models[:n_keep]): + models[i+t*len(time_models[:n_keep])+n_keep] = m + #for i, (_, m) in enumerate( + # sorted(time_models.items(), key=lambda x: x[1]['objfun'])[:n_keep]): + # models[i+t+1] = m + + else: + for i, (_, m, im) in enumerate( + sorted(results, key=lambda x: x[1][0]['objfun'])[:n_keep]): + models[i] = m[0] + if not initial_conformation: + initial_models += [im] + + if cleanup: + for k in kseeds: + k_folder = lammps_folder + '/lammps_' + str(k) + '/' + if os.path.exists(k_folder): + shutil.rmtree(k_folder) + + return models, initial_models + + + +# This performs the dynamics: I should add here: The steered dynamics (Irene and Hi-C based) ; +# The loop extrusion dynamics ; the binders based dynamics (Marenduzzo and Nicodemi)...etc... +def run_lammps(kseed, lammps_folder, run_time, + initial_conformation=None, connectivity="FENE", + neighbor=CONFIG.neighbor, + tethering=False, minimize=True, + compress_with_pbc=None, compress_without_pbc=None, + initial_relaxation=None, + confining_environment=None, + steering_pairs=None, + time_dependent_steering_pairs=None, + compartmentalization=None, + loop_extrusion_dynamics=None, + to_dump=10000, pbc=False, + hide_log=True, + chromosome_particle_numbers=None, + keep_restart_out_dir2=None, + restart_file=False, + model_path=False, + store_n_steps=10, + useColvars=False): + """ + Generates one lammps model + + :param kseed: Random number to identify the model. + :param initial_conformation_folder: folder where to store lammps input + data file with the particles initial conformation. + http://lammps.sandia.gov/doc/2001/data_format.html + :param FENE connectivity: use FENE for a fene bond or harmonic for harmonic + potential for neighbours (see init_lammps_run) + :param run_time: # of timesteps. + :param None initial_conformation: path to initial conformation file or None + for random walk initial start. + :param CONFIG.neighbor neighbor: see LAMMPS_CONFIG.py. + :param False tethering: whether to apply tethering command or not. + :param True minimize: whether to apply minimize command or not. + :param None compress_with_pbc: whether to apply the compression dynamics in case of a + system with cubic confinement and pbc. This compression step is usually apply + to obtain a system with the desired particle density. The input have to be a list + of three elements: + 0 - XXX; + 1 - XXX; + 2 - The compression simulation time span (in timesteps). + e.g. compress_with_pbc=[0.01, 0.01, 100000] + :param None compress_without_pbc: whether to apply the compression dynamics in case of a + system with spherical confinement. This compression step is usually apply to obtain a + system with the desired particle density. The simulation shrinks/expands the initial + sphere to a sphere of the desired radius using many short runs. In each short run the + radius is reduced by 0.1 box units. The input have to be a list of three elements: + 0 - Initial radius; + 1 - Final desired radius; + 2 - The time span (in timesteps) of each short compression run. + e.g. 
compress_without_pbc=[300, 100, 100] + :param None steering_pairs: particles contacts file from colvars fix + http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf. + steering_pairs = { 'colvar_input' : "ENST00000540866.2chr7_clean_enMatch.txt", + 'colvar_output' : "colvar_list.txt", + 'kappa_vs_genomic_distance' : "kappa_vs_genomic_distance.txt", + 'chrlength' : 3182, + 'copies' : ['A'], + 'binsize' : 50000, + 'number_of_kincrease' : 1000, + 'timesteps_per_k' : 1000, + 'timesteps_relaxation' : 100000, + 'perc_enfor_contacts' : 10 + } + + :param None loop_extrusion_dynamics: dictionary with all the info to perform loop + extrusion dynamics. + loop_extrusion_dynamics = { 'target_loops_input' : "target_loops.txt", + 'loop_extrusion_steps_output' : "loop_extrusion_steps.txt", + 'attraction_strength' : 4.0, + 'equilibrium_distance' : 1.0, + 'chrlength' : 3182, + 'copies' : ['A'], + 'timesteps_per_loop_extrusion_step' : 1000, + 'timesteps_relaxation' : 100000, + 'perc_enfor_loops' : 10 + } + + Should at least contain Chromosome, loci1, loci2 as 1st, 2nd and 3rd column + :param None keep_restart_out_dir2: path to write files to restore LAMMPs + session (binary) + :param False restart_file: path to file to restore LAMMPs session (binary) + :param False model_path: path to/for pickle with finished model (name included) + :param 10 store_n_steps: Integer with number of steps to be saved if + restart_file != False + :param False useColvars: True if you want the restrains to be loaded by colvars + :returns: a LAMMPSModel object + + """ + + + lmp = lammps(cmdargs=['-screen','none','-log',lammps_folder+'log.lammps','-nocite']) + me = MPI.COMM_WORLD.Get_rank() + nprocs = MPI.COMM_WORLD.Get_size() + # check if we have a restart file or a path to which restart + if restart_file == False: + doRestart = False + saveRestart = False + elif os.path.isdir(restart_file): + doRestart = False + saveRestart = True + else: + doRestart = True + saveRestart = True + if not initial_conformation and doRestart == False: + initial_conformation = lammps_folder+'initial_conformation.dat' + generate_chromosome_random_walks_conformation ([len(LOCI)], + outfile=initial_conformation, + seed_of_the_random_number_generator=kseed, + confining_environment=confining_environment) + + # Just prepared the steps recovery for steering pairs + if steering_pairs and doRestart == True: + init_lammps_run(lmp, initial_conformation, + neighbor=neighbor, + hide_log=hide_log, + connectivity=connectivity, + restart_file=restart_file) + else: + init_lammps_run(lmp, initial_conformation, + neighbor=neighbor, + hide_log=hide_log, + chromosome_particle_numbers=chromosome_particle_numbers, + connectivity=connectivity) + + lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + # ########################################################## + # # Generate RESTART file, SPECIAL format, not a .txt file # + # # Useful if simulation crashes + # Prepared an optimisation for steering pairs, but not for the rest# + # ########################################################## + # create lammps restart files every x steps. 
1000 is ok + # There was the doubt of using text format session info (which allows use in other computers) + # but since the binary can be converted later and this: "Because a data file is in text format, + # if you use a data file written out by this command to restart a simulation, the initial state + # of the new run will be slightly different than the final state of the old run (when the file + # was written) which was represented internally by LAMMPS in binary format. A new simulation + # which reads the data file will thus typically diverge from a simulation that continued + # in the original input script." will continue with binary. To convert use restart2data + #if keep_restart_out_dir2: + # lmp.command("restart %i %s/relaxation_%i_*.restart" % (keep_restart_step, keep_restart_out_dir2, kseed)) + + + ####################################################### + # Set up fixes # + # use NVE ensemble # + # Langevin integrator Tstart Tstop 1/friction rndseed # + # => sampling NVT ensamble # + ####################################################### + # Define the langevin dynamics integrator + lmp.command("fix 1 all nve") + lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed) + + # Define the tethering to the center of the confining environment + if tethering: + lmp.command("fix 3 all spring tether 50.0 0.0 0.0 0.0 0.0") + + # Do a minimization step to prevent particles + # clashes in the initial conformation + if minimize: + + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sminimization_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + print("Performing minimization run...") + lmp.command("minimize 1.0e-4 1.0e-6 100000 100000") + + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + if compress_with_pbc: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %scompress_with_pbc_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + # Re-setting the initial timestep to 0 + lmp.command("reset_timestep 0") + + lmp.command("unfix 1") + lmp.command("unfix 2") + + # default as in PLoS Comp Biol Di Stefano et al. 
2013 compress_with_pbc = [0.01, 0.01, 100000] + lmp.command("fix 1 all nph iso %s %s 2.0" % (compress_with_pbc[0], + compress_with_pbc[1])) + lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed) + print("run %i" % compress_with_pbc[2]) + lmp.command("run %i" % compress_with_pbc[2]) + + lmp.command("unfix 1") + lmp.command("unfix 2") + + lmp.command("fix 1 all nve") + lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed) + + # Here We have to re-define the confining environment + print("# Previous particle density (nparticles/volume)", lmp.get_natoms()/(confining_environment[1]**3)) + confining_environment[1] = lmp.extract_global("boxxhi",1) - lmp.extract_global("boxxlo",1) + print("") + print("# New cubic box dimensions after isotropic compression") + print(lmp.extract_global("boxxlo",1), lmp.extract_global("boxxhi",1)) + print(lmp.extract_global("boxylo",1), lmp.extract_global("boxyhi",1)) + print(lmp.extract_global("boxzlo",1), lmp.extract_global("boxzhi",1)) + print("# New confining environment", confining_environment) + print("# New particle density (nparticles/volume)", lmp.get_natoms()/(confining_environment[1]**3)) + print("") + + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + if compress_without_pbc: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %scompress_without_pbc_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + # Re-setting the initial timestep to 0 + lmp.command("reset_timestep 0") + + # default as in Sci Rep Di Stefano et al. 2016 + # compress_without_pbc = [initial_radius, final_radius, timesteps_per_minirun] + # = [350, 161.74, 100] + radius = compress_without_pbc[0] + while radius > compress_without_pbc[1]: + + print("New radius %f" % radius) + if radius != compress_without_pbc[0]: + lmp.command("region sphere delete") + + lmp.command("region sphere sphere 0.0 0.0 0.0 %f units box side in" % radius) + + # Performing the simulation + lmp.command("fix 5 all wall/region sphere lj126 1.0 1.0 1.12246152962189") + lmp.command("run %i" % compress_without_pbc[2]) + + radius -= 0.1 + + # Here we have to re-define the confining environment + volume = 4.*np.pi/3.0*(compress_without_pbc[0]**3) + print("# Previous particle density (nparticles/volume)", lmp.get_natoms()/volume) + confining_environment[1] = compress_without_pbc[1] + print("") + volume = 4.*np.pi/3.0*(compress_without_pbc[1]**3) + print("# New particle density (nparticles/volume)", lmp.get_natoms()/volume) + print("") + + if initial_relaxation: + + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sinitial_relaxation.XYZ id xu yu zu" % (to_dump,lammps_folder)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + if "MSD" in initial_relaxation: + lmp.command("compute MSD all msd") + lmp.command("variable MSD equal c_MSD[4]") + lmp.command("variable dx equal c_MSD[1]") + lmp.command("variable dy equal c_MSD[2]") + lmp.command("variable dz equal c_MSD[3]") + lmp.command("variable step equal step") + lmp.command("fix MSD all print %i \"${step} ${dx} ${dy} ${dz} ${MSD}\" file MSD.txt" % (initial_relaxation["MSD"])) + + if "Distances" in initial_relaxation: + #lmp.command("compute xu all property/atom xu") + #lmp.command("compute yu all property/atom yu") + 
#lmp.command("compute zu all property/atom zu") + #pair_number = 0 + #for particle1 in range(1,chrlength[0]): + # for particle2 in range(particle1,chrlength[0]): + # pair_number += 1 + # lmp.command("variable x%i equal c_xu[%i]" % (particle1, particle1)) + # lmp.command("variable x%i equal c_xu[%i]" % (particle2, particle2)) + # lmp.command("variable y%i equal c_yu[%i]" % (particle1, particle1)) + # lmp.command("variable y%i equal c_yu[%i]" % (particle2, particle2)) + # lmp.command("variable z%i equal c_zu[%i]" % (particle1, particle1)) + # lmp.command("variable z%i equal c_zu[%i]" % (particle2, particle2)) + + # lmp.command("variable xLE%i equal v_x%i-v_x%i" % (pair_number, particle1, particle2)) + # lmp.command("variable yLE%i equal v_y%i-v_y%i" % (pair_number, particle1, particle2)) + # lmp.command("variable zLE%i equal v_z%i-v_z%i" % (pair_number, particle1, particle2)) + # lmp.command("variable dist_%i_%i equal sqrt(v_xLE%i*v_xLE%i+v_yLE%i*v_yLE%i+v_zLE%i*v_zLE%i)" % (particle1, particle2, pair_number, pair_number, pair_number, pair_number, pair_number, pair_number)) + + lmp.command("compute pairs all property/local patom1 patom2") + lmp.command("compute distances all pair/local dist") + #lmp.command("variable distances equal c_distances[1]") + lmp.command("dump distances all local %i distances.txt c_pairs[1] c_pairs[2] c_distances" % (initial_relaxation["Distances"])) + #lmp.command("fix distances all print %i \"${step} ${distances}\" file distances.txt" % (initial_relaxation["Distances"])) + + lmp.command("reset_timestep 0") + lmp.command("run %i" % initial_relaxation["relaxation_time"]) + lmp.command("write_data relaxed_conformation.txt nocoeff") + if "MSD" in initial_relaxation: + lmp.command("uncompute MSD") + if "distances" in initial_relaxation: + lmp.command("uncompute distances") + lmp.command("undum distances") + + timepoints = 1 + xc = [] + # Setup the pairs to co-localize using the COLVARS plug-in + if steering_pairs: + + if doRestart == False: + # Start relaxation step + lmp.command("reset_timestep 0") # cambiar para punto ionicial + lmp.command("run %i" % steering_pairs['timesteps_relaxation']) + lmp.command("reset_timestep %i" % 0) + + # Start Steered Langevin dynamics + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %ssteered_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id") + + if 'number_of_kincrease' in steering_pairs: + nbr_kincr = steering_pairs['number_of_kincrease'] + else: + nbr_kincr = 1 + + if doRestart == True: + restart_k_increase = int(restart_file.split('/')[-1].split('_')[2]) + restart_time = int(restart_file.split('/')[-1].split('_')[4][:-8]) + + #steering_pairs['colvar_output'] = os.path.dirname(os.path.abspath(steering_pairs['colvar_output'])) + '/' + str(kseed) + '_'+ os.path.basename(steering_pairs['colvar_output']) + steering_pairs['colvar_output'] = lammps_folder+os.path.basename(steering_pairs['colvar_output']) + for kincrease in range(nbr_kincr): + # Write the file containing the pairs to constraint + # steering_pairs should be a dictionary with: + # Avoid to repeat calculations in case of restart + if (doRestart == True) and (kincrease < restart_k_increase): + continue + + if useColvars == True: + + generate_colvars_list(steering_pairs, kincrease+1) + + # Adding the colvar option + #print "fix 4 all colvars %s output %s" % (steering_pairs['colvar_output'],lammps_folder) + lmp.command("fix 4 all colvars %s output %sout" % 
(steering_pairs['colvar_output'],lammps_folder)) + + if to_dump: + # lmp.command("thermo_style custom step temp epair emol") + lmp.command("thermo_style custom step temp epair emol pe ke etotal f_4") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal f_4") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (steering_pairs['colvar_dump_freq'],lammps_folder,str(0), str(1))) + + # will load the bonds directly into LAMMPS + else: + bond_list = generate_bond_list(steering_pairs) + for bond in bond_list: + lmp.command(bond) + + if to_dump: + lmp.command("thermo_style custom step temp etotal") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal etotal") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (steering_pairs['colvar_dump_freq'],lammps_folder,str(0), str(1))) + #lmp.command("reset_timestep %i" % 0) + resettime = 0 + runtime = steering_pairs['timesteps_per_k'] + if (doRestart == True) and (kincrease == restart_k_increase): + resettime = restart_time + runtime = steering_pairs['timesteps_per_k'] - restart_time + + # Create 10 restarts with name restart_kincrease_%s_time_%s.restart every + if saveRestart == True: + if os.path.isdir(restart_file): + restart_file_new = restart_file + 'restart_kincrease_%s_time_*.restart' %(kincrease) + else: + restart_file_new = '/'.join(restart_file.split('/')[:-1]) + '/restart_kincrease_%s_time_*.restart' %(kincrease) + #print(restart_file_new) + lmp.command("restart %i %s" %(int(steering_pairs['timesteps_per_k']/store_n_steps), restart_file_new)) + + #lmp.command("reset_timestep %i" % resettime) + lmp.command("run %i" % runtime) + + # Setup the pairs to co-localize using the COLVARS plug-in + if time_dependent_steering_pairs: + timepoints = time_dependent_steering_pairs['colvar_dump_freq'] + + #if exists("objective_function_profile.txt"): + # os.remove("objective_function_profile.txt") + + #print "# Getting the time dependent steering pairs!" + time_dependent_restraints = get_time_dependent_colvars_list(time_dependent_steering_pairs) + time_points = sorted(time_dependent_restraints.keys()) + print("#Time_points",time_points) + sys.stdout.flush() + + time_dependent_steering_pairs['colvar_output'] = lammps_folder+os.path.basename(time_dependent_steering_pairs['colvar_output']) + # Performing the adaptation step from initial conformation to Tadphys excluded volume + if time_dependent_steering_pairs['adaptation_step']: + restraints = {} + for time_point in time_points[0:1]: + lmp.command("reset_timestep %i" % 0) + # Change to_dump with some way to load the conformations you want to store + # This Adaptation could be discarded in the trajectory files. 
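+            # (Clarifying note: this adaptation run applies the first time point's
+            # restraints at constant strength for ~10% of 'timesteps_per_k_change',
+            # relaxing the initial conformation onto the first restraint set before
+            # the time-dependent steering between time points begins.)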
+ if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sadapting_MD_from_initial_conformation_to_Tadphys_at_time_point_%s.XYZ id xu yu zu" % (to_dump, lammps_folder, time_point)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + restraints[time_point] = {} + print("# Step %s - %s" % (time_point, time_point)) + sys.stdout.flush() + for pair in list(time_dependent_restraints[time_point].keys()): + # Strategy changing gradually the spring constant and the equilibrium distance + # Case 1: The restraint is present at time point 0 and time point 1: + if pair in time_dependent_restraints[time_point]: + # Case A: The restrainttype is the same at time point 0 and time point 1 -> + # The spring force changes, and the equilibrium distance is the one at time_point+1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor']], + # Final spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Number of timesteps for the gradual change + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]*0.1)]] + if useColvars == True: + generate_time_dependent_colvars_list(restraints[time_point], time_dependent_steering_pairs['colvar_output'], time_dependent_steering_pairs['colvar_dump_freq']) + copyfile(time_dependent_steering_pairs['colvar_output'], + "colvar_list_from_time_point_%s_to_time_point_%s.txt" % + (str(time_point), str(time_point))) + lmp.command("velocity all create 1.0 %s" % kseed) + # Adding the colvar option and perfoming the steering + if time_point != time_points[0]: + lmp.command("unfix 4") + print("#fix 4 all colvars %s" % time_dependent_steering_pairs['colvar_output']) + sys.stdout.flush() + lmp.command("fix 4 all colvars %s tstat 2 output %sout" % (time_dependent_steering_pairs['colvar_output'],lammps_folder)) + else: + bond_list = generate_time_dependent_bond_list(restraints[time_point]) + for bond in bond_list: + lmp.command(bond) + + lmp.command("run %i" % int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]*0.1)) + + # Time dependent steering + restraints = {} + #for i in xrange(time_points[0],time_points[-1]): + for time_point in time_points[0:-1]: + lmp.command("reset_timestep %i" % 0) + # Change to_dump with some way to load the conformations you want to store + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %ssteered_MD_from_time_point_%s_to_time_point_%s.XYZ id xu yu zu" % (to_dump, lammps_folder, time_point, time_point+1)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + restraints[time_point] = {} + print("# Step %s - %s" % (time_point, time_point+1)) + sys.stdout.flush() + # Compute the current distance between any two particles + xc_tmp = np.array(lmp.gather_atoms("x",1,3)) + current_distances = compute_particles_distance(xc_tmp) + + for pair in set(list(time_dependent_restraints[time_point].keys())+list(time_dependent_restraints[time_point+1].keys())): + r = 0 + + # Strategy changing gradually the spring constant + # Case 1: The restraint is present at time point 0 and time point 1: + if pair in time_dependent_restraints[time_point] 
and pair in time_dependent_restraints[time_point+1]: + # Case A: The restrainttype is the same at time point 0 and time point 1 -> + # The spring force changes, and the equilibrium distance is the one at time_point+1 + if time_dependent_restraints[time_point][pair][0] == time_dependent_restraints[time_point+1][pair][0]: + r += 1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1] *time_dependent_steering_pairs['k_factor']], + # Final spring constant + [time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + # Case B: The restrainttype is different between time point 0 and time point 1 + if time_dependent_restraints[time_point][pair][0] != time_dependent_restraints[time_point+1][pair][0]: + # Case a: The restrainttype is "Harmonic" at time point 0 + # and "LowerBoundHarmonic" at time point 1 + if time_dependent_restraints[time_point][pair][0] == "Harmonic": + r += 1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point][pair][0], time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor'], 0.0], + # Final spring constant + [0.0, time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2], time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2], time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + #[int(time_dependent_steering_pairs['timesteps_per_k_change']), int(time_dependent_steering_pairs['timesteps_per_k_change'])]] + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]), int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + # Case b: The restrainttype is "LowerBoundHarmonic" at time point 0 + # and "Harmonic" at time point 1 + if time_dependent_restraints[time_point][pair][0] == "HarmonicLowerBound": + r += 1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point][pair][0], time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor'], 0.0], + # Final spring constant + [0.0, time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2], time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2], time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + #[int(time_dependent_steering_pairs['timesteps_per_k_change']), int(time_dependent_steering_pairs['timesteps_per_k_change'])]] + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]), int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + + # 
Case 2: The restraint is not present at time point 0, but it is at time point 1:
+ elif pair not in time_dependent_restraints[time_point] and pair in time_dependent_restraints[time_point+1]:
+ # List content: restraint_type,kforce,distance
+ r += 1
+ restraints[time_point][pair] = [
+ # Restraint type -> Is the one at time point time_point+1
+ [time_dependent_restraints[time_point+1][pair][0]],
+ # Initial spring constant -> The restraint appears, so the spring constant ramps up from zero
+ [0.0],
+ # Final spring constant
+ [time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']],
+ # Initial equilibrium distance -> Kept fixed at the time_point+1 value
+ [time_dependent_restraints[time_point+1][pair][2]],
+ # Final equilibrium distance
+ [time_dependent_restraints[time_point+1][pair][2]],
+ # Number of timesteps for the gradual change
+ [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]]
+
+ # Case 3: The restraint is present at time point 0, but it is not at time point 1:
+ elif pair in time_dependent_restraints[time_point] and pair not in time_dependent_restraints[time_point+1]:
+ # List content: restraint_type,kforce,distance
+ r += 1
+ restraints[time_point][pair] = [
+ # Restraint type -> Is the one at time point time_point
+ [time_dependent_restraints[time_point][pair][0]],
+ # Initial spring constant
+ [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor']],
+ # Final spring constant -> The restraint vanishes, so the spring constant ramps down to zero
+ [0.0],
+ # Initial equilibrium distance
+ [time_dependent_restraints[time_point][pair][2]],
+ # Final equilibrium distance
+ [time_dependent_restraints[time_point][pair][2]],
+ # Number of timesteps for the gradual change
+ [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]]
+
+ else:
+ print("#ERROR None of the previous conditions was matched!")
+ if pair in time_dependent_restraints[time_point]:
+ print("# Pair %s at timepoint %s %s " % (pair, time_point, time_dependent_restraints[time_point][pair]))
+ if pair in time_dependent_restraints[time_point+1]:
+ print("# Pair %s at timepoint %s %s " % (pair, time_point+1, time_dependent_restraints[time_point+1][pair]))
+ continue
+
+ if r > 1:
+ print("#ERROR Two of the previous conditions were matched!")
+
+ lmp.command("velocity all create 1.0 %s" % kseed)
+ if useColvars:
+ generate_time_dependent_colvars_list(restraints[time_point], time_dependent_steering_pairs['colvar_output'], time_dependent_steering_pairs['colvar_dump_freq'])
+ copyfile(time_dependent_steering_pairs['colvar_output'],
+ "%scolvar_list_from_time_point_%s_to_time_point_%s.txt" %
+ (lammps_folder, str(time_point), str(time_point+1)))
+ # Adding the colvar option and performing the steering
+ if time_point != time_points[0]:
+ lmp.command("unfix 4")
+ print("#fix 4 all colvars %s" % time_dependent_steering_pairs['colvar_output'])
+ sys.stdout.flush()
+ lmp.command("fix 4 all colvars %s tstat 2 output %sout" % (time_dependent_steering_pairs['colvar_output'],lammps_folder))
+ if to_dump:
+ lmp.command("thermo_style custom step temp epair 
emol pe ke etotal f_4") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal f_4") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (time_dependent_steering_pairs['colvar_dump_freq'],lammps_folder,str(time_point), str(time_point+1))) + else: + bond_list = generate_time_dependent_bond_list(restraints[time_point]) + for bond in bond_list: + lmp.command(bond) + if to_dump: + lmp.command("thermo_style custom step temp epair emol pe ke etotal") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal etotal") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (time_dependent_steering_pairs['colvar_dump_freq'],lammps_folder,str(time_point), str(time_point+1))) + + lmp.command("run %i" % int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])) + + if time_point > 0: + + if exists("%sout.colvars.traj.BAK" % lammps_folder): + + copyfile("%sout.colvars.traj.BAK" % lammps_folder, "%srestrained_pairs_equilibrium_distance_vs_timestep_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point-1), str(time_point))) + + os.remove("%sout.colvars.traj.BAK" % lammps_folder) + + # Set interactions for chromosome compartmentalization + if compartmentalization: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %scompartmentalization_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + + # First we have to partition the genome in the defined compartments + for group in compartmentalization['partition']: + list_of_particles = get_list(compartmentalization['partition'][group]) + for atom in list_of_particles: + #print("set atom %s type %s" % (atom,group+1)) + lmp.command("set atom %s type %s" % (atom,group+1)) + + # Second we have to define the type of interactions + for pair in compartmentalization['interactions']: + #pair_coeff t1 t2 epsilon sigma rc + t1 = pair[0]+1 + t2 = pair[1]+1 + if t1 > t2: + t1 = pair[1]+1 + t2 = pair[0]+1 + + epsilon = compartmentalization['interactions'][pair][1] + + try: + sigma1 = compartmentalization['radii'][pair[0]] + except: + sigma1 = 0.5 + try: + sigma2 = compartmentalization['radii'][pair[1]] + except: + sigma2 = 0.5 + sigma = sigma1 + sigma2 + + if compartmentalization['interactions'][pair][0] == "attraction": + rc = sigma * 2.5 + if compartmentalization['interactions'][pair][0] == "repulsion": + rc = sigma * 1.12246152962189 + + print("pair_coeff %s %s lj/cut %s %s %s" % (t1,t2,epsilon,sigma,rc)) + lmp.command("pair_coeff %s %s lj/cut %s %s %s" % (t1,t2,epsilon,sigma,rc)) + try: + lmp.command("run %s" % (compartmentalization['runtime'])) + except: + pass + + + # Setup the pairs to co-localize using the COLVARS plug-in + if loop_extrusion_dynamics: + + # Start relaxation step + try: + lmp.command("reset_timestep 0") + lmp.command("run %i" % loop_extrusion_dynamics['timesteps_relaxation']) + except: + pass + + lmp.command("reset_timestep 0") + # Start Loop extrusion dynamics + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sloop_extrusion_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append no") + + # Randomly extract starting point of the extrusion 
dynamics between start and stop + extruders_positions = [] + nextruders = int(chromosome_particle_numbers[0]/loop_extrusion_dynamics['separation']) + print(nextruders) + for extruder in range(nextruders): + try: + occupied_positions = list(chain(*tmp_extruders_positions)) + except: + occupied_positions = [] + new_positions = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0]) + while (new_positions[0] in occupied_positions) or (new_positions[1] in occupied_positions) or (new_positions[0] in loop_extrusion_dynamics['barriers']) or (new_positions[1] in loop_extrusion_dynamics['barriers']): + new_positions = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0]) + extruders_positions.append(new_positions) + tmp_extruders_positions = [extruders_positions[x] for x in range(len(extruders_positions))] + print("Initial extruders' positions",extruders_positions) + + + # Initialise the lifetime of each extruder + extruders_lifetimes = [] + for extruder in range(nextruders): + extruders_lifetimes.append(int(0)) + print(extruders_positions, extruders_lifetimes) + + lmp.command("compute xu all property/atom xu") + lmp.command("compute yu all property/atom yu") + lmp.command("compute zu all property/atom zu") + + # Loop extrusion steps + for LES in range(int(run_time/loop_extrusion_dynamics['extrusion_time'])): + thermo_style="thermo_style custom step temp epair emol" + + # Update the bond restraint + loop_number = 1 + for particle1,particle2 in extruders_positions: + print("# fix LE%i all restrain bond %i %i %f %f %f" % (loop_number, + particle1, + particle2, + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['equilibrium_distance'])) + + lmp.command("fix LE%i all restrain bond %i %i %f %f %f" % (loop_number, + particle1, + particle2, + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['equilibrium_distance'])) + lmp.command("variable x%i equal c_xu[%i]" % (particle1, particle1)) + lmp.command("variable x%i equal c_xu[%i]" % (particle2, particle2)) + lmp.command("variable y%i equal c_yu[%i]" % (particle1, particle1)) + lmp.command("variable y%i equal c_yu[%i]" % (particle2, particle2)) + lmp.command("variable z%i equal c_zu[%i]" % (particle1, particle1)) + lmp.command("variable z%i equal c_zu[%i]" % (particle2, particle2)) + + lmp.command("variable xLE%i equal v_x%i-v_x%i" % (loop_number, particle1, particle2)) + lmp.command("variable yLE%i equal v_y%i-v_y%i" % (loop_number, particle1, particle2)) + lmp.command("variable zLE%i equal v_z%i-v_z%i" % (loop_number, particle1, particle2)) + lmp.command("variable dist_%i_%i equal sqrt(v_xLE%i*v_xLE%i+v_yLE%i*v_yLE%i+v_zLE%i*v_zLE%i)" % (particle1, + particle2, + loop_number, + loop_number, + loop_number, + loop_number, + loop_number, + loop_number)) + thermo_style += " v_dist_%i_%i" % (particle1, particle2) + loop_number += 1 + + lmp.command("%s" % thermo_style) + # Doing the LES + lmp.command("run %i" % loop_extrusion_dynamics['extrusion_time']) + #lmp.command("run 0") + + # update the lifetime of each extruder + for extruder in range(nextruders): + extruders_lifetimes[extruder] = extruders_lifetimes[extruder] + 1 + + # Remove the loop extrusion restraint! 
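+ # (Recap of the cycle implemented in this loop, for orientation: each
+ # extrusion step imposes one "fix ... restrain bond" per extruder, runs
+ # 'extrusion_time' timesteps, then removes every restraint below before
+ # the extruder positions and lifetimes are updated for the next step.)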
+ loop_number = 1
+ for particle1,particle2 in extruders_positions:
+
+ print("# unfix LE%i" % (loop_number))
+ lmp.command("unfix LE%i" % (loop_number))
+
+ loop_number += 1
+
+ # Update the particles involved in the loop extrusion interaction:
+ # decrease the left extreme by one until you get to start
+ # increase the right extreme by one until you get to stop
+ for extruder in range(len(extruders_positions)):
+ # Extrude leftwards only if the left extreme has not yet reached the start of the chromosome
+ if extruders_positions[extruder][0] > 1:
+ random_number = uniform(0, 1)
+ if random_number <= loop_extrusion_dynamics['left_extrusion_rate']:
+ extruders_positions[extruder][0] -= 1
+ # Extrude rightwards only if the right extreme has not yet reached the end of the chromosome
+ if extruders_positions[extruder][1] < chromosome_particle_numbers[0]:
+ random_number = uniform(0, 1)
+ if random_number <= loop_extrusion_dynamics['right_extrusion_rate']:
+ extruders_positions[extruder][1] += 1
+
+ # If the extruder bumps into another extruder, bring it back
+ tmp_extruders_positions = [extruders_positions[x] for x in range(len(extruders_positions)) if x != extruder]
+ occupied_positions = list(chain(*tmp_extruders_positions))
+ print("Extruder positions",extruders_positions[extruder])
+ print("Occupied_positions",sorted(occupied_positions))
+ if extruders_positions[extruder][0] in occupied_positions:
+ extruders_positions[extruder][0] += 1
+ if extruders_positions[extruder][1] in occupied_positions:
+ extruders_positions[extruder][1] -= 1
+
+ # If the extruder reached its lifetime, move it to another position
+ if extruders_lifetimes[extruder] == loop_extrusion_dynamics['lifetime']:
+ extruders_positions[extruder] = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0])
+ while (extruders_positions[extruder][0] in occupied_positions) or (extruders_positions[extruder][1] in occupied_positions) or (extruders_positions[extruder][0] in loop_extrusion_dynamics['barriers']) or (extruders_positions[extruder][1] in loop_extrusion_dynamics['barriers']):
+ extruders_positions[extruder] = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0])
+ # Re-initialise the lifetime of the extruder
+ extruders_lifetimes[extruder] = 0
+ print("Extruders' repositioning after lifetime",extruders_positions,extruders_positions[extruder])
+
+ # Check presence of barriers
+ if loop_extrusion_dynamics['barriers_permeability'] < 1.0:
+ # If a motor tries to overcome a barrier, we stop it with probability (1 - permeability)
+ # If the barrier is at the left extreme, which extrudes against the chain index, we push the extruder one bead forwards
+ if extruders_positions[extruder][0] in loop_extrusion_dynamics['barriers']:
+ random_number = uniform(0, 1)
+ if random_number >= loop_extrusion_dynamics['barriers_permeability']:
+ extruders_positions[extruder][0] += 1
+ # If the barrier is at the right extreme, which extrudes with the chain index, we push the extruder one bead backwards
+ if extruders_positions[extruder][1] in loop_extrusion_dynamics['barriers']:
+ random_number = uniform(0, 1)
+ if random_number >= loop_extrusion_dynamics['barriers_permeability']:
+ extruders_positions[extruder][1] -= 1
+
+ print("Extruders positions at step",LES,extruders_positions)
+ print("Extruders lifetimes at step",LES,extruders_lifetimes)
+
+ ### Put here the creation of a pickle with the complete trajectory ###
+ if to_dump:
+ lmp.command("undump 1")
+ lmp.command("dump 1 all custom %i 
%sloop_extrusion_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + # Post-processing analysis + # Save coordinates + #for time in range(0,runtime,to_dump): + # xc.append(np.array(read_trajectory_file("%s/loop_extrusion_MD_%s.XYZ" % (lammps_folder, time)))) + + #xc.append(np.array(lmp.gather_atoms("x",1,3))) + + lmp.close() + + result = [] + for stg in range(len(xc)): + #log_objfun = read_objective_function("%sobj_fun_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(stg), str(stg+1))) + log_objfun = [0.0] + for timepoint in range(1,timepoints+1): + lammps_model = LAMMPSmodel({'x' : [], + 'y' : [], + 'z' : [], + 'cluster' : 'Singleton', + 'log_objfun' : log_objfun, + 'objfun' : log_objfun[-1], + 'radius' : 0.5, #float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])/2, + 'index' : kseed+timepoint, + 'rand_init' : str(kseed+timepoint)}) + + if pbc: + store_conformation_with_pbc(xc[stg], lammps_model, confining_environment) + else: + skip_first = 0 + if time_dependent_steering_pairs: + skip_first = 1 + for i in range((timepoint-1+skip_first)*len(LOCI)*3,(timepoint+skip_first)*len(LOCI)*3,3): + lammps_model['x'].append(xc[stg][i]*float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])) + lammps_model['y'].append(xc[stg][i+1]*float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])) + lammps_model['z'].append(xc[stg][i+2]*float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])) + result.append(lammps_model) + + #os.remove("%slog.cite" % lammps_folder) + # safe finished model + if model_path != False: + with open(model_path, "wb") as output_file: + dump((kseed,result), output_file) + ################### Special case for clusters with disk quota + # Remove the saved steps + if saveRestart == True: + if os.path.isdir(restart_file): + restart_path = restart_file + else: + restart_path = '/'.join(restart_file.split('/')[:-1]) + '/' + for pathfile in os.listdir(restart_path): + if pathfile.startswith('restart'): + os.remove(restart_path + pathfile) + ################################################################## + # Load initial conformation and return it + init_conf = read_conformation_file(initial_conformation) + + return (kseed, result, init_conf) + +def read_trajectory_file(fname): + + coords=[] + fhandler = open(fname) + line = next(fhandler) + try: + while True: + if line.startswith('ITEM: TIMESTEP'): + while not line.startswith('ITEM: ATOMS'): + line = next(fhandler) + if line.startswith('ITEM: ATOMS'): + line = next(fhandler) + line = line.strip() + if len(line) == 0: + continue + line_vals = line.split() + coords += [float(line_vals[1]),float(line_vals[2]),float(line_vals[3])] + line = next(fhandler) + except StopIteration: + pass + fhandler.close() + + return coords + +def read_conformation_file(fname): + + mod={'x':[], 'y':[], 'z':[]} + fhandler = open(fname) + line = next(fhandler) + try: + while True: + if line.startswith('LAMMPS input data file'): + while not line.startswith(' Atoms'): + line = next(fhandler) + if line.startswith(' Atoms'): + line = next(fhandler) + while len(line.strip()) == 0: + line = next(fhandler) + line = line.strip() + line_vals = line.split() + mod['x'].append(float(line_vals[3])) + mod['y'].append(float(line_vals[4])) + mod['z'].append(float(line_vals[5])) + line = next(fhandler) + if len(line.strip()) == 0: + break + except StopIteration: + pass + fhandler.close() + + return mod + +########## Part to perform the restrained dynamics ########## +# I should add here: The steered dynamics (Irene's and Hi-C based models) ; +# The loop extrusion dynamics ; 
the binders based dynamics (Marenduzzo and Nicodemi)...etc... + +def linecount(filename): + """ + Count valid lines of input colvars file + + :param filename: input colvars file. + + :returns: number of valid contact lines + + """ + + k = 0 + tfp = open(filename) + for i, line in enumerate(tfp): + + if line.startswith('#'): + continue + cols_vals = line.split() + if cols_vals[1] == cols_vals[2]: + continue + k += 1 + + return k + +########## + +def generate_colvars_list(steering_pairs, + kincrease=0, + colvars_header='# collective variable: monitor distances\n\ncolvarsTrajFrequency 1000 # output every 1000 steps\ncolvarsRestartFrequency 10000000\n', + colvars_template=''' + +colvar { + name %s + # %s %s %i + distance { + group1 { + atomNumbers %i + } + group2 { + atomNumbers %i + } + } +}''', + colvars_tail = ''' + +harmonic { + name h_pot_%s + colvars %s + centers %s + forceConstant %f +}\n''', colvars_harmonic_lower_bound_tail = ''' + +harmonicWalls { + name hlb_pot_%s + colvars %s + lowerWalls %s + forceConstant %f + lowerWallConstant 1.0 +}\n''' + ): + + """ + Generates lammps colvars file http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf + + :param dict steering_pairs: dictionary containing all the information to write down the + the input file for the colvars implementation + :param exisiting_template colvars_header: header template for colvars file. + :param exisiting_template colvars_template: contact template for colvars file. + :param exisiting_template colvars_tail: tail template for colvars file. + + """ + + # Getting the input + # XXXThe target_pairs could be also a list as the one in output of get_HiCbased_restraintsXXX + target_pairs = steering_pairs['colvar_input'] + outfile = steering_pairs['colvar_output'] + if 'kappa_vs_genomic_distance' in steering_pairs: + kappa_vs_genomic_distance = steering_pairs['kappa_vs_genomic_distance'] + if 'chrlength' in steering_pairs: + chrlength = steering_pairs['chrlength'] + else: + chrlength = 0 + if 'copies' in steering_pairs: + copies = steering_pairs['copies'] + else: + copies = ['A'] + kbin = 10000000 + binsize = steering_pairs['binsize'] + if 'percentage_enforced_contacts' in steering_pairs: + percentage_enforced_contacts = steering_pairs['perc_enfor_contacts'] + else: + percentage_enforced_contacts = 100 + + # Here we extract from all the restraints only + # a random sub-sample of percentage_enforced_contacts/100*totcolvars + rand_lines = [] + i=0 + j=0 + if isinstance(target_pairs, str): + totcolvars = linecount(target_pairs) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + tfp = open(target_pairs) + with open(target_pairs) as f: + for line in f: + line = line.strip() + if j >= ncolvars: + break + if line.startswith('#'): + continue + + cols_vals = line.split() + # Avoid to enforce restraints between the same bin + if cols_vals[1] == cols_vals[2]: + continue + + if i == rand_positions[j]: + rand_lines.append(line) + j += 1 + i += 1 + tfp.close() + elif isinstance(target_pairs, HiCBasedRestraints): + + rand_lines = target_pairs.get_hicbased_restraints() + totcolvars = len(rand_lines) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = 
sorted(rand_positions) + + + else: + print("Unknown target_pairs") + return + + + + #print rand_lines + + seqdists = {} + poffset=0 + outf = open(outfile,'w') + outf.write(colvars_header) + for copy_nbr in copies: + i = 1 + for line in rand_lines: + if isinstance(target_pairs, str): + cols_vals = line.split() + else: + cols_vals = ['chr'] + line + + #print cols_vals + + if isinstance(target_pairs, HiCBasedRestraints) and cols_vals[3] != "Harmonic" and cols_vals[3] != "HarmonicLowerBound": + continue + + part1_start = int(cols_vals[1])*binsize + part1_end = (int(cols_vals[1])+1)*binsize + #print part1_start, part1_end + + part2_start = int(cols_vals[2])*binsize + part2_end = (int(cols_vals[2])+1)*binsize + #print part2_start, part2_end + + name = str(i)+copy_nbr + seqdist = abs(part1_start-part2_start) + #print seqdist + + region1 = cols_vals[0] + '_' + str(part1_start) + '_' + str(part1_end) + region2 = cols_vals[0] + '_' + str(part2_start) + '_' + str(part2_end) + + particle1 = int(cols_vals[1]) + 1 + poffset + particle2 = int(cols_vals[2]) + 1 + poffset + + seqdists[name] = seqdist + + outf.write(colvars_template % (name,region1,region2,seqdist,particle1,particle2)) + + if isinstance(target_pairs, HiCBasedRestraints): + # If the spring constant is zero we avoid to add the restraint! + if cols_vals[4] == 0.0: + continue + + centre = cols_vals[5] + kappa = cols_vals[4]*steering_pairs['k_factor'] + + if cols_vals[3] == "Harmonic": + outf.write(colvars_tail % (name,name,centre,kappa)) + + if cols_vals[3] == "HarmonicLowerBound": + outf.write(colvars_harmonic_lower_bound_tail % (name,name,centre,kappa)) + + i += 1 + poffset += chrlength + + outf.flush() + + #=========================================================================== + # if isinstance(target_pairs, HiCBasedRestraints): + # for copy_nbr in copies: + # i = 1 + # for line in rand_lines: + # cols_vals = line + # + # if cols_vals[3] == 0.0: + # continue + # + # name = str(i)+copy_nbr + # + # centre = cols_vals[4] + # kappa = cols_vals[3] + # + # if cols_vals[2] == "Harmonic": + # outf.write(colvars_tail % (name,name,centre,kappa)) + # + # elif cols_vals[2] == "HarmonicLowerBound": + # outf.write(colvars_harmonic_lower_bound_tail % (name,name,centre,kappa)) + # + # + # + # i += 1 + # poffset += chrlength + # + # outf.flush() + #=========================================================================== + + if 'kappa_vs_genomic_distance' in steering_pairs: + + kappa_values = {} + with open(kappa_vs_genomic_distance) as kgd: + for line in kgd: + line_vals = line.split() + kappa_values[int(line_vals[0])] = float(line_vals[1]) + + for seqd in set(seqdists.values()): + kappa = 0 + if seqd in kappa_values: + kappa = kappa_values[seqd]*kincrease + else: + for kappa_key in sorted(kappa_values, key=int): + if int(kappa_key) > seqd: + break + kappa = kappa_values[kappa_key]*kincrease + centres='' + names='' + for seq_name in seqdists: + if seqdists[seq_name] == seqd: + centres += ' 1.0' + names += ' '+seq_name + + outf.write(colvars_tail % (str(seqd),names,centres,kappa)) + + outf.flush() + + outf.close() + + +def generate_bond_list(steering_pairs): + + """ + Generates lammps bond commands + + :param dict steering_pairs: dictionary containing all the information to write down the + the input file for the bonds + """ + + # Getting the input + # The target_pairs could be also a list as the one in output of get_HiCbased_restraintsXXX + target_pairs = steering_pairs['colvar_input'] + if 'kappa_vs_genomic_distance' in steering_pairs: + 
kappa_vs_genomic_distance = steering_pairs['kappa_vs_genomic_distance'] + if 'chrlength' in steering_pairs: + chrlength = steering_pairs['chrlength'] + else: + chrlength = 0 + if 'copies' in steering_pairs: + copies = steering_pairs['copies'] + else: + copies = ['A'] + kbin = 10000000 + binsize = steering_pairs['binsize'] + if 'percentage_enforced_contacts' in steering_pairs: + percentage_enforced_contacts = steering_pairs['perc_enfor_contacts'] + else: + percentage_enforced_contacts = 100 + + # Here we extract from all the restraints only + # a random sub-sample of percentage_enforced_contacts/100*totcolvars + rand_lines = [] + i=0 + j=0 + if isinstance(target_pairs, str): + totcolvars = linecount(target_pairs) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + tfp = open(target_pairs) + with open(target_pairs) as f: + for line in f: + line = line.strip() + if j >= ncolvars: + break + if line.startswith('#'): + continue + + cols_vals = line.split() + # Avoid to enforce restraints between the same bin + if cols_vals[1] == cols_vals[2]: + continue + + if i == rand_positions[j]: + rand_lines.append(line) + j += 1 + i += 1 + tfp.close() + elif isinstance(target_pairs, HiCBasedRestraints): + + rand_lines = target_pairs.get_hicbased_restraints() + totcolvars = len(rand_lines) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + + else: + print("Unknown target_pairs") + return + + + + #print rand_lines + + seqdists = {} + poffset=0 + outf = [] #### a list + for copy_nbr in copies: + i = 1 + for line in rand_lines: + if isinstance(target_pairs, str): + cols_vals = line.split() + else: + cols_vals = ['chr'] + line + + #print cols_vals + + if isinstance(target_pairs, HiCBasedRestraints) and cols_vals[3] != "Harmonic" and cols_vals[3] != "HarmonicLowerBound": + continue + + part1_start = int(cols_vals[1])*binsize + part1_end = (int(cols_vals[1])+1)*binsize + #print part1_start, part1_end + + part2_start = int(cols_vals[2])*binsize + part2_end = (int(cols_vals[2])+1)*binsize + #print part2_start, part2_end + + name = str(i)+copy_nbr + seqdist = abs(part1_start-part2_start) + #print seqdist + + region1 = cols_vals[0] + '_' + str(part1_start) + '_' + str(part1_end) + region2 = cols_vals[0] + '_' + str(part2_start) + '_' + str(part2_end) + + particle1 = int(cols_vals[1]) + 1 + poffset + particle2 = int(cols_vals[2]) + 1 + poffset + + seqdists[name] = seqdist + + + if isinstance(target_pairs, HiCBasedRestraints): + # If the spring constant is zero we avoid to add the restraint! 
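+ # (Illustrative example only, with hypothetical values: name="1A",
+ # particles 12 and 78, kappa=5.0 and centre=2.0 would make the append
+ # below emit "fix 1A all restrain bond 12 78 0.000000 5.000000 2.000000 2.000000";
+ # the initial force constant is always written as 0.)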
+ if cols_vals[4] == 0.0:
+ continue
+
+ centre = cols_vals[5]
+ kappa = cols_vals[4]*steering_pairs['k_factor']
+
+ bondType = None
+ if cols_vals[3] == "Harmonic":
+ bondType = 'bond'
+ elif cols_vals[3] == "HarmonicLowerBound":
+ bondType = 'lbound'
+
+ if bondType:
+ outf.append('fix %s all restrain %s %d %d %f %f %f %f' %(
+ name, bondType, particle1, particle2, 0, kappa,
+ centre, centre))
+
+ i += 1
+ poffset += chrlength
+
+ return outf
+
+##########
+
+def generate_time_dependent_bond_list(steering_pairs):
+
+ """
+ Generates lammps bond commands
+
+ :param dict steering_pairs: dictionary containing all the information to write down
+ the input file for the bonds
+ """
+
+ outf = [] #### a list
+ # Defining the particle pairs
+ for pair in steering_pairs:
+
+ sys.stdout.flush()
+ for i in range(len(steering_pairs[pair][0])):
+ name = "%s_%s_%s" % (i, int(pair[0])+1, int(pair[1])+1)
+ seqdist = abs(int(pair[1])-int(pair[0]))
+ particle1 = int(pair[0])+1
+ particle2 = int(pair[1])+1
+
+ restraint_type = steering_pairs[pair][0][i]
+ kappa_start = steering_pairs[pair][1][i]
+ kappa_stop = steering_pairs[pair][2][i]
+ centre_start = steering_pairs[pair][3][i]
+ centre_stop = steering_pairs[pair][4][i]
+ timesteps_per_k_change = steering_pairs[pair][5][i]
+
+ bondType = None
+ if restraint_type == "Harmonic":
+ bondType = 'bond'
+ elif restraint_type == "HarmonicLowerBound":
+ bondType = 'lbound'
+
+ if bondType:
+ outf.append('fix %s all restrain %s %d %d %f %f %f %f' %(
+ name, bondType, particle1, particle2, kappa_start, kappa_stop,
+ centre_start, centre_stop))
+ return outf
+
+##########
+
+def generate_time_dependent_colvars_list(steering_pairs,
+ outfile,
+ colvar_dump_freq,
+ colvars_header='# collective variable: monitor distances\n\ncolvarsTrajFrequency %i # output every %i steps\ncolvarsRestartFrequency 1000000\n',
+ colvars_template='''
+
+colvar {
+ name %s
+ # %s %s %i
+ width 1.0
+ distance {
+ group1 {
+ atomNumbers %i
+ }
+ group2 {
+ atomNumbers %i
+ }
+ }
+}''',
+ colvars_harmonic_tail = '''
+
+harmonic {
+ name h_pot_%s
+ colvars %s
+ forceConstant %f
+ targetForceConstant %f
+ centers %s
+ targetCenters %s
+ targetNumSteps %s
+ outputEnergy yes
+}\n''',
+ colvars_harmonic_lower_bound_tail = '''
+harmonicBound {
+ name hlb_pot_%s
+ colvars %s
+ forceConstant %f
+ targetForceConstant %f
+ centers %f
+ targetCenters %f
+ targetNumSteps %s
+ outputEnergy yes
+}\n'''
+ ):
+
+ """
+ Generates lammps colvars file http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf
+
+ In the two tail templates, forceConstant and targetForceConstant are the
+ force constants at time_point and time_point+1, centers and targetCenters
+ are the corresponding equilibrium distances, and targetNumSteps is the
+ number of timesteps between the two time points.
+
+ :param dict steering_pairs: dictionary containing all the 
information to write down the + the input file for the colvars implementation + :param exisiting_template colvars_header: header template for colvars file. + :param exisiting_template colvars_template: contact template for colvars file. + :param exisiting_template colvars_tail: tail template for colvars file. + + """ + + #restraints[pair] = [time_dependent_restraints[time_point+1][pair][0], # Restraint type -> Is the one at time point time_point+1 + #time_dependent_restraints[time_point][pair][1]*10., # Initial spring constant + #time_dependent_restraints[time_point+1][pair][1]*10., # Final spring constant + #time_dependent_restraints[time_point][pair][2], # Initial equilibrium distance + #time_dependent_restraints[time_point+1][pair][2], # Final equilibrium distance + #int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])] # Number of timesteps for the gradual change + + outf = open(outfile,'w') + #tfreq=10000 + #for pair in steering_pairs: + # if len(steering_pairs[pair][5]) >= 1: + # tfreq = int(steering_pairs[pair][5][0]/100) + # break + + tfreq = colvar_dump_freq + outf.write(colvars_header % (tfreq, tfreq)) + # Defining the particle pairs + for pair in steering_pairs: + + #print steering_pairs[pair] + sys.stdout.flush() + for i in range(len(steering_pairs[pair][0])): + name = "%s_%s_%s" % (i, int(pair[0])+1, int(pair[1])+1) + seqdist = abs(int(pair[1])-int(pair[0])) + region1 = "particle_%s" % (int(pair[0])+1) + region2 = "particle_%s" % (int(pair[1])+1) + + outf.write(colvars_template % (name,region1,region2,seqdist,int(pair[0])+1,int(pair[1])+1)) + + restraint_type = steering_pairs[pair][0][i] + kappa_start = steering_pairs[pair][1][i] + kappa_stop = steering_pairs[pair][2][i] + centre_start = steering_pairs[pair][3][i] + centre_stop = steering_pairs[pair][4][i] + timesteps_per_k_change = steering_pairs[pair][5][i] + + if restraint_type == "Harmonic": + outf.write(colvars_harmonic_tail % (name,name,kappa_start,kappa_stop,centre_start,centre_stop,timesteps_per_k_change)) + + if restraint_type == "HarmonicLowerBound": + outf.write(colvars_harmonic_lower_bound_tail % (name,name,kappa_start,kappa_stop,centre_start,centre_stop,timesteps_per_k_change)) + + + + + outf.flush() + + outf.close() + +########## + +def get_time_dependent_colvars_list(time_dependent_steering_pairs): + + """ + Generates lammps colvars file http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf + + :param dict time_dependent_steering_pairs: dictionary containing all the information to write down the + the input file for the colvars implementation + """ + + # Getting the input + # XXXThe target_pairs_file could be also a list as the one in output of get_HiCbased_restraintsXXX + target_pairs = time_dependent_steering_pairs['colvar_input'] + outfile = time_dependent_steering_pairs['colvar_output'] + if 'chrlength' in time_dependent_steering_pairs: + chrlength = time_dependent_steering_pairs['chrlength'] + binsize = time_dependent_steering_pairs['binsize'] + if 'percentage_enforced_contacts' in time_dependent_steering_pairs: + percentage_enforced_contacts = time_dependent_steering_pairs['perc_enfor_contacts'] + else: + percentage_enforced_contacts = 100 + + # HiCbasedRestraints is a list of restraints returned by this function. 
+ # Each entry of the list is a list of 5 elements describing the details of the restraint: + # 0 - particle_i + # 1 - particle_j + # 2 - type_of_restraint = Harmonic or HarmonicLowerBound or HarmonicUpperBound + # 3 - the kforce of the restraint + # 4 - the equilibrium (or maximum or minimum respectively) distance associated to the restraint + + # Here we extract from all the restraints only a random sub-sample + # of percentage_enforced_contacts/100*totcolvars + rand_lines = [] + i=0 + j=0 + if isinstance(target_pairs, str): + time_dependent_restraints = {} + totcolvars = linecount(target_pairs) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + with open(target_pairs) as f: + for line in f: + line = line.strip() + if j >= ncolvars: + break + if line.startswith('#') or line == "": + continue + + # Line format: timepoint,particle1,particle2,restraint_type,kforce,distance + cols_vals = line.split() + + if int(cols_vals[1]) < int(cols_vals[2]): + pair = (int(cols_vals[1]), int(cols_vals[2])) + else: + pair = (int(cols_vals[2]), int(cols_vals[1])) + + try: + if pair in time_dependent_restraints[int(cols_vals[0])]: + print("WARNING: Check your restraint list! pair %s is repeated in time point %s!" % (pair, int(cols_vals[0]))) + # List content: restraint_type,kforce,distance + time_dependent_restraints[int(cols_vals[0])][pair] = [cols_vals[3], + float(cols_vals[4]), + float(cols_vals[5])] + except: + time_dependent_restraints[int(cols_vals[0])] = {} + # List content: restraint_type,kforce,distance + time_dependent_restraints[int(cols_vals[0])][pair] = [cols_vals[3], + float(cols_vals[4]), + float(cols_vals[5])] + if i == rand_positions[j]: + rand_lines.append(line) + j += 1 + i += 1 + elif isinstance(target_pairs, list): + time_dependent_restraints = dict((i,{}) for i in range(len(target_pairs))) + for time_point, HiCR in enumerate(target_pairs): + rand_lines = HiCR.get_hicbased_restraints() + totcolvars = len(rand_lines) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + for cols_vals in rand_lines: + + if cols_vals[2] != "Harmonic" and cols_vals[2] != "HarmonicLowerBound": + continue + if int(cols_vals[0]) < int(cols_vals[1]): + pair = (int(cols_vals[0]), int(cols_vals[1])) + else: + pair = (int(cols_vals[1]), int(cols_vals[0])) + + if pair in time_dependent_restraints[time_point]: + print("WARNING: Check your restraint list! pair %s is repeated in time point %s!" % (pair, time_point)) + # List content: restraint_type,kforce,distance + time_dependent_restraints[time_point][pair] = [cols_vals[2], + float(cols_vals[3]), + float(cols_vals[4])] + + else: + print("Unknown target_pairs") + return + +# for time_point in sorted(time_dependent_restraints.keys()): +# for pair in time_dependent_restraints[time_point]: +# print "#Time_dependent_restraints", time_point,pair, time_dependent_restraints[time_point][pair] + return time_dependent_restraints + +### TODO Add the option to add also spheres of different radii (e.g. 
to simulate nucleoli) +########## Part to generate the initial conformation ########## +def generate_chromosome_random_walks_conformation ( chromosome_particle_numbers , + confining_environment=['sphere',100.] , + particle_radius=0.5 , + seed_of_the_random_number_generator=1 , + number_of_conformations=1, + outfile="Initial_random_walk_conformation.dat", + pbc=False, + center=True): + """ + Generates lammps initial conformation file by random walks + + :param chromosome_particle_numbers: list with the number of particles of each chromosome. + :param ['sphere',100.] confining_environment: dictionary with the confining environment of the conformation + Possible confining environments: + ['cube',edge_width] + ['sphere',radius] + ['ellipsoid',x-semiaxes, y-semiaxes, z-semiaxes] + ['cylinder', basal radius, height] + :param 0.5 particle_radius: Radius of each particle. + :param 1 seed_of_the_random_number_generator: random seed. + :param 1 number_of_conformations: copies of the conformation. + :param outfile: file where to store resulting initial conformation file + + """ + seed(seed_of_the_random_number_generator) + + # This allows to organize the largest chromosomes first. + # This is to get a better acceptance of the chromosome positioning. + chromosome_particle_numbers = [int(x) for x in chromosome_particle_numbers] + chromosome_particle_numbers.sort(key=int,reverse=True) + + for cnt in range(number_of_conformations): + + final_random_walks = generate_random_walks(chromosome_particle_numbers, + particle_radius, + confining_environment, + pbc=pbc, + center=center) + + # Writing the final_random_walks conformation + #print "Succesfully generated conformation number %d\n" % (cnt+1) + write_initial_conformation_file(final_random_walks, + chromosome_particle_numbers, + confining_environment, + out_file=outfile) + +########## + +def generate_chromosome_rosettes_conformation ( chromosome_particle_numbers , + fractional_radial_positions=None, + confining_environment=['sphere',100.] , + rosette_radius=12.0 , particle_radius=0.5 , + seed_of_the_random_number_generator=1 , + number_of_conformations=1, + outfile = "Initial_rosette_conformation.dat", + atom_types=1): + """ + Generates lammps initial conformation file by rosettes conformation + + :param chromosome_particle_numbers: list with the number of particles of each chromosome. + :param None fractional_radial_positions: list with fractional radial positions for all the chromosomes. + :param ['sphere',100.] confining_environment: dictionary with the confining environment of the conformation + Possible confining environments: + ['cube',edge_width] + ['sphere',radius] + ['ellipsoid',x-semiaxes, y-semiaxes, z-semiaxes] + ['cylinder', basal radius, height] + :param 0.5 particle_radius: Radius of each particle. + :param 1 seed_of_the_random_number_generator: random seed. + :param 1 number_of_conformations: copies of the conformation. + :param outfile: file where to store resulting initial conformation file + + """ + seed(seed_of_the_random_number_generator) + + # This allows to organize the largest chromosomes first. + # This is to get a better acceptance of the chromosome positioning. 
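+ # (For instance, particle counts [50, 120, 80] are reordered to
+ # [120, 80, 50] by the sort below.)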
+ chromosome_particle_numbers = [int(x) for x in chromosome_particle_numbers]
+ chromosome_particle_numbers.sort(key=int,reverse=True)
+
+ initial_rosettes , rosettes_lengths = generate_rosettes(chromosome_particle_numbers,
+ rosette_radius,
+ particle_radius)
+ print(rosettes_lengths)
+
+ # Constructing the rosettes conformations
+ for cnt in range(number_of_conformations):
+
+ particle_inside = 0 # 0 means a particle is outside
+ particles_overlap = 0 # 0 means two particles are overlapping
+ while particle_inside == 0 or particles_overlap == 0:
+ particle_inside = 1
+ particles_overlap = 1
+ segments_P1 = []
+ segments_P0 = []
+ side = 0
+ init_rosettes = copy.deepcopy(initial_rosettes)
+
+ # Guess of the initial segment conformation:
+ # 1 - each rod is placed inside the confining environment
+ # in a random position and with random orientation
+ # 2 - possible clashes between generated rods are checked
+ if fractional_radial_positions:
+ if len(fractional_radial_positions) != len(chromosome_particle_numbers):
+ print("Please provide the desired fractional radial positions for all the chromosomes")
+ sys.exit()
+ segments_P1 , segments_P0 = generate_rods_biased_conformation(rosettes_lengths, rosette_radius,
+ confining_environment,
+ fractional_radial_positions,
+ max_number_of_temptative=100000)
+ else:
+ segments_P1 , segments_P0 = generate_rods_random_conformation(rosettes_lengths, rosette_radius,
+ confining_environment,
+ max_number_of_temptative=100000)
+
+ # Roto-translation of the rosettes according to the segment position and orientation
+ final_rosettes = rosettes_rototranslation(init_rosettes, segments_P1, segments_P0)
+
+ # Checking that the beads are all inside the confining environment and are not overlapping
+ for rosette_pair in list(combinations(final_rosettes,2)):
+ molecule0 = list(zip(rosette_pair[0]['x'],rosette_pair[0]['y'],rosette_pair[0]['z']))
+ molecule1 = list(zip(rosette_pair[1]['x'],rosette_pair[1]['y'],rosette_pair[1]['z']))
+ distances = spatial.distance.cdist(molecule1,molecule0)
+ print(len(molecule0),len(molecule0[0]),distances.min())
+ if distances.min() < particle_radius*2.0*0.95:
+ particles_overlap = 0
+ break
+
+ if particles_overlap != 0:
+ for r in range(len(final_rosettes)):
+ molecule0 = list(zip(final_rosettes[r]['x'],final_rosettes[r]['y'],final_rosettes[r]['z']))
+ print(len(molecule0),len(molecule0[0]))
+
+ distances = spatial.distance.cdist(molecule0,molecule0)
+ print(distances.min())
+ for i in range(len(molecule0)):
+ for j in range(i+1,len(molecule0)):
+ if distances[(i,j)] < particle_radius*2.0*0.95:
+ particles_overlap = 0
+ if particles_overlap == 0:
+ break
+ if particles_overlap == 0:
+ break
+ if particles_overlap == 0:
+ break
+
+ # Writing the final_rosettes conformation
+ print("Successfully generated conformation number %d\n" % (cnt+1))
+ write_initial_conformation_file(final_rosettes,
+ chromosome_particle_numbers,
+ confining_environment,
+ out_file=outfile,
+ atom_types=atom_types)
+
+##########
+
+def generate_chromosome_rosettes_conformation_with_pbc ( chromosome_particle_numbers ,
+ fractional_radial_positions=None,
+ confining_environment=['cube',100.] ,
+ rosette_radius=12.0 , particle_radius=0.5 ,
+ seed_of_the_random_number_generator=1 ,
+ number_of_conformations=1,
+ outfile = "Initial_rosette_conformation_with_pbc.dat",
+ atom_types=1):
+ """
+ Generates lammps initial conformation file by rosettes conformation
+
+ :param chromosome_particle_numbers: list with the number of particles of each chromosome. 
+ :param None fractional_radial_positions: list with fractional radial positions for all the chromosomes. + :param ['cube',100.] confining_environment: dictionary with the confining environment of the conformation + Possible confining environments: + ['cube',edge_width] + :param 0.5 particle_radius: Radius of each particle. + :param 1 seed_of_the_random_number_generator: random seed. + :param 1 number_of_conformations: copies of the conformation. + :param outfile: file where to store resulting initial conformation file + + """ + seed(seed_of_the_random_number_generator) + + # This allows to organize the largest chromosomes first. + # This is to get a better acceptance of the chromosome positioning. + chromosome_particle_numbers = [int(x) for x in chromosome_particle_numbers] + chromosome_particle_numbers.sort(key=int,reverse=True) + + initial_rosettes , rosettes_lengths = generate_rosettes(chromosome_particle_numbers, + rosette_radius, + particle_radius) + print(rosettes_lengths) + + + # Constructing the rosettes conformations + for cnt in range(number_of_conformations): + + particles_overlap = 0 # 0 means two particles are overlapping + while particles_overlap == 0: + particles_overlap = 1 + segments_P1 = [] + segments_P0 = [] + side = 0 + init_rosettes = copy.deepcopy(initial_rosettes) + + # Guess of the initial segment conformation: + # 1 - each rod is placed in a random position and with random orientation + # 2 - possible clashes between generated rods are checked taking into account pbc + segments_P1 , segments_P0 = generate_rods_random_conformation_with_pbc ( + rosettes_lengths, + rosette_radius, + confining_environment, + max_number_of_temptative=100000) + + # Roto-translation of the rosettes according to the segment position and orientation + final_rosettes = rosettes_rototranslation(init_rosettes, segments_P1, segments_P0) + + # Checking that the beads once folded inside the simulation box (for pbc) are not overlapping + folded_rosettes = copy.deepcopy(final_rosettes) + for r in range(len(folded_rosettes)): + particle = 0 + for x, y, z in zip(folded_rosettes[r]['x'],folded_rosettes[r]['y'],folded_rosettes[r]['z']): + #inside_1 = check_point_inside_the_confining_environment(x, y, z, + # particle_radius, + # confining_environment) + #if inside_1 == 0: + # print inside_1, r, particle, x, y, z + + while x > (confining_environment[1]*0.5): + x -= confining_environment[1] + while x < -(confining_environment[1]*0.5): + x += confining_environment[1] + + while y > (confining_environment[1]*0.5): + y -= confining_environment[1] + while y < -(confining_environment[1]*0.5): + y += confining_environment[1] + + while z > (confining_environment[1]*0.5): + z -= confining_environment[1] + while z < -(confining_environment[1]*0.5): + z += confining_environment[1] + + #inside_2 = check_point_inside_the_confining_environment(x, y, z, + # particle_radius, + # confining_environment) + #if inside_2 == 1 and inside_1 == 0: + # print inside_2, r, particle, x, y, z + folded_rosettes[r]['x'][particle] = x + folded_rosettes[r]['y'][particle] = y + folded_rosettes[r]['z'][particle] = z + particle += 1 + + for rosette_pair in list(combinations(folded_rosettes,2)): + + for x0,y0,z0 in zip(rosette_pair[0]['x'],rosette_pair[0]['y'],rosette_pair[0]['z']): + for x1,y1,z1 in zip(rosette_pair[1]['x'],rosette_pair[1]['y'],rosette_pair[1]['z']): + + particles_overlap = check_particles_overlap(x0,y0,z0,x1,y1,z1,particle_radius) + + if particles_overlap == 0: # 0 means that the particles are overlapping -> PROBLEM!!! 
+ print("Particle",x0,y0,z0,"and",x1,y1,z1,"overlap\n") + break + if particles_overlap == 0: + break + if particles_overlap == 0: + break + + # Writing the final_rosettes conformation + print("Succesfully generated conformation number %d\n" % (cnt+1)) + write_initial_conformation_file(final_rosettes, + chromosome_particle_numbers, + confining_environment, + out_file=outfile, + atom_types=atom_types) + +########## + +def generate_rosettes(chromosome_particle_numbers, rosette_radius, particle_radius): + # Genaration of the rosettes + # XXXA. Rosa publicationXXX + # List to contain the rosettes and the rosettes lengths + rosettes = [] + rosettes_lengths = [] + + for number_of_particles in chromosome_particle_numbers: + + # Variable to build the chain + phi = 0.0 + + # Dictory of lists to contain the rosette + rosette = {} + rosette['x'] = [] + rosette['y'] = [] + rosette['z'] = [] + + # Position of the first particle (x_0, 0.0, 0.0) + k = 6. + x = 0.38 + p = 1.0 + rosette['x'].append(rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * cos(phi)) + rosette['y'].append(rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * sin(phi)) + rosette['z'].append(p * phi / (2.0 * pi)) + #print "First bead is in position: %f %f %f" % (rosette['x'][0], rosette['y'][0], rosette['z'][0]) + + # Building the chain: The rosette is growing along the positive z-axes + for particle in range(1,number_of_particles): + + distance = 0.0 + while distance < (particle_radius*2.0): + phi = phi + 0.001 + x_tmp = rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * cos(phi) + y_tmp = rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * sin(phi) + z_tmp = phi / (2.0 * pi) + distance = sqrt((x_tmp - rosette['x'][-1])*(x_tmp - rosette['x'][-1]) + + (y_tmp - rosette['y'][-1])*(y_tmp - rosette['y'][-1]) + + (z_tmp - rosette['z'][-1])*(z_tmp - rosette['z'][-1])) + + rosette['x'].append(x_tmp) + rosette['y'].append(y_tmp) + rosette['z'].append(z_tmp) + if distance > ((particle_radius*2.0)*1.2): + print("%f %d %d %d" % (distance, particle-1, particle)) + + rosettes.append(rosette) + rosettes_lengths.append(rosette['z'][-1]-rosette['z'][0]) + + return rosettes , rosettes_lengths + +########## + +def generate_rods_biased_conformation(rosettes_lengths, rosette_radius, + confining_environment, + fractional_radial_positions, + max_number_of_temptative=100000): + # Construction of the rods initial conformation + segments_P0 = [] + segments_P1 = [] + + if confining_environment[0] != 'sphere': + print("ERROR: Biased chromosome positioning is currently implemented") + print("only for spherical confinement. 
If you need other shapes, please")
+ print("contact the developers")
+
+ for length , target_radial_position in zip(rosettes_lengths,fractional_radial_positions):
+ tentative = 0
+ clashes = 0 # 0 means that there is a clash -> PROBLEM
+ best_radial_position = 1.0
+ best_radial_distance = 1.0
+ best_segment_P0 = []
+ best_segment_P1 = []
+
+ # Positioning the rods
+ while tentative < 100000 and best_radial_distance > 0.00005:
+
+ print("Length = %f" % length)
+
+ print("Trying to position terminus 0")
+ segment_P0_tmp = draw_point_inside_the_confining_environment(confining_environment,
+ rosette_radius)
+ print("Successfully positioned terminus 0: %f %f %f" % (segment_P0_tmp[0], segment_P0_tmp[1], segment_P0_tmp[2]))
+
+ print("Trying to position terminus 1")
+ segment_P1_tmp = draw_second_extreme_of_a_segment_inside_the_confining_environment(segment_P0_tmp[0],
+ segment_P0_tmp[1],
+ segment_P0_tmp[2],
+ length,
+ rosette_radius,
+ confining_environment)
+ print("Successfully positioned terminus 1: %f %f %f" % (segment_P1_tmp[0], segment_P1_tmp[1], segment_P1_tmp[2]))
+
+ # Check clashes with the previously positioned rods
+ clashes = 1
+ for segment_P1,segment_P0 in zip(segments_P1,segments_P0):
+ clashes = check_segments_clashes(segment_P1,
+ segment_P0,
+ segment_P1_tmp,
+ segment_P0_tmp,
+ rosette_radius)
+ if clashes == 0:
+ break
+
+ if clashes == 1:
+ # Check whether the midpoint of the segment is close to the target radial position
+ segment_midpoint = []
+ segment_midpoint.append((segment_P0_tmp[0] + segment_P1_tmp[0])*0.5)
+ segment_midpoint.append((segment_P0_tmp[1] + segment_P1_tmp[1])*0.5)
+ segment_midpoint.append((segment_P0_tmp[2] + segment_P1_tmp[2])*0.5)
+
+ radial_position = sqrt( ( segment_midpoint[0] * segment_midpoint[0] +
+ segment_midpoint[1] * segment_midpoint[1] +
+ segment_midpoint[2] * segment_midpoint[2] ) /
+ (confining_environment[1]*confining_environment[1]))
+
+ radial_distance = fabs(radial_position-target_radial_position)
+
+ print(radial_position , target_radial_position , radial_distance , best_radial_distance , tentative)
+
+ # If the midpoint of the segment is closer to the target radial position
+ # than any previous guess, store these points as the best guess so far.
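+ # (Numerical illustration: with target_radial_position = 0.6, successive
+ # guesses at radial_position 0.9, 0.7 and 0.62 shrink best_radial_distance
+ # from 0.3 to 0.1 to 0.02; the loop stops once it drops below 0.00005 or
+ # the tentative budget is exhausted.)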
+ if radial_distance < best_radial_distance: + best_radial_distance = radial_distance + best_radial_position = radial_position + best_tentative = tentative+1 # The variable tentative starts from 0 + + best_segment_P0 = [] + best_segment_P1 = [] + for component_P0 , component_P1 in zip(segment_P0_tmp,segment_P1_tmp): + best_segment_P0.append(component_P0) + best_segment_P1.append(component_P1) + + tentative = tentative + 1 + + if best_segment_P0 == []: + print("Valid placement not found for chromosome rosette after %d tentatives" % tentative) + sys.exit() + + print("Successfully positioned chromosome of length %lf at tentative %d of %d tentatives" % (length, best_tentative, tentative)) + segments_P0.append(best_segment_P0) + segments_P1.append(best_segment_P1) + + print("Successfully generated rod conformation!") + return segments_P1 , segments_P0 + +########## + +def generate_rods_random_conformation(rosettes_lengths, rosette_radius, + confining_environment, + max_number_of_temptative=100000): + # Construction of the rods initial conformation + segments_P0 = [] + segments_P1 = [] + + for length in rosettes_lengths: + tentative = 0 + clashes = 0 + # Random positioning of the rods + while tentative < 100000 and clashes == 0: + + tentative += 1 + clashes = 1 + #print "Length = %f" % length + + print("Trying to position terminus 0") + #pick uniformly within the confining environment using the rejection method + first_point = [] + first_point = draw_point_inside_the_confining_environment(confining_environment, + rosette_radius) + + print("Successfully positioned terminus 0: %f %f %f" % (first_point[0], first_point[1], first_point[2])) + + print("Trying to position terminus 1") + #pick from P0 another point one the sphere of radius length inside the confining environment + last_point = [] + last_point = draw_second_extreme_of_a_segment_inside_the_confining_environment(first_point[0], + first_point[1], + first_point[2], + length, + rosette_radius, + confining_environment) + + print("Successfully positioned terminus 1: %f %f %f" % (last_point[0], last_point[1], last_point[2])) + + # Check clashes with the previously positioned rods + clashes = 1 + for segment_P1,segment_P0 in zip(segments_P1,segments_P0): + clashes = check_segments_clashes(segment_P1, + segment_P0, + last_point, + first_point, + rosette_radius) + if clashes == 0: + break + + #print clashes + print("Successfully positioned chromosome of length %lf at tentative %d\n" % (length, tentative)) + segments_P1.append(last_point) + segments_P0.append(first_point) + + print("Successfully generated rod conformation!") + return segments_P1 , segments_P0 + +########## + +def generate_rods_random_conformation_with_pbc(rosettes_lengths, rosette_radius, + confining_environment, + max_number_of_temptative=100000): + + # Construction of the rods initial conformation + segments_P0 = [] + segments_P1 = [] + + for length in rosettes_lengths: + tentative = 0 + clashes = 0 + # Random positioning of the rods + while tentative < 100000 and clashes == 0: + + tentative += 1 + clashes = 1 + #print "Length = %f" % length + + print("Trying to position terminus 0") + #pick uniformly within the confining environment using the rejection method + first_point = [] + first_point = draw_point_inside_the_confining_environment(confining_environment, + rosette_radius) + + print("Successfully positioned terminus 0: %f %f %f" % (first_point[0], first_point[1], first_point[2])) + + print("Trying to position terminus 1") + #pick from P0 another point one the sphere of radius 
length inside the confining environment + last_point = [] + last_point = draw_second_extreme_of_a_segment(first_point[0], + first_point[1], + first_point[2], + length, + rosette_radius) + + print(last_point) + # Check clashes with the previously positioned rods + for segment_P1,segment_P0 in zip(segments_P1,segments_P0): + clashes = check_segments_clashes_with_pbc(segment_P1, + segment_P0, + last_point, + first_point, + rosette_radius, + confining_environment) + if clashes == 0: + break + + #print clashes + print("Successfully positioned chromosome of length %lf at tentative %d\n" % (length, tentative)) + segments_P1.append(last_point) + segments_P0.append(first_point) + + print("Successfully generated rod conformation!") + return segments_P1 , segments_P0 + +########## + +def generate_random_walks(chromosome_particle_numbers, + particle_radius, + confining_environment, + center, + pbc=False): + # Construction of the random walks initial conformation + random_walks = [] + + for number_of_particles in chromosome_particle_numbers: + #print "Trying to position random walk" + random_walk = {} + random_walk['x'] = [] + random_walk['y'] = [] + random_walk['z'] = [] + + + #print "Positioning first particle" + particle_overlap = 0 + while particle_overlap == 0: + particle_overlap = 1 + first_particle = [] + first_particle = draw_point_inside_the_confining_environment(confining_environment, + particle_radius) + + # Check if the particle is overlapping with any other particle in the system + for rand_walk in random_walks: + if pbc: + particle_overlap = check_particle_vs_all_overlap(first_particle[0], + first_particle[1], + first_particle[2], + rand_walk, + 2.0*particle_radius) + else: + particle_overlap = check_particle_vs_all_overlap(first_particle[0], + first_particle[1], + first_particle[2], + rand_walk, + 2.0*particle_radius) + + if particle_overlap == 0: + break + + random_walk['x'].append(first_particle[0]) + random_walk['y'].append(first_particle[1]) + random_walk['z'].append(first_particle[2]) + + for particle in range(1,number_of_particles): + #print "Positioning particle %d" % (particle+1) + particle_overlap = 0 # 0 means that there is an overlap -> PROBLEM + overlapCounter = -1 + maxIter = 1000 + while particle_overlap == 0: + overlapCounter += 1 + if overlapCounter > maxIter: + # raise error so log file is created to avoid k_seed + errorName = 'ERROR: Initial conformation non ending loop' + raise InitalConformationError(errorName) + particle_overlap = 1 + new_particle = [] + if pbc: + new_particle = draw_second_extreme_of_a_segment( + random_walk['x'][-1], + random_walk['y'][-1], + random_walk['z'][-1], + 2.0*particle_radius, + 2.0*particle_radius) + else: + new_particle = draw_second_extreme_of_a_segment_inside_the_confining_environment( + random_walk['x'][-1], + random_walk['y'][-1], + random_walk['z'][-1], + 2.0*particle_radius, + 2.0*particle_radius, + confining_environment) + + # Check if the particle is overlapping with any other particle in the system + for rand_walk in random_walks: + particle_overlap = check_particle_vs_all_overlap(new_particle[0], + new_particle[1], + new_particle[2], + rand_walk, + 2.0*particle_radius) + if particle_overlap == 0: + break + if particle_overlap == 0: + continue + + # The current random walk is not yet in the list above + particle_overlap = check_particle_vs_all_overlap(new_particle[0], + new_particle[1], + new_particle[2], + random_walk, + 2.0*particle_radius) + if particle_overlap == 0: + continue + + random_walk['x'].append(new_particle[0]) + 
+def check_particle_vs_all_overlap(x, y, z, chromosome, overlap_radius):
+    # Return 0 as soon as one particle of 'chromosome' is closer than
+    # overlap_radius to (x, y, z); return 1 otherwise
+    for x0, y0, z0 in zip(chromosome['x'], chromosome['y'], chromosome['z']):
+        if check_particles_overlap(x0, y0, z0, x, y, z, overlap_radius) == 0:
+            return 0
+    return 1
+
+##########
+
+def draw_second_extreme_of_a_segment_inside_the_confining_environment(x0, y0, z0,
+                                                                      segment_length,
+                                                                      object_radius,
+                                                                      confining_environment):
+    # Rejection sampling: re-draw the extreme until it falls inside the
+    # confining environment
+    inside = 0
+    while inside == 0:
+        particle = draw_second_extreme_of_a_segment(x0, y0, z0,
+                                                    segment_length,
+                                                    object_radius)
+        inside = check_point_inside_the_confining_environment(particle[0],
+                                                              particle[1],
+                                                              particle[2],
+                                                              object_radius,
+                                                              confining_environment)
+
+    return particle
+
+##########
+
+def draw_second_extreme_of_a_segment(x0, y0, z0,
+                                     segment_length,
+                                     object_radius):
+    # Draw a uniformly distributed direction on the unit sphere and place
+    # the second extreme at segment_length from (x0, y0, z0)
+    particle = []
+    temp_theta = arccos(2.0*random()-1.0)
+    temp_phi = 2*pi*random()
+    particle.append(x0 + segment_length * cos(temp_phi) * sin(temp_theta))
+    particle.append(y0 + segment_length * sin(temp_phi) * sin(temp_theta))
+    particle.append(z0 + segment_length * cos(temp_theta))
+
+    return particle
+
+##########
+
+def draw_point_inside_the_confining_environment(confining_environment, object_radius):
+    # Pick a point uniformly within the confining environment using the
+    # rejection method
+    if confining_environment[0] == 'cube':
+        dimension_x = confining_environment[1] * 0.5
+        dimension_y = confining_environment[1] * 0.5
+        dimension_z = confining_environment[1] * 0.5
+        if len(confining_environment) > 2:
+            print("# WARNING: Defined a cubical confining environment with redundant parameters.")
+            print("# Only 2 are needed: the identifier and the side")
+
+    if confining_environment[0] == 'sphere':
+        dimension_x = confining_environment[1]
+        dimension_y = confining_environment[1]
+        dimension_z = confining_environment[1]
+        if len(confining_environment) > 2:
+            print("# WARNING: Defined a spherical confining environment with redundant parameters.")
+            print("# Only 2 are needed: the identifier and the radius")
+
+    if confining_environment[0] == 'ellipsoid':
+        if len(confining_environment) < 4:
+            print("# ERROR: Defined an ellipsoidal confining environment without the necessary parameters.")
+            print("# 4 are needed: the identifier, the x-semiaxis, the y-semiaxis, and the z-semiaxis")
+            sys.exit()
+        dimension_x = confining_environment[1]
+        dimension_y = confining_environment[2]
+        dimension_z = confining_environment[3]
+
+    if confining_environment[0] == 'cylinder':
+        if len(confining_environment) < 3:
+            print("# ERROR: Defined a cylindrical confining environment without the necessary parameters.")
+            print("# 3 are needed: the identifier, the basal radius, and the height")
+            sys.exit()
+        dimension_x = confining_environment[1]
+        dimension_y = confining_environment[1]
+        dimension_z = confining_environment[2]
+
+    inside = 0
+    while inside == 0:
+        particle = []
+        particle.append((2.0*random()-1.0)*(dimension_x - object_radius))
+        particle.append((2.0*random()-1.0)*(dimension_y - object_radius))
+        particle.append((2.0*random()-1.0)*(dimension_z - object_radius))
+        # Check if the particle is inside the confining_environment
+        inside = check_point_inside_the_confining_environment(particle[0],
+                                                              particle[1],
+                                                              particle[2],
+                                                              object_radius,
+                                                              confining_environment)
+
+    return particle
+
+##########
+
+def check_point_inside_the_confining_environment(Px, Py, Pz,
+                                                 object_radius,
+                                                 confining_environment):
+    # The shapes are all centered at the origin:
+    # - sphere    : radius r
+    # - cube      : side
+    # - cylinder  : basal radius , height
+    # - ellipsoid : semi-axes a , b , c
+
+    if confining_environment[0] == 'sphere':
+        radius = confining_environment[1] - object_radius
+        if ((Px*Px) + (Py*Py) + (Pz*Pz)) / (radius*radius) < 1.0: return 1
+
+    if confining_environment[0] == 'ellipsoid':
+        a = confining_environment[1] - object_radius
+        b = confining_environment[2] - object_radius
+        c = confining_environment[3] - object_radius
+        if ((Px*Px)/(a*a) + (Py*Py)/(b*b) + (Pz*Pz)/(c*c)) < 1.0: return 1
+
+    if confining_environment[0] == 'cube':
+        hside = confining_environment[1] * 0.5 - object_radius
+        if (((Px*Px)/(hside*hside)) < 1.0) and (((Py*Py)/(hside*hside)) < 1.0) and (((Pz*Pz)/(hside*hside)) < 1.0): return 1
+
+    if confining_environment[0] == 'cylinder':
+        radius = confining_environment[1] - object_radius
+        half_height = confining_environment[2]*0.5 - object_radius
+        if (((Px*Px) + (Py*Py))/(radius*radius) < 1.0) and (((Pz*Pz)/(half_height*half_height)) < 1.0): return 1
+
+    return 0
+
+##########
+
+def check_segments_clashes(s1_P1, s1_P0, s2_P1, s2_P0, rosette_radius):
+    # Check steric clashes without periodic boundary conditions:
+    # 0 -> clash, 1 -> no clash
+    if distance_between_segments(s1_P1, s1_P0, s2_P1, s2_P0) < 2.0*rosette_radius:
+        return 0
+
+    return 1
+
+##########
+
+def check_segments_clashes_with_pbc(s1_P1, s1_P0, s2_P1, s2_P0,
+                                    rosette_radius,
+                                    confining_environment):
+    # NOTE: currently identical to check_segments_clashes; the
+    # confining_environment argument is accepted for API symmetry, but
+    # clashes with periodic images are not (yet) checked
+    if distance_between_segments(s1_P1, s1_P0, s2_P1, s2_P0) < 2.0*rosette_radius:
+        return 0
+
+    return 1
+
+##########
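+# Illustrative check of the clash test (the coordinates are made up for
+# the example): two parallel rods along x, a distance 1.0 apart, clash
+# for any rosette_radius > 0.5.
+#
+#     check_segments_clashes([10.0, 0.0, 0.0], [0.0, 0.0, 0.0],
+#                            [10.0, 1.0, 0.0], [0.0, 1.0, 0.0],
+#                            1.0)   # -> 0 (clash)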
+def distance_between_segments(s1_P1, s1_P0, s2_P1, s2_P0):
+    # Closest distance between two finite segments.
+    # Inspiration: http://softsurfer.com/Archive/algorithm_0106/algorithm_0106.htm
+    # Copyright 2001, softSurfer (www.softsurfer.com)
+    # This code may be freely used and modified for any purpose
+    # providing that this copyright notice is included with it.
+    # SoftSurfer makes no warranty for this code, and cannot be held
+    # liable for any real or imagined damage resulting from its use.
+    # Users of this code must verify correctness for their application.
+
+    u = []
+    v = []
+    w = []
+    dP = []
+
+    for c_s1_P1, c_s1_P0, c_s2_P1, c_s2_P0 in zip(s1_P1, s1_P0, s2_P1, s2_P0):
+        u.append(c_s1_P1 - c_s1_P0)
+        v.append(c_s2_P1 - c_s2_P0)
+        w.append(c_s1_P0 - c_s2_P0)
+
+    a = scalar_product(u, u)
+    b = scalar_product(u, v)
+    c = scalar_product(v, v)
+    d = scalar_product(u, w)
+    e = scalar_product(v, w)
+
+    D = a*c - b*b
+    sD = tD = D
+
+    if D < (1.0e-7):
+        # Segments almost parallel
+        sN = 0.0
+        sD = 1.0
+        tN = e
+        tD = c
+    else:
+        # Get the closest points on the infinite lines
+        sN = (b*e - c*d)
+        tN = (a*e - b*d)
+        if sN < 0.0:  # sc < 0 => the s=0 edge is visible
+            sN = 0.0
+            tN = e
+            tD = c
+        elif sN > sD:  # sc > 1 => the s=1 edge is visible
+            sN = sD
+            tN = e + b
+            tD = c
+
+    if tN < 0.0:  # tc < 0 => the t=0 edge is visible
+        tN = 0.0
+        # Recompute sc for this edge
+        if -d < 0.0:
+            sN = 0.0
+        elif -d > a:
+            sN = sD
+        else:
+            sN = -d
+            sD = a
+    elif tN > tD:  # tc > 1 => the t=1 edge is visible
+        tN = tD
+        # Recompute sc for this edge
+        if (-d + b) < 0.0:
+            sN = 0.0
+        elif (-d + b) > a:
+            sN = sD
+        else:
+            sN = (-d + b)
+            sD = a
+
+    # Finally do the division to get sc and tc
+    if abs(sN) < (1.0e-7):
+        sc = 0.0
+    else:
+        sc = sN / sD
+
+    if abs(tN) < (1.0e-7):
+        tc = 0.0
+    else:
+        tc = tN / tD
+
+    # Get the difference of the two closest points
+    for i in range(len(w)):
+        dP.append(w[i] + (sc * u[i]) - (tc * v[i]))  # = S1(sc) - S2(tc)
+
+    return norm(dP)  # the closest distance
+
+##########
+
+def rosettes_rototranslation(rosettes, segments_P1, segments_P0):
+    # Rotate each rosette so that its axis is aligned with the
+    # corresponding rod, then translate it onto the rod's first extreme
+    for i in range(len(segments_P1)):
+        vector = []
+        theta = []
+
+        for component_P1, component_P0 in zip(segments_P1[i], segments_P0[i]):
+            vector.append(component_P1 - component_P0)
+
+        # Rotation angles
+        theta.append(atan2(vector[1], vector[2]))
+
+        x_temp_2 = vector[0]
+        y_temp_2 = cos(theta[0]) * vector[1] - sin(theta[0]) * vector[2]
+        z_temp_2 = sin(theta[0]) * vector[1] + cos(theta[0]) * vector[2]
+        theta.append(atan2(x_temp_2, z_temp_2))
+
+        x_temp_1 = cos(theta[1]) * x_temp_2 - sin(theta[1]) * z_temp_2
+        y_temp_1 = y_temp_2
+        z_temp_1 = sin(theta[1]) * x_temp_2 + cos(theta[1]) * z_temp_2
+
+        if z_temp_1 < 0.0:
+            z_temp_1 = -z_temp_1
+            theta.append(pi)
+        else:
+            theta.append(0.0)
+
+        # Chromosome roto-translations
+        for particle in range(len(rosettes[i]['x'])):
+            x_temp_2 = rosettes[i]['x'][particle]
+            y_temp_2 = cos(theta[2]) * rosettes[i]['y'][particle] + sin(theta[2]) * rosettes[i]['z'][particle]
+            z_temp_2 = - sin(theta[2]) * rosettes[i]['y'][particle] + cos(theta[2]) * rosettes[i]['z'][particle]
+
+            x_temp_1 = cos(theta[1]) * x_temp_2 + sin(theta[1]) * z_temp_2
+            y_temp_1 = y_temp_2
+            z_temp_1 = - sin(theta[1]) * x_temp_2 + cos(theta[1]) * z_temp_2
+
+            x = x_temp_1
+            y = cos(theta[0]) * y_temp_1 + sin(theta[0]) * z_temp_1
+            z = - sin(theta[0]) * y_temp_1 + cos(theta[0]) * z_temp_1
+
+            # Chromosome translations
+            rosettes[i]['x'][particle] = segments_P0[i][0] + x
+            rosettes[i]['y'][particle] = segments_P0[i][1] + y
+            rosettes[i]['z'][particle] = segments_P0[i][2] + z
+    return rosettes
+
+##########
+
+def scalar_product(a, b):
+    scalar = 0.0
+    for c_a, c_b in zip(a, b):
+        scalar = scalar + c_a*c_b
+
+    return scalar
+
+##########
+
+def norm(a):
+    return sqrt(scalar_product(a, a))
+
+##########
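+# Quick sanity check of the segment-distance routine (hypothetical
+# values): two skew segments, one along x at z=0 and one along y at
+# z=3.0, are exactly 3.0 apart.
+#
+#     distance_between_segments([5.0, 0.0, 0.0], [-5.0, 0.0, 0.0],
+#                               [0.0, 5.0, 3.0], [0.0, -5.0, 3.0])   # -> 3.0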
+def write_initial_conformation_file(chromosomes,
+                                    chromosome_particle_numbers,
+                                    confining_environment,
+                                    out_file="Initial_conformation.dat",
+                                    atom_types=1,
+                                    angle_types=1,
+                                    bond_types=1):
+    # Choose the appropriate xlo, xhi, etc. depending on the confining environment
+    xlim = []
+    ylim = []
+    zlim = []
+    if confining_environment[0] == 'sphere':
+        radius = confining_environment[1] + 1.0
+        xlim.append(-radius)
+        xlim.append(radius)
+        ylim.append(-radius)
+        ylim.append(radius)
+        zlim.append(-radius)
+        zlim.append(radius)
+
+    if confining_environment[0] == 'ellipsoid':
+        a = confining_environment[1] + 1.0
+        b = confining_environment[2] + 1.0
+        c = confining_environment[3] + 1.0
+        xlim.append(-a)
+        xlim.append(a)
+        ylim.append(-b)
+        ylim.append(b)
+        zlim.append(-c)
+        zlim.append(c)
+
+    if confining_environment[0] == 'cube':
+        hside = confining_environment[1] * 0.5
+        xlim.append(-hside)
+        xlim.append(hside)
+        ylim.append(-hside)
+        ylim.append(hside)
+        zlim.append(-hside)
+        zlim.append(hside)
+
+    if confining_environment[0] == 'cylinder':
+        radius = confining_environment[1] + 1.0
+        hheight = confining_environment[2] * 0.5 + 1.0
+        xlim.append(-radius)
+        xlim.append(radius)
+        ylim.append(-radius)
+        ylim.append(radius)
+        zlim.append(-hheight)
+        zlim.append(hheight)
+
+    fileout = open(out_file, 'w')
+    n_chr = len(chromosomes)
+    n_atoms = 0
+    for n in chromosome_particle_numbers:
+        n_atoms += n
+
+    fileout.write("LAMMPS input data file \n\n")
+    fileout.write("%9d atoms\n" % (n_atoms))
+    fileout.write("%9d bonds\n" % (n_atoms-n_chr))
+    fileout.write("%9d angles\n\n" % (n_atoms-2*n_chr))
+    fileout.write("%9s atom types\n" % atom_types)
+    fileout.write("%9s bond types\n" % bond_types)
+    fileout.write("%9s angle types\n\n" % angle_types)
+    fileout.write("%6.3lf %6.3lf xlo xhi\n" % (xlim[0], xlim[1]))
+    fileout.write("%6.3lf %6.3lf ylo yhi\n" % (ylim[0], ylim[1]))
+    fileout.write("%6.3lf %6.3lf zlo zhi\n" % (zlim[0], zlim[1]))
+
+    fileout.write("\n Atoms \n\n")
+    particle_number = 1
+    for chromosome in chromosomes:
+        for x, y, z in zip(chromosome['x'], chromosome['y'], chromosome['z']):
+            fileout.write("%-8d %s %s %7.4lf %7.4lf %7.4lf\n" % (particle_number, "1", "1", x, y, z))
+            particle_number += 1
+
+    fileout.write("\n Bonds \n\n")
+    bond_number = 1
+    first_particle_index = 1
+    for chromosome in chromosomes:
+        for i in range(len(chromosome['x'])-1):
+            fileout.write("%-4d %s %4d %4d\n" % (bond_number, "1", first_particle_index, first_particle_index+1))
+            bond_number += 1
+            first_particle_index += 1
+        first_particle_index += 1  # Jump to the first particle of the next chromosome
+
+    fileout.write("\n Angles \n\n")
+    angle_number = 1
+    first_particle_index = 1
+    for chromosome in chromosomes:
+        for i in range(len(chromosome['x'])-2):
+            fileout.write("%-4d %s %5d %5d %5d\n" % (angle_number, "1", first_particle_index, first_particle_index+1, first_particle_index+2))
+            angle_number += 1
+            first_particle_index += 1
+        first_particle_index += 2  # Jump to the first particle of the next chromosome
+
+    fileout.close()
+
+##########
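+# A minimal sketch of how the writer above is meant to be used (the
+# conformation, box side and file name are illustrative, not fixed
+# defaults of the library):
+#
+#     walks = generate_random_walks([100], 0.5, ('cube', 40.0), center=True)
+#     write_initial_conformation_file(walks,
+#                                     [100],
+#                                     ('cube', 40.0),
+#                                     out_file="initial_conformation.dat")
+#
+# The resulting file follows the LAMMPS "data" format: counts and box
+# bounds in the header, then Atoms, Bonds, and Angles sections with one
+# linear bond/angle topology per chromosome.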
+def distance(x0, y0, z0, x1, y1, z1):
+    return sqrt((x0-x1)*(x0-x1)+(y0-y1)*(y0-y1)+(z0-z1)*(z0-z1))
+
+##########
+
+def check_particles_overlap(x0, y0, z0, x1, y1, z1, overlap_radius):
+    # 0 -> the two particles overlap, 1 -> they do not
+    if distance(x0, y0, z0, x1, y1, z1) < overlap_radius:
+        return 0
+    return 1
+
+##########
+
+def store_conformation_with_pbc(xc, result, confining_environment):
+    # Reconstruct the different molecules and store them separately
+    ix, iy, iz = (0, 0, 0)
+    ix_tmp, iy_tmp, iz_tmp = (0, 0, 0)
+    x_tmp, y_tmp, z_tmp = (0, 0, 0)
+
+    molecule_number = 0  # We start to count from molecule number 0
+
+    particles = []
+    particles.append({})
+    particles[molecule_number]['x'] = []
+    particles[molecule_number]['y'] = []
+    particles[molecule_number]['z'] = []
+
+    particle_counts = []
+    particle_counts.append({})  # Initializing the particle counts for the first molecule
+
+    max_bond_length = (1.5*1.5)  # Default polymer-based squared bond length
+
+    for i in range(0, len(xc), 3):
+        particle = int(i/3)
+
+        x = xc[i]   + ix * confining_environment[1]
+        y = xc[i+1] + iy * confining_environment[1]
+        z = xc[i+2] + iz * confining_environment[1]
+
+        # A - Check whether the molecule is broken because of pbc
+        #     or if we are changing molecule
+        if particle > 0:
+            # Compute the squared bond length
+            bond_length = (particles[molecule_number]['x'][-1] - x)* \
+                          (particles[molecule_number]['x'][-1] - x)+ \
+                          (particles[molecule_number]['y'][-1] - y)* \
+                          (particles[molecule_number]['y'][-1] - y)+ \
+                          (particles[molecule_number]['z'][-1] - z)* \
+                          (particles[molecule_number]['z'][-1] - z)
+
+            # Check whether the bond is too long. This could mean:
+            # 1 - Same molecule disjoint by pbc
+            # 2 - Different molecules
+            if bond_length > max_bond_length:
+                min_bond_length = bond_length
+                x_tmp, y_tmp, z_tmp = (x, y, z)
+
+                # Check if we are in case 1: the same molecule continues
+                # in a nearby box
+                index_sets = product([-1, 0, 1],
+                                     [-1, 0, 1],
+                                     [-1, 0, 1])
+
+                for (l, m, n) in index_sets:
+                    # Avoid checking again the same periodic copy
+                    if (l, m, n) == (0, 0, 0):
+                        continue
+
+                    # Propose a new particle position
+                    x = xc[i]   + (ix + l) * confining_environment[1]
+                    y = xc[i+1] + (iy + m) * confining_environment[1]
+                    z = xc[i+2] + (iz + n) * confining_environment[1]
+
+                    # Check the new squared bond length
+                    bond_length = (particles[molecule_number]['x'][-1] - x)* \
+                                  (particles[molecule_number]['x'][-1] - x)+ \
+                                  (particles[molecule_number]['y'][-1] - y)* \
+                                  (particles[molecule_number]['y'][-1] - y)+ \
+                                  (particles[molecule_number]['z'][-1] - z)* \
+                                  (particles[molecule_number]['z'][-1] - z)
+
+                    # Store the periodic copy with the minimum bond length
+                    if bond_length < min_bond_length:
+                        x_tmp, y_tmp, z_tmp = (x, y, z)
+                        ix_tmp, iy_tmp, iz_tmp = (ix+l, iy+m, iz+n)
+                        min_bond_length = bond_length
+
+                # If the minimum bond length is still too large,
+                # we are dealing with case 2
+                if min_bond_length > 10.:
+                    # Start another molecule
+                    molecule_number += 1
+
+                    particles.append({})
+                    particles[molecule_number]['x'] = []
+                    particles[molecule_number]['y'] = []
+                    particles[molecule_number]['z'] = []
+
+                    particle_counts.append({})  # Initializing the particle counts for the new molecule
+
+                # If the minimum bond length is sufficiently short,
+                # we are dealing with case 1
+                ix, iy, iz = (ix_tmp, iy_tmp, iz_tmp)
+                x, y, z = (x_tmp, y_tmp, z_tmp)
+
+        # To fulfill point B (see below), count how many particles of each
+        # molecule fall in each periodic image (ix, iy, iz)
+        try:
+            particle_counts[molecule_number][(ix, iy, iz)] += 1.0
+        except KeyError:
+            particle_counts[molecule_number][(ix, iy, iz)] = 1.0
+
+        particles[molecule_number]['x'].append(x)
+        particles[molecule_number]['y'].append(y)
+        particles[molecule_number]['z'].append(z)
+
+    # B - Store in the final arrays each molecule in the periodic copy
+    #     with the most particles in the primary cell (0, 0, 0)
+    for molecule in range(molecule_number+1):
+        max_number = 0
+        # Get the periodic image with the most particles
+        for (l, m, n) in particle_counts[molecule]:
+            if particle_counts[molecule][(l, m, n)] > max_number:
+                ix, iy, iz = (l, m, n)
+                max_number = particle_counts[molecule][(l, m, n)]
+
+        # Translate the molecule so that its largest portion of particles
+        # lies in the (0, 0, 0) image
+        for (x, y, z) in zip(particles[molecule]['x'], particles[molecule]['y'], particles[molecule]['z']):
+            result['x'].append(x - ix * confining_environment[1])
+            result['y'].append(y - iy * confining_environment[1])
+            result['z'].append(z - iz * confining_environment[1])
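+# A small illustration of the unwrapping above (numbers are invented for
+# the example): with a cubic box of side 10.0, a two-bead chain crossing
+# the +x face is rebuilt as one contiguous molecule (beads at x = 4.9 and
+# x = 5.1) before being mapped back to the primary image.
+#
+#     xc = [4.9, 0.0, 0.0, -4.9, 0.0, 0.0]
+#     unwrapped = {'x': [], 'y': [], 'z': []}
+#     store_conformation_with_pbc(xc, unwrapped, ('cube', 10.0))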
+##### Loop extrusion dynamics functions #####
+
+def read_target_loops_input(input_filename, chromosome_length, percentage):
+    # Open input file
+    fp_input = open(input_filename, "r")
+
+    loops = []
+    # Get one loop per line and fill the output list of loops
+    for line in fp_input.readlines():
+        if line.startswith('#'):
+            continue
+
+        splitted = line.strip().split()
+        loop = []
+        loop.append(int(splitted[1]))
+        loop.append(int(splitted[2]))
+
+        loops.append(loop)
+    fp_input.close()
+
+    # NOTE: all loops are currently kept; the 'percentage' argument is unused
+    #ntarget_loops = int(len(loops)*percentage/100.)
+    ntarget_loops = int(len(loops))
+    shuffle(loops)
+    target_loops = loops[0:ntarget_loops]
+
+    return target_loops
+
+##########
+
+def draw_loop_extrusion_starting_point(chromosome_length):
+    # Draw a random starting point for extrusion along the chromosome
+    random_particle = randint(1, chromosome_length-1)
+
+    return [random_particle, random_particle+1]
+
+##########
+
+def get_maximum_number_of_extruded_particles(target_loops, initial_loops):
+    # The maximum is the largest sequence distance between a target
+    # start/stop particle of a loop and the initial random start/stop of
+    # the corresponding loop
+    maximum = 0
+
+    for target_loop, initial_loop in zip(target_loops, initial_loops):
+        l = abs(target_loop[0]-initial_loop[0])+1
+        if l > maximum:
+            maximum = l
+
+        l = abs(target_loop[1]-initial_loop[1])+1
+        if l > maximum:
+            maximum = l
+
+    return maximum
+
+##########
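+# Sketch of how these helpers combine (the target loops are invented
+# values for a 1000-particle chromosome):
+#
+#     target_loops = [[120, 180], [300, 420]]
+#     initial_loops = [draw_loop_extrusion_starting_point(1000)
+#                      for _ in target_loops]
+#     n_steps = get_maximum_number_of_extruded_particles(target_loops,
+#                                                        initial_loops)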
+def compute_particles_distance(xc):
+    particles = []
+    distances = {}
+
+    # Get the coordinates of the particles
+    for i in range(0, len(xc), 3):
+        x = xc[i]
+        y = xc[i+1]
+        z = xc[i+2]
+        particles.append((x, y, z))
+
+    # Compute the pairwise distances
+    for pair in combinations(range(len(particles)), 2):
+        dist = distance(particles[pair[0]][0],
+                        particles[pair[0]][1],
+                        particles[pair[0]][2],
+                        particles[pair[1]][0],
+                        particles[pair[1]][1],
+                        particles[pair[1]][2])
+        distances[pair] = dist
+
+    return distances
+
+##########
+
+def compute_the_percentage_of_satysfied_restraints(input_file_name,
+                                                   restraints,
+                                                   output_file_name,
+                                                   time_point,
+                                                   timesteps_per_k_change):
+    ### Change this function to use a posteriori the out.colvars.traj file similar to the obj funct calculation ###
+    infile = open(input_file_name, "r")
+    outfile = open(output_file_name, "w")
+    if os.path.getsize(output_file_name) == 0:
+        outfile.write("#%s %s %s %s\n" % ("timestep", "satisfied", "satisfiedharm", "satisfiedharmLowBound"))
+
+    #restraints[pair] = [time_dependent_restraints[time_point+1][pair][0],      # Restraint type -> the one at time point time_point+1
+    #                    time_dependent_restraints[time_point][pair][1]*10.,    # Initial spring constant
+    #                    time_dependent_restraints[time_point+1][pair][1]*10.,  # Final spring constant
+    #                    time_dependent_restraints[time_point][pair][2],        # Initial equilibrium distance
+    #                    time_dependent_restraints[time_point+1][pair][2],      # Final equilibrium distance
+    #                    int(time_dependent_steering_pairs['timesteps_per_k_change']*0.5)]  # Number of timesteps for the gradual change
+
+    # Write statistics on the restraints
+    nharm = 0
+    nharmLowBound = 0
+    ntot = 0
+    for pair in restraints:
+        for i in range(len(restraints[pair][0])):
+            if restraints[pair][0][i] == "Harmonic":
+                nharm += 1
+                ntot += 1
+            if restraints[pair][0][i] == "HarmonicLowerBound":
+                nharmLowBound += 1
+                ntot += 1
+    outfile.write("#NumOfRestraints = %s , Harmonic = %s , HarmonicLowerBound = %s\n" % (ntot, nharm, nharmLowBound))
+
+    # Memorize the restraint parameters, e.g. E_hlb_pot_p_106_189
+    restraints_parameters = {}
+    for pair in restraints:
+        for i in range(len(restraints[pair][0])):
+            if restraints[pair][0][i] == "Harmonic":
+                name = "E_h_pot_%d_%d_%d" % (i, int(pair[0])+1, int(pair[1])+1)
+            elif restraints[pair][0][i] == "HarmonicLowerBound":
+                name = "E_hlb_pot_%d_%d_%d" % (i, int(pair[0])+1, int(pair[1])+1)
+            else:
+                continue  # Skip restraint types without an energy column
+            restraints_parameters[name] = [restraints[pair][0][i],
+                                           restraints[pair][1][i],
+                                           restraints[pair][2][i],
+                                           restraints[pair][3][i],
+                                           restraints[pair][4][i],
+                                           restraints[pair][5][i]]
+
+    # Check whether the restraints are satisfied
+    columns_to_consider = {}
+    for line in infile.readlines():
+        nsatisfied = 0.
+        nsatisfiedharm = 0.
+        nsatisfiedharmLowBound = 0.
+        ntot = 0.
+        ntotharm = 0.
+        ntotharmLowBound = 0.
+ + line = line.strip().split() + + # Checking which columns contain the pairwise distance + if line[0][0] == "#": + for column in range(2,len(line)): + # Get the columns with the distances + if "_pot_" not in line[column]: + columns_to_consider[column-1] = line[column] + #print columns_to_consider + else: + for column in range(1,len(line)): + if column in columns_to_consider: + if column >= len(line): + continue + dist = float(line[column]) + + # Get which restraints are between the 2 particles + for name in ["E_h_pot_"+columns_to_consider[column], "E_hlb_pot_"+columns_to_consider[column]]: + if name not in restraints_parameters: + #print "Restraint %s not present" % name + continue + else: + pass + #print name, restraints_parameters[name] + + restrainttype = restraints_parameters[name][0] + restraintd0 = float(restraints_parameters[name][3]) + float(line[0])/float(restraints_parameters[name][5])*(float(restraints_parameters[name][4]) - float(restraints_parameters[name][3])) + restraintk = float(restraints_parameters[name][1]) + float(line[0])/float(restraints_parameters[name][5])*(float(restraints_parameters[name][2]) - float(restraints_parameters[name][1])) + sqrt_k = sqrt(restraintk) + limit1 = restraintd0 - 2./sqrt_k + limit2 = restraintd0 + 2./sqrt_k + + if restrainttype == "Harmonic": + if dist >= limit1 and dist <= limit2: + nsatisfied += 1.0 + nsatisfiedharm += 1.0 + #print "#ESTABLISHED",time_point,name,restraints_parameters[name],limit1,dist,limit2 + else: + pass + #print "#NOESTABLISHED",time_point,name,restraints_parameters[name],limit1,dist,limit2 + ntotharm += 1.0 + if restrainttype == "HarmonicLowerBound": + if dist >= restraintd0: + nsatisfied += 1.0 + nsatisfiedharmLowBound += 1.0 + #print "#ESTABLISHED",time_point,name,restraints_parameters[name],dist,restraintd0 + else: + pass + #print "#NOESTABLISHED",time_point,name,restraints_parameters[name],dist,restraintd0 + ntotharmLowBound += 1.0 + ntot += 1.0 + #print int(line[0])+(time_point)*timesteps_per_k_change, nsatisfied, ntot, nsatisfiedharm, ntotharm, nsatisfiedharmLowBound, ntotharmLowBound + if ntotharm == 0.: + ntotharm = 1.0 + if ntotharmLowBound == 0.: + ntotharmLowBound = 1.0 + + + outfile.write("%d %lf %lf %lf\n" % (int(line[0])+(time_point)*timesteps_per_k_change, nsatisfied/ntot*100., nsatisfiedharm/ntotharm*100., nsatisfiedharmLowBound/ntotharmLowBound*100.)) + infile.close() + outfile.close() + +########## + +def read_objective_function(fname): + + obj_func=[] + fhandler = open(fname) + line = next(fhandler) + try: + while True: + if line.startswith('#'): + line = next(fhandler) + continue + line = line.strip() + if len(line) == 0: + continue + line_vals = line.split() + obj_func.append(float(line_vals[1])) + line = next(fhandler) + except StopIteration: + pass + fhandler.close() + + return obj_func +########## + +def compute_the_objective_function(input_file_name, + output_file_name, + time_point, + timesteps_per_k_change): + + + infile = open(input_file_name , "r") + outfile = open(output_file_name, "w") + if os.path.getsize(output_file_name) == 0: + outfile.write("#Timestep obj_funct\n") + + columns_to_consider = [] + + # Checking which columns contain the energies to sum + for line in infile.readlines(): + line = line.strip().split() + + # Checking which columns contain the energies to sum + if line[0][0] == "#": + for column in range(len(line)): + if "_pot_" in line[column]: + columns_to_consider.append(column-1) + else: + obj_funct = 0.0 + for column in columns_to_consider: + if column < len(line): + 
obj_funct += float(line[column]) + outfile.write("%d %s\n" % (int(line[0])+timesteps_per_k_change*(time_point), obj_funct)) + + infile.close() + outfile.close() + + +### get unique list ### + +def get_list(input_list): + + output_list = [] + + for element in input_list: + #print(type(element)) + if isinstance(element, (int)): + output_list.append(element) + if isinstance(element, (list)): + for subelement in element: + output_list.append(subelement) + if isinstance(element, (tuple)): + for subelement in range(element[0],element[1]+1,element[2]): + output_list.append(subelement) + return output_list + +########## +#MPI.Finalize() diff --git a/build/lib.linux-x86_64-3.6/tadphys/modelling/lammpsmodel.py b/build/lib.linux-x86_64-3.6/tadphys/modelling/lammpsmodel.py new file mode 100644 index 0000000..c604e39 --- /dev/null +++ b/build/lib.linux-x86_64-3.6/tadphys/modelling/lammpsmodel.py @@ -0,0 +1,38 @@ +""" +25 Oct 2016 + + +""" +from tadphys.modelling.structuralmodel import StructuralModel + +class LAMMPSmodel(StructuralModel): + """ + A container for the LAMMPS modelling results. The container is a dictionary + with the following keys: + + - rand_init: Random number generator feed (needed for model reproducibility) + - x, y, z: 3D coordinates of each particles. Each represented as a list + + """ + def __str__(self): + try: + return ('LAMMPS model ranked %s (%s particles) with: \n' + + ' - random initial value: %s\n' + + ' - first coordinates:\n'+ + ' X Y Z\n'+ + ' %7s%7s%7s\n'+ + ' %7s%7s%7s\n'+ + ' %7s%7s%7s\n') % ( + self['index'] + 1, + len(self['x']), self['rand_init'], + int(self['x'][0]), int(self['y'][0]), int(self['z'][0]), + int(self['x'][1]), int(self['y'][1]), int(self['z'][1]), + int(self['x'][2]), int(self['y'][2]), int(self['z'][2])) + except IndexError: + return ('LAMMPS model of %s particles with: \n' + + ' - random initial value: %s\n' + + ' - first coordinates:\n'+ + ' X Y Z\n'+ + ' %5s%5s%5s\n') % ( + len(self['x']), self['rand_init'], + self['x'][0], self['y'][0], self['z'][0]) diff --git a/taddyn/modelling/restraints.py b/build/lib.linux-x86_64-3.6/tadphys/modelling/restraints.py similarity index 100% rename from taddyn/modelling/restraints.py rename to build/lib.linux-x86_64-3.6/tadphys/modelling/restraints.py diff --git a/build/lib.linux-x86_64-3.6/tadphys/modelling/structuralmodel.py b/build/lib.linux-x86_64-3.6/tadphys/modelling/structuralmodel.py new file mode 100644 index 0000000..58b74f9 --- /dev/null +++ b/build/lib.linux-x86_64-3.6/tadphys/modelling/structuralmodel.py @@ -0,0 +1,955 @@ +""" +25 Oct 2016 + + +""" +from __future__ import print_function + + +from tadphys.utils.tadmaths import newton_raphson +from tadphys import __version__ as version +from math import sqrt, pi +import hashlib + +try: + basestring +except NameError: + basestring = str + +def model_header(model): + """ + Defines the header to write in output files for a given model + """ + if not 'description' in model: + return '' + outstr = '' + for desc in sorted(model['description']): + outstr += '# %-15s : %s\n' % (desc.upper(), model['description'][desc]) + return outstr + + +class StructuralModel(dict): + """ + A container for the structural modelling results. The container is a dictionary + with the following keys: + + - rand_init: Random number generator feed (needed for model reproducibility) + - x, y, z: 3D coordinates of each particles. 
Each represented as a list + + """ + def __len__(self): + return len(self['x']) + + def distance(self, part1, part2): + """ + Calculates the distance between one point of the model and an external + coordinate + + :param part1: index of a particle in the model + :param part2: index of a particle in the model + + :returns: distance between one point of the model and an external + coordinate + """ + if part1 == 0 or part2 == 0: + raise Exception('Particle number must be strictly positive\n') + return sqrt((self['x'][part1-1] - self['x'][part2-1])**2 + + (self['y'][part1-1] - self['y'][part2-1])**2 + + (self['z'][part1-1] - self['z'][part2-1])**2) + + + def _square_distance(self, part1, part2): + """ + Calculates the square instance between one point of the model and an + external coordinate + + :param part1: index of a particle in the model + :param part2: index of a particle in the model + + :returns: distance between one point of the model and an external + coordinate + """ + return ((self['x'][part1-1] - self['x'][part2-1])**2 + + (self['y'][part1-1] - self['y'][part2-1])**2 + + (self['z'][part1-1] - self['z'][part2-1])**2) + + + def _square_distance_to(self, part1, part2): + """ + :param part1: index of a particle in the model + :param part2: external coordinate (dict format with x, y, z keys) + + :returns: square distance between one point of the model and an external + coordinate + """ + return ((self['x'][part1] - part2[0])**2 + + (self['y'][part1] - part2[1])**2 + + (self['z'][part1] - part2[2])**2) + + + def center_of_mass(self): + """ + Gives the center of mass of a model + + :returns: the center of mass of a given model + """ + r_x = sum(self['x'])/len(self) + r_y = sum(self['y'])/len(self) + r_z = sum(self['z'])/len(self) + return dict((('x', r_x), ('y', r_y), ('z', r_z))) + + + def radius_of_gyration(self): + """ + Calculates the radius of gyration or gyradius of the model + + Defined as: + + .. 
math:: + + \sqrt{\\frac{\sum_{i=1}^{N} (x_i-x_{com})^2+(y_i-y_{com})^2+(z_i-z_{com})^2}{N}} + + with: + + * :math:`N` the number of particles in the model + * :math:`com` the center of mass + + :returns: the radius of gyration for the components of the tensor + """ + com = self.center_of_mass() + rog = sqrt(sum(self._square_distance_to(i, + (com['x'], com['y'], com['z'])) + for i in range(len(self))) / len(self)) + return rog + + + def contour(self): + """ + Total length of the model + + :returns: the totals length of the model + """ + dist = 0 + for i in range(1, len(self)): + dist += self.distance(i, i+1) + return dist + + + def longest_axe(self): + """ + Gives the distance between most distant particles of the model + + :returns: the maximum distance between two particles in the model + """ + maxdist = 0 + for i in range(1, len(self)): + for j in range(i + 1, len(self) + 1): + dist = self.distance(i, j) + if dist > maxdist: + maxdist = dist + return maxdist + + + def shortest_axe(self): + """ + Minimum distance between two particles in the model + + :returns: the minimum distance between two particles in the model + """ + mindist = float('inf') + for i in range(1, len(self)): + for j in range(i + 1, len(self) + 1): + dist = self.distance(i, j) + if dist < mindist: + mindist = dist + return mindist + + + def min_max_by_axis(self): + """ + Calculates the minimum and maximum coordinates of the model + + :returns: the minimum and maximum coordinates for each x, y and z axis + """ + return ((min(self['x']), max(self['x'])), + (min(self['y']), max(self['y'])), + (min(self['z']), max(self['z']))) + + + def cube_side(self): + """ + Calculates the side of a cube containing the model. + + :returns: the diagonal length of the cube containing the model + """ + return sqrt((min(self['x']) - max(self['x']))**2 + + (min(self['y']) - max(self['y']))**2 + + (min(self['z']) - max(self['z']))**2) + + + def cube_volume(self): + """ + Calculates the volume of a cube containing the model. + + :returns: the volume of the cube containing the model + """ + return self.cube_side()**3 + + + def inaccessible_particles(self, radius): + """ + Gives the number of loci/particles that are accessible to an object + (i.e. a protein) of a given size. + + :param radius: radius of the object that we want to fit in the model + + :returns: a list of numbers, each being the ID of a particles that would + never be reached by the given object + + TODO: remove this function + + """ + inaccessibles = [] + sphere = generate_sphere_points(100) + for i in range(len(self)): + impossibles = 0 + for xxx, yyy, zzz in sphere: + thing = (xxx * radius + self['x'][i], + yyy * radius + self['y'][i], + zzz * radius + self['z'][i]) + # print form % (k+len(self), thing['x'], thing['y'], thing['z'], + # 0, 0, 0, k+len(self)), + for j in range(len(self)): + if i == j: + continue + # print self._square_distance_to(j, thing), radius + if self._square_distance_to(j, thing) < radius**2: + # print i, j + impossibles += 1 + break + if impossibles == 100: + inaccessibles.append(i + 1) + return inaccessibles + + + # def persistence_length(self, start=1, end=None, return_guess=False): + # """ + # Calculates the persistence length (Lp) of given section of the model. + # Persistence length is calculated according to [Bystricky2004]_ : + + # .. 
math:: + + # = 2 \\times Lp^2 \\times (\\frac{Lc}{Lp} - 1 + e^{\\frac{-Lc}{Lp}}) + + # with the contour length as :math:`Lc = \\frac{d}{c}` where :math:`d` is + # the genomic dstance in bp and :math:`c` the linear mass density of the + # chromatin (in bp/nm). + + # :param 1 start: start particle from which to calculate the persistence + # length + # :param None end: end particle from which to calculate the persistence + # length. Uses the last particle by default + # :param False return_guess: Computes persistence length using the + # approximation :math:`Lp=\\frac{Lc}{Lp}` + + # :returns: persistence length, or 2 times the Kuhn length + # """ + # clength = float(self.contour()) + # end = end or len(self) + # sq_length = float(self._square_distance(start, end)) + + # guess = sq_length / clength + # if return_guess: + # return guess # incredible! + # kuhn = newton_raphson(guess, clength, sq_length) + # return 2 * kuhn + + + # def accessible_surface(self, radius, nump=100, superradius=200, + # include_edges=True, view_mesh=False, savefig=None, + # write_cmm_file=None, verbose=False, + # chimera_bin='chimera'): + # """ + # Calculates a mesh surface around the model (distance equal to input + # **radius**) and checks if each point of this mesh could be replaced by + # an object (i.e. a protein) of a given **radius** + + # Outer part of the model can be excluded from the estimation of + # accessible surface, as the occupancy outside the model is unknown (see + # superradius option). + + # :param radius: radius of the object we want to fit in the model. + # :param None write_cmm_file: path to file in which to write cmm with the + # colored meshed (red inaccessible points, green accessible points) + # :param 100 nump: number of points to draw around a given particle. This + # number also sets the number of points drawn around edges, as each + # point occupies a given surface (see maths below). *Note that this + # number is considerably lowered by the occupancy of edges, depending + # of the angle formed by the edges surrounding a given particle, only + # 10% to 50% of the ``nump`` will be drawn in fact.* + # :param True include_edges: if False, edges will not be included in the + # calculation of the accessible surface, only particles. Note that + # statistics on particles (like last item returned) will not change, + # and computation time will be significantly decreased. + # :param False view_mesh: launches chimera to display the mesh around the + # model + # :param None savefig: path where to save chimera image + # :param 'chimera' chimera_bin: path to chimera binary to use + # :param False verbose: prints stats about the surface + # :param 200 superradius: radius of an object used to exclude outer + # surface of the model. Superradius must be higher than radius. + + # This function will first define a mesh around the chromatin, + # representing all possible position of the center of the object we want + # to fit. This mesh will be at a distance of *radius* from the chromatin + # strand. All dots in the mesh represents an equal area (*a*), the whole + # surface of the chromatin strand being: :math:`A=n \\times a` (*n* being + # the total number of dots in the mesh). + + # The mesh consists of spheres around particles of the model, and + # cylinders around edges joining particles (no overlap is allowed between + # sphere and cylinders or cylinder and cylinder when they are + # consecutive). 
+ + # If we want that all dots of the mesh representing the surface of the + # chromatin, corresponds to an equal area (:math:`a`) + + # .. math:: + + # a = \\frac{4\pi r^2}{s} = \\frac{2\pi r N_{(d)}}{c} + + # with: + + # * :math:`r` radius of the object to fit (as the input parameter **radius**) + # * :math:`s` number of points in sphere + # * :math:`c` number of points in circle (as the input parameter **nump**) + # * :math:`N_{(d)}` number of circles in an edge of length :math:`d` + + # According to this, when the distance between two particles is equal + # to :math:`2r` (:math:`N=2r`), we would have :math:`s=c`. + + # As : + + # .. math:: + + # 2\pi r = \sqrt{4\pi r^2} \\times \sqrt{\pi} + + # It is fair to state the number of dots represented along a circle as: + + # .. math:: + + # c = \sqrt{s} \\times \sqrt{\pi} + + # Thus the number of circles in an edge of length :math:`d` must be: + + # .. math:: + + # N_{(d)}=\\frac{s}{\sqrt{s}\sqrt{\pi}}\\times\\frac{d}{2r} + + # :returns: a list of *1-* the number of dots in the mesh that could be + # occupied by an object of the given radius *2-* the total number of + # dots in the mesh *3-* the estimated area of the mesh (in square + # micrometers) *4-* the area of the mesh of a virtually straight strand + # of chromatin defined as + # :math:`contour\\times 2\pi r + 4\pi r^2` (also in + # micrometers) *5-* a list of number of (accessibles, inaccessible) for + # each particle (percentage burried can be infered afterwards by + # accessible/(accessible+inaccessible) ) + + # """ + + # points, dots, superdots, points2dots = build_mesh( + # self['x'], self['y'], self['z'], len(self), nump, radius, + # superradius, include_edges) + + # # calculates the number of inaccessible peaces of surface + # if superradius: + # radius2 = (superradius - 4)**2 + # outdot = [] + # for x2, y2, z2 in superdots: + # for j, (x1, y1, z1) in enumerate(points): + # if fast_square_distance(x1, y1, z1, x2, y2, z2) < radius2: + # outdot.append(False) + # break + # else: + # outdot.append(True) + # continue + # points.insert(0, points.pop(j)) + # else: + # outdot = [False] * len(superdots) + + # # calculates the number of inaccessible peaces of surface + # radius2 = (radius - 2)**2 + # grey = (0.6, 0.6, 0.6) + # red = (1, 0, 0) + # green = (0, 1, 0) + # colors = [] + # for i, (x2, y2, z2) in enumerate(dots): + # if outdot[i]: + # colors.append(grey) + # continue + # for j, (x1, y1, z1) in enumerate(points): + # if fast_square_distance(x1, y1, z1, x2, y2, z2) < radius2: + # colors.append(red) + # break + # else: + # colors.append(green) + # continue + # points.insert(0, points.pop(j)) + # possibles = colors.count(green) + + # acc_parts = [] + # for p in sorted(points2dots.keys()): + # acc = 0 + # ina = 0 + # for dot in points2dots[p]: + # if colors[dot]==green: + # acc += 1 + # elif colors[dot]==red: + # ina += 1 + # acc_parts.append((p + 1, acc, ina)) + + # # some stats + # dot_area = 4 * pi * (float(radius) / 1000)**2 / nump + # area = (possibles * dot_area) + # total = (self.contour() / 1000 * 2 * pi * float(radius) / 1000 + 4 * pi + # * (float(radius) / 1000)**2) + # if verbose: + # print((' Accessible surface: %s micrometers^2' + + # '(%s accessible times %s micrometers)') % ( + # round(area, 2), possibles, dot_area)) + # print(' (%s accessible dots of %s total times %s micrometers)' % ( + # possibles, outdot.count(False), round(dot_area, 5))) + # print(' - %s%% of the contour mesh' % ( + # round((float(possibles)/outdot.count(False))*100, 2))) + # print(' - %s%% of a 
virtual straight chromatin (%s microm^2)' % ( + # round((area/total)*100, 2), round(total, 2))) + + # # write cmm file + # if savefig: + # view_mesh = True + # if write_cmm_file or view_mesh: + # out = '\n' + # form = ('\n') + # for k_2, thing in enumerate(dots): + # out += form % (1 + k_2, thing[0], thing[1], thing[2], + # colors[k_2][0], colors[k_2][1], colors[k_2][2]) + # if superradius: + # for k_3, thing in enumerate(superdots): + # out += form % (1 + k_3 + k_2 + 1, + # thing[0], thing[1], thing[2], + # 0.1, 0.1, 0.1) + # out += '\n' + # if view_mesh: + # out_f = open('/tmp/tmp_mesh.cmm', 'w') + # out_f.write(out) + # out_f.close() + # if write_cmm_file: + # out_f = open(write_cmm_file, 'w') + # out_f.write(out) + # out_f.close() + # if view_mesh: + # chimera_cmd = [ + # 'focus', + # 'bonddisplay never #1', + # 'shape tube #1 radius 15 bandLength 300 segmentSubdivisions 1 followBonds on', + # '~show #1', + # 'set bg_color white', 'windowsize 800 600', + # 'clip yon -500', 'set subdivision 1', 'set depth_cue', + # 'set dc_color black', 'set dc_start 0.5', 'set dc_end 1', + # 'scale 0.8'] + # if savefig: + # if savefig.endswith('.png'): + # chimera_cmd += ['copy file %s png' % (savefig)] + # elif savefig[-4:] in ('.mov', 'webm'): + # chimera_cmd += [ + # 'movie record supersample 1', 'turn y 3 120', + # 'wait 120', 'movie stop', + # 'movie encode output %s' % savefig] + # self.write_cmm('/tmp/') + # chimera_view(['/tmp/tmp_mesh.cmm', + # '/tmp/model.%s.cmm' % (self['rand_init'])], + # chimera_bin=chimera_bin, align=False, + # savefig=savefig, chimera_cmd=chimera_cmd) + + # return (possibles, outdot.count(False), area, total, acc_parts) + + + # def write_cmm(self, directory, color='index', rndname=True, + # model_num=None, filename=None, **kwargs): + # """ + # Save a model in the cmm format, read by Chimera + # (http://www.cgl.ucsf.edu/chimera). + + # **Note:** If none of model_num, models or cluster parameter are set, + # ALL the models will be written. + + # :param directory: location where the file will be written (note: the + # name of the file will be model_1.cmm if model number is 1) + # :param None model_num: the number of the model to save + # :param True rndname: If True, file names will be formatted as: + # model.RND.cmm, where RND is the random number feed used by IMP to + # generate the corresponding model. If False, the format will be: + # model_NUM_RND.cmm where NUM is the rank of the model in terms of + # objective function value + # :param None filename: overide the default file name writing + # :param 'index' color: can be: + + # * a string as: + # * '**index**' to color particles according to their position in the + # model (:func:`tadphys.utils.extraviews.color_residues`) + # * '**tad**' to color particles according to the TAD they belong to + # (:func:`tadphys.utils.extraviews.tad_coloring`) + # * '**border**' to color particles marking borders. Color according to + # their score (:func:`tadphys.utils.extraviews.tad_border_coloring`) + # coloring function like. + # * a function, that takes as argument a model and any other parameter + # passed through the kwargs. + # * a list of (r, g, b) tuples (as long as the number of particles). + # Each r, g, b between 0 and 1. 
+ # :param kwargs: any extra argument will be passed to the coloring + # function + # """ + # if isinstance(color, basestring): + # if color == 'index': + # color = color_residues(self, **kwargs) + # elif color == 'tad': + # if not 'tads' in kwargs: + # raise Exception('ERROR: missing TADs\n ' + + # 'pass an Experiment.tads disctionary\n') + # color = tad_coloring(self, **kwargs) + # elif color == 'border': + # if not 'tads' in kwargs: + # raise Exception('ERROR: missing TADs\n ' + + # 'pass an Experiment.tads disctionary\n') + # color = tad_border_coloring(self, **kwargs) + # else: + # raise NotImplementedError(('%s type of coloring is not yet ' + + # 'implemeted\n') % color) + # elif hasattr(color, '__call__'): # it's a function + # color = color(self, **kwargs) + # elif not isinstance(color, list): + # raise TypeError('one of function, list or string is required\n') + # out = '\n' % (self['rand_init']) + # form = ('\n') + # for i in range(len(self['x'])): + # out += form % (i + 1, + # self['x'][i], self['y'][i], self['z'][i], + # color[i][0], color[i][1], color[i][2], i + 1) + # form = ('\n') + # break_chroms = [1] + # try: + # for beg, end in zip(self['description']['start'],self['description']['end']): + # break_chroms.append((end - beg)/self['description']['resolution']+break_chroms[-1]) + # except: + # pass + # for i in range(1, len(self['x'])): + # if i in break_chroms[1:]: + # continue + # out += form % (i, i + 1) + # out += '\n' + + # if filename: + # out_f = open('%s/%s' % (directory, filename), 'w') + # else: + # if rndname: + # out_f = open('%s/model.%s.cmm' % (directory, + # self['rand_init']), 'w') + # else: + # out_f = open('%s/model_%s_rnd%s.cmm' % ( + # directory, model_num, self['rand_init']), 'w') + # out_f.write(out) + # out_f.close() + + +# def write_xyz_OLD(self, directory, model_num=None, get_path=False, +# rndname=True): +# """ +# Writes a xyz file containing the 3D coordinates of each particle in the +# model. + +# **Note:** If none of model_num, models or cluster parameter are set, +# ALL the models will be written. + +# :param directory: location where the file will be written (note: the +# file name will be model.1.xyz, if the model number is 1) +# :param None model_num: the number of the model to save +# :param True rndname: If True, file names will be formatted as: +# model.RND.xyz, where RND is the random number feed used by IMP to +# generate the corresponding model. If False, the format will be: +# model_NUM_RND.xyz where NUM is the rank of the model in terms of +# objective function value +# :param False get_path: whether to return, or not, the full path where +# the file has been written +# """ +# if rndname: +# path_f = '%s/model.%s.xyz' % (directory, self['rand_init']) +# else: +# path_f = '%s/model_%s_rnd%s.xyz' % (directory, model_num, +# self['rand_init']) +# out = '' +# form = "%12s%12s%12.3f%12.3f%12.3f\n" +# for i in range(len(self['x'])): +# out += form % ('p' + str(i + 1), i + 1, round(self['x'][i], 3), +# round(self['y'][i], 3), round(self['z'][i], 3)) +# out_f = open(path_f, 'w') +# out_f.write(out) +# out_f.close() +# if get_path: +# return path_f +# else: +# return None + + + +# def write_json(self, directory, color='index', rndname=True, +# model_num=None, title=None, filename=None, **kwargs): +# """ +# Save a model in the json format, read by TADkit. + +# **Note:** If none of model_num, models or cluster parameter are set, +# ALL the models will be written. 
+ +# :param directory: location where the file will be written (note: the +# name of the file will be model_1.cmm if model number is 1) +# :param None model_num: the number of the model to save +# :param True rndname: If True, file names will be formatted as: +# model.RND.cmm, where RND is the random number feed used by IMP to +# generate the corresponding model. If False, the format will be: +# model_NUM_RND.cmm where NUM is the rank of the model in terms of +# objective function value +# :param None filename: overide the default file name writing +# :param 'index' color: can be: + +# * a string as: +# * '**index**' to color particles according to their position in the +# model (:func:`tadphys.utils.extraviews.color_residues`) +# * '**tad**' to color particles according to the TAD they belong to +# (:func:`tadphys.utils.extraviews.tad_coloring`) +# * '**border**' to color particles marking borders. Color according to +# their score (:func:`tadphys.utils.extraviews.tad_border_coloring`) +# coloring function like. +# * a function, that takes as argument a model and any other parameter +# passed through the kwargs. +# * a list of (r, g, b) tuples (as long as the number of particles). +# Each r, g, b between 0 and 1. +# :param kwargs: any extra argument will be passed to the coloring +# function +# """ +# if isinstance(color, basestring): +# if color == 'index': +# color = color_residues(self, **kwargs) +# elif color == 'tad': +# if not 'tads' in kwargs: +# raise Exception('ERROR: missing TADs\n ' + +# 'pass an Experiment.tads disctionary\n') +# color = tad_coloring(self, **kwargs) +# elif color == 'border': +# if not 'tads' in kwargs: +# raise Exception('ERROR: missing TADs\n ' + +# 'pass an Experiment.tads disctionary\n') +# color = tad_border_coloring(self, **kwargs) +# else: +# raise NotImplementedError(('%s type of coloring is not yet ' + +# 'implemeted\n') % color) +# elif hasattr(color, '__call__'): # it's a function +# color = color(self, **kwargs) +# elif not isinstance(color, list): +# raise TypeError('one of function, list or string is required\n') +# form = ''' +# { +# "chromatin" : { +# "id" : %(sha)s, +# "title" : "%(title)s", +# "source" : "Tadphys %(version)s", +# "metadata": {%(descr)s}, +# "type" : "tadphys", +# "data": { +# "models": [ +# { %(xyz)s }, +# ], +# "clusters":[%(cluster)s], +# "centroid":[%(centroid)s], +# "restraints": [[][]], +# "chromatinColor" : [ ] +# } +# } +# } +# ''' +# fil = {} +# fil['sha'] = hashlib.new(fil['xyz']).hexdigest() +# fil['title'] = title or "Sample TADbit data" +# fil['version'] = version +# fil['descr'] = ''.join('\n', ',\n'.join([ +# '"%s" : %s' % (k, ('"%s"' % (v)) if isinstance(v, basestring) else v) +# for k, v in list(self.get('description', {}).items())]), '\n') +# fil['xyz'] = ','.join(['[%.4f,%.4f,%.4f]' % (self['x'][i], self['y'][i], +# self['z'][i]) +# for i in range(len(self['x']))]) + + +# if filename: +# out_f = open('%s/%s' % (directory, filename), 'w') +# else: +# if rndname: +# out_f = open('%s/model.%s.cmm' % (directory, +# self['rand_init']), 'w') +# else: +# out_f = open('%s/model_%s_rnd%s.cmm' % ( +# directory, model_num, self['rand_init']), 'w') +# out_f.write(out) +# out_f.close() + + + +# def write_xyz(self, directory, model_num=None, get_path=False, +# rndname=True, filename=None, header=True): +# """ +# Writes a xyz file containing the 3D coordinates of each particle in the +# model. 
+# Outfile is tab separated column with the bead number being the +# first column, then the genomic coordinate and finally the 3 +# coordinates X, Y and Z + +# **Note:** If none of model_num, models or cluster parameter are set, +# ALL the models will be written. + +# :param directory: location where the file will be written (note: the +# file name will be model.1.xyz, if the model number is 1) +# :param None model_num: the number of the model to save +# :param True rndname: If True, file names will be formatted as: +# model.RND.xyz, where RND is the random number feed used by IMP to +# generate the corresponding model. If False, the format will be: +# model_NUM_RND.xyz where NUM is the rank of the model in terms of +# objective function value +# :param False get_path: whether to return, or not, the full path where +# the file has been written +# :param None filename: overide the default file name writing +# :param True header: write a header describing the experiment from which +# """ +# if filename: +# path_f = '%s/%s' % (directory, filename) +# else: +# if rndname: +# path_f = '%s/model.%s.xyz' % (directory, self['rand_init']) +# else: +# path_f = '%s/model_%s_rnd%s.xyz' % (directory, model_num, +# self['rand_init']) +# out = '' +# if header: +# out += model_header(self) +# form = "%s\t%s\t%.3f\t%.3f\t%.3f\n" +# # TODO: do not use resolution directly -> specific to Hi-C +# chrom_list = self['description']['chromosome'] +# chrom_start = self['description']['start'] +# chrom_end = self['description']['end'] +# if not isinstance(chrom_list, list): +# chrom_list = [chrom_list] +# chrom_start = [chrom_start] +# chrom_end = [chrom_end] + +# chrom_start = [(int(c) // int(self['description']['resolution']) +# if int(c) else 0) +# for c in chrom_start] +# chrom_end = [(int(c) // int(self['description']['resolution']) +# if int(c) else len(self['x'])) +# for c in chrom_end] + +# offset = -chrom_start[0] +# for crm in range(len(chrom_list)): +# for i in range(chrom_start[crm] + offset, chrom_end[crm] + offset): +# out += form % ( +# i + 1, +# '%s:%s-%s' % ( +# chrom_list[crm], +# int(chrom_start[crm] or 0) + +# int(self['description']['resolution']) * (i - offset) + 1, +# int(chrom_start[crm] or 0) + +# int(self['description']['resolution']) * (i + 1 - offset)), +# round(self['x'][i], 3), +# round(self['y'][i], 3), round(self['z'][i], 3)) +# offset += (chrom_end[crm] - chrom_start[crm]) +# out_f = open(path_f, 'w') +# out_f.write(out) +# out_f.close() +# if get_path: +# return path_f +# else: +# return None + + # def write_xyz_babel(self, directory, model_num=None, get_path=False, + # rndname=True, filename=None): + # """ + # Writes a xyz file containing the 3D coordinates of each particle in the + # model using a file format compatible with babel + # (http://openbabel.org/wiki/XYZ_%28format%29). + # Outfile is tab separated column with the bead number being the + # first column, then the genomic coordinate and finally the 3 + # coordinates X, Y and Z + # **Note:** If none of model_num, models or cluster parameter are set, + # ALL the models will be written. + # :param directory: location where the file will be written (note: the + # file name will be model.1.xyz, if the model number is 1) + # :param None model_num: the number of the model to save + # :param True rndname: If True, file names will be formatted as: + # model.RND.xyz, where RND is the random number feed used by IMP to + # generate the corresponding model. 
If False, the format will be: + # model_NUM_RND.xyz where NUM is the rank of the model in terms of + # objective function value + # :param False get_path: whether to return, or not, the full path where + # the file has been written + # :param None filename: overide the default file name writing + # """ + # if filename: + # path_f = '%s/%s' % (directory, filename) + # else: + # if rndname: + # path_f = '%s/model.%s.xyz' % (directory, self['rand_init']) + # else: + # path_f = '%s/model_%s_rnd%s.xyz' % (directory, model_num, + # self['rand_init']) + # out = '' + # # Write header as number of atoms + # out += str(len(self['x'])) + # # Write comment as type of molecule + # out += "\nDNA\n" + + # form = "%s\t%.3f\t%.3f\t%.3f\n" + # # TODO: do not use resolution directly -> specific to Hi-C + # chrom_list = self['description']['chromosome'] + # chrom_start = self['description']['start'] + # chrom_end = self['description']['end'] + # if not isinstance(chrom_list, list): + # chrom_list = [chrom_list] + # chrom_start = [chrom_start] + # chrom_end = [chrom_end] + # chrom_start = [int(c)/int(self['description']['resolution']) for c in chrom_start] + # chrom_end = [int(c)/int(self['description']['resolution']) for c in chrom_end] + # offset = 0 + # for crm in range(len(chrom_list)): + # for i in range(chrom_start[crm]+offset,chrom_end[crm]+offset): + # out += form % ( + # i + 1, + # '%s:%s-%s' % ( + # chrom_list[crm], + # int(chrom_start[crm] or 0) + + # int(self['description']['resolution']) * (i - offset) + 1, + # int(chrom_start[crm] or 0) + + # int(self['description']['resolution']) * (i + 1 - offset)), + # round(self['x'][i], 3), + # round(self['y'][i], 3), round(self['z'][i], 3)) + # offset += (chrom_end[crm]-chrom_start[crm]) + # out_f = open(path_f, 'w') + # out_f.write(out) + # out_f.close() + # if get_path: + # return path_f + # else: + # return None + + # def view_model(self, tool='chimera', savefig=None, cmd=None, + # center_of_mass=False, gyradius=False, color='index', + # **kwargs): + # """ + # Visualize a selected model in the three dimensions. (either with Chimera + # or through matplotlib). + + # :param model_num: model to visualize + # :param 'chimera' tool: path to the external tool used to visualize the + # model. Can also be 'plot', to use matplotlib. + # :param None savefig: path to a file where to save the image OR movie + # generated (depending on the extension; accepted formats are png, mov + # and webm). if set to None, the image or movie will be shown using + # the default GUI. + # :param 'index' color: can be: + + # * a string as: + # * '**index**' to color particles according to their position in the + # model (:func:`tadphys.utils.extraviews.color_residues`) + # * '**tad**' to color particles according to the TAD they belong to + # (:func:`tadphys.utils.extraviews.tad_coloring`) + # * '**border**' to color particles marking borders. Color according to + # their score (:func:`tadphys.utils.extraviews.tad_border_coloring`) + # coloring function like. + # * a function, that takes as argument a model and any other parameter + # passed through the kwargs. + # * a list of (r, g, b) tuples (as long as the number of particles). + # Each r, g, b between 0 and 1. + # :param False center_of_mass: draws a dot representing the center of + # mass of the model + # :param False gyradius: draws the center of mass of the model as a sphere + # with radius equal to the radius of gyration of the model + # :param None cmd: list of commands to be passed to the viewer. 
+ # The chimera list is: + + # :: + + # focus + # set bg_color white + # windowsize 800 600 + # bonddisplay never #0 + # represent wire + # shape tube #0 radius 5 bandLength 100 segmentSubdivisions 1 followBonds on + # clip yon -500 + # ~label + # set subdivision 1 + # set depth_cue + # set dc_color black + # set dc_start 0.5 + # set dc_end 1 + # scale 0.8 + + # Followed by the movie command to record movies: + + # :: + + # movie record supersample 1 + # turn y 3 120 + # wait 120 + # movie stop + # movie encode output SAVEFIG + + # Or the copy command for images: + + # :: + + # copy file SAVEFIG png + + # Passing as the following list as 'cmd' parameter: + # :: + + # cmd = ['focus', 'set bg_color white', 'windowsize 800 600', + # 'bonddisplay never #0', + # 'shape tube #0 radius 10 bandLength 200 segmentSubdivisions 100 followBonds on', + # 'clip yon -500', '~label', 'set subdivision 1', + # 'set depth_cue', 'set dc_color black', 'set dc_start 0.5', + # 'set dc_end 1', 'scale 0.8'] + + # will return the default image (other commands can be passed to + # modified the final image/movie). + + # :param kwargs: see :func:`tadphys.utils.extraviews.plot_3d_model` or + # :func:`tadphys.utils.extraviews.chimera_view` for other arguments + # to pass to this function + + # """ + # if gyradius: + # gyradius = self.radius_of_gyration() + # center_of_mass = True + # if tool == 'plot': + # x, y, z = self['x'], self['y'], self['z'] + # plot_3d_model(x, y, z, color=color, **kwargs) + # return + # self.write_cmm('/tmp/', color=color, **kwargs) + # chimera_view(['/tmp/model.%s.cmm' % (self['rand_init'])], + # savefig=savefig, chimera_bin=tool, chimera_cmd=cmd, + # center_of_mass=center_of_mass, gyradius=gyradius) diff --git a/build/lib.linux-x86_64-3.6/tadphys/squared_distance_matrix.cpython-36m-x86_64-linux-gnu.so b/build/lib.linux-x86_64-3.6/tadphys/squared_distance_matrix.cpython-36m-x86_64-linux-gnu.so new file mode 100755 index 0000000..b960ef2 Binary files /dev/null and b/build/lib.linux-x86_64-3.6/tadphys/squared_distance_matrix.cpython-36m-x86_64-linux-gnu.so differ diff --git a/taddyn/utils/__init__.py b/build/lib.linux-x86_64-3.6/tadphys/utils/__init__.py similarity index 100% rename from taddyn/utils/__init__.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/__init__.py diff --git a/taddyn/utils/extraviews.py b/build/lib.linux-x86_64-3.6/tadphys/utils/extraviews.py similarity index 99% rename from taddyn/utils/extraviews.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/extraviews.py index 49a984a..6f9e25f 100644 --- a/taddyn/utils/extraviews.py +++ b/build/lib.linux-x86_64-3.6/tadphys/utils/extraviews.py @@ -93,7 +93,6 @@ def plot_2d_optimization_result(result, # Commands for compatibility with the OLD version: #print axes_range if len(axes_range) == 4: - print "I'm here!!!" tmp_axes_range = axes_range tmp_axes_range[1] = [0.0] # kbending !!!New option!!! 
len_kbending_range = 1 @@ -268,7 +267,7 @@ def plot_2d_optimization_result(result, rect = patches.Rectangle((len(xax)-.5, -0.5), 2.5, len(yax), facecolor='grey', alpha=0.5) # Define the rectangles for - print dcutoff + #print dcutoff rect.set_clip_on(False) grid[cell-1].add_patch(rect) grid[cell-1].text(len(xax) + 1.0, len(yax) / 2., diff --git a/taddyn/utils/hic_filtering.py b/build/lib.linux-x86_64-3.6/tadphys/utils/hic_filtering.py similarity index 96% rename from taddyn/utils/hic_filtering.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/hic_filtering.py index b033e03..a096e1c 100644 --- a/taddyn/utils/hic_filtering.py +++ b/build/lib.linux-x86_64-3.6/tadphys/utils/hic_filtering.py @@ -1,11 +1,12 @@ """ 06 Aug 2013 """ - +from __future__ import print_function # (at top of module) from warnings import warn from sys import stderr from re import sub + import numpy as np try: @@ -311,7 +312,7 @@ def _best_window_size(sorted_prc, size, beg, end, verbose=False): count = 0 if verbose: - print ' * first window size with stable median of cis-percentage: %d' % (win_size) + print(' * first window size with stable median of cis-percentage: %d' % (win_size)) return win_size @@ -419,12 +420,10 @@ def filter_by_cis_percentage(cisprc, beg=0.3, end=0.8, sigma=2, verbose=False, max_count = sorted_sum[-1] + 1 if verbose: - print ' * Lower cutoff applied until bin number: %d' % (cutoffL) - print ' * too few interactions defined as less than %9d interactions' % ( - min_count) - print ' * Upper cutoff applied until bin number: %d' % (cutoffR) - print ' * too much interactions defined as more than %9d interactions' % ( - max_count) + print(' * Lower cutoff applied until bin number: %d' % (cutoffL)) + print(' * too few interactions defined as less than %9d interactions' % (min_count)) + print(' * Upper cutoff applied until bin number: %d' % (cutoffR)) + print(' * too much interactions defined as more than %9d interactions' % (max_count)) # plot @@ -441,7 +440,6 @@ def filter_by_cis_percentage(cisprc, beg=0.3, end=0.8, sigma=2, verbose=False, elif cisprc[c][1] > max_count: # don't need get here, already cought in previous condition badcol[c] = cisprc.get(c, [0, 0])[1] countU += 1 - print ' => %d BAD bins (%d/%d/%d null/low/high counts) of %d (%.1f%%)' % ( - len(badcol), countZ, countL, countU, size, float(len(badcol)) / size * 100) + print(' => %d BAD bins (%d/%d/%d null/low/high counts) of %d (%.1f%%)' % (len(badcol), countZ, countL, countU, size, float(len(badcol)) / size * 100)) return badcol diff --git a/taddyn/utils/hic_parser.py b/build/lib.linux-x86_64-3.6/tadphys/utils/hic_parser.py similarity index 100% rename from taddyn/utils/hic_parser.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/hic_parser.py diff --git a/taddyn/utils/maths.py b/build/lib.linux-x86_64-3.6/tadphys/utils/maths.py similarity index 100% rename from taddyn/utils/maths.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/maths.py diff --git a/taddyn/utils/modelAnalysis.py b/build/lib.linux-x86_64-3.6/tadphys/utils/modelAnalysis.py similarity index 100% rename from taddyn/utils/modelAnalysis.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/modelAnalysis.py diff --git a/taddyn/utils/tadmaths.py b/build/lib.linux-x86_64-3.6/tadphys/utils/tadmaths.py similarity index 100% rename from taddyn/utils/tadmaths.py rename to build/lib.linux-x86_64-3.6/tadphys/utils/tadmaths.py diff --git a/build/temp.linux-x86_64-3.6/src/3d-lib/squared_distance_matrix_calculation_py.o 
b/build/temp.linux-x86_64-3.6/src/3d-lib/squared_distance_matrix_calculation_py.o
new file mode 100644
index 0000000..7c6ef6e
Binary files /dev/null and b/build/temp.linux-x86_64-3.6/src/3d-lib/squared_distance_matrix_calculation_py.o differ
diff --git a/dist/TADphys-0.1-py3.6-linux-x86_64.egg b/dist/TADphys-0.1-py3.6-linux-x86_64.egg
new file mode 100644
index 0000000..cf13089
Binary files /dev/null and b/dist/TADphys-0.1-py3.6-linux-x86_64.egg differ
diff --git a/dist/Tadphys-0.1-py3.6-linux-x86_64.egg b/dist/Tadphys-0.1-py3.6-linux-x86_64.egg
new file mode 100644
index 0000000..e3e6130
Binary files /dev/null and b/dist/Tadphys-0.1-py3.6-linux-x86_64.egg differ
diff --git a/setup.py b/setup.py
index dac2cbf..8b69d25 100644
--- a/setup.py
+++ b/setup.py
@@ -3,23 +3,23 @@
 def main():
     # c++ module to compute the distance matrix of single model
-    squared_distance_matrix_module = Extension('taddyn.squared_distance_matrix',
+    squared_distance_matrix_module = Extension('tadphys.squared_distance_matrix',
                                                language = "c++",
                                                runtime_library_dirs=['3d-lib/'],
                                                sources=['src/3d-lib/squared_distance_matrix_calculation_py.c'],
                                                extra_compile_args=["-ffast-math"])
 
     setup(
-        name         = 'TADdyn',
+        name         = 'TADphys',
         version      = '0.1',
-        author       = 'Marco Di Stefano, David Castillo',
+        author       = 'Marco Di Stefano',
         author_email = 'marco.di.distefano.1985@gmail.com',
         ext_modules  = [squared_distance_matrix_module],
-        packages     = ['taddyn', 'taddyn.utils',
-                        'taddyn.modelling'],
+        packages     = ['tadphys', 'tadphys.utils',
+                        'tadphys.modelling'],
         platforms = "OS Independent",
         license = "GPLv3",
-        description  = 'TADdyn is a Python library that allows to model and explore single or time-series 3C-based data.',
+        description  = 'TADphys is a Python library to model and explore single or time-series 3C-based data.',
         long_description = (open("README.rst").read()),
         #url          = 'https://github.com/3DGenomes/tadbit',
         #download_url = 'https://github.com/3DGenomes/tadbit/tarball/master',
diff --git a/tadphys/Chromosome_region.py b/tadphys/Chromosome_region.py
new file mode 100644
index 0000000..0fbb5d3
--- /dev/null
+++ b/tadphys/Chromosome_region.py
@@ -0,0 +1,360 @@
+from math import isnan
+from sys import stderr
+
+from tadphys.utils.hic_parser import read_matrix
+from tadphys.utils.maths import zscore
+from tadphys.modelling.HIC_CONFIG import CONFIG
+from tadphys.modelling.lammps_modelling import generate_lammps_models
+
+class Chromosome_region(object):
+    """
+    Chromosome Region.
+
+    :param crm: name of the chromosome the region belongs to
+    :param resolution: the resolution of the experiment (size of a bin in
+       bases)
+    :param None hic: either a file or a list of lists corresponding to
+       the Hi-C data
+    """
+
+
+    def __init__(self, crm, resolution, hic=None, size=None, zeros=None):
+        self.resolution = resolution
+        self.crm        = crm
+        self.size       = size
+        self.hic        = None
+        self._zeros     = zeros
+        self._zscores   = []
+        if hic:
+            self.load_data(hic, resolution=resolution, size=size)
+
+    def load_data(self, hic_mat, resolution=None, size=None):
+        """
+        Add a normalized Hi-C experiment to the Chromosome_region object.
+
+        :param hic_mat: either a file or a list of lists corresponding to
+           the normalized Hi-C data
+
+        """
+        hic_matrices = read_matrix(hic_mat, resolution=resolution, hic=False,
+                                   size=size)
+        self.hic = [self.HiC_data(mat['matrix'], mat['size'])
+                    for mat in hic_matrices]
+        mats_zeros = [mat['masked'] for mat in hic_matrices]
+        self.size = len(self.hic[0])
+
+        if not self._zeros:
+            if sum([len(mat_zero) for mat_zero in mats_zeros]) > 0:
+                self._zeros = mats_zeros
+
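+    # A minimal usage sketch (illustrative only: 'norm_matrix.tsv' is a
+    # hypothetical file of normalised counts and the resolution value is
+    # arbitrary):
+    #
+    #     region = Chromosome_region('chr1', resolution=100000,
+    #                                hic='norm_matrix.tsv')
+    #     # or create the object empty and load the data later:
+    #     # region.load_data('norm_matrix.tsv', resolution=100000)
+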
+    def get_hic_matrix(self, focus=None, diagonal=True, index=0):
+        """
+        Return the Hi-C matrix.
+
+        :param None focus: if a tuple is passed (start, end), will return a
+           Hi-C matrix starting at start, and ending at end (all inclusive).
+        :param True diagonal: if False, the values in the diagonal are
+           replaced by ones (or zeros when null). Used for the filtering in
+           order to smooth the distribution of mean values
+        :param 0 index: hic_data index or norm index from where to get the
+           matrix
+
+        :returns: list of lists representing the Hi-C data matrix of the
+           current experiment
+        """
+        siz = self.size
+        hic = self.hic[index]
+        if focus:
+            start, end = focus
+            start -= 1
+        else:
+            start = 0
+            end   = siz
+        if diagonal:
+            return [[hic[i * self.size + j] for i in range(start, end)]
+                    for j in range(start, end)]
+        else:
+            mtrx = [[hic[i * self.size + j] for i in range(start, end)]
+                    for j in range(start, end)]
+            for i in range(start, end):
+                mtrx[i][i] = 1 if mtrx[i][i] else 0
+            return mtrx
+
+    def _sub_experiment_zscore(self, start, end, index=0):
+        """
+        Get the z-score of a sub-region of a Chromosome region.
+
+        :param start: first bin to model (bin number)
+        :param end: last bin to model (bin number)
+        :param 0 index: hic_data index or norm index from where to compute
+           the zscores. A list is allowed to compute several zscores at the
+           same time
+
+        :returns: z-score, raw values and zeros of the experiment
+        """
+        if isinstance(index, list):
+            idx = index
+        else:
+            idx = [index]
+        if start < 1:
+            raise ValueError('ERROR: start should be higher than 0\n')
+        start -= 1  # things start at 0 in python. we keep the end coordinate
+                    # at its original value because it is inclusive
+        tmp_matrix = []
+        for id_mat in idx:
+            matrix = self.get_hic_matrix(index=id_mat)
+            new_matrix = [[matrix[i][j] for i in range(start, end)]
+                          for j in range(start, end)]
+            tmp_matrix.append(new_matrix)
+
+        tmp = Chromosome_region(crm=self.crm,
+                                resolution=self.resolution,
+                                hic=tmp_matrix,
+                                size=len(tmp_matrix[0]))
+
+        # ... but the z-scores in this particular region
+        vals = []
+        tmp._zeros = []
+        for id_mat in idx:
+            tmp._zeros += [dict([(z - start, None) for z in self._zeros[id_mat]
+                                 if start <= z <= end - 1])]
+            if len(tmp._zeros[-1]) == (end - start):
+                raise Exception('ERROR: no interaction found in selected regions')
+            tmp.get_hic_zscores(index=id_mat)
+            values = [[float('nan') for _ in range(tmp.size)]
+                      for _ in range(tmp.size)]
+            for i in range(tmp.size):
+                # zeros are rows or columns having a zero in the diagonal
+                if i in tmp._zeros[-1]:
+                    continue
+                for j in range(i + 1, tmp.size):
+                    if j in tmp._zeros[-1]:
+                        continue
+                    if (not tmp.hic[id_mat][i * tmp.size + j]
+                        or not tmp.hic[id_mat][j * tmp.size + i]):
+                        continue
+                    values[i][j] = tmp.hic[id_mat][i * tmp.size + j]
+                    values[j][i] = tmp.hic[id_mat][i * tmp.size + j]
+            vals.append(values)
+        return tmp._zscores, vals, tmp._zeros
+
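+    # Sketch of a typical call (bin coordinates and index are illustrative;
+    # 'region' stands for a hypothetical Chromosome_region with data loaded):
+    #
+    #     zscores, values, zeros = region._sub_experiment_zscore(1, 100,
+    #                                                            index=0)
+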
+    def get_hic_zscores(self, zscored=True, remove_zeros=True, index=0):
+        """
+        Compute the z-scores of the Hi-C data. The result is stored in the
+        private Chromosome_region._zscores list.
+
+        :param True zscored: calculate the z-score of the data
+        :param True remove_zeros: remove null interactions. Dangerous, null
+           interactions are informative.
+        :param 0 index: hic_data index or norm index from where to produce
+           the zscores
+
+        """
+        values  = {}
+        zeros   = {}
+        zscores = {}
+
+        for i in range(self.size):
+            # zeros are rows or columns having a zero in the diagonal
+            if i in self._zeros[index]:
+                continue
+            for j in range(i + 1, self.size):
+                if j in self._zeros[index]:
+                    continue
+                if (not self.hic[index][i * self.size + j]
+                    and remove_zeros):
+                    zeros[(i, j)] = None
+                    continue
+                values[(i, j)] = self.hic[index][i * self.size + j]
+        # compute Z-score
+        if zscored:
+            zscore(values)
+        for i in range(self.size):
+            if i in self._zeros[index]:
+                continue
+            for j in range(i + 1, self.size):
+                if j in self._zeros[index]:
+                    continue
+                if (i, j) in zeros and remove_zeros:
+                    continue
+                zscores.setdefault(str(i), {})
+                zscores[str(i)][str(j)] = values[(i, j)]
+
+        if len(self._zscores) > index:
+            self._zscores[index] = zscores
+        else:
+            self._zscores.append(zscores)
+
+    def model_region(self, start=1, end=None, n_models=5000, n_keep=1000,
+                     n_cpus=1, verbose=0, close_bins=1,
+                     outfile=None, config=CONFIG, container=None,
+                     tmp_folder=None, timeout_job=10800,
+                     stages=0, initial_conformation=None, connectivity="FENE",
+                     timesteps_per_k=10000, kfactor=1, adaptation_step=False,
+                     cleanup=True, start_seed=1, hide_log=True, remove_rstrn=[],
+                     keep_restart_out_dir=None, restart_path=False, store_n_steps=10,
+                     useColvars=False):
+        """
+        Generates three-dimensional models of a given segment of a
+        chromosome using LAMMPS.
+
+        :param 1 start: first bin to model (bin number)
+        :param None end: last bin to model (bin number). By default goes to
+           the last bin.
+        :param 5000 n_models: number of models to generate
+        :param 1000 n_keep: number of models used in the final analysis
+           (usually the top 20% of the generated models). The models are
+           ranked according to their objective function value (the lower the
+           better)
+        :param False keep_all: whether or not to keep the discarded models
+           (if True, models will be stored under StructuralModels.bad_models)
+        :param 1 close_bins: number of particles away (i.e. the bin number
+           difference) a particle pair must be in order to be considered as
+           neighbors (e.g. 1 means consecutive particles)
+        :param n_cpus: number of CPUs to use
+        :param 0 verbose: the information printed can be: nothing (0), the
+           objective function value of the selected models (1), the objective
+           function value of all the models (2), all the modeling
+           information (3)
+        :param None container: restrains particles to be within a given
+           object. Can only be a 'cylinder', which is, in fact, a cylinder of
+           a given height capped with hemispherical ends. This cylinder is
+           defined by a radius, its height (with a height of 0 the cylinder
+           becomes a sphere) and the force applied to the restraint. E.g. for
+           modeling the E. coli genome (2 micrometers in length and 0.5
+           micrometer in width), these values could be used:
+           ['cylinder', 250, 1500, 50], and for a typical mammalian nucleus
+           (6 micrometers diameter): ['cylinder', 3000, 0, 50]
+        :param CONFIG config: a dictionary containing the standard
+           parameters used to generate the models. The dictionary should
+           contain the keys kforce, maxdist, upfreq and lowfreq.
+           Examples can be seen by doing:
+
+           ::
+
+             from tadphys.modelling.HIC_CONFIG import CONFIG
+
+           where CONFIG is a dictionary of dictionaries to be passed to this
+           function:
+
+           ::
+
+             CONFIG = {
+                 # use these parameters with the Hi-C data from:
+                 'reference' : 'victor corces dataset 2013',
+
+                 # Force applied to the restraints inferred to neighbor particles
+                 'kforce'    : 5,
+
+                 # Maximum experimental contact distance
+                 'maxdist'   : 600,  # OPTIMIZATION: 500-1200
+
+                 # Minimum and maximum thresholds used to decide which
+                 # experimental values have to be included in the computation
+                 # of restraints. Z-score values bigger than upfreq and less
+                 # than lowfreq will be included, whereas all the others will
+                 # be rejected
+                 'upfreq'    : 0.3,  # OPTIMIZATION: min/max Z-score
+
+                 'lowfreq'   : -0.7, # OPTIMIZATION: min/max Z-score
+
+                 # How much space (radius in nm) a nucleotide occupies
+                 'scale'     : 0.005
+             }
+        :param None tmp_folder: for lammps simulation, path to a temporary
+           file created during the clustering computation. Default will be
+           created in the /tmp/ folder
+        :param 10800 timeout_job: maximum seconds a job can run in the
+           multiprocessing of lammps before it is killed
+        :param 0 stages: index of the hic_data/norm data to model. For lammps
+           a list of indexes is allowed to perform dynamics between stages
+        :param None initial_conformation: initial structure for lammps
+           dynamics. 'random' to compute the initial conformation as a 3D
+           random walk, or {[x],[y],[z]}, a dictionary containing lists with
+           the x, y, z positions, e.g. an IMPModel or LAMMPSModel object
+        :param True hide_log: do not generate lammps log information
+        :param FENE connectivity: use 'FENE' for a FENE bond or 'harmonic'
+           for a harmonic potential between neighbouring particles
+        :param True cleanup: delete the lammps folder after completion
+        :param [] remove_rstrn: list of particles which must not have
+           restraints
+        :param None keep_restart_out_dir: path to write files to restore the
+           LAMMPS session (binary)
+        :param False restart_path: path to files to restore the LAMMPS
+           session (binary)
+        :param 10 store_n_steps: Integer with the number of steps to be saved
+           if restart_path != False
+        :param False useColvars: True if you want the restraints to be loaded
+           by colvars
+
+        :returns: a list of trajectories as dictionaries with x, y, z
+           coordinates.
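+
+        Example (a minimal sketch; ``region`` stands for a hypothetical
+        Chromosome_region with normalised data loaded, and all parameter
+        values are illustrative only)::
+
+            models = region.model_region(start=1, end=100, n_models=50,
+                                         n_keep=10, n_cpus=8,
+                                         config=CONFIG)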
+
+        """
+        if not end:
+            end = self.size
+        zscores, values, zeros = self._sub_experiment_zscore(start, end,
+                                                             stages)
+        coords = {'crm'  : self.crm,
+                  'start': start,
+                  'end'  : end}
+        allzeros = [True for i in range(end - start + 1)]
+        for zeros_stg in zeros:
+            for i in range(end - start + 1):
+                if i in zeros_stg:
+                    allzeros[i] = False
+        allzeros = tuple(allzeros)
+        nloci = end - start + 1
+        if verbose:
+            stderr.write('Preparing to model %s particles\n' % nloci)
+
+        return generate_lammps_models(zscores, self.resolution, nloci,
+                                      values=values, n_models=n_models,
+                                      outfile=outfile, n_keep=n_keep,
+                                      n_cpus=n_cpus, verbose=verbose, first=0,
+                                      close_bins=close_bins, config=config,
+                                      container=container,
+                                      coords=coords, zeros=allzeros,
+                                      tmp_folder=tmp_folder,
+                                      timeout_job=timeout_job,
+                                      initial_conformation='random'
+                                      if not initial_conformation
+                                      else initial_conformation,
+                                      connectivity=connectivity,
+                                      timesteps_per_k=timesteps_per_k,
+                                      kfactor=kfactor,
+                                      adaptation_step=adaptation_step,
+                                      cleanup=cleanup,
+                                      hide_log=hide_log,
+                                      initial_seed=start_seed,
+                                      remove_rstrn=remove_rstrn,
+                                      restart_path=restart_path,
+                                      keep_restart_out_dir=keep_restart_out_dir,
+                                      store_n_steps=store_n_steps,
+                                      useColvars=useColvars
+                                      )
+
+    class HiC_data(dict):
+
+        def __init__(self, items, size):
+            self.update(items)
+            self.__size = size
+            self._size2 = size**2
+
+        def __len__(self):
+            return self.__size
+
+        def __getitem__(self, row_col):
+            """
+            get items
+            """
+            try:
+                row, col = row_col
+                pos = row * self.__size + col
+                if pos >= self._size2:
+                    raise IndexError(
+                        'ERROR: row or column larger than %s' % self.__size)
+                return self.get(pos, 0)
+            except TypeError:
+                if row_col >= self._size2:
+                    raise IndexError(
+                        'ERROR: position %d larger than %s^2' % (row_col,
+                                                                 self.__size))
+                return self.get(row_col, 0)
+
+        def __setitem__(self, row_col, val):
+            """
+            set items
+            """
+            try:
+                row, col = row_col
+                pos = row * self.__size + col
+                if pos >= self._size2:
+                    print(row, col, pos)
+                    raise IndexError(
+                        'ERROR: row or column larger than %s' % self.__size)
+                super().__setitem__(pos, val)
+            except TypeError:
+                if hasattr(self, '_size2') and row_col >= self._size2:
+                    raise IndexError(
+                        'ERROR: position %d larger than %s^2' % (row_col,
+                                                                 self.__size))
+                super().__setitem__(row_col, val)
diff --git a/tadphys/__init__.py b/tadphys/__init__.py
new file mode 100644
index 0000000..e19de3e
--- /dev/null
+++ b/tadphys/__init__.py
@@ -0,0 +1,72 @@
+from future import standard_library
+standard_library.install_aliases()
+from os import environ
+from subprocess import Popen, PIPE, check_call, CalledProcessError
+
+from tadphys._version import __version__
+
+# ## Check if we have X display http://stackoverflow.com/questions/8257385/automatic-detection-of-display-availability-with-matplotlib
+# if not "DISPLAY" in environ:
+#     import matplotlib
+#     matplotlib.use('Agg')
+# else:
+#     try:
+#         check_call('python -c "import matplotlib.pyplot as plt; plt.figure()"',
+#                    shell=True, stdout=PIPE, stderr=PIPE)
+#     except CalledProcessError:
+#         import matplotlib
+#         matplotlib.use('Agg')
+
+def get_dependencies_version(dico=False):
+    """
+    Check versions of TADphys and all dependencies, as well as retrieve
+    system info. May be used to ensure reproducibility.
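+
+    A minimal usage sketch (the output depends on the local installation)::
+
+        from tadphys import get_dependencies_version
+        print(get_dependencies_version())           # formatted string
+        deps = get_dependencies_version(dico=True)  # as a dictionary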
+ :returns: string with description of versions installed + """ + versions = {' TADphys': __version__ + '\n\n'} + + try: + import scipy + versions['scipy'] = scipy.__version__ + except ImportError: + versions['scipy'] = 'Not found' + + try: + import numpy + versions['numpy'] = numpy.__version__ + except ImportError: + versions['numpy'] = 'Not found' + try: + import matplotlib + versions['matplotlib'] = matplotlib.__version__ + except ImportError: + versions['matplotlib'] = 'Not found' + try: + mcl, _ = Popen(['mcl', '--version'], stdout=PIPE, + stderr=PIPE, universal_newlines=True).communicate() + versions['MCL'] = mcl.split()[1] + except: + versions['MCL'] = 'Not found' + + try: + uname, err = Popen(['uname', '-rom'], stdout=PIPE, + stderr=PIPE, universal_newlines=True).communicate() + versions[' Machine'] = uname + except: + versions[' Machine'] = 'Not found' + + if dico: + return versions + else: + return '\n'.join(['%15s : %s' % (k, versions[k]) for k in + sorted(versions.keys())]) + + +from tadphys.chromosome import Chromosome +from tadphys.experiment import Experiment, load_experiment_from_reads +from tadphys.chromosome import load_chromosome +# from taddyn.modelling.structuralmodels import StructuralModels +# from taddyn.modelling.structuralmodels import load_structuralmodels +# from taddyn.utils.hic_parser import load_hic_data_from_reads +# from taddyn.utils.hic_parser import load_hic_data_from_bam +from tadphys.utils.hic_parser import read_matrix diff --git a/taddyn/__init__.pyc b/tadphys/__init__.pyc similarity index 100% rename from taddyn/__init__.pyc rename to tadphys/__init__.pyc diff --git a/tadphys/_version.py b/tadphys/_version.py new file mode 100644 index 0000000..3dc1f76 --- /dev/null +++ b/tadphys/_version.py @@ -0,0 +1 @@ +__version__ = "0.1.0" diff --git a/tadphys/chromosome.py b/tadphys/chromosome.py new file mode 100644 index 0000000..186fd28 --- /dev/null +++ b/tadphys/chromosome.py @@ -0,0 +1,1009 @@ +""" +26 Nov 2012 + +""" + +from string import ascii_lowercase as letters +from copy import deepcopy as copy +from pickle import load, dump +from random import random +from math import sqrt +from sys import stderr +from os.path import exists +import tadphys +from tadphys.experiment import Experiment +# from tadphys.alignment import Alignment, randomization_test + +try: + import matplotlib.pyplot as plt +except ImportError: + stderr.write('matplotlib not found\n') + + +def load_chromosome(in_f, fast=2): + """ + Load a Chromosome object from a file. A Chromosome object can be saved with + the :func:`Chromosome.save_chromosome` function. + + :param in_f: path to a saved Chromosome object file + :param 2 fast: if fast=2 do not load the Hi-C data (in the case that they + were saved in a separate file see :func:`Chromosome.save_chromosome`). + If fast is equal to 1, the weights will be skipped from load to save + memory. Finally if fast=0, both the weights and Hi-C data will be loaded + + :returns: a Chromosome object + + TODO: remove first try/except type error... 
this is loading old experiments + """ + dico = load(open(in_f)) + name = '' + crm = Chromosome(dico['name']) + try: + exp_order = dico['experiment_order'] + except KeyError: + exp_order = dico['experiments'].keys() + for name in exp_order: + xpr = Experiment(name, dico['experiments'][name]['resolution'], + no_warn=True) + xpr.tads = dico['experiments'][name]['tads'] + xpr.norm = dico['experiments'][name]['wght'] + xpr.hic_data = dico['experiments'][name]['hi-c'] + xpr.conditions = dico['experiments'][name]['cond'] + xpr.size = dico['experiments'][name]['size'] + xpr._zeros = dico['experiments'][name].get('zero', {}) + try: # new in version post-CSDM13 + xpr.identifier = dico['experiments'][name]['iden'] + xpr.cell_type = dico['experiments'][name]['cell'] + xpr.exp_type = dico['experiments'][name]['expt'] + xpr.enzyme = dico['experiments'][name]['enzy'] + xpr.description = dico['experiments'][name]['desc'] + except KeyError: + xpr.identifier = None + xpr.cell_type = None + xpr.exp_type = None + xpr.enzyme = None + xpr.description = {} + try: + crm.experiments.append(xpr) + except TypeError: + continue + crm.size = dico['size'] + crm.r_size = dico['r_size'] + crm.max_tad_size = dico['max_tad_size'] + crm.forbidden = dico['forbidden'] + crm._centromere = dico['_centromere'] + try: # new in version post-CSDM13 + crm.species = dico['species'] + crm.assembly = dico['assembly'] + crm.description = dico['description'] + except KeyError: + crm.species = None + crm.assembly = None + crm.description = {} + if isinstance(dico['experiments'][name]['hi-c'], str) or fast != int(2): + try: + dicp = load(open(in_f + '_hic')) + for name in dico['experiments']: + crm.get_experiment(name).hic_data = dicp[name]['hi-c'] + if fast != 1: + crm.get_experiment(name).norm = dicp[name]['wght'] + except IOError: + try: + for name in dico['experiments']: + crm.get_experiment(name).hic_data = dico['experiments'][name]['hi-c'] + if fast != 1: + crm.get_experiment(name).norm = dico['experiments'][name]['wght'] + except KeyError: + raise Exception('ERROR: file %s not found\n' % ( + dico['experiments'][name]['hi-c'])) + elif not fast: + stderr.write('WARNING: data not saved correctly for fast loading.\n') + return crm + + +class Chromosome(object): + """ + A Chromosome object designed to deal with Topologically Associating Domains + predictions from different experiments, in different cell types for a given + chromosome of DNA, and to compare them. + + :param name: name of the chromosome (might be a chromosome name for example) + :param None species: species name + :param None assembly: version number of the genomic assembly used + :param None resolutions: list of resolutions corresponding to a list of + experiments passed. + :param None experiment_hic_data: :py:func:`list` of paths to files + containing the Hi-C count matrices corresponding to different experiments + :param None experiment_tads: :py:func:`list` of paths to files + containing the definition of TADs corresponding to different experiments + :param None experiment_names: :py:func:`list` of the names of each + experiment + :param infinite max_tad_size: maximum TAD size allowed. TADs longer than + this value will not be considered, and size of the corresponding + chromosome size will be reduced accordingly + :param 0 chr_len: size of the DNA chromosome in bp. By default it will be + inferred from the distribution of TADs + :param None parser: a parser function that returns a tuple of lists + representing the data matrix and the length of a row/column. 
With + the file example.tsv: + + :: + + chrT_001 chrT_002 chrT_003 chrT_004 + chrT_001 629 164 88 105 + chrT_002 164 612 175 110 + chrT_003 88 175 437 100 + chrT_004 105 110 100 278 + + the output of parser('example.tsv') would be be: + ``[([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, + 110, 100, 278]), 4]`` + :param None kw_descr: any other argument passed would be stored as + complementary descriptive field. For example:: + + crm = Chromosome('19', species='mus musculus', + subspecies='musculus musculus', + skin_color='black') + print crm + + # Chromosome 19: + # 0 experiment loaded: + # 0 alignment loaded: + # species : mus musculus + # assembly version: UNKNOWN + # subspecies : musculus musculus + # skin_color : black + + *note that these fields may appear in the header of generated out files* + + :return: Chromosome object + + + """ + def __init__(self, name, species=None, assembly=None, + experiment_resolutions=None, experiment_tads=None, + experiment_hic_data=None, experiment_norm_data=None, + experiment_names=None, max_tad_size=float('inf'), + chr_len=0, parser=None, centromere_search=False, + silent=False, **kw_descr): + self.name = name + self.size = self._given_size = self.r_size = chr_len + self.size = ChromosomeSize(self.size) + self.max_tad_size = max_tad_size + self.r_size = RelativeChromosomeSize(self.size) + self.forbidden = {} # only used for TAD alignment randomization + self.experiments = ExperimentList([], self) + self._centromere = None + self.alignment = AlignmentDict() + self.description = kw_descr + self.species = species + self.assembly = assembly + + self._search_centromere = centromere_search + if experiment_tads: + for i, handler in enumerate(experiment_tads or []): + name = experiment_names[i] if experiment_names else None + self.add_experiment(name, experiment_resolutions[i], + tad_def=handler, parser=parser) + if experiment_hic_data: + for i, handler in enumerate(experiment_hic_data or []): + name = experiment_names[i] if experiment_names else None + try: + xpr = self.get_experiment(name) + xpr.load_hic_data(handler, silent=silent) + continue + except: + pass + if isinstance(handler, Experiment): + handler.name = name or handler.name + self.experiments.append(handler) + else: + self.add_experiment(name, experiment_resolutions[i], + hic_data=handler, parser=parser, + silent=silent) + if experiment_norm_data: + for i, handler in enumerate(experiment_norm_data or []): + name = experiment_names[i] if experiment_names else None + try: + xpr = self.get_experiment(name) + xpr.load_norm_data(handler, silent=silent) + continue + except: + pass + if isinstance(handler, Experiment): + handler.name = name or handler.name + self.experiments.append(handler) + else: + self.add_experiment(name, experiment_resolutions[i], + norm_data=handler, parser=parser, + silent=silent) + + def __repr__(self): + outstr = 'Chromosome %s:\n' % self.name + outstr += (' %-2s experiment%s loaded: ' % ( + len(self.experiments), 's' * (len(self.experiments) > 0)) + + ', '.join([e.name for e in self.experiments]) + '\n') + outstr += (' %-2s alignment%s loaded: ' % ( + len(self.alignment), 's' * (len(self.alignment) > 0)) + + ', '.join([a.name for a in self.alignment]) + '\n') + try: # new in version post-CSDM13 + outstr += ' species : %s\n' % (self.species or 'UNKNOWN') + outstr += ' assembly version: %s\n' % (self.assembly or 'UNKNOWN') + for desc in self.description: + outstr += ' %-16s: %s\n' % (desc, self.description[desc]) + except AttributeError: + pass + return outstr + 
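+    # A minimal usage sketch (the file name and values are hypothetical):
+    #
+    #     crm = Chromosome('chr19', species='mus musculus')
+    #     crm.add_experiment('exp1', resolution=100000,
+    #                        hic_data='hic_chr19_100kb.tsv')
+    #     print(crm)   # summary printed via __repr__ above
+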
+ + def _get_forbidden_region(self, xpr, resized=False): + """ + Find the regions for which there is no information in any of the + experiments. This is used to infer the relative chromosome size. + """ + if not xpr.tads: + return + forbidden = [] + for pos in xpr.tads: + start = float(xpr.tads[pos]['start']) + end = float(xpr.tads[pos]['end']) + diff = end - start + if diff * xpr.resolution > self.max_tad_size: + forbidden += range(int(start), int(end+1)) + xpr.tads[pos]['score'] = -abs(xpr.tads[pos]['score']) + else: + xpr.tads[pos]['score'] = abs(xpr.tads[pos]['score']) + if not self.forbidden: + self.forbidden = dict([(f, None) for f in forbidden]) + else: + self.forbidden = dict([(f, None) for f in + set(forbidden).intersection(self.forbidden)]) + # search for centromere: + if self._search_centromere: + self._find_centromere(xpr) + # add centromere as forbidden region: + if self._centromere: + for pos in xrange(int(self._centromere[0]), + int(self._centromere[1])): + self.forbidden[pos] = 'Centromere' + if not resized: + self.__update_size(xpr) + + + def get_experiment(self, name): + """ + Fetch an Experiment according to its name. + This can also be done directly with Chromosome.experiments[name]. + + :param name: name of the experiment to select + :returns: :class:`tadphys.Experiment` + """ + for exp in self.experiments: + if exp.name == name: + return exp + raise Exception(('ERROR: experiment ' + + '%s not found\n') % (name)) + + + def save_chromosome(self, out_f, fast=True, divide=True, force=False): + """ + Save a Chromosome object to a file (it uses :py:func:`pickle.load` from + the :py:mod:`cPickle`). Once saved, the object can be loaded with + :func:`load_chromosome`. + + :param out_f: path to the file where to store the :py:mod:`cPickle` + object + :param True fast: if True, skip Hi-C data and weights + :param True divide: if True writes two pickles, one with what would + result by using the fast option, and the second with the Hi-C and + weights data. The second file name will be extended by '_hic' (ie: + with out_f='chromosome12.pik' we would obtain chromosome12.pik and + chromosome12.pik_hic). 
When loaded :func:`load_chromosome` will + automatically search for both files + :param False force: overwrite the existing file + + """ + while exists(out_f) and not force: + out_f += '_' + dico = {'experiments': {}, + 'experiment_order': [xpr.name for xpr in self.experiments]} + if divide: + dicp = {} + for xpr in self.experiments: + dico['experiments'][xpr.name] = { + 'size' : xpr.size, + 'cond' : xpr.conditions, + 'tads' : xpr.tads, + 'resolution': xpr.resolution, + 'hi-c' : None, + 'wght' : None, + 'iden' : xpr.identifier, + 'cell' : xpr.cell_type, + 'expt' : xpr.exp_type, + 'enzy' : xpr.enzyme, + 'desc' : xpr.description, + 'zero' : xpr._zeros + } + if fast: + continue + if divide: + dicp[xpr.name] = { + 'wght': xpr.norm, + 'hi-c': xpr.hic_data} + dico['experiments'][xpr.name]['wght'] = None + dico['experiments'][xpr.name]['hi-c'] = None + else: + dico['experiments'][xpr.name]['wght'] = xpr.norm + dico['experiments'][xpr.name]['hi-c'] = xpr.hic_data + dico['name'] = self.name + dico['size'] = self.size + dico['r_size'] = self.r_size + dico['max_tad_size'] = self.max_tad_size + dico['forbidden'] = self.forbidden + dico['_centromere'] = self._centromere + dico['species'] = self.species + dico['assembly'] = self.assembly + dico['description'] = self.description + out = open(out_f, 'w') + dump(dico, out) + out.close() + if not fast and divide: + out = open(out_f + '_hic', 'w') + dump(dicp, out) + out.close() + + # def align_experiments(self, names=None, verbose=False, randomize=False, + # rnd_method='interpolate', rnd_num=1000, + # get_score=False, **kwargs): + # """ + # Align the predicted boundaries of two different experiments. The + # resulting alignment will be stored in the self.experiment list. + + # :param None names: list of names of the experiments to align. If None, + # align all + # :param experiment1: name of the first experiment to align + # :param experiment2: name of the second experiment to align + # :param -0.1 penalty: penalty for inserting a gap in the alignment + # :param 100000 max_dist: maximum distance between two boundaries + # allowing match (100Kb seems fair with HUMAN chromosomes) + # :param False verbose: if True, print some information about the + # alignments + # :param False randomize: check the alignment quality by comparing + # randomized boundaries over Chromosomes of the same size. This will + # return a extra value, the p-value of accepting that the observed + # alignment is not better than a random alignment + # :param False get_score: returns alignemnt object, alignment score and + # percentage of identity from one side and from the other + # :param interpolate rnd_method: by default uses the interpolation of TAD + # distribution. 
The alternative method is 'shuffle', where TADs are + # simply shuffled + # :param 1000 rnd_num: number of randomizations to do + # :param reciprocal method: if global, Needleman-Wunsch is used to align + # (see :func:`tadphys.boundary_aligner.globally.needleman_wunsch`); + # if reciprocal, a method based on reciprocal closest boundaries is + # used (see :func:`tadphys.boundary_aligner.reciprocally.reciprocal`) + + # :returns: an alignment object or, if the randomizattion was invoked, + # an alignment object, and a list of statistics that are, the alignment + # score, the probability that observed alignment performs better than + # randoms, the proportion of borders from the first experiment found + # aligned in the second experiment and the proportion of borders from + # the second experiment found aligned in the first experiment. + # Returned calues can be catched like this: + + # ali = crm.align_experiments() + + # or, with randomization test: + + # ali, (score, pval, prop1, prop2) = crm.align_experiments(randomize=True) + + # """ + # if names: + # xpers = ExperimentList([self.get_experiment(n) for n in names], + # self) + # else: + # xpers = self.experiments + # tads = [] + # for xpr in xpers: + # if not xpr.tads: + # raise Exception('No TADs defined, use find_tad function.\n') + # tads.append([xpr.tads[x]['brk'] * xpr.resolution for x in xpr.tads + # if xpr.tads[x]['score'] >= 0]) + # (aligneds, score, perc1, perc2), consensus = align(tads, verbose=verbose, **kwargs) + # name = tuple(sorted([x.name for x in xpers])) + # ali = Alignment(name, aligneds, xpers, consensus, score=score) + # self.alignment[name] = ali + # if verbose: + # print self.alignment[name] + # if not randomize: + # if get_score: + # return ali, score, perc1, perc2 + # else: + # return ali + # p_value = randomization_test(xpers, score=score, rnd_method=rnd_method, + # verbose=verbose, r_size=self.r_size, + # num=rnd_num, **kwargs) + # return ali, (score, p_value, perc1, perc2) + + + def add_experiment(self, name, resolution=None, tad_def=None, hic_data=None, + norm_data=None, replace=False, parser=None, + conditions=None, **kwargs): + """ + Add a Hi-C experiment to Chromosome + + :param name: name of the experiment or of the Experiment object + :param resolution: resolution of the experiment (needed if name is not + an Experiment object) + :param None hic_data: whether a file or a list of lists corresponding to + the Hi-C data + :param None tad_def: a file or a dict with precomputed TADs for this + experiment + :param False replace: overwrite the experiments loaded under the same + name + :param None parser: a parser function that returns a tuple of lists + representing the data matrix and the length of a row/column. With + a file example.tsv containing: + + :: + + chrT_001 chrT_002 chrT_003 chrT_004 + chrT_001 629 164 88 105 + chrT_002 164 612 175 110 + chrT_003 88 175 437 100 + chrT_004 105 110 100 278 + + the output of parser('example.tsv') would be: + ``[([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, + 110, 100, 278]), 4]`` + + """ + if not name: + name = ''.join([letters[int(random() * len(letters))] \ + for _ in xrange(5)]) + stderr.write('WARNING: No name provided, random name ' + + 'generated: %s\n' % (name)) + if name in self.experiments: + if 'hi-c' in self.get_experiment(name) and not replace: + stderr.write( + '''WARNING: Hi-C data already loaded under the name: %s. 
+ This experiment will be kept under %s.\n''' % (name, + name + '_')) + name += '_' + if isinstance(name, Experiment): + self.experiments.append(name) + elif resolution: + self.experiments.append(Experiment( + name, resolution, hic_data=hic_data, norm_data=norm_data, + tad_def=tad_def, parser=parser, conditions=conditions, + **kwargs)) + else: + raise Exception('resolution param is needed\n') + + + # def find_tad(self, experiments, name=None, n_cpus=1, + # verbose=True, max_tad_size="max", heuristic=True, + # batch_mode=False, **kwargs): + # """ + # Call the :func:`tadphys.tadbit.tadbit` function to calculate the + # position of Topologically Associated Domain boundaries + + # :param experiment: A square matrix of interaction counts of Hi-C + # data or a list of such matrices for replicated experiments. The + # counts must be evenly sampled and not normalized. 'experiment' + # can be either a list of lists, a path to a file or a file handler + # :param True normalized: if False simple normalization will be computed, + # as well as a simple column filtering will be applied (remove columns + # where value at the diagonal is null) + # :param 1 n_cpus: The number of CPUs to allocate to TADbit. If + # n_cpus='max' the total number of CPUs will be used + # :param max max_tad_size: an integer defining the maximum size of a + # TAD. Default (auto) defines it as the number of rows/columns + # :param True heuristic: whether to use or not some heuristics + # :param False batch_mode: if True, all the experiments will be + # concatenated into one for the search of TADs. The resulting TADs + # found are stored under the name 'batch' plus a concatenation of the + # experiment names passed (e.g.: if experiments=['exp1', 'exp2'], the + # name would be: 'batch_exp1_exp2'). 
+ + # """ + # experiments = experiments or self.experiments + # if not isinstance(experiments, list): + # experiments = [experiments] + # xprs = [] + # for xpr in experiments: + # if not isinstance(xpr, Experiment): + # xpr = self.get_experiment(xpr) + # xprs.append(xpr) + # # if normalized and (not xpr._zeros or not xpr._normalization): + # # raise Exception('ERROR: Experiments should be normalized, and' + + # # ' filtered first') + # if len(xprs) <= 1 and batch_mode: + # raise Exception('ERROR: batch_mode implies that more than one ' + + # 'experiment is passed') + # if batch_mode: + # matrix = [] + # if not name: + # name = 'batch' + # resolution = xprs[0].resolution + # for xpr in sorted(xprs, key=lambda x: x.name): + # if xpr.resolution != resolution: + # raise Exception('All Experiments must have the same ' + + # 'resolution\n') + # matrix.append(xpr.hic_data[0]) + # if name.startswith('batch'): + # name += '_' + xpr.name + # siz = xprs[0].size + # tmp = reduce(lambda x, y: x.__add__(y, silent=True), xprs) + # tmp.filter_columns(silent=kwargs.get('silent', False)) + # remove = tuple([1 if i in tmp._zeros else 0 + # for i in xrange(siz)]) + # result = tadbit(matrix, + # remove=remove, + # n_cpus=n_cpus, verbose=verbose, + # max_tad_size=max_tad_size, + # no_heuristic=not heuristic, **kwargs) + # xpr = Experiment(name, resolution, hic_data=matrix, + # tad_def=result, **kwargs) + # xpr._zeros = xprs[0]._zeros + # for other in xprs[1:]: + # xpr._zeros = dict([(k, None) for k in + # set(xpr._zeros.keys()).intersection( + # other._zeros.keys())]) + # self.add_experiment(xpr) + # return + # for xpr in xprs: + # result = tadbit( + # xpr.hic_data, + # remove=tuple([1 if i in xpr._zeros else 0 for i in + # xrange(xpr.size)]), + # n_cpus=n_cpus, verbose=verbose, + # max_tad_size=max_tad_size, + # no_heuristic=not heuristic, **kwargs) + # xpr.load_tad_def(result) + # self._get_forbidden_region(xpr) + + def __update_size(self, xpr): + """ + Update the chromosome size and relative size after loading new Hi-C + experiments, unless the Chromosome size was defined by hand. + + """ + if not self._given_size: + self.size = max(xpr.tads[max(xpr.tads)]['end'] * xpr.resolution, + self.size) + self.size = ChromosomeSize(self.size) + self._get_forbidden_region(xpr, resized=True) + + self.r_size = self.size - len(self.forbidden) * xpr.resolution + self.r_size = RelativeChromosomeSize(self.size) + + + # def tad_density_plot(self, name, axe=None, focus=None, extras=None, + # normalized=True, savefig=None, shape='ellipse'): + # """ + # Draw an summary of the TAD found in a given experiment and their density + # in terms of relative Hi-C interaction count. + + # :param name: name of the experiment to visualize + # :param None focus: can pass a tuple (bin_start, bin_stop) to display the + # alignment between these genomic bins + # :param None extras: list of coordinates (genomic bin) where to draw a + # red cross + # :param None ymax: limit the y axis up to a given value + # :param ('grey', ): successive colors for alignment + # :param True normalized: normalized Hi-C count are plotted instead of raw + # data. + # :param 'ellipse' shape: which kind of shape to use as schematic + # representation of TADs. Implemented: 'ellipse', 'rectangle', + # 'triangle' + # :param None savefig: path to a file where to save the image generated; + # if None, the image will be shown using matplotlib GUI (the extension + # of the file name will determine the desired format). 
+ # """ + # if not self.experiments[name].tads: + # raise Exception("TAD borders not found\n") + # _tad_density_plot(self.experiments[name], axe=axe, focus=focus, + # extras=extras, normalized=normalized, + # savefig=savefig, shape=shape) + + + # def visualize(self, names=None, tad=None, focus=None, paint_tads=False, + # axe=None, show=True, logarithm=True, normalized=False, + # relative=True, decorate=True, savefig=None, clim=None, + # scale=(8, 6), cmap='jet'): + # """ + # Visualize the matrix of Hi-C interactions of a given experiment + + # :param None names: name of the experiment to visualize, or list of + # experiment names. If None, all experiments will be shown + # :param None tad: a given TAD in the form: + # :: + + # {'start': start, + # 'end' : end, + # 'brk' : end, + # 'score': score} + + # **Alternatively** a list of the TADs can be passed (all the TADs + # between the first and last one passed will be showed. Thus, passing + # more than two TADs might be superfluous) + # :param None focus: a tuple with the start and end positions of the + # region to visualize + # :param False paint_tads: draw a box around the TADs defined for this + # experiment + # :param None axe: an axe object from matplotlib can be passed in order + # to customize the picture + # :param True show: either to pop-up matplotlib image or not + # :param True logarithm: show the logarithm values + # :param True normalized: show the normalized data (weights might have + # been calculated previously). *Note: white rows/columns may appear in + # the matrix displayed; these rows correspond to filtered rows (see* + # :func:`tadphys.utils.hic_filtering.hic_filtering_for_modelling` *)* + # :param True relative: color scale is relative to the whole matrix of + # data, not only to the region displayed + # :param True decorate: draws color bar, title and axes labels + # :param None savefig: path to a file where to save the image generated; + # if None, the image will be shown using matplotlib GUI (the extension + # of the file name will determine the desired format). + # :param None clim: tuple with minimum and maximum value range for color + # scale. I.e. clim=(-4, 10) + # :param 'jet' cmap: color map from matplotlib. Can also be a + # preconfigured cmap object. 
+ # """ + # if names == None: + # names = [xpr.name for xpr in self.experiments] + # if not isinstance(names, list) and not isinstance(names, tuple): + # names = [names] + # cols = 1 + # rows = 1 + # else: + # sqrtxpr = sqrt(len(names)) + # cols = int(round(sqrtxpr + (0.0 if int(sqrtxpr)==sqrtxpr else .5))) + # rows = int(sqrtxpr+.5) + # notaxe = axe == None + # if not scale: + # scale = (8, 6) + # if notaxe and len(names) != 1: + # fig = plt.figure(figsize=(scale[0] * cols, scale[1] * rows)) + # for i in xrange(rows): + # for j in xrange(cols): + # if i * cols + j >= len(names): + # break + # if notaxe and len(names) != 1: + # axe = fig.add_subplot( + # rows, cols, i * cols + j + 1) + # if (isinstance(names[i * cols + j], tuple) or + # isinstance(names[i * cols + j], list)): + # if not axe: + # fig = plt.figure(figsize=(scale[0] * cols, scale[1] * rows)) + # axe = fig.add_subplot( + # rows, cols, i * cols + j + 1) + # xpr1 = self.get_experiment(names[i * cols + j][0]) + # xpr2 = self.get_experiment(names[i * cols + j][1]) + # img = xpr1.view(tad=tad, focus=focus, paint_tads=paint_tads, + # axe=axe, show=False, logarithm=logarithm, + # normalized=normalized, relative=relative, + # decorate=decorate, savefig=False, + # where='up', clim=clim, cmap=cmap) + # img = xpr2.view(tad=tad, focus=focus, paint_tads=paint_tads, + # axe=axe, show=False, logarithm=logarithm, + # normalized=normalized, relative=relative, + # decorate=False, savefig=False, where='down', + # clim=clim or img.get_clim(), cmap=cmap) + # #axe = axe.twinx() + # #axe.set_aspect('equal',adjustable='box-forced',anchor='NE') + # if decorate: + # plt.text(1.01, .5, + # 'Chromosome %s experiment %s' % ( + # self.name, xpr2.name), + # rotation=-90, va='center', size='large', + # ha='left', transform=axe.transAxes) + # else: + # xper = self.get_experiment(names[i * cols + j]) + # if not xper.hic_data and not xper.norm: + # continue + # xper.view(tad=tad, focus=focus, paint_tads=paint_tads, + # axe=axe, show=False, logarithm=logarithm, + # normalized=normalized, relative=relative, + # decorate=decorate, savefig=False, clim=clim, + # cmap=cmap) + # if savefig: + # tadbit_savefig(savefig) + # if show: + # plt.show() + + + # def get_tad_hic(self, tad, x_name, normed=True, matrix_num=0): + # """ + # Retrieve the Hi-C data matrix corresponding to a given TAD. + + # :param tad: a given TAD (:py:class:`dict`) + # :param x_name: name of the experiment + # :param True normed: if True, normalize the Hi-C data + + # :returns: Hi-C data matrix for the given TAD + # """ + # beg, end = int(tad['start']), int(tad['end']) + # xpr = self.get_experiment(x_name) + # size = xpr.size + # matrix = [[0 for _ in xrange(beg, end)]\ + # for _ in xrange(beg, end)] + # for i, tadi in enumerate(xrange(beg, end)): + # tadi = tadi * size + # for j, tadj in enumerate(xrange(beg, end)): + # if normed: + # matrix[j][i] = xpr.norm[0][tadi + tadj] + # else: + # matrix[j][i] = xpr.hic_data[matrix_num][tadi + tadj] + # return matrix + + + # def iter_tads(self, x_name, normed=True): + # """ + # Iterate over the TADs corresponding to a given experiment. 
+ + # :param x_name: name of the experiment + # :param True normed: normalize Hi-C data returned + + # :yields: Hi-C data corresponding to each TAD + # """ + # if not self.get_experiment(x_name).hic_data: + # raise Exception('No Hi-c data for %s experiment\n' % (x_name)) + # for name, ref in self.get_experiment(x_name).tads.iteritems(): + # yield name, self.get_tad_hic(ref, x_name, normed=normed) + + + # def set_max_tad_size(self, value): + # """ + # Change the maximum size allowed for TADs. It also applies to the + # computed experiments. + + # :param value: an int value (default is 5000000) + # """ + # self.max_tad_size = value + # for xpr in self.experiments: + # for tad in xpr.tads: + # xpr.tads[tad]['brk'] = xpr.tads[tad]['end'] + # if ((xpr.tads[tad]['end'] - xpr.tads[tad]['start']) + # * xpr.resolution) > self.max_tad_size: + # xpr.tads[tad]['score'] = -abs(xpr.tads[tad]['score']) + + + # def _find_centromere(self, xpr): + # """ + # Search for the centromere in a chromosome, assuming that + # :class:`Chromosome` corresponds to a real chromosome. + # Add a boundary to all the experiments where the centromere is. + # * A centromere is defined as the largest area where the rows/columns + # of the Hi-C matrix are empty. + # """ + # beg = end = 0 + # size = xpr.size + # try: + # hic = xpr.hic_data[0] + # except TypeError: + # return + # # search for largest empty region of the chromosome + # best = (0, 0, 0) + # pos = 0 + # for pos, raw in enumerate(xrange(0, size * size, size)): + # if sum([hic[i] for i in xrange(raw, raw + size)]) == 0 and not beg: + # beg = float(pos) + # if sum([hic[i] for i in xrange(raw, raw + size)]) != 0 and beg: + # end = float(pos) + # if (end - beg) > best[0]: + # best = ((end - beg), beg, end) + # beg = end = 0 + # # this is for weared cases where centromere is at the end of Hi-C data + # if beg and not end: + # end = float(pos) + # if (end - beg) > best[0]: + # best = ((end - beg), beg, end) + # beg, end = best[1:] + # if not beg or not end: + # return + # tads = xpr.tads + # # if we already have a centromere defined, check if it can be reduced + # if self._centromere: + # if beg > self._centromere[0]: + # # readjust TADs that have been split around the centromere + # for tad in tads: + # if tads[tad]['end'] == self._centromere[0]: + # tads[tad]['end'] = beg + # self._centromere[0] = beg + # if end < self._centromere[1]: + # # readjust TADs that have been split around the centromere + # for tad in tads: + # if tads[tad]['start'] == self._centromere[1]: + # tads[tad]['start'] = end + # self._centromere[1] = end + # else: + # self._centromere = [beg, end] + # # split TADs overlapping with the centromere + # if [True for t in tads.values() \ + # if t['start'] < beg < t['end'] \ + # and t['start'] < end < t['end']]: + # tad = len(tads) + 1 + # plus = 0 + # while tad + plus > 1: + # start = tads[tad - 1 + plus]['start'] + # final = tads[tad - 1 + plus]['end'] + # # centromere found? 
+ # if start < beg < final and start < end < final: + # tads[tad] = copy(tads[tad - 1]) + # tads[tad]['start'] = end + # tads[tad]['score'] = abs(tads[tad]['score']) + # if (tads[tad]['end'] - tads[tad]['start']) \ + # * xpr.resolution > self.max_tad_size: + # xpr.tads[tad]['score'] = -abs(xpr.tads[tad]['score']) + # tads[tad]['brk'] = tads[tad]['end'] + # tad -= 1 + # tads[tad] = copy(tads[tad]) + # tads[tad]['score'] = abs(tads[tad]['score']) + # tads[tad]['end'] = beg + # if (tads[tad]['end'] - tads[tad]['start']) \ + # * xpr.resolution > self.max_tad_size: + # xpr.tads[tad]['score'] = -abs(xpr.tads[tad]['score']) + # tads[tad]['brk'] = tads[tad]['end'] + # plus = 1 + # else: + # tads[tad] = copy(tads[tad - 1 + plus]) + # tad -= 1 + # # if tad includes centromere but starts in the same point + # elif [True for t in tads.values() \ + # if t['start'] == beg and end < t['end']]: + # tad = len(tads) + 1 + # while tad > 1: + # start = tads[tad - 1]['start'] + # final = tads[tad - 1]['end'] + # # centromere found? + # if start == beg: + # tads[tad] = copy(tads[tad - 1]) + # tads[tad]['start'] = end + # tads[tad]['score'] = abs(tads[tad]['score']) + # if (tads[tad]['end'] - tads[tad]['start']) \ + # * xpr.resolution > self.max_tad_size: + # xpr.tads[tad]['score'] = -abs(xpr.tads[tad]['score']) + # else: + # tads[tad] = copy(tads[tad - 1]) + # tad -= 1 + # # if tad includes centromere but ends in the same point + # elif [True for t in tads.values() \ + # if t['end'] == end and beg > t['start']]: + # tad = len(tads) + 1 + # plus = 0 + # while tad + plus > 1: + # start = tads[tad - 1 + plus]['start'] + # final = tads[tad - 1 + plus]['end'] + # # centromere found? + # if final == end: + # tads[tad] = copy(tads[tad - 1]) + # tads[tad]['start'] = beg + # tads[tad]['score'] = abs(tads[tad]['score']) + # if (tads[tad]['end'] - tads[tad]['start']) \ + # * xpr.resolution > self.max_tad_size: + # xpr.tads[tad]['score'] = -abs(xpr.tads[tad]['score']) + # tads[tad]['brk'] = tads[tad]['end'] + # tad -= 1 + # tads[tad] = copy(tads[tad]) + # tads[tad]['score'] = abs(tads[tad]['score']) + # tads[tad]['end'] = beg + # if (tads[tad]['end'] - tads[tad]['start']) \ + # * xpr.resolution > self.max_tad_size: + # xpr.tads[tad]['score'] = -abs(xpr.tads[tad]['score']) + # tads[tad]['brk'] = tads[tad]['end'] + # plus = 1 + # else: + # tads[tad] = copy(tads[tad - 1 + plus]) + # tad -= 1 + + + +class ExperimentList(list): + """ + Inherited from python built in :py:func:`list`, modified for TADbit + :class:`tadphys.Experiment`. + + Mainly, `getitem`, `setitem`, and `append` were modified in order to + be able to search for experiments by index or by name, and to add + experiments simply using Chromosome.experiments.append(Experiment). + + The whole ExperimentList object is linked to a Chromosome instance + (:class:`tadphys.Chromosome`). 
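+
+    A minimal sketch of the lookup behaviour (experiment names are
+    hypothetical)::
+
+        xpr = crm.experiments['exp1']    # lookup by name
+        xpr = crm.experiments[0]         # lookup by index
+        crm.experiments.append(xpr)      # replaces any experiment already
+                                         # named xpr.name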
+ + """ + def __init__(self, thing, crm): + super(ExperimentList, self).__init__(thing) + self.crm = crm + + + def __getitem__(self, i): + try: + return super(ExperimentList, self).__getitem__(i) + except TypeError: + for nam in self: + if nam.name == i: + return nam + raise KeyError('Experiment %s not found\n' % (i)) + + + def __setitem__(self, i, exp): + try: + super(ExperimentList, self).__setitem__(i, exp) + exp.crm = self.crm + self.crm._get_forbidden_region(exp) + except TypeError: + for j, nam in enumerate(self): + if nam.name == i: + exp.crm = self.crm + self[j] = exp + self.crm._get_forbidden_region(exp) + break + else: + exp.crm = self.crm + self.append(exp) + self.crm._get_forbidden_region(exp) + + + def __delitem__(self, i): + try: + super(ExperimentList, self).__delitem__(i) + except TypeError: + for j, nam in enumerate(self): + if nam.name == i: + exp = self.pop(j) + del(exp) + break + else: + raise KeyError('Experiment %s not found\n' % (i)) + + + def append(self, exp): + if exp.name in [e.name for e in self]: + self[exp.name] = exp + self.crm._get_forbidden_region(exp) + else: + super(ExperimentList, self).append(exp) + self.crm._get_forbidden_region(exp) + exp.crm = self.crm + + +class AlignmentDict(dict): + """ + :py:func:`dict` of :class:`tadphys.Alignment` + + Modified getitem, setitem, and append in order to be able to search + alignments by index or by name. + + linked to a :class:`tadphys.Chromosome` + """ + + def __getitem__(self, nam): + try: + return super(AlignmentDict, self).__getitem__(tuple(sorted(nam))) + except KeyError: + for i, key in enumerate(self): + if nam == i: + return self[key] + raise TypeError('Alignment %s not found\n' % i) + + +class ChromosomeSize(int): + """ + Simple class inheriting from integer designed to hold chromosome size in + base pairs + """ + def __init__(self, thing): + super(ChromosomeSize, self).__init__(thing) + + +class RelativeChromosomeSize(int): + """ + Relative Chromosome size in base pairs. Equal to Chromosome size minus + forbidden regions (eg: the centromere) + + Only used for TAD alignment randomization. + """ + def __init__(self, thing): + super(RelativeChromosomeSize, self).__init__(thing) diff --git a/tadphys/experiment.py b/tadphys/experiment.py new file mode 100644 index 0000000..1d5307a --- /dev/null +++ b/tadphys/experiment.py @@ -0,0 +1,1745 @@ +""" +20 Feb 2013 + + +""" + +from copy import deepcopy as copy +from sys import stderr +from warnings import warn +from math import isnan +from numpy import log2, array +from tadphys.modelling.HIC_CONFIG import CONFIG +from tadphys.utils.hic_parser import read_matrix +from tadphys.utils.extraviews import nicer +from tadphys.utils.tadmaths import zscore, nozero_log_matrix +from tadphys.utils.hic_filtering import hic_filtering_for_modelling +from collections import OrderedDict + +try: + from tadphys.modelling.lammps_modelling import generate_lammps_models +except ImportError: + pass + +try: + import matplotlib.pyplot as plt + from matplotlib.cm import jet +except ImportError: + stderr.write('matplotlib not found\n') + + +def load_experiment_from_reads(name, fnam, genome_seq, resolution, + conditions=None, identifier=None, cell_type=None, + enzyme=None, exp_type='Hi-C', **kw_descr): + """ + Loads an experiment object from TADbit-generated read files, that are lists + of pairs of reads mapped to a reference genome. 
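+
+    Judging from the parser below, each line of ``fnam`` is expected to hold
+    at least ten tab-separated fields, with the chromosome and position of
+    the first read in fields 2-3 and those of the second read in fields 8-9
+    (all other fields are ignored).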
+ + :param fnam: tsv file with reads1 and reads2 + :param name: name of the experiment + :param resolution: the resolution of the experiment (size of a bin in + bases) + :param None identifier: some identifier relative to the Hi-C data + :param None cell_type: cell type on which the experiment was done + :param None enzyme: restriction enzyme used in the Hi-C experiment + :param Hi-C exp_type: name of the experiment used (currently only Hi-C is + supported) + :param None conditions: :py:func:`list` of experimental conditions, e.g. + the cell type, the enzyme... (i.e.: ['HindIII', 'cortex', 'treatment']). + This parameter may be used to compare the effect of this conditions on + the TADs + :param None kw_descr: any other argument passed would be stored as + complementary descriptive field. For example:: + + exp = Experiment('k562_rep2', resolution=100000, + identifier='SRX015263', cell_type='K562', + enzyme='HindIII', cylce='synchronized') + print exp + + # Experiment k562_rep2: + # resolution : 100Kb + # TADs : None + # Hi-C rows : None + # normalized : None + # identifier : SRX015263 + # cell type : K562 + # restriction enzyme: HindIII + # cylce : synchronized + + *note that these fields may appear in the header of generated out files* + """ + size = 0 + section_sizes = {} + sections = [] + for crm in genome_seq: + len_crm = int(float(len(genome_seq[crm])) / resolution + 1) + section_sizes[(crm,)] = len_crm + size += len_crm + 1 + sections.extend([(crm, '%04d' % i) for i in xrange(len_crm + 1)]) + imx = HiC_data((), size) + dict_sec = dict([(j, i) for i, j in enumerate(sections)]) + for line in open(fnam): + _, cr1, ps1, _, _, _, _, cr2, ps2, _ = line.split('\t', 9) + ps1 = dict_sec[(cr1, '%04d' % (int(ps1) / resolution))] + ps2 = dict_sec[(cr2, '%04d' % (int(ps2) / resolution))] + imx[ps1 + ps2 * size] += 1 + imx[ps2 + ps1 * size] += 1 + + return Experiment(name, resolution=resolution, hic_data=imx, + conditions=conditions, identifier=identifier, + cell_type=cell_type, enzyme=enzyme, exp_type=exp_type, + **kw_descr) + +class Experiment(object): + """ + Hi-C experiment. + + :param name: name of the experiment + :param resolution: the resolution of the experiment (size of a bin in + bases) + :param None identifier: some identifier relative to the Hi-C data + :param None cell_type: cell type on which the experiment was done + :param None enzyme: restriction enzyme used in the Hi-C experiment + :param Hi-C exp_type: name of the experiment used (currently only Hi-C is + supported) + :param None hic_data: whether a file or a list of lists corresponding to + the Hi-C data + :param None tad_def: a file or a dict with precomputed TADs for this + experiment + :param None parser: a parser function that returns a tuple of lists + representing the data matrix and the length of a row/column. With + the file example.tsv: + + :: + + chrT_001 chrT_002 chrT_003 chrT_004 + chrT_001 629 164 88 105 + chrT_002 164 612 175 110 + chrT_003 88 175 437 100 + chrT_004 105 110 100 278 + + the output of parser('example.tsv') would be be: + ``[([629, 164, 88, 105, 164, 612, 175, 110, 88, 175, 437, 100, 105, + 110, 100, 278]), 4]`` + :param None conditions: :py:func:`list` of experimental conditions, e.g. + the cell type, the enzyme... (i.e.: ['HindIII', 'cortex', 'treatment']). 
+ This parameter may be used to compare the effect of this conditions on + the TADs + :param True filter_columns: filter the columns with unexpectedly high + content of low values + :param None kw_descr: any other argument passed would be stored as + complementary descriptive field. For example:: + + exp = Experiment('k562_rep2', resolution=100000, + identifier='SRX015263', cell_type='K562', + enzyme='HindIII', cylce='synchronized') + print exp + + # Experiment k562_rep2: + # resolution : 100Kb + # TADs : None + # Hi-C rows : None + # normalized : None + # identifier : SRX015263 + # cell type : K562 + # restriction enzyme: HindIII + # cylce : synchronized + + *note that these fields may appear in the header of generated out files* + + TODO: doc conditions + TODO: normalization + """ + + + def __init__(self, name, resolution, hic_data=None, norm_data=None, + tad_def=None, parser=None, no_warn=False, weights=None, + conditions=None, identifier=None, + cell_type=None, enzyme=None, exp_type='Hi-C', **kw_descr): + self.name = name + self.resolution = resolution + self.identifier = identifier + self.cell_type = cell_type + self.enzyme = enzyme + self.description = kw_descr + self.exp_type = exp_type + self.crm = None + self._ori_resolution = resolution + self.hic_data = None + self._ori_hic = None + self._ori_norm = None + self._ori_size = None + self.conditions = sorted(conditions) if conditions else [] + self.size = None + self.tads = {} + self.norm = None + self.bias = None + self._normalization = None + self._filtered_cols = False + self._zeros = [] + self._zscores = [] + if hic_data: + self.load_hic_data(hic_data, parser, **kw_descr) + if norm_data: + self.load_norm_data(norm_data, parser, **kw_descr) + if tad_def: + self.load_tad_def(tad_def, weights=weights) + elif not hic_data and not no_warn and not norm_data: + stderr.write('WARNING: this is an empty shell, no data here.\n') + + + def __repr__(self): + return 'Experiment %s (resolution: %s, TADs: %s, Hi-C rows: %s, normalized: %s)' % ( + self.name, nicer(self.resolution), len(self.tads) or None, + self.size, self._normalization if self._normalization else 'None') + + + def __str__(self): + outstr = 'Experiment %s:\n' % (self.name) + outstr += ' resolution : %s\n' % (nicer(self.resolution)) + outstr += ' TADs : %s\n' % (len(self.tads) or None) + outstr += ' Hi-C rows : %s\n' % (self.size) + outstr += ' normalized : %s\n' % (self._normalization or None) + ukw = 'UNKNOWN' + try: # new in version post-CSDM13 + outstr += ' identifier : %s\n' % (self.identifier or ukw) + outstr += ' cell type : %s\n' % (self.cell_type or ukw) + outstr += ' restriction enzyme: %s\n' % (self.enzyme or ukw) + for desc in self.description: + outstr += ' %-18s: %s\n' % (desc, self.description[desc]) + except AttributeError: + pass + return outstr + + + def __add__(self, other, silent=False): + """ + sum Hi-C data of experiments into a new one. 
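+
+        Both experiments should have been normalized with the same method
+        beforehand; the result is a new Experiment named '<name1>+<name2>'.
+        A minimal sketch::
+
+            merged = exp1 + exp2   # calls exp1.__add__(exp2)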
+        """
+        reso1, reso2 = self.resolution, other.resolution
+        if self.resolution == other.resolution:
+            resolution = self.resolution
+            changed_reso = False
+        else:
+            resolution = max(reso1, reso2)
+            self.set_resolution(resolution)
+            other.set_resolution(resolution)
+            if not silent:
+                stderr.write(('WARNING: experiments of different resolution, '
+                              'setting both resolutions to %s, and normalizing '
+                              'at this resolution\n') % (resolution))
+            norm1 = copy(self.norm)
+            norm2 = copy(other.norm)
+            if self._normalization:
+                self.normalize_hic()
+            if other._normalization:
+                other.normalize_hic()
+            changed_reso = True
+        if self.hic_data:
+            new_hicdata = HiC_data([], size=self.size)
+            for i in self.hic_data[0]:
+                new_hicdata[i] = self.hic_data[0].get(i)
+            for i in other.hic_data[0]:
+                new_hicdata[i] += other.hic_data[0].get(i)
+        else:
+            new_hicdata = None
+        xpr = Experiment(name='%s+%s' % (self.name, other.name),
+                         resolution=resolution,
+                         hic_data=new_hicdata, no_warn=True)
+        # check if both experiments are normalized with the same method
+        # and sum both normalized data
+        if self._normalization != None and other._normalization != None:
+            if (self._normalization.split('_factor:')[0] ==
+                other._normalization.split('_factor:')[0]):
+                xpr.norm = [HiC_data([], size=self.size)]
+                for i in self.norm[0]:
+                    xpr.norm[0][i] = self.norm[0].get(i)
+                for i in other.norm[0]:
+                    xpr.norm[0][i] += other.norm[0].get(i)
+                # The final value of the factor should be the sum of both
+                try:
+                    xpr._normalization = (
+                        self._normalization.split('_factor:')[0] +
+                        '_factor:' +
+                        str(int(self._normalization.split('_factor:')[1]) +
+                            int(other._normalization.split('_factor:')[1])))
+                except IndexError: # no factor there
+                    xpr._normalization = (self._normalization)
+        elif self.norm or other.norm:
+            try:
+                if (self.norm[0] or other.norm[0]) != {}:
+                    if not silent:
+                        raise Exception('ERROR: normalization differs between' +
+                                        ' each experiment\n')
+                else:
+                    if not silent:
+                        stderr.write('WARNING: experiments should be ' +
+                                     'normalized before being summed\n')
+            except TypeError:
+                if not silent:
+                    stderr.write('WARNING: experiments should be normalized ' +
+                                 'before being summed\n')
+        else:
+            if not silent:
+                stderr.write('WARNING: experiments should be normalized ' +
+                             'before being summed\n')
+        if changed_reso:
+            self.set_resolution(reso1)
+            self.norm = norm1
+            other.set_resolution(reso2)
+            other.norm = norm2
+        xpr.crm = self.crm
+        if not xpr.size:
+            xpr.size = len(xpr.norm[0])
+
+        def __merge(own, fgn):
+            "internal function to merge descriptions"
+            if own == fgn:
+                return own
+            return '%s+%s' % (own, fgn)
+
+        xpr.identifier  = __merge(self.identifier , other.identifier )
+        xpr.cell_type   = __merge(self.cell_type  , other.cell_type  )
+        xpr.enzyme      = __merge(self.enzyme     , other.enzyme     )
+        xpr.description = __merge(self.description, other.description)
+        xpr.exp_type    = __merge(self.exp_type   , other.exp_type   )
+
+        for des in self.description:
+            if not des in other.description:
+                continue
+            xpr.description[des] = __merge(self.description[des],
+                                           other.description[des])
+        return xpr
+
+
+    def __div__(self, other, silent=False):
+        """
+        divide the Hi-C data of this experiment by those of another,
+        into a new one.
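+
+        A minimal sketch (both experiments normalized with the same method;
+        the result is named '<name1>/<name2>')::
+
+            ratio = exp1 / exp2   # calls exp1.__div__(exp2)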
+        """
+        reso1, reso2 = self.resolution, other.resolution
+        if self.resolution == other.resolution:
+            resolution = self.resolution
+            changed_reso = False
+        else:
+            resolution = max(reso1, reso2)
+            self.set_resolution(resolution)
+            other.set_resolution(resolution)
+            if not silent:
+                stderr.write(('WARNING: experiments of different resolution, '
+                              'setting both resolutions to %s, and normalizing '
+                              'at this resolution\n') % (resolution))
+            norm1 = copy(self.norm)
+            norm2 = copy(other.norm)
+            if self._normalization:
+                self.normalize_hic()
+            if other._normalization:
+                other.normalize_hic()
+            changed_reso = True
+        if self.hic_data:
+            new_hicdata = HiC_data([], size=self.size)
+            for i in self.hic_data[0]:
+                new_hicdata[i] = self.hic_data[0].get(i)
+            for i in other.hic_data[0]:
+                try:
+                    new_hicdata[i] /= other.hic_data[0].get(i)
+                except ZeroDivisionError:
+                    new_hicdata[i] = float('NaN')
+        else:
+            new_hicdata = None
+        xpr = Experiment(name='%s/%s' % (self.name, other.name),
+                         resolution=resolution,
+                         hic_data=new_hicdata, no_warn=True)
+        # check if both experiments are normalized with the same method
+        # and divide both normalized data
+        if self._normalization != None and other._normalization != None:
+            if (self._normalization.split('_factor:')[0] ==
+                other._normalization.split('_factor:')[0]):
+                xpr.norm = [HiC_data([], size=self.size)]
+                for i in self.norm[0]:
+                    xpr.norm[0][i] = self.norm[0].get(i)
+                for i in other.norm[0]:
+                    try:
+                        xpr.norm[0][i] /= other.norm[0].get(i)
+                    except ZeroDivisionError:
+                        xpr.norm[0][i] = float('NaN')
+                # The final value of the factor should be the mean of both
+                try:
+                    xpr._normalization = (
+                        self._normalization.split('_factor:')[0] +
+                        '_factor:' +
+                        str((int(self._normalization.split('_factor:')[1]) +
+                             int(other._normalization.split('_factor:')[1])) / 2))
+                except IndexError: # no factor there
+                    xpr._normalization = (self._normalization)
+        elif self.norm or other.norm:
+            try:
+                if (self.norm[0] or other.norm[0]) != {}:
+                    if not silent:
+                        raise Exception('ERROR: normalization differs between' +
+                                        ' each experiment\n')
+            except TypeError:
+                pass
+        if changed_reso:
+            self.set_resolution(reso1)
+            self.norm = norm1
+            other.set_resolution(reso2)
+            other.norm = norm2
+        xpr.crm = self.crm
+        if not xpr.size:
+            xpr.size = len(xpr.norm[0])
+
+        def __merge(own, fgn):
+            "internal function to merge descriptions"
+            if own == fgn:
+                return own
+            return '%s+%s' % (own, fgn)
+
+        xpr.identifier  = __merge(self.identifier , other.identifier )
+        xpr.cell_type   = __merge(self.cell_type  , other.cell_type  )
+        xpr.enzyme      = __merge(self.enzyme     , other.enzyme     )
+        xpr.description = __merge(self.description, other.description)
+        xpr.exp_type    = __merge(self.exp_type   , other.exp_type   )
+
+        for des in self.description:
+            if not des in other.description:
+                continue
+            xpr.description[des] = __merge(self.description[des],
+                                           other.description[des])
+        return xpr
+
+
+    def set_resolution(self, resolution, keep_original=True):
+        """
+        Set a new value for the resolution. Copy the original data into
+        Experiment._ori_hic and replace Experiment.hic_data with the data
+        corresponding to the new resolution
+        (:func:`tadphys.Chromosome.compare_condition`).
+
+        :param resolution: an integer representing the resolution.
This number
+           must be a multiple of the original resolution, and not lower than it
+        :param True keep_original: whether or not to keep the original data
+
+        """
+        if resolution < self._ori_resolution:
+            raise Exception('New resolution must not be lower than the ' +
+                            'original one.')
+        if resolution % self._ori_resolution:
+            raise Exception('New resolution must be a multiple of the ' +
+                            'original one.')
+        if resolution == self.resolution:
+            return
+        # if we want to go back to original resolution
+        if resolution == self._ori_resolution:
+            self.hic_data = self._ori_hic
+            self.norm = self._ori_norm
+            self.size = self._ori_size
+            self.resolution = self._ori_resolution
+            return
+        # if current resolution is the original one
+        if self.resolution == self._ori_resolution:
+            if self.hic_data:
+                self._ori_hic = copy(self.hic_data)
+            if self.norm:
+                self._ori_norm = self.norm[:]
+            # change the factor value in the normalization description
+            try:
+                self._normalization = (
+                    self._normalization.split('_factor:')[0] +
+                    '_factor:' +
+                    str(int(self._normalization.split('factor:')[1])
+                        * (resolution / self.resolution)))
+            except IndexError: # no factor there
+                pass
+        self.resolution = resolution
+        fact = self.resolution / self._ori_resolution
+        # aggregate each fact x fact block of the original matrices
+        try:
+            size = len(self._ori_hic[0])
+        except TypeError:
+            size = len(self._ori_norm[0])
+        self.size = size / fact
+        rest = size % fact
+        if rest:
+            self.size += 1
+        self.hic_data = [HiC_data([], size / fact + (1 if rest else 0))]
+        self.norm = [HiC_data([], size / fact + (1 if rest else 0))]
+        def resize(mtrx, copee):
+            "resize both hic_data and normalized data"
+            for i in xrange(0, size, fact):
+                for j in xrange(0, size, fact):
+                    val = 0
+                    for k in xrange(fact):
+                        if i + k >= size:
+                            break
+                        for l in xrange(fact):
+                            if j + l >= size:
+                                break
+                            val += copee[(i + k) * size + j + l]
+                    if val:
+                        mtrx[i/fact * self.size + j/fact] = val
+        try:
+            resize(self.hic_data[0], self._ori_hic[0])
+        except TypeError:
+            pass
+        try:
+            resize(self.norm[0], self._ori_norm[0])
+        except TypeError:
+            pass
+        # we need to recalculate zeros:
+        if self._filtered_cols:
+            stderr.write('WARNING: definition of filtered columns lost at ' +
+                         'this resolution\n')
+            self._filtered_cols = False
+        if not keep_original:
+            del(self._ori_hic)
+            del(self._ori_norm)
+
+
+    def filter_columns(self, silent=False, draw_hist=False, savefig=None,
+                       diagonal=True, perc_zero=90, auto=True, min_count=None,
+                       index=0):
+        """
+        Call the filtering function to remove artifactual columns in a given
+        Hi-C matrix. This function will detect columns with very low
+        interaction counts; columns passing through a cell with no interaction
+        in the diagonal; and columns with NaN values (in this case NaN will be
+        replaced by zero in the original Hi-C data matrix). Filtered-out
+        columns will be stored in the dictionary Experiment._zeros.
+
+        :param False silent: does not warn for removed columns
+        :param False draw_hist: shows the distribution of mean values by
+           column, the polynomial fit, and the cut applied.
+        :param None savefig: path to a file where to save the image generated;
+           if None, the image will be shown using matplotlib GUI (the extension
+           of the file name will determine the desired format).
+        :param True diagonal: remove rows/columns with zero in the diagonal
+        :param 90 perc_zero: maximum percentage of cells with no interactions
+           allowed.
+        :param None min_count: minimum number of reads mapped to a bin (recommended
+           value could be 2500).
If set this option overrides the perc_zero + filtering... This option is slightly slower. + :param True auto: if False, only filters based on the given percentage + zeros + :param 0 index: hic_data index to normalize + + """ + try: + data = self.hic_data[index] + ssize = self.hic_data[index]['size'] + except: + data = self.norm[index] + ssize = self.norm[index]['size'] + diagonal = True + self._zeros[index], has_nans = hic_filtering_for_modelling( + data, silent=silent, draw_hist=draw_hist, savefig=savefig, + diagonal=diagonal, perc_zero=perc_zero, auto=auto, + min_count=min_count) + if has_nans: # to make it simple + size2 = ssize**2 + for i in xrange(ssize): + if repr(self.hic_data[index].get(i, 0)) == 'nan': + del(self.hic_data[index][i]) + # Also remove columns where there is no data in the diagonal + # size = self.size + # else: + # self._zeros.update(dict([(i, None) for i in xrange(size) + # if not self.norm[0][i * size + i]])) + self._filtered_cols = True + + + def load_hic_data(self, hic_data, parser=None, wanted_resolution=None, + data_resolution=None, silent=False, **kwargs): + """ + Add a Hi-C experiment to the Chromosome object. + + :param None hic_data: whether a file or a list of lists corresponding to + the Hi-C data + :param name: name of the experiment + :param False force: overwrite the experiments loaded under the same + name + :param None parser: a parser function that returns a tuple of lists + representing the data matrix and the length of a row/column. + With the file example.tsv: + + :: + + chrT_001 chrT_002 chrT_003 chrT_004 + chrT_001 629 164 88 105 + chrT_002 86 612 175 110 + chrT_003 159 216 437 105 + chrT_004 100 111 146 278 + + the output of parser('example.tsv') would be: + ``[([629, 86, 159, 100, 164, 612, 216, 111, 88, 175, 437, 146, 105, + 110, 105, 278]), 4]`` + :param None resolution: resolution of the experiment in the file; it + will be adjusted to the resolution of the experiment. By default the + file is expected to contain a Hi-C experiment with the same resolution + as the :class:`tadphys.Experiment` created, and no change is made + :param True filter_columns: filter the columns with unexpectedly high + content of low values + :param False silent: does not warn for removed columns + + """ + self.hic_data = read_matrix(hic_data, parser=parser, one=False) + self._ori_size = self.size = self.hic_data[0]['size'] + self._ori_resolution = self.resolution = (data_resolution or + self._ori_resolution) + wanted_resolution = wanted_resolution or self.resolution + self.set_resolution(wanted_resolution, keep_original=False) + self._zeros = [] + for nmatrix in xrange(len(self.hic_data)): + ssize = self.hic_data[nmatrix]['size'] + self._zeros.append({}) + for index in xrange(ssize): + if self.hic_data[nmatrix][index].bads: + self._zeros[nmatrix][index] = self.hic_data[nmatrix][index].bads + self._filtered_cols = True + + def load_norm_data(self, norm_data, parser=None, resolution=None, + normalization='visibility', **kwargs): + """ + Add a normalized Hi-C experiment to the Chromosome object. + + :param None norm_data: whether a file or a list of lists corresponding to + the normalized Hi-C data + :param name: name of the experiment + :param False force: overwrite the experiments loaded under the same + name + :param None parser: a parser function that returns a tuple of lists + representing the data matrix and the length of a row/column. 
+ With the file example.tsv: + + :: + + chrT_001 chrT_002 chrT_003 chrT_004 + chrT_001 12.5 164 8.8 0.5 + chrT_002 8.6 61.2 1.5 1.1 + chrT_003 15.9 21.6 3.7 0.5 + chrT_004 0.0 1.1 1.6 2.8 + + :param None resolution: resolution of the experiment in the file; it + will be adjusted to the resolution of the experiment. By default the + file is expected to contain a Hi-C experiment with the same resolution + as the :class:`tadphys.Experiment` created, and no change is made + :param True filter_columns: filter the columns with unexpectedly high + content of low values + :param False silent: does not warn for removed columns + + """ + self.norm = read_matrix(norm_data, parser=parser, hic=False, one=False) + self._ori_size = self.size = self.norm[0]['size'] + self._ori_resolution = self.resolution = resolution or self._ori_resolution + if not self._zeros: # in case we do not have original Hi-C data + self._zeros = [] + for nmatrix in xrange(len(self.norm)): + ssize = self.norm[nmatrix]['size'] + self._zeros.append({}) + for i in xrange(ssize): + if all([isnan(j) for j in + [self.norm[nmatrix]['matrix'].get(k, float('NaN')) for k in + xrange(i * ssize, + i * ssize + ssize)]]): + self._zeros[nmatrix][i] = None + # remove NaNs, we do not need them as we have zeroes + for nmatrix in xrange(len(self.norm)): + for i in self.norm[nmatrix]['matrix'].keys(): + if isnan(self.norm[nmatrix]['matrix'][i]): + del(self.norm[nmatrix]['matrix'][i]) + self._normalization = normalization + + # this part remains to be fixed, since i dont understand its use + # ALERT + # ALERT ###################################################################################### + # for nmatrix in xrange(len(self.norm)): + # for index in xrange(len(self.norm[nmatrix]['matrix'])): + # print 'la' + # if self.norm[nmatrix]['matrix'][index].bads: + # self._zeros[nmatrix]['matrix'][index] = self.norm[nmatrix]['matrix'][index].bads + # self._filtered_cols = True + + + # def load_tad_def(self, tad_def, index=0, weights=None): + # """ + # Add the Topologically Associated Domains definition detection to Slice + + # :param None tad_def: a file or a dict with pre-computed TADs for this + # experiment + # :param None name: name of the experiment, if None f_name will be used + # :param 0 index: hic_data index to use + # :param None weights: Store information about the weights, corresponding + # to the normalization of the Hi-C data (see TADbit function + # documentation) + + # """ + # tads, norm = parse_tads(tad_def) + # last = max(tads.keys()) + # if not self.size: + # self.size = tads[last]['end'] + # self.tads = tads + # if not self.norm: + # self.norm = weights or norm + # if self.norm: + # self._normalization = 'visibility' + # if self._normalization: + # norms = self.norm[index] + # elif self.hic_data: + # norms = self.hic_data[index] + # else: + # warn("WARNING: raw Hi-C data not available, " + + # "TAD's height fixed to 1") + # norms = None + # diags = [] + # siz = self.size + # sp1 = siz + 1 + # zeros = self._zeros[index] or {} + # if norms: + # for k in xrange(1, siz): + # s_k = siz * k + # diags.append(sum([norms[i * sp1 + s_k] + # if not (i in zeros + # or (i + k) in zeros) else 1. # 1 is the mean + # for i in xrange(siz - k)]) / (siz - k)) + # for tad in tads: + # start, end = (int(tads[tad]['start']) + 1, + # int(tads[tad]['end']) + 1) + # if norms: + # matrix = sum([norms[i + siz * j] + # if not (i in zeros + # or j in zeros) else 1. 
+ # for i in xrange(start - 1, end - 1) + # for j in xrange(i + 1, end - 1)]) + # try: + # if norms: + # height = float(matrix) / sum( + # [diags[i-1] * (end - start - i) + # for i in xrange(1, end - start)]) + # else: + # height = tads[tad].get('height', 1.0) + # except ZeroDivisionError: + # height = 0. + # tads[tad]['height'] = height + + + # def normalize_hic(self, factor=1, iterations=0, max_dev=0.1, silent=False, + # rowsums=None, index=0): + # """ + # Normalize the Hi-C data. This normalization step does the same of + # the :func:`tadphys.tadbit.tadbit` function (default parameters), + + # It fills the Experiment.norm variable with the Hi-C values divided by + # the calculated weight. + + # The weight of a given cell in column i and row j corresponds to the + # square root of the product of the sum of column i by the sum of row + # j. + + # normalization is done according to this formula: + + # .. math:: + + # weight_{(I,J)} = \\frac{\\sum^N_{j=0}\\sum^N_{i=0}(matrix(i,j))} + # {\\sum^N_{i=0}(matrix(i,J)) \\times \\sum^N_{j=0}(matrix(I,j))} + + # with N being the number or rows/columns of the Hi-C matrix in both + # cases. + + # :param 1 factor: final mean number of normalized interactions wanted + # per cell + # :param False silent: does not warn when overwriting weights + # :param None rowsums: input a list of rowsums calculated elsewhere + # :param 0 index: hic_data index to normalize + # """ + + # if not self.hic_data or len(self.hic_data) <= index: + # raise Exception('ERROR: No Hi-C data loaded\n') + # if self.norm and len(self.norm) > index and not silent: + # stderr.write('WARNING: removing previous weights\n') + # size = self.size + # self.bias = iterative(self.hic_data[index], iterations=iterations, + # max_dev=max_dev, bads=self._zeros[index], + # verbose=not silent) + # norm = HiC_data([(i + j * size, float(self.hic_data[index][i, j]) / + # self.bias[i] / + # self.bias[j] * size) + # for i in self.bias for j in self.bias], size) + # if not self.norm: + # self.norm = [norm] + # elif len(self.norm) > index: + # self.norm[index] = norm + # else: + # self.norm.append(norm) + + # # no need to use lists, tuples use less memory + # if factor: + # self._normalization = 'visibility_factor:' + str(factor) + # factor = sum(self.norm[0].values()) / (self.size * self.size * factor) + # for n in self.norm[0]: + # self.norm[0][n] = self.norm[0][n] / factor + # else: + # self._normalization = 'visibility' + + + def get_hic_zscores(self, normalized=True, zscored=True, remove_zeros=True, index=0): + """ + Normalize the Hi-C raw data. The result will be stored into + the private Experiment._zscore list. + + :param True normalized: whether to normalize the result using the + weights (see :func:`normalize_hic`) + :param True zscored: calculate the z-score of the data + :param False remove_zeros: remove null interactions. Dangerous, null + interaction are informative. 
+ :param 0 index: hic_data index or norm index from where to produce the zscores + + """ + values = {} + zeros = {} + if not normalized and (not self.hic_data or len(self.hic_data) <= index): + raise Exception('ERROR: No Hi-C data loaded\n') + if normalized and (not self.norm or len(self.norm) <= index): + raise Exception('ERROR: No normalized data loaded\n') + zscores = {} + if normalized: + ssize = self.size + for i in xrange(self.size): + # zeros are rows or columns having a zero in the diagonal + if i in self._zeros: + continue + for j in xrange(i + 1, ssize): + if j in self._zeros: + continue + if (not self.norm[index]['matrix'].get(i * ssize + j, 0) + and remove_zeros): + zeros[(i, j)] = None + continue + values[(i, j)] = self.norm[index]['matrix'].get(i * ssize + j, 0) + else: + ssize = self.size + for i in xrange(ssize): + if i in self._zeros[index]: + continue + for j in xrange(i + 1, ssize): + if j in self._zeros[index]: + continue + values[(i, j)] = self.hic_data[index]['matrix'].get(i * ssize + j, 0) + # compute Z-score + if zscored: + zscore(values) + for i in xrange(ssize): + if i in self._zeros: + continue + for j in xrange(i + 1, ssize): + if j in self._zeros: + continue + if (i, j) in zeros and remove_zeros: + continue + zscores.setdefault(str(i), {}) + zscores[str(i)][str(j)] = values[(i, j)] + + if len(self._zscores) > index: + self._zscores[index] = zscores + else: + self._zscores.append(zscores) + + def model_region(self, start=1, end=None, n_models=5000, n_keep=1000, + n_cpus=1, verbose=0, keep_all=False, close_bins=1, + outfile=None, config=CONFIG, container=None, + tool='imp',tmp_folder=None,timeout_job=10800, + stages=0, initial_conformation=None, connectivity="FENE", + timesteps_per_k=10000, kfactor=1, adaptation_step=False, + cleanup=True, single_particle_restraints=None, use_HiC=True, + start_seed=1, hide_log=True, remove_rstrn=[], + keep_restart_out_dir=None, + restart_path=False, store_n_steps=10, + useColvars=False): + """ + Generates of three-dimensional models using IMP, for a given segment of + chromosome. + + :param 1 start: first bin to model (bin number) + :param None end: last bin to model (bin number). By default goes to the + last bin. + :param 5000 n_models: number of modes to generate + :param 1000 n_keep: number of models used in the final analysis + (usually the top 20% of the generated models). The models are ranked + according to their objective function value (the lower the better) + :param False keep_all: whether or not to keep the discarded models (if + True, models will be stored under tructuralModels.bad_models) + :param 1 close_bins: number of particles away (i.e. the bin number + difference) a particle pair must be in order to be considered as + neighbors (e.g. 1 means consecutive particles) + :param n_cpus: number of CPUs to use + :param 0 verbose: the information printed can be: nothing (0), the + objective function value the selected models (1), the objective + function value of all the models (2), all the modeling + information (3) + :param None container: restrains particle to be within a given object. Can + only be a 'cylinder', which is, in fact a cylinder of a given height to + which are added hemispherical ends. This cylinder is defined by a radius, + its height (with a height of 0 the cylinder becomes a sphere) and the + force applied to the restraint. E.g. for modeling E. 
coli genome (2
+           micrometers long and 0.5 micrometers wide), these values could be
+           used: ['cylinder', 250, 1500, 50], and for a typical mammalian
+           nucleus (6 micrometers in diameter): ['cylinder', 3000, 0, 50]
+        :param CONFIG config: a dictionary containing the standard
+           parameters used to generate the models. The dictionary should
+           contain the keys kforce, maxdist, upfreq and lowfreq.
+           Examples can be seen by doing:
+
+           ::
+
+             from tadphys.imp.CONFIG import CONFIG
+
+           where CONFIG is a dictionary of dictionaries to be passed to this
+           function:
+
+           ::
+
+             CONFIG = {
+              # use these parameters with the Hi-C data from:
+              'reference' : 'victor corces dataset 2013',
+
+              # Force applied to the restraints inferred to neighbor particles
+              'kforce'    : 5,
+
+              # Maximum experimental contact distance
+              'maxdist'   : 600, # OPTIMIZATION: 500-1200
+
+              # Minimum and maximum thresholds used to decide which experimental
+              # values have to be included in the computation of restraints.
+              # Z-score values bigger than upfreq and smaller than lowfreq will
+              # be included, whereas all the others will be rejected
+              'upfreq'    : 0.3, # OPTIMIZATION: min/max Z-score
+
+              'lowfreq'   : -0.7, # OPTIMIZATION: min/max Z-score
+
+              # How much space (radius in nm) occupies a nucleotide
+              'scale'     : 0.005
+              }
+        :param imp tool: use imp for Monte Carlo simulated annealing or lammps
+           for molecular dynamics
+        :param None tmp_folder: for lammps simulations, path to a temporary
+           folder created during the computation. By default it will be
+           created in the /tmp/ folder
+        :param 10800 timeout_job: maximum seconds a job can run in the
+           multiprocessing of lammps before it is killed
+        :param 0 stages: index of the hic_data/norm data to model. For lammps,
+           a list of indexes is allowed to perform dynamics between stages
+        :param tadbit initial_conformation: initial structure for lammps dynamics.
+           'tadbit' to compute the initial conformation with Monte Carlo
+           simulated annealing, 'random' to compute the initial conformation as
+           a 3D random walk, or {[x],[y],[z]}, a dictionary containing lists
+           with x, y, z positions, e.g. an IMPModel or LAMMPSModel object
+        :param True hide_log: do not generate lammps log information
+        :param FENE connectivity: use FENE for a FENE bond or harmonic for a
+           harmonic potential between neighbours
+        :param True cleanup: delete the lammps folder after completion
+        :param [] remove_rstrn: list of particles which must not have restraints
+        :param None keep_restart_out_dir: path to write files to restore the
+           LAMMPS session (binary)
+        :param False restart_path: path to files to restore a LAMMPS session
+           (binary)
+        :param 10 store_n_steps: integer with the number of steps to be saved
+           if restart_path != False
+        :param False useColvars: True if you want the restraints to be loaded
+           by colvars
+
+        :returns: a :class:`tadphys.imp.structuralmodels.StructuralModels` object.
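+
+        A hypothetical call (all parameter values are illustrative
+        placeholders)::
+
+            models = exp.model_region(start=1, end=100, n_models=500,
+                                      n_keep=100, n_cpus=8, tool='lammps')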
+ + """ + if isinstance(stages, list) and tool == 'imp': + stderr.write('ERROR: tool imp does not allow dynamics\n') + return + if not self._normalization: + stderr.write('WARNING: not normalized data, should run ' + + 'Experiment.normalize_hic()\n') + if not end: + end = self.size + zscores, values, zeros = self._sub_experiment_zscore(start, end, stages) + if self.hic_data and self.hic_data[0].chromosomes: + coords = [] + tot = 0 + chrs = [] + chrom_offset_start = 1 + chrom_offset_end = 0 + for k, v in self.hic_data[0].chromosomes.iteritems(): + tot += v + if start > tot: + chrom_offset_start = start - tot + if end <= tot: + chrom_offset_end = tot - end + chrs.append(k) + break + if start < tot and end >= tot: + chrs.append(k) + + for k in chrs: + coords.append({'crm' : k, + 'start': 1, + 'end' : self.hic_data[0].chromosomes[k]}) + coords[0]['start'] = chrom_offset_start + coords[-1]['end'] -= chrom_offset_end + + else: + coords = {'crm' : self.crm.name, + 'start': start, + 'end' : end} + zeros = [tuple([i not in zeros_stg for i in xrange(end - start + 1)]) for zeros_stg in zeros] + nloci = end - start + 1 + if verbose: + stderr.write('Preparing to model %s particles\n' % nloci) + if tool=='lammps': + return generate_lammps_models(zscores, self.resolution, nloci, + values=values, n_models=n_models, + outfile=outfile, n_keep=n_keep, n_cpus=n_cpus, + verbose=verbose, first=0, + close_bins=close_bins, config=config, container=container, + coords=coords, zeros=zeros, + tmp_folder=tmp_folder,timeout_job=timeout_job, + initial_conformation='tadbit' if not initial_conformation \ + else initial_conformation, + connectivity=connectivity, + timesteps_per_k=timesteps_per_k, kfactor=kfactor, + adaptation_step=adaptation_step, cleanup=cleanup, + hide_log=hide_log, initial_seed=start_seed, + remove_rstrn=remove_rstrn, restart_path=restart_path, + keep_restart_out_dir=keep_restart_out_dir, + store_n_steps=store_n_steps, + useColvars=useColvars + ) + + + def optimal_imp_parameters(self, start=1, end=None, n_models=500, n_keep=100, + n_cpus=1, upfreq_range=(0, 1, 0.1), close_bins=1, + kbending_range=0.0, + lowfreq_range=(-1, 0, 0.1), + scale_range=[0.01][:], + maxdist_range=(400, 1400, 100), + dcutoff_range=[2][:], + outfile=None, verbose=True, corr='spearman', + off_diag=1, savedata=None, + container=None): + """ + Find the optimal set of parameters to be used for the 3D modeling in + IMP. + + :param 1 start: first bin to model (bin number) + :param None end: last bin to model (bin number). By default goes to the + last bin. + :param 500 n_models: number of modes to generate + :param 100 n_keep: number of models used in the final analysis (usually + the top 20% of the generated models). The models are ranked + according to their objective function value (the lower the better) + :param 1 close_bins: number of particles away (i.e. the bin number + difference) a particle pair must be in order to be considered as + neighbors (e.g. 1 means consecutive particles) + :param n_cpus: number of CPUs to use + :param False verbose: if set to True, information about the distance, + force and Z-score between particles will be printed + :param (-1,0,0.1) lowfreq_range: range of lowfreq values to be + optimized. The last value of the input tuple is the incremental step + for the lowfreq values + :param (0,1,0.1,0.1) upfreq_range: range of upfreq values to be + optimized. 
The last value of the input tuple is the incremental step + for the upfreq values + :param (400,1400,100) maxdist_range: upper and lower bounds used to + search for the optimal maximum experimental distance. The last value + of the input tuple is the incremental step for maxdist values + :param [0.01] scale_range: upper and lower bounds used to search for + the optimal scale parameter (nm per nucleotide). The last value of + the input tuple is the incremental step for scale parameter values + :param [2] dcutoff_range: upper and lower bounds used to search for + the optimal distance cutoff parameter (distance, in number of beads, + from which to consider 2 beads as being close). The last value of the + input tuple is the incremental step for scale parameter values + :param None cutoff: distance cutoff (nm) to define whether two particles + are in contact or not, default is 2 times resolution, times scale. + :param None container: restrains particle to be within a given object. Can + only be a 'cylinder', which is, in fact a cylinder of a given height to + which are added hemispherical ends. This cylinder is defined by a radius, + its height (with a height of 0 the cylinder becomes a sphere) and the + force applied to the restraint. E.g. for modeling E. coli genome (2 + micrometers length and 0.5 micrometer of width), these values could be + used: ['cylinder', 250, 1500, 50], and for a typical mammalian nuclei + (6 micrometers diameter): ['cylinder', 3000, 0, 50] + :param True verbose: print the results to the standard output + + .. note:: + + Each of the *_range* parameters accept tuples in the form + *(start, end, step)*, or a list with the list of values to test. + + E.g.: + * scale_range=[0.001, 0.005, 0.006] will test these three values. + * scale_range=(0.001, 0.005, 0.001) will test the values 0.001, + 0.002, 0.003, 0.004 and 0.005 + + + :returns: an :class:`tadphys.imp.impoptimizer.IMPoptimizer` object + + """ + if not self._normalization: + stderr.write('WARNING: not normalized data, should run ' + + 'Experiment.normalize_hic()\n') + if not end: + end = self.size + optimizer = IMPoptimizer(self, start, end, n_keep=n_keep, + n_models=n_models, close_bins=close_bins, + container=container) + optimizer.run_grid_search(maxdist_range=maxdist_range, + kbending_range=kbending_range, + upfreq_range=upfreq_range, + lowfreq_range=lowfreq_range, + scale_range=scale_range, + dcutoff_range=dcutoff_range, corr=corr, + n_cpus=n_cpus, verbose=verbose, + off_diag=off_diag, savedata=savedata) + + if outfile: + optimizer.write_result(outfile) + + return optimizer + + + def _sub_experiment_zscore(self, start, end, index=0): + """ + Get the z-score of a sub-region of an experiment. + + TODO: find a nicer way to do this... + + :param start: first bin to model (bin number) + :param end: first bin to model (bin number) + :param 0 index: hic_data index or norm index from where to compute + the zscores. A list is allowed to compute several zscores at the + same time + + :returns: z-score, raw values and zeros of the experiment + """ + if isinstance(index, list): + idx = index + else: + idx = [index] + if not self._normalization or not self._normalization.startswith('visibility'): + stderr.write('WARNING: normalizing according to visibility method\n') + for i in idx: + self.normalize_hic(index=i) + from tadphys import Chromosome + if start < 1: + raise ValueError('ERROR: start should be higher than 0\n') + start -= 1 # things starts at 0 for python. 
we keep the end coordinate
+        # at its original value because it is inclusive
+        siz = self.size
+        tmp = Chromosome('tmp')
+        tmp.add_experiment('exp1', resolution=self.resolution, filter_columns=False)
+        exp = tmp.experiments[0]
+        tmp_matrix = []
+        exp.norm = []
+        try:
+            for id in idx:
+                matrix = self.get_hic_matrix(index=id)
+                new_matrix = [[matrix[i][j] for i in xrange(start, end)]
+                              for j in xrange(start, end)]
+                tmp_matrix.append(new_matrix)
+                # We want the weights and zeros calculated in the full chromosome
+                exp.norm.append([self.norm[id][i + siz * j] for i in xrange(start, end)
+                                 for j in xrange(start, end)])
+            exp.load_hic_data(hic_data=tmp_matrix)
+        except TypeError: # no Hi-C data provided
+            for id in idx:
+                matrix = self.get_hic_matrix(normalized=True, index=id)
+                new_matrix = [[matrix[i][j] for i in xrange(start, end)]
+                              for j in xrange(start, end)]
+                tmp_matrix.append(new_matrix)
+            exp.load_norm_data(norm_data=tmp_matrix)
+
+        # ... but the z-scores in this particular region
+        vals = []
+        exp._zeros = []
+        for id in idx:
+            exp._zeros += [dict([(z - start, None) for z in self._zeros[id]
+                                 if start <= z <= end - 1])]
+            if len(exp._zeros[-1]) == (end - start):
+                raise Exception('ERROR: no interaction found in selected regions')
+            exp.get_hic_zscores(index=id)
+            values = [[float('nan') for _ in xrange(exp.size)]
+                      for _ in xrange(exp.size)]
+
+            for i in xrange(exp.size):
+                # zeros are rows or columns having a zero in the diagonal
+                if i in exp._zeros[-1]:
+                    continue
+                for j in xrange(i + 1, exp.size):
+                    if j in exp._zeros[-1]:
+                        continue
+                    if not exp.norm[id]['matrix'].get(i * exp.size + j, 0):
+                        continue
+                    values[i][j] = exp.norm[id]['matrix'][i * exp.size + j]
+                    values[j][i] = exp.norm[id]['matrix'][i * exp.size + j]
+            vals.append(values)
+        return exp._zscores, vals, exp._zeros
+
+
+    def write_interaction_pairs(self, fname, normalized=True, zscored=True,
+                                diagonal=False, cutoff=None, header=False,
+                                true_position=False, uniq=True,
+                                remove_zeros=False, focus=None, format='tsv',
+                                index=0):
+        """
+        Creates a tab-separated file with all the pairwise interactions.
+
+        :param fname: file name where to write the pairwise interactions
+        :param True zscored: computes the z-score of the log10(data)
+        :param True normalized: use the weights to normalize the data
+        :param None cutoff: if defined, only the zscores above the cutoff will
+           be written to the file
+        :param True uniq: only writes one representative per interacting pair
+        :param False true_position: if True, writes genomic coordinates;
+           otherwise, genomic bins.
+        :param None focus: writes interactions between the start and stop bin
+           passed to this parameter.
+ :param 'tsv' format: in which to write the file, can be tab separated + (tsv) or JSON (json) + :param 0 index: hic_data index or norm index + + """ + if (not self._zscores or len(self.norm) < index) and zscored: + self.get_hic_zscores(index=index) + if (not self.norm or len(self.norm) < index) and normalized: + raise Exception('Experiment not normalized.') + # write to file + if isinstance(fname, str): + out = open(fname, 'w') + elif isinstance(fname, file): + out = fname + else: + raise Exception('Not recognize file type\n') + if header: + if format == 'tsv': + out.write('elt1\telt2\t%s\n' % ('zscore' if zscored else + 'normalized hi-c' if normalized + else 'raw hi-c')) + elif format == 'json': + out.write(''' +{ + "metadata": { + "formatVersion" : 3, + %s + "species" : "%s", + "cellType" : "%s", + "experimentType" : "%s", + "identifier" : "%s", + "resolution" : %s, + "chromosome" : "%s", + "start" : %s, + "end" : %s + }, + "interactions": [ + ''' % ('\n'.join(['"%s": "%s",' % (k, self.description[k]) + for k in self.description]), + self.description.get('species', ''), + self.cell_type, + self.exp_type, + self.identifier, + self.resolution, + self.crm.name, + focus[0] * self.resolution if focus else 1, + (focus[1] * self.resolution if focus else + self.resolution * self.size))) + if focus: + start, end = focus[0], focus[1] + 1 + else: + start, end = 0, self.size + for i in xrange(start, end): + if i in self._zeros[index]: + continue + newstart = i if uniq else 0 + for j in xrange(newstart, end): + if j in self._zeros[index]: + continue + if not diagonal and i == j: + continue + if zscored: + try: + if self._zscores[index][str(i)][str(j)] < cutoff: + continue + if self._zscores[index][str(i)][str(j)] == -99: + continue + except KeyError: + continue + val = self._zscores[index][str(i)][str(j)] + elif normalized: + val = self.norm[index][self.size*i+j] + else: + val = self.hic_data[index][self.size*i+j] + if remove_zeros and not val: + continue + if true_position: + if format == 'tsv': + out.write('%s\t%s\t%s\n' % ( + self.resolution * (i + 1), + self.resolution * (j + 1), val)) + elif format == 'json': + out.write('%s,%s,%s,\n' % ( + self.resolution * (i + 1), + self.resolution * (j + 1), val)) + else: + if format == 'tsv': + out.write('%s\t%s\t%s\n' % ( + i + 1 - start, j + 1 - start, val)) + elif format == 'json': + out.write('%s,%s,%s\n' % ( + i + 1 - start, j + 1 - start, val)) + if format == 'json': + out.write(']}\n') + out.close() + + + def get_hic_matrix(self, focus=None, diagonal=True, normalized=False, index=0): + """ + Return the Hi-C matrix. + + :param None focus: if a tuple is passed (start, end), wil return a Hi-C + matrix starting at start, and ending at end (all inclusive). + :param True diagonal: replace the values in the diagonal by one. 
Used + for the filtering in order to smooth the distribution of mean values + :param False normalized: returns normalized data instead of raw Hi-C + :param 0 index: hic_data index or norm index from where to get the matrix + + :returns: list of lists representing the Hi-C data matrix of the + current experiment + """ + siz = self.size + if normalized: + try: + hic = self.norm[index]['matrix'] + except TypeError: + raise Exception('ERROR: experiment not normalized yet') + else: + hic = self.hic_data[index]['matrix'] + if focus: + start, end = focus + start -= 1 + else: + start = 0 + end = siz + if diagonal: + return [[hic.get(i + siz * j, 0) for i in xrange(start, end)] + for j in xrange(start, end)] + else: + mtrx = [[hic.get(i + siz * j, 0) for i in xrange(start, end)] + for j in xrange(start, end)] + for i in xrange(start, end): + mtrx[i][i] = 1 if mtrx[i][i] else 0 + return mtrx + + + def print_hic_matrix(self, print_it=True, normalized=False, zeros=False, index=0): + """ + Return the Hi-C matrix as string + + :param True print_it: Otherwise, returns the string + :param False normalized: returns normalized data, instead of raw Hi-C + :param False zeros: take into account filtered columns + :param 0 index: hic_data index or norm index from where to print the matrix + :returns: list of lists representing the Hi-C data matrix of the + current experiment + """ + siz = self.size + try: + if normalized: + hic = self.norm[index]['matrix'] + else: + hic = self.hic_data[index]['matrix'] + except TypeError: + raise Exception('ERROR: no hic_data with index ',index) + + if zeros: + out = '\n'.join(['\t'.join( + ['nan' if (i in self._zeros[index] or j in self._zeros[index]) else + str(hic.get(i + siz * j, 0)) for i in xrange(siz)]) + for j in xrange(siz)]) + else: + out = '\n'.join(['\t'.join([str(hic.get(i + siz * j, 0)) + for i in xrange(siz)]) + for j in xrange(siz)]) + if print_it: + print(out) + else: + return out + '\n' + + + # def view(self, tad=None, focus=None, paint_tads=False, axe=None, + # show=True, logarithm=True, normalized=False, relative=True, + # decorate=True, savefig=None, where='both', clim=None, + # cmap='jet', index=0): + # """ + # Visualize the matrix of Hi-C interactions + + # :param None tad: a given TAD in the form: + # :: + + # {'start': start, + # 'end' : end, + # 'brk' : end, + # 'score': score} + + # **Alternatively** a list of the TADs can be passed (all the TADs + # between the first and last one passed will be showed. Thus, passing + # more than two TADs might be superfluous) + # :param None focus: a tuple with the start and end positions of the + # region to visualize + # :param False paint_tads: draw a box around the TADs defined for this + # experiment + # :param None axe: an axe object from matplotlib can be passed in order + # to customize the picture + # :param True show: either to pop-up matplotlib image or not + # :param True logarithm: show the logarithm values + # :param True normalized: show the normalized data (weights might have + # been calculated previously). 
*Note: white rows/columns may appear in + # the matrix displayed; these rows correspond to filtered rows (see* + # :func:`tadphys.utils.hic_filtering.hic_filtering_for_modelling` *)* + # :param True relative: color scale is relative to the whole matrix of + # data, not only to the region displayed + # :param True decorate: draws color bar, title and axes labels + # :param None savefig: path to a file where to save the image generated; + # if None, the image will be shown using matplotlib GUI (the extension + # of the file name will determine the desired format). + # :param None clim: tuple with minimum and maximum value range for color + # scale. I.e. clim=(-4, 10) + # :param 'jet' cmap: color map from matplotlib. Can also be a + # preconfigured cmap object. + # :param 0 index: hic_data index or norm index + # """ + # if logarithm==True: + # fun = log2 + # elif logarithm: + # fun = logarithm + # else: + # fun = lambda x: x + # size = self.size + # if normalized and not self.norm: + # raise Exception('ERROR: weights not calculated for this ' + + # 'experiment. Run Experiment.normalize_hic\n') + # if tad and focus: + # raise Exception('ERROR: only one of "tad" or "focus" might be set') + # start = end = None + # if focus: + # start, end = focus + # if start == 0: + # stderr.write('WARNING: Hi-C matrix starts at 1, setting ' + + # 'starting point to 1.\n') + # start = 1 + # elif isinstance(tad, dict): + # start = int(tad['start']) + # end = int(tad['end']) + # elif isinstance(tad, list): + # if isinstance(tad[0], dict): + # start = int(sorted(tad, + # key=lambda x: int(x['start']))[0 ]['start']) + # end = int(sorted(tad, + # key=lambda x: int(x['end' ]))[-1]['end' ]) + # elif self.tads: + # start = self.tads[min(self.tads)]['start'] + 1 + # end = self.tads[max(self.tads)]['end' ] + 1 + # else: + # start = 1 + # end = size + # try: + # if normalized: + # hic = self.norm[index] + # else: + # hic = self.hic_data[index] + # except TypeError: + # raise Exception('ERROR: no hic_data with index ',index) + + # if relative and not clim: + # if normalized: + # # find minimum, if value is non-zero... for logarithm + # mini = min([i for i in hic.values() if i]) + # if mini == int(mini): + # vmin = min(hic.values()) + # else: + # vmin = mini + # vmin = fun(vmin or (1 if logarithm else 0)) + # vmax = fun(max(hic.values())) + # else: + # vmin = fun(min(hic.values()) or + # (1 if logarithm else 0)) + # vmax = fun(max(hic.values())) + # elif clim: + # vmin, vmax = clim + # if axe is None: + # plt.figure(figsize=(8, 6)) + # axe = plt.subplot(111) + # if tad or focus: + # if start > -1: + # if normalized: + # matrix = [ + # [hic[i+size*j] + # if (not i in self._zeros[index] + # and not j in self._zeros[index]) else float('nan') + # for i in xrange(int(start) - 1, int(end))] + # for j in xrange(int(start) - 1, int(end))] + # else: + # matrix = [ + # [hic[i+size*j] + # for i in xrange(int(start) - 1, int(end))] + # for j in xrange(int(start) - 1, int(end))] + # elif isinstance(tad, list): + # if normalized: + # stderr.write('WARNING: List passed, not going to be ' + + # 'normalized.\n') + # matrix = tad + # else: + # # TODO: something... matrix not declared... 
+ # pass + # else: + # if normalized: + # matrix = [[hic[i+size*j] + # if (not i in self._zeros[index] + # and not j in self._zeros[index]) else float('nan') + # for i in xrange(size)] + # for j in xrange(size)] + # else: + # matrix = [[hic[i+size*j]\ + # for i in xrange(size)] \ + # for j in xrange(size)] + # if where == 'up': + # for i in xrange(int(end - start)): + # for j in xrange(i, int(end - start)): + # matrix[i][j] = vmin + # alphas = array([0, 0] + [1] * 256 + [0]) + # jet._init() + # jet._lut[:, -1] = alphas + # elif where == 'down': + # for i in xrange(int(end - start)): + # for j in xrange(i + 1): + # matrix[i][j] = vmin + # alphas = array([0, 0] + [1] * 256 + [0]) + # jet._init() + # jet._lut[:,-1] = alphas + + # if isinstance(cmap, str): + # cmap = plt.get_cmap(cmap) + # cmap.set_bad('darkgrey', 1) + # if relative: + # img = axe.imshow(nozero_log_matrix(matrix, fun), origin='lower', vmin=vmin, vmax=vmax, + # interpolation="nearest", cmap=cmap, + # extent=(int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5, + # int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5)) + # else: + # img = axe.imshow(nozero_log_matrix(matrix, fun), origin='lower', + # interpolation="nearest", cmap=cmap, + # extent=(int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5, + # int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5)) + # if decorate: + # cbar = axe.figure.colorbar(img) + # cbar.ax.set_ylabel('%sHi-C %sinteraction count' % ( + # 'Log2 ' * (logarithm==True), 'normalized ' * normalized), rotation=-90) + # axe.set_title(('Chromosome %s experiment %s %s') % ( + # self.crm.name, self.name, + # 'focus: %s-%s' % (start, end) if tad else '')) + # axe.set_xlabel('Genomic bin (resolution: %s)' % (self.resolution)) + # if paint_tads: + # axe.set_ylabel('TAD number') + # else: + # axe.set_ylabel('Genomic bin (resolution: %s)' % ( + # self.resolution)) + # if not paint_tads: + # axe.set_ylim(int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5) + # axe.set_xlim(int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5) + # if show: + # plt.show() + # if savefig: + # tadbit_savefig(savefig) + # return img + # pwidth = 1 + # tads = dict([(t, self.tads[t]) for t in self.tads + # if ((int(self.tads[t]['start']) + 1 >= start + # and int(self.tads[t]['end' ]) + 1 <= end) + # or not start)]) + # for i, tad in tads.iteritems(): + # t_start = int(tad['start']) + .5 + # t_end = int(tad['end']) + 1.5 + # nwidth = float(abs(tad['score'])) / 4 + # if where in ['down', 'both']: + # axe.hlines(t_start, t_start, t_end, colors='k', lw=pwidth) + # if where in ['up', 'both']: + # axe.hlines(t_end , t_start, t_end, colors='k', lw=nwidth) + # if where in ['up', 'both']: + # axe.vlines(t_start, t_start, t_end, colors='k', lw=pwidth) + # if where in ['down', 'both']: + # axe.vlines(t_end , t_start, t_end, colors='k', lw=nwidth) + # pwidth = nwidth + # if tad['score'] < 0: + # for j in xrange(0, int(t_end) - int(t_start), 2): + # axe.plot((t_start , t_start + j), + # (t_end - j, t_end ), color='k') + # axe.plot((t_end , t_end - j), + # (t_start + j, t_start ), color='k') + # axe.set_ylim(int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5) + # axe.set_xlim(int(start or 1) - 0.5, + # int(start or 1) + len(matrix) - 0.5) + # if paint_tads: + # ticks = [] + # labels = [] + # for tad, tick in [(t, tads[t]['start'] + (tads[t]['end'] - + # tads[t]['start'] - 1)) + # for t in tads.keys()[::(len(tads)/11 + 1)]]: + # ticks.append(tick) + # labels.append(tad + 1) + # 
axe.set_yticks(ticks) + # axe.set_yticklabels(labels) + # if show: + # plt.show() + # if savefig: + # tadbit_savefig(savefig) + # return img + + + # def write_tad_borders(self, density=False, savedata=None, normalized=False, index=0): + # """ + # Print a table summarizing the TADs found by tadbit. This function outputs + # something similar to the R function. + + # :param False density: if True, adds a column with the relative + # interaction frequency measured within each TAD (value of 1 means an + # interaction frequency equal to the expectation in the experiment) + # :param None savedata: path to a file where to save the density data + # generated (1 column per step + 1 for particle number). If None, print + # a table. + # :param False normalized: uses normalized data to calculate the density + # :param 0 index: hic_data index or norm index + # """ + # try: + # if normalized and self.norm: + # norms = self.norm[index] + # elif self.hic_data: + # if normalized: + # warn("WARNING: weights not available, using raw data") + # norms = self.hic_data[index] + # else: + # warn("WARNING: raw Hi-C data not available, " + + # "TAD's height fixed to 1") + # norms = None + # except TypeError: + # raise Exception('ERROR: no hic_data with index ',index) + + # zeros = self._zeros[index] or {} + # table = '' + # table += '%s\t%s\t%s\t%s%s\n' % ('#', 'start', 'end', 'score', + # '' if not density else '\tdensity') + # tads = self.tads + # sp1 = self.size + 1 + # diags = [] + # if norms: + # for k in xrange(1, self.size): + # s_k = self.size * k + # diags.append(sum([norms[i * sp1 + s_k] + # if not (i in zeros + # or (i + k) in zeros) else 0. + # for i in xrange( + # self.size - k)]) / (self.size - k)) + # for tad in tads: + # table += '%s\t%s\t%s\t%s%s\n' % ( + # tad, int(tads[tad]['start'] + 1), int(tads[tad]['end'] + 1), + # abs(tads[tad]['score']), '' if not density else + # '\t%s' % (round(float(tads[tad]['height']), 3))) + # if not savedata: + # print table + # return + # if isinstance(savedata, file): + # out = savedata + # else: + # out = open(savedata, 'w') + # out.write(table) + + # def write_json(self, filename, focus=None, normalized=False): + # """ + # Save hic matrix in the json format, read by TADkit. + + # :param filename: location where the file will be written + # :param None focus: if a tuple is passed (start, end), json will contain a Hi-C + # matrix starting at start, and ending at end (all inclusive). + # :para False normalized: use normalized data instead of raw Hi-C + + # """ + # if not self.crm.species: + # warn("WARNING: no species specified in chromosome. TADkit will not be able to interpret the file") + # if not self.crm.name: + # warn("WARNING: no name specified in chromosome. TADkit will not be able to interpret the file") + + # if focus: + # start, end = focus + # size = end-start+1 + # else: + # start = 0 + # end = size = self.size + + # if size > 1200: + # warn("WARNING: this is a very big matrix, consider using focus. 
TADkit will not be able to render the file")
+
+        # new_hic_data = self.get_hic_matrix(focus=focus, normalized=normalized)
+
+        # chrom_start = []
+        # chrom_end = []
+        # chrom = []
+        # if self.hic_data and self.hic_data[0].chromosomes:
+        #     tot = 0
+        #     chrs = []
+        #     chrom_offset_start = start
+        #     chrom_offset_end = 0
+        #     for k, v in self.hic_data[0].chromosomes.iteritems():
+        #         tot += v
+        #         if start > tot:
+        #             chrom_offset_start = start - tot
+        #         if end <= tot:
+        #             chrom_offset_end = tot - end
+        #             chrs.append((k,v))
+        #             break
+        #         if start < tot and end >= tot:
+        #             chrs.append((k,v))
+
+        #     for k, v in chrs:
+        #         chrom.append(k)
+        #         chrom_start.append(0)
+        #         chrom_end.append(v * self.resolution)
+        #     chrom_start[0] = chrom_offset_start * self.resolution
+        #     chrom_end[-1] -= chrom_offset_end * self.resolution
+
+        # else:
+        #     chrom.append(self.crm.name)
+        #     chrom_start.append(start * self.resolution)
+        #     chrom_end.append(end * self.resolution)
+
+
+        # descr = {'chromosome'  : chrom,
+        #          'species'     : self.crm.species,
+        #          'resolution'  : self.resolution,
+        #          'chrom_start' : chrom_start,
+        #          'chrom_end'   : chrom_end,
+        #          'start'       : self.resolution,
+        #          'end'         : size * self.resolution}
+
+        # # Fake structural models object to produce json
+        # sm = StructuralModels(nloci=size, models = [], bad_models = [], experiment=self, resolution=self.resolution, original_data=new_hic_data, description=descr, config={'scale':0.01})
+        # sm.write_json(filename=filename)
+
+    # def generate_densities(self):
+    #     """
+    #     Related to the generation of 3D models.
+    #     In the case of Hi-C data, the density is equal to the number of
+    #     nucleotides in a bin, which is equal to the experiment resolution.
+    #     """
+    #     dens = {}
+    #     for i in xrange(self.size):
+    #         dens[i] = self.resolution
+    #     return dens
diff --git a/tadphys/hic_data.py b/tadphys/hic_data.py
new file mode 100644
index 0000000..e69de29
diff --git a/tadphys/modelling/HIC_CONFIG.py b/tadphys/modelling/HIC_CONFIG.py
new file mode 100644
index 0000000..d0485d7
--- /dev/null
+++ b/tadphys/modelling/HIC_CONFIG.py
@@ -0,0 +1,46 @@
+"""
+07 Feb 2013
+
+
+"""
+
+# GENERAL
+#########
+
+CONFIG = {
+    # use these parameters with the Hi-C data from:
+    'reference' : 'example dataset',
+
+    # Force applied to the restraints inferred to neighbor particles
+    'kforce'    : 5,
+
+    # How much space (in nm) a nucleotide occupies
+    'scale'     : 0.01,
+
+    # Strength of the bending interaction
+    'kbending'  : 0.0, # OPTIMIZATION:
+
+    # Maximum experimental contact distance
+    'maxdist'   : 600, # OPTIMIZATION: 500-1200
+
+    # Minimum threshold used to decide which experimental values have to be
+    # included in the computation of restraints. Z-score values bigger than upfreq
+    # or smaller than lowfreq will be included, whereas all the others will be rejected
+    'lowfreq'   : -0.7, # OPTIMIZATION: min/max Z-score
+
+    # Maximum threshold used to decide which experimental values have to be
+    # included in the computation of restraints. Z-score values bigger than upfreq
+    # or smaller than lowfreq will be included, whereas all the others will be rejected
+    'upfreq'    : 0.3 # OPTIMIZATION: min/max Z-score
+
+    }
+
+
+# MonteCarlo optimizer parameters
+#################################
+# number of iterations
+NROUNDS = 10000
+# number of MonteCarlo steps per round
+STEPS = 1
+# number of local steps per round
+LSTEPS = 5
diff --git a/tadphys/modelling/LAMMPS_CONFIG.py b/tadphys/modelling/LAMMPS_CONFIG.py
new file mode 100644
index 0000000..3c0fd76
--- /dev/null
+++ b/tadphys/modelling/LAMMPS_CONFIG.py
@@ -0,0 +1,103 @@
+"""
+25 Oct 2016
+
+
+"""
+###############################################################################
+# Parameters to implement the Kremer & Grest polymer model                    #
+# Reference paper:                                                            #
+# K. Kremer and G. S. Grest                                                   #
+# Dynamics of entangled linear polymer melts: A molecular-dynamics simulation #
+# J Chem Phys 92, 5057 (1990)                                                 #
###############################################################################
+
+
+# units http://lammps.sandia.gov/doc/units.html
+units = "lj"
+
+# atom_style http://lammps.sandia.gov/doc/atom_style.html
+atom_style = "angle"
+
+# boundary conditions http://lammps.sandia.gov/doc/boundary.html
+boundary = "p p p"
+#boundary = "f f f"
+
+# mass http://lammps.sandia.gov/doc/mass.html
+mass = "* 1.0"
+
+# neighbor http://lammps.sandia.gov/doc/neighbor.html
+#neighbor = "3.0 nsq" # Optional for small and low density systems
+neighbor = "0.3 bin" # Standard choice for large (> 10,000 particles) systems
+#neigh_modify = "every 1 delay 1 check yes page 200000 one 20000"
+neigh_modify = "every 1 delay 1 check yes"
+
+# thermo
+run = 100
+thermo = 1000 #int(float(run)/100)
+#thermo_style custom step temp epair emol press pxx pyy pzz pxy pxz pyz vol
+
+# Excluded volume term: Purely repulsive Lennard-Jones or Truncated and Shifted Lennard-Jones
+###################################################################
+# Lennard-Jones 12-6 potential with cutoff (=truncated):          #
+# potential E=4epsilon[ (sigma/r)^12 - (sigma/r)^6] for r tot:
+                    chrom_offset_start = start - tot
+                if end <= tot:
+                    chrom_offset_end = tot - end
+                    chrs.append(k)
+                    break
+                if start < tot and end >= tot:
+                    chrs.append(k)
+
+            for k in chrs:
+                self.coords.append({'crm'  : k,
+                                    'start': 1,
+                                    'end'  : experiment.hic_data[0].chromosomes[k]})
+            self.coords[0]['start'] = chrom_offset_start
+            self.coords[-1]['end'] -= chrom_offset_end
+
+        else:
+            self.coords = {'crm'  : experiment.crm.name,
+                           'start': start,
+                           'end'  : end}
+
+        self.tool = tool
+        self.tmp_folder = tmp_folder
+        self.single_particle_restraints = single_particle_restraints
+
+        # For clarity, the order in which the optimized parameters are managed should
+        # always be the same: scale, kbending, maxdist, lowfreq, upfreq
+        self.scale_range    = []
+        self.kbending_range = []
+        self.maxdist_range  = []
+        self.lowfreq_range  = []
+        self.upfreq_range   = []
+
+        self.dcutoff_range = []
+
+        self.container = container
+        self.results = {}
+
+
+    def run_grid_search(self,
+                        scale_range=0.01,
+                        kbending_range=0.0,
+                        maxdist_range=(400, 1500, 100),
+                        lowfreq_range=(-1, 0, 0.1),
+                        upfreq_range=(0, 1, 0.1),
+                        dcutoff_range=None,
+                        corr='spearman', off_diag=1,
+                        savedata=None, n_cpus=1, verbose=True,
+                        use_HiC=True, use_confining_environment=True,
+                        use_excluded_volume=True, kforce=5,
+                        ev_kforce=5, timeout_job=300,
+                        connectivity="FENE", hide_log=True,
+                        kfactor=1, cleanup=False,
+                        initial_conformation=None,
+                        remove_rstrn=[], initial_seed=0,
+                        keep_restart_out_dir=None,
+                        restart_path=False,
+                        store_n_steps=10,
+                        useColvars=False):
+        """
+        This function calculates the correlation between the generated models
+        and the input data for the five main modelling parameters (scale,
+        kbending, maxdist, lowfreq and upfreq) in the given ranges of values.
+        Each range can also be expressed as a list of values.
+
+        :param n_cpus: number of CPUs to use
+        :param 0.01 scale_range: upper and lower bounds used to search for
+           the optimal scale parameter (unit nm per nucleotide). The last value of
+           the input tuple is the incremental step for the scale values
+        :param (0,2.0,0.5) kbending_range: values of the bending rigidity
+           strength to enforce in the models
+        :param (400,1400,100) maxdist_range: upper and lower bounds
+           used to search for the optimal maximum experimental distance.
+           The last value of the input tuple is the incremental step for maxdist
+           values
+        :param (-1,0,0.1) lowfreq_range: range of lowfreq values to be
+           optimized. The last value of the input tuple is the incremental
+           step for the lowfreq values. To be precise, "freq" refers to the
+           Z-score.
+        :param (0,1,0.1) upfreq_range: range of upfreq values to be optimized.
+           The last value of the input tuple is the incremental step for the
+           upfreq values. To be precise, "freq" refers to the Z-score.
+        :param None dcutoff_range: upper and lower bounds used to search for
+           the optimal distance cutoff parameter (in nm); if None, it defaults
+           to twice the scaled resolution. The last value of the input tuple
+           is the incremental step for the dcutoff values
+        :param None savedata: concatenate all generated models into a dictionary
+           and save it into a file named by this argument
+        :param True verbose: print the results to the standard output
+        :param True hide_log: do not generate lammps log information
+        :param FENE connectivity: use FENE for a FENE bond or harmonic for a
+            harmonic potential between neighbours
+        :param 1 kfactor: factor by which to multiply the adjusted (square root)
+            values of the Z-scores before feeding them to LAMMPS. Used to decrease
+            maximum values below 1. E.g.: kfactor=0.1
+        :param True cleanup: delete lammps folder after completion
+        :param tadbit initial_conformation: initial structure for lammps dynamics.
+            'tadbit' to compute the initial conformation with montecarlo simulated annealing,
+            'random' to compute the initial conformation as a 3D random walk, or
+            {[x],[y],[z]} a dictionary containing lists with x, y, z positions,
+            e.g. an IMPModel or LAMMPSModel object
+        :param [] remove_rstrn: list of particles which must not have restraints
+        :param 0 initial_seed: Initial random seed for modelling.
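+
+            For instance, a minimal sketch of a call mixing tuple, list and
+            scalar ranges (``opt`` is assumed to be an optimizer instance
+            created elsewhere)::
+
+                opt.run_grid_search(scale_range=0.01,
+                                    kbending_range=0.0,
+                                    maxdist_range=(400, 800, 200),
+                                    lowfreq_range=[-0.5, 0],
+                                    upfreq_range=(0, 1, 0.5),
+                                    n_cpus=4)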
+        :param None keep_restart_out_dir: recover stopped computation
+        :param False restart_path: path to files to restore the LAMMPS session (binary)
+        :param 10 store_n_steps: Integer with number of steps to be saved if
+            restart_file != False
+        :param False useColvars: True if you want the restraints to be loaded by colvars
+
+        """
+        if verbose:
+            stderr.write('Optimizing %s particles\n' % self.nloci)
+
+        # These commands transform the ranges given in input as tuples
+        # into the lists of values used in the grid search of the best parameters
+        # scale
+        if isinstance(scale_range, tuple):
+            scale_step = scale_range[2]
+            scale_arange = np.arange(scale_range[0],
+                                     scale_range[1] + scale_step / 2,
+                                     scale_step)
+        else:
+            if isinstance(scale_range, (float, int)):
+                scale_range = [scale_range]
+            scale_arange = scale_range
+        # kbending
+        if isinstance(kbending_range, tuple):
+            kbending_step = kbending_range[2]
+            kbending_arange = np.arange(kbending_range[0],
+                                        kbending_range[1] + kbending_step / 2,
+                                        kbending_step)
+        else:
+            if isinstance(kbending_range, (float, int)):
+                kbending_range = [kbending_range]
+            kbending_arange = kbending_range
+        # maxdist
+        if isinstance(maxdist_range, tuple):
+            maxdist_step = maxdist_range[2]
+            maxdist_arange = range(maxdist_range[0],
+                                   maxdist_range[1] + maxdist_step,
+                                   maxdist_step)
+        else:
+            if isinstance(maxdist_range, (float, int)):
+                maxdist_range = [maxdist_range]
+            maxdist_arange = maxdist_range
+        # lowfreq
+        if isinstance(lowfreq_range, tuple):
+            lowfreq_step = lowfreq_range[2]
+            lowfreq_arange = np.arange(lowfreq_range[0],
+                                       lowfreq_range[1] + lowfreq_step / 2,
+                                       lowfreq_step)
+        else:
+            if isinstance(lowfreq_range, (float, int)):
+                lowfreq_range = [lowfreq_range]
+            lowfreq_arange = lowfreq_range
+        # upfreq
+        if isinstance(upfreq_range, tuple):
+            upfreq_step = upfreq_range[2]
+            upfreq_arange = np.arange(upfreq_range[0],
+                                      upfreq_range[1] + upfreq_step / 2,
+                                      upfreq_step)
+        else:
+            if isinstance(upfreq_range, (float, int)):
+                upfreq_range = [upfreq_range]
+            upfreq_arange = upfreq_range
+        # dcutoff
+        if isinstance(dcutoff_range, tuple):
+            dcutoff_step = dcutoff_range[2]
+            dcutoff_arange = np.arange(dcutoff_range[0],
+                                       dcutoff_range[1] + dcutoff_step / 2,
+                                       dcutoff_step)
+        else:
+            if isinstance(dcutoff_range, (float, int)):
+                dcutoff_range = [dcutoff_range]
+            dcutoff_arange = dcutoff_range
+
+        # These commands round all the values in the ranges defined as input
+        # scale
+        if not self.scale_range:
+            self.scale_range = [my_round(i) for i in scale_arange]
+        else:
+            self.scale_range = sorted([my_round(i) for i in scale_arange
+                                       if not my_round(i) in self.scale_range] +
+                                      self.scale_range)
+        # kbending
+        if not self.kbending_range:
+            self.kbending_range = [my_round(i) for i in kbending_arange]
+        else:
+            self.kbending_range = sorted([my_round(i) for i in kbending_arange
+                                          if not my_round(i) in self.kbending_range] +
+                                         self.kbending_range)
+        # maxdist
+        if not self.maxdist_range:
+            self.maxdist_range = [my_round(i) for i in maxdist_arange]
+        else:
+            self.maxdist_range = sorted([my_round(i) for i in maxdist_arange
+                                         if not my_round(i) in self.maxdist_range] +
+                                        self.maxdist_range)
+        # lowfreq
+        if not self.lowfreq_range:
+            self.lowfreq_range = [my_round(i) for i in lowfreq_arange]
+        else:
+            self.lowfreq_range = sorted([my_round(i) for i in lowfreq_arange
+                                         if not my_round(i) in self.lowfreq_range] +
+                                        self.lowfreq_range)
+        # upfreq
+        if not self.upfreq_range:
+            self.upfreq_range = [my_round(i) for i in upfreq_arange]
+        else:
+            self.upfreq_range = sorted([my_round(i) for i in upfreq_arange
+                                       if not my_round(i) in self.upfreq_range] +
+                                      self.upfreq_range)
+        # dcutoff
+        if not self.dcutoff_range:
+            if dcutoff_arange is None or len(dcutoff_arange) == 0:
+                self.dcutoff_range = [int(2 * self.resolution * float(sc)) for sc in self.scale_range]
+                dcutoff_arange = self.dcutoff_range
+            else:
+                self.dcutoff_range = [my_round(i) for i in dcutoff_arange]
+        else:
+            self.dcutoff_range = sorted([my_round(i) for i in dcutoff_arange
+                                         if not my_round(i) in self.dcutoff_range] +
+                                        self.dcutoff_range)
+
+        # These commands perform the grid search of the best parameters
+        models = {}
+        count = 0
+        if verbose:
+            stderr.write('  %-4s%-5s\t%-8s\t%-7s\t%-7s\t%-6s\t%-7s\t%-11s\n' % (
+                "num","scale","kbending","maxdist","lowfreq","upfreq","dcutoff","correlation"))
+        #print scale_arange, kbending_arange, maxdist_arange, lowfreq_arange, upfreq_arange, dcutoff_arange
+        parameters_sets = itertools.product([my_round(i) for i in scale_arange   ],
+                                            [my_round(i) for i in kbending_arange],
+                                            [my_round(i) for i in maxdist_arange ],
+                                            [my_round(i) for i in lowfreq_arange ],
+                                            [my_round(i) for i in upfreq_arange  ])
+
+
+        #for (scale, maxdist, upfreq, lowfreq, kbending) in zip([my_round(i) for i in scale_arange  ],
+        for (scale, kbending, maxdist, lowfreq, upfreq) in parameters_sets:
+            #print (scale, kbending, maxdist, lowfreq, upfreq)
+
+            # This checks whether the optimization has already been done for this set of parameters
+            if (scale, kbending, maxdist, lowfreq, upfreq) in [tuple(k[:5]) for k in self.results]:
+                k = [k for k in self.results
+                     if (scale, kbending, maxdist, lowfreq, upfreq) == tuple(k[:5])
+                     ][0]
+                result = self.results[(scale, kbending, maxdist, lowfreq, upfreq, k[-1])]
+                if verbose:
+                    verb = '  %-4s%-5s\t%-8s\t%-7s\t%-7s\t%-6s\t%-7s\n' % (
+                        'xx', scale, kbending, maxdist, lowfreq, upfreq, k[-1])
+
+                    if verbose == 2:
+                        stderr.write(verb + str(round(result, 4)) + '\n')
+                    else:
+                        print(verb + str(round(result, 4)))
+                continue
+
+            config_tmp = {'kforce'   : float(kforce),
+                          'ev_kforce': float(ev_kforce),
+                          'scale'    : float(scale),
+                          'kbending' : float(kbending),
+                          #'lowrdist' : 1.0, # This parameter is fixed to XXX
+                          'lowrdist' : 100,
+                          'maxdist'  : int(maxdist),
+                          'lowfreq'  : float(lowfreq),
+                          'upfreq'   : float(upfreq)}
+
+            try:
+                count += 1
+                avg_result = dict((i,0) for i in dcutoff_arange)
+                for i in xrange(len(self.zscores)):
+                    if self.tool=='lammps':
+                        tdm = generate_lammps_models(self.zscores, self.resolution, self.nloci,
+                                                     values=self.values, n_models=self.n_models,
+                                                     n_keep=self.n_keep,
+                                                     n_cpus=n_cpus, connectivity=connectivity,
+                                                     verbose=verbose, first=0, coords=self.coords,
+                                                     close_bins=self.close_bins, config=config_tmp,
+                                                     container=self.container, zeros=self.zeros,
+                                                     tmp_folder=self.tmp_folder, timeout_job=timeout_job,
+                                                     hide_log=hide_log,
+                                                     keep_restart_out_dir=keep_restart_out_dir, kfactor=kfactor,
+                                                     cleanup=cleanup, initial_conformation='random' if not initial_conformation \
+                                                     else initial_conformation,
+                                                     restart_path=restart_path, remove_rstrn=remove_rstrn,
+                                                     initial_seed=initial_seed, store_n_steps=store_n_steps,
+                                                     useColvars=useColvars)
+                    else:
+                        raise Exception('ERROR: tool must be lammps')
+
+                    result = 0
+                    matrices = get_contact_matrix(tdm,
+                                                  #cutoff=[int(dc * self.resolution * float(scale)) for dc in dcutoff_arange])
+                                                  cutoff=[int(dc) for dc in dcutoff_arange])
+                    for m in matrices:
+                        cut = int(m**0.5)
+                        sub_result = correlate_with_real_data(tdm, cutoff=cut, corr=corr,
+                                                              off_diag=off_diag,
+                                                              contact_matrix=matrices[m])[0]
+
+                        avg_result[cut] += sub_result
+
+                    # Update n_models with the ones that really finished
+                    if self.n_models != len(tdm):
+                        print('WARNING: not all models produced: step %s, nmodels=%s' %(i+1, len(tdm)))
+                        self.n_models = len(tdm)
+
+            except ValueError:
+                _, e, _ = sys.exc_info()
+                print('  SKIPPING: %s' % e)
+                result = 0
+            cutoff = my_round(dcutoff_arange[0])
+            for ct, m in enumerate(avg_result):
+
+                result = avg_result[m]/len(self.zscores)
+
+                cutoff = int(m)
+
+                if verbose:
+                    verb = '  %-4s%-5s\t%-8s\t%-7s\t%-7s\t%-6s\t%-7s' % (
+                        count+ct, scale, kbending, maxdist, lowfreq, upfreq, cutoff)
+                    if verbose == 2:
+                        stderr.write(verb + str(round(result, 4)) + '\n')
+                    else:
+                        print(verb + str(round(result, 4)))
+
+            count += ct
+            # Store the correlation for this Tadphys parameter set
+            self.results[(scale, kbending, maxdist, lowfreq, upfreq, cutoff)] = result
+
+            if savedata and result:
+                models[(scale, kbending, maxdist, lowfreq, upfreq, cutoff)] = tdm
+
+        if savedata:
+            out = open(savedata, 'w')
+            dump(models, out)
+            out.close()
+
+        self.kbending_range.sort(key=float)
+        self.scale_range.sort(   key=float)
+        self.maxdist_range.sort( key=float)
+        self.lowfreq_range.sort( key=float)
+        self.upfreq_range.sort(  key=float)
+        self.dcutoff_range.sort( key=float)
+
+
+    def load_grid_search_OLD(self, filenames, corr='spearman', off_diag=1,
+                             verbose=True, n_cpus=1):
+        """
+        Loads one file or a list of files containing pre-calculated
+        StructuralModels (saved with the keep_models parameter) and correlates
+        each set of models with the real data. Useful to run different
+        correlations on the same data without re-calculating the models each time.
+
+        :param filenames: either a path to a file or a list of paths.
+        :param spearman corr: correlation coefficient to use
+        :param 1 off_diag: number of diagonals to skip when correlating the
+            models with the experimental matrix
+        :param True verbose: print the results to the standard output
+
+        """
+        if isinstance(filenames, str):
+            filenames = [filenames]
+        models = {}
+        for filename in filenames:
+            inf = open(filename)
+            models.update(load(inf))
+            inf.close()
+        count = 0
+        pool = mu.Pool(n_cpus, maxtasksperchild=1)
+        jobs = {}
+        for scale, maxdist, upfreq, lowfreq, dcutoff in models:
+            svd = models[(scale, maxdist, upfreq, lowfreq, dcutoff)]
+            jobs[(scale, maxdist, upfreq, lowfreq, dcutoff)] = pool.apply_async(
+                _mu_correlate, args=(svd, corr, off_diag,
+                                     scale, maxdist, upfreq, lowfreq, dcutoff,
+                                     verbose, count))
+            count += 1
+        pool.close()
+        pool.join()
+        for scale, maxdist, upfreq, lowfreq, dcutoff in models:
+            self.results[(scale, maxdist, upfreq, lowfreq, dcutoff)] = \
+                jobs[(scale, maxdist, upfreq, lowfreq, dcutoff)].get()
+            if not scale in self.scale_range:
+                self.scale_range.append(scale)
+            if not maxdist in self.maxdist_range:
+                self.maxdist_range.append(maxdist)
+            if not lowfreq in self.lowfreq_range:
+                self.lowfreq_range.append(lowfreq)
+            if not upfreq in self.upfreq_range:
+                self.upfreq_range.append(upfreq)
+            if not dcutoff in self.dcutoff_range:
+                self.dcutoff_range.append(dcutoff)
+        self.scale_range.sort(  key=float)
+        self.maxdist_range.sort(key=float)
+        self.lowfreq_range.sort(key=float)
+        self.upfreq_range.sort( key=float)
+        self.dcutoff_range.sort(key=float)
+
+
+    def load_grid_search(self, filenames, corr='spearman', off_diag=1,
+                         verbose=True, n_cpus=1):
+        """
+        Loads one file or a list of files containing pre-calculated
+        StructuralModels (saved with the keep_models parameter) and correlates
+        each set of models with the real data. Useful to run different
+        correlations on the same data without re-calculating the models each time.
+
+        :param filenames: either a path to a file or a list of paths.
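+            (e.g., a hypothetical call:
+            ``opt.load_grid_search(['models_rep1.pick', 'models_rep2.pick'])``)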
+        :param spearman corr: correlation coefficient to use
+        :param 1 off_diag: number of diagonals to skip when correlating the
+            models with the experimental matrix
+        :param True verbose: print the results to the standard output
+
+        """
+        if isinstance(filenames, str):
+            filenames = [filenames]
+        models = {}
+        for filename in filenames:
+            inf = open(filename)
+            models.update(load(inf))
+            inf.close()
+        count = 0
+        pool = mu.Pool(n_cpus, maxtasksperchild=1)
+        jobs = {}
+        for scale, kbending, maxdist, lowfreq, upfreq, dcutoff in models:
+            svd = models[(scale, kbending, maxdist, lowfreq, upfreq, dcutoff)]
+            jobs[(scale, kbending, maxdist, lowfreq, upfreq, dcutoff)] = pool.apply_async(
+                _mu_correlate, args=(svd, corr, off_diag,
+                                     scale, kbending, maxdist, lowfreq, upfreq, dcutoff,
+                                     verbose, count))
+            count += 1
+        pool.close()
+        pool.join()
+        for scale, kbending, maxdist, lowfreq, upfreq, dcutoff in models:
+            self.results[(scale, kbending, maxdist, lowfreq, upfreq, dcutoff)] = \
+                jobs[(scale, kbending, maxdist, lowfreq, upfreq, dcutoff)].get()
+            if not scale in self.scale_range:
+                self.scale_range.append(scale)
+            if not kbending in self.kbending_range:
+                self.kbending_range.append(kbending)
+            if not maxdist in self.maxdist_range:
+                self.maxdist_range.append(maxdist)
+            if not lowfreq in self.lowfreq_range:
+                self.lowfreq_range.append(lowfreq)
+            if not upfreq in self.upfreq_range:
+                self.upfreq_range.append(upfreq)
+            if not dcutoff in self.dcutoff_range:
+                self.dcutoff_range.append(dcutoff)
+        self.scale_range.sort(   key=float)
+        self.kbending_range.sort(key=float)
+        self.maxdist_range.sort( key=float)
+        self.lowfreq_range.sort( key=float)
+        self.upfreq_range.sort(  key=float)
+        self.dcutoff_range.sort( key=float)
+
+
+    def get_best_parameters_dict(self, reference=None, with_corr=False):
+        """
+        :param None reference: a description of the dataset optimized
+        :param False with_corr: if True, returns also the correlation value
+
+        :returns: a dict that can be used for modelling, see config parameter in
+           :func:`tadphys.experiment.Experiment.model_region`
+
+        """
+        if not self.results:
+            stderr.write('WARNING: no optimization done yet\n')
+            return
+        best = ((float('nan'), float('nan'), float('nan'), float('nan'), float('nan'), float('nan')), 0.0)
+        kbending = 0
+        try:
+            for (scale, kbending, maxdist, lowfreq, upfreq, cutoff), val in self.results.iteritems():
+                if val > best[-1]:
+                    best = ((scale, kbending, maxdist, lowfreq, upfreq, cutoff), val)
+        except ValueError:
+            # old-format results without the kbending parameter
+            for (scale, maxdist, upfreq, lowfreq, cutoff), val in self.results.iteritems():
+                if val > best[-1]:
+                    best = ((scale, kbending, maxdist, lowfreq, upfreq, cutoff), val)
+
+        if with_corr:
+            #print best
+            return (dict((('scale'    , float(best[0][0])),
+                          ('kbending' , float(best[0][1])),
+                          ('maxdist'  , float(best[0][2])),
+                          ('lowfreq'  , float(best[0][3])),
+                          ('upfreq'   , float(best[0][4])),
+                          ('dcutoff'  , float(best[0][5])),
+                          ('reference', reference or ''), ('kforce', 5))),
+                    best[-1])
+        else:
+            return dict((('scale'    , float(best[0][0])),
+                         ('kbending' , float(best[0][1])),
+                         ('maxdist'  , float(best[0][2])),
+                         ('lowfreq'  , float(best[0][3])),
+                         ('upfreq'   , float(best[0][4])),
+                         ('dcutoff'  , float(best[0][5])),
+                         ('reference', reference or ''), ('kforce', 5)))
+
+
+    def plot_2d_OLD(self, axes=('scale', 'maxdist', 'upfreq', 'lowfreq'),
+                    show_best=0, skip=None, savefig=None, clim=None):
+        """
+        A grid of heatmaps representing the result of the optimization.
+
+        :param 'scale','maxdist','upfreq','lowfreq' axes: list of axes to be
+           represented in the plot. The order will define which parameter will
+           be placed on the x, y, z or w axis.
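+           (e.g. axes=('maxdist', 'upfreq', 'lowfreq', 'scale') reshuffles
+           the default ordering)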
+        :param 0 show_best: number of best correlation values to highlight in
+           the plot
+        :param None skip: if passed (as a dictionary), fix a given axis,
+           e.g.: {'scale': 0.001, 'maxdist': 500}
+        :param None savefig: path to a file where to save the image generated;
+           if None, the image will be shown using matplotlib GUI (the extension
+           of the file name will determine the desired format).
+
+        """
+        results = self._result_to_array()
+        plot_2d_optimization_result((('scale', 'maxdist', 'upfreq', 'lowfreq'),
+                                     ([float(i) for i in self.scale_range],
+                                      [float(i) for i in self.maxdist_range],
+                                      [float(i) for i in self.upfreq_range],
+                                      [float(i) for i in self.lowfreq_range]),
+                                     results), axes=axes, dcutoff=self.dcutoff_range, show_best=show_best,
+                                    skip=skip, savefig=savefig, clim=clim)
+
+    def plot_2d(self, axes=('scale', 'kbending', 'maxdist', 'lowfreq', 'upfreq'), dcutoff=None,
+                show_best=0, skip=None, savefig=None, clim=None):
+        """
+        A grid of heatmaps representing the result of the optimization.
+
+        :param 'scale','kbending','maxdist','lowfreq','upfreq' axes: list of
+           axes to be represented in the plot. The order will define which
+           parameter will be placed on the x, y, z or w axis.
+        :param None dcutoff: distance cutoff (nm) to plot; if None, the cutoff
+           of the best parameter set is used
+        :param 0 show_best: number of best correlation values to highlight in
+           the plot
+        :param None skip: if passed (as a dictionary), fix a given axis,
+           e.g.: {'scale': 0.001, 'maxdist': 500}
+        :param None savefig: path to a file where to save the image generated;
+           if None, the image will be shown using matplotlib GUI (the extension
+           of the file name will determine the desired format).
+        """
+
+        # Case in which there is more than 1 distance cutoff (dcutoff)
+        cut = dcutoff if dcutoff is not None else self.get_best_parameters_dict()['dcutoff']
+
+        results = self._result_to_array()
+        plot_2d_optimization_result((('scale', 'kbending', 'maxdist', 'lowfreq', 'upfreq'),
+                                     ([float(i) for i in self.scale_range],
+                                      [float(i) for i in self.kbending_range],
+                                      [float(i) for i in self.maxdist_range],
+                                      [float(i) for i in self.lowfreq_range],
+                                      [float(i) for i in self.upfreq_range]),
+                                     results), dcutoff=cut, axes=axes, show_best=show_best,
+                                    skip=skip, savefig=savefig, clim=clim)
+
+
+
+    def plot_3d_OLD(self, axes=('scale', 'maxdist', 'upfreq', 'lowfreq')):
+        """
+        A grid of heatmaps representing the result of the optimization.
+
+        :param 'scale','maxdist','upfreq','lowfreq' axes: tuple of axes to be
+           represented in the plot. The order will define which parameter will
+           be placed on the x, y, z or w axis.
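+
+        Example (a minimal sketch; ``opt`` is assumed to be an optimizer
+        instance whose grid search has already been run or loaded)::
+
+            opt.plot_3d_OLD(axes=('maxdist', 'upfreq', 'lowfreq', 'scale'))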
+ + """ + results = self._result_to_array() + plot_3d_optimization_result((('scale', 'maxdist', 'upfreq', 'lowfreq'), + ([float(i) for i in self.scale_range], + [float(i) for i in self.maxdist_range], + [float(i) for i in self.upfreq_range], + [float(i) for i in self.lowfreq_range]), + results), axes=axes) + + + + def _result_to_array_OLD(self): + # This auxiliary method organizes the results of the grid optimization in a + # Numerical array to be passed to the plot_2d_OLD and plot_3d functions above + + results = np.empty((len(self.scale_range), len(self.maxdist_range), + len(self.upfreq_range), len(self.lowfreq_range))) + + for w, scale in enumerate(self.scale_range): + for x, maxdist in enumerate(self.maxdist_range): + for y, upfreq in enumerate(self.upfreq_range): + for z, lowfreq in enumerate(self.lowfreq_range): + try: + cut = [c for c in self.dcutoff_range + if (scale, maxdist, upfreq, lowfreq, c) + in self.results][0] + except IndexError: + results[w, x, y, z] = float('nan') + continue + # + try: + results[w, x, y, z] = self.results[ + (scale, maxdist, upfreq, lowfreq, cut)] + except KeyError: + results[w, x, y, z] = float('nan') + return results + + + + def _result_to_array(self): + # This auxiliary method organizes the results of the grid optimization in a + # Numerical array to be passed to the plot_2d and plot_3d functions above + + results = np.zeros((len(self.scale_range), len(self.kbending_range), len(self.maxdist_range), + len(self.lowfreq_range), len(self.upfreq_range))) + #print results + #print self.lowfreq_range + + for v, scale in enumerate(self.scale_range): + for w, kbending in enumerate(self.kbending_range): + for x, maxdist in enumerate(self.maxdist_range): + for y, lowfreq in enumerate(self.lowfreq_range): + for z, upfreq in enumerate(self.upfreq_range): + + # Case in which there is more than 1 distance cutoff (dcutoff) + try: + cut = [c for c in self.dcutoff_range + if (scale, kbending, maxdist, lowfreq, upfreq, c) + in self.results][0] + except IndexError: + results[v, w, x, y, z] = float('nan') + continue + + # + try: + results[v, w, x, y, z] = self.results[ + (scale, kbending, maxdist, lowfreq, upfreq, cut)] + except KeyError: + results[v, w, x, y, z] = float('nan') + + """ + for i in xrange(len(self.scale_range)): + for j in xrange(len(self.kbending_range)): + for k in xrange(len(self.maxdist_range)): + for l in xrange(len(self.lowfreq_range)): + for m in xrange(len(self.upfreq_range)): + print "Correlation",self.scale_range[i],self.kbending_range[j],\ + self.maxdist_range[k],self.lowfreq_range[l],self.upfreq_range[m],\ + results[i][j][k][l][m] + exit(1) + """ + return results + + + + def write_result(self, f_name): + """ + This function writes a log file of all the values tested for each + parameter, and the resulting correlation value. 
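+
+        Each data line stores one tested combination in the column order
+        scale, kbending, max_dist, low_freq, up_freq, dcutoff, correlation,
+        e.g. (values shown are illustrative only)::
+
+              0.01	0	600	-0.7	0.3	100	0.5321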
+
+        This file can be used to load or merge data a posteriori using
+        the function tadphys.modelling.impoptimizer.IMPoptimizer.load_from_file
+
+        :param f_name: file name with the absolute path
+        """
+        out = open(f_name, 'w')
+        out.write(('## n_models: %s n_keep: %s ' +
+                   'close_bins: %s\n') % (self.n_models,
+                                          self.n_keep, self.close_bins))
+        out.write('# scale\tkbending\tmax_dist\tlow_freq\tup_freq\tdcutoff\tcorrelation\n')
+
+        parameters_sets = itertools.product(*[[my_round(i) for i in self.scale_range   ],
+                                              [my_round(i) for i in self.kbending_range],
+                                              [my_round(i) for i in self.maxdist_range ],
+                                              [my_round(i) for i in self.lowfreq_range ],
+                                              [my_round(i) for i in self.upfreq_range  ]])
+        for (scale, kbending, maxdist, lowfreq, upfreq) in parameters_sets:
+            try:
+                cut = sorted(
+                    [int(c) for c in self.dcutoff_range
+                     if (scale, kbending, maxdist, lowfreq, upfreq, int(c))
+                     in self.results],
+                    key=lambda x: self.results[
+                        (scale, kbending, maxdist, lowfreq, upfreq, x)])
+            except IndexError:
+                print('Missing dcutoff', (scale, kbending, maxdist, lowfreq, upfreq))
+                continue
+
+            for c in cut:
+                try:
+                    result = self.results[(scale, kbending, maxdist, lowfreq, upfreq, c)]
+                    out.write('  %-5s\t%-8s\t%-8s\t%-8s\t%-7s\t%-7s\t%-11s\n' % (
+                        scale, kbending, maxdist, lowfreq, upfreq, c, result))
+                except KeyError:
+                    print('KeyError', (scale, kbending, maxdist, lowfreq, upfreq, c))
+                    continue
+        out.close()
+
+
+    def load_from_file_OLD(self, f_name):
+        """
+        Loads the optimized parameters from a file generated with the function:
+        tadphys.modelling.impoptimizer.IMPoptimizer.write_result.
+        This function does not overwrite the parameters that were already
+        loaded or calculated.
+
+        :param f_name: file name with the absolute path
+        """
+        for line in open(f_name):
+            # Check that the parameters are the same
+            if line.startswith('##'):
+                n_models, _, n_keep, _, close_bins = line.split()[2:]
+                if ([int(n_models), int(n_keep), int(close_bins)]
+                    !=
+                    [self.n_models, self.n_keep, self.close_bins]):
+                    raise Exception('Parameters in %s do not match: %s\n%s' %(
+                        f_name,
+                        [int(n_models), int(n_keep), int(close_bins)],
+                        [self.n_models, self.n_keep, self.close_bins]))
+            if line.startswith('#'):
+                continue
+
+            # OLD format before May 2017 without the kbending parameter
+            scale, maxdist, upfreq, lowfreq, dcutoff, result = line.split()
+            # Set kbending to 0.0 to be compatible with the new version
+            kbending = 0.0
+            scale, kbending, maxdist, lowfreq, upfreq, dcutoff = (
+                float(scale), float(kbending), int(maxdist), float(lowfreq), float(upfreq),
+                float(dcutoff))
+            scale    = my_round(scale, val=5)
+            kbending = my_round(kbending)
+            maxdist  = my_round(maxdist)
+            lowfreq  = my_round(lowfreq)
+            upfreq   = my_round(upfreq)
+            dcutoff  = my_round(dcutoff)
+
+            self.results[(scale, kbending, maxdist, lowfreq, upfreq, dcutoff)] = float(result)
+            if not scale in self.scale_range:
+                self.scale_range.append(scale)
+            if not kbending in self.kbending_range:
+                self.kbending_range.append(kbending)
+            if not maxdist in self.maxdist_range:
+                self.maxdist_range.append(maxdist)
+            if not lowfreq in self.lowfreq_range:
+                self.lowfreq_range.append(lowfreq)
+            if not upfreq in self.upfreq_range:
+                self.upfreq_range.append(upfreq)
+            if not dcutoff in self.dcutoff_range:
+                self.dcutoff_range.append(dcutoff)
+
+        self.scale_range.sort(   key=float)
+        self.kbending_range.sort(key=float)
+        self.maxdist_range.sort( key=float)
+        self.lowfreq_range.sort( key=float)
+        self.upfreq_range.sort(  key=float)
+        self.dcutoff_range.sort( key=float)
+
+
+
+    def load_from_file(self, f_name):
+        """
+        Loads the optimized parameters from a file generated with the function:
+        tadphys.modelling.impoptimizer.IMPoptimizer.write_result.
+        This function does not overwrite the parameters that were already
+        loaded or calculated.
+
+        :param f_name: file name with the absolute path
+        """
+        for line in open(f_name):
+            # Check that the parameters are the same
+            if line.startswith('##'):
+                n_models, _, n_keep, _, close_bins = line.split()[2:]
+                if ([int(n_models), int(n_keep), int(close_bins)]
+                    !=
+                    [self.n_models, self.n_keep, self.close_bins]):
+                    raise Exception('Parameters in %s do not match: %s\n%s' %(
+                        f_name,
+                        [int(n_models), int(n_keep), int(close_bins)],
+                        [self.n_models, self.n_keep, self.close_bins]))
+            if line.startswith('#'):
+                continue
+            scale, kbending, maxdist, lowfreq, upfreq, dcutoff, result = line.split()
+            scale, kbending, maxdist, lowfreq, upfreq, dcutoff = (
+                float(scale), float(kbending), int(maxdist), float(lowfreq), float(upfreq),
+                float(dcutoff))
+            scale    = my_round(scale, val=5)
+            kbending = my_round(kbending)
+            maxdist  = my_round(maxdist)
+            lowfreq  = my_round(lowfreq)
+            upfreq   = my_round(upfreq)
+            dcutoff  = my_round(dcutoff)
+            self.results[(scale, kbending, maxdist, lowfreq, upfreq, dcutoff)] = float(result)
+            if not scale in self.scale_range:
+                self.scale_range.append(scale)
+            if not kbending in self.kbending_range:
+                self.kbending_range.append(kbending)
+            if not maxdist in self.maxdist_range:
+                self.maxdist_range.append(maxdist)
+            if not lowfreq in self.lowfreq_range:
+                self.lowfreq_range.append(lowfreq)
+            if not upfreq in self.upfreq_range:
+                self.upfreq_range.append(upfreq)
+            if not dcutoff in self.dcutoff_range:
+                self.dcutoff_range.append(dcutoff)
+        self.scale_range.sort(   key=float)
+        self.kbending_range.sort(key=float)
+        self.maxdist_range.sort( key=float)
+        self.lowfreq_range.sort( key=float)
+        self.upfreq_range.sort(  key=float)
+        self.dcutoff_range.sort( key=float)
+
+
+
+def my_round(num, val=4):
+    num = round(float(num), val)
+    return str(int(num) if num == int(num) else num)
+
+
+
+# def _mu_correlate(svd, corr, off_diag, scale, kbending, maxdist, lowfreq, upfreq,
+#                   dcutoff, verbose, count):
+#     tdm = StructuralModels(
+#         nloci=svd['nloci'], models=svd['models'],
+#         bad_models=svd['bad_models'],
+#         resolution=svd['resolution'],
+#         original_data=svd['original_data'],
+#         clusters=svd['clusters'], config=svd['config'],
+#         zscores=svd['zscore'])
+#     try:
+#         result = correlate_with_real_data(tdm,
+#                                           cutoff=dcutoff, corr=corr,
+#                                           off_diag=off_diag)[0]
+#         if verbose:
+#             verb = '  %-5s\t%-8s\t%-7s\t%-8s\t%-8s\t%-7s\n' % (
+#                 scale, kbending, maxdist, lowfreq, upfreq, dcutoff)
+#             if verbose == 2:
+#                 stderr.write(verb + str(result) + '\n')
+#             else:
+#                 print verb + str(result)
+#     except Exception, e:
+#         print 'ERROR %s' % e
+#     return result
diff --git a/tadphys/modelling/lammps_modelling.py b/tadphys/modelling/lammps_modelling.py
new file mode 100644
index 0000000..1e68b59
--- /dev/null
+++ b/tadphys/modelling/lammps_modelling.py
@@ -0,0 +1,3925 @@
+"""
+16 Mar 2019
+
+
+"""
+from string import ascii_uppercase as uc, ascii_lowercase as lc
+from os.path import exists
+from random import uniform, randint, seed, random, sample, shuffle
+from pickle import load, dump
+from multiprocessing.dummy import Pool as ThreadPool
+from functools import partial
+from math import atan2
+from itertools import combinations, product, chain
+from shutil import copyfile
+
+import sys
+import copy
+import os
+import shutil
+import multiprocessing
+
+from numpy import sin, cos, arccos, sqrt, fabs, pi
+from scipy import spatial
+import numpy as np
+from mpi4py import MPI
+from lammps import lammps
+
+from tadphys.modelling import LAMMPS_CONFIG as CONFIG
+from tadphys.modelling.lammpsmodel import LAMMPSmodel
+from tadphys.modelling.restraints import HiCBasedRestraints
+
+class InitalConformationError(Exception):
+    """
+    Exception to handle a failed initial conformation.
+    """
+    pass
+
+def abortable_worker(func, *args, **kwargs):
+    timeout = kwargs.get('timeout', None)
+    failedSeedLog = kwargs.get('failedSeedLog', None)
+    p = ThreadPool(1)
+    res = p.apply_async(func, args=args)
+    try:
+        out = res.get(timeout)  # wait timeout seconds for func to complete
+        return out
+    except multiprocessing.TimeoutError:
+        print("Model took more than %s seconds to complete ... canceling" % str(timeout))
+        p.terminate()
+        raise
+    except:
+        print("Unknown error with process")
+        if failedSeedLog != None:
+            failedSeedLog, k = failedSeedLog
+            with open(failedSeedLog, 'a') as f:
+                f.write('%s\t%s\n' %(k, 'Failed'))
+        p.terminate()
+        raise
+
+def generate_lammps_models(zscores, resolution, nloci, start=1, n_models=5000,
+                           n_keep=1000, close_bins=1, n_cpus=1,
+                           verbose=0, outfile=None, config=None,
+                           values=None, coords=None, zeros=None,
+                           first=None, container=None, tmp_folder=None, timeout_job=10800,
+                           initial_conformation=None, connectivity="FENE",
+                           timesteps_per_k=10000, keep_restart_out_dir=None,
+                           kfactor=1, adaptation_step=False, cleanup=False,
+                           hide_log=True, remove_rstrn=[], initial_seed=0,
+                           restart_path=False, store_n_steps=10,
+                           useColvars=False):
+    """
+    This function generates three-dimensional models starting from Hi-C data.
+    The final analysis will be performed on the n_keep top models.
+
+    :param zscores: the dictionary of the Z-score values calculated from the
+       Hi-C pairwise interactions
+    :param resolution: number of nucleotides per Hi-C bin. This will be the
+       number of nucleotides in each model's particle
+    :param nloci: number of particles to model (may not all be present in
+       zscores)
+    :param None coords: a dictionary like:
+       ::
+
+         {'crm'  : '19',
+          'start': 14637,
+          'end'  : 15689}
+
+    :param 5000 n_models: number of models to generate
+    :param 1000 n_keep: number of models used in the final analysis (usually
+       the top 20% of the generated models). The models are ranked according to
+       their objective function value (the lower the better)
+    :param 1 close_bins: number of particles away (i.e. the bin number
+       difference) a particle pair must be in order to be considered as
+       neighbors (e.g. 1 means consecutive particles)
+    :param n_cpus: number of CPUs to use
+    :param False verbose: if set to True, information about the distance, force
+       and Z-score between particles will be printed. If verbose is 0.5, then
+       restraints will be printed only for the first model calculated.
+    :param None values: the normalized Hi-C data in a list of lists (equivalent
+       to a square matrix)
+    :param None config: a dictionary containing the standard
+       parameters used to generate the models. The dictionary should contain
+       the keys kforce, lowrdist, maxdist, upfreq and lowfreq.
+       Examples can be seen by doing:
+
+       ::
+
+         from tadphys.modelling.HIC_CONFIG import CONFIG
+
+       where CONFIG is a dictionary of dictionaries to be passed to this function:
+
+       ::
+
+         CONFIG = {
+          'dmel_01': {
+              # Parameters for the Hi-C dataset from:
+              'reference' : 'victor corces dataset 2013',
+
+              # Force applied to the restraints inferred to neighbor particles
+              'kforce'    : 5,
+
+              # Space occupied by a nucleotide (nm)
+              'scale'     : 0.005,
+
+              # Strength of the bending interaction
+              'kbending'  : 0.0, # OPTIMIZATION:
+
+              # Maximum experimental contact distance
+              'maxdist'   : 600, # OPTIMIZATION: 500-1200
+
+              # Minimum threshold used to decide which experimental values have to be
+              # included in the computation of restraints. Z-score values bigger than upfreq
+              # or smaller than lowfreq will be included, whereas all the others will be rejected
+              'lowfreq'   : -0.7, # OPTIMIZATION: min/max Z-score
+
+              # Maximum threshold used to decide which experimental values have to be
+              # included in the computation of restraints. Z-score values greater than upfreq
+              # or less than lowfreq will be included, while all the others will be rejected
+              'upfreq'    : 0.3 # OPTIMIZATION: min/max Z-score
+
+              }
+          }
+    :param None first: particle number at which the model should start
+    :param None container: restrains particles to be within a given object. Can
+       only be a 'cylinder', which is, in fact, a cylinder of a given height to
+       which hemispherical ends are added. This cylinder is defined by a radius,
+       its height (with a height of 0 the cylinder becomes a sphere) and the
+       force applied to the restraint. E.g. for modeling the E. coli genome
+       (2 micrometers long and 0.5 micrometer wide), these values could be
+       used: ['cylinder', 250, 1500, 50], and for a typical mammalian nucleus
+       (6 micrometers in diameter): ['cylinder', 3000, 0, 50]
+    :param None tmp_folder: path to a temporary folder used during
+       the computation. By default it will be created under /tmp/
+    :param 10800 timeout_job: maximum seconds a job can run in the multiprocessing
+       of lammps before it is killed
+    :param initial_conformation: lammps input data file with the particles' initial conformation.
+    :param True hide_log: do not generate lammps log information
+    :param FENE connectivity: use FENE for a FENE bond or harmonic for a
+       harmonic potential between neighbours
+    :param None keep_restart_out_dir: path to write files to restore the LAMMPS
+       session (binary)
+    :param True cleanup: delete lammps folder after completion
+    :param [] remove_rstrn: list of particles which must not have restraints
+    :param 0 initial_seed: Initial random seed for modelling.
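+       (each model's seed, and hence its rand_init identifier, is derived
+       from this value)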
+    :param False restart_path: path to files to restore the LAMMPS session (binary)
+    :param 10 store_n_steps: Integer with number of steps to be saved if
+       restart_file != False
+    :param False useColvars: True if you want the restraints to be loaded by colvars
+
+    :returns: a Tadphys models dictionary
+    """
+
+    if not tmp_folder:
+        tmp_folder = '/tmp/tadphys_tmp_%s/' % (
+            ''.join([(uc + lc)[int(random() * 52)] for _ in range(4)]))
+    else:
+        if tmp_folder[-1] != '/':
+            tmp_folder += '/'
+        randk = ''.join([(uc + lc)[int(random() * 52)] for _ in range(4)])
+        tmp_folder = '%s%s/' %(tmp_folder, randk)
+        while os.path.exists(tmp_folder):
+            randk = ''.join([(uc + lc)[int(random() * 52)] for _ in range(4)])
+            tmp_folder = '%s%s/' %(tmp_folder[:-1], randk)
+    if not os.path.exists(tmp_folder):
+        os.makedirs(tmp_folder)
+
+    # Setup CONFIG
+    if isinstance(config, dict):
+        CONFIG.HiC.update(config)
+    elif config:
+        raise Exception('ERROR: "config" must be a dictionary')
+
+    global RADIUS
+
+    #RADIUS = float(resolution * CONFIG['scale']) / 2
+    RADIUS = 0.5
+    CONFIG.HiC['resolution'] = resolution
+    CONFIG.HiC['maxdist'] = CONFIG.HiC['maxdist'] / (float(resolution * CONFIG.HiC['scale']))
+
+    global LOCI
+    if first is None:
+        first = min([int(j) for i in zscores[0] for j in zscores[0][i]] +
+                    [int(i) for i in zscores[0]])
+    LOCI = list(range(first, nloci + first))
+
+    # random initial number
+    global START
+    START = start
+    # verbose
+    global VERBOSE
+    VERBOSE = verbose
+    #VERBOSE = 3
+
+    HiCRestraints = [HiCBasedRestraints(nloci, RADIUS, CONFIG.HiC, resolution, zs,
+                                        chromosomes=coords, close_bins=close_bins, first=first,
+                                        remove_rstrn=remove_rstrn) for zs in zscores]
+
+    run_time = 1000
+
+    colvars = 'colvars.dat'
+
+    steering_pairs = None
+    time_dependent_steering_pairs = None
+    if len(HiCRestraints) > 1:
+        time_dependent_steering_pairs = {
+            'colvar_input'           : HiCRestraints,
+            'colvar_output'          : colvars,
+            'chrlength'              : nloci,
+            'binsize'                : resolution,
+            'timesteps_per_k_change' : [float(timesteps_per_k)]*6,
+            'k_factor'               : kfactor,
+            'perc_enfor_contacts'    : 100.,
+            'colvar_dump_freq'       : int(timesteps_per_k/100),
+            'adaptation_step'        : adaptation_step,
+        }
+        if not initial_conformation:
+            initial_conformation = 'random'
+    else:
+        steering_pairs = {
+            'colvar_input'         : HiCRestraints[0],
+            'colvar_output'        : colvars,
+            'binsize'              : resolution,
+            'timesteps_per_k'      : timesteps_per_k,
+            'k_factor'             : kfactor,
+            'colvar_dump_freq'     : int(timesteps_per_k/100),
+            'timesteps_relaxation' : int(timesteps_per_k*10)
+        }
+        if not initial_conformation:
+            initial_conformation = 'random'
+
+    if not container:
+        container = ['cube',1000.0] # http://lammps.sandia.gov/threads/msg48683.html
+
+    ini_sm_model = None
+    sm_diameter = 1
+    if initial_conformation != 'random':
+        if isinstance(initial_conformation, dict):
+            sm = [initial_conformation]
+            sm[0]['x'] = sm[0]['x'][0:nloci]
+            sm[0]['y'] = sm[0]['y'][0:nloci]
+            sm[0]['z'] = sm[0]['z'][0:nloci]
+            sm_diameter = float(resolution * CONFIG.HiC['scale'])
+            for single_m in sm:
+                for i in range(len(single_m['x'])):
+                    single_m['x'][i] /= sm_diameter
+                    single_m['y'][i] /= sm_diameter
+                    single_m['z'][i] /= sm_diameter
+                cm0 = single_m.center_of_mass()
+                for i in range(len(single_m['x'])):
+                    single_m['x'][i] -= cm0['x']
+                    single_m['y'][i] -= cm0['y']
+                    single_m['z'][i] -= cm0['z']
+            ini_sm_model = [[single_sm.copy()] for single_sm in sm]
+
+    # assumed: a single chromosome spanning all particles
+    # (chromosome_particle_numbers was otherwise undefined at this point)
+    chromosome_particle_numbers = [len(LOCI)]
+    models, ini_model = lammps_simulate(lammps_folder=tmp_folder,
+                                        run_time=run_time,
+                                        initial_conformation=ini_sm_model,
+                                        connectivity=connectivity,
+
steering_pairs=steering_pairs, + time_dependent_steering_pairs=time_dependent_steering_pairs, + initial_seed=initial_seed, + n_models=n_keep, + n_keep=n_keep, + n_cpus=n_cpus, + keep_restart_out_dir=keep_restart_out_dir, + confining_environment=container, + timeout_job=timeout_job, + cleanup=cleanup, to_dump=int(timesteps_per_k/100.), + hide_log=hide_log, + chromosome_particle_numbers=chromosome_particle_numbers, + restart_path=restart_path, + store_n_steps=store_n_steps, + useColvars=useColvars) +# for i, m in enumerate(models.values()): +# m['index'] = i + if outfile: + if exists(outfile): + old_models, _ = load(open(outfile)) + else: + old_models, _ = {}, {} + models.update(old_models) + out = open(outfile, 'w') + dump((models), out) + out.close() + else: + stages = {} + trajectories = {} + timepoints = None + if len(HiCRestraints)>1: + timepoints = time_dependent_steering_pairs['colvar_dump_freq'] + nbr_produced_models = len(models)//(timepoints*(len(HiCRestraints)-1)) + stages[0] = [i for i in range(nbr_produced_models)] + + for sm_id, single_m in enumerate(ini_model): + for i in range(len(single_m['x'])): + single_m['x'][i] *= sm_diameter + single_m['y'][i] *= sm_diameter + single_m['z'][i] *= sm_diameter + + lammps_model = LAMMPSmodel({ 'x' : single_m['x'], + 'y' : single_m['y'], + 'z' : single_m['z'], + 'index' : sm_id+1+initial_seed, + 'cluster' : 'Singleton', + 'objfun' : single_m['objfun'] if 'objfun' in single_m else 0, + 'log_objfun' : single_m['log_objfun'] if 'log_objfun' in single_m else [], + 'radius' : float(CONFIG.HiC['resolution'] * \ + CONFIG.HiC['scale'])/2, + 'rand_init' : str(sm_id+1+initial_seed)}) + + models[sm_id] = lammps_model + for timepoint in range((len(HiCRestraints)-1)*timepoints): + stages[timepoint+1] = [(t+nbr_produced_models+timepoint*nbr_produced_models) + for t in range(nbr_produced_models)] + for traj in range(nbr_produced_models): + trajectories[traj] = [stages[t][traj] for t in range(timepoints+1)] + + model_ensemble = { + 'loci': len(LOCI), + 'models': models, + 'resolution': resolution, + 'original_data': values if len(HiCRestraints)>1 else values[0], + 'zscores': zscores, + 'config': CONFIG.HiC, + 'zeros': zeros, + 'restraints': HiCRestraints[0]._get_restraints(), + 'stages': stages, + 'trajectories': trajectories, + 'models_per_step': timepoints + } + + return model_ensemble +# Initialize the lammps simulation with standard polymer physics based +# interactions: chain connectivity (FENE) ; excluded volume (WLC) ; and +# bending rigidity (KP) +def init_lammps_run(lmp, initial_conformation, + neighbor=CONFIG.neighbor, + hide_log=True, + chromosome_particle_numbers=None, + connectivity="FENE", + restart_file=False): + + """ + Initialise the parameters for the computation in lammps job + + :param lmp: lammps instance object. + :param initial_conformation: lammps input data file with the particles initial conformation. + :param CONFIG.neighbor neighbor: see LAMMPS_CONFIG.py. 
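+        (the default taken from LAMMPS_CONFIG.py is ``0.3 bin``)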
+ :param True hide_log: do not generate lammps log information + :param FENE connectivity: use FENE for a fene bond or harmonic for harmonic + potential for neighbours + :param False restart_file: path to file to restore LAMMPs session (binary) + + """ + + if hide_log: + lmp.command("log none") + #os.remove("log.lammps") + + ####################################################### + # Box and units (use LJ units and period boundaries) # + ####################################################### + lmp.command("units %s" % CONFIG.units) + lmp.command("atom_style %s" % CONFIG.atom_style) #with stiffness + lmp.command("boundary %s" % CONFIG.boundary) + """ + try: + lmp.command("communicate multi") + except: + pass + """ + + ########################## + # READ "start" data file # + ########################## + if restart_file == False : + lmp.command("read_data %s" % initial_conformation) + else: + restart_time = int(restart_file.split('/')[-1].split('_')[4][:-8]) + print('Previous unfinished LAMMPS steps found') + print('Loaded %s file' %restart_file) + lmp.command("read_restart %s" % restart_file) + lmp.command("reset_timestep %i" % restart_time) + + lmp.command("mass %s" % CONFIG.mass) + + ################################################################## + # Pair interactions require lists of neighbours to be calculated # + ################################################################## + lmp.command("neighbor %s" % neighbor) + lmp.command("neigh_modify %s" % CONFIG.neigh_modify) + + ############################################################## + # Sample thermodynamic info (temperature, energy, pressure) # + ############################################################## + lmp.command("thermo %i" % CONFIG.thermo) + + ############################### + # Stiffness term # + # E = K * (1+cos(theta)), K>0 # + ############################### + lmp.command("angle_style %s" % CONFIG.angle_style) # Write function for kinks + lmp.command("angle_coeff * %f" % CONFIG.persistence_length) + + ################################################################### + # Pair interaction between non-bonded atoms # + # # + # Lennard-Jones 12-6 potential with cutoff: # + # potential E=4epsilon[ (sigma/r)^12 - (sigma/r)^6] for r timepoints) for ks in kseeds]): + # kseeds.append(rnd) + + #pool = ProcessPool(max_workers=n_cpus, max_tasks=n_cpus) + pool = multiprocessing.Pool(processes=n_cpus, maxtasksperchild=n_cpus) + + results = [] + def collect_result(result): + results.append((result[0], result[1], result[2])) + + initial_models = initial_conformation + if not initial_models: + initial_models = [] + + jobs = {} + for k_id, k in enumerate(kseeds): + k_folder = lammps_folder + 'lammps_' + str(k) + '/' + failedSeedLog = None + # First we check if the modelling fails with this seed + if restart_path != False: + restart_file = restart_path + 'lammps_' + str(k) + '/' + failedSeedLog = restart_file + 'runLog.txt' + if os.path.exists(failedSeedLog): + with open(failedSeedLog, 'r') as f: + for line in f: + prevRun = line.split() + # add number of models done so dont repeat same seed + if prevRun[1] == 'Failed': + k = int(prevRun[0]) + n_models + k_folder = lammps_folder + 'lammps_' + str(k) + '/' + + #print "#RandomSeed: %s" % k + keep_restart_out_dir2 = None + if keep_restart_out_dir != None: + keep_restart_out_dir2 = keep_restart_out_dir + 'lammps_' + str(k) + '/' + if not os.path.exists(keep_restart_out_dir2): + os.makedirs(keep_restart_out_dir2) + model_path = False + if restart_path != False: + # check presence of 
previously finished jobs + model_path = restart_path + 'lammps_' + str(k) + '/finishedModel_%s.pickle' %k + # define restart file by checking for finished jobs or last step + if model_path != False and os.path.exists(model_path): + with open(model_path, "rb") as input_file: + m = load(input_file) + results.append((m[0], m[1])) + else: + if restart_path != False: + restart_file = restart_path + 'lammps_' + str(k) + '/' + dirfiles = os.listdir(restart_file) + # check for last k and step + maxi = (0, 0, '') + for f in dirfiles: + if f.startswith('restart_kincrease_'): + kincrease = int(f.split('_')[2]) + step = int(f.split('_')[-1][:-8]) + if kincrease > maxi[0]: + maxi = (kincrease, step, f) + elif kincrease == maxi[0] and step > maxi[1]: + maxi = (kincrease, step, f) + # In case there is no restart file at all + if maxi[2] == '': + #print('Could not find a LAMMPS restart file') + # will check later if we have a path or a file + getIniConf = True + #restart_file = False + else: + restart_file = restart_file + maxi[2] + getIniConf = False + else: + restart_file = False + getIniConf = True + + ini_conf = None + if not os.path.exists(k_folder): + os.makedirs(k_folder) + if initial_conformation and getIniConf == True: + ini_conf = '%sinitial_conformation.dat' % k_folder + write_initial_conformation_file(initial_conformation[k_id], + chromosome_particle_numbers, + confining_environment, + out_file=ini_conf) + # jobs[k] = run_lammps(k, k_folder, run_time, + # initial_conformation, connectivity, + # neighbor, + # tethering, minimize, + # compress_with_pbc, compress_without_pbc, + # confining_environment, + # steering_pairs, + # time_dependent_steering_pairs, + # loop_extrusion_dynamics, + # to_dump, pbc,) + # jobs[k] = pool.schedule(run_lammps, + jobs[k] = partial(abortable_worker, run_lammps, timeout=timeout_job, + failedSeedLog=[failedSeedLog, k]) + pool.apply_async(jobs[k], + args=(k, k_folder, run_time, + ini_conf, connectivity, + neighbor, + tethering, minimize, + compress_with_pbc, compress_without_pbc, + initial_relaxation, + confining_environment, + steering_pairs, + time_dependent_steering_pairs, + compartmentalization, + loop_extrusion_dynamics, + to_dump, pbc, hide_log, + chromosome_particle_numbers, + keep_restart_out_dir2, + restart_file, + model_path, + store_n_steps, + useColvars,), callback=collect_result) + # , timeout=timeout_job) + pool.close() + pool.join() + +# for k in kseeds: +# try: +# #results.append((k, jobs[k])) +# results.append((k, jobs[k].result())) +# except TimeoutError: +# print "Model took more than %s seconds to complete ... 
canceling" % str(timeout_job) +# jobs[k].cancel() +# except Exception as error: +# print "Function raised %s" % error +# jobs[k].cancel() + + models = {} + initial_models = [] + ############ WARNING ############ + # PENDING TO ADD THE STORAGE OF INITIAL MODELS # + if timepoints > 1: + for t in range(timepoints): + time_models = [] + for res in results: + (k,restarr,init_conf) = res + time_models.append(restarr[t]) + for i, m in enumerate(time_models[:n_keep]): + models[i+t*len(time_models[:n_keep])+n_keep] = m + #for i, (_, m) in enumerate( + # sorted(time_models.items(), key=lambda x: x[1]['objfun'])[:n_keep]): + # models[i+t+1] = m + + else: + for i, (_, m, im) in enumerate( + sorted(results, key=lambda x: x[1][0]['objfun'])[:n_keep]): + models[i] = m[0] + if not initial_conformation: + initial_models += [im] + + if cleanup: + for k in kseeds: + k_folder = lammps_folder + '/lammps_' + str(k) + '/' + if os.path.exists(k_folder): + shutil.rmtree(k_folder) + + return models, initial_models + + + +# This performs the dynamics: I should add here: The steered dynamics (Irene and Hi-C based) ; +# The loop extrusion dynamics ; the binders based dynamics (Marenduzzo and Nicodemi)...etc... +def run_lammps(kseed, lammps_folder, run_time, + initial_conformation=None, connectivity="FENE", + neighbor=CONFIG.neighbor, + tethering=False, minimize=True, + compress_with_pbc=None, compress_without_pbc=None, + initial_relaxation=None, + confining_environment=None, + steering_pairs=None, + time_dependent_steering_pairs=None, + compartmentalization=None, + loop_extrusion_dynamics=None, + to_dump=10000, pbc=False, + hide_log=True, + chromosome_particle_numbers=None, + keep_restart_out_dir2=None, + restart_file=False, + model_path=False, + store_n_steps=10, + useColvars=False): + """ + Generates one lammps model + + :param kseed: Random number to identify the model. + :param initial_conformation_folder: folder where to store lammps input + data file with the particles initial conformation. + http://lammps.sandia.gov/doc/2001/data_format.html + :param FENE connectivity: use FENE for a fene bond or harmonic for harmonic + potential for neighbours (see init_lammps_run) + :param run_time: # of timesteps. + :param None initial_conformation: path to initial conformation file or None + for random walk initial start. + :param CONFIG.neighbor neighbor: see LAMMPS_CONFIG.py. + :param False tethering: whether to apply tethering command or not. + :param True minimize: whether to apply minimize command or not. + :param None compress_with_pbc: whether to apply the compression dynamics in case of a + system with cubic confinement and pbc. This compression step is usually apply + to obtain a system with the desired particle density. The input have to be a list + of three elements: + 0 - XXX; + 1 - XXX; + 2 - The compression simulation time span (in timesteps). + e.g. compress_with_pbc=[0.01, 0.01, 100000] + :param None compress_without_pbc: whether to apply the compression dynamics in case of a + system with spherical confinement. This compression step is usually apply to obtain a + system with the desired particle density. The simulation shrinks/expands the initial + sphere to a sphere of the desired radius using many short runs. In each short run the + radius is reduced by 0.1 box units. The input have to be a list of three elements: + 0 - Initial radius; + 1 - Final desired radius; + 2 - The time span (in timesteps) of each short compression run. + e.g. 
compress_without_pbc=[300, 100, 100] + :param None steering_pairs: particles contacts file from colvars fix + http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf. + steering_pairs = { 'colvar_input' : "ENST00000540866.2chr7_clean_enMatch.txt", + 'colvar_output' : "colvar_list.txt", + 'kappa_vs_genomic_distance' : "kappa_vs_genomic_distance.txt", + 'chrlength' : 3182, + 'copies' : ['A'], + 'binsize' : 50000, + 'number_of_kincrease' : 1000, + 'timesteps_per_k' : 1000, + 'timesteps_relaxation' : 100000, + 'perc_enfor_contacts' : 10 + } + + :param None loop_extrusion_dynamics: dictionary with all the info to perform loop + extrusion dynamics. + loop_extrusion_dynamics = { 'target_loops_input' : "target_loops.txt", + 'loop_extrusion_steps_output' : "loop_extrusion_steps.txt", + 'attraction_strength' : 4.0, + 'equilibrium_distance' : 1.0, + 'chrlength' : 3182, + 'copies' : ['A'], + 'timesteps_per_loop_extrusion_step' : 1000, + 'timesteps_relaxation' : 100000, + 'perc_enfor_loops' : 10 + } + + Should at least contain Chromosome, loci1, loci2 as 1st, 2nd and 3rd column + :param None keep_restart_out_dir2: path to write files to restore LAMMPs + session (binary) + :param False restart_file: path to file to restore LAMMPs session (binary) + :param False model_path: path to/for pickle with finished model (name included) + :param 10 store_n_steps: Integer with number of steps to be saved if + restart_file != False + :param False useColvars: True if you want the restrains to be loaded by colvars + :returns: a LAMMPSModel object + + """ + + + lmp = lammps(cmdargs=['-screen','none','-log',lammps_folder+'log.lammps','-nocite']) + me = MPI.COMM_WORLD.Get_rank() + nprocs = MPI.COMM_WORLD.Get_size() + # check if we have a restart file or a path to which restart + if restart_file == False: + doRestart = False + saveRestart = False + elif os.path.isdir(restart_file): + doRestart = False + saveRestart = True + else: + doRestart = True + saveRestart = True + if not initial_conformation and doRestart == False: + initial_conformation = lammps_folder+'initial_conformation.dat' + generate_chromosome_random_walks_conformation ([len(LOCI)], + outfile=initial_conformation, + seed_of_the_random_number_generator=kseed, + confining_environment=confining_environment) + + # Just prepared the steps recovery for steering pairs + if steering_pairs and doRestart == True: + init_lammps_run(lmp, initial_conformation, + neighbor=neighbor, + hide_log=hide_log, + connectivity=connectivity, + restart_file=restart_file) + else: + init_lammps_run(lmp, initial_conformation, + neighbor=neighbor, + hide_log=hide_log, + chromosome_particle_numbers=chromosome_particle_numbers, + connectivity=connectivity) + + lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + # ########################################################## + # # Generate RESTART file, SPECIAL format, not a .txt file # + # # Useful if simulation crashes + # Prepared an optimisation for steering pairs, but not for the rest# + # ########################################################## + # create lammps restart files every x steps. 
1000 is usually fine.
+    # Text-format session info would be portable across machines, but the LAMMPS
+    # manual warns: "Because a data file is in text format, if you use a data file
+    # written out by this command to restart a simulation, the initial state
+    # of the new run will be slightly different than the final state of the old run (when the file
+    # was written) which was represented internally by LAMMPS in binary format. A new simulation
+    # which reads the data file will thus typically diverge from a simulation that continued
+    # in the original input script." We therefore keep binary restarts; convert them later with restart2data if needed.
+    #if keep_restart_out_dir2:
+    #    lmp.command("restart %i %s/relaxation_%i_*.restart" % (keep_restart_step, keep_restart_out_dir2, kseed))
+
+
+    #######################################################
+    # Set up fixes                                        #
+    # use NVE ensemble                                    #
+    # Langevin integrator Tstart Tstop 1/friction rndseed #
+    # => sampling NVT ensemble                            #
+    #######################################################
+    # Define the langevin dynamics integrator
+    lmp.command("fix 1 all nve")
+    lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed)
+
+    # Define the tethering to the center of the confining environment
+    if tethering:
+        lmp.command("fix 3 all spring tether 50.0 0.0 0.0 0.0 0.0")
+
+    # Do a minimization step to prevent particle
+    # clashes in the initial conformation
+    if minimize:
+
+        if to_dump:
+            lmp.command("undump 1")
+            lmp.command("dump 1 all custom %i %sminimization_*.XYZ id xu yu zu" % (to_dump,lammps_folder))
+            #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes")
+
+        print("Performing minimization run...")
+        lmp.command("minimize 1.0e-4 1.0e-6 100000 100000")
+
+        if to_dump:
+            lmp.command("undump 1")
+            lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder))
+            #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes")
+
+    if compress_with_pbc:
+        if to_dump:
+            lmp.command("undump 1")
+            lmp.command("dump 1 all custom %i %scompress_with_pbc_*.XYZ id xu yu zu" % (to_dump,lammps_folder))
+            #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes")
+
+        # Re-setting the initial timestep to 0
+        lmp.command("reset_timestep 0")
+
+        lmp.command("unfix 1")
+        lmp.command("unfix 2")
+
+        # default as in PLoS Comp Biol Di Stefano et al.
2013 compress_with_pbc = [0.01, 0.01, 100000] + lmp.command("fix 1 all nph iso %s %s 2.0" % (compress_with_pbc[0], + compress_with_pbc[1])) + lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed) + print("run %i" % compress_with_pbc[2]) + lmp.command("run %i" % compress_with_pbc[2]) + + lmp.command("unfix 1") + lmp.command("unfix 2") + + lmp.command("fix 1 all nve") + lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed) + + # Here We have to re-define the confining environment + print("# Previous particle density (nparticles/volume)", lmp.get_natoms()/(confining_environment[1]**3)) + confining_environment[1] = lmp.extract_global("boxxhi",1) - lmp.extract_global("boxxlo",1) + print("") + print("# New cubic box dimensions after isotropic compression") + print(lmp.extract_global("boxxlo",1), lmp.extract_global("boxxhi",1)) + print(lmp.extract_global("boxylo",1), lmp.extract_global("boxyhi",1)) + print(lmp.extract_global("boxzlo",1), lmp.extract_global("boxzhi",1)) + print("# New confining environment", confining_environment) + print("# New particle density (nparticles/volume)", lmp.get_natoms()/(confining_environment[1]**3)) + print("") + + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + if compress_without_pbc: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %scompress_without_pbc_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + # Re-setting the initial timestep to 0 + lmp.command("reset_timestep 0") + + # default as in Sci Rep Di Stefano et al. 2016 + # compress_without_pbc = [initial_radius, final_radius, timesteps_per_minirun] + # = [350, 161.74, 100] + radius = compress_without_pbc[0] + while radius > compress_without_pbc[1]: + + print("New radius %f" % radius) + if radius != compress_without_pbc[0]: + lmp.command("region sphere delete") + + lmp.command("region sphere sphere 0.0 0.0 0.0 %f units box side in" % radius) + + # Performing the simulation + lmp.command("fix 5 all wall/region sphere lj126 1.0 1.0 1.12246152962189") + lmp.command("run %i" % compress_without_pbc[2]) + + radius -= 0.1 + + # Here we have to re-define the confining environment + volume = 4.*np.pi/3.0*(compress_without_pbc[0]**3) + print("# Previous particle density (nparticles/volume)", lmp.get_natoms()/volume) + confining_environment[1] = compress_without_pbc[1] + print("") + volume = 4.*np.pi/3.0*(compress_without_pbc[1]**3) + print("# New particle density (nparticles/volume)", lmp.get_natoms()/volume) + print("") + + if initial_relaxation: + + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sinitial_relaxation.XYZ id xu yu zu" % (to_dump,lammps_folder)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + if "MSD" in initial_relaxation: + lmp.command("compute MSD all msd") + lmp.command("variable MSD equal c_MSD[4]") + lmp.command("variable dx equal c_MSD[1]") + lmp.command("variable dy equal c_MSD[2]") + lmp.command("variable dz equal c_MSD[3]") + lmp.command("variable step equal step") + lmp.command("fix MSD all print %i \"${step} ${dx} ${dy} ${dz} ${MSD}\" file MSD.txt" % (initial_relaxation["MSD"])) + + if "Distances" in initial_relaxation: + #lmp.command("compute xu all property/atom xu") + #lmp.command("compute yu all property/atom yu") + 
#lmp.command("compute zu all property/atom zu") + #pair_number = 0 + #for particle1 in range(1,chrlength[0]): + # for particle2 in range(particle1,chrlength[0]): + # pair_number += 1 + # lmp.command("variable x%i equal c_xu[%i]" % (particle1, particle1)) + # lmp.command("variable x%i equal c_xu[%i]" % (particle2, particle2)) + # lmp.command("variable y%i equal c_yu[%i]" % (particle1, particle1)) + # lmp.command("variable y%i equal c_yu[%i]" % (particle2, particle2)) + # lmp.command("variable z%i equal c_zu[%i]" % (particle1, particle1)) + # lmp.command("variable z%i equal c_zu[%i]" % (particle2, particle2)) + + # lmp.command("variable xLE%i equal v_x%i-v_x%i" % (pair_number, particle1, particle2)) + # lmp.command("variable yLE%i equal v_y%i-v_y%i" % (pair_number, particle1, particle2)) + # lmp.command("variable zLE%i equal v_z%i-v_z%i" % (pair_number, particle1, particle2)) + # lmp.command("variable dist_%i_%i equal sqrt(v_xLE%i*v_xLE%i+v_yLE%i*v_yLE%i+v_zLE%i*v_zLE%i)" % (particle1, particle2, pair_number, pair_number, pair_number, pair_number, pair_number, pair_number)) + + lmp.command("compute pairs all property/local patom1 patom2") + lmp.command("compute distances all pair/local dist") + #lmp.command("variable distances equal c_distances[1]") + lmp.command("dump distances all local %i distances.txt c_pairs[1] c_pairs[2] c_distances" % (initial_relaxation["Distances"])) + #lmp.command("fix distances all print %i \"${step} ${distances}\" file distances.txt" % (initial_relaxation["Distances"])) + + lmp.command("reset_timestep 0") + lmp.command("run %i" % initial_relaxation["relaxation_time"]) + lmp.command("write_data relaxed_conformation.txt nocoeff") + if "MSD" in initial_relaxation: + lmp.command("uncompute MSD") + if "distances" in initial_relaxation: + lmp.command("uncompute distances") + lmp.command("undum distances") + + timepoints = 1 + xc = [] + # Setup the pairs to co-localize using the COLVARS plug-in + if steering_pairs: + + if doRestart == False: + # Start relaxation step + lmp.command("reset_timestep 0") # cambiar para punto ionicial + lmp.command("run %i" % steering_pairs['timesteps_relaxation']) + lmp.command("reset_timestep %i" % 0) + + # Start Steered Langevin dynamics + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %ssteered_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id") + + if 'number_of_kincrease' in steering_pairs: + nbr_kincr = steering_pairs['number_of_kincrease'] + else: + nbr_kincr = 1 + + if doRestart == True: + restart_k_increase = int(restart_file.split('/')[-1].split('_')[2]) + restart_time = int(restart_file.split('/')[-1].split('_')[4][:-8]) + + #steering_pairs['colvar_output'] = os.path.dirname(os.path.abspath(steering_pairs['colvar_output'])) + '/' + str(kseed) + '_'+ os.path.basename(steering_pairs['colvar_output']) + steering_pairs['colvar_output'] = lammps_folder+os.path.basename(steering_pairs['colvar_output']) + for kincrease in range(nbr_kincr): + # Write the file containing the pairs to constraint + # steering_pairs should be a dictionary with: + # Avoid to repeat calculations in case of restart + if (doRestart == True) and (kincrease < restart_k_increase): + continue + + if useColvars == True: + + generate_colvars_list(steering_pairs, kincrease+1) + + # Adding the colvar option + #print "fix 4 all colvars %s output %s" % (steering_pairs['colvar_output'],lammps_folder) + lmp.command("fix 4 all colvars %s output %sout" % 
(steering_pairs['colvar_output'],lammps_folder)) + + if to_dump: + # lmp.command("thermo_style custom step temp epair emol") + lmp.command("thermo_style custom step temp epair emol pe ke etotal f_4") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal f_4") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (steering_pairs['colvar_dump_freq'],lammps_folder,str(0), str(1))) + + # will load the bonds directly into LAMMPS + else: + bond_list = generate_bond_list(steering_pairs) + for bond in bond_list: + lmp.command(bond) + + if to_dump: + lmp.command("thermo_style custom step temp etotal") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal etotal") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (steering_pairs['colvar_dump_freq'],lammps_folder,str(0), str(1))) + #lmp.command("reset_timestep %i" % 0) + resettime = 0 + runtime = steering_pairs['timesteps_per_k'] + if (doRestart == True) and (kincrease == restart_k_increase): + resettime = restart_time + runtime = steering_pairs['timesteps_per_k'] - restart_time + + # Create 10 restarts with name restart_kincrease_%s_time_%s.restart every + if saveRestart == True: + if os.path.isdir(restart_file): + restart_file_new = restart_file + 'restart_kincrease_%s_time_*.restart' %(kincrease) + else: + restart_file_new = '/'.join(restart_file.split('/')[:-1]) + '/restart_kincrease_%s_time_*.restart' %(kincrease) + #print(restart_file_new) + lmp.command("restart %i %s" %(int(steering_pairs['timesteps_per_k']/store_n_steps), restart_file_new)) + + #lmp.command("reset_timestep %i" % resettime) + lmp.command("run %i" % runtime) + + # Setup the pairs to co-localize using the COLVARS plug-in + if time_dependent_steering_pairs: + timepoints = time_dependent_steering_pairs['colvar_dump_freq'] + + #if exists("objective_function_profile.txt"): + # os.remove("objective_function_profile.txt") + + #print "# Getting the time dependent steering pairs!" + time_dependent_restraints = get_time_dependent_colvars_list(time_dependent_steering_pairs) + time_points = sorted(time_dependent_restraints.keys()) + print("#Time_points",time_points) + sys.stdout.flush() + + time_dependent_steering_pairs['colvar_output'] = lammps_folder+os.path.basename(time_dependent_steering_pairs['colvar_output']) + # Performing the adaptation step from initial conformation to Tadphys excluded volume + if time_dependent_steering_pairs['adaptation_step']: + restraints = {} + for time_point in time_points[0:1]: + lmp.command("reset_timestep %i" % 0) + # Change to_dump with some way to load the conformations you want to store + # This Adaptation could be discarded in the trajectory files. 
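+            # A minimal sketch (illustrative values, not from a real dataset) of the
+            # record that the loop below stores in restraints[time_point][pair]:
+            # six parallel lists describing how each restraint evolves in time:
+            #
+            #   restraints[t][(i, j)] = [["Harmonic"],  # restraint type(s)
+            #                            [2.0],         # initial spring constant (kappa * k_factor)
+            #                            [2.0],         # final spring constant
+            #                            [1.5],         # initial equilibrium distance
+            #                            [1.5],         # final equilibrium distance
+            #                            [100]]         # timesteps for the gradual change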
+ if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sadapting_MD_from_initial_conformation_to_Tadphys_at_time_point_%s.XYZ id xu yu zu" % (to_dump, lammps_folder, time_point)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + restraints[time_point] = {} + print("# Step %s - %s" % (time_point, time_point)) + sys.stdout.flush() + for pair in list(time_dependent_restraints[time_point].keys()): + # Strategy changing gradually the spring constant and the equilibrium distance + # Case 1: The restraint is present at time point 0 and time point 1: + if pair in time_dependent_restraints[time_point]: + # Case A: The restrainttype is the same at time point 0 and time point 1 -> + # The spring force changes, and the equilibrium distance is the one at time_point+1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor']], + # Final spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Number of timesteps for the gradual change + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]*0.1)]] + if useColvars == True: + generate_time_dependent_colvars_list(restraints[time_point], time_dependent_steering_pairs['colvar_output'], time_dependent_steering_pairs['colvar_dump_freq']) + copyfile(time_dependent_steering_pairs['colvar_output'], + "colvar_list_from_time_point_%s_to_time_point_%s.txt" % + (str(time_point), str(time_point))) + lmp.command("velocity all create 1.0 %s" % kseed) + # Adding the colvar option and perfoming the steering + if time_point != time_points[0]: + lmp.command("unfix 4") + print("#fix 4 all colvars %s" % time_dependent_steering_pairs['colvar_output']) + sys.stdout.flush() + lmp.command("fix 4 all colvars %s tstat 2 output %sout" % (time_dependent_steering_pairs['colvar_output'],lammps_folder)) + else: + bond_list = generate_time_dependent_bond_list(restraints[time_point]) + for bond in bond_list: + lmp.command(bond) + + lmp.command("run %i" % int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]*0.1)) + + # Time dependent steering + restraints = {} + #for i in xrange(time_points[0],time_points[-1]): + for time_point in time_points[0:-1]: + lmp.command("reset_timestep %i" % 0) + # Change to_dump with some way to load the conformations you want to store + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %ssteered_MD_from_time_point_%s_to_time_point_%s.XYZ id xu yu zu" % (to_dump, lammps_folder, time_point, time_point+1)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") + + restraints[time_point] = {} + print("# Step %s - %s" % (time_point, time_point+1)) + sys.stdout.flush() + # Compute the current distance between any two particles + xc_tmp = np.array(lmp.gather_atoms("x",1,3)) + current_distances = compute_particles_distance(xc_tmp) + + for pair in set(list(time_dependent_restraints[time_point].keys())+list(time_dependent_restraints[time_point+1].keys())): + r = 0 + + # Strategy changing gradually the spring constant + # Case 1: The restraint is present at time point 0 and time point 1: + if pair in time_dependent_restraints[time_point] 
and pair in time_dependent_restraints[time_point+1]: + # Case A: The restrainttype is the same at time point 0 and time point 1 -> + # The spring force changes, and the equilibrium distance is the one at time_point+1 + if time_dependent_restraints[time_point][pair][0] == time_dependent_restraints[time_point+1][pair][0]: + r += 1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1] *time_dependent_steering_pairs['k_factor']], + # Final spring constant + [time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + # Case B: The restrainttype is different between time point 0 and time point 1 + if time_dependent_restraints[time_point][pair][0] != time_dependent_restraints[time_point+1][pair][0]: + # Case a: The restrainttype is "Harmonic" at time point 0 + # and "LowerBoundHarmonic" at time point 1 + if time_dependent_restraints[time_point][pair][0] == "Harmonic": + r += 1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point][pair][0], time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor'], 0.0], + # Final spring constant + [0.0, time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2], time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2], time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + #[int(time_dependent_steering_pairs['timesteps_per_k_change']), int(time_dependent_steering_pairs['timesteps_per_k_change'])]] + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]), int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + # Case b: The restrainttype is "LowerBoundHarmonic" at time point 0 + # and "Harmonic" at time point 1 + if time_dependent_restraints[time_point][pair][0] == "HarmonicLowerBound": + r += 1 + restraints[time_point][pair] = [ + # Restraint type + [time_dependent_restraints[time_point][pair][0], time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor'], 0.0], + # Final spring constant + [0.0, time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2], time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2], time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + #[int(time_dependent_steering_pairs['timesteps_per_k_change']), int(time_dependent_steering_pairs['timesteps_per_k_change'])]] + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point]), int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + + # 
Case 2: The restraint is not present at time point 0, but it is at time point 1: + elif pair not in time_dependent_restraints[time_point] and pair in time_dependent_restraints[time_point+1]: + # List content: restraint_type,kforce,distance + r += 1 + restraints[time_point][pair] = [ + # Restraint type -> Is the one at time point time_point+1 + [time_dependent_restraints[time_point+1][pair][0]], + # Initial spring constant + [0.0], + # Final spring constant + [time_dependent_restraints[time_point+1][pair][1]*time_dependent_steering_pairs['k_factor']], + # Initial equilibrium distance + [time_dependent_restraints[time_point+1][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point+1][pair][2]], + # Number of timesteps for the gradual change + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + + # Case 3: The restraint is present at time point 0, but it is not at time point 1: + elif pair in time_dependent_restraints[time_point] and pair not in time_dependent_restraints[time_point+1]: + # List content: restraint_type,kforce,distance + r += 1 + restraints[time_point][pair] = [ + # Restraint type -> Is the one at time point time_point + [time_dependent_restraints[time_point][pair][0]], + # Initial spring constant + [time_dependent_restraints[time_point][pair][1]*time_dependent_steering_pairs['k_factor']], + # Final spring constant + [0.0], + # Initial equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Final equilibrium distance + [time_dependent_restraints[time_point][pair][2]], + # Number of timesteps for the gradual change + [int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])]] + + #current_distances[pair], + else: + print("#ERROR None of the previous conditions is matched!") + if pair in time_dependent_restraints[time_point]: + print("# Pair %s at timepoint %s %s " % (pair, time_point, time_dependent_restraints[time_point][pair])) + if pair in time_dependent_restraints[time_point+1]: + print("# Pair %s at timepoint %s %s " % (pair, time_point+1, time_dependent_restraints[time_point+1][pair])) + continue + + if r > 1: + print("#ERROR Two of the previous conditions are matched!") + + #if pair in time_dependent_restraints[time_point]: + # print "# Pair %s at timepoint %s %s " % (pair, time_point, time_dependent_restraints[time_point][pair]) + #else: + # print "# Pair %s at timepoint %s None" % (pair, time_point) + + #if pair in time_dependent_restraints[time_point+1]: + # print "# Pair %s at timepoint %s %s " % (pair, time_point+1, time_dependent_restraints[time_point+1][pair]) + #else: + # print "# Pair %s at timepoint %s None" % (pair, time_point+1) + #print restraints[pair] + #print "" + lmp.command("velocity all create 1.0 %s" % kseed) + if useColvars == True: + generate_time_dependent_colvars_list(restraints[time_point], time_dependent_steering_pairs['colvar_output'], time_dependent_steering_pairs['colvar_dump_freq']) + copyfile(time_dependent_steering_pairs['colvar_output'], + "%scolvar_list_from_time_point_%s_to_time_point_%s.txt" % + (lammps_folder, str(time_point), str(time_point+1))) + # Adding the colvar option and perfoming the steering + if time_point != time_points[0]: + lmp.command("unfix 4") + print("#fix 4 all colvars %s" % time_dependent_steering_pairs['colvar_output']) + sys.stdout.flush() + lmp.command("fix 4 all colvars %s tstat 2 output %sout" % (time_dependent_steering_pairs['colvar_output'],lammps_folder)) + if to_dump: + lmp.command("thermo_style custom step temp epair 
emol pe ke etotal f_4") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal f_4") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (time_dependent_steering_pairs['colvar_dump_freq'],lammps_folder,str(time_point), str(time_point+1))) + else: + bond_list = generate_time_dependent_bond_list(restraints[time_point]) + for bond in bond_list: + lmp.command(bond) + if to_dump: + lmp.command("thermo_style custom step temp epair emol pe ke etotal") + lmp.command("thermo_modify norm no flush yes") + lmp.command("variable step equal step") + lmp.command("variable objfun equal etotal") + lmp.command('''fix 5 all print %s "${step} ${objfun}" file "%sobj_fun_from_time_point_%s_to_time_point_%s.txt" screen "no" title "#Timestep Objective_Function"''' % (time_dependent_steering_pairs['colvar_dump_freq'],lammps_folder,str(time_point), str(time_point+1))) + + lmp.command("run %i" % int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])) + + if time_point > 0: + + if exists("%sout.colvars.traj.BAK" % lammps_folder): + + copyfile("%sout.colvars.traj.BAK" % lammps_folder, "%srestrained_pairs_equilibrium_distance_vs_timestep_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point-1), str(time_point))) + + os.remove("%sout.colvars.traj.BAK" % lammps_folder) + + # Set interactions for chromosome compartmentalization + if compartmentalization: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %scompartmentalization_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + + # First we have to partition the genome in the defined compartments + for group in compartmentalization['partition']: + list_of_particles = get_list(compartmentalization['partition'][group]) + for atom in list_of_particles: + #print("set atom %s type %s" % (atom,group+1)) + lmp.command("set atom %s type %s" % (atom,group+1)) + + # Second we have to define the type of interactions + for pair in compartmentalization['interactions']: + #pair_coeff t1 t2 epsilon sigma rc + t1 = pair[0]+1 + t2 = pair[1]+1 + if t1 > t2: + t1 = pair[1]+1 + t2 = pair[0]+1 + + epsilon = compartmentalization['interactions'][pair][1] + + try: + sigma1 = compartmentalization['radii'][pair[0]] + except: + sigma1 = 0.5 + try: + sigma2 = compartmentalization['radii'][pair[1]] + except: + sigma2 = 0.5 + sigma = sigma1 + sigma2 + + if compartmentalization['interactions'][pair][0] == "attraction": + rc = sigma * 2.5 + if compartmentalization['interactions'][pair][0] == "repulsion": + rc = sigma * 1.12246152962189 + + print("pair_coeff %s %s lj/cut %s %s %s" % (t1,t2,epsilon,sigma,rc)) + lmp.command("pair_coeff %s %s lj/cut %s %s %s" % (t1,t2,epsilon,sigma,rc)) + try: + lmp.command("run %s" % (compartmentalization['runtime'])) + except: + pass + + + # Setup the pairs to co-localize using the COLVARS plug-in + if loop_extrusion_dynamics: + + # Start relaxation step + try: + lmp.command("reset_timestep 0") + lmp.command("run %i" % loop_extrusion_dynamics['timesteps_relaxation']) + except: + pass + + lmp.command("reset_timestep 0") + # Start Loop extrusion dynamics + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sloop_extrusion_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append no") + + # Randomly extract starting point of the extrusion 
dynamics between start and stop + extruders_positions = [] + nextruders = int(chromosome_particle_numbers[0]/loop_extrusion_dynamics['separation']) + print(nextruders) + for extruder in range(nextruders): + try: + occupied_positions = list(chain(*tmp_extruders_positions)) + except: + occupied_positions = [] + new_positions = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0]) + while (new_positions[0] in occupied_positions) or (new_positions[1] in occupied_positions) or (new_positions[0] in loop_extrusion_dynamics['barriers']) or (new_positions[1] in loop_extrusion_dynamics['barriers']): + new_positions = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0]) + extruders_positions.append(new_positions) + tmp_extruders_positions = [extruders_positions[x] for x in range(len(extruders_positions))] + print("Initial extruders' positions",extruders_positions) + + + # Initialise the lifetime of each extruder + extruders_lifetimes = [] + for extruder in range(nextruders): + extruders_lifetimes.append(int(0)) + print(extruders_positions, extruders_lifetimes) + + lmp.command("compute xu all property/atom xu") + lmp.command("compute yu all property/atom yu") + lmp.command("compute zu all property/atom zu") + + # Loop extrusion steps + for LES in range(int(run_time/loop_extrusion_dynamics['extrusion_time'])): + thermo_style="thermo_style custom step temp epair emol" + + # Update the bond restraint + loop_number = 1 + for particle1,particle2 in extruders_positions: + print("# fix LE%i all restrain bond %i %i %f %f %f" % (loop_number, + particle1, + particle2, + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['equilibrium_distance'])) + + lmp.command("fix LE%i all restrain bond %i %i %f %f %f" % (loop_number, + particle1, + particle2, + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['attraction_strength'], + loop_extrusion_dynamics['equilibrium_distance'])) + lmp.command("variable x%i equal c_xu[%i]" % (particle1, particle1)) + lmp.command("variable x%i equal c_xu[%i]" % (particle2, particle2)) + lmp.command("variable y%i equal c_yu[%i]" % (particle1, particle1)) + lmp.command("variable y%i equal c_yu[%i]" % (particle2, particle2)) + lmp.command("variable z%i equal c_zu[%i]" % (particle1, particle1)) + lmp.command("variable z%i equal c_zu[%i]" % (particle2, particle2)) + + lmp.command("variable xLE%i equal v_x%i-v_x%i" % (loop_number, particle1, particle2)) + lmp.command("variable yLE%i equal v_y%i-v_y%i" % (loop_number, particle1, particle2)) + lmp.command("variable zLE%i equal v_z%i-v_z%i" % (loop_number, particle1, particle2)) + lmp.command("variable dist_%i_%i equal sqrt(v_xLE%i*v_xLE%i+v_yLE%i*v_yLE%i+v_zLE%i*v_zLE%i)" % (particle1, + particle2, + loop_number, + loop_number, + loop_number, + loop_number, + loop_number, + loop_number)) + thermo_style += " v_dist_%i_%i" % (particle1, particle2) + loop_number += 1 + + lmp.command("%s" % thermo_style) + # Doing the LES + lmp.command("run %i" % loop_extrusion_dynamics['extrusion_time']) + #lmp.command("run 0") + + # update the lifetime of each extruder + for extruder in range(nextruders): + extruders_lifetimes[extruder] = extruders_lifetimes[extruder] + 1 + + # Remove the loop extrusion restraint! 
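+            # The fixes added above are deleted here and re-created at the updated
+            # extruder positions on the next iteration. A hypothetical two-extruder
+            # state (particle indices are made up) would evolve per step like:
+            #   extruders_positions = [[10, 12], [40, 43]]   # step LES
+            #   extruders_positions = [[9, 13], [39, 44]]    # step LES+1, both ends moved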
+            loop_number = 1
+            for particle1,particle2 in extruders_positions:
+
+                print("# unfix LE%i" % (loop_number))
+                lmp.command("unfix LE%i" % (loop_number))
+
+                loop_number += 1
+
+            # Update the particles involved in the loop extrusion interaction:
+            # decrease the left position by one until it reaches start,
+            # increase the right position by one until it reaches stop
+            for extruder in range(len(extruders_positions)):
+                # Move the left end, unless it has already reached the start of the chromosome
+                if extruders_positions[extruder][0] > 1:
+                    random_number = uniform(0, 1)
+                    if random_number <= loop_extrusion_dynamics['left_extrusion_rate']:
+                        extruders_positions[extruder][0] -= 1
+                # Move the right end, unless it has already reached the end of the chromosome
+                if extruders_positions[extruder][1] < chromosome_particle_numbers[0]:
+                    random_number = uniform(0, 1)
+                    if random_number <= loop_extrusion_dynamics['right_extrusion_rate']:
+                        extruders_positions[extruder][1] += 1
+
+                # If the extruder bumps into another extruder, move it back
+                tmp_extruders_positions = [extruders_positions[x] for x in range(len(extruders_positions)) if x != extruder]
+                occupied_positions = list(chain(*tmp_extruders_positions))
+                print("Extruder positions",extruders_positions[extruder])
+                print("Occupied_positions",sorted(occupied_positions))
+                if extruders_positions[extruder][0] in occupied_positions:
+                    extruders_positions[extruder][0] += 1
+                if extruders_positions[extruder][1] in occupied_positions:
+                    extruders_positions[extruder][1] -= 1
+
+                # If the extruder has reached its lifetime, reposition it randomly
+                if extruders_lifetimes[extruder] == loop_extrusion_dynamics['lifetime']:
+                    extruders_positions[extruder] = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0])
+                    while (extruders_positions[extruder][0] in occupied_positions) or (extruders_positions[extruder][1] in occupied_positions) or (extruders_positions[extruder][0] in loop_extrusion_dynamics['barriers']) or (extruders_positions[extruder][1] in loop_extrusion_dynamics['barriers']):
+                        extruders_positions[extruder] = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0])
+                    # Re-initialise the lifetime of the extruder
+                    extruders_lifetimes[extruder] = 0
+                    print("Extruders' repositioning after lifetime",extruders_positions,extruders_positions[extruder])
+
+                # Check presence of barriers
+                if loop_extrusion_dynamics['barriers_permeability'] < 1.0:
+                    # A motor that tries to cross a barrier is pushed back with probability (1 - permeability)
+                    # If the barrier blocks the left end, which extrudes against the chain index, move the extruder forward again
+                    if extruders_positions[extruder][0] in loop_extrusion_dynamics['barriers']:
+                        random_number = uniform(0, 1)
+                        if random_number >= loop_extrusion_dynamics['barriers_permeability']:
+                            extruders_positions[extruder][0] += 1
+                    # If the barrier blocks the right end, which extrudes along the chain index, move the extruder backward again
+                    if extruders_positions[extruder][1] in loop_extrusion_dynamics['barriers']:
+                        random_number = uniform(0, 1)
+                        if random_number >= loop_extrusion_dynamics['barriers_permeability']:
+                            extruders_positions[extruder][1] -= 1
+
+            print("Extruders positions at step",LES,extruders_positions)
+            print("Extruders lifetimes at step",LES,extruders_lifetimes)
+
+        ### TODO: store a pickle with the complete trajectory here ###
+        if to_dump:
+            lmp.command("undump 1")
+            lmp.command("dump 1 all custom %i
%sloop_extrusion_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + # Post-processing analysis + # Save coordinates + #for time in range(0,runtime,to_dump): + # xc.append(np.array(read_trajectory_file("%s/loop_extrusion_MD_%s.XYZ" % (lammps_folder, time)))) + + #xc.append(np.array(lmp.gather_atoms("x",1,3))) + + lmp.close() + + result = [] + for stg in range(len(xc)): + #log_objfun = read_objective_function("%sobj_fun_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(stg), str(stg+1))) + log_objfun = [0.0] + for timepoint in range(1,timepoints+1): + lammps_model = LAMMPSmodel({'x' : [], + 'y' : [], + 'z' : [], + 'cluster' : 'Singleton', + 'log_objfun' : log_objfun, + 'objfun' : log_objfun[-1], + 'radius' : 0.5, #float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])/2, + 'index' : kseed+timepoint, + 'rand_init' : str(kseed+timepoint)}) + + if pbc: + store_conformation_with_pbc(xc[stg], lammps_model, confining_environment) + else: + skip_first = 0 + if time_dependent_steering_pairs: + skip_first = 1 + for i in range((timepoint-1+skip_first)*len(LOCI)*3,(timepoint+skip_first)*len(LOCI)*3,3): + lammps_model['x'].append(xc[stg][i]*float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])) + lammps_model['y'].append(xc[stg][i+1]*float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])) + lammps_model['z'].append(xc[stg][i+2]*float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])) + result.append(lammps_model) + + #os.remove("%slog.cite" % lammps_folder) + # safe finished model + if model_path != False: + with open(model_path, "wb") as output_file: + dump((kseed,result), output_file) + ################### Special case for clusters with disk quota + # Remove the saved steps + if saveRestart == True: + if os.path.isdir(restart_file): + restart_path = restart_file + else: + restart_path = '/'.join(restart_file.split('/')[:-1]) + '/' + for pathfile in os.listdir(restart_path): + if pathfile.startswith('restart'): + os.remove(restart_path + pathfile) + ################################################################## + # Load initial conformation and return it + init_conf = read_conformation_file(initial_conformation) + + return (kseed, result, init_conf) + +def read_trajectory_file(fname): + + coords=[] + fhandler = open(fname) + line = next(fhandler) + try: + while True: + if line.startswith('ITEM: TIMESTEP'): + while not line.startswith('ITEM: ATOMS'): + line = next(fhandler) + if line.startswith('ITEM: ATOMS'): + line = next(fhandler) + line = line.strip() + if len(line) == 0: + continue + line_vals = line.split() + coords += [float(line_vals[1]),float(line_vals[2]),float(line_vals[3])] + line = next(fhandler) + except StopIteration: + pass + fhandler.close() + + return coords + +def read_conformation_file(fname): + + mod={'x':[], 'y':[], 'z':[]} + fhandler = open(fname) + line = next(fhandler) + try: + while True: + if line.startswith('LAMMPS input data file'): + while not line.startswith(' Atoms'): + line = next(fhandler) + if line.startswith(' Atoms'): + line = next(fhandler) + while len(line.strip()) == 0: + line = next(fhandler) + line = line.strip() + line_vals = line.split() + mod['x'].append(float(line_vals[3])) + mod['y'].append(float(line_vals[4])) + mod['z'].append(float(line_vals[5])) + line = next(fhandler) + if len(line.strip()) == 0: + break + except StopIteration: + pass + fhandler.close() + + return mod + +########## Part to perform the restrained dynamics ########## +# I should add here: The steered dynamics (Irene's and Hi-C based models) ; +# The loop extrusion dynamics ; 
the binders-based dynamics (Marenduzzo and Nicodemi), etc.
+
+def linecount(filename):
+    """
+    Count valid lines of input colvars file
+
+    :param filename: input colvars file.
+
+    :returns: number of valid contact lines
+
+    """
+
+    k = 0
+    with open(filename) as tfp:
+        for i, line in enumerate(tfp):
+            if line.startswith('#'):
+                continue
+            cols_vals = line.split()
+            if cols_vals[1] == cols_vals[2]:
+                continue
+            k += 1
+
+    return k
+
+##########
+
+def generate_colvars_list(steering_pairs,
+                          kincrease=0,
+                          colvars_header='# collective variable: monitor distances\n\ncolvarsTrajFrequency 1000 # output every 1000 steps\ncolvarsRestartFrequency 10000000\n',
+                          colvars_template='''
+
+colvar {
+  name %s
+  # %s %s %i
+  distance {
+      group1 {
+        atomNumbers %i
+      }
+      group2 {
+        atomNumbers %i
+      }
+  }
+}''',
+                          colvars_tail = '''
+
+harmonic {
+  name h_pot_%s
+  colvars %s
+  centers %s
+  forceConstant %f
+}\n''', colvars_harmonic_lower_bound_tail = '''
+
+harmonicWalls {
+  name hlb_pot_%s
+  colvars %s
+  lowerWalls %s
+  forceConstant %f
+  lowerWallConstant 1.0
+}\n'''
+                          ):
+
+    """
+    Generates lammps colvars file http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf
+
+    :param dict steering_pairs: dictionary containing all the information to write
+       the input file for the colvars implementation
+    :param existing_template colvars_header: header template for colvars file.
+    :param existing_template colvars_template: contact template for colvars file.
+    :param existing_template colvars_tail: tail template for colvars file.
+
+    """
+
+    # Getting the input
+    # XXX The target_pairs could also be a list like the one in output of get_HiCbased_restraints XXX
+    target_pairs = steering_pairs['colvar_input']
+    outfile = steering_pairs['colvar_output']
+    if 'kappa_vs_genomic_distance' in steering_pairs:
+        kappa_vs_genomic_distance = steering_pairs['kappa_vs_genomic_distance']
+    if 'chrlength' in steering_pairs:
+        chrlength = steering_pairs['chrlength']
+    else:
+        chrlength = 0
+    if 'copies' in steering_pairs:
+        copies = steering_pairs['copies']
+    else:
+        copies = ['A']
+    kbin = 10000000
+    binsize = steering_pairs['binsize']
+    if 'percentage_enforced_contacts' in steering_pairs:
+        percentage_enforced_contacts = steering_pairs['perc_enfor_contacts']
+    else:
+        percentage_enforced_contacts = 100
+
+    # Here we extract from all the restraints only
+    # a random sub-sample of percentage_enforced_contacts/100*totcolvars
+    rand_lines = []
+    i=0
+    j=0
+    if isinstance(target_pairs, str):
+        totcolvars = linecount(target_pairs)
+        ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100))
+
+        #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars)
+        rand_positions = sample(list(range(totcolvars)), ncolvars)
+        rand_positions = sorted(rand_positions)
+
+        with open(target_pairs) as f:
+            for line in f:
+                line = line.strip()
+                if j >= ncolvars:
+                    break
+                if line.startswith('#'):
+                    continue
+
+                cols_vals = line.split()
+                # Avoid enforcing restraints within the same bin
+                if cols_vals[1] == cols_vals[2]:
+                    continue
+
+                if i == rand_positions[j]:
+                    rand_lines.append(line)
+                    j += 1
+                i += 1
+    elif isinstance(target_pairs, HiCBasedRestraints):
+
+        rand_lines = target_pairs.get_hicbased_restraints()
+        totcolvars = len(rand_lines)
+        ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100))
+
+        #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars)
+        rand_positions = sample(list(range(totcolvars)), ncolvars)
+        rand_positions =
sorted(rand_positions) + + + else: + print("Unknown target_pairs") + return + + + + #print rand_lines + + seqdists = {} + poffset=0 + outf = open(outfile,'w') + outf.write(colvars_header) + for copy_nbr in copies: + i = 1 + for line in rand_lines: + if isinstance(target_pairs, str): + cols_vals = line.split() + else: + cols_vals = ['chr'] + line + + #print cols_vals + + if isinstance(target_pairs, HiCBasedRestraints) and cols_vals[3] != "Harmonic" and cols_vals[3] != "HarmonicLowerBound": + continue + + part1_start = int(cols_vals[1])*binsize + part1_end = (int(cols_vals[1])+1)*binsize + #print part1_start, part1_end + + part2_start = int(cols_vals[2])*binsize + part2_end = (int(cols_vals[2])+1)*binsize + #print part2_start, part2_end + + name = str(i)+copy_nbr + seqdist = abs(part1_start-part2_start) + #print seqdist + + region1 = cols_vals[0] + '_' + str(part1_start) + '_' + str(part1_end) + region2 = cols_vals[0] + '_' + str(part2_start) + '_' + str(part2_end) + + particle1 = int(cols_vals[1]) + 1 + poffset + particle2 = int(cols_vals[2]) + 1 + poffset + + seqdists[name] = seqdist + + outf.write(colvars_template % (name,region1,region2,seqdist,particle1,particle2)) + + if isinstance(target_pairs, HiCBasedRestraints): + # If the spring constant is zero we avoid to add the restraint! + if cols_vals[4] == 0.0: + continue + + centre = cols_vals[5] + kappa = cols_vals[4]*steering_pairs['k_factor'] + + if cols_vals[3] == "Harmonic": + outf.write(colvars_tail % (name,name,centre,kappa)) + + if cols_vals[3] == "HarmonicLowerBound": + outf.write(colvars_harmonic_lower_bound_tail % (name,name,centre,kappa)) + + i += 1 + poffset += chrlength + + outf.flush() + + #=========================================================================== + # if isinstance(target_pairs, HiCBasedRestraints): + # for copy_nbr in copies: + # i = 1 + # for line in rand_lines: + # cols_vals = line + # + # if cols_vals[3] == 0.0: + # continue + # + # name = str(i)+copy_nbr + # + # centre = cols_vals[4] + # kappa = cols_vals[3] + # + # if cols_vals[2] == "Harmonic": + # outf.write(colvars_tail % (name,name,centre,kappa)) + # + # elif cols_vals[2] == "HarmonicLowerBound": + # outf.write(colvars_harmonic_lower_bound_tail % (name,name,centre,kappa)) + # + # + # + # i += 1 + # poffset += chrlength + # + # outf.flush() + #=========================================================================== + + if 'kappa_vs_genomic_distance' in steering_pairs: + + kappa_values = {} + with open(kappa_vs_genomic_distance) as kgd: + for line in kgd: + line_vals = line.split() + kappa_values[int(line_vals[0])] = float(line_vals[1]) + + for seqd in set(seqdists.values()): + kappa = 0 + if seqd in kappa_values: + kappa = kappa_values[seqd]*kincrease + else: + for kappa_key in sorted(kappa_values, key=int): + if int(kappa_key) > seqd: + break + kappa = kappa_values[kappa_key]*kincrease + centres='' + names='' + for seq_name in seqdists: + if seqdists[seq_name] == seqd: + centres += ' 1.0' + names += ' '+seq_name + + outf.write(colvars_tail % (str(seqd),names,centres,kappa)) + + outf.flush() + + outf.close() + + +def generate_bond_list(steering_pairs): + + """ + Generates lammps bond commands + + :param dict steering_pairs: dictionary containing all the information to write down the + the input file for the bonds + """ + + # Getting the input + # The target_pairs could be also a list as the one in output of get_HiCbased_restraintsXXX + target_pairs = steering_pairs['colvar_input'] + if 'kappa_vs_genomic_distance' in steering_pairs: + 
kappa_vs_genomic_distance = steering_pairs['kappa_vs_genomic_distance'] + if 'chrlength' in steering_pairs: + chrlength = steering_pairs['chrlength'] + else: + chrlength = 0 + if 'copies' in steering_pairs: + copies = steering_pairs['copies'] + else: + copies = ['A'] + kbin = 10000000 + binsize = steering_pairs['binsize'] + if 'percentage_enforced_contacts' in steering_pairs: + percentage_enforced_contacts = steering_pairs['perc_enfor_contacts'] + else: + percentage_enforced_contacts = 100 + + # Here we extract from all the restraints only + # a random sub-sample of percentage_enforced_contacts/100*totcolvars + rand_lines = [] + i=0 + j=0 + if isinstance(target_pairs, str): + totcolvars = linecount(target_pairs) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + tfp = open(target_pairs) + with open(target_pairs) as f: + for line in f: + line = line.strip() + if j >= ncolvars: + break + if line.startswith('#'): + continue + + cols_vals = line.split() + # Avoid to enforce restraints between the same bin + if cols_vals[1] == cols_vals[2]: + continue + + if i == rand_positions[j]: + rand_lines.append(line) + j += 1 + i += 1 + tfp.close() + elif isinstance(target_pairs, HiCBasedRestraints): + + rand_lines = target_pairs.get_hicbased_restraints() + totcolvars = len(rand_lines) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + + else: + print("Unknown target_pairs") + return + + + + #print rand_lines + + seqdists = {} + poffset=0 + outf = [] #### a list + for copy_nbr in copies: + i = 1 + for line in rand_lines: + if isinstance(target_pairs, str): + cols_vals = line.split() + else: + cols_vals = ['chr'] + line + + #print cols_vals + + if isinstance(target_pairs, HiCBasedRestraints) and cols_vals[3] != "Harmonic" and cols_vals[3] != "HarmonicLowerBound": + continue + + part1_start = int(cols_vals[1])*binsize + part1_end = (int(cols_vals[1])+1)*binsize + #print part1_start, part1_end + + part2_start = int(cols_vals[2])*binsize + part2_end = (int(cols_vals[2])+1)*binsize + #print part2_start, part2_end + + name = str(i)+copy_nbr + seqdist = abs(part1_start-part2_start) + #print seqdist + + region1 = cols_vals[0] + '_' + str(part1_start) + '_' + str(part1_end) + region2 = cols_vals[0] + '_' + str(part2_start) + '_' + str(part2_end) + + particle1 = int(cols_vals[1]) + 1 + poffset + particle2 = int(cols_vals[2]) + 1 + poffset + + seqdists[name] = seqdist + + + if isinstance(target_pairs, HiCBasedRestraints): + # If the spring constant is zero we avoid to add the restraint! 
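+                # Sketch of the commands this function collects in outf (particle
+                # ids and parameters below are illustrative, not from real data):
+                #   fix 1A all restrain bond 12 85 0.000000 4.000000 1.500000 1.500000
+                #   fix 2A all restrain lbound 30 121 0.000000 2.000000 2.000000 2.000000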
+ if cols_vals[4] == 0.0: + continue + + centre = cols_vals[5] + kappa = cols_vals[4]*steering_pairs['k_factor'] + + bondType = None + if cols_vals[3] == "Harmonic": + bondType = 'bond' + elif cols_vals[3] == "HarmonicLowerBound": + bondType = 'lbound' + + if bondType: + outf.append('fix %s all restrain %s %d %d %f %f %f %f' %( + name, bondType, particle1, particle2, 0, kappa, + centre, centre)) + + + i += 1 + poffset += chrlength + + return outf + +########## + +def generate_time_dependent_bond_list(steering_pairs): + + + """ + Generates lammps bond commands + + :param dict steering_pairs: dictionary containing all the information to write down the + the input file for the bonds + """ + + outf = [] #### a list + # Defining the particle pairs + for pair in steering_pairs: + + #print steering_pairs[pair] + sys.stdout.flush() + for i in range(len(steering_pairs[pair][0])): + name = "%s_%s_%s" % (i, int(pair[0])+1, int(pair[1])+1) + seqdist = abs(int(pair[1])-int(pair[0])) + particle1 = int(pair[0])+1 + particle2 = int(pair[1])+1 + + restraint_type = steering_pairs[pair][0][i] + kappa_start = steering_pairs[pair][1][i] + kappa_stop = steering_pairs[pair][2][i] + centre_start = steering_pairs[pair][3][i] + centre_stop = steering_pairs[pair][4][i] + timesteps_per_k_change = steering_pairs[pair][5][i] + + bonType = None + if restraint_type == "Harmonic": + bonType = 'bond' + elif restraint_type == "HarmonicLowerBound": + bonType = 'lbound' + + if bonType: + outf.append('fix %s all restrain %s %d %d %f %f %f %f' %( + name, bonType, particle1, particle2, kappa_start, kappa_stop, + centre_start, centre_stop)) + return outf + +########## + +def generate_time_dependent_colvars_list(steering_pairs, + outfile, + colvar_dump_freq, + colvars_header='# collective variable: monitor distances\n\ncolvarsTrajFrequency %i # output every %i steps\ncolvarsRestartFrequency 1000000\n', + colvars_template=''' + +colvar { + name %s + # %s %s %i + width 1.0 + distance { + group1 { + atomNumbers %i + } + group2 { + atomNumbers %i + } + } +}''', + colvars_harmonic_tail = ''' + +harmonic { + name h_pot_%s + colvars %s + forceConstant %f + targetForceConstant %f + centers %s + targetCenters %s + targetNumSteps %s + outputEnergy yes +}\n''', + colvars_harmonic_lower_bound_tail = ''' +harmonicBound { + name hlb_pot_%s + colvars %s + forceConstant %f + targetForceConstant %f + centers %f + targetCenters %f + targetNumSteps %s + outputEnergy yes +}\n''' + ): + + + """ + harmonicWalls { + name hlb_pot_%s + colvars %s + forceConstant %f # This is the force constant at time_point + targetForceConstant %f # This is the force constant at time_point+1 + centers %f # This is the equilibrium distance at time_point+1 + targetCenters %f # This is the equilibrium distance at time_point+1 + targetNumSteps %d # This is the number of timesteps between time_point and time_point+1 + outputEnergy yes + }\n''', + + + colvars_harmonic_lower_bound_tail = ''' + + harmonicBound { + name hlb_pot_%s + colvars %s + forceConstant %f # This is the force constant at time_point + targetForceConstant %f # This is the force constant at time_point+1 + centers %f # This is the equilibrium distance at time_point+1 + targetCenters %f # This is the equilibrium distance at time_point+1 + targetNumSteps %d # This is the number of timesteps between time_point and time_point+1 + outputEnergy yes + }\n''', + + Generates lammps colvars file http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf + + :param dict steering_pairs: dictionary containing all the 
information to write down the
+       input file for the colvars implementation.
+    :param existing_template colvars_header: header template for colvars file.
+    :param existing_template colvars_template: contact template for colvars file.
+    :param existing_template colvars_tail: tail template for colvars file.
+
+    """
+
+    #restraints[pair] = [time_dependent_restraints[time_point+1][pair][0],   # Restraint type -> Is the one at time point time_point+1
+                        #time_dependent_restraints[time_point][pair][1]*10., # Initial spring constant
+                        #time_dependent_restraints[time_point+1][pair][1]*10.,   # Final spring constant
+                        #time_dependent_restraints[time_point][pair][2],     # Initial equilibrium distance
+                        #time_dependent_restraints[time_point+1][pair][2],   # Final equilibrium distance
+                        #int(time_dependent_steering_pairs['timesteps_per_k_change'][time_point])] # Number of timesteps for the gradual change
+
+    outf = open(outfile, 'w')
+
+    tfreq = colvar_dump_freq
+    outf.write(colvars_header % (tfreq, tfreq))
+    # Defining the particle pairs
+    for pair in steering_pairs:
+
+        #print steering_pairs[pair]
+        sys.stdout.flush()
+        for i in range(len(steering_pairs[pair][0])):
+            name = "%s_%s_%s" % (i, int(pair[0])+1, int(pair[1])+1)
+            seqdist = abs(int(pair[1])-int(pair[0]))
+            region1 = "particle_%s" % (int(pair[0])+1)
+            region2 = "particle_%s" % (int(pair[1])+1)
+
+            outf.write(colvars_template % (name,region1,region2,seqdist,int(pair[0])+1,int(pair[1])+1))
+
+            restraint_type         = steering_pairs[pair][0][i]
+            kappa_start            = steering_pairs[pair][1][i]
+            kappa_stop             = steering_pairs[pair][2][i]
+            centre_start           = steering_pairs[pair][3][i]
+            centre_stop            = steering_pairs[pair][4][i]
+            timesteps_per_k_change = steering_pairs[pair][5][i]
+
+            if restraint_type == "Harmonic":
+                outf.write(colvars_harmonic_tail % (name,name,kappa_start,kappa_stop,centre_start,centre_stop,timesteps_per_k_change))
+
+            if restraint_type == "HarmonicLowerBound":
+                outf.write(colvars_harmonic_lower_bound_tail % (name,name,kappa_start,kappa_stop,centre_start,centre_stop,timesteps_per_k_change))
+
+    outf.flush()
+    outf.close()
+
+##########
+
+def get_time_dependent_colvars_list(time_dependent_steering_pairs):
+
+    """
+    Parses the restraints that will be used to generate the LAMMPS colvars file
+    (http://lammps.sandia.gov/doc/PDF/colvars-refman-lammps.pdf) and returns
+    them as a dictionary indexed by time point.
+
+    :param dict time_dependent_steering_pairs: dictionary containing all the
+       information to write down the input file for the colvars implementation.
+    """
+
+    # Getting the input
+    # XXXThe target_pairs_file could be also a list as the one in output of get_HiCbased_restraintsXXX
+    target_pairs = time_dependent_steering_pairs['colvar_input']
+    outfile      = time_dependent_steering_pairs['colvar_output']
+    if 'chrlength' in time_dependent_steering_pairs:
+        chrlength = time_dependent_steering_pairs['chrlength']
+    binsize      = time_dependent_steering_pairs['binsize']
+    if 'perc_enfor_contacts' in time_dependent_steering_pairs:
+        percentage_enforced_contacts = time_dependent_steering_pairs['perc_enfor_contacts']
+    else:
+        percentage_enforced_contacts = 100
+
+    # get_hicbased_restraints() returns the HiC-based restraints as a list.
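+    # (A hypothetical entry, in the field order listed below, could be
+    #  [5, 42, "Harmonic", 1.0, 2.5].)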
+ # Each entry of the list is a list of 5 elements describing the details of the restraint: + # 0 - particle_i + # 1 - particle_j + # 2 - type_of_restraint = Harmonic or HarmonicLowerBound or HarmonicUpperBound + # 3 - the kforce of the restraint + # 4 - the equilibrium (or maximum or minimum respectively) distance associated to the restraint + + # Here we extract from all the restraints only a random sub-sample + # of percentage_enforced_contacts/100*totcolvars + rand_lines = [] + i=0 + j=0 + if isinstance(target_pairs, str): + time_dependent_restraints = {} + totcolvars = linecount(target_pairs) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + with open(target_pairs) as f: + for line in f: + line = line.strip() + if j >= ncolvars: + break + if line.startswith('#') or line == "": + continue + + # Line format: timepoint,particle1,particle2,restraint_type,kforce,distance + cols_vals = line.split() + + if int(cols_vals[1]) < int(cols_vals[2]): + pair = (int(cols_vals[1]), int(cols_vals[2])) + else: + pair = (int(cols_vals[2]), int(cols_vals[1])) + + try: + if pair in time_dependent_restraints[int(cols_vals[0])]: + print("WARNING: Check your restraint list! pair %s is repeated in time point %s!" % (pair, int(cols_vals[0]))) + # List content: restraint_type,kforce,distance + time_dependent_restraints[int(cols_vals[0])][pair] = [cols_vals[3], + float(cols_vals[4]), + float(cols_vals[5])] + except: + time_dependent_restraints[int(cols_vals[0])] = {} + # List content: restraint_type,kforce,distance + time_dependent_restraints[int(cols_vals[0])][pair] = [cols_vals[3], + float(cols_vals[4]), + float(cols_vals[5])] + if i == rand_positions[j]: + rand_lines.append(line) + j += 1 + i += 1 + elif isinstance(target_pairs, list): + time_dependent_restraints = dict((i,{}) for i in range(len(target_pairs))) + for time_point, HiCR in enumerate(target_pairs): + rand_lines = HiCR.get_hicbased_restraints() + totcolvars = len(rand_lines) + ncolvars = int(totcolvars*(float(percentage_enforced_contacts)/100)) + + #print "Number of enforced contacts = %i over %i" % (ncolvars,totcolvars) + rand_positions = sample(list(range(totcolvars)), ncolvars) + rand_positions = sorted(rand_positions) + + for cols_vals in rand_lines: + + if cols_vals[2] != "Harmonic" and cols_vals[2] != "HarmonicLowerBound": + continue + if int(cols_vals[0]) < int(cols_vals[1]): + pair = (int(cols_vals[0]), int(cols_vals[1])) + else: + pair = (int(cols_vals[1]), int(cols_vals[0])) + + if pair in time_dependent_restraints[time_point]: + print("WARNING: Check your restraint list! pair %s is repeated in time point %s!" % (pair, time_point)) + # List content: restraint_type,kforce,distance + time_dependent_restraints[time_point][pair] = [cols_vals[2], + float(cols_vals[3]), + float(cols_vals[4])] + + else: + print("Unknown target_pairs") + return + +# for time_point in sorted(time_dependent_restraints.keys()): +# for pair in time_dependent_restraints[time_point]: +# print "#Time_dependent_restraints", time_point,pair, time_dependent_restraints[time_point][pair] + return time_dependent_restraints + +### TODO Add the option to add also spheres of different radii (e.g. 
to simulate nucleoli) +########## Part to generate the initial conformation ########## +def generate_chromosome_random_walks_conformation ( chromosome_particle_numbers , + confining_environment=['sphere',100.] , + particle_radius=0.5 , + seed_of_the_random_number_generator=1 , + number_of_conformations=1, + outfile="Initial_random_walk_conformation.dat", + pbc=False, + center=True): + """ + Generates lammps initial conformation file by random walks + + :param chromosome_particle_numbers: list with the number of particles of each chromosome. + :param ['sphere',100.] confining_environment: dictionary with the confining environment of the conformation + Possible confining environments: + ['cube',edge_width] + ['sphere',radius] + ['ellipsoid',x-semiaxes, y-semiaxes, z-semiaxes] + ['cylinder', basal radius, height] + :param 0.5 particle_radius: Radius of each particle. + :param 1 seed_of_the_random_number_generator: random seed. + :param 1 number_of_conformations: copies of the conformation. + :param outfile: file where to store resulting initial conformation file + + """ + seed(seed_of_the_random_number_generator) + + # This allows to organize the largest chromosomes first. + # This is to get a better acceptance of the chromosome positioning. + chromosome_particle_numbers = [int(x) for x in chromosome_particle_numbers] + chromosome_particle_numbers.sort(key=int,reverse=True) + + for cnt in range(number_of_conformations): + + final_random_walks = generate_random_walks(chromosome_particle_numbers, + particle_radius, + confining_environment, + pbc=pbc, + center=center) + + # Writing the final_random_walks conformation + #print "Succesfully generated conformation number %d\n" % (cnt+1) + write_initial_conformation_file(final_random_walks, + chromosome_particle_numbers, + confining_environment, + out_file=outfile) + +########## + +def generate_chromosome_rosettes_conformation ( chromosome_particle_numbers , + fractional_radial_positions=None, + confining_environment=['sphere',100.] , + rosette_radius=12.0 , particle_radius=0.5 , + seed_of_the_random_number_generator=1 , + number_of_conformations=1, + outfile = "Initial_rosette_conformation.dat", + atom_types=1): + """ + Generates lammps initial conformation file by rosettes conformation + + :param chromosome_particle_numbers: list with the number of particles of each chromosome. + :param None fractional_radial_positions: list with fractional radial positions for all the chromosomes. + :param ['sphere',100.] confining_environment: dictionary with the confining environment of the conformation + Possible confining environments: + ['cube',edge_width] + ['sphere',radius] + ['ellipsoid',x-semiaxes, y-semiaxes, z-semiaxes] + ['cylinder', basal radius, height] + :param 0.5 particle_radius: Radius of each particle. + :param 1 seed_of_the_random_number_generator: random seed. + :param 1 number_of_conformations: copies of the conformation. + :param outfile: file where to store resulting initial conformation file + + """ + seed(seed_of_the_random_number_generator) + + # This allows to organize the largest chromosomes first. + # This is to get a better acceptance of the chromosome positioning. 
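+    # Usage sketch for this generator (hypothetical sizes and container):
+    #   generate_chromosome_rosettes_conformation([500, 300],
+    #                                             confining_environment=['sphere', 50.],
+    #                                             outfile="initial_rosettes.dat")
+    # e.g. a hypothetical input [300, 500, 400] is reordered to [500, 400, 300]
+    # before the rosettes are built.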
+    chromosome_particle_numbers = [int(x) for x in chromosome_particle_numbers]
+    chromosome_particle_numbers.sort(key=int,reverse=True)
+
+    initial_rosettes , rosettes_lengths = generate_rosettes(chromosome_particle_numbers,
+                                                            rosette_radius,
+                                                            particle_radius)
+    print(rosettes_lengths)
+
+    # Constructing the rosettes conformations
+    for cnt in range(number_of_conformations):
+
+        particle_inside   = 0 # 0 means a particle is outside
+        particles_overlap = 0 # 0 means two particles are overlapping
+        while particle_inside == 0 or particles_overlap == 0:
+            particle_inside   = 1
+            particles_overlap = 1
+            segments_P1 = []
+            segments_P0 = []
+            side = 0
+            init_rosettes = copy.deepcopy(initial_rosettes)
+
+            # Guess of the initial segment conformation:
+            # 1 - each rod is placed inside the confining environment
+            # in a random position and with random orientation
+            # 2 - possible clashes between generated rods are checked
+            if fractional_radial_positions:
+                if len(fractional_radial_positions) != len(chromosome_particle_numbers):
+                    print("Please provide the desired fractional radial positions for all the chromosomes")
+                    sys.exit()
+                segments_P1 , segments_P0 = generate_rods_biased_conformation(rosettes_lengths, rosette_radius,
+                                                                              confining_environment,
+                                                                              fractional_radial_positions,
+                                                                              max_number_of_temptative=100000)
+            else:
+                segments_P1 , segments_P0 = generate_rods_random_conformation(rosettes_lengths, rosette_radius,
+                                                                              confining_environment,
+                                                                              max_number_of_temptative=100000)
+
+            # Roto-translation of the rosettes according to the segment position and orientation
+            final_rosettes = rosettes_rototranslation(init_rosettes, segments_P1, segments_P0)
+
+            # Checking that the beads are all inside the confining environment and are not overlapping
+            for rosette_pair in list(combinations(final_rosettes,2)):
+                molecule0 = list(zip(rosette_pair[0]['x'],rosette_pair[0]['y'],rosette_pair[0]['z']))
+                molecule1 = list(zip(rosette_pair[1]['x'],rosette_pair[1]['y'],rosette_pair[1]['z']))
+                distances = spatial.distance.cdist(molecule1,molecule0)
+                print(len(molecule0),len(molecule0[0]),distances.min())
+                if distances.min() < particle_radius*2.0*0.95:
+                    particles_overlap = 0
+                    break
+
+            if particles_overlap != 0:
+                for r in range(len(final_rosettes)):
+                    molecule0 = list(zip(final_rosettes[r]['x'],final_rosettes[r]['y'],final_rosettes[r]['z']))
+                    print(len(molecule0),len(molecule0[0]))
+
+                    distances = spatial.distance.cdist(molecule0,molecule0)
+                    print(distances.min())
+                    for i in range(len(molecule0)):
+                        for j in range(i+1,len(molecule0)):
+                            if distances[(i,j)] < particle_radius*2.0*0.95:
+                                particles_overlap = 0
+                            if particles_overlap == 0:
+                                break
+                        if particles_overlap == 0:
+                            break
+                    if particles_overlap == 0:
+                        break
+
+        # Writing the final_rosettes conformation
+        print("Successfully generated conformation number %d\n" % (cnt+1))
+        write_initial_conformation_file(final_rosettes,
+                                        chromosome_particle_numbers,
+                                        confining_environment,
+                                        out_file=outfile,
+                                        atom_types=atom_types)
+
+##########
+
+def generate_chromosome_rosettes_conformation_with_pbc ( chromosome_particle_numbers ,
+                                                         fractional_radial_positions=None,
+                                                         confining_environment=['cube',100.] ,
+                                                         rosette_radius=12.0 , particle_radius=0.5 ,
+                                                         seed_of_the_random_number_generator=1 ,
+                                                         number_of_conformations=1,
+                                                         outfile = "Initial_rosette_conformation_with_pbc.dat",
+                                                         atom_types=1):
+    """
+    Generates a LAMMPS initial conformation file from rosette conformations
+
+    :param chromosome_particle_numbers: list with the number of particles of each chromosome.
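+        (e.g. [500, 300] for a hypothetical system of two chains of 500 and 300 particles)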
+ :param None fractional_radial_positions: list with fractional radial positions for all the chromosomes. + :param ['cube',100.] confining_environment: dictionary with the confining environment of the conformation + Possible confining environments: + ['cube',edge_width] + :param 0.5 particle_radius: Radius of each particle. + :param 1 seed_of_the_random_number_generator: random seed. + :param 1 number_of_conformations: copies of the conformation. + :param outfile: file where to store resulting initial conformation file + + """ + seed(seed_of_the_random_number_generator) + + # This allows to organize the largest chromosomes first. + # This is to get a better acceptance of the chromosome positioning. + chromosome_particle_numbers = [int(x) for x in chromosome_particle_numbers] + chromosome_particle_numbers.sort(key=int,reverse=True) + + initial_rosettes , rosettes_lengths = generate_rosettes(chromosome_particle_numbers, + rosette_radius, + particle_radius) + print(rosettes_lengths) + + + # Constructing the rosettes conformations + for cnt in range(number_of_conformations): + + particles_overlap = 0 # 0 means two particles are overlapping + while particles_overlap == 0: + particles_overlap = 1 + segments_P1 = [] + segments_P0 = [] + side = 0 + init_rosettes = copy.deepcopy(initial_rosettes) + + # Guess of the initial segment conformation: + # 1 - each rod is placed in a random position and with random orientation + # 2 - possible clashes between generated rods are checked taking into account pbc + segments_P1 , segments_P0 = generate_rods_random_conformation_with_pbc ( + rosettes_lengths, + rosette_radius, + confining_environment, + max_number_of_temptative=100000) + + # Roto-translation of the rosettes according to the segment position and orientation + final_rosettes = rosettes_rototranslation(init_rosettes, segments_P1, segments_P0) + + # Checking that the beads once folded inside the simulation box (for pbc) are not overlapping + folded_rosettes = copy.deepcopy(final_rosettes) + for r in range(len(folded_rosettes)): + particle = 0 + for x, y, z in zip(folded_rosettes[r]['x'],folded_rosettes[r]['y'],folded_rosettes[r]['z']): + #inside_1 = check_point_inside_the_confining_environment(x, y, z, + # particle_radius, + # confining_environment) + #if inside_1 == 0: + # print inside_1, r, particle, x, y, z + + while x > (confining_environment[1]*0.5): + x -= confining_environment[1] + while x < -(confining_environment[1]*0.5): + x += confining_environment[1] + + while y > (confining_environment[1]*0.5): + y -= confining_environment[1] + while y < -(confining_environment[1]*0.5): + y += confining_environment[1] + + while z > (confining_environment[1]*0.5): + z -= confining_environment[1] + while z < -(confining_environment[1]*0.5): + z += confining_environment[1] + + #inside_2 = check_point_inside_the_confining_environment(x, y, z, + # particle_radius, + # confining_environment) + #if inside_2 == 1 and inside_1 == 0: + # print inside_2, r, particle, x, y, z + folded_rosettes[r]['x'][particle] = x + folded_rosettes[r]['y'][particle] = y + folded_rosettes[r]['z'][particle] = z + particle += 1 + + for rosette_pair in list(combinations(folded_rosettes,2)): + + for x0,y0,z0 in zip(rosette_pair[0]['x'],rosette_pair[0]['y'],rosette_pair[0]['z']): + for x1,y1,z1 in zip(rosette_pair[1]['x'],rosette_pair[1]['y'],rosette_pair[1]['z']): + + particles_overlap = check_particles_overlap(x0,y0,z0,x1,y1,z1,particle_radius) + + if particles_overlap == 0: # 0 means that the particles are overlapping -> PROBLEM!!! 
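+                            # check_particles_overlap (defined further down) returns 0 when
+                            # the two beads are closer than the threshold given as its last
+                            # argument, so any overlapping pair restarts the placement loop;
+                            # e.g. with particle_radius = 0.5 a folded distance of 0.3 fails.
+                            # The offending pair is reported before breaking out: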
+ print("Particle",x0,y0,z0,"and",x1,y1,z1,"overlap\n") + break + if particles_overlap == 0: + break + if particles_overlap == 0: + break + + # Writing the final_rosettes conformation + print("Succesfully generated conformation number %d\n" % (cnt+1)) + write_initial_conformation_file(final_rosettes, + chromosome_particle_numbers, + confining_environment, + out_file=outfile, + atom_types=atom_types) + +########## + +def generate_rosettes(chromosome_particle_numbers, rosette_radius, particle_radius): + # Genaration of the rosettes + # XXXA. Rosa publicationXXX + # List to contain the rosettes and the rosettes lengths + rosettes = [] + rosettes_lengths = [] + + for number_of_particles in chromosome_particle_numbers: + + # Variable to build the chain + phi = 0.0 + + # Dictory of lists to contain the rosette + rosette = {} + rosette['x'] = [] + rosette['y'] = [] + rosette['z'] = [] + + # Position of the first particle (x_0, 0.0, 0.0) + k = 6. + x = 0.38 + p = 1.0 + rosette['x'].append(rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * cos(phi)) + rosette['y'].append(rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * sin(phi)) + rosette['z'].append(p * phi / (2.0 * pi)) + #print "First bead is in position: %f %f %f" % (rosette['x'][0], rosette['y'][0], rosette['z'][0]) + + # Building the chain: The rosette is growing along the positive z-axes + for particle in range(1,number_of_particles): + + distance = 0.0 + while distance < (particle_radius*2.0): + phi = phi + 0.001 + x_tmp = rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * cos(phi) + y_tmp = rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * sin(phi) + z_tmp = phi / (2.0 * pi) + distance = sqrt((x_tmp - rosette['x'][-1])*(x_tmp - rosette['x'][-1]) + + (y_tmp - rosette['y'][-1])*(y_tmp - rosette['y'][-1]) + + (z_tmp - rosette['z'][-1])*(z_tmp - rosette['z'][-1])) + + rosette['x'].append(x_tmp) + rosette['y'].append(y_tmp) + rosette['z'].append(z_tmp) + if distance > ((particle_radius*2.0)*1.2): + print("%f %d %d %d" % (distance, particle-1, particle)) + + rosettes.append(rosette) + rosettes_lengths.append(rosette['z'][-1]-rosette['z'][0]) + + return rosettes , rosettes_lengths + +########## + +def generate_rods_biased_conformation(rosettes_lengths, rosette_radius, + confining_environment, + fractional_radial_positions, + max_number_of_temptative=100000): + # Construction of the rods initial conformation + segments_P0 = [] + segments_P1 = [] + + if confining_environment[0] != 'sphere': + print("ERROR: Biased chromosome positioning is currently implemented") + print("only for spherical confinement. 
If you need other shapes, please") + print("contact the developers") + + for length , target_radial_position in zip(rosettes_lengths,fractional_radial_positions): + tentative = 0 + clashes = 0 # 0 means that there is an clash -> PROBLEM + best_radial_position = 1.0 + best_radial_distance = 1.0 + best_segment_P0 = [] + best_segment_P1 = [] + + # Positioning the rods + while tentative < 100000 and best_radial_distance > 0.00005: + + print("Length = %f" % length) + + print("Trying to position terminus 0") + segment_P0_tmp = [] + segment_P0_tmp = draw_point_inside_the_confining_environment(confining_environment, + rosette_radius) + print("Successfully positioned terminus 0: %f %f %f" % (segment_P0_tmp[0], segment_P0_tmp[1], segment_P0_tmp[2])) + + print("Trying to position terminus 1") + segment_P1_tmp = [] + segment_P1_tmp = draw_second_extreme_of_a_segment_inside_the_confining_environment(segment_P0_tmp[0], + segment_P0_tmp[1], + segment_P0_tmp[2], + length, + rosette_radius, + confining_environment) + print("Successfully positioned terminus 1: %f %f %f" % (segment_P1_tmp[0], segment_P1_tmp[1], segment_P1_tmp[2])) + + # Check clashes with the previously positioned rods + clashes = 1 + for segment_P1,segment_P0 in zip(segments_P1,segments_P0): + clashes = check_segments_clashes(segment_P1, + segment_P0, + segment_P1_tmp, + segment_P0_tmp, + rosette_radius) + if clashes == 0: + break + + if clashes == 1: + # Check whether the midpoint of the segment is close to the target radial position + segment_midpoint = [] + segment_midpoint.append((segment_P0_tmp[0] + segment_P1_tmp[0])*0.5) + segment_midpoint.append((segment_P0_tmp[1] + segment_P1_tmp[1])*0.5) + segment_midpoint.append((segment_P0_tmp[2] + segment_P1_tmp[2])*0.5) + + radial_position = sqrt( ( segment_midpoint[0] * segment_midpoint[0] + + segment_midpoint[1] * segment_midpoint[1] + + segment_midpoint[2] * segment_midpoint[2] ) / + (confining_environment[1]*confining_environment[1])) + + radial_distance = fabs(radial_position-target_radial_position) + + print(radial_position , target_radial_position , radial_distance , best_radial_distance , tentative) + + # If the midpoint of the segment is closer to the target radial position than the + # previous guesses. Store the points as the best guesses! 
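+                # (Worked example with hypothetical numbers: for a sphere of radius 100
+                #  and a midpoint at (30, 40, 0), radial_position = sqrt(2500/10000) = 0.5;
+                #  with target_radial_position = 0.4 this gives radial_distance = 0.1.)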
+ if radial_distance < best_radial_distance: + best_radial_distance = radial_distance + best_radial_position = radial_position + best_tentative = tentative+1 # The variable tentative starts from 0 + + best_segment_P0 = [] + best_segment_P1 = [] + for component_P0 , component_P1 in zip(segment_P0_tmp,segment_P1_tmp): + best_segment_P0.append(component_P0) + best_segment_P1.append(component_P1) + + tentative = tentative + 1 + + if best_segment_P0 == []: + print("Valid placement not found for chromosome rosette after %d tentatives" % tentative) + sys.exit() + + print("Successfully positioned chromosome of length %lf at tentative %d of %d tentatives" % (length, best_tentative, tentative)) + segments_P0.append(best_segment_P0) + segments_P1.append(best_segment_P1) + + print("Successfully generated rod conformation!") + return segments_P1 , segments_P0 + +########## + +def generate_rods_random_conformation(rosettes_lengths, rosette_radius, + confining_environment, + max_number_of_temptative=100000): + # Construction of the rods initial conformation + segments_P0 = [] + segments_P1 = [] + + for length in rosettes_lengths: + tentative = 0 + clashes = 0 + # Random positioning of the rods + while tentative < 100000 and clashes == 0: + + tentative += 1 + clashes = 1 + #print "Length = %f" % length + + print("Trying to position terminus 0") + #pick uniformly within the confining environment using the rejection method + first_point = [] + first_point = draw_point_inside_the_confining_environment(confining_environment, + rosette_radius) + + print("Successfully positioned terminus 0: %f %f %f" % (first_point[0], first_point[1], first_point[2])) + + print("Trying to position terminus 1") + #pick from P0 another point one the sphere of radius length inside the confining environment + last_point = [] + last_point = draw_second_extreme_of_a_segment_inside_the_confining_environment(first_point[0], + first_point[1], + first_point[2], + length, + rosette_radius, + confining_environment) + + print("Successfully positioned terminus 1: %f %f %f" % (last_point[0], last_point[1], last_point[2])) + + # Check clashes with the previously positioned rods + clashes = 1 + for segment_P1,segment_P0 in zip(segments_P1,segments_P0): + clashes = check_segments_clashes(segment_P1, + segment_P0, + last_point, + first_point, + rosette_radius) + if clashes == 0: + break + + #print clashes + print("Successfully positioned chromosome of length %lf at tentative %d\n" % (length, tentative)) + segments_P1.append(last_point) + segments_P0.append(first_point) + + print("Successfully generated rod conformation!") + return segments_P1 , segments_P0 + +########## + +def generate_rods_random_conformation_with_pbc(rosettes_lengths, rosette_radius, + confining_environment, + max_number_of_temptative=100000): + + # Construction of the rods initial conformation + segments_P0 = [] + segments_P1 = [] + + for length in rosettes_lengths: + tentative = 0 + clashes = 0 + # Random positioning of the rods + while tentative < 100000 and clashes == 0: + + tentative += 1 + clashes = 1 + #print "Length = %f" % length + + print("Trying to position terminus 0") + #pick uniformly within the confining environment using the rejection method + first_point = [] + first_point = draw_point_inside_the_confining_environment(confining_environment, + rosette_radius) + + print("Successfully positioned terminus 0: %f %f %f" % (first_point[0], first_point[1], first_point[2])) + + print("Trying to position terminus 1") + #pick from P0 another point one the sphere of radius 
length inside the confining environment + last_point = [] + last_point = draw_second_extreme_of_a_segment(first_point[0], + first_point[1], + first_point[2], + length, + rosette_radius) + + print(last_point) + # Check clashes with the previously positioned rods + for segment_P1,segment_P0 in zip(segments_P1,segments_P0): + clashes = check_segments_clashes_with_pbc(segment_P1, + segment_P0, + last_point, + first_point, + rosette_radius, + confining_environment) + if clashes == 0: + break + + #print clashes + print("Successfully positioned chromosome of length %lf at tentative %d\n" % (length, tentative)) + segments_P1.append(last_point) + segments_P0.append(first_point) + + print("Successfully generated rod conformation!") + return segments_P1 , segments_P0 + +########## + +def generate_random_walks(chromosome_particle_numbers, + particle_radius, + confining_environment, + center, + pbc=False): + # Construction of the random walks initial conformation + random_walks = [] + + for number_of_particles in chromosome_particle_numbers: + #print "Trying to position random walk" + random_walk = {} + random_walk['x'] = [] + random_walk['y'] = [] + random_walk['z'] = [] + + + #print "Positioning first particle" + particle_overlap = 0 + while particle_overlap == 0: + particle_overlap = 1 + first_particle = [] + first_particle = draw_point_inside_the_confining_environment(confining_environment, + particle_radius) + + # Check if the particle is overlapping with any other particle in the system + for rand_walk in random_walks: + if pbc: + particle_overlap = check_particle_vs_all_overlap(first_particle[0], + first_particle[1], + first_particle[2], + rand_walk, + 2.0*particle_radius) + else: + particle_overlap = check_particle_vs_all_overlap(first_particle[0], + first_particle[1], + first_particle[2], + rand_walk, + 2.0*particle_radius) + + if particle_overlap == 0: + break + + random_walk['x'].append(first_particle[0]) + random_walk['y'].append(first_particle[1]) + random_walk['z'].append(first_particle[2]) + + for particle in range(1,number_of_particles): + #print "Positioning particle %d" % (particle+1) + particle_overlap = 0 # 0 means that there is an overlap -> PROBLEM + overlapCounter = -1 + maxIter = 1000 + while particle_overlap == 0: + overlapCounter += 1 + if overlapCounter > maxIter: + # raise error so log file is created to avoid k_seed + errorName = 'ERROR: Initial conformation non ending loop' + raise InitalConformationError(errorName) + particle_overlap = 1 + new_particle = [] + if pbc: + new_particle = draw_second_extreme_of_a_segment( + random_walk['x'][-1], + random_walk['y'][-1], + random_walk['z'][-1], + 2.0*particle_radius, + 2.0*particle_radius) + else: + new_particle = draw_second_extreme_of_a_segment_inside_the_confining_environment( + random_walk['x'][-1], + random_walk['y'][-1], + random_walk['z'][-1], + 2.0*particle_radius, + 2.0*particle_radius, + confining_environment) + + # Check if the particle is overlapping with any other particle in the system + for rand_walk in random_walks: + particle_overlap = check_particle_vs_all_overlap(new_particle[0], + new_particle[1], + new_particle[2], + rand_walk, + 2.0*particle_radius) + if particle_overlap == 0: + break + if particle_overlap == 0: + continue + + # The current random walk is not yet in the list above + particle_overlap = check_particle_vs_all_overlap(new_particle[0], + new_particle[1], + new_particle[2], + random_walk, + 2.0*particle_radius) + if particle_overlap == 0: + continue + + random_walk['x'].append(new_particle[0]) + 
random_walk['y'].append(new_particle[1]) + random_walk['z'].append(new_particle[2]) + + #print "Successfully positioned random walk of %d particles" % number_of_particles + random_walks.append(random_walk) + + #print "Successfully generated random walk conformation!" + if center: + for random_walk in random_walks: + x_com, y_com, z_com = (0.0,0.0,0.0) + cnt = 0 + for (x,y,z) in zip(random_walk['x'],random_walk['y'],random_walk['z']): + x_com += x + y_com += y + z_com += z + cnt += 1 + x_com, y_com, z_com = (x_com/cnt,y_com/cnt,z_com/cnt) + + for i in range(len(random_walk['x'])): + random_walk['x'][i] -= x_com + random_walk['y'][i] -= y_com + random_walk['z'][i] -= z_com + + x_com, y_com, z_com = (0.0,0.0,0.0) + cnt = 0 + for (x,y,z) in zip(random_walk['x'],random_walk['y'],random_walk['z']): + x_com += x + y_com += y + z_com += z + cnt += 1 + x_com, y_com, z_com = (x_com/cnt,y_com/cnt,z_com/cnt) + + return random_walks + +########## + +def check_particle_vs_all_overlap(x,y,z,chromosome,overlap_radius): + particle_overlap = 1 + + for x0, y0, z0 in zip(chromosome['x'],chromosome['y'],chromosome['z']): + particle_overlap = check_particles_overlap(x0,y0,z0,x,y,z,overlap_radius) + if particle_overlap == 0: + return particle_overlap + + return particle_overlap + +########## + +def draw_second_extreme_of_a_segment_inside_the_confining_environment(x0, y0, z0, + segment_length, + object_radius, + confining_environment): + inside = 0 + while inside == 0: + particle = [] + temp_theta = arccos(2.0*random()-1.0) + temp_phi = 2*pi*random() + particle.append(x0 + segment_length * cos(temp_phi) * sin(temp_theta)) + particle.append(y0 + segment_length * sin(temp_phi) * sin(temp_theta)) + particle.append(z0 + segment_length * cos(temp_theta)) + # Check if the particle is inside the confining_environment + inside = check_point_inside_the_confining_environment(particle[0], + particle[1], + particle[2], + object_radius, + confining_environment) + + return particle + +########## + +def draw_second_extreme_of_a_segment(x0, y0, z0, + segment_length, + object_radius): + particle = [] + temp_theta = arccos(2.0*random()-1.0) + temp_phi = 2*pi*random() + particle.append(x0 + segment_length * cos(temp_phi) * sin(temp_theta)) + particle.append(y0 + segment_length * sin(temp_phi) * sin(temp_theta)) + particle.append(z0 + segment_length * cos(temp_theta)) + + return particle + +########## + +def draw_point_inside_the_confining_environment(confining_environment, object_radius): + #pick a point uniformly within the confining environment using the rejection method + + if confining_environment[0] == 'cube': + dimension_x = confining_environment[1] * 0.5 + dimension_y = confining_environment[1] * 0.5 + dimension_z = confining_environment[1] * 0.5 + if len(confining_environment) > 2: + print("# WARNING: Defined a cubical confining environment with reduntant paramenters.") + print("# Only 2 are needed the identifier and the side") + + if confining_environment[0] == 'sphere': + dimension_x = confining_environment[1] + dimension_y = confining_environment[1] + dimension_z = confining_environment[1] + if len(confining_environment) > 2: + print("# WARNING: Defined a spherical confining environment with reduntant paramenters.") + print("# Only 2 are needed the identifier and the radius") + + if confining_environment[0] == 'ellipsoid': + if len(confining_environment) < 4: + print("# ERROR: Defined an ellipsoidal confining environment without the necessary paramenters.") + print("# 4 are needed the identifier, the x-semiaxes, the 
y-semiaxes, and the z-semiaxes") + sys.exit() + dimension_x = confining_environment[1] + dimension_y = confining_environment[2] + dimension_z = confining_environment[3] + + if confining_environment[0] == 'cylinder': + if len(confining_environment) < 3: + print("# WARNING: Defined a cylindrical confining environment without the necessary paramenters.") + print("# 3 are needed the identifier, the basal radius, and the height") + sys.exit() + dimension_x = confining_environment[1] + dimension_y = confining_environment[1] + dimension_z = confining_environment[2] + + inside = 0 + while inside == 0: + particle = [] + particle.append((2.0*random()-1.0)*(dimension_x - object_radius)) + particle.append((2.0*random()-1.0)*(dimension_y - object_radius)) + particle.append((2.0*random()-1.0)*(dimension_z - object_radius)) + # Check if the particle is inside the confining_environment + inside = check_point_inside_the_confining_environment(particle[0], + particle[1], + particle[2], + object_radius, + confining_environment) + + return particle + +########## + +def check_point_inside_the_confining_environment(Px, Py, Pz, + object_radius, + confining_environment): + # The shapes are all centered in the origin + # - sphere : radius r + # - cube : side + # - cylinder : basal radius , height + # - ellipsoid : semi-axes a , b , c ; + + if confining_environment[0] == 'sphere': + radius = confining_environment[1] - object_radius + if ((Px*Px)/(radius*radius) + (Py*Py)/(radius*radius) + (Pz*Pz)/(radius*radius)) < 1.0 : return 1 + + if confining_environment[0] == 'ellipsoid': + a = confining_environment[1] - object_radius + b = confining_environment[2] - object_radius + c = confining_environment[3] - object_radius + if ((Px*Px)/(a*a) + (Py*Py)/(b*b) + (Pz*Pz)/(c*c)) < 1.0 : return 1 + + if confining_environment[0] == 'cube': + hside = confining_environment[1] * 0.5 - object_radius + if (((Px*Px)/(hside*hside)) < 1.0) and (((Py*Py)/(hside*hside)) < 1.0) and (((Pz*Pz)/(hside*hside)) < 1.0) : return 1 + + if confining_environment[0] == 'cylinder': + radius = confining_environment[1] - object_radius + half_height = confining_environment[2]*0.5 - object_radius + if (((Px*Px)/(radius*radius) + (Py*Py)/(radius*radius)) < 1.0) and (((Pz*Pz)/(half_height*half_height)) < 1.0): return 1 + + return 0 + +########## + +def check_segments_clashes(s1_P1, s1_P0, s2_P1, s2_P0, rosette_radius): + + # Check steric clashes without periodic boundary conditions + if distance_between_segments(s1_P1, s1_P0, s2_P1, s2_P0) < 2.0*rosette_radius: + # print "Clash between segments",s1_P1,s1_P0,"and",s2_P1_tmp,s2_P0_tmp,"at distance", distance + return 0 + + return 1 + +########## + +def check_segments_clashes_with_pbc(s1_P1, s1_P0, s2_P1, s2_P0, + rosette_radius, + confining_environment): + + # Check steric clashes with periodic boundary conditions + if distance_between_segments(s1_P1, s1_P0, s2_P1, s2_P0) < 2.0*rosette_radius: + # print "Clash between segments",s1_P1,s1_P0,"and",s2_P1_tmp,s2_P0_tmp,"at distance", distance + return 0 + + return 1 + +########## + +def distance_between_segments(s1_P1, s1_P0, s2_P1, s2_P0): + + # Inspiration: http://softsurfer.com/Archive/algorithm_0106/algorithm_0106.htm + # Copyright 2001, softSurfer (www.softsurfer.com) + # This code may be freely used and modified for any purpose + # providing that this copyright notice is included with it. + # SoftSurfer makes no warranty for this code, and cannot be held + # liable for any real or imagined damage resulting from its use. 
+ # Users of this code must verify correctness for their application. + + u = [] + v = [] + w = [] + dP = [] + + for c_s1_P1,c_s1_P0,c_s2_P1,c_s2_P0 in zip(s1_P1, s1_P0, s2_P1, s2_P0): + u.append(c_s1_P1 - c_s1_P0) + v.append(c_s2_P1 - c_s2_P0) + w.append(c_s1_P0 - c_s2_P0) + + a = scalar_product(u, u) + b = scalar_product(u, v) + c = scalar_product(v, v) + d = scalar_product(u, w) + e = scalar_product(v, w) + + D = a*c - b*b + sD = tD = D + + if D < (1.0e-7): + # Segments almost parallel + sN = 0.0 + sD = 1.0 + tN = e + tD = c + else: + # Get the closest points on the infinite lines + sN = (b*e - c*d) + tN = (a*e - b*d) + if (sN < 0.0): + # sc < 0 => the s=0 edge is visible + sN = 0.0 + tN = e + tD = c + elif sN > sD: # sc > 1 => the s=1 edge is visible + sN = sD + tN = e + b + tD = c + + if tN < 0.0: # tc < 0 => the t=0 edge is visible + tN = 0.0 + # Recompute sc for this edge + if -d < 0.0: + sN = 0.0 + elif -d > a: + sN = sD + else: + sN = -d + sD = a + + elif tN > tD: # tc > 1 => the t=1 edge is visible + tN = tD + # Recompute sc for this edge + if (-d + b) < 0.0: + sN = 0 + elif (-d + b) > a: + sN = sD; + else: + sN = (-d + b) + sD = a + + # Finally do the division to get sc and tc + if abs(sN) < (1.0e-7): + sc = 0.0 + else: + sc = sN / sD + + if abs(tN) < (1.0e-7): + tc = 0.0 + else: + tc = tN / tD + + # Get the difference of the two closest points + for i in range(len(w)): + dP.append(w[i] + ( sc * u[i] ) - ( tc * v[i] )) # = S1(sc) - S2(tc) + + return norm(dP) # return the closest distance + +########## + +def rosettes_rototranslation(rosettes, segments_P1, segments_P0): + + for i in range(len(segments_P1)): + vector = [] + theta = [] + + for component_P1 , component_P0 in zip(segments_P1[i], segments_P0[i]): + vector.append(component_P1-component_P0) + + # Rotation Angles + theta.append(atan2(vector[1],vector[2])) + + x_temp_2 = vector[0] + y_temp_2 = cos(theta[0]) * vector[1] - sin(theta[0]) * vector[2] + z_temp_2 = sin(theta[0]) * vector[1] + cos(theta[0]) * vector[2] + theta.append(atan2(x_temp_2,z_temp_2)) + + x_temp_1 = cos(theta[1]) * x_temp_2 - sin(theta[1]) * z_temp_2 + y_temp_1 = y_temp_2 + z_temp_1 = sin(theta[1]) * x_temp_2 + cos(theta[1]) * z_temp_2 + + if(z_temp_1 < 0.0): + z_temp_1 = -z_temp_1 + theta.append(pi) + else: + theta.append(0.0) + #print x_temp_1 , y_temp_1 , z_temp_1 + + # Chromosome roto-translations + for particle in range(len(rosettes[i]['x'])): + + x_temp_2 = rosettes[i]['x'][particle] + y_temp_2 = cos(theta[2]) * rosettes[i]['y'][particle] + sin(theta[2]) * rosettes[i]['z'][particle] + z_temp_2 = - sin(theta[2]) * rosettes[i]['y'][particle] + cos(theta[2]) * rosettes[i]['z'][particle] + + x_temp_1 = cos(theta[1]) * x_temp_2 + sin(theta[1]) * z_temp_2 + y_temp_1 = y_temp_2 + z_temp_1 = - sin(theta[1]) * x_temp_2 + cos(theta[1]) * z_temp_2 + + x = x_temp_1; + y = cos(theta[0]) * y_temp_1 + sin(theta[0]) * z_temp_1; + z = - sin(theta[0]) * y_temp_1 + cos(theta[0]) * z_temp_1; + + # Chromosome translations + rosettes[i]['x'][particle] = segments_P0[i][0] + x; + rosettes[i]['y'][particle] = segments_P0[i][1] + y; + rosettes[i]['z'][particle] = segments_P0[i][2] + z; + return rosettes + +########## + +def scalar_product(a, b): + + scalar = 0.0 + for c_a,c_b in zip(a,b): + scalar = scalar + c_a*c_b + + return scalar + +########## + +def norm(a): + + return sqrt(scalar_product(a, a)) + +########## + +def write_initial_conformation_file(chromosomes, + chromosome_particle_numbers, + confining_environment, + out_file="Initial_conformation.dat", + atom_types=1, 
+ angle_types=1, + bond_types=1): + # Choosing the appropriate xlo, xhi...etc...depending on the confining environment + xlim = [] + ylim = [] + zlim = [] + if confining_environment[0] == 'sphere': + radius = confining_environment[1] + 1.0 + xlim.append(-radius) + xlim.append(radius) + ylim.append(-radius) + ylim.append(radius) + zlim.append(-radius) + zlim.append(radius) + + if confining_environment[0] == 'ellipsoid': + a = confining_environment[1] + 1.0 + b = confining_environment[2] + 1.0 + c = confining_environment[3] + 1.0 + xlim.append(-a) + xlim.append(a) + ylim.append(-b) + ylim.append(b) + zlim.append(-c) + zlim.append(c) + + if confining_environment[0] == 'cube': + hside = confining_environment[1] * 0.5 + xlim.append(-hside) + xlim.append(hside) + ylim.append(-hside) + ylim.append(hside) + zlim.append(-hside) + zlim.append(hside) + + if confining_environment[0] == 'cylinder': + radius = confining_environment[1] + 1.0 + hheight = confining_environment[2] * 0.5 + 1.0 + xlim.append(-radius) + xlim.append(radius) + ylim.append(-radius) + ylim.append(radius) + zlim.append(-hheight) + zlim.append(hheight) + + fileout = open(out_file,'w') + n_chr=len(chromosomes) + n_atoms=0 + for n in chromosome_particle_numbers: + n_atoms+=n + + fileout.write("LAMMPS input data file \n\n") + fileout.write("%9d atoms\n" % (n_atoms)) + fileout.write("%9d bonds\n" % (n_atoms-n_chr)) + fileout.write("%9d angles\n\n" % (n_atoms-2*n_chr)) + fileout.write("%9s atom types\n" % atom_types) + fileout.write("%9s bond types\n" % bond_types) + fileout.write("%9s angle types\n\n" % angle_types) + fileout.write("%6.3lf %6.3lf xlo xhi\n" % (xlim[0], xlim[1])) + fileout.write("%6.3lf %6.3lf ylo yhi\n" % (ylim[0], ylim[1])) + fileout.write("%6.3lf %6.3lf zlo zhi\n" % (zlim[0], zlim[1])) + + fileout.write("\n Atoms \n\n") + particle_number = 1 + for chromosome in chromosomes: + for x,y,z in zip(chromosome['x'],chromosome['y'],chromosome['z']): + fileout.write("%-8d %s %s %7.4lf %7.4lf %7.4lf\n" % (particle_number, "1", "1", x, y, z)) + particle_number += 1 + + # for(i = 0; i < N_NUCL; i++) + # { + # k++; + # fileout.write("%5d %s %s %7.4lf %7.4lf %7.4lf \n", k, "1", "1", P[i][0], P[i][1], P[i][2]); + # } + + fileout.write("\n Bonds \n\n") + bond_number = 1 + first_particle_index = 1 + for chromosome in chromosomes: + for i in range(len(chromosome['x'])-1): + fileout.write("%-4d %s %4d %4d\n" % (bond_number, "1", first_particle_index, first_particle_index+1)) + bond_number += 1 + first_particle_index += 1 + first_particle_index += 1 # I have to go to the end of the chromosome! + + fileout.write("\n Angles \n\n") + angle_number = 1 + first_particle_index = 1 + for chromosome in chromosomes: + for i in range(len(chromosome['x'])-2): + fileout.write("%-4d %s %5d %5d %5d\n" % (angle_number, "1", first_particle_index, first_particle_index+1, first_particle_index+2)) + angle_number += 1 + first_particle_index += 1 + first_particle_index += 2 # I have to go to the end of the chromosome! 
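+    # The file now holds the standard LAMMPS data sections written above
+    # (header with counts and box limits, Atoms, Bonds, Angles). A hypothetical
+    # two-bead, one-chain excerpt of the Atoms section would look like:
+    #   1        1 1  0.0000  0.0000  0.0000
+    #   2        1 1  1.0000  0.0000  0.0000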
+ + fileout.close() + +########## + +def distance(x0,y0,z0,x1,y1,z1): + return sqrt((x0-x1)*(x0-x1)+(y0-y1)*(y0-y1)+(z0-z1)*(z0-z1)) + +########## + +def check_particles_overlap(x0,y0,z0,x1,y1,z1,overlap_radius): + if distance(x0,y0,z0,x1,y1,z1) < overlap_radius: + #print "Particle %f %f %f and particle %f %f %f are overlapping\n" % (x0,y0,z0,x1,y1,z1) + return 0 + return 1 + +########## + +def store_conformation_with_pbc(xc, result, confining_environment): + # Reconstruct the different molecules and store them separatelly + ix , iy , iz = (0, 0, 0) + ix_tmp, iy_tmp, iz_tmp = (0, 0, 0) + x_tmp , y_tmp , z_tmp = (0, 0, 0) + + molecule_number = 0 # We start to count from molecule number 0 + + particles = [] + particles.append({}) + particles[molecule_number]['x'] = [] + particles[molecule_number]['y'] = [] + particles[molecule_number]['z'] = [] + + particle_counts = [] + particle_counts.append({}) # Initializing the particle counts for the first molecule + + max_bond_length = (1.5*1.5) # This is the default polymer-based bond length + + for i in range(0,len(xc),3): + particle = int(i/3) + + x = xc[i] + ix * confining_environment[1] + y = xc[i+1] + iy * confining_environment[1] + z = xc[i+2] + iz * confining_environment[1] + + # A - Check whether the molecule is broken because of pbc + # or if we are changing molecule + if particle > 0: + + # Compute the bond_length + bond_length = (particles[molecule_number]['x'][-1] - x)* \ + (particles[molecule_number]['x'][-1] - x)+ \ + (particles[molecule_number]['y'][-1] - y)* \ + (particles[molecule_number]['y'][-1] - y)+ \ + (particles[molecule_number]['z'][-1] - z)* \ + (particles[molecule_number]['z'][-1] - z) + + # Check whether the bond is too long. This could mean: + # 1 - Same molecule disjoint by pbc + # 2 - Different molecules + if bond_length > max_bond_length: + min_bond_length = bond_length + x_tmp , y_tmp , z_tmp = (x, y, z) + + # Check if we are in case 1: the same molecule continues + # in a nearby box + indeces_sets = product([-1, 0, 1], + [-1, 0, 1], + [-1, 0, 1]) + + for (l, m, n) in indeces_sets: + # Avoid to check again the same periodic copy + if (l, m, n) == (0, 0, 0): + continue + + # Propose a new particle position + x = xc[i] + (ix + l) * confining_environment[1] + y = xc[i+1] + (iy + m) * confining_environment[1] + z = xc[i+2] + (iz + n) * confining_environment[1] + + # Check the new bond length + bond_length = (particles[molecule_number]['x'][-1] - x)* \ + (particles[molecule_number]['x'][-1] - x)+ \ + (particles[molecule_number]['y'][-1] - y)* \ + (particles[molecule_number]['y'][-1] - y)+ \ + (particles[molecule_number]['z'][-1] - z)* \ + (particles[molecule_number]['z'][-1] - z) + + # Store the periodic copy with the minimum bond length + if bond_length < min_bond_length: + #print bond_length + x_tmp , y_tmp , z_tmp = (x , y , z ) + ix_tmp, iy_tmp, iz_tmp = (ix+l, iy+m, iz+n) + min_bond_length = bond_length + + # If the minimum bond length is yet too large + # we are dealing with case 2 + if min_bond_length > 10.: + # Start another molecule + molecule_number += 1 + + particles.append({}) + particles[molecule_number]['x'] = [] + particles[molecule_number]['y'] = [] + particles[molecule_number]['z'] = [] + + + particle_counts.append({}) # Initializing the particle counts for the new molecule + + # If the minimum bond length is sufficiently short + # we are dealing with case 2 + ix, iy, iz = (ix_tmp, iy_tmp, iz_tmp) + x , y , z = (x_tmp , y_tmp , z_tmp) + + # To fullfill point B (see below), we have to count how many + # 
particle we have of each molecule for each triplet + # (ix, iy, iz) + try: + particle_counts[molecule_number][(ix, iy, iz)] += 1.0 + except: + particle_counts[molecule_number][(ix, iy, iz)] = 0.0 + particle_counts[molecule_number][(ix, iy, iz)] += 1.0 + + particles[molecule_number]['x'].append(x) + particles[molecule_number]['y'].append(y) + particles[molecule_number]['z'].append(z) + + # B - Store in the final arrays each molecule in the periodic copy + # with more particle in the primary cell (0, 0, 0) + for molecule in range(molecule_number+1): + max_number = 0 + # Get the periodic box with more particles + for (l, m, n) in particle_counts[molecule]: + if particle_counts[molecule][(l, m, n)] > max_number: + ix, iy, iz = (l, m, n) + max_number = particle_counts[molecule][(l, m, n)] + + # Translate the molecule to include the largest portion of particles + # in the (0, 0, 0) image + for (x, y, z) in zip(particles[molecule]['x'],particles[molecule]['y'],particles[molecule]['z']): + x = x - ix * confining_environment[1] + y = y - iy * confining_environment[1] + z = z - iz * confining_environment[1] + + result['x'].append(x) + result['y'].append(y) + result['z'].append(z) + + +##### Loop extrusion dynamics functions ##### +def read_target_loops_input(input_filename, chromosome_length, percentage): + # Open input file + fp_input = open(input_filename, "r") + + loops = [] + target_loops = [] + # Get each loop per line and fill the output list of loops + for line in fp_input.readlines(): + + if line.startswith('#'): + continue + + splitted = line.strip().split() + loop = [] + loop.append(int(splitted[1])) + loop.append(int(splitted[2])) + + loops.append(loop) + + #ntarget_loops = int(len(loops)*percentage/100.) + ntarget_loops = int(len(loops)) + shuffle(loops) + target_loops = loops[0:ntarget_loops] + + return target_loops + +########## + +#def draw_loop_extrusion_starting_points(target_loops, chromosome_length): +# initial_loops = [] + # Scroll the target loops and draw a point between each start and stop +# for target_loop in target_loops: + +# random_particle = randint(target_loop[0], target_loop[1]) + +# loop = [] +# loop.append(random_particle) +# loop.append(random_particle+1) + +# initial_loops.append(loop) + +# return initial_loops + +def draw_loop_extrusion_starting_point(chromosome_length): + + # draw a starting point for extrusion along the chromosome + random_particle = randint(1, chromosome_length-1) + + return [random_particle,random_particle+1] + + + +########## + +def get_maximum_number_of_extruded_particles(target_loops, initial_loops): + # The maximum is the maximum sequence distance between a target start/stop particle of a loop + # and the initial random start/stop of a loop + maximum = 0 + + for target_loop,initial_loop in zip(target_loops,initial_loops): + #print initial_loop,target_loop + + l = abs(target_loop[0]-initial_loop[0])+1 + if l > maximum: + maximum = l + + l = abs(target_loop[1]-initial_loop[1])+1 + if l > maximum: + maximum = l + + return maximum + +########## + +def compute_particles_distance(xc): + + particles = [] + distances = {} + + # Getting the coordiantes of the particles + for i in range(0,len(xc),3): + x = xc[i] + y = xc[i+1] + z = xc[i+2] + particles.append((x, y, z)) + + # Checking whether the restraints are satisfied + for pair in combinations(range(len(particles)), 2): + dist = distance(particles[pair[0]][0], + particles[pair[0]][1], + particles[pair[0]][2], + particles[pair[1]][0], + particles[pair[1]][1], + particles[pair[1]][2]) + 
distances[pair] = dist + + return distances + +########## + +def compute_the_percentage_of_satysfied_restraints(input_file_name, + restraints, + output_file_name, + time_point, + timesteps_per_k_change): + + ### Change this function to use a posteriori the out.colvars.traj file similar to the obj funct calculation ### + infile = open(input_file_name , "r") + outfile = open(output_file_name, "w") + if os.path.getsize(output_file_name) == 0: + outfile.write("#%s %s %s %s\n" % ("timestep","satisfied", "satisfiedharm", "satisfiedharmLowBound")) + + #restraints[pair] = [time_dependent_restraints[time_point+1][pair][0], # Restraint type -> Is the one at time point time_point+1 + #time_dependent_restraints[time_point][pair][1]*10., # Initial spring constant + #time_dependent_restraints[time_point+1][pair][1]*10., # Final spring constant + #time_dependent_restraints[time_point][pair][2], # Initial equilibrium distance + #time_dependent_restraints[time_point+1][pair][2], # Final equilibrium distance + #int(time_dependent_steering_pairs['timesteps_per_k_change']*0.5)] # Number of timesteps for the gradual change + + # Write statistics on the restraints + nharm = 0 + nharmLowBound = 0 + ntot = 0 + for pair in restraints: + for i in range(len(restraints[pair][0])): + if restraints[pair][0][i] == "Harmonic": + nharm += 1 + ntot += 1 + if restraints[pair][0][i] == "HarmonicLowerBound": + nharmLowBound += 1 + ntot += 1 + outfile.write("#NumOfRestraints = %s , Harmonic = %s , HarmonicLowerBound = %s\n" % (ntot, nharm, nharmLowBound)) + + # Memorizing the restraint + restraints_parameters = {} + for pair in restraints: + for i in range(len(restraints[pair][0])): + #E_hlb_pot_p_106_189 + if restraints[pair][0][i] == "Harmonic": + name = "E_h_pot_%d_%d_%d" % (i, int(pair[0])+1, int(pair[1])+1) + if restraints[pair][0][i] == "HarmonicLowerBound": + name ="E_hlb_pot_%d_%d_%d" % (i, int(pair[0])+1, int(pair[1])+1) + restraints_parameters[name] = [restraints[pair][0][i], + restraints[pair][1][i], + restraints[pair][2][i], + restraints[pair][3][i], + restraints[pair][4][i], + restraints[pair][5][i]] + #print restraints_parameters + + # Checking whether the restraints are satisfied + columns_to_consider = {} + for line in infile.readlines(): + nsatisfied = 0. + nsatisfiedharm = 0. + nsatisfiedharmLowBound = 0. + ntot = 0. + ntotharm = 0. + ntotharmLowBound = 0. 
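+        # Satisfaction criterion applied below: at output step t the centre d0(t)
+        # and the force constant k(t) are linearly interpolated between their
+        # initial and final values; a Harmonic restraint is counted as satisfied
+        # when d0 - 2/sqrt(k) <= dist <= d0 + 2/sqrt(k) (e.g. k = 4.0 gives a
+        # window of d0 +/- 1.0), a HarmonicLowerBound one when dist >= d0.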
+ + line = line.strip().split() + + # Checking which columns contain the pairwise distance + if line[0][0] == "#": + for column in range(2,len(line)): + # Get the columns with the distances + if "_pot_" not in line[column]: + columns_to_consider[column-1] = line[column] + #print columns_to_consider + else: + for column in range(1,len(line)): + if column in columns_to_consider: + if column >= len(line): + continue + dist = float(line[column]) + + # Get which restraints are between the 2 particles + for name in ["E_h_pot_"+columns_to_consider[column], "E_hlb_pot_"+columns_to_consider[column]]: + if name not in restraints_parameters: + #print "Restraint %s not present" % name + continue + else: + pass + #print name, restraints_parameters[name] + + restrainttype = restraints_parameters[name][0] + restraintd0 = float(restraints_parameters[name][3]) + float(line[0])/float(restraints_parameters[name][5])*(float(restraints_parameters[name][4]) - float(restraints_parameters[name][3])) + restraintk = float(restraints_parameters[name][1]) + float(line[0])/float(restraints_parameters[name][5])*(float(restraints_parameters[name][2]) - float(restraints_parameters[name][1])) + sqrt_k = sqrt(restraintk) + limit1 = restraintd0 - 2./sqrt_k + limit2 = restraintd0 + 2./sqrt_k + + if restrainttype == "Harmonic": + if dist >= limit1 and dist <= limit2: + nsatisfied += 1.0 + nsatisfiedharm += 1.0 + #print "#ESTABLISHED",time_point,name,restraints_parameters[name],limit1,dist,limit2 + else: + pass + #print "#NOESTABLISHED",time_point,name,restraints_parameters[name],limit1,dist,limit2 + ntotharm += 1.0 + if restrainttype == "HarmonicLowerBound": + if dist >= restraintd0: + nsatisfied += 1.0 + nsatisfiedharmLowBound += 1.0 + #print "#ESTABLISHED",time_point,name,restraints_parameters[name],dist,restraintd0 + else: + pass + #print "#NOESTABLISHED",time_point,name,restraints_parameters[name],dist,restraintd0 + ntotharmLowBound += 1.0 + ntot += 1.0 + #print int(line[0])+(time_point)*timesteps_per_k_change, nsatisfied, ntot, nsatisfiedharm, ntotharm, nsatisfiedharmLowBound, ntotharmLowBound + if ntotharm == 0.: + ntotharm = 1.0 + if ntotharmLowBound == 0.: + ntotharmLowBound = 1.0 + + + outfile.write("%d %lf %lf %lf\n" % (int(line[0])+(time_point)*timesteps_per_k_change, nsatisfied/ntot*100., nsatisfiedharm/ntotharm*100., nsatisfiedharmLowBound/ntotharmLowBound*100.)) + infile.close() + outfile.close() + +########## + +def read_objective_function(fname): + + obj_func=[] + fhandler = open(fname) + line = next(fhandler) + try: + while True: + if line.startswith('#'): + line = next(fhandler) + continue + line = line.strip() + if len(line) == 0: + continue + line_vals = line.split() + obj_func.append(float(line_vals[1])) + line = next(fhandler) + except StopIteration: + pass + fhandler.close() + + return obj_func +########## + +def compute_the_objective_function(input_file_name, + output_file_name, + time_point, + timesteps_per_k_change): + + + infile = open(input_file_name , "r") + outfile = open(output_file_name, "w") + if os.path.getsize(output_file_name) == 0: + outfile.write("#Timestep obj_funct\n") + + columns_to_consider = [] + + # Checking which columns contain the energies to sum + for line in infile.readlines(): + line = line.strip().split() + + # Checking which columns contain the energies to sum + if line[0][0] == "#": + for column in range(len(line)): + if "_pot_" in line[column]: + columns_to_consider.append(column-1) + else: + obj_funct = 0.0 + for column in columns_to_consider: + if column < len(line): + 
obj_funct += float(line[column]) + outfile.write("%d %s\n" % (int(line[0])+timesteps_per_k_change*(time_point), obj_funct)) + + infile.close() + outfile.close() + + +### get unique list ### + +def get_list(input_list): + + output_list = [] + + for element in input_list: + #print(type(element)) + if isinstance(element, (int)): + output_list.append(element) + if isinstance(element, (list)): + for subelement in element: + output_list.append(subelement) + if isinstance(element, (tuple)): + for subelement in range(element[0],element[1]+1,element[2]): + output_list.append(subelement) + return output_list + +########## +#MPI.Finalize() diff --git a/taddyn/modelling/lammps_modelling.py b/tadphys/modelling/lammps_modelling.py~ similarity index 91% rename from taddyn/modelling/lammps_modelling.py rename to tadphys/modelling/lammps_modelling.py~ index 7259c59..3cf7459 100644 --- a/taddyn/modelling/lammps_modelling.py +++ b/tadphys/modelling/lammps_modelling.py~ @@ -5,7 +5,7 @@ """ from string import ascii_uppercase as uc, ascii_lowercase as lc from os.path import exists -from random import randint, seed, random, sample, shuffle +from random import uniform, randint, seed, random, sample, shuffle from pickle import load, dump from multiprocessing.dummy import Pool as ThreadPool from functools import partial @@ -21,11 +21,12 @@ from numpy import sin, cos, arccos, sqrt, fabs, pi import numpy as np +from mpi4py import MPI from lammps import lammps -from taddyn.modelling import LAMMPS_CONFIG as CONFIG -from taddyn.modelling.lammpsmodel import LAMMPSmodel -from taddyn.modelling.restraints import HiCBasedRestraints +from tadphys.modelling import LAMMPS_CONFIG as CONFIG +from tadphys.modelling.lammpsmodel import LAMMPSmodel +from tadphys.modelling.restraints import HiCBasedRestraints class InitalConformationError(Exception): """ @@ -102,7 +103,7 @@ def generate_lammps_models(zscores, resolution, nloci, start=1, n_models=5000, :: - from taddyn.modelling.HIC_CONFIG import HIC_CONFIG + from tadphys.modelling.HIC_CONFIG import HIC_CONFIG where CONFIG is a dictionary of dictionaries to be passed to this function: @@ -164,11 +165,11 @@ def generate_lammps_models(zscores, resolution, nloci, start=1, n_models=5000, restart_file != False :param False useColvars: True if you want the restrains to be loaded by colvars - :returns: a TADdyn models dictionary + :returns: a Tadphys models dictionary """ if not tmp_folder: - tmp_folder = '/tmp/taddyn_tmp_%s/' % ( + tmp_folder = '/tmp/tadphys_tmp_%s/' % ( ''.join([(uc + lc)[int(random() * 52)] for _ in range(4)])) else: if tmp_folder[-1] != '/': @@ -198,7 +199,8 @@ def generate_lammps_models(zscores, resolution, nloci, start=1, n_models=5000, if first is None: first = min([int(j) for i in zscores[0] for j in zscores[0][i]] + [int(i) for i in zscores[0]]) - LOCI = list(range(first, nloci + first)) + LOCI = list(range(first, nloci + first)) + LOCI = 20000 # random inital number global START @@ -269,19 +271,25 @@ def generate_lammps_models(zscores, resolution, nloci, start=1, n_models=5000, single_m['z'][i] -= cm0['z'] ini_sm_model = [[single_sm.copy()] for single_sm in sm] - models, ini_model = lammps_simulate(lammps_folder=tmp_folder, run_time=run_time, - initial_conformation=ini_sm_model, - connectivity=connectivity, - steering_pairs=steering_pairs, - time_dependent_steering_pairs=time_dependent_steering_pairs, - initial_seed=initial_seed, - n_models=n_keep, n_keep=n_keep, n_cpus=n_cpus, - keep_restart_out_dir=keep_restart_out_dir, - confining_environment=container, 
timeout_job=timeout_job, - cleanup=cleanup, to_dump=int(timesteps_per_k/100.), - hide_log=hide_log, restart_path=restart_path, - store_n_steps=store_n_steps, - useColvars=useColvars) + models, ini_model = lammps_simulate(lammps_folder=tmp_folder, + run_time=run_time, + initial_conformation=ini_sm_model, + connectivity=connectivity, + steering_pairs=steering_pairs, + time_dependent_steering_pairs=time_dependent_steering_pairs, + initial_seed=initial_seed, + n_models=n_keep, + n_keep=n_keep, + n_cpus=n_cpus, + keep_restart_out_dir=keep_restart_out_dir, + confining_environment=container, + timeout_job=timeout_job, + cleanup=cleanup, to_dump=int(timesteps_per_k/100.), + hide_log=hide_log, + chromosome_particle_numbers=chromosome_particle_numbers, + restart_path=restart_path, + store_n_steps=store_n_steps, + useColvars=useColvars) # for i, m in enumerate(models.values()): # m['index'] = i if outfile: @@ -347,6 +355,7 @@ def generate_lammps_models(zscores, resolution, nloci, start=1, n_models=5000, def init_lammps_run(lmp, initial_conformation, neighbor=CONFIG.neighbor, hide_log=True, + chromosome_particle_numbers=None, connectivity="FENE", restart_file=False): @@ -476,15 +485,18 @@ def lammps_simulate(lammps_folder, run_time, connectivity="FENE", initial_seed=0, n_models=500, n_keep=100, neighbor=CONFIG.neighbor, tethering=True, - minimize=True, compress_with_pbc=False, + minimize=True, compress_with_pbc=False, compress_without_pbc=False, + initial_relaxation=None, keep_restart_out_dir=None, outfile=None, n_cpus=1, confining_environment=['cube',100.], steering_pairs=None, time_dependent_steering_pairs=None, - loop_extrusion_dynamics=None, cleanup = True, + compartmentalization=None, + loop_extrusion_dynamics=None, cleanup = False, to_dump=100000, pbc=False, timeout_job=3600, hide_log=True, + chromosome_particle_numbers=None, restart_path=False, store_n_steps=10, useColvars=False): @@ -540,7 +552,7 @@ def lammps_simulate(lammps_folder, run_time, restart_file != False :param False useColvars: True if you want the restrains to be loaded by colvars - :returns: a TADdyn models dictionary + :returns: a Tadphys models dictionary """ @@ -557,7 +569,7 @@ def lammps_simulate(lammps_folder, run_time, timepoints = (len(time_dependent_steering_pairs['colvar_input'])-1) * \ time_dependent_steering_pairs['colvar_dump_freq'] - chromosome_particle_numbers = [int(x) for x in [len(LOCI)]] + #chromosome_particle_numbers = [int(x) for x in [len(LOCI)]] chromosome_particle_numbers.sort(key=int,reverse=True) kseeds = [] @@ -662,21 +674,24 @@ def collect_result(result): failedSeedLog=[failedSeedLog, k]) pool.apply_async(jobs[k], args=(k, k_folder, run_time, - ini_conf, connectivity, - neighbor, - tethering, minimize, - compress_with_pbc, compress_without_pbc, - confining_environment, - steering_pairs, - time_dependent_steering_pairs, - loop_extrusion_dynamics, - to_dump, pbc, hide_log, - keep_restart_out_dir2, - restart_file, - model_path, - store_n_steps, - useColvars,), callback=collect_result) - # , timeout=timeout_job) + ini_conf, connectivity, + neighbor, + tethering, minimize, + compress_with_pbc, compress_without_pbc, + initial_relaxation, + confining_environment, + steering_pairs, + time_dependent_steering_pairs, + compartmentalization, + loop_extrusion_dynamics, + to_dump, pbc, hide_log, + chromosome_particle_numbers, + keep_restart_out_dir2, + restart_file, + model_path, + store_n_steps, + useColvars,), callback=collect_result) + # , timeout=timeout_job) pool.close() pool.join() @@ -731,12 +746,15 @@ def 
run_lammps(kseed, lammps_folder, run_time, neighbor=CONFIG.neighbor, tethering=False, minimize=True, compress_with_pbc=None, compress_without_pbc=None, + initial_relaxation=None, confining_environment=None, steering_pairs=None, time_dependent_steering_pairs=None, + compartmentalization=None, loop_extrusion_dynamics=None, to_dump=10000, pbc=False, hide_log=True, + chromosome_particle_numbers=None, keep_restart_out_dir2=None, restart_file=False, model_path=False, @@ -812,8 +830,11 @@ def run_lammps(kseed, lammps_folder, run_time, :returns: a LAMMPSModel object """ + - lmp = lammps(cmdargs=['-screen','none','-log',lammps_folder+'log.lammps','-nocite']) + lmp = lammps(cmdargs=['-package','omp','4','-suffix','omp','-screen','none','-log',lammps_folder+'log.lammps','-nocite']) + me = MPI.COMM_WORLD.Get_rank() + nprocs = MPI.COMM_WORLD.Get_size() # check if we have a restart file or a path to which restart if restart_file == False: doRestart = False @@ -840,9 +861,10 @@ def run_lammps(kseed, lammps_folder, run_time, restart_file=restart_file) else: init_lammps_run(lmp, initial_conformation, - neighbor=neighbor, - hide_log=hide_log, - connectivity=connectivity) + neighbor=neighbor, + hide_log=hide_log, + chromosome_particle_numbers=chromosome_particle_numbers, + connectivity=connectivity) lmp.command("dump 1 all custom %i %slangevin_dynamics_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") @@ -873,6 +895,7 @@ def run_lammps(kseed, lammps_folder, run_time, # Define the langevin dynamics integrator lmp.command("fix 1 all nve") lmp.command("fix 2 all langevin 1.0 1.0 2.0 %i" % kseed) + # Define the tethering to the center of the confining environment if tethering: lmp.command("fix 3 all spring tether 50.0 0.0 0.0 0.0 0.0") @@ -972,6 +995,14 @@ def run_lammps(kseed, lammps_folder, run_time, print("# New particle density (nparticles/volume)", lmp.get_natoms()/volume) print("") + if initial_relaxation: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %sinitial_relaxation_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + lmp.command("reset_timestep 0") + lmp.command("run %i" % initial_relaxation) + lmp.command("write_data relaxed_conformation.txt nocoeff") + timepoints = 1 xc = [] # Setup the pairs to co-localize using the COLVARS plug-in @@ -1068,7 +1099,7 @@ def run_lammps(kseed, lammps_folder, run_time, sys.stdout.flush() time_dependent_steering_pairs['colvar_output'] = lammps_folder+os.path.basename(time_dependent_steering_pairs['colvar_output']) - # Performing the adaptation step from initial conformation to TADdyn excluded volume + # Performing the adaptation step from initial conformation to Tadphys excluded volume if time_dependent_steering_pairs['adaptation_step']: restraints = {} for time_point in time_points[0:1]: @@ -1077,7 +1108,7 @@ def run_lammps(kseed, lammps_folder, run_time, # This Adaptation could be discarded in the trajectory files. 
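+            # Sketch of the steering schedule (assuming the parameter layout used
+            # by compute_the_percentage_of_satysfied_restraints, i.e. each entry of
+            # restraints_parameters is [type, k_start, k_end, d0_start, d0_end, timesteps]):
+            # both the force constant and the equilibrium distance are ramped
+            # linearly along the run. With hypothetical values k: 0.0 -> 5.0 and
+            # d0: 2.0 -> 1.0 over 1000 timesteps, at timestep 500
+            #   k  = 0.0 + 500./1000.*(5.0 - 0.0) = 2.5
+            #   d0 = 2.0 + 500./1000.*(1.0 - 2.0) = 1.5
+            # and a Harmonic restraint is later counted as satisfied within
+            #   d0 +/- 2./sqrt(k) = 1.5 +/- 1.26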
if to_dump: lmp.command("undump 1") - lmp.command("dump 1 all custom %i %sadapting_MD_from_initial_conformation_to_TADdyn_at_time_point_%s.XYZ id xu yu zu" % (to_dump, lammps_folder, time_point)) + lmp.command("dump 1 all custom %i %sadapting_MD_from_initial_conformation_to_Tadphys_at_time_point_%s.XYZ id xu yu zu" % (to_dump, lammps_folder, time_point)) lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes") restraints[time_point] = {} @@ -1299,42 +1330,93 @@ def run_lammps(kseed, lammps_folder, run_time, os.remove("%sout.colvars.traj.BAK" % lammps_folder) + # Set interactions for chromosome compartmentalization + if compartmentalization: + if to_dump: + lmp.command("undump 1") + lmp.command("dump 1 all custom %i %scompartmentalization_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) + + # First we have to partition the genome in the defined compartments + for group in compartmentalization['partition']: + list_of_particles = get_list(compartmentalization['partition'][group]) + for atom in list_of_particles: + #print("set atom %s type %s" % (atom,group+1)) + lmp.command("set atom %s type %s" % (atom,group+1)) + + # Second we have to define the type of interactions + for pair in compartmentalization['interactions']: + #pair_coeff t1 t2 epsilon sigma rc + t1 = pair[0]+1 + t2 = pair[1]+1 + if t1 > t2: + t1 = pair[1]+1 + t2 = pair[0]+1 + + epsilon = compartmentalization['interactions'][pair][1] + + try: + sigma1 = compartmentalization['radii'][pair[0]] + except: + sigma1 = 0.5 + try: + sigma2 = compartmentalization['radii'][pair[1]] + except: + sigma2 = 0.5 + sigma = sigma1 + sigma2 + + if compartmentalization['interactions'][pair][0] == "attraction": + rc = sigma * 2.5 + if compartmentalization['interactions'][pair][0] == "repulsion": + rc = sigma * 1.12246152962189 + + print("pair_coeff %s %s lj/cut %s %s %s" % (t1,t2,epsilon,sigma,rc)) + lmp.command("pair_coeff %s %s lj/cut %s %s %s" % (t1,t2,epsilon,sigma,rc)) + try: + lmp.command("run %s" % (compartmentalization['runtime'])) + except: + pass + + # Setup the pairs to co-localize using the COLVARS plug-in if loop_extrusion_dynamics: # Start relaxation step - lmp.command("reset_timestep 0") - lmp.command("run %i" % loop_extrusion_dynamics['timesteps_relaxation']) - + try: + lmp.command("reset_timestep 0") + lmp.command("run %i" % loop_extrusion_dynamics['timesteps_relaxation']) + except: + pass + lmp.command("reset_timestep 0") # Start Loop extrusion dynamics if to_dump: lmp.command("undump 1") lmp.command("dump 1 all custom %i %sloop_extrusion_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder)) - #lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id") - - # List of target loops of the form [(loop1_start,loop1_stop),...,(loopN_start,loopN_stop)] - target_loops = read_target_loops_input(loop_extrusion_dynamics['target_loops_input'], - loop_extrusion_dynamics['chrlength'], - loop_extrusion_dynamics['perc_enfor_loops']) # Randomly extract starting point of the extrusion dynamics between start and stop - initial_loops = draw_loop_extrusion_starting_points(target_loops, - loop_extrusion_dynamics['chrlength']) - - # Maximum number of particles to be extruded during the dynamics - maximum_number_of_extruded_particles = get_maximum_number_of_extruded_particles(target_loops, initial_loops) - print("Number of LES",maximum_number_of_extruded_particles) + extruders_positions = [] + nextruders = int(chromosome_particle_numbers[0]/loop_extrusion_dynamics['separation']) + print(nextruders) + for extruder in 
range(nextruders):
+            extruders_positions.append(draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0]))
+
+        # Initialise the lifetime of each extruder
+        extruders_lifetimes = []
+        for extruder in range(nextruders):
+            extruders_lifetimes.append(int(0))
+        print(extruders_positions, extruders_lifetimes)
+
+        lmp.command("compute xu all property/atom xu")
+        lmp.command("compute yu all property/atom yu")
+        lmp.command("compute zu all property/atom zu")
 
         # Loop extrusion steps
-        for LES in range(1,maximum_number_of_extruded_particles):
-
-            # Loop extrusion steps
-
-            # Update the Lennard-Jones coefficients between extruded particles
+        for LES in range(int(run_time/loop_extrusion_dynamics['extrusion_time'])):
+            thermo_style = "thermo_style custom step temp epair emol"
+
+            # Update the bond restraint
             loop_number = 1
-            for particle1,particle2 in initial_loops:
-
+            for particle1,particle2 in extruders_positions:
                 print("# fix LE%i all restrain bond %i %i %f %f %f" % (loop_number,
                                                                        particle1,
                                                                        particle2,
@@ -1348,15 +1430,38 @@
                     loop_extrusion_dynamics['attraction_strength'],
                     loop_extrusion_dynamics['attraction_strength'],
                     loop_extrusion_dynamics['equilibrium_distance']))
-
+                lmp.command("variable x%i equal c_xu[%i]" % (particle1, particle1))
+                lmp.command("variable x%i equal c_xu[%i]" % (particle2, particle2))
+                lmp.command("variable y%i equal c_yu[%i]" % (particle1, particle1))
+                lmp.command("variable y%i equal c_yu[%i]" % (particle2, particle2))
+                lmp.command("variable z%i equal c_zu[%i]" % (particle1, particle1))
+                lmp.command("variable z%i equal c_zu[%i]" % (particle2, particle2))
+
+                lmp.command("variable xLE%i equal v_x%i-v_x%i" % (loop_number, particle1, particle2))
+                lmp.command("variable yLE%i equal v_y%i-v_y%i" % (loop_number, particle1, particle2))
+                lmp.command("variable zLE%i equal v_z%i-v_z%i" % (loop_number, particle1, particle2))
+                lmp.command("variable dist_%i_%i equal sqrt(v_xLE%i*v_xLE%i+v_yLE%i*v_yLE%i+v_zLE%i*v_zLE%i)" % (particle1,
+                                                                                                                 particle2,
+                                                                                                                 loop_number,
+                                                                                                                 loop_number,
+                                                                                                                 loop_number,
+                                                                                                                 loop_number,
+                                                                                                                 loop_number,
+                                                                                                                 loop_number))
+                thermo_style += " v_dist_%i_%i" % (particle1, particle2)
                 loop_number += 1
 
+            lmp.command("%s" % thermo_style)
             # Doing the LES
-            lmp.command("run %i" % loop_extrusion_dynamics['timesteps_per_loop_extrusion_step'])
-
+            lmp.command("run %i" % loop_extrusion_dynamics['extrusion_time'])
+
+            # Update the lifetime of each extruder
+            for extruder in range(nextruders):
+                extruders_lifetimes[extruder] = extruders_lifetimes[extruder] + 1
+
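+            # Illustrative expansion of the commands above (hypothetical extruder
+            # holding particles 10 and 11, loop_number 1), a sketch rather than
+            # actual output:
+            #   variable x10 equal c_xu[10]
+            #   variable x11 equal c_xu[11]
+            #   variable y10 equal c_yu[10]   (likewise y11, z10, z11)
+            #   variable xLE1 equal v_x10-v_x11
+            #   variable yLE1 equal v_y10-v_y11
+            #   variable zLE1 equal v_z10-v_z11
+            #   variable dist_10_11 equal sqrt(v_xLE1*v_xLE1+v_yLE1*v_yLE1+v_zLE1*v_zLE1)
+            # "v_dist_10_11" is then appended to thermo_style, so the end-to-end
+            # distance of each extruded loop is printed at every thermo output.
+
+            # Remove the loop extrusion restraint!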
loop_number = 1
-            for particle1,particle2 in initial_loops:
+            for particle1,particle2 in extruders_positions:
                 print("# unfix LE%i" % (loop_number))
                 lmp.command("unfix LE%i" % (loop_number))
@@ -1366,65 +1471,65 @@
             # Update the particles involved in the loop extrusion interaction:
             # decrease the initial_start by one until you get to start
             # increase the initial_stop by one until you get to stop
-            for initial_loop, target_loop in zip(initial_loops,target_loops):
-
-                if initial_loop[0] > target_loop[0]:
-                    initial_loop[0] -= 1
-                if initial_loop[1] < target_loop[1]:
-                    initial_loop[1] += 1
-
-
-
-            #if to_dump:
-            #    lmp.command("undump 1")
-            #    lmp.command("dump 1 all custom %i langevin_dynamics_*.XYZ id xu yu zu" % to_dump)
-            #    lmp.command("dump_modify 1 format line \"%d %.5f %.5f %.5f\" sort id append yes")
-
+            for extruder in range(len(extruders_positions)):
+                # Extrude on the left only if the start of the chromosome has not been reached
+                if extruders_positions[extruder][0] > 1:
+                    random_number = uniform(0, 1)
+                    if random_number <= loop_extrusion_dynamics['left_extrusion_rate']:
+                        extruders_positions[extruder][0] -= 1
+                # Extrude on the right only if the end of the chromosome has not been reached
+                if extruders_positions[extruder][1] < chromosome_particle_numbers[0]:
+                    random_number = uniform(0, 1)
+                    if random_number <= loop_extrusion_dynamics['right_extrusion_rate']:
+                        extruders_positions[extruder][1] += 1
+                # If the extruder reached its lifetime, relocate it to a new random position
+                if extruders_lifetimes[extruder] == loop_extrusion_dynamics['lifetime']:
+                    extruders_positions[extruder] = draw_loop_extrusion_starting_point(loop_extrusion_dynamics['chrlength'][0])
+                    # Re-initialise the lifetime of the extruder
+                    extruders_lifetimes[extruder] = 0
+                # Check the presence of barriers
+                if loop_extrusion_dynamics['barriers_permeability'] < 1.0:
+                    # A motor that tries to overcome a barrier is stopped with probability
+                    # (1 - barriers_permeability), by undoing the step it just took.
+                    # Barrier on the left end, which extrudes against the chain index:
+                    # push the extruder one particle forwards
+                    if extruders_positions[extruder][0] in loop_extrusion_dynamics['barriers']:
+                        random_number = uniform(0, 1)
+                        if random_number >= loop_extrusion_dynamics['barriers_permeability']:
+                            extruders_positions[extruder][0] += 1
+                    # Barrier on the right end, which extrudes with the chain index:
+                    # push the extruder one particle backwards
+                    if extruders_positions[extruder][1] in loop_extrusion_dynamics['barriers']:
+                        random_number = uniform(0, 1)
+                        if random_number >= loop_extrusion_dynamics['barriers_permeability']:
+                            extruders_positions[extruder][1] -= 1
+
+            print(extruders_positions,extruders_lifetimes)
+
+            ### TODO: put here the creation of a pickle with the complete trajectory ###
+            if to_dump:
+                lmp.command("undump 1")
+                lmp.command("dump 1 all custom %i %sloop_extrusion_MD_*.XYZ id xu yu zu" % (to_dump,lammps_folder))
 
     # Post-processing analysis
-    if time_dependent_steering_pairs:
+    # Save coordinates
+    #for time in range(0,runtime,to_dump):
+    #    xc.append(np.array(read_trajectory_file("%s/loop_extrusion_MD_%s.XYZ" % (lammps_folder, time))))
 
-        if useColvars == True:
-            copyfile("%sout.colvars.traj" % lammps_folder, "%srestrained_pairs_equilibrium_distance_vs_timestep_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point), str(time_point+1)))
-
-
-            os.remove("%sout.colvars.traj" % lammps_folder)
-            os.remove(time_dependent_steering_pairs['colvar_output'])
-            for time_point in
time_points[0:-1]: - # Compute energy associated to the restraints: something like the IMP objective function - #compute_the_objective_function("%srestrained_pairs_equilibrium_distance_vs_timestep_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point), str(time_point+1)), - # "%sobjective_function_profile_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point), str(time_point+1)), - # time_point, - # time_dependent_steering_pairs['timesteps_per_k_change'][time_point]) - - # Compute the % of satysfied constraints between 2. sigma = 2./sqrt(k) - compute_the_percentage_of_satysfied_restraints("%srestrained_pairs_equilibrium_distance_vs_timestep_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point), str(time_point+1)), - restraints[time_point], - "%spercentage_of_established_restraints_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(time_point), str(time_point+1)), - time_point, - time_dependent_steering_pairs['timesteps_per_k_change'][time_point]) - - for time_point in time_points[0:-1]: - xc.append(np.array(read_trajectory_file("%ssteered_MD_from_time_point_%s_to_time_point_%s.XYZ" % (lammps_folder, time_point, time_point+1)))) - - else: - # Managing the final model - xc.append(np.array(lmp.gather_atoms("x",1,3))) + #xc.append(np.array(lmp.gather_atoms("x",1,3))) lmp.close() result = [] for stg in range(len(xc)): - log_objfun = read_objective_function("%sobj_fun_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(stg), str(stg+1))) + #log_objfun = read_objective_function("%sobj_fun_from_time_point_%s_to_time_point_%s.txt" % (lammps_folder, str(stg), str(stg+1))) + log_objfun = [0.0] for timepoint in range(1,timepoints+1): lammps_model = LAMMPSmodel({'x' : [], - 'y' : [], - 'z' : [], - 'cluster' : 'Singleton', - 'log_objfun' : log_objfun, - 'objfun' : log_objfun[-1], - 'radius' : float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])/2, - 'index' : kseed+timepoint, - 'rand_init' : str(kseed+timepoint)}) + 'y' : [], + 'z' : [], + 'cluster' : 'Singleton', + 'log_objfun' : log_objfun, + 'objfun' : log_objfun[-1], + 'radius' : 0.5, #float(CONFIG.HiC['resolution'] * CONFIG.HiC['scale'])/2, + 'index' : kseed+timepoint, + 'rand_init' : str(kseed+timepoint)}) if pbc: store_conformation_with_pbc(xc[stg], lammps_model, confining_environment) @@ -2329,26 +2434,31 @@ def generate_chromosome_rosettes_conformation ( chromosome_particle_numbers , # Checking that the beads are all inside the confining environment and are not overlapping for rosette_pair in list(combinations(final_rosettes,2)): - - for x0,y0,z0 in zip(rosette_pair[0]['x'],rosette_pair[0]['y'],rosette_pair[0]['z']): - particle_inside = check_point_inside_the_confining_environment(x0, y0, z0, - particle_radius, - confining_environment) - - if particle_inside == 0: # 0 means that the particle is outside -> PROBLEM!!! 
- print("Particle",x0,y0,z0,"is out of the confining environment\n") - break - - for x1,y1,z1 in zip(rosette_pair[1]['x'],rosette_pair[1]['y'],rosette_pair[1]['z']): - particles_overlap = check_particles_overlap(x0,y0,z0,x1,y1,z1,particle_radius) + molecule0 = list(zip(rosette_pair[0]['x'],rosette_pair[0]['y'],rosette_pair[0]['z'])) + molecule1 = list(zip(rosette_pair[1]['x'],rosette_pair[1]['y'],rosette_pair[1]['z'])) + distances = scipy.spatial.distance.cdist(molecule1,molecule0) + print(len(molecule0),len(molecule0[0]),distances.min()) + if distances.min() < particle_radius*2.0*0.95: + particles_overlap = 0 + break - if particles_overlap == 0: # 0 means that the particles are overlapping -> PROBLEM!!! - print("Particle",x0,y0,z0,"and",x1,y1,z1,"overlap\n") + if particles_overlap != 0: + for r in xrange(len(final_rosettes)): + molecule0 = list(zip(final_rosettes[r]['x'],final_rosettes[r]['y'],final_rosettes[r]['z'])) + print(len(molecule0),len(molecule0[0])) + + distances = scipy.spatial.distance.cdist(molecule0,molecule0) + print(distances.min()) + for i in xrange(len(molecule0)): + for j in xrange(i+1,len(molecule0)): + if distances[(i,j)] < particle_radius*2.0*0.95: + particles_overlap = 0 + if particles_overlap == 0: + break + if particles_overlap == 0: break - if particle_inside == 0 or particles_overlap == 0: + if particles_overlap == 0: break - if particle_inside == 0 or particles_overlap == 0: - break # Writing the final_rosettes conformation print("Succesfully generated conformation number %d\n" % (cnt+1)) @@ -2495,9 +2605,12 @@ def generate_rosettes(chromosome_particle_numbers, rosette_radius, particle_radi rosette['z'] = [] # Position of the first particle (x_0, 0.0, 0.0) - rosette['x'].append(rosette_radius * (0.38 + (1 - 0.38) * cos(6*phi) * cos(6*phi)) * cos(phi)) - rosette['y'].append(rosette_radius * (0.38 + (1 - 0.38) * cos(6*phi) * cos(6*phi)) * sin(phi)) - rosette['z'].append(phi / (2.0 * pi)) + k = 6. 
+ x = 0.38 + p = 1.0 + rosette['x'].append(rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * cos(phi)) + rosette['y'].append(rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * sin(phi)) + rosette['z'].append(p * phi / (2.0 * pi)) #print "First bead is in position: %f %f %f" % (rosette['x'][0], rosette['y'][0], rosette['z'][0]) # Building the chain: The rosette is growing along the positive z-axes @@ -2506,8 +2619,8 @@ def generate_rosettes(chromosome_particle_numbers, rosette_radius, particle_radi distance = 0.0 while distance < (particle_radius*2.0): phi = phi + 0.001 - x_tmp = rosette_radius * (0.38 + (1 - 0.38) * cos(6*phi) * cos(6*phi)) * cos(phi) - y_tmp = rosette_radius * (0.38 + (1 - 0.38) * cos(6*phi) * cos(6*phi)) * sin(phi) + x_tmp = rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * cos(phi) + y_tmp = rosette_radius * (x + (1 - x) * cos(k*phi) * cos(k*phi)) * sin(phi) z_tmp = phi / (2.0 * pi) distance = sqrt((x_tmp - rosette['x'][-1])*(x_tmp - rosette['x'][-1]) + (y_tmp - rosette['y'][-1])*(y_tmp - rosette['y'][-1]) + @@ -3463,20 +3576,29 @@ def read_target_loops_input(input_filename, chromosome_length, percentage): ########## -def draw_loop_extrusion_starting_points(target_loops, chromosome_length): - initial_loops = [] +#def draw_loop_extrusion_starting_points(target_loops, chromosome_length): +# initial_loops = [] # Scroll the target loops and draw a point between each start and stop - for target_loop in target_loops: +# for target_loop in target_loops: - random_particle = randint(target_loop[0], target_loop[1]) +# random_particle = randint(target_loop[0], target_loop[1]) - loop = [] - loop.append(random_particle) - loop.append(random_particle+1) +# loop = [] +# loop.append(random_particle) +# loop.append(random_particle+1) + +# initial_loops.append(loop) + +# return initial_loops + +def draw_loop_extrusion_starting_point(chromosome_length): + + # draw a starting point for extrusion along the chromosome + random_particle = randint(1, chromosome_length-1) + + return [random_particle,random_particle+1] - initial_loops.append(loop) - return initial_loops ########## @@ -3704,3 +3826,25 @@ def compute_the_objective_function(input_file_name, infile.close() outfile.close() + + +### get unique list ### + +def get_list(input_list): + + output_list = [] + + for element in input_list: + #print(type(element)) + if isinstance(element, (int)): + output_list.append(element) + if isinstance(element, (list)): + for subelement in element: + output_list.append(subelement) + if isinstance(element, (tuple)): + for subelement in range(element[0],element[1]+1,element[2]): + output_list.append(subelement) + return output_list + +########## +#MPI.Finalize() diff --git a/tadphys/modelling/lammpsmodel.py b/tadphys/modelling/lammpsmodel.py new file mode 100644 index 0000000..c604e39 --- /dev/null +++ b/tadphys/modelling/lammpsmodel.py @@ -0,0 +1,38 @@ +""" +25 Oct 2016 + + +""" +from tadphys.modelling.structuralmodel import StructuralModel + +class LAMMPSmodel(StructuralModel): + """ + A container for the LAMMPS modelling results. The container is a dictionary + with the following keys: + + - rand_init: Random number generator feed (needed for model reproducibility) + - x, y, z: 3D coordinates of each particles. 
Each represented as a list + + """ + def __str__(self): + try: + return ('LAMMPS model ranked %s (%s particles) with: \n' + + ' - random initial value: %s\n' + + ' - first coordinates:\n'+ + ' X Y Z\n'+ + ' %7s%7s%7s\n'+ + ' %7s%7s%7s\n'+ + ' %7s%7s%7s\n') % ( + self['index'] + 1, + len(self['x']), self['rand_init'], + int(self['x'][0]), int(self['y'][0]), int(self['z'][0]), + int(self['x'][1]), int(self['y'][1]), int(self['z'][1]), + int(self['x'][2]), int(self['y'][2]), int(self['z'][2])) + except IndexError: + return ('LAMMPS model of %s particles with: \n' + + ' - random initial value: %s\n' + + ' - first coordinates:\n'+ + ' X Y Z\n'+ + ' %5s%5s%5s\n') % ( + len(self['x']), self['rand_init'], + self['x'][0], self['y'][0], self['z'][0]) diff --git a/taddyn/modelling/lammpsmodel.py b/tadphys/modelling/lammpsmodel.py~ similarity index 100% rename from taddyn/modelling/lammpsmodel.py rename to tadphys/modelling/lammpsmodel.py~ diff --git a/tadphys/modelling/restraints.py b/tadphys/modelling/restraints.py new file mode 100644 index 0000000..88215a9 --- /dev/null +++ b/tadphys/modelling/restraints.py @@ -0,0 +1,321 @@ +from math import fabs, pow as power +from collections import OrderedDict + +from scipy import polyfit + +class HiCBasedRestraints(object): + + """ + This class contains distance restraints based on Hi-C zscores. + + :param nloci: number of particles to model (may not all be present in + zscores) + :param particle_radius: radius of each particle in the model. + :param None config: a dictionary containing the standard + parameters used to generate the models. The dictionary should contain + the keys kforce, lowrdist, maxdist, upfreq and lowfreq. Examples can be + seen by doing: + + :: + + from taddyn.modelling.CONFIG import CONFIG + + where CONFIG is a dictionary of dictionaries to be passed to this function: + + :: + + CONFIG = { + 'dmel_01': { + # Paramaters for the Hi-C dataset from: + 'reference' : 'victor corces dataset 2013', + + # Force applied to the restraints inferred to neighbor particles + 'kforce' : 5, + + # Space occupied by a nucleotide (nm) + 'scale' : 0.005 + + # Strength of the bending interaction + 'kbending' : 0.0, # OPTIMIZATION: + + # Maximum experimental contact distance + 'maxdist' : 600, # OPTIMIZATION: 500-1200 + + # Minimum thresholds used to decide which experimental values have to be + # included in the computation of restraints. Z-score values bigger than upfreq + # and less that lowfreq will be include, whereas all the others will be rejected + 'lowfreq' : -0.7 # OPTIMIZATION: min/max Z-score + + # Maximum threshold used to decide which experimental values have to be + # included in the computation of restraints. Z-score values greater than upfreq + # and less than lowfreq will be included, while all the others will be rejected + 'upfreq' : 0.3 # OPTIMIZATION: min/max Z-score + + } + } + :param resolution: number of nucleotides per Hi-C bin. This will be the + number of nucleotides in each model's particle + :param zscores: the dictionary of the Z-score values calculated from the + Hi-C pairwise interactions + :param 1 close_bins: number of particles away (i.e. the bin number + difference) a particle pair must be in order to be considered as + neighbors (e.g. 
1 means consecutive particles) + :param None first: particle number at which model should start (0 should be + used inside TADbit) + :param None remove_rstrn: list of particles which must not have restrains + + + """ + def __init__(self, nloci, particle_radius,CONFIG,resolution,zscores, + chromosomes, close_bins=1,first=None, min_seqdist=0, + remove_rstrn=[]): + + self.particle_radius = particle_radius + self.nloci = nloci + self.CONFIG = CONFIG + self.resolution = resolution + self.nnkforce = CONFIG['kforce'] + self.min_seqdist = min_seqdist + self.chromosomes = OrderedDict() + self.remove_rstrn = remove_rstrn + if chromosomes: + if isinstance(chromosomes,dict): + self.chromosomes[chromosomes['crm']] = chromosomes['end'] - chromosomes['start'] + 1 + else: + tot = 0 + for k in chromosomes: + tot += k['end'] - k['start'] + 1 + self.chromosomes[k['crm']] = tot + else: + self.chromosomes['UNKNOWN'] = nloci + + self.CONFIG['lowrdist'] = self.particle_radius * 2. + + if self.CONFIG['lowrdist'] > self.CONFIG['maxdist']: + raise TADbitModelingOutOfBound( + ('ERROR: we must prevent you from doing this for the safe of our ' + + 'universe...\nIn this case, maxdist must be higher than %s\n' + + ' -> resolution times scale -- %s*%s)') % ( + self.CONFIG['lowrdist'], self.resolution, self.CONFIG['scale'])) + + # print 'config:', self.CONFIG + # get SLOPE and regression for all particles of the z-score data + + zsc_vals = [zscores[i][j] for i in zscores for j in zscores[i] + if abs(int(i) - int(j)) > 1] # condition is to avoid + # taking into account selfies + # and neighbors + self.SLOPE, self.INTERCEPT = polyfit([min(zsc_vals), max(zsc_vals)], + [self.CONFIG['maxdist'], self.CONFIG['lowrdist']], 1) + #print "#SLOPE = %f ; INTERCEPT = %f" % (self.SLOPE, self.INTERCEPT) + #print "#maxdist = %f ; lowrdist = %f" % (self.CONFIG['maxdist'], self.CONFIG['lowrdist']) + # get SLOPE and regression for neighbors of the z-score data + xarray = [zscores[i][j] for i in zscores for j in zscores[i] + if abs(int(i) - int(j)) <= (close_bins + 1)] + yarray = [self.particle_radius * 2 for _ in range(len(xarray))] + try: + self.NSLOPE, self.NINTERCEPT = polyfit(xarray, yarray, 1) + except TypeError: + self.NSLOPE, self.NINTERCEPT = 0.0, self.particle_radius * 2 + + # if z-scores are generated outside TADbit they may not start at zero + if first == None: + first = min([int(j) for i in zscores for j in zscores[i]] + + [int(i) for i in zscores]) + self.LOCI = list(range(first, nloci + first)) + + # Z-scores + + self.PDIST = zscores + + def get_hicbased_restraints(self): + + # HiCbasedRestraints is a list of restraints returned by this function. 
+ # Each entry of the list is a list of 5 elements describing the details of the restraint: + # 0 - particle_i + # 1 - particle_j + # 2 - type_of_restraint = Harmonic or HarmonicLowerBound or HarmonicUpperBound + # 3 - the kforce of the restraint + # 4 - the equilibrium (or maximum or minimum respectively) distance associated to the restraint + + HiCbasedRestraints = [] + nlocis = list(sorted(set(range(self.nloci)) - set(self.remove_rstrn))) + for ni, i in enumerate(nlocis): + chr1 = [k for k,v in list(self.chromosomes.items()) if v > i][0] + for j in nlocis[ni+1:]: + chr2 = [k for k,v in list(self.chromosomes.items()) if v > j][0] + # Compute the sequence separation (in particles) depending on it the restraint changes + seqdist = abs(j - i) + + # 1 - CASE OF TWO CONSECUTIVE LOCI (NEAREST NEIGHBOR PARTICLES) + if seqdist == 1 and seqdist > self.min_seqdist: + if chr1 != chr2: + continue + RestraintType, dist = self.get_nearest_neighbors_restraint_distance(self.particle_radius, i, j) + kforce = self.nnkforce + + # 2 - CASE OF 2 SECOND NEAREST NEIGHBORS SEQDIST = 2 + if seqdist == 2 and seqdist > self.min_seqdist: + if chr1 != chr2: + continue + RestraintType, dist = self.get_second_nearest_neighbors_restraint_distance(self.particle_radius, i, j) + kforce = self.nnkforce + + # 3 - CASE OF TWO NON-CONSECUTIVE PARTICLES SEQDIST > 2 + if seqdist > 2 and seqdist > self.min_seqdist: + + #CASES OF TWO NON-CONSECUTIVE PARTICLES SEQDIST > 2 + RestraintType, kforce, dist = self.get_long_range_restraints_kforce_and_distance(i, j) + if RestraintType == "None": + #print "No HiC-based restraint between particles %d and %d" % (i,j) + continue + + HiCbasedRestraints.append([i, j, RestraintType, kforce, dist]) + + return HiCbasedRestraints + + + + #Functions to add restraints: HarmonicRestraints , HarmonicUpperBoundRestraints , HarmonicLowerBoundRestraints + #addNearestNeighborsRestraint , addSecondNearestNeighborsRestraint , addLongRangeRestraints + def get_nearest_neighbors_restraint_distance(self, particle_radius, i, j): + x=str(i) + y=str(j) + + if x in self.PDIST and y in self.PDIST[x] and self.PDIST[x][y] > self.CONFIG['upfreq']: + # When p1 and p2 have a contact propensity larger that upfreq + # their spatial proximity and a partial overlap between them is enforced + # The equilibrium distance of the spring is inferred from the 3C based Z-score + RestraintType = "NeighborHarmonic" + dist = distance(self.PDIST[x][y],self.NSLOPE,self.NINTERCEPT) + #print "Distance = ", dist + else: + # When p1 and p2 have a contact propensity lower than upfreq they are simply connected to each other + #p1 = model['particles'].get_particle(i) + #p2 = model['particles'].get_particle(j) + #dist = p1.get_value(model['radius']) + p2.get_value(model['radius']) + RestraintType = "NeighborHarmonicUpperBound" + + dist = 2.0 * particle_radius + return RestraintType , dist + + def get_second_nearest_neighbors_restraint_distance(self, particle_radius, i, j): + # IMP COMMAND: Consider the particles i, j and the particle between i and j + #p1 = model['particles'].get_particle(i) + #p2 = model['particles'].get_particle(j) + #pmiddle = model['particles'].get_particle(j-1) + + # The equilibrium distance is the sum of the radii of particles p1 and p2, and of the diameter of particle pmiddle + RestraintType = "HarmonicUpperBound" + dist = 4.0 * particle_radius + #dist = p1.get_value(model['radius']) + p2.get_value(model['radius']) + 2.0 * pmiddle.get_value(model['radius']) + #print p1.get_value(model['radius']) , 
p2.get_value(model['radius']) , pmiddle.get_value(model['radius'])
+
+        #print RestraintType , dist
+        return RestraintType , dist
+
+    def get_long_range_restraints_kforce_and_distance(self, i, j):
+        x = str(i)
+        y = str(j)
+
+        Zscore = float('nan')
+
+        # For non-consecutive particles the kforce is a function of the 3C-based Zscore.
+        # First we define the kforce of the harmonic restraint. It is different for 3 scenarios...
+
+        RestraintType = "None"
+        kforce = 0.0
+        dist = 0.0
+
+        # 1 - If the Z-score between i and j is defined
+        if x in self.PDIST and y in self.PDIST[x]:
+            # Get the Zscore between particles p1 and p2
+            Zscore = self.PDIST[x][y]
+            kforce = k_force(Zscore)
+
+        # 2 - If the Z-score is defined only for particle i (in the Hi-C matrix you can
+        # encounter zero values next to very high entries)
+        elif x in self.PDIST:
+            prevy = str(j - 1)
+            posty = str(j + 1)
+            # The Zscore is computed as the average of the Z-scores of the nearest
+            # neighbor particles of p2 with p1
+            Zscore = (self.PDIST[x].get(prevy, self.PDIST[x].get(posty, float('nan'))) +
+                      self.PDIST[x].get(posty, self.PDIST[x].get(prevy, float('nan')))) / 2
+            kforce = 0.5 * k_force(Zscore)
+
+        # 3 - If the Z-score is defined only for particle j
+        else:
+            prevx = str(i - 1)
+            postx = str(i + 1)
+            prevx = prevx if prevx in self.PDIST else postx
+            postx = postx if postx in self.PDIST else prevx
+            try:
+                Zscore = (self.PDIST[prevx].get(y, self.PDIST[postx].get(y, float('nan'))) +
+                          self.PDIST[postx].get(y, self.PDIST[prevx].get(y, float('nan')))) / 2
+                # For non-consecutive particles the kforce is a function of the 3C-based Zscore
+            except KeyError:
+                pass
+            kforce = 0.5 * k_force(Zscore)
+
+
+        # If the ZSCORE > UPFREQ the spatial proximity of particles p1 and p2 is favoured
+        if Zscore > self.CONFIG['upfreq']:
+            RestraintType = "Harmonic"
+            dist = distance(Zscore, self.SLOPE, self.INTERCEPT)
+
+        # If the ZSCORE < LOWFREQ the particles p1 and p2 are restrained to be far from each other.
+        elif Zscore < self.CONFIG['lowfreq']:
+            RestraintType = "HarmonicLowerBound"
+            dist = distance(Zscore, self.SLOPE, self.INTERCEPT)
+
+        #if RestraintType != "None":
+        #    print i, j, Zscore, RestraintType, kforce, dist
+        return RestraintType, kforce, dist
+
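+    # Worked example of the z-score mapping above (hypothetical numbers, not
+    # from the original source): with z-scores spanning [-3, 3], maxdist = 600
+    # and lowrdist = 2 * particle_radius = 50, polyfit gives
+    #   SLOPE = (50 - 600) / 6 ~ -91.7 and INTERCEPT ~ 325,
+    # so a pair with Zscore = 0.5 > upfreq gets
+    #   RestraintType = "Harmonic"
+    #   dist   = distance(0.5, SLOPE, INTERCEPT) ~ 279
+    #   kforce = k_force(0.5) = |0.5|**0.5 ~ 0.71
+    # while Zscore < lowfreq would instead yield a "HarmonicLowerBound" at the
+    # corresponding distance.
+
+    # Is this function needed for TADkit?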
+ def _get_restraints(self): + """ + Same function as addAllHarmonic but just to get restraints + """ + restraint_names = {'None' : None, + 'Harmonic' : 'a', + 'NeighborHarmonic' : 'n', + 'HarmonicUpperBound' : 'u', + 'NeighborHarmonicUpperBound' : 'u', + 'HarmonicLowerBound' : 'l'} + + #model = {'radius' : IMP.FloatKey("radius"), + # 'model' : Model(), + # 'restraints' : None, # 2.6.1 compat + # 'particles' : None} + + # set container + #try: + # model['restraints'] = IMP.RestraintSet(model['model']) # 2.6.1 compat + #except: + # pass + + #model['particles'] = ListSingletonContainer(IMP.core.create_xyzr_particles( + # model['model'], len(LOCI), RADIUS, 100000)) + + restraints = {} + for i, j, RestraintType, kforce, dist in self.get_hicbased_restraints(): + restraints[tuple(sorted((i, j)))] = restraint_names[RestraintType], dist, kforce + return restraints + +#Function to translate the Zscore value into distances and kforce values +def distance(Zscore, slope, intercept): + """ + Function mapping the Z-scores into distances for neighbor and non-neighbor fragments (slope, intercept) are different + """ + #print slope, intercept + return (slope * Zscore) + intercept + +def k_force(Zscore): + """ + Function to assign to each restraint a force proportional to the underlying + experimental value. + """ + return power(fabs(Zscore), 0.5) + +class TADbitModelingOutOfBound(Exception): + pass diff --git a/tadphys/modelling/structuralmodel.py b/tadphys/modelling/structuralmodel.py new file mode 100644 index 0000000..58b74f9 --- /dev/null +++ b/tadphys/modelling/structuralmodel.py @@ -0,0 +1,955 @@ +""" +25 Oct 2016 + + +""" +from __future__ import print_function + + +from tadphys.utils.tadmaths import newton_raphson +from tadphys import __version__ as version +from math import sqrt, pi +import hashlib + +try: + basestring +except NameError: + basestring = str + +def model_header(model): + """ + Defines the header to write in output files for a given model + """ + if not 'description' in model: + return '' + outstr = '' + for desc in sorted(model['description']): + outstr += '# %-15s : %s\n' % (desc.upper(), model['description'][desc]) + return outstr + + +class StructuralModel(dict): + """ + A container for the structural modelling results. The container is a dictionary + with the following keys: + + - rand_init: Random number generator feed (needed for model reproducibility) + - x, y, z: 3D coordinates of each particles. 
Each represented as a list + + """ + def __len__(self): + return len(self['x']) + + def distance(self, part1, part2): + """ + Calculates the distance between one point of the model and an external + coordinate + + :param part1: index of a particle in the model + :param part2: index of a particle in the model + + :returns: distance between one point of the model and an external + coordinate + """ + if part1 == 0 or part2 == 0: + raise Exception('Particle number must be strictly positive\n') + return sqrt((self['x'][part1-1] - self['x'][part2-1])**2 + + (self['y'][part1-1] - self['y'][part2-1])**2 + + (self['z'][part1-1] - self['z'][part2-1])**2) + + + def _square_distance(self, part1, part2): + """ + Calculates the square instance between one point of the model and an + external coordinate + + :param part1: index of a particle in the model + :param part2: index of a particle in the model + + :returns: distance between one point of the model and an external + coordinate + """ + return ((self['x'][part1-1] - self['x'][part2-1])**2 + + (self['y'][part1-1] - self['y'][part2-1])**2 + + (self['z'][part1-1] - self['z'][part2-1])**2) + + + def _square_distance_to(self, part1, part2): + """ + :param part1: index of a particle in the model + :param part2: external coordinate (dict format with x, y, z keys) + + :returns: square distance between one point of the model and an external + coordinate + """ + return ((self['x'][part1] - part2[0])**2 + + (self['y'][part1] - part2[1])**2 + + (self['z'][part1] - part2[2])**2) + + + def center_of_mass(self): + """ + Gives the center of mass of a model + + :returns: the center of mass of a given model + """ + r_x = sum(self['x'])/len(self) + r_y = sum(self['y'])/len(self) + r_z = sum(self['z'])/len(self) + return dict((('x', r_x), ('y', r_y), ('z', r_z))) + + + def radius_of_gyration(self): + """ + Calculates the radius of gyration or gyradius of the model + + Defined as: + + .. 
math:: + + \sqrt{\\frac{\sum_{i=1}^{N} (x_i-x_{com})^2+(y_i-y_{com})^2+(z_i-z_{com})^2}{N}} + + with: + + * :math:`N` the number of particles in the model + * :math:`com` the center of mass + + :returns: the radius of gyration for the components of the tensor + """ + com = self.center_of_mass() + rog = sqrt(sum(self._square_distance_to(i, + (com['x'], com['y'], com['z'])) + for i in range(len(self))) / len(self)) + return rog + + + def contour(self): + """ + Total length of the model + + :returns: the totals length of the model + """ + dist = 0 + for i in range(1, len(self)): + dist += self.distance(i, i+1) + return dist + + + def longest_axe(self): + """ + Gives the distance between most distant particles of the model + + :returns: the maximum distance between two particles in the model + """ + maxdist = 0 + for i in range(1, len(self)): + for j in range(i + 1, len(self) + 1): + dist = self.distance(i, j) + if dist > maxdist: + maxdist = dist + return maxdist + + + def shortest_axe(self): + """ + Minimum distance between two particles in the model + + :returns: the minimum distance between two particles in the model + """ + mindist = float('inf') + for i in range(1, len(self)): + for j in range(i + 1, len(self) + 1): + dist = self.distance(i, j) + if dist < mindist: + mindist = dist + return mindist + + + def min_max_by_axis(self): + """ + Calculates the minimum and maximum coordinates of the model + + :returns: the minimum and maximum coordinates for each x, y and z axis + """ + return ((min(self['x']), max(self['x'])), + (min(self['y']), max(self['y'])), + (min(self['z']), max(self['z']))) + + + def cube_side(self): + """ + Calculates the side of a cube containing the model. + + :returns: the diagonal length of the cube containing the model + """ + return sqrt((min(self['x']) - max(self['x']))**2 + + (min(self['y']) - max(self['y']))**2 + + (min(self['z']) - max(self['z']))**2) + + + def cube_volume(self): + """ + Calculates the volume of a cube containing the model. + + :returns: the volume of the cube containing the model + """ + return self.cube_side()**3 + + + def inaccessible_particles(self, radius): + """ + Gives the number of loci/particles that are accessible to an object + (i.e. a protein) of a given size. + + :param radius: radius of the object that we want to fit in the model + + :returns: a list of numbers, each being the ID of a particles that would + never be reached by the given object + + TODO: remove this function + + """ + inaccessibles = [] + sphere = generate_sphere_points(100) + for i in range(len(self)): + impossibles = 0 + for xxx, yyy, zzz in sphere: + thing = (xxx * radius + self['x'][i], + yyy * radius + self['y'][i], + zzz * radius + self['z'][i]) + # print form % (k+len(self), thing['x'], thing['y'], thing['z'], + # 0, 0, 0, k+len(self)), + for j in range(len(self)): + if i == j: + continue + # print self._square_distance_to(j, thing), radius + if self._square_distance_to(j, thing) < radius**2: + # print i, j + impossibles += 1 + break + if impossibles == 100: + inaccessibles.append(i + 1) + return inaccessibles + + + # def persistence_length(self, start=1, end=None, return_guess=False): + # """ + # Calculates the persistence length (Lp) of given section of the model. + # Persistence length is calculated according to [Bystricky2004]_ : + + # .. 
math:: + + # = 2 \\times Lp^2 \\times (\\frac{Lc}{Lp} - 1 + e^{\\frac{-Lc}{Lp}}) + + # with the contour length as :math:`Lc = \\frac{d}{c}` where :math:`d` is + # the genomic dstance in bp and :math:`c` the linear mass density of the + # chromatin (in bp/nm). + + # :param 1 start: start particle from which to calculate the persistence + # length + # :param None end: end particle from which to calculate the persistence + # length. Uses the last particle by default + # :param False return_guess: Computes persistence length using the + # approximation :math:`Lp=\\frac{Lc}{Lp}` + + # :returns: persistence length, or 2 times the Kuhn length + # """ + # clength = float(self.contour()) + # end = end or len(self) + # sq_length = float(self._square_distance(start, end)) + + # guess = sq_length / clength + # if return_guess: + # return guess # incredible! + # kuhn = newton_raphson(guess, clength, sq_length) + # return 2 * kuhn + + + # def accessible_surface(self, radius, nump=100, superradius=200, + # include_edges=True, view_mesh=False, savefig=None, + # write_cmm_file=None, verbose=False, + # chimera_bin='chimera'): + # """ + # Calculates a mesh surface around the model (distance equal to input + # **radius**) and checks if each point of this mesh could be replaced by + # an object (i.e. a protein) of a given **radius** + + # Outer part of the model can be excluded from the estimation of + # accessible surface, as the occupancy outside the model is unknown (see + # superradius option). + + # :param radius: radius of the object we want to fit in the model. + # :param None write_cmm_file: path to file in which to write cmm with the + # colored meshed (red inaccessible points, green accessible points) + # :param 100 nump: number of points to draw around a given particle. This + # number also sets the number of points drawn around edges, as each + # point occupies a given surface (see maths below). *Note that this + # number is considerably lowered by the occupancy of edges, depending + # of the angle formed by the edges surrounding a given particle, only + # 10% to 50% of the ``nump`` will be drawn in fact.* + # :param True include_edges: if False, edges will not be included in the + # calculation of the accessible surface, only particles. Note that + # statistics on particles (like last item returned) will not change, + # and computation time will be significantly decreased. + # :param False view_mesh: launches chimera to display the mesh around the + # model + # :param None savefig: path where to save chimera image + # :param 'chimera' chimera_bin: path to chimera binary to use + # :param False verbose: prints stats about the surface + # :param 200 superradius: radius of an object used to exclude outer + # surface of the model. Superradius must be higher than radius. + + # This function will first define a mesh around the chromatin, + # representing all possible position of the center of the object we want + # to fit. This mesh will be at a distance of *radius* from the chromatin + # strand. All dots in the mesh represents an equal area (*a*), the whole + # surface of the chromatin strand being: :math:`A=n \\times a` (*n* being + # the total number of dots in the mesh). + + # The mesh consists of spheres around particles of the model, and + # cylinders around edges joining particles (no overlap is allowed between + # sphere and cylinders or cylinder and cylinder when they are + # consecutive). 
+ + # If we want that all dots of the mesh representing the surface of the + # chromatin, corresponds to an equal area (:math:`a`) + + # .. math:: + + # a = \\frac{4\pi r^2}{s} = \\frac{2\pi r N_{(d)}}{c} + + # with: + + # * :math:`r` radius of the object to fit (as the input parameter **radius**) + # * :math:`s` number of points in sphere + # * :math:`c` number of points in circle (as the input parameter **nump**) + # * :math:`N_{(d)}` number of circles in an edge of length :math:`d` + + # According to this, when the distance between two particles is equal + # to :math:`2r` (:math:`N=2r`), we would have :math:`s=c`. + + # As : + + # .. math:: + + # 2\pi r = \sqrt{4\pi r^2} \\times \sqrt{\pi} + + # It is fair to state the number of dots represented along a circle as: + + # .. math:: + + # c = \sqrt{s} \\times \sqrt{\pi} + + # Thus the number of circles in an edge of length :math:`d` must be: + + # .. math:: + + # N_{(d)}=\\frac{s}{\sqrt{s}\sqrt{\pi}}\\times\\frac{d}{2r} + + # :returns: a list of *1-* the number of dots in the mesh that could be + # occupied by an object of the given radius *2-* the total number of + # dots in the mesh *3-* the estimated area of the mesh (in square + # micrometers) *4-* the area of the mesh of a virtually straight strand + # of chromatin defined as + # :math:`contour\\times 2\pi r + 4\pi r^2` (also in + # micrometers) *5-* a list of number of (accessibles, inaccessible) for + # each particle (percentage burried can be infered afterwards by + # accessible/(accessible+inaccessible) ) + + # """ + + # points, dots, superdots, points2dots = build_mesh( + # self['x'], self['y'], self['z'], len(self), nump, radius, + # superradius, include_edges) + + # # calculates the number of inaccessible peaces of surface + # if superradius: + # radius2 = (superradius - 4)**2 + # outdot = [] + # for x2, y2, z2 in superdots: + # for j, (x1, y1, z1) in enumerate(points): + # if fast_square_distance(x1, y1, z1, x2, y2, z2) < radius2: + # outdot.append(False) + # break + # else: + # outdot.append(True) + # continue + # points.insert(0, points.pop(j)) + # else: + # outdot = [False] * len(superdots) + + # # calculates the number of inaccessible peaces of surface + # radius2 = (radius - 2)**2 + # grey = (0.6, 0.6, 0.6) + # red = (1, 0, 0) + # green = (0, 1, 0) + # colors = [] + # for i, (x2, y2, z2) in enumerate(dots): + # if outdot[i]: + # colors.append(grey) + # continue + # for j, (x1, y1, z1) in enumerate(points): + # if fast_square_distance(x1, y1, z1, x2, y2, z2) < radius2: + # colors.append(red) + # break + # else: + # colors.append(green) + # continue + # points.insert(0, points.pop(j)) + # possibles = colors.count(green) + + # acc_parts = [] + # for p in sorted(points2dots.keys()): + # acc = 0 + # ina = 0 + # for dot in points2dots[p]: + # if colors[dot]==green: + # acc += 1 + # elif colors[dot]==red: + # ina += 1 + # acc_parts.append((p + 1, acc, ina)) + + # # some stats + # dot_area = 4 * pi * (float(radius) / 1000)**2 / nump + # area = (possibles * dot_area) + # total = (self.contour() / 1000 * 2 * pi * float(radius) / 1000 + 4 * pi + # * (float(radius) / 1000)**2) + # if verbose: + # print((' Accessible surface: %s micrometers^2' + + # '(%s accessible times %s micrometers)') % ( + # round(area, 2), possibles, dot_area)) + # print(' (%s accessible dots of %s total times %s micrometers)' % ( + # possibles, outdot.count(False), round(dot_area, 5))) + # print(' - %s%% of the contour mesh' % ( + # round((float(possibles)/outdot.count(False))*100, 2))) + # print(' - %s%% of a 
virtual straight chromatin (%s microm^2)' % ( + # round((area/total)*100, 2), round(total, 2))) + + # # write cmm file + # if savefig: + # view_mesh = True + # if write_cmm_file or view_mesh: + # out = '\n' + # form = ('\n') + # for k_2, thing in enumerate(dots): + # out += form % (1 + k_2, thing[0], thing[1], thing[2], + # colors[k_2][0], colors[k_2][1], colors[k_2][2]) + # if superradius: + # for k_3, thing in enumerate(superdots): + # out += form % (1 + k_3 + k_2 + 1, + # thing[0], thing[1], thing[2], + # 0.1, 0.1, 0.1) + # out += '\n' + # if view_mesh: + # out_f = open('/tmp/tmp_mesh.cmm', 'w') + # out_f.write(out) + # out_f.close() + # if write_cmm_file: + # out_f = open(write_cmm_file, 'w') + # out_f.write(out) + # out_f.close() + # if view_mesh: + # chimera_cmd = [ + # 'focus', + # 'bonddisplay never #1', + # 'shape tube #1 radius 15 bandLength 300 segmentSubdivisions 1 followBonds on', + # '~show #1', + # 'set bg_color white', 'windowsize 800 600', + # 'clip yon -500', 'set subdivision 1', 'set depth_cue', + # 'set dc_color black', 'set dc_start 0.5', 'set dc_end 1', + # 'scale 0.8'] + # if savefig: + # if savefig.endswith('.png'): + # chimera_cmd += ['copy file %s png' % (savefig)] + # elif savefig[-4:] in ('.mov', 'webm'): + # chimera_cmd += [ + # 'movie record supersample 1', 'turn y 3 120', + # 'wait 120', 'movie stop', + # 'movie encode output %s' % savefig] + # self.write_cmm('/tmp/') + # chimera_view(['/tmp/tmp_mesh.cmm', + # '/tmp/model.%s.cmm' % (self['rand_init'])], + # chimera_bin=chimera_bin, align=False, + # savefig=savefig, chimera_cmd=chimera_cmd) + + # return (possibles, outdot.count(False), area, total, acc_parts) + + + # def write_cmm(self, directory, color='index', rndname=True, + # model_num=None, filename=None, **kwargs): + # """ + # Save a model in the cmm format, read by Chimera + # (http://www.cgl.ucsf.edu/chimera). + + # **Note:** If none of model_num, models or cluster parameter are set, + # ALL the models will be written. + + # :param directory: location where the file will be written (note: the + # name of the file will be model_1.cmm if model number is 1) + # :param None model_num: the number of the model to save + # :param True rndname: If True, file names will be formatted as: + # model.RND.cmm, where RND is the random number feed used by IMP to + # generate the corresponding model. If False, the format will be: + # model_NUM_RND.cmm where NUM is the rank of the model in terms of + # objective function value + # :param None filename: overide the default file name writing + # :param 'index' color: can be: + + # * a string as: + # * '**index**' to color particles according to their position in the + # model (:func:`tadphys.utils.extraviews.color_residues`) + # * '**tad**' to color particles according to the TAD they belong to + # (:func:`tadphys.utils.extraviews.tad_coloring`) + # * '**border**' to color particles marking borders. Color according to + # their score (:func:`tadphys.utils.extraviews.tad_border_coloring`) + # coloring function like. + # * a function, that takes as argument a model and any other parameter + # passed through the kwargs. + # * a list of (r, g, b) tuples (as long as the number of particles). + # Each r, g, b between 0 and 1. 
+ # :param kwargs: any extra argument will be passed to the coloring + # function + # """ + # if isinstance(color, basestring): + # if color == 'index': + # color = color_residues(self, **kwargs) + # elif color == 'tad': + # if not 'tads' in kwargs: + # raise Exception('ERROR: missing TADs\n ' + + # 'pass an Experiment.tads disctionary\n') + # color = tad_coloring(self, **kwargs) + # elif color == 'border': + # if not 'tads' in kwargs: + # raise Exception('ERROR: missing TADs\n ' + + # 'pass an Experiment.tads disctionary\n') + # color = tad_border_coloring(self, **kwargs) + # else: + # raise NotImplementedError(('%s type of coloring is not yet ' + + # 'implemeted\n') % color) + # elif hasattr(color, '__call__'): # it's a function + # color = color(self, **kwargs) + # elif not isinstance(color, list): + # raise TypeError('one of function, list or string is required\n') + # out = '\n' % (self['rand_init']) + # form = ('\n') + # for i in range(len(self['x'])): + # out += form % (i + 1, + # self['x'][i], self['y'][i], self['z'][i], + # color[i][0], color[i][1], color[i][2], i + 1) + # form = ('\n') + # break_chroms = [1] + # try: + # for beg, end in zip(self['description']['start'],self['description']['end']): + # break_chroms.append((end - beg)/self['description']['resolution']+break_chroms[-1]) + # except: + # pass + # for i in range(1, len(self['x'])): + # if i in break_chroms[1:]: + # continue + # out += form % (i, i + 1) + # out += '\n' + + # if filename: + # out_f = open('%s/%s' % (directory, filename), 'w') + # else: + # if rndname: + # out_f = open('%s/model.%s.cmm' % (directory, + # self['rand_init']), 'w') + # else: + # out_f = open('%s/model_%s_rnd%s.cmm' % ( + # directory, model_num, self['rand_init']), 'w') + # out_f.write(out) + # out_f.close() + + +# def write_xyz_OLD(self, directory, model_num=None, get_path=False, +# rndname=True): +# """ +# Writes a xyz file containing the 3D coordinates of each particle in the +# model. + +# **Note:** If none of model_num, models or cluster parameter are set, +# ALL the models will be written. + +# :param directory: location where the file will be written (note: the +# file name will be model.1.xyz, if the model number is 1) +# :param None model_num: the number of the model to save +# :param True rndname: If True, file names will be formatted as: +# model.RND.xyz, where RND is the random number feed used by IMP to +# generate the corresponding model. If False, the format will be: +# model_NUM_RND.xyz where NUM is the rank of the model in terms of +# objective function value +# :param False get_path: whether to return, or not, the full path where +# the file has been written +# """ +# if rndname: +# path_f = '%s/model.%s.xyz' % (directory, self['rand_init']) +# else: +# path_f = '%s/model_%s_rnd%s.xyz' % (directory, model_num, +# self['rand_init']) +# out = '' +# form = "%12s%12s%12.3f%12.3f%12.3f\n" +# for i in range(len(self['x'])): +# out += form % ('p' + str(i + 1), i + 1, round(self['x'][i], 3), +# round(self['y'][i], 3), round(self['z'][i], 3)) +# out_f = open(path_f, 'w') +# out_f.write(out) +# out_f.close() +# if get_path: +# return path_f +# else: +# return None + + + +# def write_json(self, directory, color='index', rndname=True, +# model_num=None, title=None, filename=None, **kwargs): +# """ +# Save a model in the json format, read by TADkit. + +# **Note:** If none of model_num, models or cluster parameter are set, +# ALL the models will be written. 
+ +# :param directory: location where the file will be written (note: the +# name of the file will be model_1.cmm if model number is 1) +# :param None model_num: the number of the model to save +# :param True rndname: If True, file names will be formatted as: +# model.RND.cmm, where RND is the random number feed used by IMP to +# generate the corresponding model. If False, the format will be: +# model_NUM_RND.cmm where NUM is the rank of the model in terms of +# objective function value +# :param None filename: overide the default file name writing +# :param 'index' color: can be: + +# * a string as: +# * '**index**' to color particles according to their position in the +# model (:func:`tadphys.utils.extraviews.color_residues`) +# * '**tad**' to color particles according to the TAD they belong to +# (:func:`tadphys.utils.extraviews.tad_coloring`) +# * '**border**' to color particles marking borders. Color according to +# their score (:func:`tadphys.utils.extraviews.tad_border_coloring`) +# coloring function like. +# * a function, that takes as argument a model and any other parameter +# passed through the kwargs. +# * a list of (r, g, b) tuples (as long as the number of particles). +# Each r, g, b between 0 and 1. +# :param kwargs: any extra argument will be passed to the coloring +# function +# """ +# if isinstance(color, basestring): +# if color == 'index': +# color = color_residues(self, **kwargs) +# elif color == 'tad': +# if not 'tads' in kwargs: +# raise Exception('ERROR: missing TADs\n ' + +# 'pass an Experiment.tads disctionary\n') +# color = tad_coloring(self, **kwargs) +# elif color == 'border': +# if not 'tads' in kwargs: +# raise Exception('ERROR: missing TADs\n ' + +# 'pass an Experiment.tads disctionary\n') +# color = tad_border_coloring(self, **kwargs) +# else: +# raise NotImplementedError(('%s type of coloring is not yet ' + +# 'implemeted\n') % color) +# elif hasattr(color, '__call__'): # it's a function +# color = color(self, **kwargs) +# elif not isinstance(color, list): +# raise TypeError('one of function, list or string is required\n') +# form = ''' +# { +# "chromatin" : { +# "id" : %(sha)s, +# "title" : "%(title)s", +# "source" : "Tadphys %(version)s", +# "metadata": {%(descr)s}, +# "type" : "tadphys", +# "data": { +# "models": [ +# { %(xyz)s }, +# ], +# "clusters":[%(cluster)s], +# "centroid":[%(centroid)s], +# "restraints": [[][]], +# "chromatinColor" : [ ] +# } +# } +# } +# ''' +# fil = {} +# fil['sha'] = hashlib.new(fil['xyz']).hexdigest() +# fil['title'] = title or "Sample TADbit data" +# fil['version'] = version +# fil['descr'] = ''.join('\n', ',\n'.join([ +# '"%s" : %s' % (k, ('"%s"' % (v)) if isinstance(v, basestring) else v) +# for k, v in list(self.get('description', {}).items())]), '\n') +# fil['xyz'] = ','.join(['[%.4f,%.4f,%.4f]' % (self['x'][i], self['y'][i], +# self['z'][i]) +# for i in range(len(self['x']))]) + + +# if filename: +# out_f = open('%s/%s' % (directory, filename), 'w') +# else: +# if rndname: +# out_f = open('%s/model.%s.cmm' % (directory, +# self['rand_init']), 'w') +# else: +# out_f = open('%s/model_%s_rnd%s.cmm' % ( +# directory, model_num, self['rand_init']), 'w') +# out_f.write(out) +# out_f.close() + + + +# def write_xyz(self, directory, model_num=None, get_path=False, +# rndname=True, filename=None, header=True): +# """ +# Writes a xyz file containing the 3D coordinates of each particle in the +# model. 
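Note that the commented ``write_json`` draft above fills ``fil['sha']`` from ``fil['xyz']`` before that key exists, passes data to ``hashlib.new()`` where an algorithm name is expected, and never renders ``out = form % fil``. A reworked sketch of the dictionary-filling step, under those observations (helper name hypothetical)::

    import hashlib

    def fill_tadkit_fields(model, title=None, version='0.1'):
        # build the coordinate string first, then hash it
        fil = {}
        fil['xyz'] = ','.join('[%.4f,%.4f,%.4f]' % (model['x'][i], model['y'][i],
                                                    model['z'][i])
                              for i in range(len(model['x'])))
        fil['sha'] = hashlib.sha1(fil['xyz'].encode('utf-8')).hexdigest()
        fil['title'] = title or 'Sample TADbit data'
        fil['version'] = version
        fil['descr'] = ',\n'.join('"%s" : %s' % (k, '"%s"' % v if isinstance(v, str) else v)
                                  for k, v in model.get('description', {}).items())
        return fil  # ready to be rendered with `out = form % fil`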
+# Outfile is tab separated column with the bead number being the +# first column, then the genomic coordinate and finally the 3 +# coordinates X, Y and Z + +# **Note:** If none of model_num, models or cluster parameter are set, +# ALL the models will be written. + +# :param directory: location where the file will be written (note: the +# file name will be model.1.xyz, if the model number is 1) +# :param None model_num: the number of the model to save +# :param True rndname: If True, file names will be formatted as: +# model.RND.xyz, where RND is the random number feed used by IMP to +# generate the corresponding model. If False, the format will be: +# model_NUM_RND.xyz where NUM is the rank of the model in terms of +# objective function value +# :param False get_path: whether to return, or not, the full path where +# the file has been written +# :param None filename: overide the default file name writing +# :param True header: write a header describing the experiment from which +# """ +# if filename: +# path_f = '%s/%s' % (directory, filename) +# else: +# if rndname: +# path_f = '%s/model.%s.xyz' % (directory, self['rand_init']) +# else: +# path_f = '%s/model_%s_rnd%s.xyz' % (directory, model_num, +# self['rand_init']) +# out = '' +# if header: +# out += model_header(self) +# form = "%s\t%s\t%.3f\t%.3f\t%.3f\n" +# # TODO: do not use resolution directly -> specific to Hi-C +# chrom_list = self['description']['chromosome'] +# chrom_start = self['description']['start'] +# chrom_end = self['description']['end'] +# if not isinstance(chrom_list, list): +# chrom_list = [chrom_list] +# chrom_start = [chrom_start] +# chrom_end = [chrom_end] + +# chrom_start = [(int(c) // int(self['description']['resolution']) +# if int(c) else 0) +# for c in chrom_start] +# chrom_end = [(int(c) // int(self['description']['resolution']) +# if int(c) else len(self['x'])) +# for c in chrom_end] + +# offset = -chrom_start[0] +# for crm in range(len(chrom_list)): +# for i in range(chrom_start[crm] + offset, chrom_end[crm] + offset): +# out += form % ( +# i + 1, +# '%s:%s-%s' % ( +# chrom_list[crm], +# int(chrom_start[crm] or 0) + +# int(self['description']['resolution']) * (i - offset) + 1, +# int(chrom_start[crm] or 0) + +# int(self['description']['resolution']) * (i + 1 - offset)), +# round(self['x'][i], 3), +# round(self['y'][i], 3), round(self['z'][i], 3)) +# offset += (chrom_end[crm] - chrom_start[crm]) +# out_f = open(path_f, 'w') +# out_f.write(out) +# out_f.close() +# if get_path: +# return path_f +# else: +# return None + + # def write_xyz_babel(self, directory, model_num=None, get_path=False, + # rndname=True, filename=None): + # """ + # Writes a xyz file containing the 3D coordinates of each particle in the + # model using a file format compatible with babel + # (http://openbabel.org/wiki/XYZ_%28format%29). + # Outfile is tab separated column with the bead number being the + # first column, then the genomic coordinate and finally the 3 + # coordinates X, Y and Z + # **Note:** If none of model_num, models or cluster parameter are set, + # ALL the models will be written. + # :param directory: location where the file will be written (note: the + # file name will be model.1.xyz, if the model number is 1) + # :param None model_num: the number of the model to save + # :param True rndname: If True, file names will be formatted as: + # model.RND.xyz, where RND is the random number feed used by IMP to + # generate the corresponding model. 
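For reference, the tab-separated layout that the ``write_xyz`` draft above would produce (bead number, genomic span, then X, Y and Z) looks like this for a 50 kb resolution model; header line omitted and coordinates invented::

    1    chr21:1-50000         101.335    -12.078    33.921
    2    chr21:50001-100000     98.102    -10.554    37.410
    3    chr21:100001-150000    95.678     -8.912    40.003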
If False, the format will be: + # model_NUM_RND.xyz where NUM is the rank of the model in terms of + # objective function value + # :param False get_path: whether to return, or not, the full path where + # the file has been written + # :param None filename: overide the default file name writing + # """ + # if filename: + # path_f = '%s/%s' % (directory, filename) + # else: + # if rndname: + # path_f = '%s/model.%s.xyz' % (directory, self['rand_init']) + # else: + # path_f = '%s/model_%s_rnd%s.xyz' % (directory, model_num, + # self['rand_init']) + # out = '' + # # Write header as number of atoms + # out += str(len(self['x'])) + # # Write comment as type of molecule + # out += "\nDNA\n" + + # form = "%s\t%.3f\t%.3f\t%.3f\n" + # # TODO: do not use resolution directly -> specific to Hi-C + # chrom_list = self['description']['chromosome'] + # chrom_start = self['description']['start'] + # chrom_end = self['description']['end'] + # if not isinstance(chrom_list, list): + # chrom_list = [chrom_list] + # chrom_start = [chrom_start] + # chrom_end = [chrom_end] + # chrom_start = [int(c)/int(self['description']['resolution']) for c in chrom_start] + # chrom_end = [int(c)/int(self['description']['resolution']) for c in chrom_end] + # offset = 0 + # for crm in range(len(chrom_list)): + # for i in range(chrom_start[crm]+offset,chrom_end[crm]+offset): + # out += form % ( + # i + 1, + # '%s:%s-%s' % ( + # chrom_list[crm], + # int(chrom_start[crm] or 0) + + # int(self['description']['resolution']) * (i - offset) + 1, + # int(chrom_start[crm] or 0) + + # int(self['description']['resolution']) * (i + 1 - offset)), + # round(self['x'][i], 3), + # round(self['y'][i], 3), round(self['z'][i], 3)) + # offset += (chrom_end[crm]-chrom_start[crm]) + # out_f = open(path_f, 'w') + # out_f.write(out) + # out_f.close() + # if get_path: + # return path_f + # else: + # return None + + # def view_model(self, tool='chimera', savefig=None, cmd=None, + # center_of_mass=False, gyradius=False, color='index', + # **kwargs): + # """ + # Visualize a selected model in the three dimensions. (either with Chimera + # or through matplotlib). + + # :param model_num: model to visualize + # :param 'chimera' tool: path to the external tool used to visualize the + # model. Can also be 'plot', to use matplotlib. + # :param None savefig: path to a file where to save the image OR movie + # generated (depending on the extension; accepted formats are png, mov + # and webm). if set to None, the image or movie will be shown using + # the default GUI. + # :param 'index' color: can be: + + # * a string as: + # * '**index**' to color particles according to their position in the + # model (:func:`tadphys.utils.extraviews.color_residues`) + # * '**tad**' to color particles according to the TAD they belong to + # (:func:`tadphys.utils.extraviews.tad_coloring`) + # * '**border**' to color particles marking borders. Color according to + # their score (:func:`tadphys.utils.extraviews.tad_border_coloring`) + # coloring function like. + # * a function, that takes as argument a model and any other parameter + # passed through the kwargs. + # * a list of (r, g, b) tuples (as long as the number of particles). + # Each r, g, b between 0 and 1. + # :param False center_of_mass: draws a dot representing the center of + # mass of the model + # :param False gyradius: draws the center of mass of the model as a sphere + # with radius equal to the radius of gyration of the model + # :param None cmd: list of commands to be passed to the viewer. 
+ # The chimera list is: + + # :: + + # focus + # set bg_color white + # windowsize 800 600 + # bonddisplay never #0 + # represent wire + # shape tube #0 radius 5 bandLength 100 segmentSubdivisions 1 followBonds on + # clip yon -500 + # ~label + # set subdivision 1 + # set depth_cue + # set dc_color black + # set dc_start 0.5 + # set dc_end 1 + # scale 0.8 + + # Followed by the movie command to record movies: + + # :: + + # movie record supersample 1 + # turn y 3 120 + # wait 120 + # movie stop + # movie encode output SAVEFIG + + # Or the copy command for images: + + # :: + + # copy file SAVEFIG png + + # Passing as the following list as 'cmd' parameter: + # :: + + # cmd = ['focus', 'set bg_color white', 'windowsize 800 600', + # 'bonddisplay never #0', + # 'shape tube #0 radius 10 bandLength 200 segmentSubdivisions 100 followBonds on', + # 'clip yon -500', '~label', 'set subdivision 1', + # 'set depth_cue', 'set dc_color black', 'set dc_start 0.5', + # 'set dc_end 1', 'scale 0.8'] + + # will return the default image (other commands can be passed to + # modified the final image/movie). + + # :param kwargs: see :func:`tadphys.utils.extraviews.plot_3d_model` or + # :func:`tadphys.utils.extraviews.chimera_view` for other arguments + # to pass to this function + + # """ + # if gyradius: + # gyradius = self.radius_of_gyration() + # center_of_mass = True + # if tool == 'plot': + # x, y, z = self['x'], self['y'], self['z'] + # plot_3d_model(x, y, z, color=color, **kwargs) + # return + # self.write_cmm('/tmp/', color=color, **kwargs) + # chimera_view(['/tmp/model.%s.cmm' % (self['rand_init'])], + # savefig=savefig, chimera_bin=tool, chimera_cmd=cmd, + # center_of_mass=center_of_mass, gyradius=gyradius) diff --git a/taddyn/modelling/structuralmodel.py b/tadphys/modelling/structuralmodel.py~ similarity index 100% rename from taddyn/modelling/structuralmodel.py rename to tadphys/modelling/structuralmodel.py~ diff --git a/tadphys/utils/__init__.py b/tadphys/utils/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/tadphys/utils/extraviews.py b/tadphys/utils/extraviews.py new file mode 100644 index 0000000..6f9e25f --- /dev/null +++ b/tadphys/utils/extraviews.py @@ -0,0 +1,314 @@ +""" +06 Aug 2013 +""" + +from warnings import warn +from subprocess import Popen +from itertools import product + +import numpy as np + +try: + from matplotlib.ticker import MultipleLocator + from matplotlib import pyplot as plt + from mpl_toolkits.mplot3d import Axes3D +except ImportError: + warn('matplotlib not found\n') + +def my_round(num, val=4): + num = round(float(num), val) + return str(int(num) if num == int(num) else num) + + +def nicer(res, sep=' ', comma='', allowed_decimals=0): + """ + writes resolution number for human beings. + :param ' ' sep: character between number and unit (e.g. 
default: '125 kb') + :param '' comma: character to separate groups of thousands + :param 0 allowed_decimals: if 1 '1900 kb' would be written as '1.9 Mb' + """ + format = lambda x: '{:,g}'.format(x).replace(',', comma) + + if not res: + return format(res) + sep + 'b' + if not res % 10**(9 - allowed_decimals): + return format(res / 10.**9) + sep + 'Gb' + if not res % 10**(6 - allowed_decimals): + return format(res / 10.**6) + sep + 'Mb' + if not res % 10**(3 - allowed_decimals): + return format(res / 10.**3) + sep + 'kb' + return format(res) + sep + 'b' + + +def tadbit_savefig(savefig): + try: + form = savefig[-4:].split('.')[1] + except IndexError: # no dot in file name + warn('WARNING: file extension not found saving in png') + form = 'png' + if not form in ['png', 'pdf', 'ps', 'eps', 'svg']: + raise NotImplementedError('File extension must be one of %s' %( + ['png', 'pdf', 'ps', 'eps', 'svg'])) + plt.savefig(savefig, format=form) + + +def plot_2d_optimization_result(result, + axes=('scale', 'kbending', 'maxdist', 'lowfreq', + 'upfreq'), + dcutoff=None, show_best=0, skip=None, + savefig=None,clim=None, cmap='inferno'): + + """ + A grid of heatmaps representing the result of the optimization. In the optimization + up to 5 parameters can be optimized: 'scale', 'kbending', 'maxdist', 'lowfreq', and 'upfreq'. + The maps will be divided in different pages depending on the 'scale' and 'kbending' values. + In each page there will be different maps depending the 'maxdist' values. + Each map has 'upfreq' values along the x-axes, and 'lowfreq' values along the y-axes. + + :param result: 3D numpy array contating the computed correlation values + :param 'scale','kbending','maxdist','lowfreq','upfreq' axes: tuple of axes + to represent. The order is important here. It will define which parameter + will be placed respectively on the v, w, z, y, or x axes. + :param 0 show_best: number of best correlation value to highlight in the heatmaps. + The best correlation is highlithed by default + :param None skip: a dict can be passed here in order to fix a given parameter value, + e.g.: {'scale': 0.001, 'kbending': 30, 'maxdist': 500} will represent all the + correlation values at fixed 'scale', 'kbending', and 'maxdist' values, + respectively equal to 0.001, 30, and 500. + :param None dcutoff: The distance cutoff (dcutoff) used to compute the contact matrix + in the models. + :param None savefig: path to a file where to save the generated image. + If None, the image will be displayed using matplotlib GUI. NOTE: the extension + of the file name will automatically determine the desired format. + :param None clim: color scale. If None, the max and min values of the input are used. + :param inferno cmap: matplotlib colormap + + """ + + from mpl_toolkits.axes_grid1 import AxesGrid + import matplotlib.patches as patches + + ori_axes, axes_range, result = result + + # Commands for compatibility with the OLD version: + #print axes_range + if len(axes_range) == 4: + tmp_axes_range = axes_range + tmp_axes_range[1] = [0.0] # kbending !!!New option!!! 
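A few worked calls of the ``nicer`` helper defined earlier in this file, traced from its divisibility checks (values chosen for illustration)::

    nicer(125000)                        # -> '125 kb'
    nicer(1900000, allowed_decimals=1)   # -> '1.9 Mb'
    nicer(2000000, sep='')               # -> '2Mb'
    nicer(0)                             # -> '0 b'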
+ len_kbending_range = 1 + for i in xrange(len(ori_axes)): + if ori_axes[i] == 'scale': + tmp_axes_range[0] = axes_range[i] # scale + len_scale_range = len(axes_range[i]) + scale_index = i + if ori_axes[i] == 'maxdist': + tmp_axes_range[2] = axes_range[i] # maxdist + len_maxdist_range = len(axes_range[i]) + maxdist_index = i + if ori_axes[i] == 'lowfreq': + tmp_axes_range[3] = axes_range[i] # lowfreq + len_lowfreq_range = len(axes_range[i]) + lowfreq_index = i + if ori_axes[i] == 'upfreq': + tmp_axes_range[4] = axes_range[i] # upfreq + len_upfreq_range = len(axes_range[i]) + upfreq_index = i + #print axes_range + + tmp_result = np.empty((len_scale_range , len_kbending_range, len_maxdist_range, + len_lowfreq_range, len_upfreq_range)) + + indeces_sets = product(list(range(len(axes_range[0]))), + list(range(len(axes_range[1]))), + list(range(len(axes_range[2]))), + list(range(len(axes_range[3])))) + + for indeces_set in indeces_sets: + tmp_indeces_set = [0, 0, 0, 0, 0] + tmp_indeces_set[0] = indeces_set[scale_index] # scale + tmp_indeces_set[1] = 0 # kbending + tmp_indeces_set[2] = indeces_set[maxdist_index] # maxdist + tmp_indeces_set[3] = indeces_set[lowfreq_index] # lowfreq + tmp_indeces_set[4]= indeces_set[upfreq_index] # upfreq + tmp_result[tmp_indeces_set] = result[indeces_set] + + ori_axes = ('scale', 'kbending', 'maxdist', 'lowfreq', 'upfreq') + axes_range = tmp_axes_range + result = tmp_result + + trans = [ori_axes.index(a) for a in axes] + axes_range = [axes_range[i] for i in trans] + # transpose results + result = result.transpose(trans) + # set NaNs + result = np.ma.array(result, mask=np.isnan(result)) + cmap = plt.get_cmap(cmap) + cmap.set_bad('w', 1.) + + # defines axes + if clim: + vmin=clim[0] + vmax=clim[1] + else: + vmin = result.min() + vmax = result.max() + + round_decs = 6 + # Here we round the values in axes_range and pass from the + # 5 parameters to the cartesian axes names. + vax = [my_round(i, round_decs) for i in axes_range[0]] # scale + wax = [my_round(i, round_decs) for i in axes_range[1]] # kbending + zax = [my_round(i, round_decs) for i in axes_range[2]] # maxdist + yax = [my_round(i, round_decs) for i in axes_range[3]] # lowfreq + xax = [my_round(i, round_decs) for i in axes_range[4]] # upfreq + + # This part marks the set of best correlations that the + # user wants to be highlighted in the plot + vax_range = list(range(len(vax)))[::-1] # scale + wax_range = list(range(len(wax)))[::-1] # kbending + zax_range = list(range(len(zax))) # maxdist + yax_range = list(range(len(yax))) # lowfreq + xax_range = list(range(len(xax))) # upfreq + indeces_sets = product(vax_range, wax_range, + zax_range, yax_range, + xax_range) + + sort_result = sorted([(result[indeces_set],vax[indeces_set[0]],wax[indeces_set[1]], + zax[indeces_set[2]],yax[indeces_set[3]],xax[indeces_set[4]]) + for indeces_set in indeces_sets if str(result[indeces_set]) != '--'], + key=lambda x: x[0], reverse=True)[:show_best+1] + + # This part allows the user to "skip" some parameters to show. + # This means to fix the value of certain parameters. 
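As a usage sketch of this skipping mechanism (all parameter values invented): fixing ``scale`` through ``skip`` collapses the grid to a single page, while ``show_best`` highlights the top correlations::

    # `result` is the (ori_axes, axes_range, correlation-array) tuple
    # returned by the optimizer; values below are purely illustrative
    plot_2d_optimization_result(result,
                                axes=('scale', 'kbending', 'maxdist',
                                      'lowfreq', 'upfreq'),
                                skip={'scale': 0.01},
                                show_best=5, dcutoff=200,
                                savefig='optimization_grid.png')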
+ skip = {} if not skip else skip + for i, parameter in enumerate(axes): + if not parameter in skip: + continue + if i == 0: + vax_range = [vax.index(skip[parameter])] + elif i == 1: + wax_range = [wax.index(skip[parameter])] + elif i == 2: + zax_range = [zax.index(skip[parameter])] + else: + raise Exception(('ERROR: skip keys must be one of the three first' + + ' keywords passed as axes parameter')) + + # best number of rows/columns + ncols = len(zax_range) + nrows = len(vax_range) * len(wax_range) + + # width and height of each heatmap. These dimensions of each heatmap + # depend on the number of values on the x-axes, len(xax), related to + # 'upfreq', and on the y-axes, len(yax), related to 'lowfreq'. width and + # height are also multiplied by the ncols, that is the number of + # heatmaps per row (one for each value of 'maxdist'), and nrows, that is + # the number of heatmaps per column (one for each combination of 'scale' and + # 'kbending' values). + width = max(4, (float(ncols) * len(xax)) / 3) + height = max(3, (float(nrows) * len(yax)) / 3) + #print 4,float(ncols)*len(xax) / 3,width + #print 3,float(nrows)*len(yax) / 3,height + # Definition of the heatmap object + heatmap = plt.figure(figsize=(width, height)) + + # Here we define the grid of heatmaps. + grid = AxesGrid(heatmap, [.2, .2, .6, .5], + nrows_ncols = (nrows + 1, ncols + 1), + axes_pad = 0.0, + label_mode = "1", + share_all = False, + cbar_location="right", + cbar_mode="single", + # cbar_size="%s%%" % (20./ width), + cbar_pad="30%", + ) + cell = ncols + used = [] + + for row in product(vax_range,wax_range): + cell+=1 + + for column in zax_range: + used.append(cell) + # Setting the values in the heatmap + im = grid[cell].imshow(result[row[0], row[1], column, :, :], + interpolation="nearest", origin='lower', + vmin=vmin, vmax=vmax, cmap=cmap) + + # Setting the ticks of the heatmap + grid[cell].tick_params(axis='both', direction='out', top=False, + right=False, left=False, bottom=False) + + for j, best in enumerate(sort_result[:-1], 1): + if best[1] == vax[row[0]] and best[2] == wax[row[1]] and best[3] == zax[column]: + #print j, best, vax[row[0]], wax[row[1]], zax[column] + grid[cell].text(xax.index(best[5]), yax.index(best[4]), str(j), + {'ha':'center', 'va':'center'}, size=8) + + if row[0] == vax_range[0] and row[1] == wax_range[0]: + rect = patches.Rectangle((-0.5, len(yax)-0.5), len(xax), 1.5, + facecolor='grey', alpha=0.5) + rect.set_clip_on(False) + grid[cell].add_patch(rect) + # Set up label in the heatmap (for maxdist) + if column == 0: + #print "Cell number",cell + grid[cell].text(- (len(xax) / 2 + 0.5), len(yax)+0.25, + axes[2], + {'ha':'center', 'va':'center'}, size=8) + + grid[cell].text(len(xax) / 2. 
- 0.5, len(yax)+0.25, + str(my_round(zax[column], round_decs)), + {'ha':'center', 'va':'center'}, size=8) + + cell += 1 + + rect = patches.Rectangle((len(xax)-.5, -0.5), 2.5, len(yax), + facecolor='grey', alpha=0.5) + # Define the rectangles for + #print dcutoff + rect.set_clip_on(False) + grid[cell-1].add_patch(rect) + grid[cell-1].text(len(xax) + 1.0, len(yax) / 2., + str(my_round(vax[row[0]], round_decs)) + '\n' + + str(my_round(wax[row[1]], round_decs)) + '\n' + + str(my_round(dcutoff, round_decs)), + {'ha':'center', 'va':'center'}, + rotation=90, size=8) + + grid[cell-1].text(len(xax) - 0.2, len(yax) + 1.2, + axes[0] + '\n' + axes[1] + '\ndcutoff', + {'ha':'left', 'va':'center'}, + rotation=90, size=8) + + # + for i in range(cell+1): + if not i in used: + grid[i].set_visible(False) + + # This affects the axes of all the heatmaps, because the flag set share_all + # is set equal to True. + # grid.axes_llc.set_ylim(-0.5, len(yax)+1) + + grid.axes_llc.set_xticks(list(range(0, len(xax), 2))) + grid.axes_llc.set_yticks(list(range(0, len(yax), 2))) + grid.axes_llc.set_xticklabels([my_round(i, round_decs) for i in xax][::2], size=9) + grid.axes_llc.set_yticklabels([my_round(i, round_decs) for i in yax][::2], size=9) + grid.axes_llc.set_xlabel(axes[4], size=9) + grid.axes_llc.set_ylabel(axes[3], size=9) + + # Color bar settings + grid.cbar_axes[0].colorbar(im) + grid.cbar_axes[0].set_ylabel('Correlation value', size=9) + grid.cbar_axes[0].tick_params(labelsize=9) + + title = 'Optimal parameters\n' + heatmap.suptitle(title, size=12) + + #plt.tight_layout() + if savefig: + tadbit_savefig(savefig) + else: + plt.show() + diff --git a/tadphys/utils/hic_filtering.py b/tadphys/utils/hic_filtering.py new file mode 100644 index 0000000..a096e1c --- /dev/null +++ b/tadphys/utils/hic_filtering.py @@ -0,0 +1,445 @@ +""" +06 Aug 2013 +""" +from __future__ import print_function # (at top of module) +from warnings import warn +from sys import stderr +from re import sub + + +import numpy as np + +try: + from matplotlib import pyplot as plt +except ImportError: + warn('matplotlib not found\n') + + +def get_r2 (fun, X, Y, *args): + sstot = sum([(Y[i]-np.mean(Y))**2 for i in xrange(len(Y))]) + sserr = sum([(Y[i] - fun(X[i], *args))**2 for i in xrange(len(Y))]) + return 1 - sserr/sstot + + +def filter_by_mean(matrx, draw_hist=False, silent=False, bads=None): + """ + fits the distribution of Hi-C interaction count by column in the matrix to + a polynomial. 
Then searches for the first possible + """ + nbins = 100 + if not bads: + bads = {} + # get sum of columns + cols = [] + size = len(matrx) + for c in sorted([[matrx.get(i+j*size, 0) for j in xrange(size) if not j in bads] + for i in xrange(size) if not i in bads], key=sum): + cols.append(sum(c)) + cols = np.array(cols) + if draw_hist: + plt.figure(figsize=(9, 9)) + try: + percentile = np.percentile(cols, 5) + except IndexError: + warn('WARNING: no columns to filter out') + return bads + # mad = np.median([abs(median - c ) for c in cols]) + best =(None, None, None, None) + # bin the sum of columns + xmin = min(cols) + xmax = max(cols) + y = np.linspace(xmin, xmax, nbins) + hist = np.digitize(cols, y) + x = [sum(hist == i) for i in range(1, nbins + 1)] + if draw_hist: + hist = plt.hist(cols, bins=100, alpha=.3, color='grey') + xp = range(0, int(cols[-1])) + # check if the binning is correct + # we want at list half of the bins with some data + try: + cnt = 0 + while list(x).count(0) > len(x)/2: + cnt += 1 + cols = cols[:-1] + xmin = min(cols) + xmax = max(cols) + y = np.linspace(xmin, xmax, nbins) + hist = np.digitize(cols, y) + x = [sum(hist == i) for i in range(1, nbins + 1)] + if draw_hist: + plt.clf() + hist = plt.hist(cols, bins=100, alpha=.3, color='grey') + xp = range(0, int(cols[-1])) + if cnt > 10000: + raise ValueError + # find best polynomial fit in a given range + for order in range(6, 18): + z = np.polyfit(y, x, order) + zp = np.polyder(z, m=1) + roots = np.roots(np.polyder(z)) + # check that we are concave down, otherwise take next root + pente = np.polyval(zp, abs(roots[-2] - roots[-1]) / 2 + roots[-1]) + if pente > 0: + root = roots[-1] + else: + root = roots[-2] + # root must be higher than zero + if root <= 0: + continue + # and lower than the median + if root >= percentile: + continue + p = np.poly1d(z) + R2 = get_r2(p, x, y) + # try to avoid very large orders by weigthing negatively their fit + if order > 13: + R2 -= float(order)/30 + if best[0] < R2: + best = (R2, order, p, z, root) + try: + p, z, root = best[2:] + if draw_hist: + xlims = plt.xlim() + ylims = plt.ylim() + a = plt.plot(xp, p(xp), "--", color='k') + b = plt.vlines(root, 0, plt.ylim()[1], colors='r', linestyles='dashed') + # c = plt.vlines(median - mad * 1.5, 0, 110, colors='g', + # linestyles='dashed') + try: + plt.legend(a+[b], ['polyfit \n%s' % ( + ''.join([sub('e([-+][0-9]+)', 'e^{\\1}', + '$%s%.1fx^%s$' % ('+' if j>0 else '', j, + '{' + str(i) + '}')) + for i, j in enumerate(list(p)[::-1])])), + 'first solution of polynomial derivation'], + fontsize='x-small') + except TypeError: + plt.legend(a+[b], ['polyfit \n%s' % ( + ''.join([sub('e([-+][0-9]+)', 'e^{\\1}', + '$%s%.1fx^%s$' % ('+' if j>0 else '', j, + '{' + str(i) + '}')) + for i, j in enumerate(list(p)[::-1])])), + 'first solution of polynomial derivation']) + # plt.legend(a+[b]+[c], ['polyfit \n{}'.format ( + # ''.join([sub('e([-+][0-9]+)', 'e^{\\1}', + # '${}{:.1}x^{}$'.format ('+' if j>0 else '', j, + # '{' + str(i) + '}')) + # for i, j in enumerate(list(p)[::-1])])), + # 'first solution of polynomial derivation', + # 'median - (1.5 * median absolute deviation)'], + # fontsize='x-small') + plt.ylim([0, ylims[1]]) + plt.xlim(xlims) + plt.xlabel('Sum of interactions') + plt.xlabel('Number of columns with a given value') + plt.show() + # label as bad the columns with sums lower than the root + for i, col in enumerate([[matrx.get(i+j*size, 0) + for j in xrange(size)] + for i in xrange(size)]): + if sum(col) < root: + bads[i] = sum(col) + # now 
stored in Experiment._zeros, used for getting more accurate z-scores + if bads and not silent: + stderr.write(('\nWARNING: removing columns having less than %s ' + + 'counts:\n %s\n') % ( + round(root, 3), ' '.join( + ['%5s'%str(i + 1) + (''if (j + 1) % 20 else '\n') + for j, i in enumerate(sorted(bads.keys()))]))) + except: + if not silent: + stderr.write('WARNING: Too many zeroes to filter columns.' + + ' SKIPPING...\n') + if draw_hist: + plt.xlabel('Sum of interactions') + plt.xlabel('Number of columns with a given value') + plt.show() + except ValueError: + if not silent: + stderr.write('WARNING: Too few data to filter columns based on ' + + 'mean value.\n') + if draw_hist: + plt.close('all') + return bads + + +def filter_by_zero_count(matrx, perc_zero, min_count=None, silent=True): + """ + :param matrx: Hi-C matrix of a given experiment + :param perc: percentage of cells with no count allowed to consider a column + as valid. + :param None min_count: minimum number of reads mapped to a bin (recommended + value could be 2500). If set this option overrides the perc_zero + filtering... This option is slightly slower. + + :returns: a dicitionary, which has as keys the index of the filtered out + columns. + """ + bads = {} + size = len(matrx) + if min_count is None: + cols = [size for i in xrange(size)] + for k in matrx: + cols[k / size] -= 1 + min_val = int(size * float(perc_zero) / 100) + else: + if matrx.symmetricized: + min_count *= 2 + cols = [0 for i in xrange(size)] + for k, v in matrx.iteritems(): # linear representation of the matrix + cols[k / size] += v + min_val = size - min_count + if min_count is None: + check = lambda x: x > min_val + else: + check = lambda x: x < min_count + for i, col in enumerate(cols): + if check(col): + bads[i] = True + if bads and not silent: + if min_count is None: + stderr.write(('\nWARNING: removing columns having more than %s ' + + 'zeroes:\n %s\n') % ( + min_val, ' '.join( + ['%5s' % str(i + 1) + (''if (j + 1) % 20 else '\n') + for j, i in enumerate(sorted(bads.keys()))]))) + else: + stderr.write(('\nWARNING: removing columns having less than %s ' + + 'counts:\n %s\n') % ( + int(size - min_val), ' '.join( + ['%5s' % str(i + 1) + (''if (j + 1) % 20 else '\n') + for j, i in enumerate(sorted(bads.keys()))]))) + return bads + + +def hic_filtering_for_modelling(matrx, silent=False, perc_zero=90, auto=True, + min_count=None, draw_hist=False, + diagonal=True): + """ + Call filtering function, to remove artifactual columns in a given Hi-C + matrix. This function will detect columns with very low interaction + counts; and columns with NaN values (in this case NaN will be replaced + by zero in the original Hi-C data matrix). Filtered out columns will be + stored in the dictionary Experiment._zeros. + + :param matrx: Hi-C matrix of a given experiment + :param False silent: does not warn for removed columns + + :param True diagonal: remove row/columns with zero in the diagonal + :param 90 perc_zero: maximum percentage of cells with no interactions + allowed. + :param None min_count: minimum number of reads mapped to a bin (recommended + value could be 2500). If set this option overrides the perc_zero + filtering... This option is slightly slower. 
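These filters are written against TADbit's ``HiC_data`` mapping rather than a plain dict: cells are looked up as ``i + j * size`` with ``get``, ``iteritems`` and a ``symmetricized`` attribute are used, and ``len()`` is expected to return the number of bins per side, not the number of stored cells. A minimal Python 2 stand-in (class name hypothetical) that satisfies that contract::

    class SparseHiC(dict):
        """Hypothetical stand-in for the HiC_data mapping these filters expect:
        cells stored as {i + j * size: count}, len() gives the bin count."""
        def __init__(self, cells, nbins):
            dict.__init__(self, cells)
            self.nbins = nbins
            self.symmetricized = False  # read when a min_count is given

        def __len__(self):
            return self.nbins

    hic = SparseHiC({0: 629, 5: 612, 10: 437, 15: 278}, nbins=4)
    # left commented: a 4-bin toy matrix trips the 'too few data' branch
    # bads, has_nans = hic_filtering_for_modelling(hic, perc_zero=90)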
+ :param True auto: if False, only filters based on the given percentage + zeros + + :returns: the indexes of the columns not to be considered for the + calculation of the z-score + """ + bads = filter_by_zero_count(matrx, perc_zero, min_count=min_count, silent=silent) + if auto: + bads.update(filter_by_mean(matrx, draw_hist=draw_hist, silent=silent, + bads=bads)) + # also removes rows or columns containing a NaN + has_nans = False + for i in xrange(len(matrx)): + if matrx.get(i + i * len(matrx), 0) == 0 and diagonal: + if not i in bads: + bads[i] = None + elif repr(sum([matrx.get(i + j * len(matrx), 0) + for j in xrange(len(matrx))])) == 'nan': + has_nans = True + if not i in bads: + bads[i] = None + return bads, has_nans + + +def _best_window_size(sorted_prc, size, beg, end, verbose=False): + """ + Search for best window size. + Between given begin and end percentiles of the distribution of cis interactions + searches for a window size (number of bins) where all median values are between + median * stddev and median * stddev of the global measure. + + :param sorted_prc: list of percentages of cis interactions by bins, sorted + by the total interactions in the corresponding bins. + :param size: total number of bins + :param beg: starting position of the region with expected 'normal' behavior + of the cis-percentage + :param end: last position of the region with expected 'normal' behavior + of the cis-percentage + :param False verbose: print running information + + :returns: window size + """ + if verbose: + print (' -> defining window in number of bins to average values of\n' + ' percentage of cis interactions') + nwins = min((1000, size / 10)) + if nwins < 100: + warn('WARNING: matrix probably too small to automatically filter out bins\n') + win_size = 0 + prevn = 0 + count = 0 + # iterate over possible window sizes (use logspace to gain some time) + for n in np.logspace(1, 4, num=100): + n = int(n) + if n == prevn: + continue + prevn = n + + tmp_std = [] + tmp_med = [] + for k in xrange(int(size * beg), int(size * end), + (int(size * end) - int(size * beg)) / nwins): + vals = sorted_prc[k:k + n] + tmp_std.append(np.std(vals)) + tmp_med.append(np.median(vals)) + med_mid = np.median([tmp_med[i] for i in xrange(nwins)]) + results = [m - s < med_mid < m + s + for m, s in zip(tmp_med, tmp_std)] + + # if verbose: + # print ' -', n, med_mid, sum(results) + if all(results): + if not count: + win_size = n + count += 1 + if count == 10: + break + else: + count = 0 + + if verbose: + print(' * first window size with stable median of cis-percentage: %d' % (win_size)) + return win_size + + +def filter_by_cis_percentage(cisprc, beg=0.3, end=0.8, sigma=2, verbose=False, + size=None, min_perc=None, max_perc=None): + """ + Define artifactual columns with either too low or too high counts of + interactions by compraing their percentage of cis interactions + (inter-chromosomal). + + :param cisprc: dictionary with counts of cis-percentage by bin number. + Values of the dictionary are tuple with,m as first element the number + of cis interactions and as second element the total number of + interactions. 
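To see the cutoff logic of ``filter_by_cis_percentage`` at work, its ``cisprc`` input can be emulated with synthetic counts; here every bin gets a Poisson total with a roughly 75% cis fraction (numbers invented; the call assumes the module's Python 2 idioms such as ``xrange``)::

    import numpy as np

    rng = np.random.RandomState(0)
    totals = rng.poisson(1000, size=500)   # total interactions per bin
    cis = rng.binomial(totals, 0.75)       # ~75% of them in cis
    cisprc = dict((i, (int(c), int(t)))
                  for i, (c, t) in enumerate(zip(cis, totals)))
    # badcol = filter_by_cis_percentage(cisprc, beg=0.3, end=0.8, sigma=2,
    #                                   verbose=True)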
+ :param 0.3 beg: proportion of bins to be considered as possibly having low + counts + :param 0.8 end: proportion of bins to be considered as possibly having high + counts + :param 2 sigma: number of standard deviations used to define lower and upper + ranges in the varaition of the percentage of cis interactions + :param None size: size of the genome, inumber of bins (otherwise inferred + from cisprc dictionary) + :param None sevefig: path to save image of the distribution of cis + percentages and total counts by bin. + + :returns: dictionary of bins to be filtered out (with either too low or too + high counts of interactions). + """ + sorted_sum, indices = zip(*sorted((cisprc[i][1], i) for i in cisprc)) + + sorted_prc = [float(cisprc[i][0]) / cisprc[i][1] for i in indices] + + size = (max(indices) + 1) if not size else size + + win_size = _best_window_size(sorted_prc, size, beg, end, verbose=verbose) + + # define confidance bands, compute median plus/minus one standard deviation + errors_pos = [] + errors_neg = [] + for k in xrange(0, size, 1): + vals = sorted_prc[k:k+win_size] + std = np.std(vals) + med = np.median(vals) + errors_pos.append(med + std * sigma) + errors_neg.append(med - std * sigma) + + # calculate median and variation of median plus/minus one standard deviation + # for values between percentile 10 and 90 of the distribution of the + # percentage of cis interactions + # - for median plus one standard deviation + std_err_pos = np.std (errors_pos[int(size * beg):int(size * end)]) + med_err_pos = np.median(errors_pos[int(size * beg):int(size * end)]) + # - for median minus one standard deviation + std_err_neg = np.std (errors_neg[int(size * beg):int(size * end)]) + med_err_neg = np.median(errors_neg[int(size * beg):int(size * end)]) + + # define cutoffs, values of cis percentage plus 1 stddev should be between + # the general median +/- 2 stddev of the distribution of the cis percentage + # plus 1 stddev. Same on the side of median cis percentage minus 1 stddev + beg_pos = med_err_pos - std_err_pos * sigma + end_pos = med_err_pos + std_err_pos * sigma + beg_neg = med_err_neg - std_err_neg * sigma + end_neg = med_err_neg + std_err_neg * sigma + + cutoffL = None + passed = 0 + consecutive = 10 + for cutoffL, (p, n) in enumerate(zip(errors_pos, errors_neg)): + # print '%6.4f %6.4f %6.4f %6.4f %6.4f %6.4f' % (beg_pos, p, end_pos, beg_neg, n, end_neg) + if (beg_pos < p < end_pos) and (beg_neg < n < end_neg): + if passed >= consecutive: + break + passed += 1 + else: + passed = 0 + else: + raise Exception('ERROR: left cutoff not found!!!') + cutoffL -= consecutive # rescale, we asked for XX consecutive + + # right + cutoffR = None + passed = 0 + for cutoffR, (p, n) in enumerate(zip(errors_pos, errors_neg)[::-1]): + cutoffR = size - cutoffR + # print '%6.4f %6.4f %6.4f %6.4f %6.4f %6.4f' % (beg_pos, p, end_pos, beg_neg, n, end_neg) + if (beg_pos < p < end_pos) and (beg_neg < n < end_neg): + if passed >= consecutive: + break + passed += 1 + else: + passed = 0 + else: + raise Exception('ERROR: right cutoff not found!!!') + cutoffR += consecutive # rescale, we asked for XX consecutive + + if min_perc: + cutoffL = min_perc / 100. * size + if max_perc: + cutoffR = max_perc / 100. 
* size
+
+    min_count = sorted_sum[int(cutoffL)]
+    try:
+        max_count = sorted_sum[int(cutoffR)]
+    except IndexError: # all good
+        max_count = sorted_sum[-1] + 1
+
+    if verbose:
+        print(' * Lower cutoff applied until bin number: %d' % (cutoffL))
+        print(' * too few interactions defined as less than %9d interactions' % (min_count))
+        print(' * Upper cutoff applied until bin number: %d' % (cutoffR))
+        print(' * too many interactions defined as more than %9d interactions' % (max_count))
+
+    # plot
+
+    badcol = {}
+    countL = 0
+    countZ = 0
+    countU = 0
+    for c in xrange(size):
+        if cisprc.get(c, [0, 0])[1] < min_count:
+            badcol[c] = cisprc.get(c, [0, 0])[1]
+            countL += 1
+            if not c in cisprc:
+                countZ += 1
+        elif cisprc[c][1] > max_count: # don't need get here, already caught by the previous condition
+            badcol[c] = cisprc.get(c, [0, 0])[1]
+            countU += 1
+    print(' => %d BAD bins (%d/%d/%d null/low/high counts) of %d (%.1f%%)' % (len(badcol), countZ, countL, countU, size, float(len(badcol)) / size * 100))
+
+    return badcol
diff --git a/tadphys/utils/hic_parser.py b/tadphys/utils/hic_parser.py
new file mode 100644
index 0000000..c5e597c
--- /dev/null
+++ b/tadphys/utils/hic_parser.py
@@ -0,0 +1,391 @@
+from __future__ import print_function
+
+from future import standard_library
+from builtins import next
+standard_library.install_aliases()
+from sys import stderr
+from io import IOBase
+from collections import OrderedDict
+from warnings import warn
+from math import sqrt, isnan
+
+import numpy as np
+
+try:
+    file_types = file, IOBase
+except NameError:
+    file_types = (IOBase,)
+
+try:
+    basestring
+except NameError:
+    basestring = str
+
+HIC_DATA = True
+
+
+class AutoReadFail(Exception):
+    """
+    Exception to handle failed autoreader.
+    """
+    pass
+
+
+def is_asymmetric(matrix):
+    """
+    Helper function for the autoreader.
+ """ + maxn = len(matrix) + for i in range(maxn): + maxi = matrix[i] # slightly more efficient + for j in range(i+1, maxn): + if maxi[j] != matrix[j][i]: + if isnan(maxi[j]) and isnan(matrix[j][i]): + continue + return True + return False + +def symmetrize_dico(hic): + """ + Make an HiC_data object symmetric by summing two halves of the matrix + """ + ncol = len(hic) + for i in xrange(ncol): + incol = i * ncol + for j in xrange(i, ncol): + p1 = incol + j + p2 = j * ncol + i + val = hic.get(p1, 0) + hic.get(p2, 0) + if val: + hic[p1] = hic[p2] = val + + +def symmetrize(matrix): + """ + Make a matrix symmetric by summing two halves of the matrix + """ + maxn = len(matrix) + for i in range(maxn): + for j in range(i, maxn): + matrix[i][j] = matrix[j][i] = matrix[i][j] + matrix[j][i] + +def __read_file_header(f): + """ + Read file header, inside first commented lines of a file + + :returns masked dict, chromsomes orderedDict, crm, beg, end, resolution: + """ + masked = {} + chromosomes = OrderedDict() + crm, beg, end, reso = None, None, None, None + fpos = f.tell() + for line in f: + if line[0] != '#': + break + fpos += len(line) + if line.startswith('# MASKED'): + try: + masked = dict([(int(n), True) for n in line.split()[-1].split(',')]) + except ValueError: # nothing here + pass + elif line.startswith('# CRM'): + _, _, crm, size = line.split() + chromosomes[crm] = int(size) + elif 'resolution:' in line: + _, coords, reso = line.split() + try: + crm, pos = coords.split(':') + beg, end = list(map(int, pos.split('-'))) + except ValueError: + crm = coords + beg, end = None, None + reso = int(reso.split(':')[1]) + f.seek(fpos) + if crm == 'full': + crm = None + return masked, chromosomes, crm, beg, end, reso + + +def abc_reader(f): + """ + Read matrix stored in 3 column format (bin1, bin2, value) + + :param f: an iterable (typically an open file). + + :returns: An iterator to be converted in dictionary, matrix size, raw_names + as list of tuples (chr, pos), dictionary of masked bins, and boolean + reporter of symetric transformation + """ + masked, chroms, crm, beg, end, reso = __read_file_header(f) # TODO rest of it not used here + sections = {} + size = 0 + for c in chroms: + sections[c] = size + size += chroms[c] // reso + 1 + if beg: + header = [(crm, '%d-%d' % (l * reso + 1, (l + 1) * reso)) + for l in range(beg, end)] + else: + header = [(c, '%d-%d' % (l * reso + 1, (l + 1) * reso)) + for c in chroms + for l in range(sections[c], sections[c] + chroms[c] // reso + 1)] + num = int if HIC_DATA else float + offset = (beg or 0) * (1 + size) + def _disect(x): + a, b, v = x.split() + return (int(a) + int(b) * size + offset, num(v)) + items = tuple(_disect(line) for line in f) + return items, size, header, masked, False + + +def __is_abc(f): + """ + Only works for matrices with more than 3 bins + """ + fpos = f.tell() + count = 0 + for line in f: + if line.startswith('#'): + continue + count += 1 + if len(line.split()) != 3: + f.seek(fpos) + return False + if count > 3: + f.seek(fpos) + return True + f.seek(fpos) + return False + +def autoreader(f): + """ + Auto-detect matrix format of HiC data file. + + :param f: an iterable (typically an open file). + + :returns: An iterator to be converted in dictionary, matrix size, raw_names + as list of tuples (chr, pos), dictionary of masked bins, and boolean + reporter of symetric transformation + """ + masked = __read_file_header(f)[0] # TODO rest of it not used here + + # Skip initial comment lines and read in the whole file + # as a list of lists. 
+ line = next(f) + items = [line.split()] + [line.split() for line in f] + + # Count the number of elements per line after the first. + # Wrapping in a set is a trick to make sure that every line + # has the same number of elements. + S = set([len(line) for line in items[1:]]) + ncol = S.pop() + # If the set 'S' is not empty, at least two lines have a + # different number of items. + if S: + raise AutoReadFail('ERROR: unequal column number') + + # free little memory + del(S) + + nrow = len(items) + # Auto-detect the format, there are only 4 cases. + if ncol == nrow: + try: + _ = [float(item) for item in items[0] + if not item.lower() in ['na', 'nan']] + # Case 1: pure number matrix. + header = False + trim = 0 + except ValueError: + # Case 2: matrix with row and column names. + header = True + trim = 1 + warn('WARNING: found header') + else: + if len(items[0]) == len(items[1]): + # Case 3: matrix with row information. + header = False + trim = ncol - nrow + # warn('WARNING: found %d colum(s) of row names' % trim) + else: + # Case 4: matrix with header and row information. + header = True + trim = ncol - nrow + 1 + warn('WARNING: found header and %d colum(s) of row names' % trim) + # Remove header line if needed. + if header and not trim: + header = items.pop(0) + nrow -= 1 + elif not trim: + header = list(range(1, nrow + 1)) + elif not header: + header = [tuple([a for a in line[:trim]]) for line in items] + else: + del(items[0]) + nrow -= 1 + header = [tuple([a for a in line[:trim]]) for line in items] + # Get the numeric values and remove extra columns + num = int if HIC_DATA else float + try: + items = [[num(a) for a in line[trim:]] for line in items] + except ValueError: + if not HIC_DATA: + raise AutoReadFail('ERROR: non numeric values') + try: + # Dekker data 2009, uses integer but puts a comma... + items = [[int(float(a)+.5) for a in line[trim:]] for line in items] + warn('WARNING: non integer values') + except ValueError: + try: + # Some data may contain 'NaN' or 'NA' + items = [ + [0 if a.lower() in ['na', 'nan'] + else int(float(a)+.5) for a in line[trim:]] + for line in items] + warn('WARNING: NA or NaN founds, set to zero') + except ValueError: + raise AutoReadFail('ERROR: non numeric values') + + # Check that the matrix is square. + ncol -= trim + if ncol != nrow: + raise AutoReadFail('ERROR: non square matrix') + + symmetricized = False + if is_asymmetric(items): + warn('WARNING: matrix not symmetric: summing cell_ij with cell_ji') + symmetrize(items) + symmetricized = True + return (((i + j * ncol, a) for i, line in enumerate(items) + for j, a in enumerate(line) if a), + ncol, header, masked, symmetricized) + + +def _header_to_section(header, resolution): + """ + converts row-names of the form 'chr12\t1000-2000' into sections, suitable + to create HiC_data objects. 
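Reconstructing from ``__read_file_header`` and ``abc_reader`` above, a minimal three-column ('abc') input would start like this; the chromosome name, size, masked bins and counts are all invented, and the ``full`` coordinate form is used since it is the one the header parser treats as genome-wide::

    # MASKED 3,7
    # CRM chr21 46709983
    # full resolution:50000
    0 0 128
    0 1 57
    1 1 201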
Also creates chromosomes, from the reads + """ + sections = {} + chromosomes = None + if (isinstance(header, list) + and isinstance(header[0], tuple) + and len(header[0]) > 1): + chromosomes = OrderedDict() + for i, h in enumerate(header): + if '-' in h[1]: + a, b = list(map(int, h[1].split('-'))) + if resolution==1: + resolution = abs(b - a) + 1 + elif resolution != abs(b - a) + 1: + raise Exception('ERROR: found different resolution, ' + + 'check headers') + else: + a = int(h[1]) + if resolution==1 and i: + resolution = abs(a - b) + 1 + elif resolution == 1: + b = a + sections[(h[0], a // resolution)] = i + chromosomes.setdefault(h[0], 0) + chromosomes[h[0]] += 1 + return chromosomes, sections, resolution + + +def read_matrix(things, parser=None, hic=True, resolution=1, size=None, + **kwargs): + """ + Read and checks a matrix from a file (using + :func:`taddyn.utils.hic_parser.autoreader`) or a list. + + :param things: might be either a file name, a file handler or a list of + list (all with same length) + :param None parser: a parser function that returns a tuple of lists + representing the data matrix, + with this file example.tsv: + :: + + chrT_001 chrT_002 chrT_003 chrT_004 + chrT_001 629 164 88 105 + chrT_002 86 612 175 110 + chrT_003 159 216 437 105 + chrT_004 100 111 146 278 + + the output of parser('example.tsv') might be: + ``([629, 86, 159, 100, 164, 612, 216, 111, 88, 175, 437, 146, 105, 110, + 105, 278])`` + + :param 1 resolution: resolution of the matrix + :param True hic: if False, assumes that files contains normalized data + :param None size: size of the square matrix, needed if we read a dictionary + :returns: the corresponding matrix concatenated into a huge list, also + returns number or rows + + """ + global HIC_DATA + HIC_DATA = hic + if not isinstance(things, list): + things = [things] + matrices = [] + for thing in things: + if isinstance(thing, dict): + HiC_data = {'matrix':thing, 'size': size, 'masked':{}} + matrices.append(HiC_data) + elif isinstance(thing, file_types): + parser = parser or (abc_reader if __is_abc(thing) else autoreader) + matrix, size, header, masked, sym = parser(thing) + thing.close() + _, _, resolution = _header_to_section(header,resolution) + matrix = dict((pos, val) for pos, val in matrix) + HiC_data = {'matrix':matrix, 'size': size, 'masked':masked} + matrices.append(HiC_data) + elif isinstance(thing, basestring): + try: + with open(thing) as f_thing: + parser = parser or (abc_reader if __is_abc(f_thing) else autoreader) + matrix, size, header, masked, sym = parser(f_thing) + except IOError: + if len(thing.split('\n')) > 1: + parser = parser or (abc_reader if __is_abc(thing.split('\n')) else autoreader) + matrix, size, header, masked, sym = parser(thing.split('\n')) + else: + raise IOError('\n ERROR: file %s not found\n' % thing) + _, _, resolution = _header_to_section(header, resolution) + matrix = dict((pos, val) for pos, val in matrix) + HiC_data = {'matrix':matrix, 'size': size, 'masked':masked} + matrices.append(HiC_data) + elif isinstance(thing, list): + if all([len(thing)==len(l) for l in thing]): + size = len(thing) + matrix = dict((i + j * size, v) for i, l in enumerate(thing) + for j, v in enumerate(l) if v) + else: + raise Exception('must be list of lists, all with same length.') + HiC_data = {'matrix':matrix, 'size': size, 'masked':{}} + matrices.append(HiC_data) + elif isinstance(thing, tuple): + # case we know what we are doing and passing directly list of tuples + matrix = thing + siz = sqrt(len(thing)) + if int(siz) != 
siz: + raise AttributeError('ERROR: matrix should be square.\n') + size = int(siz) + matrix = dict((pos, val) for pos, val in matrix) + HiC_data = {'matrix':matrix, 'size': size, 'masked':{}} + matrices.append(HiC_data) + elif isinstance(thing, (np.ndarray, np.generic) ): + try: + row, col = thing.shape + if row != col: + raise Exception('matrix needs to be square.') + sqrt_matrix = thing.reshape(-1).tolist()[0] + size = row + matrix = dict((i + j * size, v) for i, l in enumerate(sqrt_matrix) + for j, v in enumerate(l) if v) + except Exception as exc: + print('Error found:', exc) + HiC_data = {'matrix':matrix, 'size': size, 'masked':{}} + matrices.append(HiC_data) + else: + raise Exception('Unable to read this file or whatever it is :)') + return matrices diff --git a/tadphys/utils/maths.py b/tadphys/utils/maths.py new file mode 100644 index 0000000..b204501 --- /dev/null +++ b/tadphys/utils/maths.py @@ -0,0 +1,88 @@ +from math import log10 +import numpy as np + +def transform(val): + with np.errstate(divide='ignore'): + return np.log10(val) + + +def nozero_log(values): + # Set the virtual minimum of the matrix to half the non-null real minimum + minv = float(min([v for v in list(values.values()) if v])) / 2 + # if minv > 1: + # warn('WARNING: probable problem with normalization, check.\n') + # minv /= 2 # TODO: something better + logminv = transform(minv) + for i in values: + try: + values[i] = transform(values[i]) + except ValueError: + values[i] = logminv + + +def nozero_log_list(values): + # Set the virtual minimum of the matrix to half the non-null real minimum + try: + if not np.isfinite(transform(0)): + raise Exception() + minv = 0. + except: + try: + minv = float(min([v for v in values if v])) / 2 + except ValueError: + minv = 1 + # if minv > 1: + # warn('WARNING: probable problem with normalization, check.\n') + # minv /= 2 # TODO: something better + logminv = transform(minv) + return [transform(v) if v else logminv for v in values] + + +def nozero_log_matrix(values, transformation): + # Set the virtual minimum of the matrix to half the non-null real minimum + try: + if not np.isfinite(transform(0)): + raise Exception() + minv = 0. + except: + try: + minv = float(min([v for l in values for v in l + if v and not np.isnan(v)])) / 2 + except ValueError: + minv = 1 + logminv = transformation(minv) + return [[transformation(v) if v else logminv for v in l] for l in values] + + +def zscore(values): + """ + Calculates the log10, Z-score of a given list of values. + + .. 
note:: + + _______________________/___ + / + / + / + / + / + / + / + / + / + / + / + / + / score + ___/_________________________________ + / + + """ + # get the log trasnform values + nozero_log(values) + mean_v = np.mean(list(values.values())) + std_v = np.std (list(values.values())) + # replace values by z-score + for i in values: + values[i] = (values[i] - mean_v) / std_v + diff --git a/tadphys/utils/modelAnalysis.py b/tadphys/utils/modelAnalysis.py new file mode 100644 index 0000000..375f73c --- /dev/null +++ b/tadphys/utils/modelAnalysis.py @@ -0,0 +1,195 @@ +from taddyn.squared_distance_matrix import squared_distance_matrix_calculation_wrapper +from scipy.stats import spearmanr, pearsonr, chisquare +from taddyn.utils.tadmaths import calinski_harabasz, nozero_log_list +from itertools import combinations +from pickle import load, dump, HIGHEST_PROTOCOL + + +def get_contact_matrix(tdm, + stage=None, cutoff=None, + show_bad_columns=True): + """ + Returns a matrix with the number of interactions observed below a given + cutoff distance. + + :param tdm: Dictionary with TADdyn output model info + :param None stage: compute the contact matrix only for the models in + stage number 'stage' + :param None cutoff: distance cutoff (nm) to define whether two particles + are in contact or not, default is 2 times resolution, times scale. + Cutoff can also be a list of values, in wich case the returned object + will be a dictionnary of matrices (keys being square cutoffs) + :param True show_bad_columns: show bad columns in contact map + + :returns: matrix frequency of interaction + """ + + if stage > -1 and stage in tdm['stages']: + models = [m for m in tdm['stages'][stage]] + else: + models = tdm['models'].keys() + if not cutoff: + cutoff = [int(2 * tdm['resolution'] * tdm['config']['scale'])] + #cutoff = [int(2)] # * tdm.resolution * tdm.config['scale'])] + cutoff_list = True + if not isinstance(cutoff, list): + cutoff = [cutoff] + cutoff_list = False + cutoff.sort(reverse=True) + cutoff = [c**2 for c in cutoff] + matrix = dict([(c, [[0. for _ in xrange(tdm['loci'])] + for _ in xrange(tdm['loci'])]) for c in cutoff]) + # remove (or not) interactions from bad columns + if show_bad_columns: + wloci = [i for i in xrange(tdm['loci']) if tdm['zeros'][0][i]] + else: + wloci = [i for i in xrange(tdm['loci'])] + models = [tdm['models'][mdl] for mdl in tdm['models']] + + frac = 1.0 / len(models) + #print "#Frac",frac + + all_matrix = [] + for model in models: + #print model + squared_distance_matrix = squared_distance_matrix_calculation_wrapper( + model['x'], model['y'], model['z'], tdm['loci']) + #print model, len(x), len(y), len(z) + for c in cutoff: + #print "#Cutoff",c + for i, j in combinations(wloci, 2): + if squared_distance_matrix[i][j] <= c: + matrix[c][i][j] += frac # * 100 + matrix[c][j][i] += frac # * 100 + + if cutoff_list: + return matrix + return matrix.values()[0] + + +def correlate_with_real_data(tdm, models=None, cluster=None, + stage=None, index=0, + dynamics=False, cutoff=None, + off_diag=1, plot=False, axe=None, savefig=None, + corr='spearman', midplot='hexbin', + log_corr=True, contact_matrix=None, + cmap='viridis', show_bad_columns=True): + """ + Plots the result of a correlation between a given group of models and + original Hi-C data. + + :param tdm: Dictionary with TADdyn output model info + :param None models: if None (default) the correlation will be computed + using all the models. 
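Two quirks of ``get_contact_matrix`` above are worth spelling out: when ``cutoff`` is omitted it defaults to twice the resolution times the scale, but the resulting dictionary is keyed by the *squared* cutoffs; and, as written, passing an explicit list leaves ``cutoff_list`` unassigned, so only the default path returns the multi-cutoff form. A usage sketch, assuming a ``tdm`` result dictionary as described above::

    cut = int(2 * tdm['resolution'] * tdm['config']['scale'])

    # a single int -> one frequency matrix
    contacts = get_contact_matrix(tdm, cutoff=cut)

    # default (cutoff=None) -> dict of matrices keyed by cut ** 2
    by_cutoff = get_contact_matrix(tdm)
    matrix = by_cutoff[cut ** 2]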
A list of numbers corresponding to a given set + of models can be passed + :param None cluster: compute the correlation only for the models in the + cluster number 'cluster' + :param None dynamics: compute the correlation for all the stages + :param None cutoff: distance cutoff (nm) to define whether two particles + are in contact or not, default is 2 times resolution, times scale. + :param None savefig: path to a file where to save the image generated; + if None, the image will be shown using matplotlib GUI (the extension + of the file name will determine the desired format). + :param False plot: to display the plot + :param True log_corr: log plot for correlation + :param None axe: a matplotlib.axes.Axes object to define the plot + appearance + :param None contact_matrix: input a contact matrix instead of computing + it from the models + :param 'viridis' cmap: The Colormap instance + :param True show_bad_columns: Wether to hide or not bad columns in the + contact map + + :returns: correlation coefficient rho, between the two + matrices. A rho value greater than 0.7 indicates a very good + correlation + """ + if dynamics: + if not savefig: + raise Exception('ERROR: dynamics should only be called ' + + 'with savefig option.\n') + return + if not isdir(savefig): + raise Exception('ERROR: savefig should ' + + 'be a folder with dynamics option.\n') + return + elif stage is not None and stage not in tdm['stages']: + raise Exception('ERROR: stage ' + + 'not found in stages.\n') + return + if not cutoff: + cutoff = int(2 * tdm['resolution'] * tdm['config']['scale']) + if contact_matrix: + all_original_data = [0] + all_model_matrix = [contact_matrix] + else: + if dynamics: + all_model_matrix = [] + all_original_data = [] + for st in range(0,int((len(tdm['stages'])-1)/tdm['models_per_step'])+1): + all_original_data.append(st) + all_model_matrix.append(get_contact_matrix(tdm, stage=int(st*tdm['models_per_step']), cutoff=cutoff, show_bad_columns=show_bad_columns)) + elif stage is not None: + all_original_data = [index] + all_model_matrix = [get_contact_matrix(tdm, stage=stage,cutoff=cutoff)] + else: + all_original_data = [index] + all_model_matrix = [get_contact_matrix(tdm, models=models, cluster=cluster, + cutoff=cutoff, show_bad_columns=show_bad_columns)] + correl = {} + for model_matrix, od in zip(all_model_matrix,all_original_data): + oridata = [] + moddata = [] + if len(model_matrix) == 0: + correl[od] = 'Nan' + continue + if dynamics: + original_data = tdm['original_data'][od] + elif stage is not None or len(tdm['stages']) > 0: + original_data = tdm['original_data'][od] + else: + original_data = tdm['original_data'] + for i in xrange(len(original_data)): + for j in xrange(i + off_diag, len(original_data)): + if not original_data[i][j] > 0: + continue + oridata.append(original_data[i][j]) + moddata.append(model_matrix[i][j]) + if corr == 'spearman': + correl[od] = spearmanr(moddata, oridata) + elif corr == 'pearson': + correl[od] = pearsonr(moddata, oridata) + elif corr == 'logpearson': + correl[od] = pearsonr(nozero_log_list(moddata), nozero_log_list(oridata)) + elif corr == 'chi2': + tmpcorr = chisquare(array(moddata), array(oridata)) + tmpcorr = 1. 
+
+
+def save_models(models, outfile, minimal=(), convertToDict=True):
+    """
+    Saves all the models in pickle format (python object written to disk).
+
+    :param models: dictionary of models to save
+    :param outfile: path to the file where to save the pickle
+    :param () minimal: list of items to exclude from the save. Options:
+       - 'restraints': used for modeling, common to all models
+       - 'zscores': used to generate restraints, common to all models
+       - 'original_data': used to generate Z-scores, common to all models
+       - 'log_objfun': generated during modeling, model specific
+    :param True convertToDict: convert LAMMPSmodel objects to dictionaries
+       of dictionaries. Needed to convert to TADbit format without installing
+       TADdyn
+    """
+    if convertToDict:
+        models['models'] = dict((mod, dict(models['models'][mod]))
+                                for mod in models['models'])
+    with open(outfile, 'wb') as out:
+        dump(models, out, HIGHEST_PROTOCOL)
diff --git a/tadphys/utils/tadmaths.py b/tadphys/utils/tadmaths.py
new file mode 100644
index 0000000..714ccc5
--- /dev/null
+++ b/tadphys/utils/tadmaths.py
@@ -0,0 +1,230 @@
+"""
+06 Aug 2013
+"""
+
+from bisect import bisect_left
+from itertools import combinations
+from math import log10, exp
+import numpy as np
+
+
+def mad(arr):
+    """
+    Median Absolute Deviation: a "robust" version of the standard deviation.
+    Indicates the variability of the sample.
+    https://en.wikipedia.org/wiki/Median_absolute_deviation
+    """
+    if not isinstance(arr, np.ndarray):
+        arr = np.array(arr)
+    arr = np.ma.array(arr).compressed()  # drop masked values if any
+    med = np.median(arr)
+    return np.median(np.abs(arr - med))
+
+
+def right_double_mad(arr):
+    """
+    Double Median Absolute Deviation: a "robust" version of the standard
+    deviation, computed on the values above the median only.
+    Indicates the variability of the sample.
+    http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers
+    """
+    if not isinstance(arr, np.ndarray):
+        arr = np.array(arr)
+    arr = np.ma.array(arr).compressed()  # drop masked values if any
+    med = np.median(arr)
+    right_mad = np.median(np.abs(arr[arr > med] - med))
+    return right_mad
+
+
+def newton_raphson(guess, contour, sq_length, jmax=2000, xacc=1e-12):
+    """
+    Newton-Raphson method as defined in:
+    http://www.maths.tcd.ie/~ryan/TeachingArchive/161/teaching/newton-raphson.c.html
+    used to search for the persistence length of a given model.
+
+    :param guess: starting value
+    :param contour: contour length of the model
+    :param sq_length: square of the distance between first and last particle
+    :param 2000 jmax: maximum number of iterations
+    :param 1e-12 xacc: precision of the optimization
+
+    :returns: persistence length
+    """
+    for _ in range(1, jmax):
+        contour_guess = contour / guess
+        expcx = exp(-contour_guess) - 1
+        # function
+        fx = 2 * pow(guess, 2) * (contour_guess + expcx) - sq_length
+        # derivative
+        df = 2 * contour * expcx + 4 * guess * (expcx + contour_guess)
+        dx = -fx / df
+        if abs(dx) < xacc:
+            return guess
+        guess += dx
+    raise Exception("ERROR: exceeded max tries, no root found\n")
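+
+# Usage sketch (illustrative numbers): the function solves the worm-like-chain
+# relation  R^2 = 2*P^2*(L/P + exp(-L/P) - 1)  for the persistence length P,
+# given a contour length L and a squared end-to-end distance R^2.
+#
+#     p_length = newton_raphson(10.0, 100.0, 2000.0)  # converges near 11.3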
+ """ + def __init__(self, x_list, y_list): + for i, (x, y) in enumerate(zip(x_list, x_list[1:])): + if y - x < 0: + raise ValueError("x must be in strictly ascending") + if y - x == 0 and i >= len(x_list)-2: + x_list = x_list[:-1] + y_list = y_list[:-1] + if any(y - x <= 0 for x, y in zip(x_list, x_list[1:])): + raise ValueError("x must be in strictly ascending") + x_list = self.x_list = map(float, x_list) + y_list = self.y_list = map(float, y_list) + intervals = zip(x_list, x_list[1:], y_list, y_list[1:]) + self.slopes = [(y2 - y1)/(x2 - x1) for x1, x2, y1, y2 in intervals] + + def __call__(self, x): + i = bisect_left(self.x_list, x) - 1 + return self.y_list[i] + self.slopes[i] * (x - self.x_list[i]) + + +def transform(val): + return log10(val) + +def nozero_log(values): + # Set the virtual minimum of the matrix to half the non-null real minimum + minv = float(min([v for v in values.values() if v])) / 2 + # if minv > 1: + # warn('WARNING: probable problem with normalization, check.\n') + # minv /= 2 # TODO: something better + logminv = transform(minv) + for i in values: + try: + values[i] = transform(values[i]) + except ValueError: + values[i] = logminv + +def nozero_log_list(values): + # Set the virtual minimum of the matrix to half the non-null real minimum + try: + transform(0) + minv = 0. + except: + try: + minv = float(min([v for v in values if v])) / 2 + except ValueError: + minv = 1 + # if minv > 1: + # warn('WARNING: probable problem with normalization, check.\n') + # minv /= 2 # TODO: something better + logminv = transform(minv) + return [transform(v) if v else logminv for v in values] + +def nozero_log_matrix(values, transformation): + # Set the virtual minimum of the matrix to half the non-null real minimum + try: + transform(0) + minv = 0. + except: + try: + minv = float(min([v for l in values for v in l + if v and not np.isnan(v)])) / 2 + except ValueError: + minv = 1 + logminv = transformation(minv) + return [[transformation(v) if v else logminv for v in l] for l in values] + + +def zscore(values): + """ + Calculates the log10, Z-score of a given list of values. + + .. note:: + + _______________________/___ + / + / + / + / + / + / + / + / + / + / + / + / + / score + ___/_________________________________ + / + + """ + # get the log trasnform values + nozero_log(values) + mean_v = np.mean(values.values()) + std_v = np.std (values.values()) + # replace values by z-score + for i in values: + values[i] = (values[i] - mean_v) / std_v + + +def calinski_harabasz(scores, clusters): + """ + Implementation of the CH score [CalinskiHarabasz1974]_, that has shown to be + one the most accurate way to compare clustering methods + [MilliganCooper1985]_ [Tibshirani2001]_. + + The CH score is: + + .. math:: + + CH(k) = \\frac{B(k) / (k-1)}{W(k)/(n - k)} + + Where :math:`B(k)` and :math:`W(k)` are the between and within cluster sums + of squares, with :math:`k` clusters, and :math:`n` the total number of + elements (models in this case). + + :param scores: a dict with, as keys, a tuple with a pair of models; and, as + value, the distance between these models. 
+
+
+def calinski_harabasz(scores, clusters):
+    """
+    Implementation of the CH score [CalinskiHarabasz1974]_, which has been
+    shown to be one of the most accurate ways to compare clustering methods
+    [MilliganCooper1985]_ [Tibshirani2001]_.
+
+    The CH score is:
+
+    .. math::
+
+        CH(k) = \\frac{B(k) / (k-1)}{W(k) / (n-k)}
+
+    where :math:`B(k)` and :math:`W(k)` are the between- and within-cluster
+    sums of squares, with :math:`k` clusters and :math:`n` the total number
+    of elements (models in this case).
+
+    :param scores: a dict with, as keys, a tuple with a pair of models, and,
+       as value, the distance between these models
+    :param clusters: a dict with, as key, the cluster number, and, as value,
+       a list of models
+
+    :returns: the CH score
+    """
+    cluster_list = [c for c in clusters if len(clusters[c]) > 1]
+    if len(cluster_list) <= 1:
+        return 0.0
+    nmodels = sum([len(clusters[c]) for c in cluster_list])
+
+    between_cluster = (sum([sum([sum([scores[(md1, md2)]**2
+                                      for md1 in clusters[cl1]])
+                                 for md2 in clusters[cl2]])
+                            / (len(clusters[cl1]) * len(clusters[cl2]))
+                            for cl1, cl2 in combinations(cluster_list, 2)])
+                       / ((len(cluster_list) - 1.0) / 2))
+
+    within_cluster = (sum([sum([scores[(md1, md2)]**2
+                                for md1, md2 in combinations(clusters[cls], 2)])
+                           / (len(clusters[cls]) * (len(clusters[cls]) - 1.0) / 2)
+                           for cls in cluster_list]))
+
+    return ((between_cluster / (len(cluster_list) - 1))
+            /
+            (within_cluster / (nmodels - len(cluster_list))))
+
+
+def mean_none(values):
+    """
+    Calculates the mean of a list of values, ignoring the Nones.
+
+    :param values: list of values
+
+    :returns: the mean value, or None if the list contains only Nones
+    """
+    values = [i for i in values if i is not None]
+    if values:
+        return float(sum(values)) / len(values)
+    return None
diff --git a/test/compartmentalization.py b/test/compartmentalization.py
new file mode 100644
index 0000000..d2612d3
--- /dev/null
+++ b/test/compartmentalization.py
@@ -0,0 +1,47 @@
+#awk -v nc=3 'BEGIN{print (10143*3000/(30*30*30)/0.00441745*nc)^(1/3); print nc*10143}'
+
+from tadphys.modelling.lammps_modelling import run_lammps
+
+# One chromosome of 20 Mb at 1 particle per kb
+chrlength = 20000000
+nparticles = int(chrlength / 1000)
+nchrs = 1
+chromosome_particle_numbers = [nparticles] * nchrs
+print(chrlength, nparticles, nchrs, chromosome_particle_numbers)
+
+# At this coarse-graining, 1 bead (~1 kb) has a diameter of ~14 nm
+density = 0.015
+side = (nparticles * 1000 * nchrs / (14 * 14 * 14) / density)**(1. / 3.)
+
+print("%d Chromosomes of length %d in a cube of side %s" % (nchrs, nparticles, side))
+
+epsilon = XXXepsilonXXX  # placeholder replaced by compartmentalization.sh via sed
+compartmentalization = {
+    # alternating 1000-bead blocks of particle types 1 and 2 along the chain
+    'partition': {1: [(1, 1000, 1), (2001, 3000, 1), (4001, 5000, 1),
+                      (6001, 7000, 1), (8001, 9000, 1), (10001, 11000, 1),
+                      (12001, 13000, 1), (14001, 15000, 1), (16001, 17000, 1),
+                      (18001, 19000, 1)],
+                  2: [(1001, 2000, 1), (3001, 4000, 1), (5001, 6000, 1),
+                      (7001, 8000, 1), (9001, 10000, 1), (11001, 12000, 1),
+                      (13001, 14000, 1), (15001, 16000, 1), (17001, 18000, 1),
+                      (19001, 20000, 1)]},
+    'radii': {1: 0.5,
+              2: 0.5},
+    'interactions': {(1, 1): ["attraction", float(epsilon)],
+                     (2, 2): ["attraction", float(epsilon)],
+                     (1, 2): ["repulsion", 1.0]},
+    'runtime': 1000000,
+}
+
+r = 1
+
+for replica in range(r, r + 1, 1):
+    initial_conformation = "../../0_generate_initial_conformation/Initial_rosette_conformation_with_pbc_replica_1.dat"
+
+    run_lammps(initial_conformation=initial_conformation,
+               minimize=True,
+               tethering=False,
+               compartmentalization=compartmentalization,
+               kseed=int(replica),
+               to_dump=10000,
+               pbc=True,
+               run_time=0,
+               lammps_folder="./",
+               chromosome_particle_numbers=[20000],
+               confining_environment=['cube', 80.618])
diff --git a/test/compartmentalization.sh b/test/compartmentalization.sh
new file mode 100644
index 0000000..d9b078a
--- /dev/null
+++ b/test/compartmentalization.sh
@@ -0,0 +1,12 @@
+for epsilon in $(seq 0.0 0.1 1.0);
+do
+    mkdir -p epsilon_${epsilon}kBT
+    cd epsilon_${epsilon}kBT
+    pwd
+
+    sed "s/XXXepsilonXXX/${epsilon}/g" ../compartmentalization.py | python > output_${epsilon}kBT.log &
+
+    cd ..
+done
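+# Note: each epsilon value of the sweep runs in its own epsilon_<value>kBT
+# directory, with the XXXepsilonXXX placeholder substituted by sed; jobs are
+# launched in the background (add 'wait' here if the script must block until
+# all runs finish).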
diff --git a/test/loop_extrusion.py b/test/loop_extrusion.py
new file mode 100644
index 0000000..55905f0
--- /dev/null
+++ b/test/loop_extrusion.py
@@ -0,0 +1,77 @@
+#awk -v nc=3 'BEGIN{print (10143*3000/(30*30*30)/0.00441745*nc)^(1/3); print nc*10143}'
+
+from tadphys.modelling.lammps_modelling import run_lammps
+
+# One chromosome of 20 Mb at 1 particle per kb
+chrlength = 20000000
+nparticles = int(chrlength / 1000)
+nchrs = 1
+chromosome_particle_numbers = [nparticles] * nchrs
+print(chrlength, nparticles, nchrs, chromosome_particle_numbers)
+
+# At this coarse-graining, 1 bead (~1 kb) has a diameter of ~14 nm
+density = 0.015
+side = (nparticles * 1000 * nchrs / (14 * 14 * 14) / density)**(1. / 3.)
+
+print("%d Chromosomes of length %d in a cube of side %s" % (nchrs, nparticles, side))
+
+# Parameter rationale (from DOI: https://doi.org/10.7554/eLife.40164):
+# - 'separation' (kb): typical genomic distance between extruding factors.
+#   The upper limit on the density of extruding cohesin molecules is ~5.3 per
+#   Mb assuming cohesin exists as a monomeric ring, or ~2.7 per Mb if cohesin
+#   forms dimers, i.e. a genomic distance between extruding cohesins of
+#   ~186-372 kb in mESCs, which approximately matches computational estimates
+#   (Fudenberg et al., 2016; Gassler et al., 2017).
+# - 'lifetime' (kb): segment extruded before detaching from the fiber.
+#   With a 20-25 min cohesin residence time in mESC G1 and an extrusion
+#   velocity of 1 kb/s, 20*60*1kb = 1200 kb in total, i.e. 600 kb per side
+#   because extrusion is two-sided.
+# - 'barriers': barrier positions in kb along the chromatin.
+# - 'barriers_permeability': when 0 the extruder is always blocked.
+# - 'chrlength': chromosome lengths in particles.
+
+loop_extrusion_dynamics = {'separation': 1000000 / 2700000 * 1000,  # ~370 kb
+                           'lifetime': 600,
+                           'right_extrusion_rate': 1,
+                           'left_extrusion_rate': 1,
+                           'extrusion_time': 1000,  # in integration times
+                           'barriers': XXXbarriersXXX,  # placeholder replaced by loop_extrusion.sh via sed
+                           'barriers_permeability': 0.0,
+                           'chrlength': [20000],
+                           'attraction_strength': 300.0,
+                           'equilibrium_distance': 1.0,
+                           }
+
+r = 1
+
+for replica in range(r, r + 1, 1):
+    initial_conformation = "../../0_generate_initial_conformation/Initial_rosette_conformation_with_pbc_replica_1.dat"
+
+    run_lammps(initial_conformation=initial_conformation,
+               minimize=True,
+               tethering=False,
+               initial_relaxation=0,  # initial relaxation runs after minimization and eventual compression
+               loop_extrusion_dynamics=loop_extrusion_dynamics,
+               kseed=int(replica),
+               to_dump=1000,
+               pbc=True,
+               run_time=1000000,
+               lammps_folder="./",
+               chromosome_particle_numbers=[20000],
+               confining_environment=['cube', 80.618],
+               hide_log=False)
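+
+# Back-of-the-envelope check (illustrative, assuming run_lammps loads one
+# extruder per 'separation' kb): with separation ~370 kb on a 20 Mb chain,
+# roughly 20000/370 ~ 54 extruders are active at steady state.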
diff --git a/test/loop_extrusion.sh b/test/loop_extrusion.sh
new file mode 100644
index 0000000..6109746
--- /dev/null
+++ b/test/loop_extrusion.sh
@@ -0,0 +1,29 @@
+chrlength=20000
+
+#for nextruders in $(seq 1 10 100);
+#do
+for nbarriers in 1 ; # 10 20 30 40 50 60 70 80 90 100;
+do
+    #dir=nextruders_${nextruders}_nbarriers_${nbarriers}
+    dir=nbarriers_${nbarriers}
+
+    if [[ -d $dir ]];
+    then
+        continue
+    fi
+
+    echo $nbarriers
+
+    # place nbarriers evenly spaced barrier positions (in kb) along the chain
+    barriers=$(awk -v cl=$chrlength -v nb=${nbarriers} 'BEGIN{printf("["); db=int(cl/nb); for(i=db/2;i<cl-db/2;i+=db) if(i>0) printf("%d,", int(i)); printf("%d]",cl-db/2)}')
+    echo $barriers
+
+    mkdir -p ${dir}
+    cd $dir
+    pwd
+
+    sed -e "s/XXXnextrudersXXX/${nextruders}/g" -e "s/XXXbarriersXXX/${barriers}/g" ../loop_extrusion.py > loop_extrusion_${nbarriers}.py
+    python loop_extrusion_${nbarriers}.py > output_nbarriers_${nbarriers}.log &
+    cd ..
+
+done # Close cycle over barriers
+#done # Close cycle over extruders
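+# Example of the generated barrier list (illustrative): for nbarriers=1 the
+# awk call above prints [10000]; for nbarriers=2 it prints [5000,15000].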
diff --git a/test/relaxation.py b/test/relaxation.py
new file mode 100644
index 0000000..08b2172
--- /dev/null
+++ b/test/relaxation.py
@@ -0,0 +1,52 @@
+#awk -v nc=3 'BEGIN{print (10143*3000/(30*30*30)/0.00441745*nc)^(1/3); print nc*10143}'
+from os import rename
+
+from tadphys.modelling.lammps_modelling import run_lammps
+
+# One chromosome of 20 Mb at 1 particle per kb
+chrlength = 20000000
+nparticles = int(chrlength / 1000)
+nchrs = 1
+chromosome_particle_numbers = [nparticles] * nchrs
+print(chrlength, nparticles, nchrs, chromosome_particle_numbers)
+
+# At this coarse-graining, 1 bead (~1 kb) has a diameter of ~14 nm
+density = 0.015
+side = (nparticles * 1000 * nchrs / (14 * 14 * 14) / density)**(1. / 3.)
+
+print("%d Chromosomes of length %d in a cube of side %s" % (nchrs, nparticles, side))
+
+replica = XXXreplicaXXX  # placeholder replaced by relaxation.sh via sed
+
+# Part 1: relaxation run of 10 MLN steps to obliterate the initial rosettes
+initial_conformation = "../../0_generate_initial_conformation/Initial_rosette_conformation_with_pbc_replica_%s.dat" % (replica)
+
+run_lammps(initial_conformation=initial_conformation,
+           minimize=True,
+           initial_relaxation=10000000,
+           kseed=int(replica),
+           to_dump=10000000,
+           pbc=True,
+           run_time=0,
+           lammps_folder="./",
+           chromosome_particle_numbers=[20000],
+           confining_environment=['cube', 80.618],
+           hide_log=False)
+
+rename('relaxed_conformation.txt', 'relaxed_conformation_part1.txt')
+rename('initial_relaxation_10000000.txt', 'initial_relaxation_10000000_part1.txt')
+
+# Part 2: relaxation run of 100 MLN steps saving every 1000 steps
+run_lammps(initial_conformation="relaxed_conformation_part1.txt",
+           minimize=False,
+           initial_relaxation=100000000,
+           kseed=int(replica),
+           to_dump=1000,
+           pbc=True,
+           run_time=0,
+           lammps_folder="./",
+           chromosome_particle_numbers=[20000],
+           confining_environment=['cube', 80.618],
+           hide_log=False)
diff --git a/test/relaxation.sh b/test/relaxation.sh
new file mode 100644
index 0000000..8c97316
--- /dev/null
+++ b/test/relaxation.sh
@@ -0,0 +1,13 @@
+# 0 - Relaxation run of 10 MLN steps to try obliterating the initial rosettes;
+# 1 - Relaxation run of 100 MLN steps saving every 1000 steps.
+
+for replica in $(seq 2 1 2);
+do
+    replicadir=replica_${replica}
+    mkdir -p ${replicadir}
+    cd ${replicadir}
+    sed -e "s/XXXreplicaXXX/${replica}/g" ../relaxation.py > relaxation_replica_${replica}.py
+    mpirun -np 16 python relaxation_replica_${replica}.py > output_replica_${replica}.log &
+    cd ..
+done # Close cycle over replica
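+# Note: each replica runs LAMMPS through mpirun (16 ranks) in the background;
+# the rename() calls in relaxation.py keep the part-1 outputs from being
+# overwritten by the denser part-2 sampling.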