Dear all,
I'm facing challenging alignments (several times 1000+ structures).
Since cath-superpose checks whether the SSAP files already exist, I found a way to speed up the alignments: re-running the cath-superpose command with the files in a random order in the arguments (code example below, in case it is useful to someone).
But here's my question: I realised that SSAP files are computed for all pairs.
In some cases, I can end up with more than 10 million files in the same folder...
I was wondering whether there is a particular reason to generate all pairs. Maybe cath-superpose could gain in efficiency and storage if only one file were generated for each pair?
Wishing you a nice day 🙂
Best regards,
Thibault.
Code example for running cath-superpose with the files in a random order:
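export CATH_TOOLS_PDB_PATH="$WORKDIR"
# Build the --pdb-infile arguments with the PDB files in a random order
# (assumes bash and file names without whitespace)
pdbinfile=()
for pdb in $(ls "$WORKDIR"/*.pdb | sort -R)
do
    pdbinfile+=(--pdb-infile "$pdb")
done
#echo "${pdbinfile[@]}"
cath-superpose --do-the-ssaps ssaps --sup-to-pdb-files-dir output "${pdbinfile[@]}"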
Thank you for using cath-superpose and for giving us some of your feedback - much appreciated.
I'm not 100% clear about your point about things being sped up by randomising the order of the inputs. Is the point that you're using the --do-the-ssaps option of cath-superpose and you're running several of these at the same time? So you're using the randomisation as a way to parallelise the SSAPs that generate the alignments? In which case, it sounds like it would be valuable to you if there was an option to tell --do-the-ssaps to run n SSAP jobs in parallel. Is that correct?
In general, I think you're right that this area feels like it could be improved. We did enough work in this area to start generating good multiple structural alignments and to build something usable but we think we could do much better on the current trade-off between quality and computation time and on figuring out which SSAPs don't need to be performed.
However, for the issue you're talking about, I think we've already exploited the symmetry of only needing one alignment for each pair of structures: the code only runs and uses the SSAP for each pair in one order, with the structure specified first on the command line coming first. So I suspect what's happening is that your randomisation also randomises the ordering it requires for each pair, so that over several runs both orderings of a pair end up being generated.
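To illustrate the point about pair ordering, here is a minimal bash sketch. It assumes, as described above, that each pair is requested with the earlier-listed structure first. The function `ordered_pairs` and the names `a`, `b`, `c` are made up for illustration and are not part of cath-tools.

```shell
#!/usr/bin/env bash
# Print one "first second" line per pair of names, keeping the order in
# which the names were given (assumption: this mirrors the
# "first on the command line goes first" behaviour described above).
ordered_pairs() {
    local names=("$@")
    local i j
    for ((i = 0; i < ${#names[@]}; i++)); do
        for ((j = i + 1; j < ${#names[@]}; j++)); do
            printf '%s %s\n' "${names[i]}" "${names[j]}"
        done
    done
}

# Two runs over the same three structures, given in different orders
run1=$(ordered_pairs a b c)
run2=$(ordered_pairs c b a)

# Count the distinct ordered pairs requested across both runs
printf '%s\n%s\n' "$run1" "$run2" | sort -u | wc -l
```

Each run on its own needs 3 pairs, but the two runs together ask for 6 distinct ordered pairs, which is the doubling that a randomised input order would be expected to cause.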
Does that sound right? Does this reinforce the idea that you'd benefit from an in-built way to parallelise the --do-the-ssaps?
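As a rough picture of what running n jobs in parallel could look like from the outside, here is a bash sketch using `xargs -P`. It is only an illustration under assumptions: `run_one_ssap` is a hypothetical stand-in that prints the pair it receives rather than running a real SSAP, and `parallel_pairs` is likewise made up. Neither is an existing cath-tools command or option.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for whatever would run one SSAP and write its
# output file; here it only reports the pair it was given.
run_one_ssap() {
    printf 'would align %s against %s\n' "$1" "$2"
}
export -f run_one_ssap

# parallel_pairs JOBS NAME...
# Emit each pair once (earlier name first) and hand the pairs to JOBS
# worker processes via xargs.
parallel_pairs() {
    local jobs=$1
    shift
    local names=("$@")
    local i j
    for ((i = 0; i < ${#names[@]}; i++)); do
        for ((j = i + 1; j < ${#names[@]}; j++)); do
            printf '%s %s\n' "${names[i]}" "${names[j]}"
        done
    done | xargs -n 2 -P "$jobs" bash -c 'run_one_ssap "$1" "$2"' _
}

# Four structures in a fixed order, processed by up to 4 workers
parallel_pairs 4 a b c d
```

Keeping the input order fixed between runs means every run asks for the same ordering of each pair, so files from an earlier run could be reused. The order in which the workers print their lines is not deterministic.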