
Advanced Usage

Julian Dosch edited this page Mar 8, 2023 · 3 revisions

General Options

FAS comes with a multitude of options. The first group of options, the general arguments, contains 4 arguments, of which the following two are described here.

[--bidirectional] If this option is used, FAS will calculate the score in both directions (seed -> query, query -> seed). This will create a second .domains file.

[--cpus] With this option you can define how many CPUs calcFAS is allowed to use.

Input Options

(1) There are several options that have an influence on the input of FAS. The first three options deal with which protein pairs should be scored:

[--seed_id] / [--query_id] These two options allow you to select a number of proteins from the seed or query fasta file. FAS will only calculate scores for the chosen ids, in an all-against-all manner:

fas.run -q PATH/ortholog.fasta -s PATH/seed.fasta -a PATH/ANNOTATION -o PATH/FAS_OUT --seed_id ID_1 ID_2 ... --query_id ID_A

[--pairwise] With this option you can give FAS a file containing pairs of protein ids. Instead of doing an all-against-all calculation, FAS will only calculate scores for these pairs. Each line in the file should contain one tab-separated pair, with the seed protein first and the query protein second. The proteins need to be present in the seed and query input files.
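As a minimal sketch, a pairs file in this layout could be written with Python's csv module. The file name pairs.tsv, the helper write_pairs and the protein ids are placeholders chosen for illustration; FAS does not prescribe them.

```python
import csv


def write_pairs(path, pairs):
    """Write (seed_id, query_id) tuples as one tab-separated pair per line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for seed_id, query_id in pairs:
            # Seed protein first, query protein second, as described above.
            writer.writerow([seed_id, query_id])


# Hypothetical ids; they must exist in the seed and query fasta files.
write_pairs("pairs.tsv", [("ID_1", "ID_A"), ("ID_2", "ID_A")])
```

The resulting file would then be handed to FAS through the [--pairwise] option.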

(2) The other two input options allow you to use different feature sets:

[-d|--feature_types] With this option you can give FAS an input file that controls which feature groups should be used for the calculations. You can also define which features should be linearized.

fas.run -q PATH/ortholog.fasta -s PATH/seed.fasta -a PATH/ANNOTATION -o PATH/FAS_OUT  -d PATH/featuretypes

This needs a file that looks like this:

 #linearized
 pfam
 smart
 #normal
 coils
 flps
 seg
 signalp
 tmhmm
 newfeatures

All feature groups listed under #linearized will be linearized together. All other feature groups should be listed under #normal. If you use features other than the standard feature groups that come with FAS, you need to supply additional annotation files in the annotation directory with the next option.
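To illustrate the layout, the following sketch reads such a file into two lists. It is not the parser FAS itself uses; the function name read_feature_types and the handling of blank lines are assumptions made for this example.

```python
def read_feature_types(text):
    """Split a feature-types file into 'linearized' and 'normal' groups."""
    groups = {"linearized": [], "normal": []}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith("#"):
            current = line[1:].strip().lower()
            if current not in groups:
                raise ValueError(f"unknown section header: {line}")
        elif current is None:
            raise ValueError(f"feature group before any section header: {line}")
        else:
            groups[current].append(line)
    return groups


example = """#linearized
pfam
smart
#normal
coils
flps
seg
signalp
tmhmm
newfeatures
"""

groups = read_feature_types(example)
print("linearized:", groups["linearized"])
print("normal:", groups["normal"])
```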

[--extra_annotation] This option allows you to give additional annotations to FAS in the following way:

fas.run -q PATH/ortholog.fasta -s PATH/seed.fasta -a PATH/ANNOTATION -o PATH/FAS_OUT  --extra_annotation new ...

The additional annotation files should be placed in the annotation directory. If the file name for the seed is someseed.fasta, then the extra annotation file should be named someseed_[EXTRA_ANNOTATION].json (e.g. someseed_new.json for the example above). These extra files need to exist for the seed, the query and the references (if given). Multiple names for extra annotations can be given.

You can create these .json files from tsv formatted files by using the annoParserFAS script that comes with FAS.
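Before starting a run, it can help to check that all expected extra annotation files are in place. The sketch below derives the file names from the naming rule above; the helper expected_extra_files is hypothetical, and it assumes the base name is the fasta file name without its final extension.

```python
from pathlib import Path


def expected_extra_files(annotation_dir, fasta_paths, extra_names):
    """List the .json files implied by the naming rule for each input file."""
    annotation_dir = Path(annotation_dir)
    return [
        # e.g. someseed.fasta + "new" -> someseed_new.json
        annotation_dir / f"{Path(fasta).stem}_{name}.json"
        for fasta in fasta_paths
        for name in extra_names
    ]


files = expected_extra_files("PATH/ANNOTATION", ["PATH/someseed.fasta"], ["new"])
missing = [f for f in files if not f.exists()]
print("expected:", [f.name for f in files])
print("missing:", [f.name for f in missing])
```

In practice the list of fasta paths would contain the seed, the query and any reference files.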

Output Options

FAS has several different ways to return the output of the calculation to the user:

[-n|--out_name] This option allows you to set a name for the output files. By default, FAS creates the output name from the seed and query names.

[--raw] This option prints the FAS scores to the terminal.

[--tsv] This option deactivates the tsv output and enables the [--raw] option.

[--phyloprofile] This creates an output file for PhyloProfile to use. It needs a mapping file (see File Formats) as input that maps the protein ids to the NCBI id of their taxon. FAS expects a tab-separated file with the protein id in the first column and the NCBI id in the second.
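A quick sanity check of such a mapping file could look like the sketch below. The helper read_mapping and the ids are placeholders; the exact form of the NCBI id is defined in File Formats and is treated here as an opaque string.

```python
import csv
import io


def read_mapping(handle):
    """Read a two-column tab-separated file into {protein_id: ncbi_id}."""
    mapping = {}
    for row_number, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not row:
            continue  # assumption: empty lines are skipped
        if len(row) != 2:
            raise ValueError(f"line {row_number}: expected 2 columns, got {len(row)}")
        protein_id, ncbi_id = row
        mapping[protein_id] = ncbi_id
    return mapping


# Placeholder content standing in for a real mapping file.
example = io.StringIO("PROTEIN_A\tNCBI_ID_X\nPROTEIN_B\tNCBI_ID_Y\n")
mapping = read_mapping(example)
print(mapping)
```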

[--domain] This option deactivates the domain output.

[--no_config] This option deactivates the config.yml output.

Weighting

Besides uniform weighting, where all features contribute equally to the score, FAS offers a weighting based on how often features occur in a reference proteome. Here, highly frequent features receive a lower weight than scarce features. This weighting can be influenced in several ways:

[-r|--ref_proteome] [--ref2] These options allow you to give a reference proteome. The second option is relevant for bidirectional scoring.

[-g|--weight_correction] With this option you can change how FAS calculates the weights by applying a so-called correction function to the feature count. For a detailed explanation of this function, please refer to the weighting section in the FAS.pdf on the gh-pages.

[-x|--weight_constraints] With this option you can give FAS a so-called constraints file. In this file you can set minimum weights for features or whole databases/tools. The file should look like this:

 #tool_constraints
 coils N
 flps N
 pfam 0.5
 seg 0.1
 signalp N
 smart N
 tmhmm N
 #feature_constraints
 tmhmm_transmembrane 0.1
 pfam_HlyD 0.25

The sum of all constraint values under #tool_constraints as well as #feature_constraints should not exceed 1.0.
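The rule above can be checked before a run with a short script. This is a sketch rather than FAS's own parser: it assumes that N means "no minimum weight set", and because the wording leaves open whether the limit applies per section or to both sections together, it reports both sums.

```python
def read_constraints(text):
    """Parse a constraints file into {section: {name: weight or None}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line[1:]
            sections[current] = {}
            continue
        if current is None:
            raise ValueError(f"entry before any section header: {line}")
        name, value = line.split()
        # Assumption: "N" marks an entry without a minimum weight.
        sections[current][name] = None if value == "N" else float(value)
    return sections


def constraint_sums(sections):
    """Sum the numeric constraint values of each section."""
    return {
        section: sum(v for v in entries.values() if v is not None)
        for section, entries in sections.items()
    }


example = """#tool_constraints
coils N
flps N
pfam 0.5
seg 0.1
signalp N
smart N
tmhmm N
#feature_constraints
tmhmm_transmembrane 0.1
pfam_HlyD 0.25
"""

sums = constraint_sums(read_constraints(example))
total = sum(sums.values())
for section, value in sums.items():
    print(f"{section}: {value:.2f} (within limit: {value <= 1.0})")
print(f"combined: {total:.2f} (within limit: {total <= 1.0})")
```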

Threshold Options

There are several thresholds that can influence the FAS scoring:

[-c|--max_overlap] [--max_overlap_percentage] These two options allow you to change how overlaps are handled. With the first option you can define how many amino acids long an overlap between features is allowed to be (default=0). The second defines what percentage of a feature's length an overlap is allowed to span. If an overlap falls under both of these thresholds, both features are allowed in the linearized path.
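The interplay of the two thresholds can be sketched as follows. Several details here are assumptions rather than documented FAS behaviour: feature coordinates are inclusive (start, end) tuples, the percentage is given as a fraction and measured against the shorter feature, and "falls under" is read as less than or equal.

```python
def overlap_length(a, b):
    """Number of positions shared by two (start, end) ranges, ends inclusive."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start + 1)


def overlap_tolerated(a, b, max_overlap, max_overlap_percentage):
    """True if both features may appear together in a linearized path."""
    shared = overlap_length(a, b)
    if shared == 0:
        return True  # no overlap, nothing to check
    # Assumption: the fraction is taken relative to the shorter feature.
    shortest = min(a[1] - a[0] + 1, b[1] - b[0] + 1)
    within_length = shared <= max_overlap
    within_fraction = shared / shortest <= max_overlap_percentage
    # Both thresholds must be met for the overlap to be tolerated.
    return within_length and within_fraction


feature_a = (1, 100)
feature_b = (96, 150)  # shares 5 positions with feature_a
print(overlap_tolerated(feature_a, feature_b, max_overlap=10, max_overlap_percentage=0.2))
print(overlap_tolerated(feature_a, feature_b, max_overlap=0, max_overlap_percentage=0.2))
```

With these inputs the first call tolerates the overlap, while the second rejects it because 5 shared positions exceed a maximum overlap of 0.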

[-t|--priority_threshold] [-m|--max_cardinality] These two thresholds define when FAS will switch to priority mode, a greedy strategy, instead of evaluating all paths exhaustively. The first threshold defines the maximum number of features, while the second sets a maximum for the number of paths.

Annotation Options

The next group of options influence the feature annotation step before calculation:

[-f|--eFeature] [-i|--eInstance] These options set e-value cutoffs for the hmmscan-based feature predictions pfam and smart. The first one is for the whole feature, while the second one is for feature instances. These cutoffs are also applied during calculation (if other annotations with e-values exist).

[--eFlps] This sets an e-value cutoff for the flps tool.

[--org] With this you can set the organism of the input sequence(s) for the SignalP search.

[--toolPath] This option sets the path to the annotation tool directory created with prepareFAS. You don't need to touch this unless you moved the directory.

Obscure Options (other options)

[--priority_mode] With this option you can deactivate the use of priority mode completely. NOT RECOMMENDED as this can severely increase runtime for larger proteins.

[--timelimit] If priority mode is deactivated, you can use this option to set a soft time limit for how long a protein pair is allowed to stay in exhaustive mode before priority mode is reactivated.

[-w|--score_weights] With this you can influence the FAS score. This sets how the three scores MS, CS and PS are weighted. It takes exactly three float values. The sum should be 1.0. The default is 0.7, 0.0, 0.3, so that the MS makes up 70% of the score and the PS 30%.

[--empty_as_1] By default, if both proteins have no annotated features (an empty feature architecture), FAS will return a score of 0 (no features shared). When this option is activated, FAS will score these pairs with 1 instead.
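The effect of the three weights can be illustrated with a weighted sum. This is a sketch under the assumption that the final score is a linear combination of MS, CS and PS in that order, as the 70%/30% description suggests; the function combine_scores and the input scores are made up for the example.

```python
import math


def combine_scores(ms, cs, ps, weights=(0.7, 0.0, 0.3)):
    """Combine MS, CS and PS into one score using three weights."""
    if len(weights) != 3:
        raise ValueError("exactly three weights are required")
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights should sum to 1.0")
    w_ms, w_cs, w_ps = weights
    # Assumption: a plain weighted sum of the three partial scores.
    return w_ms * ms + w_cs * cs + w_ps * ps


# Hypothetical partial scores for one protein pair.
score = combine_scores(ms=0.8, cs=0.5, ps=0.6)
print(round(score, 2))
```

With the default weights the CS value has no influence on the result, since its weight is 0.0.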