
How to construct a network only focusing on my interesting genes? #210

Open
socialtree-yt opened this issue Dec 4, 2023 · 11 comments


@socialtree-yt

Hello, thank you for your convenient tools. I want to focus only on the TFs that regulate an exact set of genes and construct a regulatory network for them. How can I do this? Is it right to pass TPM expression data for only that gene set to the "ananse network -e" parameter and leave the inputs to "ananse binding" unchanged?

@siebrenf
Member

siebrenf commented Dec 4, 2023

I think that is not correct. ANANSE works by combining many different values. To do this, each data type is scaled from 0 to 1, and the scaled values are then averaged across data types. If you remove part of your expression dataset, the remaining genes will still be scaled from 0 to 1, so some genes will now be considered very important and others unimportant, while reality can be totally different.
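To make the scaling argument concrete, here is a minimal Python sketch. It assumes plain min-max scaling as the 0-1 scaler, which may differ in detail from what ANANSE does internally, and the gene names and TPM values are made up for illustration.

```python
def minmax(values):
    """Scale a {gene: value} dict linearly to the 0-1 range."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {g: (v - lo) / span if span else 0.0 for g, v in values.items()}


# Hypothetical TPM values, for illustration only.
full = {"geneA": 2.0, "geneB": 10.0, "geneC": 50.0, "geneD": 1000.0}
# The same genes minus the highly expressed geneD.
subset = {g: full[g] for g in ("geneA", "geneB", "geneC")}

scaled_full = minmax(full)
scaled_subset = minmax(subset)
for gene in subset:
    # Same raw value, very different scaled value depending on what else was included.
    print(gene, round(scaled_full[gene], 3), round(scaled_subset[gene], 3))
```

In the full set geneC sits near the bottom of the range; in the subset it becomes the maximum (1.0), even though its expression did not change.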

I think you should use the full dataset (for each data type), and then filter your results afterward. Furthermore, in ANANSE influence, you can whitelist your genes of interest with the -w parameter. This will make sure they are used to determine the differential network and top influential TF list.
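Filtering afterward can be as simple as keeping the rows of the full network output whose target gene (and optionally TF) is in your set. This sketch assumes the network output is a tab-separated table with tf_target and prob columns, and that tf_target joins TF and target with the U+2014 dash character, as in the example output shown later in this thread; the TF and gene names are hypothetical.

```python
import csv
import io

TF_TARGET_SEP = "\u2014"  # separator seen in the tf_target column of the example output


def filter_network(tsv_text, genes_of_interest, tfs_of_interest=None):
    """Keep rows of a full network table whose target (and optionally TF) is of interest.

    Assumes a tab-separated table with a header row containing tf_target and prob.
    Returns the kept rows sorted by prob, highest first.
    """
    kept = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        tf, target = row["tf_target"].split(TF_TARGET_SEP, 1)
        if target not in genes_of_interest:
            continue
        if tfs_of_interest is not None and tf not in tfs_of_interest:
            continue
        kept.append(row)
    return sorted(kept, key=lambda r: float(r["prob"]), reverse=True)


# Toy network with hypothetical TF and gene names.
network_tsv = (
    "tf_target\tprob\n"
    f"TF1{TF_TARGET_SEP}geneA\t0.40\n"
    f"TF1{TF_TARGET_SEP}geneB\t0.90\n"
    f"TF2{TF_TARGET_SEP}geneA\t0.70\n"
)
for row in filter_network(network_tsv, {"geneA"}):
    print(row["tf_target"], row["prob"])
```

Because the network was built from the full dataset, the scores of the kept rows still reflect a proper genome-wide scaling.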

Hope this helps!

@socialtree-yt
Author

OK. I understand. Thank you for your help!

@socialtree-yt
Author

Hi, I also want to ask whether I can use a subset of regions to construct networks, because I see that the "ananse network" -r and -f parameters can extract a subset of regions for the network.

@siebrenf
Member

siebrenf commented Dec 5, 2023

The same rules apply. It's fine if, for example, you want to filter for a specific chromosome (a superset that contains both regions of interest and background regions) or remove outlier regions. But if your regions contain only regions of interest, it will probably skew your results.
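The difference between the two kinds of filter can be checked directly: a chromosome filter keeps background peaks, while a regions-of-interest-only filter leaves none. A minimal sketch, assuming tab-separated narrowPeak/BED lines and regions of interest given as (chrom, start, end) string tuples; the peaks are hypothetical.

```python
def filter_chromosome(peak_lines, chrom):
    """Keep every peak on one chromosome: regions of interest *and* their background."""
    return [line for line in peak_lines if line.split("\t", 1)[0] == chrom]


def background_fraction(peak_lines, regions_of_interest):
    """Fraction of peaks whose (chrom, start, end) is not a region of interest."""
    if not peak_lines:
        return 0.0
    keys = [tuple(line.split("\t")[:3]) for line in peak_lines]
    return sum(k not in regions_of_interest for k in keys) / len(keys)


# Hypothetical peaks (only the first columns of a narrowPeak line are shown).
peaks = [
    "chr1\t100\t200\tpeak1",
    "chr1\t500\t650\tpeak2",
    "chr2\t300\t400\tpeak3",
    "chr1\t900\t1000\tpeak4",
]
roi = {("chr1", "100", "200")}

kept = filter_chromosome(peaks, "chr1")
print(len(kept), background_fraction(kept, roi))
# A filter that keeps only the regions of interest leaves no background at all:
only_roi = [p for p in peaks if tuple(p.split("\t")[:3]) in roi]
print(background_fraction(only_roi, roi))
```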

@socialtree-yt
Author

How can I choose background regions if I use narrowPeak files as input regions? Apart from my regions of interest, how many and which regions should be kept as background regions? Thank you for your help!

@siebrenf
Member

You "choose" background regions by not removing them from your narrowPeaks :) In general: more is better.

If you want to shrink your input data, I guess you should keep 5 random peaks for each peak of interest. To predict motif activity, ANANSE randomly samples 3 × 50,000 regions, so if you shrink your input data down to 150,000 regions you should still be good. If you have fewer than 150,000 regions, that's fine, but maybe don't remove any.
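One way to combine these rules of thumb is sketched below: keep every peak of interest, keep at least 150,000 peaks in total and at least five background peaks per peak of interest, and remove nothing if the input is already that small. Combining the two limits this way is an interpretation, not a rule stated by ANANSE, and the peak IDs are hypothetical.

```python
import random


def downsample_peaks(peaks, regions_of_interest, per_roi=5, min_total=150_000, seed=0):
    """Keep all peaks of interest plus a random background sample.

    The kept set has at least `min_total` peaks and at least `per_roi` background peaks
    per peak of interest; if the input is already that small, nothing is removed.
    """
    roi = set(regions_of_interest)
    interest = [p for p in peaks if p in roi]
    background = [p for p in peaks if p not in roi]
    target = max(len(interest) * (1 + per_roi), min_total)
    if len(peaks) <= target:
        return list(peaks)  # small enough already: keep everything
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return interest + rng.sample(background, target - len(interest))


# Hypothetical peak IDs: 200,000 peaks, the first 1,000 of which are of interest.
peaks = [f"peak{i}" for i in range(200_000)]
roi = peaks[:1_000]
print(len(downsample_peaks(peaks, roi)))
```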

@socialtree-yt
Author

But if I keep five times as many random peaks as peaks of interest, how can I identify the TFs that target the peaks of interest rather than the random background regions? And how can I know which TFs target downstream genes through the peaks of interest rather than through the random background regions?

@socialtree-yt
Author

One other thing: I have focused on some TFs and target genes, but how can I determine which enhancers mediate these processes? I want to select several specific, representative TF-enhancer-gene examples to illustrate my theme. The ananse view output only has relationships between TFs and enhancers, not between enhancers and genes.

@socialtree-yt
Author

So can I select several specific, representative TF-enhancer-gene examples from the ANANSE results? It seems to be difficult. Thank you for your help!


@siebrenf
Member

siebrenf commented Jan 4, 2024

Happy new year! :)

how can I identify the TFs that target the peaks of interest rather than the random background regions?

You can use all peaks and all TFs for this. Run ANANSE binding, then use ananse view to see which TFs are most active in which regions. You can use filters in ANANSE view.
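If you export the binding predictions to a table, a simple way to compare regions of interest with the background is to rank TFs by their mean binding in the regions of interest minus their mean binding elsewhere. This sketch assumes a tab-separated table with one row per region, a column named region, and one column per TF; that layout and column name are assumptions, so check them against the file you actually export. TFs and regions are hypothetical.

```python
import csv
import io
from statistics import mean


def rank_tfs(table_tsv, regions_of_interest, top=10):
    """Rank TFs by mean binding in regions of interest minus mean binding elsewhere."""
    rows = list(csv.DictReader(io.StringIO(table_tsv), delimiter="\t"))
    roi_rows = [r for r in rows if r["region"] in regions_of_interest]
    bg_rows = [r for r in rows if r["region"] not in regions_of_interest]
    if not roi_rows or not bg_rows:
        raise ValueError("need both regions of interest and background regions")
    tfs = [col for col in rows[0] if col != "region"]
    scores = {
        tf: mean(float(r[tf]) for r in roi_rows) - mean(float(r[tf]) for r in bg_rows)
        for tf in tfs
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Toy table with hypothetical TFs and regions.
table = (
    "region\tTF1\tTF2\n"
    "chr1:100-200\t0.9\t0.1\n"
    "chr1:500-600\t0.2\t0.3\n"
    "chr2:10-50\t0.1\t0.4\n"
)
print(rank_tfs(table, {"chr1:100-200"}))
```

A plain difference of means is the simplest contrast; with many regions you may prefer a rank-based test instead.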

how can I know which TFs target downstream genes through the peaks of interest rather than through the random background regions?

Use all peaks in ANANSE binding. Then run ananse network --tfs tfs_of_interest.txt --regions regions_of_interest.bed. This way, the TF activity scores are computed against a proper background!
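A small helper for writing the two filter files used in that command. The file formats (one TF name per line; a three-column tab-separated BED of chrom, start, end) are assumptions here, so check ananse network --help for what your version accepts. The TF names and region are hypothetical.

```python
import tempfile
from pathlib import Path


def write_network_filters(tfs, regions, outdir):
    """Write a TF list (one name per line) and a 3-column BED of (chrom, start, end)."""
    outdir = Path(outdir)
    tf_file = outdir / "tfs_of_interest.txt"
    bed_file = outdir / "regions_of_interest.bed"
    tf_file.write_text("".join(f"{tf}\n" for tf in tfs))
    bed_file.write_text("".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions))
    return tf_file, bed_file


with tempfile.TemporaryDirectory() as tmp:
    tf_file, bed_file = write_network_filters(["TF1", "TF2"], [("chr1", 100, 200)], tmp)
    # Read back what was written, as a quick check.
    written = (tf_file.read_text(), bed_file.read_text())
    # Add these two options to your usual ananse network command.
    print(f"--tfs {tf_file} --regions {bed_file}")
```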

Note that ANANSE network outputs TF-gene links! The full output looks like this:

tf_target            prob        tf_expression       target_expression   weighted_binding  activity
LOC100127624—42Sp43  0.33331668  0.0532994923857868  0.8891054506351234  0.115612045       0.27524972
LOC100127624—42Sp50  0.3161416   0.0532994923857868  0.936017205457356   0.0               0.27524972

Headers:

  • tf_target: the transcription factor and its target gene
  • prob: the aggregate interaction probability score (based on the other scores combined)
  • tf/target_expression: the scaled expression values of the TF/target gene
  • weighted_binding: aggregate score of all enhancers near the target gene
  • activity: TF activity from ANANSE binding
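A sketch that parses the two example rows above, splits tf_target into TF and target, and compares prob with the unweighted mean of the four other scores. In these two rows prob matches that mean to within rounding, which fits the "scale, then average" description earlier in the thread; treat this as an observation from two rows, not as the documented formula. The tab-separated layout is assumed.

```python
import csv
import io
from statistics import mean

# The two example rows from above, re-typed as tab-separated text (assumed file format).
EXAMPLE = (
    "tf_target\tprob\ttf_expression\ttarget_expression\tweighted_binding\tactivity\n"
    "LOC100127624\u201442Sp43\t0.33331668\t0.0532994923857868\t0.8891054506351234\t0.115612045\t0.27524972\n"
    "LOC100127624\u201442Sp50\t0.3161416\t0.0532994923857868\t0.936017205457356\t0.0\t0.27524972\n"
)
SCORES = ("tf_expression", "target_expression", "weighted_binding", "activity")


def parse_network(tsv_text):
    """Yield (tf, target, prob, [component scores]) for each row."""
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        tf, target = row["tf_target"].split("\u2014", 1)
        yield tf, target, float(row["prob"]), [float(row[c]) for c in SCORES]


for tf, target, prob, scores in parse_network(EXAMPLE):
    print(tf, target, prob, round(mean(scores), 8))
```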

I want to select several specific, representative TF-enhancer-gene examples to illustrate my theme. The ananse view output only has relationships between TFs and enhancers, not between enhancers and genes.

This is correct. ANANSE binding links TFs to enhancers. ANANSE network links enhancers to target genes and combines these values to return a TF-target gene link. You could try to puzzle out which enhancers were involved, but I can't vouch for the reliability of the results. ANANSE's core strength is the influence output: identifying key transcription factor differences between two conditions.
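If you do try to puzzle it out, one rough approach is to rank the enhancers near a target gene by the TF's binding score multiplied by a distance weight. This is purely a toy: the distance weight (halving every 10 kb), the 100 kb window, and the input layout are all hypothetical and are not ANANSE's actual model, and the per-enhancer binding scores would have to come from your own ANANSE binding output for one TF.

```python
def distance_weight(distance_bp, half_life_bp=10_000):
    """Hypothetical weight that halves for every half_life_bp away from the TSS."""
    return 0.5 ** (abs(distance_bp) / half_life_bp)


def enhancer_contributions(binding, tss, max_distance=100_000):
    """Rank enhancers by binding * distance weight for one TF and one target gene.

    binding: {(chrom, enhancer_center): binding score of the TF}
    tss: (chrom, position) of the target gene's TSS
    Returns [(enhancer, contribution, share_of_total)], largest contribution first.
    """
    contrib = {}
    for (chrom, center), score in binding.items():
        distance = center - tss[1]
        if chrom != tss[0] or abs(distance) > max_distance:
            continue  # different chromosome or outside the window
        contrib[(chrom, center)] = score * distance_weight(distance)
    total = sum(contrib.values())
    ranked = sorted(contrib.items(), key=lambda kv: kv[1], reverse=True)
    return [(enh, c, c / total if total else 0.0) for enh, c in ranked]


# Hypothetical binding scores for one TF.
binding = {
    ("chr1", 1_000): 0.8,
    ("chr1", 21_000): 0.9,
    ("chr1", 500_000): 1.0,
    ("chr2", 1_000): 1.0,
}
for enhancer, contribution, share in enhancer_contributions(binding, ("chr1", 1_000)):
    print(enhancer, round(contribution, 3), round(share, 3))
```

The share column shows how much of the toy aggregate score each enhancer accounts for; with a different weighting the ranking can change, which is one reason such attributions should be read with caution.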

@socialtree-yt
Author

Thank you for your help! So if I run "ananse network --tfs tfs_of_interest.txt --regions regions_of_interest.bed" with one TF and one region in the BED file, can I depict the exact TF-enhancer-gene triple within the total network? Is that triple the same as in the total network built with "ananse network --tfs all_tfs.txt --regions all_regions.bed"? Am I right?


2 participants