Can it be used for mobile lncRNAs? #1

Open
zhangwenda0518 opened this issue Aug 9, 2024 · 18 comments

@zhangwenda0518

I am studying a grafted species and want to explore the movement of lncRNAs between rootstock and scion.
Can I use mobileRNA to analyse lncRNAs? If so, how should I use it?
Thank you!

@zhangwenda0518
Author

Is there any update on this?

@KJeynesCupper
Owner

KJeynesCupper commented Oct 4, 2024

@zhangwenda0518 Sorry for the delayed response. Yes, in theory this should work for lncRNA-seq data from grafting experiments. Feel free to send me an email if you would like to discuss further :)

Best,
Katie

@KJeynesCupper KJeynesCupper self-assigned this Oct 4, 2024
@zhangwenda0518
Author

zhangwenda0518 commented Oct 4, 2024

Thank you for your reply. I am currently testing the package and have a few questions I would like to ask:

  1. The tutorial on GitHub has, for example:
fasta_1 <- system.file("extdata", "reduced_chr12-Eggplant.fa"),
package = "mobileRNA")

It should be:

fasta_1 <- system.file("extdata","reduced_chr12_Eggplant.fa.gz", 
package="mobileRNA")

The package provides the example FASTA files in compressed (.gz) format.

  2. I installed ShortStack with conda and encountered an error. After investigating, I found that the command installed by conda is named ShortStack, whereas mobileRNA seems to call shortstack:

shortstack_exists <- function(){


  3. In step 3, "Import pre-processed data into R":
sRNA_data <- RNAimport(input = "sRNA",
directory = results_dir,
samples = sample_names)

The argument analysisType = "core" needs to be added; running the code as written results in an error.
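Regarding point 2, it can help to confirm which spelling of the ShortStack command is actually on your PATH before running mobileRNA. The Python sketch below is a hypothetical helper, not part of mobileRNA (whose own check is the R function shortstack_exists, only partly shown above); it simply tries both spellings with the standard-library shutil.which.

```python
import shutil
from typing import Optional

# Spellings to try, in order. Assumption: a conda install exposes "ShortStack",
# while other set-ups (or a case-insensitive filesystem) may also resolve "shortstack".
CANDIDATE_NAMES = ("ShortStack", "shortstack")


def find_shortstack(path: Optional[str] = None) -> Optional[str]:
    """Return the full path of the first ShortStack executable found, or None.

    `path` is passed straight to shutil.which: None searches the current PATH,
    otherwise it is an os.pathsep-separated list of directories.
    """
    for name in CANDIDATE_NAMES:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_shortstack() or "ShortStack was not found on PATH")
```

If only ShortStack resolves, a case-sensitive system such as Linux will not find shortstack, which would be consistent with the error above; on the default (case-insensitive) macOS filesystem either spelling usually resolves.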

After these fixes, I successfully ran the pipeline with the example files.

However, when running it on my own data, I got stuck at the first step.


How should I modify the genome names?

@zhangwenda0518
Author

In RNAmergeGenomes, I think the following code may be causing the genome-name error:
if (!is.null(names(ref1))) { ref1 <- unname(ref1) }
if (!is.null(names(ref2))) { ref2 <- unname(ref2) }

@KJeynesCupper
Owner

KJeynesCupper commented Oct 4, 2024

Hi @zhangwenda0518

Thank you for spotting the errors in the README file. To answer your concerns:

  1. The README inaccuracy regarding the FASTA files has been corrected, as has the issue with genome merging.
  2. Is it possible that you installed an older version of ShortStack? mobileRNA relies on ShortStack version >= 4.0. Also, to my knowledge, calling ShortStack is not case sensitive (at least on my Mac!).
  3. Going forward, I recommend following the full vignette, which has an HTML version on Bioconductor. At the bottom of the vignette there is a bash script for running the analysis directly on the command line, which will probably be more useful for your analysis. I will soon release the command-line code as a separate package on my GitHub.

To get these changes, please reinstall mobileRNA.

As for using it with lncRNA-seq data, you may want to adapt the approach by choosing alignment and clustering tools better suited to lncRNA-seq, and running them against the merged genome.

Hopefully these address all your concerns :)

@zhangwenda0518
Author

Thank you very much!
Regarding ShortStack, sorry: I checked and my version is 4.10. The case sensitivity may be because I am using Linux.

Another problem: when running the mRNA analysis, I encountered an error at the mapRNA step. It seems to be a problem with the HISAT2 command line, and I have not managed to solve the HTSeq error.
mapRNA(input = "sRNA", input_files_dir = "rna-data/2ck", output_dir = "sRNA-output_dir", genomefile = output_assembly_file, condaenv = "/home/zhangwenda/mambaforge/envs/r-4.4", mmap = "n", threads= 60)


My final output file, Results.txt, is empty.

Could you help me see what is going on? I have been stuck on this step for a long time. Looking forward to your reply.

@zhangwenda0518
Author

The HISAT2 call in the code is wrong at L208 and L898.


@zhangwenda0518
Author

I now know why Results.txt is empty.
When I ran the command manually, the error message said that my GFF file is missing the Name tag.

Another question: why was the uniqueReads file not generated after HISAT2?


@zhangwenda0518
Author

Sorry, I hope you don't mind, but I have run into another issue.
I ran the mapRNA steps manually and, because my GFF file is missing the Name attribute, I changed the option to --idattr=ID. I am not sure whether this is correct.


python -m HTSeq.scripts.count --format=bam --order=pos --stranded=no --mode=union --nonunique=none --type=mRNA --idattr=ID *_uniqueReads_H.bam merged_assembly.gff3 > Results.txt

Finally, I got the Results.txt file.

I then continued with RNAimport, but there is another problem: the Locus, Chr, start, end, width, strand and type columns are all NA.
mRNA_data <- RNAimport(input = "mRNA", directory = results_dir, samples = sample_names, annotation = output_annotation_file)


@zhangwenda0518
Author

I looked carefully at the parameters of RNAimport and found an idattr parameter, which defaults to "Name". I think this was the cause of the error, so I set it to "ID":

mRNA_data <- RNAimport(input = "mRNA", directory = results_dir, samples = sample_names, idattr = "ID", annotation = output_annotation_file)

After this change I got the expected result. Thank you!
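Since the fix hinges on which attribute the GFF file actually carries, a quick pre-flight check can save a failed run. The Python sketch below is hypothetical (not part of mobileRNA or HTSeq): it scans column 9 of an uncompressed, tab-separated GFF3 file and suggests the value to use for idattr in RNAimport and --idattr in htseq-count. It assumes mRNA is the feature type being counted, matching the --type=mRNA option used above.

```python
from collections import Counter
from typing import Iterable, Tuple


def scan_attributes(gff_lines: Iterable[str], feature_type: str = "mRNA") -> Tuple[int, Counter]:
    """Count features of `feature_type` and how many carry each column-9 attribute key."""
    n_features = 0
    keys: Counter = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives such as ##gff-version
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        n_features += 1
        # Use a set so a key repeated within one feature is only counted once.
        present = {f.split("=", 1)[0].strip() for f in cols[8].split(";") if "=" in f}
        keys.update(present)
    return n_features, keys


def suggest_idattr(gff_lines: Iterable[str], feature_type: str = "mRNA") -> str:
    """Return "Name" if every feature has it, else "ID" if every feature has that."""
    n_features, keys = scan_attributes(gff_lines, feature_type)
    if n_features == 0:
        raise ValueError(f"no {feature_type} features found")
    for candidate in ("Name", "ID"):
        if keys[candidate] == n_features:
            return candidate
    raise ValueError("neither Name nor ID is present on every feature")


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        print(suggest_idattr(handle))
```

Saved as, say, check_gff.py (a hypothetical name), it would be run as `python check_gff.py merged_assembly.gff3`. Preferring Name when it is present everywhere mirrors the RNAimport default described above.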

@KJeynesCupper
Owner

hi @zhangwenda0518

How are you getting on? All sorted now?

Katie

@zhangwenda0518
Author

@KJeynesCupper
Thank you, I can now run all the steps successfully.
I was also wondering whether you could explain the last step, which concerns the scion and rootstock, more clearly. There are many intermediate steps and it is a bit confusing.

For example, suppose the scion is genome A (gffA) and the rootstock is genome B (gffB). Will changing the input order of my intermediate files affect the direction of transfer that is detected? Should genome A be the scion or the rootstock, and likewise genome B?

# define control samples
controls <- c("selfgraft_1", "selfgraft_2", "selfgraft_3")

mobile_sRNA <- RNAmobile(input = "sRNA",
data = sRNA_DESeq2, 
controls = controls,
genome.ID = "B",
task = "keep")

So, if we want to identify genes transferred from the rootstock to the scion, do the controls come from the scion, and does the genome.ID refer to the rootstock (B)?
Conversely, for genes transferred from the scion to the rootstock, do the controls still come from the scion, and does the genome.ID refer to the scion (A)?

I am a bit confused. Could you explain further, or annotate the flowchart (https://github.com/KJeynesCupper/mobileRNA/blob/main/man/figures/mobileRNA_graphic_1.png)?

@KJeynesCupper
Owner

Hi @zhangwenda0518

It is routine in plant grafting experiments to use a self-graft as the control. So if you are looking at root-to-shoot movement, you will likely have tissue samples from shoots (i.e. leaves) taken from both heterografts and self-grafts. Your control samples should not contain RNAs that align to the distant genome (i.e. the genotype associated with the mobile molecules). Therefore, in the downstream functions after importing your data into R, any RNAs found in the controls that aligned to the distant genome are discarded.
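The filtering rule in the paragraph above can be illustrated with a small Python sketch. It is not mobileRNA's implementation (that happens inside its R functions after RNAimport) and only shows the logic: a locus on the distant genome (genome.ID "B" here) that has reads in any self-graft control is discarded. The data layout and the sample name heterograft_1 are hypothetical, and the sketch assumes every chromosome name in the merged genome starts with its genome prefix.

```python
from typing import Dict, Iterable

# Hypothetical layout: locus ID -> {"chr": chromosome in the merged genome,
#                                   "counts": {sample name: read count}}
Loci = Dict[str, dict]


def drop_distant_in_controls(loci: Loci, controls: Iterable[str], distant_id: str = "B") -> Loci:
    """Keep every locus except those on the distant genome that also have reads in a control."""
    controls = list(controls)
    kept: Loci = {}
    for locus, info in loci.items():
        on_distant = info["chr"].startswith(distant_id)
        in_controls = any(info["counts"].get(s, 0) > 0 for s in controls)
        if on_distant and in_controls:
            # A self-graft contains no distant genome, so these reads are treated as false positives.
            continue
        kept[locus] = info
    return kept


if __name__ == "__main__":
    example = {
        "locus1": {"chr": "A_Chr12", "counts": {"selfgraft_1": 10, "heterograft_1": 12}},
        "locus2": {"chr": "B_Chr12", "counts": {"selfgraft_1": 3, "heterograft_1": 8}},
        "locus3": {"chr": "B_Chr03", "counts": {"selfgraft_1": 0, "heterograft_1": 5}},
    }
    print(sorted(drop_distant_in_controls(example, ["selfgraft_1"])))  # prints ['locus1', 'locus3']
```

Loci on the tissue genome ("A") are never removed by this rule; only distant-genome loci seen in a self-graft are.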

The order of the input files will not affect the directionality you are looking to investigate.

  • You can use either the rootstock or the scion as genome A or B. You just need to know which is which going into the analysis.
  • By default, when you generate the merged genome, a prefix is added to the chromosome names of each genome: "A" for Genome A and "B" for Genome B. When you import the data into R using RNAimport and look at the data frame, you will see these prefixes in the chromosome names; I refer to them as the genome.ID in downstream functions.
  • You will need to know which genome.ID is associated with your distant genome.
  • The example presents a root-to-shoot scenario in which genome A is the genome of the sampled tissue and genome B is the distant genome. Hence, downstream functions use "B" as the default ID for the distant genome.
  • For simplicity, if you are exploring shoot-to-root movement with tissue samples taken from the root, you can generate the merged genome with the rootstock genome as Genome A and the scion genome as Genome B. You can then use the default settings in downstream functions.
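To make the prefix convention concrete, here is a minimal Python sketch of merging two genomes, with each FASTA represented as a plain dict of chromosome name to sequence. It is illustrative only: the real merging is done by mobileRNA's RNAmergeGenomes() in R, and the underscore separator between prefix and chromosome name is an assumption, so check the chromosome names in your own merged FASTA. For a shoot-to-root study, the rootstock genome is passed as genome A, as the last point suggests.

```python
from typing import Dict, Tuple

Genome = Dict[str, str]  # chromosome name -> sequence


def merge_genomes(genome_a: Genome, genome_b: Genome,
                  prefixes: Tuple[str, str] = ("A", "B"), sep: str = "_") -> Genome:
    """Combine two genomes, prefixing each chromosome name with its genome ID."""
    merged: Genome = {}
    for prefix, genome in zip(prefixes, (genome_a, genome_b)):
        for chrom, seq in genome.items():
            merged[f"{prefix}{sep}{chrom}"] = seq
    return merged


def genome_id(chrom: str, sep: str = "_") -> str:
    """Recover the genome ID (e.g. "A" or "B") from a merged chromosome name."""
    return chrom.split(sep, 1)[0]


if __name__ == "__main__":
    # Shoot-to-root study: the sampled tissue is root, so the rootstock is genome A.
    rootstock = {"Chr01": "ACGTACGT"}
    scion = {"Chr01": "TTGACCGA"}
    merged = merge_genomes(rootstock, scion)
    print(sorted(merged))  # ['A_Chr01', 'B_Chr01']
    print({c: genome_id(c) for c in merged})  # the distant (scion) genome carries ID "B"
```

The prefix also keeps identically named chromosomes from the two genotypes (both "Chr01" here) from colliding in the merged reference, which is what lets each aligned read be attributed to one genome.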

Best,
Katie

@zhangwenda0518
Author

Hello, I'm sorry for not replying over the past two days.
I have run into another question about sample comparisons, which has been confusing me.
I read your article. Your experiment analyses the transcriptomes of roots from A-B grafted plants and from B-B self-grafted control plants.
My transcriptome data, however, come from A-B grafted plants under two treatments (drought and control).
My comparison is within the same A-B grafted plants: scion (CK) vs rootstock (CK).

Is mobileRNA suitable for my experimental data?

@KJeynesCupper
Owner

Hi @zhangwenda0518

So in your experiment you have exposed heterografted plants to a stress, and you are comparing scion tissue from the heterograft to root tissue from the heterograft. Have I understood this correctly?

What are you aiming to detect or identify in your experiment, e.g. changes in RNAs or mobile molecules?

Katie

@zhangwenda0518
Author

Yes. I only sequenced the transcriptomes of the scions and rootstocks of drought-treated (Dry) and control (CK) heterografted plants; I did not sequence any self-grafted plants.
I want to study which mRNAs are transferred, and how their expression differs, under control and drought conditions.

Can I use mobileRNA directly?

@KJeynesCupper
Owner

Hi @zhangwenda0518 ,

Yes mobileRNA could be used to identify mobile RNA in your heterografts.

To explain further, here is a breakdown of how mobileRNA works.
Essentially, for each biological replicate, you align your sequencing reads to the merged genome. The alignment tool chooses the location within the scion and rootstock genomes (both contained in the merged genome) that best suits each read. As a result, you generate a count file of the reads allocated to each genome. That summarises the pre-processing steps. The downstream analysis after the RNAimport() step then uses your two conditions to remove false positives from the dataset. It is designed to compare self-grafts with heterografts: no RNAs from the distant genome are expected in the self-grafted replicates. Hence, functions that ask you to state whether your system is chimeric will remove these RNAs from the dataset.

In your case, you are comparing two heterograft conditions without their respective self-graft conditions and want to locate mobile RNAs (i.e. those shared by, or unique to, the drought and non-drought conditions). mobileRNA will still be an effective method, but you may need to write your own code to analyse the data further. Because both of your conditions are heterografts, you expect to find RNAs from both genotypes (the tissue genotype and the distant genotype) in all of your sample replicates. As a result, make sure to set "chimeric" to FALSE in the relevant functions. Additionally, RNAmobile() will not work optimally: it will only identify the mobile RNAs that are unique to your drought condition (i.e. it eliminates mobile RNAs found in both conditions). This limitation arises because you do not have the respective self-graft conditions, which would help eliminate additional noise that is indistinguishable without these controls. Hence, your results will most likely contain extra noise that could be eliminated (to a degree) with self-grafted control conditions.
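For the heterograft-only design described above, the comparison that remains possible is essentially a set comparison between conditions. The Python sketch below illustrates that idea; it is not RNAmobile() and not part of mobileRNA. It assumes a hypothetical table of loci keyed by merged-genome chromosome names, made-up sample names such as Dry_1 and CK_1, that "B" is the distant genome, and that a locus counts as detected in a condition only if every replicate has at least one read (an arbitrary threshold chosen for illustration).

```python
from typing import Dict, Iterable, Set

Loci = Dict[str, dict]  # locus ID -> {"chr": merged-genome chromosome, "counts": {sample: reads}}


def distant_loci(loci: Loci, samples: Iterable[str], distant_id: str = "B",
                 min_count: int = 1) -> Set[str]:
    """Loci on the distant genome with at least `min_count` reads in every listed sample."""
    samples = list(samples)
    return {
        locus
        for locus, info in loci.items()
        if info["chr"].startswith(distant_id)
        and all(info["counts"].get(s, 0) >= min_count for s in samples)
    }


def compare_conditions(loci: Loci, drought: Iterable[str], control: Iterable[str],
                       distant_id: str = "B") -> Dict[str, Set[str]]:
    """Split candidate mobile loci into drought-only, control-only and shared sets."""
    dry = distant_loci(loci, drought, distant_id)
    ck = distant_loci(loci, control, distant_id)
    return {"drought_only": dry - ck, "control_only": ck - dry, "shared": dry & ck}


if __name__ == "__main__":
    loci = {
        "l1": {"chr": "B_Chr01", "counts": {"Dry_1": 5, "Dry_2": 7, "CK_1": 4, "CK_2": 6}},
        "l2": {"chr": "B_Chr02", "counts": {"Dry_1": 3, "Dry_2": 2, "CK_1": 0, "CK_2": 0}},
        "l3": {"chr": "A_Chr02", "counts": {"Dry_1": 9, "Dry_2": 9, "CK_1": 0, "CK_2": 0}},
    }
    print(compare_conditions(loci, ["Dry_1", "Dry_2"], ["CK_1", "CK_2"]))
```

Because there are no self-graft controls, nothing here removes reads that were mis-assigned to the distant genome, so all three sets may still contain the noise described above.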

That said, in my opinion it would be highly beneficial to include self-grafted controls in your analysis to help eliminate this additional noise. Please also bear in mind that grafting itself has been shown to alter gene expression in the scion, i.e. gene expression changes occur in the scion of self-grafted plants compared with the scion of non-grafted plants.

Best, Katie

@KJeynesCupper
Owner

Hi @zhangwenda0518

I would also recommend trying out the command-line mobileRNA pre-processing package.
