Cell Hashing on Multiome samples #174

Open
bassanio opened this issue Jan 27, 2023 · 11 comments

Comments

@bassanio

Hi,

I am very new to the hashing method. I have 10x output from cellranger-arc (it contains both RNA-seq and ATAC-seq). I was told the samples are multiplexed using BioLegend hashing antibodies (Ab), and I have been provided with the Ab sequences.

  1. How can I use the provided Ab sequences to demultiplex the output of cellranger-arc?
@Hoohm
Owner

Hoohm commented Jan 30, 2023

Hello @bassanio,
Did you already go through the documentation? If so, could you tell me specifically what you need help with?

@mbassalbioinformatics

Hi
I guess this is in the same spirit as the previous question.

So in the documentation you outline the structure of R1 with the UMI position and then R2 with the Ab barcode data. You also mention how you provide the tag.csv file which will take the input fq files, and generate counts based on the Ab barcodes provided in the csv. That part makes sense.

Now my question is, where does the barcode info for the HTO come into play? Where do you specify those, and where does CITE-seq-Count deal with them? Do I need to run CITE-seq-Count twice, once for the Ab barcodes and a second time for the HTOs? Or do I make a single CSV file with the HTO and Ab sequences and let CITE-seq-Count loose on all of it in one go?

(I have 1 file of the format [say hto.csv]...

XXXXXX,hashtag1
YYYYYY,hashtag2

... and a 2nd file of format [say abs.csv]...

AAAAAA,Ab1
BBBBBB,Ab2

Are you able to provide pseudo-code/commands showing how to run CITE-seq-Count for each of hto.csv and abs.csv to get the counts required for progressing?)

Second question: assuming the HTO/Ab situation is dealt with, the next step would be loading this information into Seurat for integration, is that correct?

@Hoohm
Owner

Hoohm commented Feb 2, 2023

So depending on how your libraries have been sequenced, you can run everything together.
You should have FASTQs for the ABs and FASTQs for the HTOs.

Does cellranger give you the output you need for the ABs?

If so, you only need to run CSC on the HTO.

You can make a tags.csv with all your HTO tags and all your AB tags; CSC will try to match all of those against the FASTQs you provide.
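A minimal sketch of building such a combined tag file, assuming both inputs are plain sequence,name CSV files like the hto.csv and abs.csv examples earlier in the thread. The function name merge_tag_files and its duplicate checks are illustrative only and are not part of CITE-seq-Count.

```python
import csv


def merge_tag_files(paths, out_path):
    """Concatenate 'sequence,name' CSV files into one tag list.

    Raises ValueError if a sequence or a name occurs more than once,
    since an ambiguous tag list would make the counts hard to interpret.
    """
    names_by_sequence = {}
    seen_names = set()
    rows = []
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.reader(handle):
                if not row:  # skip blank lines
                    continue
                sequence, name = row[0].strip().upper(), row[1].strip()
                if sequence in names_by_sequence:
                    raise ValueError(
                        f"sequence {sequence} used for both "
                        f"{names_by_sequence[sequence]} and {name}"
                    )
                if name in seen_names:
                    raise ValueError(f"tag name {name} appears twice")
                names_by_sequence[sequence] = name
                seen_names.add(name)
                rows.append((sequence, name))
    with open(out_path, "w", newline="") as handle:
        csv.writer(handle, lineterminator="\n").writerows(rows)
    return rows
```

Calling merge_tag_files(["hto.csv", "abs.csv"], "tags.csv") would then give a single file that could be passed to -t.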

Pseudo code is very simple.

  1. Take a read from R2 and try to match any of the tags provided in tags.csv from the start of the read (or from the first base given by --start-trim). If none is found, flag the read as unmapped.
  2. Do some cell aggregation
  3. UMI aggregation
  4. Produce read and umi count matrices
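The four steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not CITE-seq-Count's implementation: it uses exact prefix matching only, and it skips the cell barcode and UMI error correction that the real tool performs. The function name count_tags and its parameters are made up for this example; the default coordinates mirror -cbf 1 -cbl 16 -umif 17 -umil 26.

```python
from collections import Counter, defaultdict


def count_tags(read_pairs, tags, cb_first=1, cb_last=16,
               umi_first=17, umi_last=26, start_trim=0):
    """Toy tag counter.

    read_pairs: iterable of (read1_sequence, read2_sequence)
    tags:       dict mapping tag sequence -> tag name
    Returns (read_counts, umi_counts), each a dict cell -> tag -> count.
    """
    read_counts = defaultdict(Counter)
    umis_seen = defaultdict(lambda: defaultdict(set))
    for read1, read2 in read_pairs:
        # Coordinates are 1-based and inclusive, hence the "- 1".
        cell = read1[cb_first - 1:cb_last]
        umi = read1[umi_first - 1:umi_last]
        # Step 1: match a tag at the start of R2, after trimming.
        trimmed = read2[start_trim:]
        tag_name = "unmapped"
        for sequence, name in tags.items():
            if trimmed.startswith(sequence):
                tag_name = name
                break
        # Steps 2 and 3: aggregate per cell, then per UMI.
        read_counts[cell][tag_name] += 1
        umis_seen[cell][tag_name].add(umi)
    # Step 4: read counts and de-duplicated UMI counts.
    umi_counts = {
        cell: {tag: len(umis) for tag, umis in per_tag.items()}
        for cell, per_tag in umis_seen.items()
    }
    return {cell: dict(c) for cell, c in read_counts.items()}, umi_counts
```

Two reads with the same cell barcode, tag and UMI add two to the read count but only one to the UMI count, which is why the tool writes both matrices.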

Yes, you then need to load the results into Seurat to do the demultiplexing.
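To give a feel for what demultiplexing means once the HTO counts exist, here is a deliberately naive Python sketch. It is not Seurat's method: it simply assigns each cell to its most abundant hashtag and uses two arbitrary, hypothetical thresholds (min_count and min_ratio) to flag negatives and likely doublets. Real data would need a proper statistical approach such as the one Seurat provides.

```python
def assign_hashtags(umi_counts, min_count=10, min_ratio=3.0,
                    ignore=("unmapped",)):
    """Naive hashtag assignment.

    umi_counts: dict cell -> dict hashtag name -> UMI count
    Returns dict cell -> hashtag name, "doublet" or "negative".
    """
    calls = {}
    for cell, per_tag in umi_counts.items():
        # Rank hashtags by count, highest first, skipping ignored rows.
        ranked = sorted(
            ((name, count) for name, count in per_tag.items()
             if name not in ignore),
            key=lambda item: item[1],
            reverse=True,
        )
        if not ranked or ranked[0][1] < min_count:
            calls[cell] = "negative"  # too little signal for any hashtag
            continue
        top_name, top_count = ranked[0]
        second_count = ranked[1][1] if len(ranked) > 1 else 0
        if second_count > 0 and top_count / second_count < min_ratio:
            calls[cell] = "doublet"  # two hashtags at similar levels
        else:
            calls[cell] = top_name
    return calls
```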

@mbassalbioinformatics

mbassalbioinformatics commented Feb 2, 2023

I have FASTQs for the Abs and for the HTOs separate from the expression data (i.e. the FASTQs have been split by sample, and each sample has its corresponding Ab + HTO FASTQ files).

So if I understand you correctly, I need to run cellranger on the Ab + HTO FASTQs separately to get the count matrix for those, right? And a second run of cellranger on the expression FASTQ files for those counts?

After which I just run CSC on the Ab + HTO FASTQs with

CITE-seq-Count -R1 ab-HTO_R1.fastq.gz -R2 ab-HTO_R2.fastq.gz \
-t TAG_LIST_HTO-Ab.csv -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 20000 -o ./out/

Did I understand you correctly?

And from there into R for the rest 👍

@Hoohm
Owner

Hoohm commented Feb 4, 2023 via email

@bassanio
Author

Hi,

I tried to run CITE-seq-Count with the command below and got the following error.

I am also confused about R2 and R3, because I am finding the ABs in R3 and not in R2.

CITE-seq-Count \
  -R1 hto_S3_L001_R1_001.fastq.gz \
  -R2 hto_S3_L001_R3_001.fastq.gz \
  -t TAGS.txt \
  -cbf 1 -cbl 16 -umif 17 -umil 26 -cells 13641 \
  -o RESULT

Tag File

ACCCACCAGTAAGAC,First_P1_Undivided
GGTCGAGAGCATTCA,Second_P2_late_dividers
CTTGCCGCATGTCAT,Third_P3_Early_dividers

**Running the above command produces a warning and then an error:**

Read1 length is 51bp but you are using 26bp for Cell and UMI barcodes combined.
This might lead to wrong cell attribution and skewed umi counts.

Counting number of reads
Started mapping
Processing 10,651,191 read
CITE-seq-Count is running with XX cores.
Mapping done for process 2006672. Processed 166,424 reads
Mapping done for process 2006674. Processed 166,424 reads
Mapping done for .......
Mapping done for process 2006731. Processed 166,424 reads
Mapping done
Merging results
Correcting cell barcodes
Looking for a whitelist

Collapsing cell barcodes
Correcting umis
Traceback (most recent call last):
  File "/home/.local/bin/CITE-seq-Count", line 8, in <module>
    sys.exit(main())
  File "/home/.local/lib/python3.9/site-packages/cite_seq_count/__main__.py", line 435, in main
    ) = processing.correct_umis(
  File "/home/.local/lib/python3.9/site-packages/cite_seq_count/processing.py", line 229, in correct_umis
    for TAG in final_results[cell_barcode]:
RuntimeError: dictionary keys changed during iteration

HTO R1: [screenshot]

HTO R2: [screenshot]

HTO R3: [screenshot]

grep for an AB tag in R3: [screenshot]

Some AB barcodes do not start at the expected position in the read, as shown in the example.
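One way to quantify where the tags actually sit is to tally the offset of each tag within the reads. The sketch below is a standalone helper, not part of CITE-seq-Count, and the name tag_offsets is hypothetical. It assumes a standard four-line-per-record FASTQ file, optionally gzipped, and only looks for exact matches.

```python
import gzip
from collections import Counter


def tag_offsets(fastq_path, tags, max_reads=100_000):
    """Count, per tag, the 0-based offsets at which it occurs in reads.

    tags: dict mapping tag sequence -> tag name
    Returns dict tag name -> Counter({offset: number_of_reads}).
    """
    offsets = {name: Counter() for name in tags.values()}
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            if line_number % 4 != 1:  # the sequence is line 2 of each record
                continue
            sequence = line.strip()
            for tag_sequence, name in tags.items():
                position = sequence.find(tag_sequence)
                if position != -1:
                    offsets[name][position] += 1
    return offsets
```

If nearly all hits for every tag fall at one non-zero offset, that offset is a reasonable candidate value for --start-trim. If the offsets are scattered, trimming alone is unlikely to fix the matching.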

@cpflueger2016

cpflueger2016 commented May 16, 2023

@bassanio try to set up a conda environment with Python 3.7.16 and run it again. I have had no luck with any Python version > 3.7. The error is actually an issue with changes in the pandas package. If you restrict Python to 3.7.16, pip install CITE-seq-Count==1.4.5 will pull the correct pandas version. Good luck!
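Whatever triggers it inside CITE-seq-Count, the message in the traceback above is the one CPython 3.8 and later raise when a dict's keys are replaced while the same dict is being iterated, even if its size stays the same. Python 3.7 did not detect that case, which would be consistent with the run succeeding under 3.7. The functions below are a standalone illustration of the behaviour and of the usual workaround (iterating over a snapshot of the keys); they are not CITE-seq-Count code.

```python
def rename_keys_in_place(mapping):
    """Upper-case keys while iterating over the same dict (unsafe on purpose).

    Returns the error message if Python detects the mutation, else None.
    """
    try:
        for key in mapping:
            mapping[key.upper()] = mapping.pop(key)
    except RuntimeError as error:
        return str(error)
    return None


def rename_keys_safely(mapping):
    """Same renaming, but iterating over a snapshot of the keys."""
    for key in list(mapping):
        mapping[key.upper()] = mapping.pop(key)
    return mapping
```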

@bassanio
Author

@cpflueger2016: Thanks for the information, I will do that.

Can you also help me understand the R2 and R3 FASTQ files?

@cpflueger2016

Yeah, so if the i7 index read is written out as its own FASTQ (there is an option for this in bcl2fastq), your R2 file is actually the library index and R3 is the true second read.
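A quick way to check which file is which is to compare read lengths, on the assumption that an index read is much shorter than a biological read (the exact lengths depend on the kit and the sequencing run). The helper below is a hypothetical sketch that reports the most common read length among the first records of a FASTQ file.

```python
import gzip
from collections import Counter


def modal_read_length(fastq_path, max_reads=10_000):
    """Return the most common read length among the first max_reads records."""
    lengths = Counter()
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number // 4 >= max_reads:
                break
            if line_number % 4 == 1:  # the sequence is line 2 of each record
                lengths[len(line.strip())] += 1
    if not lengths:
        raise ValueError(f"no reads found in {fastq_path}")
    return lengths.most_common(1)[0][0]
```

Running it on the R1, R2 and R3 files of one sample and comparing the three numbers should make it clear which file holds the index and which holds the tag read.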

@bassanio
Author

@cpflueger2016: I get this warning message at the top:

"Read1 length is 51bp but you are using 26bp for Cell and UMI barcodes combined"

Should I change -umil to 51? Does this have any effect on the analysis?

@Hoohm
Owner

Hoohm commented Jul 22, 2023

This is not going to affect the analysis. Back in the day I wanted to make sure people knew what they were running and catch potential wrong lengths.
In hindsight this might have been a mistake as it confuses users more than anything.
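To make the arithmetic behind the warning concrete: with -cbf 1 -cbl 16 -umif 17 -umil 26, the cell barcode covers bases 1 to 16 and the UMI covers bases 17 to 26, so 26 bases of Read1 are used and the remaining 25 bases of a 51bp read are simply not looked at. The sketch below illustrates that slicing and a comparison of this kind. Both function names are hypothetical, and the exact condition CITE-seq-Count checks is an assumption here.

```python
def split_read1(read1, cb_first=1, cb_last=16, umi_first=17, umi_last=26):
    """Cut the cell barcode and UMI out of Read1.

    Positions are 1-based and inclusive, as on the command line.
    """
    cell_barcode = read1[cb_first - 1:cb_last]
    umi = read1[umi_first - 1:umi_last]
    return cell_barcode, umi


def barcode_length_note(read1_length, cb_first, cb_last, umi_first, umi_last):
    """Return a note if Read1 is not exactly barcode + UMI long, else None."""
    used = (cb_last - cb_first + 1) + (umi_last - umi_first + 1)
    if read1_length != used:
        return (
            f"Read1 length is {read1_length}bp but {used}bp are used "
            "for cell and UMI barcodes combined."
        )
    return None
```

Setting -umil to 51 would instead treat bases 17 to 51 as the UMI, which is not what the warning is asking for.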

Is your general issue resolved? Can I close this one?
