Scalable doublet finder #103

Open
wants to merge 3 commits into
base: master


GreenGilad

A major limitation of the current DoubletFinder version is its limited ability to scale to larger datasets. The reason is that the current implementation computes the distance matrix (over the PC space) for all cells in the dataset at once, resulting in O(n^2) space complexity.

To reduce space complexity, the distance matrix can be computed for only a subset of batch.size cells at a time, resulting in O(n*k) space complexity, where k is batch.size. The default value of batch.size is Inf, so the default behaviour of the algorithm is unchanged.
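As a rough illustration of the batching idea (not the code in this PR), here is a minimal base R sketch. The helper name `batched_knn` and its arguments `pcs` (a cells x PCs matrix) and `k` (number of neighbours) are hypothetical; only `batch.size` and its `Inf` default come from the description above. At any moment only a `batch.size` x n block of distances is held in memory.

```r
# Hypothetical helper, for illustration only: k nearest neighbours of every
# row of `pcs`, computed `batch.size` query rows at a time.
batched_knn <- function(pcs, k, batch.size = Inf) {
  n <- nrow(pcs)
  batch.size <- min(batch.size, n)   # Inf collapses to a single batch
  sq.norms <- rowSums(pcs^2)
  nn <- matrix(NA_integer_, nrow = n, ncol = k)
  for (start in seq(1, n, by = batch.size)) {
    idx <- start:min(start + batch.size - 1, n)
    # Squared Euclidean distances between the batch and all cells, using
    # |a - b|^2 = |a|^2 + |b|^2 - 2 * a.b; this block is length(idx) x n.
    d2 <- outer(sq.norms[idx], sq.norms, "+") -
      2 * tcrossprod(pcs[idx, , drop = FALSE], pcs)
    # A cell is never its own neighbour.
    d2[cbind(seq_along(idx), idx)] <- Inf
    # Indices of the k smallest distances in each row of the block.
    nn[idx, ] <- t(apply(d2, 1, function(row) order(row)[seq_len(k)]))
  }
  nn
}

# Usage: batching should not change the neighbours that are found.
set.seed(42)
pcs <- matrix(rnorm(200 * 5), nrow = 200)
full <- batched_knn(pcs, k = 10)                     # one n x n block
batched <- batched_knn(pcs, k = 10, batch.size = 32) # 32 x n blocks
```

With `batch.size = Inf` the loop runs once over all cells, which corresponds to the unbatched behaviour; smaller values trade a few more iterations for a smaller peak memory footprint.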

In addition, I found it beneficial to store, for each real cell, the IDs of its artificial nearest neighbours, as well as the identities of the real cells that were used to generate each artificial cell. Once DoubletFinder has been run on a dataset, this information helps interpret the doublet/singlet classification. Both the list of artificial nearest neighbours and the parent idents data frame are stored as a Tool record in the Seurat object.
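To make the shape of these two records concrete, here is a small base R sketch on toy data. It is not the PR's code: all names (`parent.idents`, `art.neighbours`, the `cell_`/`art_` IDs) are hypothetical, the sketch assumes each artificial cell is the average of two randomly chosen real cells, and it uses raw toy counts as a stand-in for PC embeddings. Storing the results in the Seurat object is left out.

```r
set.seed(1)
n.real <- 6
n.art <- 3
k <- 3
real.ids <- paste0("cell_", seq_len(n.real))
art.ids <- paste0("art_", seq_len(n.art))

# Toy genes x cells count matrix for the real cells.
counts <- matrix(rpois(4 * n.real, lambda = 5), nrow = 4,
                 dimnames = list(paste0("gene_", 1:4), real.ids))

# Assumption: each artificial cell averages two randomly chosen real cells.
parent1 <- sample(real.ids, n.art, replace = TRUE)
parent2 <- sample(real.ids, n.art, replace = TRUE)
art.counts <- (counts[, parent1, drop = FALSE] +
               counts[, parent2, drop = FALSE]) / 2
colnames(art.counts) <- art.ids

# Record 1: which real cells generated each artificial cell.
parent.idents <- data.frame(parent1, parent2, row.names = art.ids,
                            stringsAsFactors = FALSE)

# k nearest neighbours over real + artificial cells (toy counts stand in
# for the PC space here).
merged <- t(cbind(counts, art.counts))   # cells x genes
d <- as.matrix(dist(merged))
diag(d) <- Inf                           # a cell is not its own neighbour
nn.ids <- apply(d, 1, function(row) names(sort(row))[seq_len(k)])

# Record 2: for each real cell, the IDs of its artificial neighbours.
art.neighbours <- lapply(real.ids,
                         function(id) intersect(nn.ids[, id], art.ids))
names(art.neighbours) <- real.ids
```

Looking up a real cell in `art.neighbours` and then its artificial neighbours in `parent.idents` shows which pairs of real cells the neighbouring artificial doublets were built from.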

Lastly, I added a call to the LogSeuratCommand function to store the parameters used to run DoubletFinder.
