Add Csparse conversion to assays #22

Merged: 3 commits, Jan 24, 2024
7 changes: 4 additions & 3 deletions DESCRIPTION
@@ -1,6 +1,6 @@
Package: MouseGastrulationData
Title: Single-Cell -omics Data across Mouse Gastrulation and Early Organogenesis
-Version: 1.17.0
+Version: 1.17.1
Authors@R: c(
person("Jonathan", "Griffiths", email = "jonathan.griffiths.94@gmail.com", role = c("aut", "cre")),
person("Aaron", "Lun", email = "infinite.monkeys.with.keyboards@gmail.com", role = "aut"))
@@ -22,7 +22,8 @@ Imports:
Suggests:
BiocStyle,
knitr,
-rmarkdown
+rmarkdown,
+testthat
VignetteBuilder:
knitr
License: GPL-3
@@ -31,4 +32,4 @@ Encoding: UTF-8
biocViews: ExperimentData, ExpressionData, SequencingData, RNASeqData, SingleCellData, ExperimentHub, Mus_musculus_Data
URL: https://github.com/MarioniLab/MouseGastrulationData
BugReports: https://github.com/MarioniLab/MouseGastrulationData/issues
-RoxygenNote: 7.2.1
+RoxygenNote: 7.3.0
6 changes: 4 additions & 2 deletions R/BPSATACData.R
@@ -4,6 +4,8 @@
#'
#' @param type String specifying the type of data to obtain, see Details.
#' Default behaviour is to return processed data.
+#' @param Csparse.assays Logical indicating whether to convert assay matrices into the column major format that is more performant with contemporary software packages.
+#' Default behaviour is to perform the conversion.
#'
#' @return
#' If \code{type="processed"}, a \linkS4class{SingleCellExperiment} is returned containing the processed data.
@@ -81,8 +83,8 @@
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
-BPSATACData <- function(type=c("processed", "raw")) {
+BPSATACData <- function(type=c("processed", "raw"), Csparse.assays=TRUE) {
type <- match.arg(type)
versions <- list(base="1.6.0")
-.getRNAseqData("BPS_atac", type, versions, samples=1, sample.options=as.character(1), sample.err="1")
+.getRNAseqData("BPS_atac", type, versions, samples=1, sample.options=as.character(1), sample.err="1", makeCsparse=Csparse.assays)
}
6 changes: 4 additions & 2 deletions R/EmbryoAtlasData.R
@@ -7,6 +7,8 @@
#' @param samples Integer or character vector specifying the samples for which data (processed or raw) should be obtained.
#' If \code{NULL} (default), data are returned for all (36) samples.
#' @param get.spliced Logical indicating whether to also download the spliced/unspliced/ambiguously spliced count matrices.
+#' @param Csparse.assays Logical indicating whether to convert assay matrices into the column major format that is more performant with contemporary software packages.
+#' Default behaviour is to perform the conversion.
#'
#' @return
#' If \code{type="processed"}, a \linkS4class{SingleCellExperiment} is returned containing processed data from selected samples.
@@ -76,7 +78,7 @@
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
-EmbryoAtlasData <- function(type=c("processed", "raw"), samples=NULL, get.spliced=FALSE) {
+EmbryoAtlasData <- function(type=c("processed", "raw"), samples=NULL, get.spliced=FALSE, Csparse.assays=TRUE) {
type <- match.arg(type)
versions <- list(base="1.0.0")
extra_a <- NULL
@@ -93,5 +95,5 @@ EmbryoAtlasData <- function(type=c("processed", "raw"), samples=NULL, get.splice
"counts-unspliced"="1.4.0",
"counts-ambig"="1.4.0"))
}
-.getRNAseqData("atlas", type, versions, samples, sample.options=as.character(c(1:10, 12:37)), sample.err="1:10 or 12:37", extra_assays = extra_a)
+.getRNAseqData("atlas", type, versions, samples, sample.options=as.character(c(1:10, 12:37)), sample.err="1:10 or 12:37", extra_assays = extra_a, makeCsparse=Csparse.assays)
}
6 changes: 4 additions & 2 deletions R/TChimeraData.R
@@ -6,6 +6,8 @@
#' Default behaviour is to return processed data.
#' @param samples Integer or character vector specifying the samples for which data (processed or raw) should be obtained.
#' If \code{NULL} (default), data are returned for all QC-passing (fourteen) samples.
+#' @param Csparse.assays Logical indicating whether to convert assay matrices into the column major format that is more performant with contemporary software packages.
+#' Default behaviour is to perform the conversion.
#'
#' @return
#' If \code{type="processed"}, a \linkS4class{SingleCellExperiment} is returned containing processed data from selected samples
@@ -86,10 +88,10 @@
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
-TChimeraData <- function(type=c("processed", "raw"), samples=c(1:2, 5:16)) {
+TChimeraData <- function(type=c("processed", "raw"), samples=c(1:2, 5:16), Csparse.assays=TRUE) {
if(any(3:4 %in% samples))
warning("You are downloading the QC-fail samples 3 and/or 4.")
type <- match.arg(type)
versions <- list(base="1.4.0")
-.getRNAseqData("t-chimera", type, versions, samples, sample.options=as.character(seq_len(16)), sample.err="1:16")
+.getRNAseqData("t-chimera", type, versions, samples, sample.options=as.character(seq_len(16)), sample.err="1:16", makeCsparse=Csparse.assays)
}
6 changes: 4 additions & 2 deletions R/Tal1ChimeraData.R
@@ -6,6 +6,8 @@
#' Default behaviour is to return processed data.
#' @param samples Integer or character vector specifying the samples for which data (processed or raw) should be obtained.
#' If \code{NULL} (default), data are returned for all (four) samples.
+#' @param Csparse.assays Logical indicating whether to convert assay matrices into the column major format that is more performant with contemporary software packages.
+#' Default behaviour is to perform the conversion.
#'
#' @return
#' If \code{type="processed"}, a \linkS4class{SingleCellExperiment} is returned containing processed data from selected samples.
@@ -67,8 +69,8 @@
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
-Tal1ChimeraData <- function(type=c("processed", "raw"), samples=NULL) {
+Tal1ChimeraData <- function(type=c("processed", "raw"), samples=NULL, Csparse.assays=TRUE) {
type <- match.arg(type)
versions <- list(base="1.0.0")
-.getRNAseqData("tal1-chimera", type, versions, samples, sample.options=as.character(seq_len(4)), sample.err="1:4")
+.getRNAseqData("tal1-chimera", type, versions, samples, sample.options=as.character(seq_len(4)), sample.err="1:4", makeCsparse=Csparse.assays)
}
6 changes: 4 additions & 2 deletions R/WTChimeraData.R
@@ -6,6 +6,8 @@
#' Default behaviour is to return processed data.
#' @param samples Integer or character vector specifying the samples for which data (processed or raw) should be obtained.
#' If \code{NULL} (default), data are returned for all (ten) samples.
+#' @param Csparse.assays Logical indicating whether to convert assay matrices into the column major format that is more performant with contemporary software packages.
+#' Default behaviour is to perform the conversion.
#'
#' @return
#' If \code{type="processed"}, a \linkS4class{SingleCellExperiment} is returned containing processed data from selected samples
Expand Down Expand Up @@ -76,8 +78,8 @@
#' @importFrom BiocGenerics sizeFactors
#' @importClassesFrom S4Vectors DataFrame
#' @importFrom methods as
-WTChimeraData <- function(type=c("processed", "raw"), samples=NULL) {
+WTChimeraData <- function(type=c("processed", "raw"), samples=NULL, Csparse.assays=TRUE) {
type <- match.arg(type)
versions <- list(base="1.0.0")
-.getRNAseqData("wt-chimera", type, versions, samples, sample.options=as.character(seq_len(10)), sample.err="1:10")
+.getRNAseqData("wt-chimera", type, versions, samples, sample.options=as.character(seq_len(10)), sample.err="1:10", makeCsparse=Csparse.assays)
}
26 changes: 21 additions & 5 deletions R/getData.R
@@ -28,7 +28,8 @@
names,
object.type=c("SingleCellExperiment", "SpatialExperiment"),
return.list=FALSE,
-ensemblise=TRUE
+ensemblise=TRUE,
+makeCsparse=FALSE
){
object.type <- match.arg(object.type)
hub <- ExperimentHub()
@@ -45,7 +46,8 @@

if(return.list){
out <- lapply(samples, function(x){ .getData(dataset, version, x,
-sample.options, sample.err, names, object.type, return.list=FALSE)})
+sample.options, sample.err, names, object.type, return.list=FALSE,
+ensemblise=ensemblise, makeCsparse=makeCsparse)})
names(out) <- samples
return(out)
}
@@ -119,13 +121,16 @@
if("cell" %in% names(colData(sce))){
colnames(sce) <- colData(sce)$cell
}
+if(makeCsparse){
+sce <- .makeCsparse(sce)
+}
return(sce)
}

####
# Simpler interfaces for specific data types
####
-.getRNAseqData <- function(dataset, type, version, samples, sample.options, sample.err, extra_assays=NULL, ens_rownames=TRUE){
+.getRNAseqData <- function(dataset, type, version, samples, sample.options, sample.err, extra_assays=NULL, ens_rownames=TRUE, makeCsparse=FALSE){
if(type == "processed"){ return(
.getData(
dataset,
@@ -141,7 +146,8 @@
dimred="reduced-dims"
),
object.type="SingleCellExperiment",
-ensemblise=ens_rownames
+ensemblise=ens_rownames,
+makeCsparse=makeCsparse
))
} else if (type == "raw"){ return(
.getData(
@@ -156,7 +162,8 @@
),
object.type="SingleCellExperiment",
return.list=TRUE,
-ensemblise=ens_rownames
+ensemblise=ens_rownames,
+makeCsparse=makeCsparse
))
}
}
@@ -210,3 +217,12 @@
opt
}
}

+.makeCsparse <- function(sce){
+for(an in assayNames(sce)){
+if(is(assay(sce, an), "TsparseMatrix")){
+assay(sce, an) <- as(assay(sce, an), "CsparseMatrix")
+}
+}
+return(sce)
+}
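
Note for readers of this diff: the per-assay conversion that `.makeCsparse` performs can be tried on a toy matrix without downloading any data. The following is a minimal sketch, not the package's code. `convert_triplets` is a hypothetical stand-in that works on a plain named list instead of a SingleCellExperiment, and the `repr = "T"` argument of `sparseMatrix()` assumes Matrix 1.3 or later.

```r
library(Matrix)
library(methods)

# Toy 3 x 3 sparse matrix stored in triplet (i, j, x) form.
# repr = "T" asks sparseMatrix() for a TsparseMatrix (assumes Matrix >= 1.3).
m_triplet <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(5, 2, 7),
                          dims = c(3, 3), repr = "T")

# Hypothetical stand-in for .makeCsparse(): converts only the triplet-format
# entries of a named list and leaves every other entry untouched.
convert_triplets <- function(assays) {
  lapply(assays, function(mat) {
    if (is(mat, "TsparseMatrix")) as(mat, "CsparseMatrix") else mat
  })
}

assays <- list(counts = m_triplet, dense = matrix(1:4, nrow = 2))
converted <- convert_triplets(assays)

# Inspect the storage class of each entry after conversion.
print(vapply(converted, function(mat) class(mat)[1], character(1)))
```

Entries that are not triplet matrices (here, the dense matrix) pass through unchanged, which mirrors the `is(..., "TsparseMatrix")` guard in the helper above.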
5 changes: 4 additions & 1 deletion man/BPSATACData.Rd
6 changes: 5 additions & 1 deletion man/EmbryoAtlasData.Rd
9 changes: 8 additions & 1 deletion man/TChimeraData.Rd
9 changes: 8 additions & 1 deletion man/Tal1ChimeraData.Rd
9 changes: 8 additions & 1 deletion man/WTChimeraData.Rd
(Generated man pages; diffs not rendered.)
3 changes: 3 additions & 0 deletions tests/testthat.R
@@ -0,0 +1,3 @@
+library(testthat)
+library(MouseGastrulationData)
+test_check("MouseGastrulationData")
10 changes: 10 additions & 0 deletions tests/testthat/test-Csparse.R
@@ -0,0 +1,10 @@
+# This tests the conversion from triplet to column major matrix styles.
+# library(testthat); library(MouseGastrulationData); source("test-Csparse.R")
+
+test_that("EmbryoAtlasData function for sample 1, with and without csparse conversion, gives equal counts assay", {
+data_without_csparse <- EmbryoAtlasData(samples = 1, Csparse.assays = FALSE)
+data_with_csparse <- EmbryoAtlasData(samples = 1, Csparse.assays = TRUE)
+
+expect_equal(assay(data_without_csparse, "counts"),
+as(assay(data_with_csparse, "counts"), "TsparseMatrix"))
+})
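
Note on the comparison in this test: it converts the Csparse assay back to triplet form before calling `expect_equal`. A triplet matrix can hold the same entries in any order, so a comparison of two triplet matrices may, depending on how the comparison inspects the objects, be sensitive to that order. One possible alternative, offered here as an assumption and not something this PR does, is to compare both sides in Csparse form. The sketch below uses toy matrices `a` and `b` (hypothetical data, not from the package) and assumes Matrix 1.3 or later for `repr = "T"`.

```r
library(Matrix)
library(testthat)

# Two triplet matrices describing the same 3 x 3 values,
# with the (i, j, x) entries supplied in different orders.
a <- sparseMatrix(i = c(1, 3, 2), j = c(1, 1, 3), x = c(5, 2, 7),
                  dims = c(3, 3), repr = "T")
b <- sparseMatrix(i = c(2, 1, 3), j = c(3, 1, 1), x = c(7, 5, 2),
                  dims = c(3, 3), repr = "T")

# Convert both sides to compressed column form before comparing.
a_c <- as(a, "CsparseMatrix")
b_c <- as(b, "CsparseMatrix")

test_that("triplet matrices with the same entries agree in Csparse form", {
  expect_s4_class(a_c, "CsparseMatrix")
  expect_equal(a_c, b_c)
})
```

Comparing in Csparse form relies on that format's requirement that row indices are sorted within each column, so the outcome does not depend on how the triplets were originally ordered.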