Oversample the reference data — oversample

Oversample the reference data, then perform GSEA and topic modelling. To avoid class imbalance when training a classifier, the training set is oversampled: cells are randomly selected from the clusters of a given cell type to form new clusters, until every cell type has the same number of clusters.
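The oversampling idea described above can be illustrated with a minimal base R sketch. It is not the package's implementation: `oversample_clusters()` is a hypothetical helper that works only on a metadata data frame (not on a Seurat object or expression matrix), and it assumes that each new cluster is drawn with replacement from all cells of the cell type and has roughly the average size of that type's existing clusters.

```r
# Hypothetical helper for illustration; not part of the package.
oversample_clusters <- function(meta, number_clusters,
                                group_by = "cellType",
                                cluster_by = "seurat_clusters") {
  meta[[cluster_by]] <- as.character(meta[[cluster_by]])
  per_type <- split(meta, meta[[group_by]], drop = TRUE)
  pieces <- lapply(per_type, function(cells) {
    n_existing <- length(unique(cells[[cluster_by]]))
    # Number of extra clusters this cell type needs to reach the target.
    n_new <- max(0, number_clusters - n_existing)
    # Assumption: a new cluster is about as large as an average existing one.
    size <- max(1, round(nrow(cells) / n_existing))
    new_clusters <- lapply(seq_len(n_new), function(i) {
      # Draw cells with replacement from all cells of this cell type.
      drawn <- cells[sample.int(nrow(cells), size, replace = TRUE), , drop = FALSE]
      drawn[[cluster_by]] <- paste0(cells[[group_by]][1], "_os", i)
      drawn
    })
    do.call(rbind, c(list(cells), new_clusters))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Toy metadata: "T cell" has three clusters, "B cell" has only one.
set.seed(1)
meta <- data.frame(
  cellType = c(rep("T cell", 6), rep("B cell", 2)),
  seurat_clusters = c("0", "0", "1", "1", "2", "2", "3", "3"),
  stringsAsFactors = FALSE
)
balanced <- oversample_clusters(meta, number_clusters = 3)

# Count distinct clusters per cell type after oversampling.
clusters_per_type <- tapply(balanced$seurat_clusters, balanced$cellType,
                            function(x) length(unique(x)))
```

In this toy input, "B cell" gains two resampled clusters while "T cell" is left unchanged, so both cell types end up with three clusters.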

oversample_ref(
  reference_SeuratObj,
  number_clusters = NULL,
  group_by = "cellType",
  cluster_by = "seurat_clusters",
  species = "Homo sapiens",
  by = "GO",
  k = NULL,
  method = "VEM"
)

Arguments

reference_SeuratObj	the reference data, as a Seurat object
number_clusters	target number of clusters that each cell type is oversampled to
group_by	the column of `reference_SeuratObj@meta.data` that indicates cell type
cluster_by	the column of `reference_SeuratObj@meta.data` that indicates cluster
species	species of the reference data
by	database used to perform GSEA; one of "GO", "KEGG", "Reactome", "MSigDb", "WikiPathways", "DO", "NCG", "DGN".
k	number of topics.
method	method used for fitting the LDA model; currently "VEM" and "Gibbs" are supported.

Value

A Seurat object containing the oversampled expression matrix and the topic-model result.

Examples

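if (FALSE) {
  # Oversample each cell type to 10 clusters, run GSEA against GO,
  # and fit the topic model with the VEM method.
  reference_SeuratObj <- oversample_ref(
    reference_SeuratObj, number_clusters = 10,
    group_by = "cellType", cluster_by = "seurat_clusters",
    species = "Homo sapiens", by = "GO", k = NULL, method = "VEM"
  )
}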