Oversample the reference data and perform GSEA and topic modelling.

To avoid class imbalance when training a classifier, we oversample the training set. That is, we randomly select cells from the clusters of a given cell type to form new clusters, until every cell type has the same number of clusters.
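The oversampling idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the implementation used by `oversample_ref()`: the helper `oversample_clusters()` is hypothetical, it works on a plain metadata data frame rather than a Seurat object, and both the size of each synthetic cluster and the default target (the largest cluster count among cell types) are assumptions made for the example.

```r
# Hypothetical helper (not part of the package): balances the number of
# clusters per cell type in a metadata data frame.
oversample_clusters <- function(meta, target = NULL,
                                group_col = "cellType",
                                cluster_col = "seurat_clusters") {
  # Work with character labels so new cluster names can be assigned freely.
  meta[[group_col]] <- as.character(meta[[group_col]])
  meta[[cluster_col]] <- as.character(meta[[cluster_col]])

  # Distinct existing clusters for each cell type.
  clusters_by_type <- lapply(split(meta[[cluster_col]], meta[[group_col]]), unique)
  n_clusters <- lengths(clusters_by_type)

  # Assumption: by default, match the cell type with the most clusters.
  if (is.null(target)) target <- max(n_clusters)

  new_rows <- list()
  for (type in names(clusters_by_type)) {
    cells <- which(meta[[group_col]] == type)
    # Assumption: a synthetic cluster has the mean cluster size of its cell type.
    size <- ceiling(length(cells) / n_clusters[[type]])
    for (i in seq_len(max(0, target - n_clusters[[type]]))) {
      # Randomly draw cells of this type (with replacement) to form a new cluster.
      picked <- cells[sample.int(length(cells), size, replace = TRUE)]
      extra <- meta[picked, , drop = FALSE]
      extra[[cluster_col]] <- paste0(type, "_oversampled_", i)
      new_rows[[length(new_rows) + 1]] <- extra
    }
  }
  do.call(rbind, c(list(meta), new_rows))
}

# Toy metadata: "T cell" has three clusters, "B cell" has only one.
set.seed(1)
meta <- data.frame(
  cellType = rep(c("T cell", "B cell"), times = c(60, 20)),
  seurat_clusters = c(rep(c("0", "1", "2"), each = 20), rep("3", 20))
)
balanced <- oversample_clusters(meta)

# Number of distinct clusters per cell type after oversampling.
tapply(balanced$seurat_clusters, balanced$cellType, function(x) length(unique(x)))
```

In this toy data, the under-represented cell type gains synthetic clusters until both cell types have the same number of clusters. The real function additionally oversamples the expression matrix and runs GSEA and topic modelling, which this sketch does not attempt.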
oversample_ref(
  reference_SeuratObj,
  number_clusters = NULL,
  group_by = "cellType",
  cluster_by = "seurat_clusters",
  species = "Homo sapiens",
  by = "GO",
  k = NULL,
  method = "VEM"
)
Argument | Description |
---|---|
reference_SeuratObj | reference data |
number_clusters | target number of clusters to oversample each cell type to |
group_by | the column of |
cluster_by | the column of |
species | species of the reference data |
by | database used to perform GSEA; one of GO, KEGG, Reactome, MSigDb, WikiPathways, DO, NCG, DGN |
k | number of topics |
method | method used for fitting an LDA model; currently "VEM" and "Gibbs" are supported |
A Seurat object with the oversampled expression matrix and the topic-model result.
if (FALSE) {
  reference_SeuratObj <- oversample_ref(
    reference_SeuratObj,
    number_clusters = 10,
    group_by = "cellType",
    cluster_by = "seurat_clusters",
    species = "Homo sapiens",
    by = "GO",
    k = NULL,
    method = "VEM"
  )
}