TFcomb.preprocessing module
- TFcomb.preprocessing.oracle_preprocess(oracle, k=10)
Preprocesses the Oracle object by performing PCA and KNN imputation.
This function computes the optimal number of principal components based on the explained variance ratio, adjusts the number of neighbors (k) for KNN imputation, and applies the preprocessing steps to the Oracle object.
- Parameters:
oracle (co.Oracle) – The Oracle object containing the data to preprocess.
k (int, optional) – The requested number of neighbors for KNN imputation; the function may adjust this value before imputing. Defaults to 10.
- Returns:
The Oracle object after PCA and KNN imputation.
- Return type:
co.Oracle
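The documentation does not state which rule is used to pick the number of principal components or how k is adjusted. The sketch below is one possible reading, not TFcomb's implementation: `choose_n_components` and `adjust_k` are hypothetical helpers, and the cumulative-variance threshold, the cap of 50 components, and the clamp on k are all assumptions made for illustration.

```python
from itertools import accumulate


def choose_n_components(explained_variance_ratio, threshold=0.9, max_components=50):
    """Smallest number of leading components whose cumulative explained
    variance ratio reaches `threshold`, capped at `max_components`.

    Both the threshold rule and the cap are assumptions, not TFcomb's rule.
    """
    cumulative = list(accumulate(explained_variance_ratio))
    for n, total in enumerate(cumulative, start=1):
        if total >= threshold:
            return min(n, max_components)
    # Threshold never reached: keep every component, up to the cap.
    return min(len(cumulative), max_components)


def adjust_k(k, n_cells):
    """Clamp k to the range [1, n_cells - 1] (assumed adjustment)."""
    return max(1, min(k, n_cells - 1))


# Toy explained variance ratios (made-up values that sum to 1).
ratios = [0.5, 0.25, 0.125, 0.0625, 0.0625]
n_pcs = choose_n_components(ratios, threshold=0.9)
k = adjust_k(10, n_cells=8)
```

With these toy ratios the cumulative sums are 0.5, 0.75, 0.875, 0.9375 and 1.0, so four components are kept, and a requested k of 10 is reduced to 7 for a dataset of only 8 cells.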
- TFcomb.preprocessing.pca_umap_train(adata, cluster_column_name=None, embedding_name=None, n_components=50, svd_solver='arpack', random_seed=2022)
Trains PCA and UMAP models on Oracle-normalized data.
This function processes the input AnnData object with Oracle’s normalization method, performs PCA for dimensionality reduction, and fits a UMAP model for visualization.
- Parameters:
adata (anndata.AnnData) – The AnnData object containing the dataset to be processed.
cluster_column_name (str, optional) – The name of the column in adata.obs containing cluster labels. Used during Oracle normalization. Defaults to None.
embedding_name (str, optional) – The name of the embedding field to be added during Oracle normalization. Defaults to None.
n_components (int, optional) – The number of principal components to retain in PCA. Defaults to 50.
svd_solver (str, optional) – The SVD solver to use for PCA. Options include 'auto', 'full', 'arpack', and 'randomized'. Defaults to 'arpack'.
random_seed (int, optional) – The random seed for reproducibility. Defaults to 2022.
- Returns:
- A tuple containing:
sklearn.decomposition.PCA: The trained PCA model.
umap.UMAP: The trained UMAP model.
co.Oracle: The Oracle object with the normalized data.
- Return type:
tuple
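To make the PCA training step concrete, here is a minimal NumPy sketch of fitting PCA with a fixed seed and keeping only the leading components. It is not how TFcomb does it (the documentation says the function returns an sklearn.decomposition.PCA model); `fit_pca` is a hypothetical helper, and the random matrix stands in for normalized expression data.

```python
import numpy as np


def fit_pca(X, n_components):
    """Hypothetical helper: fit PCA by SVD of the column-centered matrix.

    Returns the training mean, the top principal axes (one per row), and
    the explained variance ratio of the retained components.
    """
    mean = X.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained_variance = (s ** 2) / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return mean, vt[:n_components], ratio[:n_components]


# Fixed seed, mirroring the role of random_seed=2022 in the signature.
rng = np.random.default_rng(2022)
X = rng.normal(size=(100, 6))  # 100 cells x 6 genes, synthetic data
mean, components, ratio = fit_pca(X, n_components=3)
```

The returned mean and axes are all that is needed to project new cells later, which is why a trained model can be fitted once and reused.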
- TFcomb.preprocessing.pca_umap_vis(pca_train=None, umap_train=None, exp_mtx=None, label=None, title=None, bbox=1, figsize=(12, 5), save=None)
Visualizes UMAP projections of PCA-transformed data.
This function applies a trained PCA model to the input expression matrix, projects it onto the UMAP embedding, and visualizes the results in a scatter plot.
- Parameters:
pca_train (sklearn.decomposition.PCA, optional) – Trained PCA model. Defaults to None.
umap_train (umap.UMAP, optional) – Trained UMAP model. Defaults to None.
exp_mtx (numpy.ndarray or scipy.sparse.csr_matrix, optional) – Input expression matrix to be projected. If sparse, it will be converted to a dense array. Defaults to None.
label (array-like, optional) – Labels for the data points. Used to color the scatter plot. Defaults to None.
title (str, optional) – Title for the plot. Defaults to None.
bbox (float, optional) – Positioning factor for the legend box. Defaults to 1.
figsize (tuple, optional) – Size of the figure in inches (width, height). Defaults to (12, 5).
save (str, optional) – Path to save the plot as an image file. If None, the plot is not saved. Defaults to None.
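The densify-then-project step described for exp_mtx can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than TFcomb's code: `to_dense` and `project` are hypothetical helpers, sparse inputs are assumed to expose a `.toarray()` method (as SciPy sparse matrices do), and the projection is the usual PCA transform without whitening. The UMAP projection and plotting are omitted.

```python
import numpy as np


def to_dense(exp_mtx):
    """Return a dense float ndarray; sparse inputs are assumed to expose .toarray()."""
    if hasattr(exp_mtx, "toarray"):
        exp_mtx = exp_mtx.toarray()
    return np.asarray(exp_mtx, dtype=float)


def project(exp_mtx, mean, components):
    """Center with the training mean, then project onto the principal axes."""
    return (to_dense(exp_mtx) - mean) @ components.T


# Toy "trained" PCA: two axes in a 3-gene space (values are made up).
mean = np.array([1.0, 2.0, 3.0])
components = np.array([[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]])

# Two new cells to place in the trained PCA space.
new_cells = np.array([[2.0, 2.0, 5.0],
                      [1.0, 4.0, 3.0]])
scores = project(new_cells, mean, components)
```

Using the training mean rather than the mean of the new cells keeps the projected points in the same coordinate system as the data the model was fitted on.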