GhostUMAP2 can be installed in two ways:
- Installation from Git (development version):
git clone https://github.com/jjmmwon/ghostumap2.git
cd ghostumap2
hatch shell
- Installation via pip (PyPI release version):
pip install ghostumap

from ghostumap import GhostUMAP2
from sklearn.datasets import fetch_openml
mnist = fetch_openml("mnist_784")
X, y = mnist["data"], mnist["target"]
mapper = GhostUMAP2()
O, G, active_ghosts = mapper.fit_transform(X, n_ghosts=16)
mapper.visualize(label=y, legend=[str(i) for i in range(10)])

def fit_transform(X, n_ghosts, r, ghost_gen, dropping, init_dropping):
Fit X into an embedded space with ghosts and return the transformed outputs.
Parameters
X: array, shape (n_samples, n_features) or (n_samples, n_samples). If the metric is 'precomputed', X must be a square distance matrix. Otherwise, it contains one sample per row.
Ghost Configuration
n_ghosts: The number of ghost points to embed in the embedding space. Default is 16.
r: Radius for ghost generation. Default is 0.1.
ghost_gen: Ghost generation parameter. Default is 0.2.
Dropping Scheme
dropping: Whether to drop ghosts during optimization. Default is True.
init_dropping: Initial dropping parameter. Default is 0.4.
Returns
O: array, shape (n_samples, n_components). Embedding of the original data points, identical to the output of UMAP. It represents the transformed coordinates in the low-dimensional space.
G: array, shape (n_samples, n_ghosts, n_components). Embedding of ghost points, which are clones of the original points. These ghost points are used to evaluate the instability of each data instance.
active_ghosts: array, shape (n_samples,). Boolean array indicating the presence of active ghost points for each data instance.
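The three return values share their first axis, so a boolean mask such as active_ghosts can index both embeddings at once. The sketch below uses random stand-in arrays with the documented shapes rather than real GhostUMAP2 output; the sizes and the mask values are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_ghosts, n_components = 5, 16, 2  # illustrative sizes, not library defaults for every field

# Stand-ins with the documented shapes; real values would come from fit_transform.
O = rng.normal(size=(n_samples, n_components))
G = O[:, None, :] + 0.05 * rng.normal(size=(n_samples, n_ghosts, n_components))
active_ghosts = np.array([True, False, True, True, False])  # made-up mask

# Keep only the data points that still have active ghosts.
O_active = O[active_ghosts]  # shape (3, n_components)
G_active = G[active_ghosts]  # shape (3, n_ghosts, n_components)

print(O_active.shape, G_active.shape)
```

With real output, the same indexing restricts any later analysis to the instances whose ghosts were not dropped.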
def visualize(title=None, label=None, legend=None):
Returns an interactive visualization widget.
Parameters
title: Title of the visualization.
label: Labels for the data points.
legend: Legend for the visualization.
Returns
widget: An interactive visualization widget.
Get the distances between the original and ghost projections.
def get_distances(sensitivity=1):
Parameters
sensitivity: Sensitivity for distance calculation. Default is 1.
Returns
distances: array, shape (n_samples,). The distances between original and ghost embeddings.
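To make the idea of a per-point distance between original and ghost embeddings concrete, here is a minimal NumPy sketch. It is not the library's implementation: the helper name max_ghost_distance is hypothetical, taking the maximum over ghosts is an assumption, and the sensitivity parameter is not modeled.

```python
import numpy as np

def max_ghost_distance(O, G):
    """Hypothetical helper: largest Euclidean distance from each
    original point to any of its ghosts (an assumed aggregation)."""
    diffs = G - O[:, None, :]               # (n_samples, n_ghosts, n_components)
    dists = np.linalg.norm(diffs, axis=-1)  # (n_samples, n_ghosts)
    return dists.max(axis=1)                # (n_samples,)

# Two points in 2D, each with two hand-written ghosts.
O = np.array([[0.0, 0.0], [1.0, 1.0]])
G = np.array([
    [[3.0, 4.0], [0.0, 1.0]],
    [[1.0, 1.0], [1.0, 2.0]],
])

distances = max_ghost_distance(O, G)
print(distances)  # one value per data point
```

The result has shape (n_samples,), matching the documented return shape of get_distances.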
Get the boolean array indicating unstable ghost points.
def get_unstable_ghosts(distance=0.1, sensitivity=1):
Parameters
distance: Distance threshold for determining instability. Default is 0.1.
sensitivity: Sensitivity for instability calculation. Default is 1.
Returns
unstable_ghosts: array, shape (n_samples,). Boolean array indicating unstable ghost points.
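A distance threshold turns per-point distances into a boolean mask of this shape. The following sketch shows the idea with made-up distance values; whether the library compares with a strict or non-strict inequality, and how sensitivity enters, are not stated here, so the strict comparison below is an assumption.

```python
import numpy as np

# Made-up per-point distances, standing in for the output of get_distances.
distances = np.array([0.02, 0.35, 0.10, 0.48])
threshold = 0.1  # same value as the documented default for `distance`

# Assumed rule: a point is unstable when its distance exceeds the threshold.
unstable = distances > threshold
unstable_idx = np.flatnonzero(unstable)  # indices of the flagged points

print(unstable, unstable_idx)
```

The index array is convenient for looking up the flagged rows in the original data or in the embeddings O and G.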