Measuring and Explaining the Inter-Cluster Reliability of Multidimensional Projections

Published in SNU AIIS Blog, Apr 3, 2022

by Sue Hyun Park

Multidimensional, or high-dimensional, data carries in-depth information about complex systems. A striking example is single-cell RNA sequencing (scRNA-seq) data, which contains thousands of attributes describing a single cell’s phenotype: the number of cell-type-specific gene expression patterns in any tissue or cell type ranges from 3,000 to 5,000.

A widely used way to visualize and interpret such data is to first reduce its dimensionality and then examine the resulting projection in a lower-dimensional space. This conversion technique is called multidimensional projection (MDP).

Inter-cluster tasks are regarded as the core tasks performed with MDP. These tasks investigate meaningful inter-cluster structures of a dataset through projections, such as where cell clusters are located and how they relate to one another based on gene expression patterns.

Left: Identifying clusters with discrete cell types, Right: seeking the relationship between clusters using an MDP technique called t-SNE (Source: Yan Wu & Kun Zhang)

Unfortunately, distortions inherently occur when reducing dimensionality. In the projected space, originally nearby clusters can be separated (called stretching) or originally distinct clusters can gather together (called compression). These distortions can make meaningful structures in projections less trustworthy, thus hindering users’ understanding of the original data.

Distortions projecting E to F: compression (a) and stretching (b) (Source: Aupetit)

How much can we trust the clusters revealed by MDP?
Which MDP technique should we choose?

Researchers who work with multidimensional data must be aware of inter-cluster reliability, that is, how well the low-dimensional projection preserves the inter-cluster structures of the original high-dimensional space.

In a paper published in IEEE TVCG, we introduce two novel metrics that quantitatively measure inter-cluster reliability: Steadiness and Cohesiveness. Recalling the two types of distortion mentioned above, Steadiness evaluates compression, while Cohesiveness evaluates stretching. We also propose a complementary tool, the reliability map, which visually explains within projections the inter-cluster reliability quantified by Steadiness and Cohesiveness. Starting from our design considerations, we will explore how our metrics precisely capture distortions and help prevent users’ misinterpretations.

Design considerations

Measuring inter-cluster reliability is challenging. Through a survey of 26 papers concerning inter-cluster tasks, we first identified three types of inter-cluster tasks. However, previous local metrics such as Trustworthiness and Continuity (T&C) cannot correctly quantify how well each of these tasks can be performed, and may fail entirely.

Our metrics should adequately quantify how accurately each inter-cluster task can be performed, and thus be able to measure inter-cluster reliability precisely. We formulated the following three design considerations, which outline the capacity of our metrics:

(C1) Capture the inter-cluster structure in detail in order to precisely identify clusters or seek relationships between them. The inter-cluster structure in MDP is complex, intertwined, and often has no ground truth. Each cluster’s characteristics, such as shape, density, or size, also vary widely.
(C2) Consider stretching and compression individually so as to accurately estimate clusters’ features and their similarities. The clusters’ size, density, or the distances between them can be overestimated due to stretching or underestimated due to compression.
(C3) Measure how accurately the clusters identified in the projection reflect their original density and size, to quantify misconceptions when comparing clusters between spaces. This matters because the size and density of projected clusters may not reflect those in the original space.

Steadiness and Cohesiveness (S&C)

We will show how we developed our inter-cluster reliability metrics in accord with the three design considerations.

Defining inter-cluster distortion types

To achieve (C2), we defined a specific problem scope that previous metrics could not adequately assess: distortions involving multiple clusters rather than local structures (i.e., individual data points or a single cluster).

False Groups distortion denotes the case in which a low-dimensional group (cluster) consists of separate groups in the original space; compression has occurred.
Missing Groups distortion denotes the case in which a high-dimensional group is split into multiple separated subgroups in the projected space; stretching has occurred.
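The two definitions can be illustrated with a toy check. This sketch assumes that cluster labels are already known in both spaces, which is exactly what real projections usually lack (see (C1)); the function names and labels are hypothetical, and Steadiness and Cohesiveness themselves do not work this way.

```python
from collections import defaultdict


def group_members(labels):
    """Map each cluster label to the set of point indices carrying it."""
    groups = defaultdict(set)
    for idx, label in enumerate(labels):
        groups[label].add(idx)
    return groups


def false_groups(high_labels, low_labels):
    """Low-dimensional clusters whose points come from more than one
    high-dimensional cluster (compression)."""
    return sorted(
        label for label, members in group_members(low_labels).items()
        if len({high_labels[i] for i in members}) > 1
    )


def missing_groups(high_labels, low_labels):
    """High-dimensional clusters whose points are split across more than one
    low-dimensional cluster (stretching)."""
    return sorted(
        label for label, members in group_members(high_labels).items()
        if len({low_labels[i] for i in members}) > 1
    )


# Toy example: six points with hypothetical cluster labels in each space.
high = ["a", "a", "b", "b", "c", "c"]
low = ["x", "x", "x", "x", "y", "z"]
print(false_groups(high, low))    # ['x']  (x merges original clusters a and b)
print(missing_groups(high, low))  # ['c']  (c is split into y and z)
```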

Illustration of the concept of False Groups (left) and Missing Groups (right)

Based on these two inter-cluster distortion types, we developed our inter-cluster reliability metrics:

Steadiness measures how much the clusters in the projected space are in a steady state that reflects the actual clusters in the original space, hence how well projections avoid False Groups.
Cohesiveness measures how much the clusters in the original space stay close together cohesively in the projected space, hence how well projections avoid Missing Groups.

Steadiness evaluates how well clusters that are separated in the original space remain separated in the projected space. Cohesiveness evaluates how well each cluster in the original space avoids being dispersed in the projected space.

Computing Steadiness and Cohesiveness

The inter-cluster reliability computation routine is composed of three steps:

(Step 1) Construct dissimilarity matrices.

Obtain dissimilarity matrices D+ and D− from corresponding point pairs in the original and projected spaces. We employ a shared-nearest-neighbor (SNN)-based distance function.
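A minimal sketch of an SNN-based dissimilarity follows, assuming the simplest variant: count the shared k-nearest neighbors of two points and normalize by k. The function names and the parameter k are hypothetical, and the paper’s SNN function may weight neighbors by rank rather than simply counting them. Applying it to the original and to the projected coordinates gives one dissimilarity matrix per space; how those are turned into D+ and D− is not reproduced here.

```python
import math


def knn_indices(points, k):
    """For each point, the indices of its k nearest neighbors
    (Euclidean distance, excluding the point itself), closest first."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def snn_dissimilarity(points, k):
    """Dissimilarity from a plain shared-nearest-neighbor count:
    d(i, j) = 1 - |kNN(i) & kNN(j)| / k, so 0 means identical neighborhoods
    and 1 means no shared neighbors."""
    neighbor_sets = [set(nb) for nb in knn_indices(points, k)]
    n = len(points)
    return [
        [0.0 if i == j else 1.0 - len(neighbor_sets[i] & neighbor_sets[j]) / k
         for j in range(n)]
        for i in range(n)
    ]


# Two well-separated triangles of points (toy data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
D = snn_dissimilarity(points, k=2)
print(D[0][1], D[0][3])  # 0.5 1.0
```

Because the measure depends only on neighborhood membership, it is insensitive to the absolute scale of each space, which is one reason an SNN-style distance suits comparing original and projected spaces.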

(Step 2) Iteratively compute partial distortions for Steadiness or Cohesiveness

Extract random clusters in the projected space (Steadiness) or the original space (Cohesiveness).
Reveal the extracted cluster’s dispersion in the opposite space. Given the points of the extracted cluster as input, an HDBSCAN-based clustering function partitions them into separated clusters {C_1, C_2, …, C_n} in the opposite space.
Compute distortions between the dispersed groups. We generalized the existing point-stretching and point-compression metrics to the cluster level. Plugging D+ or D− into the average-linkage distance between a pair of clusters (C_i, C_j) yields the distortion m_ij. A weight w_ij = |C_i|⋅|C_j| is also computed so that distortions involving larger clusters are penalized more heavily.
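The following is a simplified sketch of Step 2 under several assumptions. The random cluster is taken as a random seed point plus its k least-dissimilar points (a hypothetical extraction rule); the HDBSCAN step is skipped, so the dispersed clusters are passed in directly; and the distortion matrix (D+ or D−) is assumed to already hold per-pair distortion values, so m_ij is simply their average linkage over C_i × C_j. The paper’s exact definition of m_ij may differ.

```python
import random


def extract_random_cluster(dissim, k, rng):
    """Hypothetical extraction rule: a random seed point plus its k
    least-dissimilar points according to `dissim`."""
    n = len(dissim)
    seed = rng.randrange(n)
    others = sorted((dissim[seed][j], j) for j in range(n) if j != seed)
    return [seed] + [j for _, j in others[:k]]


def average_linkage(matrix, ci, cj):
    """Mean of matrix[a][b] over all cross-cluster pairs a in C_i, b in C_j."""
    return sum(matrix[a][b] for a in ci for b in cj) / (len(ci) * len(cj))


def partial_distortions(distortion_matrix, clusters):
    """Return one (m_ij, w_ij) pair per pair of dispersed clusters.

    m_ij: average linkage of the distortion matrix (assumed to be D+ or D-).
    w_ij: |C_i| * |C_j|, so distortions between larger clusters weigh more.
    """
    pairs = []
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            ci, cj = clusters[i], clusters[j]
            pairs.append((average_linkage(distortion_matrix, ci, cj),
                          len(ci) * len(cj)))
    return pairs


# Hypothetical per-pair distortion values for four points.
D_toy = [
    [0.0, 0.2, 0.8, 0.6],
    [0.2, 0.0, 0.4, 1.0],
    [0.8, 0.4, 0.0, 0.1],
    [0.6, 1.0, 0.1, 0.0],
]
# Stand-in for HDBSCAN: split the points into two clusters by hand.
for m_ij, w_ij in partial_distortions(D_toy, [[0, 1], [2, 3]]):
    print(f"m_ij={m_ij:.2f}, w_ij={w_ij}")  # m_ij=0.70, w_ij=4
```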

(Step 3) Aggregate partial distortions for Steadiness or Cohesiveness

Aggregate the iteratively computed partial distortions using their corresponding weights. The weighted average is then subtracted from 1, so that lower-quality projections receive lower scores.
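Step 3 amounts to score = 1 − Σ w_ij·m_ij / Σ w_ij. A minimal sketch, assuming the partial distortions arrive as (m_ij, w_ij) pairs collected over all iterations of Step 2; the fallback for an empty input is our own assumption.

```python
def aggregate(partials):
    """Step 3: score = 1 - sum(w_ij * m_ij) / sum(w_ij).

    `partials` is an iterable of (m_ij, w_ij) pairs collected over all
    iterations; more (weighted) distortion yields a lower score.
    """
    partials = list(partials)
    total_weight = sum(w for _, w in partials)
    if total_weight == 0:
        return 1.0  # assumption: nothing measured counts as no distortion
    weighted_sum = sum(m * w for m, w in partials)
    return 1.0 - weighted_sum / total_weight


print(aggregate([(0.25, 2), (0.5, 2)]))  # 0.625
```

The same function serves both metrics: feeding it partial distortions computed from clusters extracted in the projected space gives Steadiness, and from clusters extracted in the original space gives Cohesiveness.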

How does this process adhere to our design considerations? After Step 1, the workflow splits into Steadiness and Cohesiveness to deal with compression and stretching independently (C2). In Step 2, we quantify how well the original density and size of clusters are retained by calculating partial distortions (C3). Moreover, by running a sufficient number of iterations, each starting from a randomly selected seed point, Step 2 can examine the complex, intertwined inter-cluster structure (C1).

Repeated extraction of random clusters allows precise measurement of inter-cluster reliability.

Code for the approach can be found here.

Visualizing Steadiness and Cohesiveness

Our reliability map complements the quantitative metrics with a visualization that reveals how and where inter-cluster distortion occurred. Each edge encodes the distortion between two data points, with False Groups shown in purple and Missing Groups in green. The map also offers a cluster selection interaction that exposes Missing Groups distortions more precisely. In the animation below, the user draws a lasso with mouse clicks, and the map highlights the points that actually belong to the same cluster as the initially selected points.
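To make the color encoding concrete, here is a minimal sketch that assigns an edge a color from its False Groups and Missing Groups distortion strengths. It is not the CheckViz two-dimensional color scale that the map actually adopts: the RGB values and the linear blend (white for no distortion, purple for False Groups, green for Missing Groups, dark when both are strong) are our own assumptions.

```python
# Hypothetical RGB anchors; the real map uses the CheckViz 2D color scale.
WHITE = (255, 255, 255)   # no distortion
PURPLE = (128, 0, 128)    # strong False Groups distortion
GREEN = (0, 160, 0)       # strong Missing Groups distortion


def edge_color(false_groups, missing_groups):
    """Blend an RGB edge color from two distortion strengths in [0, 1].

    Each strength pulls the color from white toward its own hue; when both
    are strong, the pulls add up and the color darkens (clamped to 0..255).
    """
    color = []
    for w, p, g in zip(WHITE, PURPLE, GREEN):
        value = w + false_groups * (p - w) + missing_groups * (g - w)
        color.append(max(0, min(255, round(value))))
    return tuple(color)


print(edge_color(0.0, 0.0))  # (255, 255, 255)
print(edge_color(1.0, 0.0))  # (128, 0, 128)
```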

Left: Class identity derived from LLE projection of MNIST test dataset. Center: Cluster selection interaction using the reliability map of the LLE projection. Right: Two-dimensional color scale adopted from CheckViz, and colors representing a data class.

Code for the reliability map can be found here.

Quantitatively evaluating the metrics’ effectiveness

We conducted four experiments to check whether Steadiness and Cohesiveness can sensitively measure inter-cluster reliability. The high-dimensional datasets are paired with projections of two kinds: intentionally distorted synthetic projections (Experiments A, B, and C) and UMAP projections that vary by hyperparameter value (Experiment D). The experiments test whether the metrics accurately quantify the loss of inter-cluster reliability caused by False Groups distortion (Experiment A), the gain in inter-cluster reliability due to reduced Missing Groups distortion (Experiment B), general degradation of projection quality (Experiment C), and general improvement in quality (Experiment D).

The datasets and their projections used in Experiments A-D. A) increasing overlap to enlarge False Groups distortion; B) increasing overlap to minimize Missing Groups distortion; C) increasing replacement rate; D) increasing the “nearest neighbors” hyperparameter value for UMAP projections.

The intended differences in inter-cluster reliability show up as changes in Steadiness and Cohesiveness scores over each sequence of projections. Local metrics such as Mean Relative Rank Errors (MRREs) and Trustworthiness and Continuity (T&C) failed in cases with apparent distortions, whereas our metrics properly measured inter-cluster reliability.

The scores measured by our metrics (Steadiness and Cohesiveness) and baseline local distortion metrics (MRREs, T&C). In Experiment B, S&C were able to capture the critical point of difference (blue dashed box).

It is worth noting that the reliability map also serves as an intuitive and efficient means of visualizing inter-cluster reliability. It highlights False Groups and Missing Groups distortions more accurately than CheckViz, which visualizes distortions computed by Trustworthiness and Continuity.

The reliability map and CheckViz visualizing the distortion of the projections from Experiment A and B.

How can we make use of S&C?

In case studies with two ML engineers, we found that our metrics and the reliability map help users gain a better understanding of inter-cluster structures. For one, users can select projection techniques or hyperparameter settings that suit the dataset. Higher Steadiness and Cohesiveness scores indicate a better-suited setting for the dataset, and the choice can be further justified visually with the reliability map, which exposes a lesser degree of distortion (False Groups distortion in particular, as marked by the circles below).
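As a minimal sketch of this selection step, the snippet below picks the candidate setting with the highest harmonic mean of Steadiness and Cohesiveness. The harmonic mean is our assumption (the post only says that higher scores are better), and the setting names and scores are made-up placeholders, not results from the case studies.

```python
def harmonic_mean(steadiness, cohesiveness):
    """Combine the two scores; a low value on either side pulls the result down."""
    if steadiness + cohesiveness == 0:
        return 0.0
    return 2 * steadiness * cohesiveness / (steadiness + cohesiveness)


def pick_setting(scores):
    """`scores` maps a setting name to its (Steadiness, Cohesiveness) pair.
    Returns the name whose combined score is highest."""
    return max(scores, key=lambda name: harmonic_mean(*scores[name]))


# Placeholder scores for three hypothetical settings (not measured values).
candidates = {
    "setting_a": (0.90, 0.50),
    "setting_b": (0.70, 0.70),
    "setting_c": (0.60, 0.65),
}
print(pick_setting(candidates))  # setting_b
```

A harmonic mean favors settings that do reasonably well on both metrics over ones that trade a lot of Cohesiveness for Steadiness (or vice versa); the reliability map then serves as a visual check on whichever setting the scores favor.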

The reliability maps, each of which visualizes the inter-cluster distortion present in the UMAP, Isomap, and LLE projections of the MNIST dataset. Steadiness (St) and Cohesiveness (Co) scores are given under the name of each technique. UMAP is the most suitable projection for the dataset.

Our metrics also help prevent users from misinterpreting inter-cluster structures when they later perform inter-cluster tasks. The example below depicts clusters projected with different hyperparameter values, σ = 100 and σ = 1,000. The σ = 100 projection has more clearly divided clusters, and a common perception is that such compactness better reflects the original inter-cluster structure. However, because the σ = 1,000 projection has higher Cohesiveness, it is more reliable to interpret the original inter-cluster structure as a set of subclusters that constitute one large cluster rather than as a set of separate clusters.

The reliability maps that visualize the inter-cluster distortion of t-SNE projections (σ = 100 and σ = 1,000) made for the Fashion-MNIST dataset.

Moving forward

The results are exciting because Steadiness and Cohesiveness are the first metrics to date that can directly measure inter-cluster distortions in MDP analysis, and the reliability map makes those distortions visible. Many new multidimensional projection techniques are released every year, and we hope our metrics play a significant role in evaluating the inter-cluster reliability of these algorithms.

A common understanding in MDP research is that, due to inherent distortions, clusters in projections do not truly represent the original data. So far, we have explained how our approach guides users in choosing the right setup and properly interpreting projections. What if we could actively correct distortions during cluster analysis? In follow-up research, we enhance brushing, a typical interaction in visual analytics, by resolving distortions around the brushed points. The system dynamically relocates points so that the user-generated visual clusters always match the actual multidimensional clusters. This points to a bright future for overcoming the drawbacks of MDP.

Here is our last remark:

All the mechanisms discussed revolve around the purpose of assisting users in visual data analysis.

We will continue our work to make MDP more explainable, ultimately to ease the human part of deriving safe insights from high-dimensional data.

Acknowledgement

This blog post is based on the following paper:

Jeon, Hyeon, et al. “Measuring and explaining the inter-cluster reliability of multidimensional projections.” IEEE Transactions on Visualization and Computer Graphics 28.1 (2021): 551–561. (paper, GitHub, video)
Jeon, Hyeon, et al. “Distortion-Aware Brushing for Interactive Cluster Analysis in Multidimensional Projections.” arXiv preprint arXiv:2201.06379 (2022). (paper under review)

Thanks to Hyeon Jeon for helpful comments on this blog post.

This post was originally published on our Notion blog on March 11, 2022.