Taylor Index

The Taylor Index is a metric primarily for evaluating the goodness of clustering algorithms in n-dimensional space.

In the Euclid plugin, the Taylor Index evaluates the goodness of a clustering result based on the separation of the individual clusters relative to their compactness. It is the ratio of the distance between two clusters to the spread of the cells within those clusters.

The Taylor Score is a weighted summation of all Taylor Indices associated with a full set of clustering results.

Background

The Taylor Index provides information about the similarity or dissimilarity of two clusters based on their phenotypic expression across an n-dimensional space. The algorithm measures the similarity of the cells within a cluster using the robust standard deviation, locates each cluster by its n-dimensional mean, and measures the distance between paired clusters in the n-dimensional space using the Euclidean distance. The resulting index, calculated for each pair of clusters, is the ratio of inter-cluster distance to intra-cluster distance. The Euclid plugin illustrates the results on a heatmap.

Taylor Index Calculations

Intra-cluster calculations: The algorithm first calculates the compactness of each cluster (the intra-cluster distance) as a robust standard deviation. The smaller the distance between the cells within a cluster, the more tightly packed they are and the more likely they are to be phenotypically alike, because the numbers used to calculate ‘distance’ are the intensity values of the selected parameters.
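Inter-cluster calculations: The inter-cluster distance for each pair of clusters is calculated as the Euclidean distance between their n-dimensional means. For two clusters A and B with means \mu_A and \mu_B, where p is the number of selected parameters (the dimensions of the space), the equation for this is:

$$d(A, B) = \sqrt{\sum_{i=1}^{p} \left(\mu_{A,i} - \mu_{B,i}\right)^2}$$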

Taylor Index calculation: The Taylor Index is the ratio of the Euclidean distance between two clusters (inter-cluster distance) to the sum of the robust standard deviations of those clusters (intra-cluster distance) across the n-dimensional space. The Taylor Index is calculated for every pair of clusters produced by the selected clustering method on the given dataset.
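The three steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Euclid implementation. The text does not say which robust standard deviation estimator is used or how the per-parameter values are combined, so the sketch assumes the scaled median absolute deviation (1.4826 × MAD) and a plain sum over parameters. All function names and the toy data are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, median


def robust_sd(values):
    """Scaled median absolute deviation: an ASSUMED robust SD estimator."""
    values = list(values)
    med = median(values)
    return 1.4826 * median([abs(v - med) for v in values])


def centroid(cells):
    """Per-parameter mean of a cluster; `cells` is a list of equal-length tuples."""
    return [mean(column) for column in zip(*cells)]


def spread(cells):
    """ASSUMPTION: intra-cluster distance = sum of per-parameter robust SDs."""
    return sum(robust_sd(column) for column in zip(*cells))


def taylor_index(cells_a, cells_b):
    """Inter-cluster Euclidean distance divided by the summed cluster spreads."""
    inter = math.dist(centroid(cells_a), centroid(cells_b))
    intra = spread(cells_a) + spread(cells_b)
    # Two clusters with zero spread cannot be divided; report infinity instead.
    return math.inf if intra == 0 else inter / intra


def pairwise_taylor_indices(clusters):
    """Map each unordered pair of cluster labels to its Taylor Index."""
    return {
        (a, b): taylor_index(clusters[a], clusters[b])
        for a, b in combinations(sorted(clusters), 2)
    }


# Hypothetical toy data: three clusters of cells measured on two parameters.
clusters = {
    1: [(0, 0), (1, 1), (2, 2)],
    2: [(3, 0), (4, 1), (5, 2)],
    3: [(20, 0), (21, 1), (22, 2)],
}
indices = pairwise_taylor_indices(clusters)
```

In this toy data all three clusters have the same spread, so the ordering of the indices is driven entirely by the distance between the cluster means: the pair (1, 2) scores lowest and the pair (1, 3) scores highest.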

Heatmap creation: Taylor Index values are then visualized on a heatmap. Smaller values indicate a poor clustering outcome: a small distance between two clusters, a large spread within at least one cluster, or both. Larger values indicate better clustering: well-separated clusters, tight, compact clusters, or both. The heatmap color scale runs from darker colors for smaller values (worse separation between a pair of clusters) to hotter colors, culminating in yellow, for well-separated, compact clusters. In the example below, each row and column represents a cluster number. The intersection is the Taylor Index for those two clusters. In this example, clusters 1 and 3 are the best resolved from each other, and cluster 2 resolves poorly from all other clusters.

Taylor Score Calculations

The Taylor Score is the weighted summation of all Taylor indices for a clustering outcome.

Weighting: Clusters with more cells are less likely to be a set of outliers, and simpler cluster definitions are a good indicator that over-clustering has been avoided. Hence, we calculate the weights on Taylor indices as:

m = number of cells; n = number of clusters; p = number of parameters

Taylor Score: The overall score is then calculated as the log of the sum of the indices calculated on pairs of clusters, multiplied by the weight.

x represents a pair of clusters
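As a rough illustration of the description above, the following Python sketch takes the log of a weighted sum of pairwise Taylor Index values. It rests on several assumptions: it reads the description as one weight per cluster pair applied inside the sum, it uses the natural log, and it takes the weights as caller-supplied inputs because the weight formula is not reproduced here. The function name and all numbers are hypothetical placeholders.

```python
import math


def taylor_score(indices, weights):
    """Natural log of the weighted sum of pairwise Taylor Index values.

    ASSUMPTIONS: one weight per cluster pair, supplied by the caller,
    and a natural logarithm.
    """
    total = sum(weights[pair] * value for pair, value in indices.items())
    return math.log(total)


# Hypothetical Taylor Index values and weights for three cluster pairs.
indices = {(1, 2): 2.0, (1, 3): 6.0, (2, 3): 4.0}
weights = {(1, 2): 0.5, (1, 3): 0.25, (2, 3): 0.25}
score = taylor_score(indices, weights)
```

Because the log is monotonic, it does not change which of two clustering outcomes ranks higher under this reading. It only compresses the scale of the score.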

Recommendations

The Taylor Index and Taylor Score can also be used to evaluate the goodness of dimensionality reduction approaches when used in combination with a consistent clustering approach.
Appropriate scaling for all parameters is recommended prior to running Euclid. FlowJo uses binned values for calculations, thus scaling will factor into the outcome.
Manual validation is suggested to examine the data for under-clustering.
Euclid should automatically install its own dependencies; however, if you need to manually install R packages, the required packages are: devtools, tidyverse, clustRcheck, dplyr, tidyr, ggplot2, viridisLite, ggnewscale, ggfittext.

If you have additional questions, don’t hesitate to reach out: flowjo@bd.com