Dimensionality Reduction
Dimensionality reduction is an invaluable tool for analyzing high-parameter data sets. It allows researchers to visualize high-dimensional data using a smaller number of parameters while preserving much of the variability in the data and the separation between distinct cell populations.
PCA
The simplest dimensionality reduction technique available in SeqGeq is the machine learning algorithm Principal Component Analysis (PCA). After quality control, it is also typically the first dimensionality reduction technique applied to a sparse data set, and its output feeds further downstream analysis.
To run PCA, select a population of interest, click on the Dimensionality Reduction button within the Analyze tab in SeqGeq’s workspace, and choose PCA:
Within that dialog you’ll likely want to normalize the selected genes and log transform the parameters (to maximize the dynamic range available for separating populations). You’ll also need to choose the genes on which to base the principal components.
The platform will display a table of the variance explained by each principal component. The principal components you select there will be added to your data as “Analytical Parameters”:
Try running PCA on the quality cells population using your highly dispersed genes Geneset:
Note: Islands illustrated in this visualization represent broadly distinct neighborhoods of populations within the data matrix.
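Outside of SeqGeq, the PCA steps described above (log transform, normalize, compute components, tabulate variance) can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not SeqGeq’s implementation: "normalize" is assumed here to mean centering each gene and scaling it to unit variance, the log transform is assumed to be log(1 + x), and the function name pca_variance_table and the synthetic count matrix are hypothetical.

```python
import numpy as np

def pca_variance_table(counts, n_components=3):
    """Return PC scores and the fraction of variance explained per PC.

    counts: cells x genes matrix of non-negative expression values.
    """
    # Assumed log transform: log(1 + x), which compresses the dynamic range.
    logged = np.log1p(counts)
    # Assumed normalization: center each gene and scale to unit variance.
    centered = logged - logged.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes unscaled
    scaled = centered / std
    # PCA via singular value decomposition of the scaled matrix.
    U, S, Vt = np.linalg.svd(scaled, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, explained[:n_components]

# Synthetic stand-in for a quality-controlled cells x genes matrix.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(50, 8)).astype(float)
counts[:25, :4] += 20  # give half the cells higher expression of four genes

scores, explained = pca_variance_table(counts)
for i, frac in enumerate(explained, start=1):
    print(f"PC{i}: {frac:.1%} of variance")
```

The columns of scores play the role of the principal component parameters added to the data, and the printed fractions correspond to the kind of variance table described above.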
tSNE
The tSNE machine learning algorithm is a much more complex tool for dimensionality reduction. Our implementation reduces an input data matrix to just two parameters, tSNE X and tSNE Y. The tool has earned wide acclaim across many data analysis fields for its ability to preserve the separation between distinct populations when projecting N-dimensional data down to two or three dimensions.
To run the tSNE visualization, select a population of interest, click the Dimensionality Reduction button again within the Analyze tab in SeqGeq, choose the “tSNE” option, and select the parameters or genes you’d like to use as input (these are mapped into two dimensions). Typically, principal component parameters are used to map single-cell RNA sequencing data with tSNE, as these provide a richer (less sparse) representation of the data.
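As a rough illustration of what mapping principal components into two tSNE parameters involves, the sketch below implements a deliberately simplified tSNE in Python with NumPy. It is not SeqGeq’s implementation. It assumes a single fixed Gaussian bandwidth (sigma) instead of the per-point perplexity search used by standard tSNE, and it uses plain gradient descent without momentum or early exaggeration. The function name simple_tsne, its parameters, and the synthetic stand-in for principal component scores are all hypothetical.

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X."""
    s = np.sum(X**2, axis=1)
    D = s[:, None] + s[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)

def simple_tsne(X, n_iter=500, learning_rate=10.0, sigma=1.0, seed=0):
    """Simplified tSNE: fixed bandwidth, plain gradient descent (assumptions)."""
    n = X.shape[0]
    # High-dimensional affinities from a Gaussian kernel with fixed sigma.
    P = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma**2))
    np.fill_diagonal(P, 0.0)
    P = P / P.sum(axis=1, keepdims=True)   # conditional probabilities per row
    P = np.maximum((P + P.T) / (2.0 * n), 1e-12)  # symmetrized joint
    rng = np.random.default_rng(seed)
    Y = rng.normal(scale=1e-4, size=(n, 2))  # columns: "tSNE X", "tSNE Y"
    for _ in range(n_iter):
        # Low-dimensional affinities from a Student-t kernel.
        num = 1.0 / (1.0 + pairwise_sq_dists(Y))
        np.fill_diagonal(num, 0.0)
        Q = np.maximum(num / num.sum(), 1e-12)
        # Gradient of the KL divergence between P and Q with respect to Y.
        W = (P - Q) * num
        grad = 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y
        Y -= learning_rate * grad
    return Y

# Synthetic stand-in for principal component scores: two groups of cells.
rng = np.random.default_rng(1)
pcs = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(20, 5)),
    rng.normal(loc=5.0, scale=0.5, size=(20, 5)),
])
embedding = simple_tsne(pcs)
print(embedding.shape)
```

Feeding principal component scores rather than raw gene counts keeps the input dense and low-dimensional, which is the reason given above for using them as tSNE input.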
Within the tSNE section of the dimensionality reduction platform there are a number of settings, including Advanced Settings, which can be used to adjust the tSNE calculations. Mousing over each option displays a description of its function:
Try mapping the principal components derived from the highly dispersed genes into tSNE space. If you normalized genes for your PCA, you won’t need to normalize those parameters in tSNE: