This page details the statistics that can be selected from the Statistics Dialog window.
Scale space vs. graph space
Intensity is in arbitrary units and can be expressed as scale values or bin numbers. Scale values take into account scaling factors such as the gain setting for the parameter, and are the values displayed on the graphs in FlowJo. Bin numbers divide the data into a smaller number of discrete bins or channels, using an integer scale ranging from 1 to the maximum value defined by the data collection program (e.g., 255, 1023, 4095). Graphs in FlowJo are rendered using the binned data to maximize speed. Almost all statistics are calculated on the binned values, then translated back to scale values. This approach is fast, closely reflects the graphical representation, and allows statistics such as the geometric mean to be calculated on a range of values that never reaches 0 or goes negative. One caveat is that changing the transform can change the binning, and thus cause a (usually small) difference in statistical outputs.
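The round trip from scale values to bins and back can be illustrated with a minimal Python sketch. It is not FlowJo's implementation: the logarithmic axis, the axis limits `lo` and `hi`, the bin-center convention, and the function names are all assumptions chosen for illustration. It only shows why the same events can yield slightly different statistics when the binning changes.

```python
import math
import statistics

def scale_to_bin(value, n_bins, lo=1.0, hi=262144.0):
    """Map a scale value to an integer bin in 1..n_bins along a log axis (assumed)."""
    clamped = min(max(value, lo), hi)           # keep the value on the axis
    frac = math.log10(clamped / lo) / math.log10(hi / lo)
    return min(int(frac * n_bins) + 1, n_bins)  # bins are numbered from 1

def bin_to_scale(bin_number, n_bins, lo=1.0, hi=262144.0):
    """Translate a bin number back to the scale value at the bin's center (assumed)."""
    frac = (bin_number - 0.5) / n_bins
    return lo * (hi / lo) ** frac

def binned_median(values, n_bins):
    """Median computed on bin numbers, then translated back to scale space."""
    bins = [scale_to_bin(v, n_bins) for v in values]
    return bin_to_scale(statistics.median(bins), n_bins)

# Hypothetical intensities; the true median in scale space is 980.
events = [120.0, 450.0, 980.0, 2300.0, 15000.0]
median_1024 = binned_median(events, 1024)
median_256 = binned_median(events, 256)
print(median_1024, median_256)
```

Both results land close to 980 but are not identical to it or to each other, because each is reported at the center of whichever bin the median event falls into.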
Definition of statistics
- Median—The median is the relative intensity value below which 50% of the events are found; i.e., it is the 50th percentile. In general, the median is a more robust estimator of the central tendency of a population than the mean.
- Mean—The arithmetic mean. For a normal distribution, the mean = median = mode.
- Geom. Mean—The geometric mean. It can be a more appropriate metric than the arithmetic mean for a log-normal distribution. For positive values, it is always less than or equal to the arithmetic mean. In FlowJo this is calculated as the geometric mean in graph space, which makes it usable on data that may include zeros or negative numbers.
- Robust CV—The robust coefficient of variation. Equals 100 * 1/2 * (Intensity[at 84.13th percentile] - Intensity[at 15.87th percentile]) / Median. The robust CV is not as skewed by outlying values as the CV.
- Robust SD—The robust standard deviation. The 68.26% of the events centered on the median are used for this calculation, and an upper range (above the median) and a lower range (below the median) are set. The robust standard deviation is equal to (upper range + lower range) / 2. If the upper range is off scale, the robust standard deviation is equal to the lower range; if the lower range is off scale, it is equal to the upper range. The robust standard deviation is not as skewed by outlying values as the Standard Deviation.
- CV—The Coefficient of Variation is a normalized Standard Deviation. CV = StdDev/Mean. In FlowJo, the CV statistic is displayed in percent (e.g., a CV of 0.15 is displayed as 15). 1/CV is a common way to define the Signal to Noise Ratio.
- SD—The Standard Deviation is a measure of the spread of the dataset. Lower values indicate the data points are closer to the mean and give higher confidence to the mean value.
- Percentile—The relative intensity value below which n% of the events are found, where n is the selected value. n=50 is equivalent to the median.
- MADP*—The Median Absolute Deviation Percentile is 100 * the MAD divided by the median. It expresses the spread on a normalized scale to aid in interpretation.
- Median Abs Dev—The Median Absolute Deviation is a robust measure of population spread. It is calculated as the median of the absolute deviations of each cell's measurement from the population median.
- Freq. of Parent—The percentage of events (cells) in this population out of the parent population (one level up).
- Freq. of Grandparent—The percentage of events (cells) in this population out of the population two levels up.
- Freq. of —The percentage of events (cells) in this population out of the total number of events in a selected upstream population.
- Freq. of Total — The percentage of events (cells) in this population out of the total number of events in the sample.
- Count—The absolute number of events (cells) in a population.
- Mode—The relative intensity value which is most frequently found for a given parameter. This is the same intensity value at which the highest point on a histogram is found.
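The formulas in the list above can be written out as a short Python sketch. This is an illustration of the definitions only, not FlowJo's code: it works directly on scale values rather than binned values, the linear-interpolation percentile is one of several common conventions, the use of the population standard deviation for CV is an assumption, and all function names are hypothetical.

```python
import math
import statistics

def percentile(values, pct):
    """Percentile by linear interpolation between ranks (assumed convention)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no events")
    rank = (len(ordered) - 1) * pct / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)

def robust_sd(values):
    """Half the distance between the 84.13th and 15.87th percentiles."""
    return (percentile(values, 84.13) - percentile(values, 15.87)) / 2.0

def robust_cv(values):
    """Robust SD as a percentage of the median."""
    return 100.0 * robust_sd(values) / statistics.median(values)

def cv_percent(values):
    """StdDev / Mean in percent; population SD is an assumption here."""
    return 100.0 * statistics.pstdev(values) / statistics.mean(values)

def mad(values):
    """Median of the absolute deviations from the median."""
    center = statistics.median(values)
    return statistics.median(abs(v - center) for v in values)

def madp(values):
    """MAD as a percentage of the median."""
    return 100.0 * mad(values) / statistics.median(values)

def freq_of(count, ancestor_count):
    """Percentage of an ancestor population's events that fall in this population."""
    return 100.0 * count / ancestor_count

# Hypothetical intensities for one parameter of a small population.
sample = [8.0, 9.0, 10.0, 11.0, 12.0]
print(robust_sd(sample), robust_cv(sample), cv_percent(sample))
print(mad(sample), madp(sample), freq_of(250, 1000))
```

Adding a single extreme event (for example 1000.0) to `sample` moves the MAD only from 1.0 to 1.5, while the standard deviation grows by orders of magnitude, which is the sense in which the robust statistics are less skewed by outliers.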
Note: A common question is “Which statistic should I use?” The answer depends on how your cells are expressing the markers of choice and on the scale you have used to display the data. Means are appropriate for linear scales, geometric means are appropriate for log or biexponential scales, and medians are appropriate for either. By default, FlowJo tends to use medians, which are also less affected by outliers.
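To see why the scale matters, consider a minimal Python sketch with hypothetical intensities spread evenly across five decades, as they might appear on a log axis. It uses the standard library's `statistics.geometric_mean` (Python 3.8 or later, positive values only) on raw values, whereas FlowJo, as described above, computes the geometric mean in graph space.

```python
import statistics

# Hypothetical intensities, evenly spaced on a log scale.
intensities = [10.0, 100.0, 1000.0, 10000.0, 100000.0]

arithmetic_mean = statistics.mean(intensities)
geometric_mean = statistics.geometric_mean(intensities)
median = statistics.median(intensities)

print(f"mean={arithmetic_mean:.1f} "
      f"geometric mean={geometric_mean:.1f} "
      f"median={median:.1f}")
```

The geometric mean and the median both sit at about 1000, the visual center of the distribution on a log axis, while the arithmetic mean (22222) is pulled toward the brightest events.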
One clear recommendation: when using the term “MFI”, define it explicitly in your context, since it can refer to the mean, the median, or the geometric mean fluorescence intensity.
More on means, median, geometric means, and modes can be found in this DD article.