Appendix 1.
How to Compute Descriptive Statistics

"Descriptive Statistics" is just another term for the "Simple Statistics" (measures of central tendency and distribution) discussed in Part 1 of this tutorial. A quick review follows. Some of these were not listed in Part 1, but are included here for completeness. We assume your variable is named "x."

Measures of Central Tendency

N . . . The NUMBER. How many values or points there are in your sample.

Σx . . . The SUM. All the values added together.

x̄ . . . The MEAN. The average of all the values. Divide the sum by the number of points:

x̄ = Σx / N
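As a minimal sketch in Python, the three quantities above can be computed as follows. The sample values and variable names are illustrative assumptions, not part of the tutorial.

```python
# Hypothetical sample; any non-empty list of numbers would do.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

n = len(x)          # N, the number of points
total = sum(x)      # Σx, the sum of all values
mean = total / n    # x̄ = Σx / N

print(n, total, mean)
```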

MEDIAN . . . The MEDIAN. The value in your sample which has half the values lying below it, and half above. You find it like this:

Sort your sample. Lowest to highest or highest to lowest doesn't matter.
If you have an odd number of points, the median is the one in the middle. For example, the tenth point if N = 19.
If you have an even number of points, take the average of the two values closest to the middle. For example, take the average of the eighth and ninth points if N = 16.
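The steps above can be sketched in Python. The function name median is chosen here for illustration, and the sketch assumes a non-empty sample.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # step 1: sort the sample
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd N: the single point in the middle.
        return ordered[mid]
    # Even N: average the two points closest to the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 5]))       # odd number of points
print(median([7, 1, 5, 3]))    # even number of points
```

With N = 19 the index mid is 9, which is the tenth point when counting from one, matching the example in the text.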

MODE . . . The MODE. The value that occurs most frequently in your sample. There is no easy way to find this by hand except to sort your sample (see MEDIAN above), then count how often each repeated value occurs. A normal distribution, like most symmetrical distributions, has a single mode. These are "unimodal distributions." Not all distributions are unimodal; some are "multimodal." A good statistical program will list all the modes in your sample.
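A program does not need to sort in order to count. The sketch below, which assumes a non-empty sample and uses the illustrative function name modes, tallies each value with Python's collections.Counter and returns every value that shares the highest count, so a multimodal sample yields more than one result.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often, in ascending order."""
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest count seen
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3, 3, 4]))   # two values tie for most frequent
print(modes([1, 1, 2]))            # a single mode
```

Note that if no value repeats, this sketch returns every value in the sample, since all of them tie with a count of one.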

Measures of Dispersion

min . . . The MINIMUM. The lowest or smallest value found.

max . . . The MAXIMUM. The highest or largest value found.

range . . . The RANGE. The difference between the minimum and the maximum:

range = max - min
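In Python these three values come directly from the built-in min and max functions. The sample and the variable names below are illustrative assumptions.

```python
# Hypothetical sample.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

lowest = min(x)                   # min
highest = max(x)                  # max
value_range = highest - lowest    # range = max - min

print(lowest, highest, value_range)
```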

SS . . . The RAW SUM OF SQUARES. The square of each value added together:

SS = x₁² + x₂² + ... + x_N²

CFM . . . The CORRECTION FOR THE MEAN. Used to reduce a sum of squares to a sum of squared deviations. For one variable, it is defined as:

CFM = N x̄²

ss . . . The REDUCED SUM OF SQUARES. The sum of the squared "deviations from the mean," these being the mean value of the sample subtracted from each individual value:

ss = (x₁ - x̄)² + (x₂ - x̄)² + ... + (x_N - x̄)² = SS - CFM
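The identity ss = SS - CFM can be checked numerically. The sketch below computes the reduced sum of squares both ways on an illustrative sample; the variable names are assumptions made for this example.

```python
# Hypothetical sample.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]

n = len(x)
mean = sum(x) / n

raw_ss = sum(v * v for v in x)                  # SS, raw sum of squares
cfm = n * mean ** 2                             # CFM = N x̄²
direct_ss = sum((v - mean) ** 2 for v in x)     # ss from the deviations
shortcut_ss = raw_ss - cfm                      # ss = SS - CFM

print(raw_ss, cfm, direct_ss, shortcut_ss)
```

The two results agree up to floating-point rounding. When the mean is very large compared with the spread of the values, the shortcut subtracts two nearly equal numbers and can lose precision, so the direct form is usually the safer choice in code.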

s² . . . The SAMPLE VARIANCE. A measure of how variable a series is, how much the individual values tend to vary from the mean. It is defined as:

s² = ss / (N - 1)

s . . . The SAMPLE STANDARD DEVIATION. The square root of the sample variance. This measure is more commonly used than the variance, since it relates easily to the normal distribution.

s = (s²)^(1/2) = (ss / (N - 1))^(1/2)

se . . . The STANDARD ERROR OF THE MEAN. This measures how variable the sample mean itself may be compared to the true value for the whole population. It is defined as:

se = (ss / ((N - 1) N))^(1/2)

Note that this value is very close in definition to the sample standard deviation: se = s / N^(1/2). If you have one, you can easily compute the other.
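The variance, standard deviation, and standard error all follow from the reduced sum of squares. The following is a minimal sketch; the function name spread is an assumption for this example, and it expects at least two points so that N - 1 is not zero.

```python
import math


def spread(values):
    """Return (variance, standard deviation, standard error of the mean)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)    # reduced sum of squares
    variance = ss / (n - 1)                      # s²
    std_dev = math.sqrt(variance)                # s
    std_err = math.sqrt(ss / ((n - 1) * n))      # se
    return variance, std_dev, std_err


variance, std_dev, std_err = spread([2, 4, 4, 4, 5, 5, 7, 9])
print(variance, std_dev, std_err)
```

Dividing std_dev by the square root of N gives the same number as std_err, which is the relationship between the two measures.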

Measures of Normality

skew . . . The SKEWNESS. If enough points are available, this measures how asymmetrically the sample is distributed. Skewness can have any value:

Positive values indicate the distribution is skewed right. There are more points below the mean, so the right forms a long, thin tail.
0 means a perfectly symmetrical distribution. There are as many points left of the mean as right.
Negative values indicate the distribution is skewed left. There are more points above the mean, so the left forms a long, thin tail.

The skewness is defined mathematically by:

skew = [N / ((N - 1)(N - 2))] Σ_{i=1}^{N} ((x_i - x̄) / s)³
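A sketch of the skewness formula in Python follows. The function name skewness is an assumption for this example; it requires at least three points, since N - 2 appears in a denominator, and it assumes the values are not all identical, since s would then be zero.

```python
import math


def skewness(values):
    """Sample skewness: N / ((N-1)(N-2)) times the sum of cubed z-scores."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three points")
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    s = math.sqrt(ss / (n - 1))                  # sample standard deviation
    cubed = sum(((v - mean) / s) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


print(skewness([1, 2, 3, 4, 5]))     # symmetric sample
print(skewness([1, 1, 1, 2, 10]))    # long tail to the right
```

The symmetric sample gives a value of zero up to rounding, and the sample with one large value gives a positive result, in line with the interpretation listed above.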

kurt . . . The KURTOSIS. If enough points are available, this measures how steep or shallow the sample's distribution is compared to the normal distribution:

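Values > 0 mean a distribution that is narrow, high, and peaked.
0 means the same verticality as the normal distribution's "Bell Curve."
Values < 0 mean a shallower, flatter distribution.

These reference points apply to the formula below, which subtracts a correction term so that the normal distribution scores about 0. Some references and programs leave out that subtraction, in which case the normal distribution scores 3 instead.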

The kurtosis is formally defined as:

kurt = [N(N + 1) / ((N - 1)(N - 2)(N - 3)) × Σ_{i=1}^{N} ((x_i - x̄) / s)⁴] - 3(N - 1)² / ((N - 2)(N - 3))
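The kurtosis formula translates to Python in the same way as the skewness. This is a minimal sketch: the function name kurtosis is an assumption for this example, it requires at least four points because N - 3 appears in a denominator, and it assumes the values are not all identical.

```python
import math


def kurtosis(values):
    """Sample kurtosis following the formula above, correction term included."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least four points")
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    s = math.sqrt(ss / (n - 1))                          # sample standard deviation
    fourth = sum(((v - mean) / s) ** 4 for v in values)  # sum of z-scores to the 4th power
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction


print(kurtosis([1, 2, 3, 4, 5]))
```

For the evenly spaced sample 1, 2, 3, 4, 5 the function returns -1.2 (up to rounding), which can be confirmed by hand from the formula.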

Determining whether a sample is normally distributed or not can be important. The assumptions behind linear regression and many other statistical tests break down if a sample is too far from normally distributed. In this case, you might have to use non-parametric statistics or other methods to deal with your sample.

Page created:	04/12/2017
Last modified:	04/13/2017
Author:	BPL
