Appendix 2.
Probability Distributions

(c) 2017 by Barton Paul Levenson



t and F are probability distributions. Student's t is named for "Student," the pen name of the statistician William Sealy Gosset, who worked for... wait for it... a beer company (the Guinness Brewery in Dublin, Ireland). F is named for Sir Ronald A. Fisher, a mathematical biologist.

The t-test on the coefficient in a regression is:

t = \frac{\beta - \beta_0}{\mathrm{se}}

where β is your regression coefficient and β0 the value you're testing against. Usually this is zero, in which case the equation reduces to

t = \frac{\beta}{\mathrm{se}}

se is the standard error of the coefficient (the estimated standard deviation of its sampling distribution), something you can find with regression matrix math.

The t-statistic has a certain number of "degrees of freedom," d:

d = N - k

where N is the number of observations (points, values of one variable), and k the total number of variables. You would have N independent observations, but by relating them to one another, you are applying constraints, and some freedom of mathematical maneuvering is lost. You lose one degree of freedom for every variable in your analysis.
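Here is a minimal sketch of that computation in Python with numpy. It is not how LevenStats does it internally (the function name and layout are my own, for illustration); it just shows the matrix math mentioned above: the coefficients, their standard errors from the diagonal of (X'X)^-1 times the residual variance, and the resulting t statistics with d = N - k degrees of freedom.

    import numpy as np

    def ols_t_stats(X, y):
        """Coefficients, standard errors, and t statistics for an OLS regression.

        X: (N, k) design matrix -- include a column of ones for the intercept.
        y: (N,)  vector of observations of the dependent variable.
        """
        N, k = X.shape
        XtX_inv = np.linalg.inv(X.T @ X)
        beta = XtX_inv @ X.T @ y              # least-squares coefficients
        resid = y - X @ beta
        d = N - k                             # degrees of freedom
        s2 = resid @ resid / d                # residual variance
        se = np.sqrt(s2 * np.diag(XtX_inv))   # standard error of each coefficient
        return beta, se, beta / se, d         # t = beta / se tests H0: beta_i = 0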

Student's t-tests are used for other types of statistical investigation as well, many of which LevenStats provides. Formally, they are all written as

t_p(d)

where p is the probability of getting a coefficient that large from sampling error alone, and d the number of degrees of freedom. For example, let's say you're regressing inflation dP on money supply growth dM and real GDP growth dQ. You have 51 observations, maybe for the years 1961-2011 for the economy of France. And you want to find out if your hypothesis that dP is a function of dM and dQ is true. The null hypothesis for each independent variable is

H0: βi = 0

You want to reject the null hypothesis. You hope your regression coefficients will be "significantly different" from zero at some level of confidence.

Scientists normally aren't impressed with a relationship unless the variable is "significant at the 95% level." This means that, if the null hypothesis were true, sampling error alone would produce a coefficient as far from zero as yours no more than 5% of the time. That 5% is your "p" value: the probability of getting your result by chance if the null hypothesis holds. If you're trying to show two factors are related, you want p to be as small as possible.

For each coefficient in our hypothetical regression equation, we want to calculate t_0.05(48). The programs listed in Appendix X will do that for you--you can enter the t or F statistic and the degree(s) of freedom and get an answer. For a regression, these are calculated automatically.
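If you want to check such a number outside LevenStats, scipy's t distribution will do the same lookup. A sketch, with the observed t value invented for illustration:

    from scipy import stats

    d = 48                                 # N - k = 51 - 3
    t_crit = stats.t.ppf(1 - 0.05 / 2, d)  # two-tailed 5% critical value, about 2.01
    t_obs = 2.70                           # hypothetical t from the regression
    p = 2 * stats.t.sf(abs(t_obs), d)      # two-tailed p value
    print(f"significant at 95%: {abs(t_obs) > t_crit}, p = {p:.4f}")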

The F statistic is used to test whether the regression as a whole is significant, with all the independent variables taken together, or some subset of them. Like t, it can also be used for other types of tests. Most of these are also provided by routines in the program.

The numbers of degrees of freedom for the F statistic in a multiple regression are:

d1 = k - 1
d2 = N - k
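The same kind of check works for F. A sketch using the hypothetical French regression (N = 51, k = 3), with the observed F value again invented for illustration:

    from scipy import stats

    N, k = 51, 3
    d1, d2 = k - 1, N - k               # 2 and 48 degrees of freedom
    F_obs = 5.1                         # hypothetical F for the whole regression
    F_crit = stats.f.ppf(0.95, d1, d2)  # 5% critical value, about 3.19
    p = stats.f.sf(F_obs, d1, d2)       # chance of an F this large under H0
    print(f"significant at 95%: {F_obs > F_crit}, p = {p:.4f}")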

If you want some really arcane stuff, here are the probability density for the t statistic and the cumulative probability for the F statistic:

f(t) = \frac{\Gamma\left(\tfrac{1}{2}d + \tfrac{1}{2}\right)}{(d\pi)^{1/2}\,\Gamma\left(\tfrac{1}{2}d\right)} \left[1 + \frac{t^2}{d}\right]^{-\left(\tfrac{1}{2}d + \tfrac{1}{2}\right)}

f(F) = \frac{d_1^{\,d_1/2}\, d_2^{\,d_2/2}}{\mathrm{B}\left(\tfrac{1}{2}d_1, \tfrac{1}{2}d_2\right)} \int_0^F t^{(d_1 - 2)/2} \,(d_2 + d_1 t)^{-(d_1 + d_2)/2}\, dt

Here Γ is something called the "gamma function," an extension of the factorial operator to real numbers: for a whole number n, Γ(n) = (n - 1)!. π is just the circle constant, 3.14159... Β is the "beta function," which you can derive easily from the gamma function:

\mathrm{B}(x, y) = \frac{\Gamma(x)\,\Gamma(y)}{\Gamma(x + y)}
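You can confirm that identity numerically; the two lines below should print the same number (the arguments 2.5 and 3.0 are arbitrary test values):

    import math
    from scipy.special import beta

    x, y = 2.5, 3.0
    print(math.gamma(x) * math.gamma(y) / math.gamma(x + y))  # beta built from gammas
    print(beta(x, y))                                         # beta computed directly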

The big S-like thing ( ∫ ) is an integral sign, a sum of very small parts of t taken between the limits 0 and F. The "dt" at the end is the "differential" that goes with it, standing for the very small parts in question.
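To convince yourself these really are the t and F distributions, here is a sketch that codes the two formulas directly (gamma and beta from math and scipy.special, the integral done numerically with scipy's quad) and compares them against scipy's built-in distributions:

    import math
    from scipy import stats
    from scipy.integrate import quad
    from scipy.special import beta

    def t_pdf(t, d):
        """The t density, straight from the gamma-function formula above."""
        return (math.gamma(d / 2 + 0.5)
                / (math.sqrt(d * math.pi) * math.gamma(d / 2))
                * (1 + t * t / d) ** -(d / 2 + 0.5))

    def f_cdf(F, d1, d2):
        """The F cumulative probability: integrate the density from 0 to F."""
        c = d1 ** (d1 / 2) * d2 ** (d2 / 2) / beta(d1 / 2, d2 / 2)
        area, _ = quad(lambda t: t ** ((d1 - 2) / 2)
                                 * (d2 + d1 * t) ** (-(d1 + d2) / 2), 0, F)
        return c * area

    print(t_pdf(1.0, 48), stats.t.pdf(1.0, 48))          # should agree
    print(f_cdf(3.19, 2, 48), stats.f.cdf(3.19, 2, 48))  # should agree, near 0.95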





Page created: 04/12/2017
Last modified: 04/13/2017
Author: BPL