A Short Seminar on Probability and Statistical Significance
A. Introduction

This seminar will cover the topics in the sections that follow.
B. Probability : Why
The scientific approach that characterises Western civilisation is based on reproducible empirical observations. The idea is that repeatedly observed relationships or differences are more likely to reflect reality. Another way of saying this is that a proposition cannot be accepted unless it is supported by repeated observations.
The problem with repeated observations is that the results, though often similar, are not always the same. Experience compels us to abandon the idea that something is either true or false. Rather, we increasingly see true and false merely as extremes, with most of reality a continuum in between.
Similarly, when we consider a scale (e.g. how tall a man is), we can only state an approximation: a range that most observations would fit in.
The uncertainty of reality therefore needs to be approached in a consistent and logical way. Probability is a measurement of how likely things are to occur and is one of the ways to represent uncertainty. Statistics is the set of tools to handle probability.
C. Probability : How
Clinicians often present probability as a percentage. Mathematicians and statisticians, however, usually use a number between 0 and 1. Readers should be familiar with both notations; a probability of 0.25 and 25% mean the same thing.
Probability can be established by observation. For example, if we examine all the children in a class, and there are 20 boys and 15 girls, then we can conclude that the probability that a child in that class is a girl is 15/(20+15) = 15/35 = 0.43, or 43%.
Probability can also be calculated from a theoretical construct. For example, if we toss a coin, the result can only be one of the two outcomes, heads or tails, so the probability of getting a head is 1/2, 0.5, or 50%. Similarly, if we roll a die, the result can only be one of the 6 numbers, so the probability of obtaining any particular number is 1/6 = 0.17, or 17%.
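The link between repeated observation and theoretical probability can be demonstrated with a short simulation. The sketch below (Python standard library only; the seed and roll count are arbitrary choices, not from the text) estimates the probability of rolling a 3 with a fair die and compares it with the theoretical 1/6.

```python
import random

random.seed(1)  # arbitrary fixed seed so the run is reproducible

# Simulate 60,000 rolls of a fair six-sided die
rolls = [random.randint(1, 6) for _ in range(60_000)]

# Empirical probability of rolling a 3
p_three = rolls.count(3) / len(rolls)
print(f"estimated P(3) = {p_three:.3f}")  # close to 1/6 ≈ 0.167
```

With more rolls the estimate settles ever closer to the theoretical value, which is exactly the sense in which repeated observation supports a proposition.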
Inferential Statistics. If we accept that observations generally follow a particular probability model, we can interpret our observations according to the mathematics of that model. We expect that the pattern will repeat itself, and is therefore predictable.
This is the basis of using statistics in research: by interpreting our observations according to a probability pattern, we can predict what future observations are likely to be.
D. Parametric Statistics

If we accept that observations generally follow a particular probability model, and we are able to handle the mathematics of that model, then:
We can expect or predict the patterns of future similar observations
We have a tool to translate our observations to represent reality
Parametric Statistics is a set of tools to translate observations into notions of reality. It is based on the belief that observations follow the mathematics of the Normal Distribution.
E. Normal Distribution

Mean
Ancient Phoenician traders carried their merchandise by boat, and often overloaded their boats to make more money. In a storm, some of the merchandise was thrown overboard to save the boat. A common practice was to compensate the traders who lost their merchandise with contributions from those who did not. A sophisticated method of calculating how to do this was developed, and the system was called "Havara". The term evolved through the centuries, and eventually became "average".
There are 3 presentations of the average: the mode, which is the most common value; the median, which is the value that divides all the values into two equal groups; and the mean, which is a mathematical function, where mean = sum of all values / number of values.
Statisticians sometimes use the median, but most commonly use the mean. This module will use mean to represent average.
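The three averages can be illustrated with Python's standard statistics module; the heights below are hypothetical numbers chosen for the example.

```python
import statistics

heights_cm = [160, 165, 165, 170, 172, 175, 180]  # hypothetical sample

print("mode:  ", statistics.mode(heights_cm))    # most frequent value: 165
print("median:", statistics.median(heights_cm))  # middle of the sorted values: 170
print("mean:  ", statistics.mean(heights_cm))    # sum / count = 1187 / 7 ≈ 169.57
```

Note that the three can disagree, which is why it matters to say which "average" is being reported.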
The astronomer Gauss tried to measure distances between stars, and noticed that it was difficult to reproduce his measurements exactly. However, his measurements clustered around a central value: more common near the mean, and becoming less common further away from it. He then noticed that this pattern applied whenever he made measurements of anything, so he named it the Normal distribution.
A formula for the Normal distribution curve had in fact been derived earlier, by De Moivre, so that the various components of the Normal distribution can be handled mathematically. The mean (abbreviated μ) becomes the measure of central tendency, and the Standard Deviation (abbreviated SD or σ) a measure of dispersion.
F. Probability and Normal Distribution
Fisher used calculus to calculate the area under De Moivre's Normal distribution curve.
Fisher argued that, if the total area under the curve represents all the possibilities (probability = 1, or 100%), then the area beyond a defined distance from the mean represents the probability of observing a value greater than that distance. Fisher standardised this distance from the mean and called it the Standard Deviate, z; it expresses how far a value lies from the mean, in units of Standard Deviation (SD or σ).
The concept of the Standard Deviation is very useful, as any value in a distribution of known mean and SD can be translated to z, where z = (value - mean) / SD, and the relationship between z and probability is constant. For example, we know that the probability of a z value > 1.65 is 0.05 (5%), and > 1.96 is 0.025 (2.5%).
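These tail probabilities can be checked with the standard normal distribution in Python's statistics module (no external libraries needed); the height figures in the example are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Translate a value into z: e.g. 185 cm in a population with
# mean 175 cm and SD 7 cm (hypothetical numbers)
z = (185 - 175) / 7                      # ≈ 1.43 SDs above the mean
p_above = 1 - std_normal.cdf(z)          # probability of a value above 185 cm

# The tail probabilities quoted above
print(1 - std_normal.cdf(1.65))          # ≈ 0.05
print(1 - std_normal.cdf(1.96))          # ≈ 0.025
```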
95% Confidence Interval: one- and two-tailed models
As statistics is the science of handling uncertainty, a measurement is usually expressed as a range of likely values, the most common form of which is the 95% confidence interval. This means that, if we make repeated observations, we expect the observed value to fall within this interval 95% of the time. We can calculate it easily using the relationship between z and probability. There are two ways of doing this.
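The two ways are the one-tailed and two-tailed models. A sketch of both, using a hypothetical distribution of heights (mean 175 cm, SD 7 cm):

```python
from statistics import NormalDist

norm = NormalDist()                  # standard normal
mean, sd = 175.0, 7.0                # hypothetical height distribution

# Two-tailed: 2.5% in each tail, so the multiplier is z at 0.975
z_two = norm.inv_cdf(0.975)          # ≈ 1.96
lo, hi = mean - z_two * sd, mean + z_two * sd
print(f"two-tailed 95% interval: {lo:.1f} to {hi:.1f} cm")

# One-tailed: all 5% in one tail, so the multiplier is z at 0.95
z_one = norm.inv_cdf(0.95)           # ≈ 1.645
print(f"one-tailed 95% bound: below {mean + z_one * sd:.1f} cm")
```

The two-tailed interval bounds the value on both sides; the one-tailed version is used when only one direction of deviation matters.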
G. The t Distribution
The Normal distribution works best when the sample size is very large.
The probability distribution becomes increasingly wider as the sample size becomes smaller, as shown in the diagram to the left.
Gosset, who called himself Student, derived a correction of the probability estimate according to sample size and called it t, and this became known as Student's t.
Student's t allows the use of a small number of measurements to estimate what may be true of the whole population. This forms the basis of modern inferential statistics, where a small number of observations are made, and the results are generalized to the wider population.
The t distribution curve is wider than the normal one. The area (and hence the probability) beyond a particular deviate is therefore larger than under the normal distribution. This difference varies with sample size (degrees of freedom), such that the probability of t approaches that of z as the sample size increases towards infinity. Conceptually, this is represented by the diagram to the left.
With infinite degrees of freedom (i.e., a large sample size), the one tailed t and z have the same value for a particular probability, but with fewer cases, t will be larger than z in obtaining the same probability.
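This convergence can be seen by comparing t critical values with the normal z. The two-tailed 5% critical values below are taken from a standard t table and hardcoded, since Python's standard library has no t distribution:

```python
from statistics import NormalDist

# Two-tailed 5% critical values from a standard t table: df -> t
t_crit = {5: 2.571, 10: 2.228, 30: 2.042}

# The df = infinity row of the table equals the normal z value
z_crit = NormalDist().inv_cdf(0.975)   # ≈ 1.96

for df, t in sorted(t_crit.items()):
    print(f"df={df:>2}: t = {t}   (z = {z_crit:.3f})")
# t shrinks toward z as the degrees of freedom increase
```

With only 5 degrees of freedom, a deviate must exceed 2.571 (not 1.96) before the same 5% tail probability is reached, which is how t penalises small samples.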
H. Sample mean and Standard Error of the mean
After establishing the mathematics of the Standard Deviation, Fisher went on to develop the idea of the Standard Error of the Mean (SE for short). He argued that the true mean is difficult to find, as this would require measuring everyone in a population, or measuring an infinite number of times. The mean obtained from a set of observations is therefore only the sample mean, an estimate of the underlying true mean, and it would vary from sample to sample. An estimate of this variation is called the Standard Error of the Mean (SE).
Conceptually, the Standard Error (SE) represents the Standard Deviation (SD) of the mean values that would be obtained if repeated samples of the same size were taken: the mean is calculated for each repeated sample from the population, and the SD of these sample means equals the SE of the mean.
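This idea can be checked by simulation. The standard formula is SE = SD / sqrt(n); the sketch below (hypothetical population, arbitrary seed) draws many samples and compares the SD of their means with that formula.

```python
import math
import random
import statistics

random.seed(2)                        # arbitrary seed for reproducibility
mu, sigma, n = 175.0, 7.0, 25         # hypothetical population; sample size 25

# Draw 5,000 samples of size n and record each sample's mean
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(5_000)
]

sd_of_means = statistics.stdev(sample_means)
se_formula = sigma / math.sqrt(n)     # SE = SD / sqrt(n) = 1.4

print(f"SD of the sample means: {sd_of_means:.2f}")
print(f"SD / sqrt(n):           {se_formula:.2f}")
```

The two numbers agree closely, which is exactly the conceptual definition above: the SE is the SD of repeated sample means.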
Difference between two means and its Standard Error
Extending the argument of sample means, Fisher argued that the difference between two means is itself a mean, and the Standard Error of this difference can be estimated using the Standard Deviations in the two groups.
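In the usual formulation for two independent groups, the variances of the two standard errors add. A sketch with hypothetical group summaries:

```python
import math

# Hypothetical summary statistics for two independent groups
sd1, n1 = 7.0, 40
sd2, n2 = 6.5, 35

# SE of each group's mean
se1 = sd1 / math.sqrt(n1)
se2 = sd2 / math.sqrt(n2)

# SE of the difference between the two means: the variances add
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
print(f"SE of the difference: {se_diff:.2f}")   # ≈ 1.56
```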
I. The Null Hypothesis and Type I Error
If the difference between two means is a true reflection of population differences, then the probability of having any theoretical difference can be calculated using the z value, where z = (theoretical difference - observed difference) / Standard Error of the difference
Fisher then proposed the null hypothesis. He asked: given the observed difference and its Standard Error, what is the probability that the theoretical difference is null (0)? This is the probability of z, where z = (0 - Difference) / Standard Error of the difference. In other words, the probability of z is the probability of observing a difference this large if there were truly no difference between the groups, as shown in the diagram to the right.
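A numerical sketch of this test, with a hypothetical observed difference and Standard Error:

```python
from statistics import NormalDist

observed_diff = 3.2   # hypothetical difference between two group means
se_diff = 1.56        # hypothetical Standard Error of that difference

# z for the null hypothesis (theoretical difference = 0), as in the text
z = (0 - observed_diff) / se_diff           # ≈ -2.05

# Two-tailed probability of a difference at least this extreme
# if the null hypothesis were true
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.2f}, p = {p:.3f}")          # p below 0.05
```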
Probability of Type I Error, α or p
Fisher, being a mathematician, presented the null hypothesis in terms of a mathematical proof, in the following series of arguments.
J. Type II Error and Statistical Significance
The concept of Type I Error worked very well in industry, where it was useful for comparing a new method of manufacturing against an existing one.
When α is low (p < 0.05), a decision that the observed difference is real can be made.
The problem arises when α is high (p > 0.05): failure to reject the null hypothesis does not mean acceptance of the null, so we cannot decide that the difference is zero, and no statistical conclusion can be drawn.
To fix this, Pearson proposed an additional Alternative Hypothesis: that the difference is not null, as shown to the left. Following Fisher's style of argument, he called the error of wrongly rejecting the alternative hypothesis the Type II Error, and abbreviated its probability to beta (β). With this proposal, any difference found between two groups can be tested against the null hypothesis according to α and against the alternative hypothesis according to β.
Although Pearson's initial proposal was theoretically sound, it was not practical, as a non-null value can be anything between -∞ and +∞ except 0. To make his theory work, Pearson proposed the approach shown in the diagram to the right.
The term Power is often used instead of the probability of Type II Error, where Power = (1 - β). Conceptually, power represents the probability of detecting a difference if it really exists.
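Power can be sketched directly from the normal model. Assuming a true difference of 3.0 with SE 1.2 (hypothetical design numbers) and a two-tailed α of 0.05, the probability that the observed z clears the critical value is:

```python
from statistics import NormalDist

norm = NormalDist()

true_diff, se = 3.0, 1.2             # hypothetical design assumptions
z_crit = norm.inv_cdf(0.975)         # ≈ 1.96 for two-tailed alpha = 0.05

# Probability that the observed z exceeds z_crit when the true
# difference really is true_diff (the opposite tail is negligible here)
power = 1 - norm.cdf(z_crit - true_diff / se)
beta = 1 - power                     # probability of Type II Error

print(f"power = {power:.2f}, beta = {beta:.2f}")
```

A study with power around 0.7, as here, would miss a real difference of this size roughly 3 times in 10.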
K. The 95% confidence interval of the difference
Since 1980, researchers have become increasingly doubtful about Pearson's model, as on many occasions research results based on it were unstable and could not be replicated.
There are two related reasons for this failure.
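The section's title refers to reporting the 95% confidence interval of the difference rather than p alone. A sketch, using hypothetical numbers (difference 3.2, SE 1.56):

```python
from statistics import NormalDist

diff, se_diff = 3.2, 1.56            # hypothetical difference and its SE
z = NormalDist().inv_cdf(0.975)      # ≈ 1.96

lo, hi = diff - z * se_diff, diff + z * se_diff
print(f"95% CI of the difference: {lo:.2f} to {hi:.2f}")

# If the interval excludes 0, the difference is significant at p < 0.05
print("interval excludes zero:", lo > 0 or hi < 0)
```

The interval conveys both the size of the difference and its uncertainty, which a bare p value does not.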
L. Summary of Technical Terms

Mean : Average
Standard Deviation : Spread of measurements
Standard Error : Strictly, the Standard Error of the mean; the spread of sample means
Difference and the Standard Error of the Difference
z : The distance from the mean, in terms of Standard Deviation or Standard Error
t : The same calculation as z, but is meant for use with small sample size
Probability of z or t : The probability of seeing a value further away from the mean
When interpreting a difference and its Standard Error from comparison of two groups