Introduction to Probability and Statistical Significance
Probability
Why
Science is based on repeated empirical observations
Repeated observations are similar, but unlikely to be identical
All observations are therefore approximations
Probability is a tool for handling approximations
How
Based on data : e.g. probability of winning the lottery
Based on theory
 Head or tail in a coin
 Any number in a dice
 Normal Distribution : what we will discuss
Parametric Statistics
What
Parametric statistics is based on the assumption that observations follow a Normal distribution
Gauss

Repeated measurements tend to cluster around a central value, becoming less frequent further away from it
This is so common as to be normal
Thus Normal distribution

De Moivre

De Moivre derived a formula for the Normal Distribution curve in mathematical terms

Fisher

Fisher used Calculus to define area under the curve as total probability
The area outside of any point is therefore the probability of observing a measurement more distant from the mean
Fisher defined z as the number of Standard Deviations away from the mean
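The z idea can be sketched with Python's standard library (the measurement distribution here, mean 120 and SD 10, is a hypothetical example):

```python
from statistics import NormalDist

# Hypothetical example: a measurement distribution with mean 120, SD 10
mean, sd = 120.0, 10.0
observation = 140.0

z = (observation - mean) / sd      # number of Standard Deviations from the mean
tail = 1 - NormalDist().cdf(z)     # probability of a value even further out
print(z)               # 2.0
print(round(tail, 4))  # about 0.0228
```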

Gosset (Student)

Gosset, calling himself Student, produced a correction to De Moivre's formula for small sample sizes
For small sample sizes, z becomes t
t and z define the same probability for very large samples
For the same distance away from mean, t excludes a larger probability than z
t allows research using few cases (small sample size) to draw conclusions that can be applied to whole populations. This is called Statistical Inference
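The relationship between t and z can be checked numerically. This is a self-contained sketch that integrates the Student's t density directly, so it needs no external library; `t_tail` and `z_tail` are illustrative helper names, not standard functions:

```python
import math

def t_tail(t, df, upper=60.0, steps=4000):
    """P(T > t) for Student's t with df degrees of freedom,
    by Simpson integration of the density (an illustrative sketch)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    f = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    h = (upper - t) / steps
    s = f(t) + f(upper)
    for i in range(1, steps):
        s += f(t + i * h) * (4 if i % 2 else 2)
    return s * h / 3

def z_tail(z):
    """P(Z > z) for the standard Normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Two SDs from the mean: t excludes a larger probability than z for small samples,
# and converges to z as the sample grows
print(round(z_tail(2.0), 4))           # about 0.0228
print(round(t_tail(2.0, df=5), 4))     # about 0.051
print(round(t_tail(2.0, df=1000), 4))  # close to the z value
```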

95% Confidence Interval

The range within which observations should lie 95% of the time
or, we are 95% sure that the truth is inside this range
One tail : exclude 5% one side. 0^{th} to 95^{th} percentile or 5^{th} to 100^{th} percentile
Two tail : exclude 2.5% each side. 2.5^{th} to 97.5^{th} percentile
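These cut-off points can be read off the standard Normal distribution with Python's standard library:

```python
from statistics import NormalDist

nd = NormalDist()               # standard Normal: mean 0, SD 1
one_tail = nd.inv_cdf(0.95)     # 95th percentile, about 1.645 SDs
two_tail = nd.inv_cdf(0.975)    # 97.5th percentile, about 1.960 SDs
print(round(one_tail, 3), round(two_tail, 3))
```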

Population and Sampling
Sampling
Population means everyone. It is difficult to study everyone
 Too expensive
 Takes too long
 Can't get hold of everyone
Sampling is the basis of statistical inference, and what we do in research
 We study a representative sample
 We infer that the conclusions drawn apply to the population
 However, if we repeat the study, the results would be slightly different
 This expected variation in sampling is called the Standard Error
 Standard Deviation measures the expected variability of individual observations, while Standard Error measures the expected variability of means
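The distinction can be shown with a short sketch (the sample values are hypothetical):

```python
import math
import statistics

# Hypothetical sample of 25 repeated measurements
sample = [118, 122, 121, 119, 120] * 5
n = len(sample)

sd = statistics.stdev(sample)   # expected variability of individual observations
se = sd / math.sqrt(n)          # expected variability of the sample mean
print(round(sd, 3), round(se, 3))
```

The Standard Error shrinks as the sample grows, which is why means are more stable than individual observations.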
Mean and Standard Error of mean
Fisher showed that the mean of a set of Normally distributed observations is itself Normally distributed.
This means that the probability of seeing a mean a given distance from a theoretically defined mean can be calculated using z, or in the case of sampling, t
Difference between means and Standard Error of difference
Fisher further showed that the difference between two Normally distributed means is also Normally distributed, and its Standard Error can be calculated.
This means that the probability of seeing a difference a distance away from a theoretically defined difference can be calculated using z, or in the case of sampling, t
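For two independent means, the Standard Error of the difference combines the two Standard Errors (the numbers below are hypothetical):

```python
import math

# Standard Errors of two independent sample means (hypothetical values)
se1, se2 = 0.8, 0.6
se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # Standard Error of the difference
print(round(se_diff, 6))  # 1.0
```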
Null Hypothesis and Type I Error
The logic

From our research, we can establish the difference between two Normally distributed means, and the Standard Error of this difference
If we then propose a theoretical difference of 0, the null, we can estimate the probability of observing a difference this far or further from null
From this we can estimate the probability that the difference we observed is not null, and therefore real
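The steps above, with hypothetical numbers and the large-sample (z) approximation:

```python
from statistics import NormalDist

# Hypothetical: observed difference between two means, and its Standard Error
diff, se_diff = 2.5, 1.0

z = (diff - 0.0) / se_diff              # distance of the difference from null
p = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tail Probability of Type I Error
print(round(p, 4))  # about 0.0124: unlikely that the difference is null
```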

The formal statements
The shorthand statements
We want to know whether the difference is null
The Probability of Type I Error, p, α, is the probability of observing a difference this large if the true difference were null
The lower this probability, the less likely the difference is null
We follow everyone else, and make our decision based on p=0.05
How all this helps
The use of the Probability of Type I Error, p, α, allows decision makers to determine whether two means can be considered different
This allows a judgement of whether one process, treatment, or product is better than another
This judgement can be made by research using relatively small samples
It therefore drove the industrial revolution of Western civilisation
Type II Error and Statistical Significance
What's wrong with Type I Error
Type I Error works well when p is low, say <0.05
It does not help if p > 0.05, as the inability to reject null hypothesis is not the same as accepting the null
In other words, we cannot decide that there is no difference based on Type I Error
When p>0.05, we are unable to draw any conclusions at all
Alternative Hypothesis

Pearson proposed an additional hypothesis, the alternative hypothesis
The alternative hypothesis proposes that the difference is not null
We can then determine the error of rejecting the alternative hypothesis, the Probability of Type II Error, β
From this, for any given difference, we can estimate the probability that the difference is null, and the probability that it is not null
The problem is that, unlike the null, which is the single value 0, the non-null can be anything except 0
The model is therefore mathematically elegant, but has no practical use.

Statistical Significance

To make the model practically useful, Pearson proposed the following
 If we can nominate the probability of Type I Error to reject the null hypothesis (say α=0.05)
 If we can nominate the probability of Type II Error to reject the alternative hypothesis (say β=0.2)
 If we can provide what the background or population Standard Deviation is
 If we can nominate a difference that is practically meaningful (known as the critical difference generally, but as a clinically significant difference in the health care domain)
 Then, we can calculate the sample size needed to make a confident decision
 After collecting the appropriate data :
 If the difference found is greater than the critical difference, the difference is declared Statistically Significant, the null hypothesis is rejected, and the alternative hypothesis accepted
 If the difference found is less than the critical difference, the difference is declared Statistically Not Significant, the alternative hypothesis rejected, and the null hypothesis is accepted
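The sample size step can be sketched using the common textbook formula for comparing two means, n per group = 2((z_α + z_β) · SD / critical difference)². This formula is an assumption of the sketch, not quoted from the text; `sample_size` is an illustrative helper name:

```python
import math
from statistics import NormalDist

def sample_size(alpha, beta, sd, critical_diff, two_tail=True):
    """Per-group sample size for comparing two means (normal-approximation sketch)."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - (alpha / 2 if two_tail else alpha))  # reject null
    z_b = nd.inv_cdf(1 - beta)                                # reject alternative
    return math.ceil(2 * ((z_a + z_b) * sd / critical_diff) ** 2)

# alpha=0.05 (two tail), beta=0.2 (power 80%), SD=10, critical difference=5
print(sample_size(0.05, 0.2, 10, 5))  # 63 per group
```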

What's good about Statistical Significance
The use of statistical significance allows clear decision making
 The difference is either statistically significant, meaning the difference exists and matters
 or, the difference is not statistically significant, meaning the difference is trivial and does not matter
The model is the basis for research in the social and health sciences in the 20th century, responsible for rapid advances in knowledge and technology, particularly in the pharmaceutical industry
What's wrong with Statistical Significance
95% Confidence Interval
Changing emphasis
Increasingly, researchers are not attempting to discern a hidden "truth", but to describe what they have observed.
The emphasis is on clarity and confidence.
Results of research are now increasingly presented with two descriptions.
 The 95% confidence interval, where we expect any repeat of the same study will have results in this range 95% of the time
 The power analysis, where we analyse the stability of our conclusions, whether the sample size in our data is sufficient for us to draw our conclusions with confidence.
The analysis is descriptive and not a proof
It is based entirely on the data collected
There is no assumption of underlying truth or population parameters such as Standard Deviation
95% Confidence Interval of the Difference
The 95% confidence interval is based on
 The difference between the two means
 The Standard Error of the difference
 Student's t, which is a function of the sample size in the two groups
It is an alternative expression of the Probability of Type I Error, where p<0.05
If the 95% confidence interval does not overlap the null (0) value, then
 null is not included in the 95% confidence interval
 If we were to repeat the study many times, null would not be found 95% of the time
 We are therefore 95% sure the difference is not null
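A minimal sketch with hypothetical numbers, using the large-sample z value of 1.96:

```python
# Hypothetical: difference between means 2.5, Standard Error of difference 1.0
diff, se_diff = 2.5, 1.0

lower = diff - 1.96 * se_diff   # 0.54
upper = diff + 1.96 * se_diff   # 4.46
# The interval (0.54, 4.46) does not include 0, so null lies
# outside the 95% confidence interval
print(round(lower, 2), round(upper, 2))
```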
The 95% confidence interval can be one or two tails
The most commonly used model is the two tail model
 The two tail model excludes 2.5% of the Normal distribution curve on each of the two tails, representing a range of 2.5^{th} to 97.5^{th} percentile
 The model is used when the researcher intends to find a difference, any difference, in either direction
 For example, does a new cytotoxic drug affect the duration of survival in cancer patients?
 It may prolong life by killing cancer cells
 It may shorten life by its toxicity
The One tail model can also be used
 The One tail model excludes 5% of the Normal distribution curve on one of the two tails, either 0^{th} to 95^{th} percentile, or 5^{th} to 100^{th} percentile, not both
 It is more powerful, requiring a smaller sample size for the same power, or giving greater confidence in the results for the same sample size
 The model is used when the researcher is interested in a difference in one direction only
 For example, does a new cytotoxic drug prolong the interval of remission?
 We are interested only in whether the drug prolongs remission
 We do not care whether it shortens remission
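The power advantage can be quantified: sample size scales with (z_α + z_β)², so the one tail model needs only a fraction of the two tail sample size. This sketch assumes α=0.05 and power 80%:

```python
from statistics import NormalDist

nd = NormalDist()
z_beta = nd.inv_cdf(0.80)    # power 80% (beta = 0.2)
z_one = nd.inv_cdf(0.95)     # one tail, alpha = 0.05
z_two = nd.inv_cdf(0.975)    # two tail, alpha = 0.05

# Sample size scales with (z_alpha + z_beta)**2
ratio = ((z_one + z_beta) / (z_two + z_beta)) ** 2
print(round(ratio, 2))  # about 0.79: roughly 79% of the two tail sample size
```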
Power Analysis
Mathematically, power = 1 − β, the complement of the probability of Type II Error. power=0.8, power=80%, and β=0.2 are all the same thing
Conceptually, power means the stability of the results, whether our conclusions are likely to change if we collect more data
Power is calculated using the difference between the means, the sample size and Standard Deviations in the two groups, and whether the one or two tail model is used.
The power we wish to have is determined before analysis. It is commonly 80%
If the power in the data is less than planned, we cannot interpret our results with confidence, and we should seriously consider collecting more data
If the power equals or exceeds that planned, we can interpret our results, confident that they are reproducible
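A normal-approximation sketch of the power calculation (hypothetical inputs; `power` is an illustrative helper, not a standard function):

```python
from statistics import NormalDist

def power(diff, se_diff, alpha=0.05, two_tail=True):
    """Approximate power: the probability of detecting the given difference."""
    nd = NormalDist()
    z_a = nd.inv_cdf(1 - (alpha / 2 if two_tail else alpha))
    return 1 - nd.cdf(z_a - abs(diff) / se_diff)

# Hypothetical: difference 2.5, Standard Error of difference 1.0
print(round(power(2.5, 1.0), 2))  # about 0.71, just below a planned 80%
```

In this hypothetical case the data fall short of the planned 80% power, so more data would be needed before interpreting the result with confidence.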
Summary and Discussions
Probability
Normal Distribution, z, t
Mean, Standard Deviation, Standard Error, Difference between means
Null Hypothesis, Probability of Type I Error, p, α
Alternative Hypothesis, Type II Error, β, Statistical Significance
95% confidence interval, power analysis
One Tail model, Two tail Model
