Introduction
 Metaanalysis is the statistics arm of a Systematic Review that involves Quantitative Research.
 It collates and summarizes research data from diverse sources.
 This is a very large subject. The statistics module covers only
 Statistical Effects that are Normally distributed, each with its Standard Error
 Single group: mean and Standard Error of the mean
 Single group: proportion and Standard Error of the proportion
 Single group: Fisher's Z transformed correlation coefficient and Standard Error of Z
 Two groups: difference between means and its Standard Error
 Two groups: risk difference and its Standard Error
 Two groups: log(risk ratio) and its Standard Error
 Two groups: log(odds ratio) and its Standard Error
 The most basic calculations, as an introduction to the concepts
 Data entry
 Heterogeneity
 Publication Bias
 Combine data to form summary effect
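Each effect listed above is an estimate paired with a Standard Error. Its 95% confidence interval follows as effect ± 1.96 × SE. The Python sketch below collects the standard formulas. The function names are hypothetical and are not taken from the program.

Two assumptions apply. For the difference between means, the sketch uses the pooled-SD Standard Error, because that version reproduces the SE column in Exercise 1a; other software may use the unpooled form. The log(risk ratio) and log(odds ratio) versions assume there are no zero cells.

```python
import math

# Hypothetical helpers: each returns (effect, standard_error).

def single_mean(mean, sd, n):
    """Single group: mean and its Standard Error."""
    return mean, sd / math.sqrt(n)

def single_proportion(events, n):
    """Single group: proportion and its Standard Error (normal approximation)."""
    p = events / n
    return p, math.sqrt(p * (1 - p) / n)

def fisher_z(r, n):
    """Single group: Fisher's Z transform of correlation r, SE = 1/sqrt(n - 3)."""
    return math.atanh(r), 1 / math.sqrt(n - 3)

def mean_difference(n1, m1, sd1, n2, m2, sd2):
    """Two groups: difference between means (group 1 - group 2), pooled-SD SE."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return m1 - m2, math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def risk_difference(a, n1, c, n2):
    """Two groups: risk difference, where a of n1 and c of n2 had the event."""
    r1, r2 = a / n1, c / n2
    return r1 - r2, math.sqrt(r1 * (1 - r1) / n1 + r2 * (1 - r2) / n2)

def log_risk_ratio(a, n1, c, n2):
    """Two groups: log(risk ratio) and its Standard Error (no zero cells)."""
    return math.log((a / n1) / (c / n2)), math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)

def log_odds_ratio(a, n1, c, n2):
    """Two groups: log(odds ratio) from the 2x2 table (a, n1 - a; c, n2 - c)."""
    b, d = n1 - a, n2 - c
    return math.log((a * d) / (b * c)), math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

def ci95(effect, se):
    """95% confidence interval, assuming the effect is Normally distributed."""
    return effect - 1.96 * se, effect + 1.96 * se

if __name__ == "__main__":
    # Australia row of Exercise 1 (iron therapy vs control)
    print(mean_difference(150, 11.6, 4.2, 145, 11.8, 4.3))
    # Study 1 of Exercise 2 (hormone vs placebo)
    print(risk_difference(1, 15, 6, 16))
```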
 The following programs are required for metaanalysis
Data Input
Heterogeneity
 Heterogeneity asks whether the studies are so different that combining them may produce a misleading interpretation
 The Q test. If the p value of the Q test is <0.05, significant heterogeneity exists. This is good enough for most clinical researchers
 The I² statistic, which estimates the percentage of the total variation across studies that is attributable
to differences between them. It is more nuanced, and preferred by statisticians
 <30% represents trivial heterogeneity and can be ignored
 >70% represents serious and significant heterogeneity
 In between, significant heterogeneity is present but not very severe
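The Q test and I² can be sketched as follows, assuming inverse-variance weights of 1/SE². Q is the weighted sum of squared deviations from the pooled effect. It is compared against a chi-square distribution with k - 1 degrees of freedom, and I² = (Q - df)/Q, truncated at 0.

To keep the sketch dependency-free, the hypothetical helper `chi2_sf` uses the closed-form chi-square tail for integer degrees of freedom instead of a statistics library. The function names are illustrative only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # odd df: 2(1 - Phi(sqrt x)) + 2 phi(sqrt x) * sum of x^(r - 1/2) / (1*3*...*(2r-1))
    s = math.sqrt(x)
    total = math.erfc(s / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * s * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

def heterogeneity(effects, ses):
    """Return Cochran's Q, its p value (df = k - 1) and I^2 as a fraction."""
    w = [1 / se**2 for se in ses]                      # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0      # share of variation between studies
    return q, chi2_sf(q, df), i2
```

For example, `heterogeneity([1.0, -1.0], [1.0, 1.0])` gives Q = 2 on 1 df and I² = 0.5, that is 50%. This falls in the intermediate band described above.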

Example from program
 The Q test shows significant heterogeneity
 I² confirms a severe level of heterogeneity, as 95.7% of the total variation is
attributable to differences between studies.
 The Forest plot demonstrates this level of heterogeneity
 Response options when heterogeneity exists:
 Re-examine the method of selecting studies, and delete unsuitable reports
 See if the heterogeneity has a cause
 Studies may be from different populations and environments, and need to be subclustered
 Results may need to be statistically corrected (an advanced topic not covered in this module)
 Consider the data unreliable, and abandon the metaanalysis
 Decide to ignore the heterogeneity and proceed with the metaanalysis
 Use the Random Effect Model to combine results (see later)

 The Forest Plot shows what happens when the 5th study is removed:
 Q = 1.97, p = 0.58 (n.s.)
 I² < 0.01%
Publication Bias
How does it arise?
 Poorly designed and executed studies often exist
 If a significant difference is found, the study is published
 If no significant difference is found, the results are thrown away
 The scientific literature therefore contains an excess of small, poorly designed studies with statistically significant results
 This leads to misleading conclusions
 One of the many tests for publication bias is the Rank Correlation Test. The logic is as follows:
 Smaller, poorly designed studies have larger Standard Errors
 If bias exists, then the greater the effect, the greater its Standard Error
 A significant correlation between the Effect and its Standard Error indicates the existence of Publication Bias
 The Rank Correlation Test performs a rank correlation after adjusting the data by weighting according to the Standard Errors
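The sketch below shows one common form of the Rank Correlation Test, the Begg and Mazumdar version. Whether it matches the program's implementation is an assumption.

Each effect is first standardized against the fixed-effect mean, using the variance v_i - 1/Σw. Kendall's tau between the standardized effects and their variances is then converted to a z score. Ties count as neither concordant nor discordant, and no tie correction is applied. The function name `rank_correlation_test` is hypothetical.

```python
import math

def rank_correlation_test(effects, ses):
    """Begg-Mazumdar style rank correlation test; returns (tau, z, two-sided p)."""
    k = len(effects)
    v = [se**2 for se in ses]
    w = [1 / vi for vi in v]
    sum_w = sum(w)
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w  # fixed-effect mean
    # standardize each effect; Var(y_i - pooled) = v_i - 1/sum(w)
    t = [(yi - pooled) / math.sqrt(vi - 1 / sum_w) for yi, vi in zip(effects, v)]
    s = 0  # concordant pairs minus discordant pairs
    for i in range(k):
        for j in range(i + 1, k):
            prod = (t[i] - t[j]) * (v[i] - v[j])
            s += (prod > 0) - (prod < 0)
    tau = s / (k * (k - 1) / 2)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return tau, z, p
```

When applied to the risk differences and Standard Errors of the 8 studies in Exercise 2, this sketch gives tau of about -0.64 and p of about 0.026. That agrees with the p = 0.03 reported in answer 2b.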
 Responses to the presence of significant publication bias:
 The data may be considered unreliable, and the metaanalysis abandoned
 Proceed with the metaanalysis, but warn that the results may be biased and should be interpreted with care
 Use one of the many statistical procedures to either delete, or compensate for, the positive
studies with large Standard Errors (an advanced topic not covered in this module).
Remind students that more studies are required before this test can be interpreted
Combine Multiple Studies
 The Fixed Effect Model
 Assumes that all the studies have similar populations and environments
 Ignores variation between studies in the analysis
 More likely to conclude statistical significance in marginal situations
 The Random Effect Model
 Assumes that the studies may have different populations and environments
 Includes variation between studies in the analysis
 Less likely to conclude statistical significance in marginal situations
 Comment: use the Random Effect Model
 No heterogeneity: both models produce very similar results
 Significant heterogeneity: the Random Effect Model should be used
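Both models can be sketched in a few lines of Python. The sketch assumes inverse-variance weighting for the Fixed Effect Model and the DerSimonian-Laird estimate of the between-study variance (τ²) for the Random Effect Model; the program may use a different estimator. The function name `pool` is hypothetical.

Adding τ² to every study's variance widens the confidence interval of the combined effect. This is why the Random Effect Model is less likely to declare marginal results significant. When Q ≤ k - 1, τ² is 0 and the two models coincide.

```python
import math

def pool(effects, ses):
    """Fixed Effect (inverse-variance) and DerSimonian-Laird Random Effect pooling.

    Returns {"fixed": (est, se, lo, hi), "random": (est, se, lo, hi), "tau2": tau2}.
    """
    k = len(effects)
    w = [1 / se**2 for se in ses]
    sum_w = sum(w)
    fe_est = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fe_est) ** 2 for wi, yi in zip(w, effects))
    # method-of-moments between-study variance, truncated at 0
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1 / (se**2 + tau2) for se in ses]  # random-effect weights
    re_est = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)

    def summary(est, weights):
        se = math.sqrt(1 / sum(weights))
        return est, se, est - 1.96 * se, est + 1.96 * se

    return {"fixed": summary(fe_est, w), "random": summary(re_est, w_star), "tau2": tau2}
```

Consider two studies with effects 1 and -1, each with SE 1. The Fixed Effect SE is about 0.71, while the Random Effect SE is 1.0 (τ² = 1).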
Displaying final results
 Forest Plot
 The effect size and 95% confidence interval of each source of data, and the combined summary effect,
are plotted as a Forest Plot.
 The combined summary effect is usually plotted at the bottom and has a different appearance:
a different color, a different shaped dot, or a different line thickness for the confidence interval.
By default, the program shows the combined summary effect as a blue diamond-shaped dot.
 From the example in the program, the analyses, original and after deletion of study 5, are presented as follows
 The diagram on the left shows all 5 studies, with significant heterogeneity
 The Random Effect Model is shown in green, better reflecting the data
 The Fixed Effect Model is shown in blue, artificially precise and less representative
 The diagram on the right is from after the removal of study 5, without significant heterogeneity
 The two models give very similar results
Exercises
Calculations for these exercises require StatPgm_4_MetaAnalysis.php (Calculations for metaanalysis)
             Grp 1: Iron Therapy      Grp 2: Control
Country      n     mean   SD          n     mean   SD
Australia    150   11.6   4.2         145   11.8   4.3
Bangladesh   180   11.5   4.8         150   8.3    5.2
Buranda      195   12.1   3.6         200   9.2    4.5
Canada       309   12.3   3.8         300   12.4   3.5
England      180   12.0   4.0         180   11.3   4.1
Nigeria      130   11.5   3.8         120   9.1    3.8
USA          150   12.3   3.8         160   12.4   3.5
Zimbabwe     145   11.2   4.3         150   8.3    4.8
Q1. We wish to review the effectiveness of iron therapy in increasing the Haemoglobin level in pregnant women.
We found 8 controlled trials, listed in the table above.
Q1a. Calculate the difference between the means, its Standard Error, and the 95% confidence interval for each of these trials,
and establish whether heterogeneity exists
A 1a.
Country      Difference (Grp 1 - Grp 2)   SE
Australia    -0.2                         0.4949
Bangladesh   3.2                          0.5512
Buranda      2.9                          0.4107
Canada       -0.1                         0.2963
England      0.7                          0.4269
Nigeria      2.4                          0.4811
USA          -0.1                         0.4146
Zimbabwe     2.9                          0.5312
Q test p < 0.0001; I² = 92% of the variance is attributable to differences between studies
A severe and significant level of heterogeneity exists
Q1b. Divide the studies into first world and third world countries, and repeat the heterogeneity calculation
for each subcluster
A 1b.
First world countries: Australia, Canada, England, and USA. Q test p = 0.4 (n.s.), I² = 0%
Third world countries: Bangladesh, Buranda, Nigeria, and Zimbabwe. Q test p = 0.73 (n.s.), I² = 0%
No heterogeneity within the separate clusters
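The calculations in answers 1a and 1b can be reproduced with the sketch below. It uses the trial data from the table, the pooled-SD Standard Error (which matches the SE column above), and inverse-variance weights for Q and I². The helper names are hypothetical.

P values are omitted to keep the sketch short. They come from the chi-square distribution with (number of trials - 1) degrees of freedom.

```python
import math

# Exercise 1 data: (country, n1, mean1, sd1, n2, mean2, sd2); group 1 = iron therapy
TRIALS = [
    ("Australia", 150, 11.6, 4.2, 145, 11.8, 4.3),
    ("Bangladesh", 180, 11.5, 4.8, 150, 8.3, 5.2),
    ("Buranda", 195, 12.1, 3.6, 200, 9.2, 4.5),
    ("Canada", 309, 12.3, 3.8, 300, 12.4, 3.5),
    ("England", 180, 12.0, 4.0, 180, 11.3, 4.1),
    ("Nigeria", 130, 11.5, 3.8, 120, 9.1, 3.8),
    ("USA", 150, 12.3, 3.8, 160, 12.4, 3.5),
    ("Zimbabwe", 145, 11.2, 4.3, 150, 8.3, 4.8),
]
FIRST_WORLD = {"Australia", "Canada", "England", "USA"}
THIRD_WORLD = {"Bangladesh", "Buranda", "Nigeria", "Zimbabwe"}

def mean_diff(n1, m1, s1, n2, m2, s2):
    """Difference (group 1 - group 2) and its SE based on the pooled SD."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return m1 - m2, math.sqrt(pooled_var * (1 / n1 + 1 / n2))

def q_and_i2(effects, ses):
    """Cochran's Q and I^2 (as a fraction, truncated at 0) with weights 1/SE^2."""
    w = [1 / se**2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    return q, (max(0.0, (q - df) / q) if q > 0 else 0.0)

def analyse(countries):
    """Print difference, SE and 95% CI per trial, then Q and I^2 for the cluster."""
    rows = [t for t in TRIALS if t[0] in countries]
    diffs, ses = [], []
    for name, *data in rows:
        d, se = mean_diff(*data)
        diffs.append(d)
        ses.append(se)
        print(f"{name:<11} {d:5.1f}  {se:.4f}  ({d - 1.96 * se:.2f}, {d + 1.96 * se:.2f})")
    q, i2 = q_and_i2(diffs, ses)
    print(f"Q = {q:.2f} on {len(rows) - 1} df, I^2 = {100 * i2:.0f}%\n")
    return q, i2

if __name__ == "__main__":
    analyse(FIRST_WORLD | THIRD_WORLD)  # all 8 trials
    analyse(FIRST_WORLD)
    analyse(THIRD_WORLD)
```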
Q1c. Create a Forest Plot of the 95% confidence intervals, using the subclustered data
A 1c.
Q1d. Create graphics to explore the suspicion that the effectiveness of iron therapy is
related to the prevalence of anaemia
A 1d.
The two plots are different ways of using graphics to make the same point: the effectiveness
of iron therapy, as measured by the difference between the two groups, depends on whether
anaemia is prevalent, as measured by the mean of the control group.
 The plot on the left is a scatter plot, with the mean Hb level of the control group on the x
axis, and the difference between the two groups on the y axis. This is an effective way of showing the
correlation between these two parameters.
 The plot on the right is simpler, merely showing the Hb levels in the two groups linked by a line.
It demonstrates that improvement occurred only when the Hb level in the control group was low,
and that when the Hb level in the control group was near normal, no improvement occurred, nor should any be expected.
Q1e. Review all the results, and discuss the possible cause of the heterogeneity
A 1e.
When heterogeneity is present, the analyst should pursue it to gain additional insight.
In this case, iron therapy is more effective in third world countries, where anaemia is prevalent:
the lower the baseline Hb (control group), the more effective iron therapy is. In first world countries, where
iron deficiency anaemia is no longer common, routine iron therapy during pregnancy makes very little difference.


        Treated with Hormone           Placebo
Study   Total  Aborted  Live Birth     Total  Aborted  Live Birth
1       15     1        14             16     6        10
2       50     3        47             60     10       50
3       30     2        28             40     10       30
4       80     10       70             60     10       50
5       150    12       138            180    20       160
6       218    35       183            240    34       206
7       500    78       422            550    83       467
8       420    65       355            450    68       382
Q2. There was a belief that recurrent abortion is caused by hormonal deficiency in early pregnancy,
and that giving women large doses of hormone may prevent another abortion. We wish to conduct a
metaanalysis of controlled trials of such a treatment, and found 8 such reports, the results of which
are listed in the table above. The studies are listed in order of publication.
Q2a. Calculate the risk difference and its 95% confidence interval for each study
A 2a.
R1 = abortion rate with hormone, R2 = abortion rate with placebo, RD = R1 - R2
Study  R1     R2     RD      SE     95% CI lower  95% CI upper
1      0.067  0.375  -0.308  0.137  -0.577        -0.040
2      0.060  0.167  -0.107  0.059  -0.222        0.008
3      0.067  0.250  -0.183  0.082  -0.345        -0.022
4      0.125  0.167  -0.042  0.061  -0.161        0.077
5      0.080  0.111  -0.031  0.032  -0.094        0.032
6      0.161  0.142  0.019   0.034  -0.047        0.085
7      0.156  0.151  0.005   0.022  -0.039        0.049
8      0.155  0.151  0.004   0.024  -0.044        0.052
Q2b. Perform the Rank Correlation Test on the effects and Standard Errors of all the studies, and
determine whether publication bias exists
A 2b.
Rank Correlation Test p = 0.03: significant publication bias exists
Q2c. Combine the first 4 studies, in which the sample size per group was less than 100,
produce a Forest Plot of the results, and draw conclusions from these results
A 2c.
The 4 studies with fewer than 100 subjects per group showed a reduction in abortion
when hormone treatment was given, with a 95% Confidence Interval of the Risk Difference
from -0.31 to -0.03
Q2d. Combine the later 4 studies, in which the sample size per group was more than 100,
produce a Forest Plot of the results, and draw conclusions from these results
A 2d.
The 4 studies with more than 100 subjects per group showed that hormone treatment had no significant benefit,
with a 95% Confidence Interval of the Risk Difference from -0.03 to +0.03
Q2e. Combine all the studies, produce a Forest Plot of the results, and draw conclusions from these results
A 2e.
Collectively, the data showed that hormone treatment had no significant benefit,
with a 95% Confidence Interval of the Risk Difference from -0.07 to +0.01
We know that the rate of spontaneous abortion is 15%. All the figures in this exercise
were generated by computer to lie within the 95% confidence interval of 15%, given the sample size.
However, early trials tend to be smaller, and if only those with significant results are published
and the nonsignificant ones are thrown away, the literature will be biased towards showing that the treatment works.
In this Forest Plot, one can see the correlation between effect size and Standard Error,
shown as the width of the 95% confidence interval. It is this correlation that is the basis of the Rank Correlation Test.
More Exercises
