Introduction
This is a short introduction to the StatPgm set of programs. It will:
- Explain why the set of programs is offered
- Show where all the programs are
- Describe how data are entered and what pitfalls to avoid
- Show where the results are
- Demonstrate how to use the example buttons
This introduction will not provide detailed explanations of how the programs are used. These will be given when individual statistical procedures are introduced and explained.
Compare StatPgm with Commercial Statistical Packages
Commercially Available Statistical Packages
Commonly used packages are
- SPSS : most commonly used by social and clinical researchers
- SYSTAT : most commonly used by laboratory researchers
- STATA : favoured by many statistical technicians based in clinical institutions
- SAS : favoured by professional and academic statisticians
- R : favoured by those who like to write their own statistical programs
- Many others, some specialised, are listed in Wikipedia
Advantages
- Assurance : professionally developed and validated, so errors are unlikely
- Flexible : many options available for input and output
- Programmable : allows data to be edited and manipulated during calculation
- Complete : most associated parameters are calculated and presented
- Recognised : immediately accepted by colleagues, editors and regulators, without need of explanation or defence
Disadvantages
- Expensive : unless you have access to the institutional package
- Steep learning curve : most require a period of training to use
- Require an understanding of data structure and the ability to program : to take advantage of the flexibility
- Confusing : too many options and too much output
StatPgm
Advantages
- Free : provided as part of course material
- Accessible : whenever connected to the Internet
- Specific : closely synchronized to course material of the module
- Simple : minimum input and output, standardized format, intuitive interactions
- Low and shallow learning curve : requires virtually no teaching to use
Disadvantages
- Non-professional : constructed by a clinical researcher, not a professional statistician or programmer. Not professionally validated. Errors cannot be ruled out
- No flexibility : I/O and some parameters are set by default, and cannot be changed
- Inadequate : designed for student learning. Output often insufficient for professional statisticians
- Not recognised : StatPgm cannot be quoted as a reference in publications
Help and Advice
Data Entry
Unlike commercial statistical packages, StatPgm does not check for errors in data entry.
If data are entered erroneously, the program may crash or produce unreliable results.
- Numerical data must contain no non-numerical characters
- Numbers representing measurements can be with or without decimal values
- Numbers representing counts in tables must be positive whole numbers
- Group names can be either numerical or text
- In graphic programs, labels are text
- Each item of text data must be a single word, with no gaps, as StatPgm uses gaps as separators of columns
- The underscore _ is used to represent any gap, e.g. How_are_you
- When text data are ranked, they are ranked in alphabetical order
- For example, No_pain, Severe_pain and Absolutely_unbearable_pain are ranked as Absolutely_unbearable_pain=0, No_pain=1, Severe_pain=2
- For example, 0_No_pain, 1_Severe_pain and 2_Absolutely_unbearable_pain are ranked as 0_No_pain=0, 1_Severe_pain=1, 2_Absolutely_unbearable_pain=2
- A large amount of data (a block) is entered as a table with multiple rows and columns
- The columns are separated by white space (spaces or tabs)
- There must be no blank data. Blanks are interpreted as separators, and the columns will be distorted
Example Buttons
Every statistical procedure is provided with an "Example" button.
- The program places a set of example data in the data entry area, showing roughly what the data should look like and the format they should be in
- The program then performs the calculations using the example data and outputs the results
- The student can replace the example data with her own data
- Click the Calculate from data button (the button that is not Example) to run the program
The Example button is there to show the student how to use the procedure.
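The data entry rules described above can be illustrated with a short Python sketch. It is not StatPgm's own code: the function names `parse_block` and `rank_labels` are hypothetical, and the sketch assumes that "alphabetical order" behaves like Python's default string sorting, which may differ from StatPgm in details such as the handling of upper and lower case. Unlike StatPgm, the sketch checks the column count, to show how a blank cell distorts a table.

```python
def parse_block(text):
    """Split a pasted block into rows of cells, using white space as the separator."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        raise ValueError("no data entered")
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        # A blank cell is swallowed by the separator, so the row ends up short.
        if len(row) != width:
            raise ValueError(
                f"row {number} has {len(row)} columns, expected {width}"
            )
    return rows


def rank_labels(labels):
    """Give each distinct text label a rank 0, 1, 2, ... in sorted order."""
    return {label: rank for rank, label in enumerate(sorted(set(labels)))}


# Text items use underscores instead of gaps, so each stays in one column.
block = """No_pain 3.2
Severe_pain 5.1
Absolutely_unbearable_pain 7.8"""

rows = parse_block(block)
ranks = rank_labels(row[0] for row in rows)
# ranks == {"Absolutely_unbearable_pain": 0, "No_pain": 1, "Severe_pain": 2}

# Prefixing each label with a number forces the intended order.
forced = rank_labels(["1_Severe_pain", "0_No_pain", "2_Absolutely_unbearable_pain"])
```

Passing a block with a missing value, such as a row that contains only a group name, makes `parse_block` raise a `ValueError`. StatPgm itself performs no such check, which is why blank data must be avoided.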
Available StatPgm Programs
StatPgm_1 : z and t test
Calculations based on the z and t distribution
- Probability of z : calculates the probability value of the Standard Deviate z value
- z from probability : calculates the Standard Deviate z value from a given probability value
- Probability of t : calculates the probability value from t and degrees of freedom values
- t from probability : calculates the t value from probability and degrees of freedom values
- Percentile to value : calculates the data value according to a percentile value
- Value to percentile : calculates the percentile value according to a data value
StatPgm_2a : One Group Survey
- Sample size estimation during planning
- Precision analysis of data collected
- Sample size estimation during planning
- Precision analysis of data collected
StatPgm_2b : Correlation and Regression
Analysis of relationship between 2 measurements from a set of single group data
- Pearson's Correlation Coefficient : calculates Pearson's Correlation Coefficient and its 95% confidence interval, from a set of Paired Parametric measurements
- Spearman's Correlation Coefficient : calculates Spearman's Correlation Coefficient and its statistical significance, from a set of Paired Nonparametric measurements
StatPgm_3a : Compare Two Measurements
- Difference between 2 means, its Standard Error, and 95% confidence interval of the difference, assuming the data are parametric
- The Robust Signed Ranked Test and its statistical significance, assuming the data are nonparametric
StatPgm_3b : Compare Two Proportions
- Fisher's Exact probability
- Chi Square for 2x2 table
- Risk Difference, its 95% confidence interval, and Number Needed to Treat. Usually used for controlled trials
- Risk Ratio (Relative Risk) and its 95% confidence interval. Usually used for epidemiological studies
- Odds Ratio and its 95% confidence interval. Usually used for retrospective studies, or where cause and effect variables are difficult to identify
StatPgm_3c : Compare Two Regressions
- Compare Regression lines from two groups
- Compute combined regression line from two groups
- Calculate adjusted means
- Compute difference between adjusted means
- Shifting data points on covariance plot
StatPgm_4 : Meta-analysis
- The Fixed Effect Model
- The Random Effect Model
StatPgm_5a : Prediction Binary Tests
- Numbers with True Positive, False Negative, False Positive, and True Negative
- True and False Positive Diagnosis Rate
- Likelihood Ratio for Test Positive and Test Negative
StatPgm_5b : Receiver Operator Characteristics (ROC)
- Data preparation for ROC
- Choice of program : low or high values related to outcomes
- Receiver Operator Characteristics (ROC)
- ROC Plot
StatPgm_6 : Graphics
Graphics are covered in their own workshop
StatPgm_7 : Utilities
- Calculate n, mean, SD from a column of raw data
- Pivot table : create an array of counts from a single column of values
- Pivot table : create a 2-dimensional matrix of counts from a double column of values
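The pivot table utility in StatPgm_7 is described as creating an array of counts from a single column of values, and a 2-dimensional matrix of counts from a double column of values. The Python sketch below shows one way such counts can be produced. It is an illustration only, not StatPgm's own code: the function names `count_single` and `count_double`, the sample data, and the choice to sort the labels are all assumptions.

```python
from collections import Counter


def count_single(values):
    """Count how often each value occurs in one column, as sorted (value, count) pairs."""
    return sorted(Counter(values).items())


def count_double(pairs):
    """Cross-tabulate two columns, given as (row_value, column_value) pairs."""
    pairs = list(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    counts = Counter(pairs)
    # Combinations that never occur get a count of 0.
    matrix = [[counts[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, matrix


# Hypothetical single column of group names.
single = count_single(["A", "B", "A", "C", "A"])
# single == [("A", 3), ("B", 1), ("C", 1)]

# Hypothetical double column: group, then outcome.
data = [
    ("Treated", "Yes"),
    ("Treated", "No"),
    ("Control", "Yes"),
    ("Treated", "Yes"),
]
row_labels, col_labels, matrix = count_double(data)
# row_labels == ["Control", "Treated"], col_labels == ["No", "Yes"]
# matrix == [[0, 1], [1, 2]]
```

A 2x2 matrix of counts in this form is the kind of table that the programs comparing two proportions take as input.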