Risk Controlled Audit : Report on Daljit's Coefficients

Introduction

This report is my response to your email that I received on 8/10/2018, and consists of the following sections:

Section 1 reproduces the data and the plot you sent me, so that you can be reassured that we are on the same page to start with.

Section 2 attempts to answer the questions you raised in the email.

Section 3 presents my preferences and suggestions.

Section 4 shows the data in groups of 116, in case you want to look at them.

Section 5 gives a summary and concluding remarks.

Section 1 : Data you sent

The reason I reproduced what you sent me is that I need to handle the data in order to become familiar with it, and then to understand it. I reproduce it here to reassure you and myself that we are using the same data and talking about the same thing.

The table that follows, Table 1.1, is copied from your email and reproduced here. The only change is that I have used 4-decimal-place precision for all values other than counts.

The two plots, Figs. 1.1 and 1.2, reproduce the plot you sent me. The one on the left, Fig. 1.1, is produced by me from the data in Table 1.1, and the plot on the right, Fig. 1.2, is copied directly from your email. The two plots are the same except for slight differences in color, dot size, and line thickness.

At this point, we can be reassured that we are discussing the same data.

Group | Cum n | Actual n | Expected n (Bayes) | Bayes Lower CI | Bayes Upper CI | Expected n (LogReg) | LogReg Lower CI | LogReg Upper CI
0 | 150 | 35 | 30.0427 | 20.4356 | 39.6498 | 29.2703 | 19.7570 | 38.7836
1 | 300 | 66 | 61.1432 | 47.4679 | 74.8185 | 58.5732 | 45.1165 | 72.0299
2 | 450 | 100 | 91.0563 | 74.3524 | 107.7602 | 88.5474 | 72.0178 | 105.0771
3 | 600 | 127 | 120.1444 | 100.9317 | 139.3571 | 117.9996 | 98.9167 | 137.0825
4 | 750 | 155 | 147.4112 | 126.0807 | 168.7417 | 144.8026 | 123.6160 | 165.9893
5 | 900 | 184 | 178.1732 | 154.7432 | 201.6032 | 175.3873 | 152.0964 | 198.6782
6 | 1050 | 219 | 208.7142 | 183.3682 | 234.0602 | 205.4788 | 180.2817 | 230.6759
7 | 1200 | 247 | 235.0856 | 208.1378 | 262.0334 | 231.3200 | 204.5368 | 258.1032
8 | 1350 | 269 | 257.9118 | 229.6009 | 286.2227 | 253.8190 | 225.6811 | 281.9570
9 | 1500 | 305 | 290.5922 | 260.5910 | 320.5934 | 285.3762 | 255.5814 | 315.1710
10 | 1650 | 334 | 318.5043 | 287.0817 | 349.9269 | 313.3368 | 282.1098 | 344.5638
11 | 1800 | 356 | 346.0408 | 313.2721 | 378.8095 | 340.9109 | 308.3286 | 373.4931
12 | 1950 | 387 | 372.6766 | 338.6463 | 406.7069 | 367.9176 | 334.0544 | 401.7809
13 | 2100 | 406 | 398.8202 | 363.5904 | 434.0500 | 393.4699 | 358.4222 | 428.5176
14 | 2250 | 427 | 427.7556 | 391.2747 | 464.2365 | 421.9667 | 385.6759 | 458.2574
15 | 2400 | 454 | 455.3973 | 417.7476 | 493.0470 | 449.3335 | 411.8770 | 486.7899
16 | 2550 | 481 | 484.8436 | 446.0050 | 523.6822 | 479.1713 | 440.5076 | 517.8350
17 | 2700 | 509 | 512.3191 | 472.3857 | 552.2525 | 506.0471 | 466.3020 | 545.7922
18 | 2850 | 531 | 537.8418 | 496.8997 | 578.7839 | 530.4738 | 489.7484 | 571.1992
19 | 3000 | 572 | 574.0483 | 531.8193 | 616.2773 | 566.3344 | 524.3235 | 608.3453
20 | 3150 | 611 | 607.2679 | 563.8727 | 650.6631 | 598.7302 | 555.5689 | 641.8915
21 | 3300 | 649 | 643.1103 | 598.5109 | 687.7097 | 633.9194 | 589.5633 | 678.2754
22 | 3450 | 675 | 669.6481 | 624.1158 | 715.1804 | 659.3933 | 614.1277 | 704.6589
23 | 3600 | 708 | 697.5983 | 651.1161 | 744.0805 | 686.4501 | 640.2524 | 732.6478
24 | 3750 | 725 | 718.6158 | 671.3759 | 765.8557 | 707.1926 | 660.2414 | 754.1437
25 | 3900 | 748 | 744.2691 | 696.1698 | 792.3684 | 733.3583 | 685.5304 | 781.1862
26 | 4050 | 772 | 769.6905 | 720.7527 | 818.6283 | 758.7847 | 710.1142 | 807.4552
27 | 4200 | 804 | 804.2861 | 754.3054 | 854.2668 | 792.3041 | 742.6096 | 841.9985
28 | 4350 | 816 | 827.4272 | 776.6924 | 878.1620 | 815.6115 | 765.1558 | 866.0672
29 | 4500 | 853 | 862.8551 | 811.0945 | 914.6157 | 850.5715 | 799.0939 | 902.0490
30 | 4650 | 879 | 888.9808 | 836.4240 | 941.5376 | 875.8127 | 823.5554 | 928.0700
31 | 4800 | 904 | 918.3476 | 864.9346 | 971.7606 | 904.2225 | 851.1254 | 957.3195
32 | 4950 | 929 | 944.7783 | 890.5867 | 998.9699 | 930.1867 | 876.3174 | 984.0560
33 | 5100 | 961 | 972.0906 | 917.1125 | 1027.0687 | 956.8774 | 902.2308 | 1011.5239
34 | 5250 | 986 | 1003.7453 | 947.8994 | 1059.5912 | 987.4182 | 931.9220 | 1042.9144
35 | 5400 | 1024 | 1040.4681 | 983.6622 | 1097.2740 | 1023.4508 | 967.0015 | 1079.9001
36 | 5550 | 1054 | 1070.2851 | 1012.6769 | 1127.8933 | 1052.7310 | 995.4853 | 1109.9766
37 | 5700 | 1075 | 1096.9988 | 1038.6620 | 1155.3356 | 1078.3642 | 1020.4081 | 1136.3203
38 | 5850 | 1109 | 1128.8365 | 1069.6779 | 1187.9951 | 1109.6596 | 1050.8866 | 1168.4326
39 | 6000 | 1137 | 1157.6948 | 1097.7842 | 1217.6054 | 1138.4851 | 1078.9559 | 1198.0142

Section 2 : Responses to your suggestions in the email

I think it best that I do what you requested in your email before putting forth my own opinion. Your email requests are as follows:

For our purposes we are proposing that actual rates be compared against a comparator in order to test deviation from expected. Can you calculate a few more things:

1) sum up probs for all pregnancies in each year by model, as this would be the expected number of CS and an estimate as to overall under- or over-estimation

2) Can you break down the data in 2014 to determine the distribution of the difference between actual and observed CS rates. We could then use this to determine how large a difference we aim to detect in CUSUM. The alternative is to keep it simple and plot a chart something like the one below, which shows the cumulative number of actual and expected CS by consecutive block, with an estimate of the 95% CI of the estimate using n*(p ± 1.96*sqrt(p(1-p)/n)), where p is the proportion of expected CS.

The following table, Table 2.1, attempts to answer 1). Please note that 95% confidence intervals are calculated as mean ± 1.96 SE, where SE = sqrt(p(1-p)/n) and p = number of caesarean sections / total births.
 | 2014 | 2015
Number of births in the study | 6038 | 6038
Number of caesarean sections | 1145 | 1220
Caesarean section rate (95% CI) | 0.190 (0.180 to 0.199) | 0.202 (0.192 to 0.212)
Mean naïve Bayes probability (95% CI) | 0.185 (0.175 to 0.195) | 0.189 (0.179 to 0.199)
Mean logistic regression probability (95% CI) | 0.183 (0.173 to 0.193) | 0.177 (0.167 to 0.187)

Table 2.2 translates Table 2.1 into actual and estimated numbers of caesarean sections by multiplying the probabilities by the number of births.
 | 2014 | 2015
Actual number of caesarean sections (95% CI) | 1145 (1085.1 to 1204.5) | 1220 (1158.8 to 1281.2)
Naïve Bayes estimated number of caesarean sections (95% CI) | 1117.0 (1057.9 to 1176.2) | 1141.2 (1081.6 to 1200.8)
Logistic regression estimated number of caesarean sections (95% CI) | 1105.0 (1046.1 to 1163.8) | 1068.7 (1010.6 to 1126.9)
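The arithmetic behind Tables 2.1 and 2.2 can be written out as a short sketch. The helper name is mine; the values are the 2014 column, and small differences from the tabulated CIs are rounding:

```python
import math

def rate_with_ci(k, n, z=1.96):
    """Proportion with a normal-approximation 95% CI: p +/- z*sqrt(p*(1-p)/n)."""
    p = k / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# 2014 figures from Table 2.1: 1145 caesarean sections out of 6038 births.
n = 6038
p, lo, hi = rate_with_ci(1145, n)

# Table 2.2 scales the rate and its CI back up to counts: n*(p +/- 1.96*SE).
expected, lo_n, hi_n = n * p, n * lo, n * hi
```

The same two calls, with 1220 events, reproduce the second column.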

Table 2.3 answers 2), and shows the difference (actual - expected) in each group.

Group | Actual n | Expected n (Bayes) | Actual - Expected
0 | 35 | 30.0 | 5.0
1 | 31 | 31.1 | -0.1
2 | 34 | 29.9 | 4.1
3 | 27 | 29.1 | -2.1
4 | 28 | 27.3 | 0.7
5 | 29 | 30.8 | -1.8
6 | 35 | 30.5 | 4.5
7 | 28 | 26.4 | 1.6
8 | 22 | 22.8 | -0.8
9 | 36 | 32.7 | 3.3
10 | 29 | 27.9 | 1.1
11 | 22 | 27.5 | -5.5
12 | 31 | 26.6 | 4.4
13 | 19 | 26.1 | -7.1
14 | 21 | 28.9 | -7.9
15 | 27 | 27.6 | -0.6
16 | 27 | 29.4 | -2.4
17 | 28 | 27.5 | 0.5
18 | 22 | 25.5 | -3.5
19 | 41 | 36.2 | 4.8
20 | 39 | 33.2 | 5.8
21 | 38 | 35.8 | 2.2
22 | 26 | 26.5 | -0.5
23 | 33 | 28.0 | 5.0
24 | 17 | 21.0 | -4.0
25 | 23 | 25.7 | -2.7
26 | 24 | 25.4 | -1.4
27 | 32 | 34.6 | -2.6
28 | 12 | 23.1 | -11.1
29 | 37 | 35.4 | 1.6
30 | 26 | 26.1 | -0.1
31 | 25 | 29.4 | -4.4
32 | 25 | 26.4 | -1.4
33 | 32 | 27.3 | 4.7
34 | 25 | 31.7 | -6.7
35 | 38 | 36.7 | 1.3
36 | 30 | 29.8 | 0.2
37 | 21 | 26.7 | -5.7
38 | 34 | 31.8 | 2.2
39 | 28 | 28.9 | -0.9

A summary of the differences is:

n = 40, minimum = -22.0, maximum = 15.5
mean = -0.8, Standard Deviation = 11.1, Standard Error of Mean = 1.8
95% CI of group means = -22.6 to 20.9, 95% CI of yearly mean = -4.3 to 2.7
Skewness = -0.4346, Standard error = 0.3873, 95% CI = -1.1937 to 0.3246
Kurtosis = -1.1199, Standard error = 0.7746, 95% CI = -2.6381 to 0.3983
Median = -0.5
Median-Mean = -0.3, Standard error = 1.8, 95% CI = -3.9 to 3.3

The mean difference is -0.8, less than 1 in magnitude, and the 95% confidence interval for the yearly mean is -4.3 to 2.7 out of groups of 150, or -2.9% to 1.8%.

The plot of the differences in each group from 2014 is shown in Fig. 2.1.

I understand your approach to determining what difference we should try to detect. It is based on laboratory protocols: first you determine the normal range, then you define the abnormal as anything outside the normal range. However, auditing is a different situation, and such an approach has problems.

Firstly, these figures are precision figures. They are useful for determining the normal range of a measurement, or, in a research setting, for determining the critical difference and sample size that are likely to yield a statistically significant result. The approach you suggested is, however, not appropriate for determining the difference to detect in an audit, as this is a clinical and not a statistical decision, based on why the audit is done; the difference to detect is one that indicates a need for investigation and remedy.

Secondly, statistically setting a critical difference (effect size) to detect presupposes that this effect size and the inter-relationships between all variables in the model are invariant. The reason for doing an audit is that we suspect things are not invariant, and when changes come we want to know.

Thirdly, choosing a difference in this manner is too specific, as it is based on the binomial distribution with a caesarean section rate of around 19% and a group size of 150. As you know, the variance (standard error, and hence 95% confidence interval) of the binomial distribution is highly sensitive to the proportion being studied and to the sample size. This means that the results we obtain would be difficult to generalize.

Fourthly, in an audit we are not trying to detect a group that is a statistical outlier, but to detect a changing trend. This means that a single, or even a small cluster of, significant departures is not meaningful. To detect a changing trend means to detect very small but persistent changes as quickly and surely as possible. The difference we are trying to detect is not that one group or a small cluster of groups is significantly different from the expected, but that there is a persistent change away from the expected, even a very small one.

Fifthly, your thinking is based on hypothesis testing with Type I and II errors. These probabilities are based on a single set of observations, and interpreting repeated sets requires either a Bonferroni correction or a more complex meta-analysis type of analysis. An audit is really a time series study, representing frequent and repeated sets of observations where the sets are likely to be auto-correlated. Our data are actually 40 such sets of 150 cases each. The probabilities involved are therefore much more complex than those in a single set of data.
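To make the contrast concrete, a tabular CUSUM accumulates small persistent deviations instead of testing each group in isolation. A minimal sketch, where the reference value k and decision limit h are illustrative choices of mine, not the values used in Report 1b:

```python
def tabular_cusum(xs, target, k, h):
    """One-sided upper CUSUM: accumulate (x - target - k) clipped at zero,
    and signal whenever the running sum exceeds the decision limit h."""
    s, signals = 0.0, []
    for i, x in enumerate(xs):
        s = max(0.0, s + (x - target) - k)
        if s > h:
            signals.append(i)
    return signals

# A persistent +1% shift in a ~19% rate that no single group would flag:
rates = [0.19] * 10 + [0.20] * 30      # small but sustained change
alarms = tabular_cusum(rates, target=0.19, k=0.002, h=0.05)
```

No individual group here is a statistical outlier, yet the cumulative sum crosses the limit a few groups after the shift begins.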

Section 3 : My preferences and suggestions

I am learning a lot and enjoying myself enormously looking at the data one way and another, and debating with you on the pros and cons and how to proceed. I feel that, at my stage of life, it is a real privilege to be able to work with someone like you, and to have such detailed and relaxed discussions. I therefore do not feel strongly about any aspect of this project. If you feel strongly about anything, I will go along even if I disagree with you (after a bit of argument first). This of course applies to our current discussions.

However, I am assuming that you are equally relaxed and tolerant of my opinions, and in response to your email I would like to suggest the following for you to consider.

Using groups of 150 is OK, as it is large enough. Can I recommend, however, that we use groups of 116 instead, as this has two advantages. Firstly, this gives us 52 instead of 40 groups a year, providing greater sensitivity in an audit, while the group standard deviations would not be all that different from those at 150 at this level. Secondly, it is intuitively easier to understand, as this represents auditing at a weekly interval, when all the variations in the workload are evened out, and the audit fits in with the routines of clinical supervision.
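The claim about the group standard deviations can be checked directly. Assuming a rate near the 19% benchmark, the standard error of a group proportion only grows from about 0.032 to 0.036 when the group size drops from 150 to 116:

```python
import math

def group_se(p, n):
    """Standard error of a group proportion, sqrt(p*(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)

se_150 = group_se(0.19, 150)   # ~0.0320
se_116 = group_se(0.19, 116)   # ~0.0364

# 52 groups of 116 cover 6032 births, close to the 40 x 150 = 6000
# births of Table 1.1, and map naturally onto weekly auditing.
births_per_year = 52 * 116
```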

I recommend that we avoid the binomial distribution because of its sensitivity to the sample size used and the proportion being considered. We should create indices, such as departure from benchmark (group caesarean section rate - 0.19), changing risks (group mean probability estimate - 0.19), and caesarean section rate in excess of risks (group caesarean section rate - group mean probability estimate). We should then test these indices against the normal distribution, as I have done in Report 1b, and once we have demonstrated that we can assume them to be normally distributed, use them as such. This would give us flexibility in using the data and a greater range of statistical procedures. The normal distribution is also more familiar to most clinicians, so our paper would be easier to understand and would need less lengthy explanation.
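As a sketch of what I mean, the three indices for the first group of Table 4.1 (rate 0.2328, mean Bayes probability 0.1846; the function name is mine):

```python
BENCHMARK = 0.19  # the benchmark caesarean section rate

def indices(cs_rate, mean_prob):
    """The three proposed group-level indices."""
    departure = cs_rate - BENCHMARK        # departure from benchmark
    changing_risk = mean_prob - BENCHMARK  # shift in case-mix risk
    excess = cs_rate - mean_prob           # CS rate in excess of risks
    return departure, changing_risk, excess

# First group of Table 4.1: rate 0.2328, mean Bayes probability 0.1846
d, c, e = indices(0.2328, 0.1846)
```

The excess-over-risks value for this group, 0.0482, is the quantity tabulated in the difference column of Table 4.1.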

I would like to suggest that we abandon using statistical criteria to determine the change we wish to detect. This approach is cumbersome, difficult for the uninitiated to understand, and requires a lot of explanation. Can I suggest that we approach it clinically, and decide that we wish to detect a change of 1%, within a reasonably short time (a few months) if it happens. Caesarean sections are continuing to increase, by about 1% a year. Obstetricians are concerned with the percentage rather than the actual numbers, as the percentage corrects for different sample sizes and is easier to compare. A change of 1% within a year is therefore meaningful to most clinicians and quality controllers. However, a difference of 1% is so small, and variations in caesarean section rates are so great, that normal inferential statistics would have difficulty detecting it, and quality controllers will need some auditing approach such as CUSUM to detect it, which is one of the points we try to make in our paper.

I still think we should use CUSUM for continuous monitoring, as this is the standard in quality control, and the results I produced in Report 1b also looked good. However, if you do not like it we can explore an alternative I think you are more familiar with: exponentially weighted moving averaging (EWMA). This is easier to do, has fewer theoretical statistical presumptions, and, because it is used extensively in the stock market, is more familiar to clinicians and intuitively easier for them to understand. I recall that you used EWMA to even out the electronic signals in fetal ECG, and tried in vain to make me understand it. I have now understood it and have provided programs and explanations on the department's web site in case you want to look at them. If you want to explore this (even if you have not decided to use it), please let me know and I will put our data through it. The URLs are http://www.obg.cuhk.edu.hk/ResearchSupport/StatTools/EWMA_Pgm.php for the program and http://www.obg.cuhk.edu.hk/ResearchSupport/StatTools/EWMA_Exp.php for my explanations.
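For reference, the EWMA recursion is simply s_t = lambda*x_t + (1 - lambda)*s_(t-1). A minimal sketch, in which the smoothing constant and the example rates are illustrative choices of mine:

```python
def ewma(xs, lam=0.2, start=None):
    """Exponentially weighted moving average:
    s_t = lam*x_t + (1-lam)*s_(t-1), seeded with the first value
    (or an explicit start such as the benchmark rate)."""
    s = xs[0] if start is None else start
    out = []
    for x in xs:
        s = lam * x + (1 - lam) * s
        out.append(s)
    return out

# Smoothing a few weekly caesarean section rates from the 19% benchmark:
smoothed = ewma([0.19, 0.21, 0.18, 0.20, 0.22], start=0.19)
```

A larger lam tracks recent groups more closely; a smaller lam smooths more heavily and, like CUSUM, is better at revealing small persistent drifts.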

Section 4 : Groups of 116

Table 4.1 contains 52 groups of 116 consecutive births from 2014, with group caesarean section rates and group mean Bayes probabilities. If you decide to accept my recommendation and use it, I can provide group means for logistic regression as well, and for both 2014 and 2015.

Order | Group CS Rate | Group Mean Bayes Probability | CS n | Estimated Bayes n | Difference (CS P - Bayes P) | Difference (CS n - Bayes n)
0 | 0.2328 | 0.1846 | 27 | 21.4 | 0.0482 | 5.6
1 | 0.2155 | 0.2189 | 25 | 25.4 | -0.0034 | -0.4
2 | 0.25 | 0.2376 | 29 | 27.6 | 0.0124 | 1.4
3 | 0.1897 | 0.1662 | 22 | 19.3 | 0.0235 | 2.7
4 | 0.1897 | 0.2105 | 22 | 24.4 | -0.0208 | -2.4
5 | 0.1638 | 0.151 | 19 | 17.5 | 0.0128 | 1.5
6 | 0.1724 | 0.2072 | 20 | 24.0 | -0.0348 | -4.0
7 | 0.2414 | 0.2143 | 28 | 24.9 | 0.0271 | 3.1
8 | 0.2328 | 0.2136 | 27 | 24.8 | 0.0192 | 2.2
9 | 0.1552 | 0.1525 | 18 | 17.7 | 0.0027 | 0.3
10 | 0.1466 | 0.1532 | 17 | 17.8 | -0.0066 | -0.8
11 | 0.2069 | 0.1864 | 24 | 21.6 | 0.0205 | 2.4
12 | 0.2759 | 0.24 | 32 | 27.8 | 0.0359 | 4.2
13 | 0.1638 | 0.1735 | 19 | 20.1 | -0.0097 | -1.1
14 | 0.1466 | 0.1696 | 17 | 19.7 | -0.023 | -2.7
15 | 0.1724 | 0.1615 | 20 | 18.7 | 0.0109 | 1.3
16 | 0.1897 | 0.1768 | 22 | 20.5 | 0.0129 | 1.5
17 | 0.1466 | 0.202 | 17 | 23.4 | -0.0554 | -6.4
18 | 0.1207 | 0.1892 | 14 | 21.9 | -0.0685 | -7.9
19 | 0.1983 | 0.1891 | 23 | 21.9 | 0.0092 | 1.1
20 | 0.1379 | 0.1826 | 16 | 21.2 | -0.0447 | -5.2
21 | 0.2328 | 0.2085 | 27 | 24.2 | 0.0243 | 2.8
22 | 0.1552 | 0.1757 | 18 | 20.4 | -0.0205 | -2.4
23 | 0.1638 | 0.1766 | 19 | 20.5 | -0.0128 | -1.5
24 | 0.181 | 0.2111 | 21 | 24.5 | -0.0301 | -3.5
25 | 0.2931 | 0.2159 | 34 | 25.0 | 0.0772 | 9.0
26 | 0.25 | 0.2384 | 29 | 27.7 | 0.0116 | 1.3
27 | 0.25 | 0.2205 | 29 | 25.6 | 0.0295 | 3.4
28 | 0.1897 | 0.1971 | 22 | 22.9 | -0.0074 | -0.9
29 | 0.25 | 0.23 | 29 | 26.7 | 0.02 | 2.3
30 | 0.1724 | 0.1401 | 20 | 16.3 | 0.0323 | 3.7
31 | 0.1121 | 0.1496 | 13 | 17.4 | -0.0375 | -4.4
32 | 0.1897 | 0.1637 | 22 | 19.0 | 0.026 | 3.0
33 | 0.1379 | 0.1782 | 16 | 20.7 | -0.0403 | -4.7
34 | 0.1552 | 0.1828 | 18 | 21.2 | -0.0276 | -3.2
35 | 0.2155 | 0.2144 | 25 | 24.9 | 0.0011 | 0.1
36 | 0.0862 | 0.1591 | 10 | 18.5 | -0.0729 | -8.5
37 | 0.1638 | 0.2093 | 19 | 24.3 | -0.0455 | -5.3
38 | 0.2414 | 0.2414 | 28 | 28.0 | 0 | 0.0
39 | 0.2069 | 0.1775 | 24 | 20.6 | 0.0294 | 3.4
40 | 0.1552 | 0.1767 | 18 | 20.5 | -0.0215 | -2.5
41 | 0.1552 | 0.1827 | 18 | 21.2 | -0.0275 | -3.2
42 | 0.1983 | 0.1872 | 23 | 21.7 | 0.0111 | 1.3
43 | 0.1897 | 0.1763 | 22 | 20.5 | 0.0134 | 1.6
44 | 0.1724 | 0.1933 | 20 | 22.4 | -0.0209 | -2.4
45 | 0.2328 | 0.268 | 27 | 31.1 | -0.0352 | -4.1
46 | 0.2155 | 0.2072 | 25 | 24.0 | 0.0083 | 1.0
47 | 0.1983 | 0.1915 | 23 | 22.2 | 0.0068 | 0.8
48 | 0.1034 | 0.158 | 12 | 18.3 | -0.0546 | -6.3
49 | 0.2845 | 0.2417 | 33 | 28.0 | 0.0428 | 5.0
50 | 0.1897 | 0.218 | 22 | 25.3 | -0.0283 | -3.3
51 | 0.181 | 0.1634 | 21 | 19.0 | 0.0176 | 2.0

Fig. 4.1 is a plot of the difference between group caesarean section rate and group mean Bayes probability for 2014, using groups of 150 births. Fig. 4.2 is the same plot, but using groups of 116 births. The two have the same scale for comparison.

Fig. 4.3 has the two plots superimposed, red for groups of 150 and blue for groups of 116.

It can be seen here that, using rates rather than counts, comparisons are more flexible, as the sample size is already corrected for.

The greater number of smaller groups, in blocks of 116, does show greater variation, but the patterns from the two group sizes are very similar.


Section 5 : Summary and concluding remarks

I can now understand what you are trying to do. I am happy to go along with your decisions, but have some alternative suggestions for you to consider. I hope that I have successfully produced the answers to your questions, and provided some feedback for you to consider. I am happy to further debate or discuss any issues arising, and more than happy to do more analysis or graphics work if you wish me to do so.