Sunday, 16 June, 2024

SA researchers find limitations in Discovery’s mortality rates study model

The design of Discovery Health's 2019 risk adjustment model to determine standardised mortality rates across South African private hospital systems, with the aim of contributing towards quality improvement in the private healthcare sector, could have been improved, say critics.

The model used by Discovery, write RN Rodseth, D Smith, C Maslo, A Laubscher and L Thabane in the South African Medical Journal, suffers from limitations in design and its reliance on administrative data is undermined by shortcomings in reporting. When designing a risk prediction model, patient-proximate variables with a sound theoretical or proven association with the outcome of interest should be used.

The addition of key condition-specific clinical data points at the time of hospital admission will dramatically improve model performance.

Performance could be further improved by using summary risk prediction scores like the EUROSCORE II for coronary artery bypass graft surgery or the GRACE risk score for acute coronary syndrome.

The authors write that model reporting should conform to published reporting standards, and that attempts should be made to test model validity using sensitivity analyses. Additionally, the limitations of machine learning prediction models should be understood, and these models should be appropriately developed, evaluated and reported.

This critical analysis seeks to contribute towards improving the methodology, reporting and transparency of such risk adjustment models, and to widen discussion on the strengths and limitations of risk adjustment models based on service claims data.

Discovery Health’s article, published in the South African Medical Journal, describes its use of service claims data to determine standardised mortality rates across hospital systems for specific clinical conditions: acute myocardial infarction, coronary artery bypass graft (CABG) surgery, pneumonia and acute stroke. It aimed to transparently examine variations in care across hospitals to “support improvement ... in the reduction of preventable deaths associated with acute inpatient care”.

This publication was the first of its kind in SA and an important step towards driving quality improvement in the private healthcare sector. But as more private sector medical funders explore the use of such models, it is important that the quality of the models be improved.

Background

Risk stratification and prediction are integral to clinical medicine. Risk stratification and benchmarking can help evaluate the health outcomes of patients, clinicians, hospitals, systems or even countries, and so become powerful tools for improving healthcare quality. For clinicians, risk stratification is useful in directing further patient investigation and treatment, and in providing a framework against which clinical outcomes can be measured. Was the patient’s death expected? Is my rate of heart failure readmissions comparable to that of my peers?

It also allows patients to make informed decisions about possible treatment options. Is the 0.5% chance of dying during the placement of my endovascular stent outweighed by the 5% chance of having my aortic aneurysm rupture during the next year?

Population-derived scores should generally not be used to assign individual risk but to stratify patients into risk categories.

A prediction model using a “history of coronary heart disease” as a risk factor to predict death from an acute myocardial infarction (AMI) will always be inferior to a model using “current admission to hospital for AMI” as a risk factor.

However, risk factors capturing the degree of end-organ damage from the current AMI, like N-terminal B-type natriuretic peptide or troponin elevations, or the use of inotropes during admission, are more powerful and accurate predictors than admission to hospital alone.

Similarly, in a patient with cardiac failure, an echocardiogram at the time of hospital admission has greater predictive value than one done a month before admission.

In general, the closer a risk factor is to the patient, the better its predictive ability, often dramatically so. To build an accurate risk model, it’s important to explore as many patient-proximate risk factors as possible. Candidate risk predictors should also be determined by their clinical relevance. There should be a logical rationale, based on prior evidence or on theory, for choosing potential predictive factors on which to base risk models. These factors should then be statistically tested, with statistically non-significant variables being discarded. Risk models should also avoid redundancy and aim for parsimony in the variables used.

These principles highlight the problems inherent to using administrative data for the prediction or risk stratification of patient outcomes. Administrative data are not as patient proximate as clinical data, and generally can’t capture the severity of a patient’s clinical condition accurately. Risk models using only administrative data underperform risk models based on clinical data alone, as well as risk models combining administrative and clinical data.

Shortcomings in reporting

Discovery Health’s aim of facilitating transparency is undermined by shortcomings in its reporting. The article does not provide the rationale behind the choice of variables included in the risk models, nor does it report the number of diagnosis-related groups (DRGs) used or the range of severity represented by the Truven disease staging groupers. It does not identify the software used to conduct the analyses, the type of gradient boosting machine (GBM) model used, the data assumptions made in modelling, the choice of model hyperparameters, or the approach to hyperparameter tuning. The article also fails to report any performance metrics for any of the models.

Various options can be used to report prediction model performance: F1 statistics for precision-recall graphs, C statistics, positive predictive value, negative predictive value, accuracy, area under the receiver operating characteristic curve, or variable importance plots. The need for such reporting is highlighted by the pneumonia precision-recall graph, which on visual inspection suggests that the pneumonia model performs worse than the other models.

Without the performance metrics, these models cannot be compared meaningfully. Results should be reported with 95% confidence intervals so the reader can understand their precision.
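As a rough illustration of what such reporting could involve, the sketch below computes several of the metrics listed above from predicted risks and observed outcomes, and attaches a Wilson 95% confidence interval to one of them. The data, the 0.5 classification threshold and all function names are hypothetical; none of it is drawn from the Discovery Health models. For a binary outcome such as in-hospital death, the C statistic computed here is the same quantity as the area under the receiver operating characteristic curve.

```python
import math


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a probability threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def c_statistic(y_true, y_prob):
    """Share of (death, survivor) pairs in which the death had the higher
    predicted risk; ties count as half."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p_pos in pos:
        for p_neg in neg:
            if p_pos > p_neg:
                concordant += 1.0
            elif p_pos == p_neg:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


# Hypothetical outcomes (1 = died) and predicted risks for eight admissions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.2]

tp, fp, tn, fn = confusion_counts(y_true, y_prob)
ppv = tp / (tp + fp)               # positive predictive value (precision)
npv = tn / (tn + fn)               # negative predictive value
accuracy = (tp + tn) / len(y_true)
f1 = 2 * tp / (2 * tp + fp + fn)   # harmonic mean of precision and recall
ppv_ci = wilson_interval(tp, tp + fp)
c_stat = c_statistic(y_true, y_prob)

print(f"PPV {ppv:.2f} (95% CI {ppv_ci[0]:.2f} to {ppv_ci[1]:.2f})")
print(f"NPV {npv:.2f}, accuracy {accuracy:.2f}, F1 {f1:.2f}, C {c_stat:.2f}")
```

With so few patients the interval around the positive predictive value is very wide, which is exactly the kind of imprecision a reader cannot see when only a point estimate, or no estimate at all, is reported.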

Furthermore, no formal test results are presented of the comparison between the derivation and validation models.

The article does not report any statistics to quantify the performance of the GBM model against the generalised linear model (GLM). No calibration of predicted against actual outcomes is reported.
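A simple way to summarise calibration is to compare the deaths a model expects with the deaths actually observed. The sketch below, using made-up data and hypothetical function names, groups patients by predicted risk and tabulates observed against expected deaths per group, together with the overall observed-to-expected ratio. It illustrates the general idea only and is not the method used in either article.

```python
def calibration_table(y_true, y_prob, n_groups=4):
    """Sort patients by predicted risk, split them into roughly equal groups,
    and compare expected deaths (sum of predicted risks) with observed deaths."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # fewer patients than groups
        rows.append({
            "n": len(chunk),
            "expected": sum(p for p, _ in chunk),
            "observed": sum(y for _, y in chunk),
        })
    return rows


def observed_expected_ratio(y_true, y_prob):
    """Overall observed deaths divided by expected deaths.
    A value near 1 suggests good calibration on average."""
    expected = sum(y_prob)
    if expected == 0:
        raise ValueError("expected number of deaths is zero")
    return sum(y_true) / expected


# Hypothetical predicted risks and outcomes (1 = died) for eight admissions.
y_prob = [0.1, 0.1, 0.2, 0.2, 0.4, 0.4, 0.8, 0.8]
y_true = [0, 0, 0, 1, 0, 1, 1, 1]

table = calibration_table(y_true, y_prob)
for row in table:
    print(f"n={row['n']} expected={row['expected']:.1f} observed={row['observed']}")

oe_ratio = observed_expected_ratio(y_true, y_prob)
print(f"overall O/E ratio: {oe_ratio:.2f}")
```

In this toy data the model expects about three deaths while four occur, so it under-predicts mortality overall, something a discrimination measure such as the C statistic would not reveal on its own.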

Towards building better risk prediction models

Risk adjustment models should avoid redundancy, and include candidate risk factors with a proven association with the outcome or with a strong basis in theory. When using administrative data as a source for these factors, the first step is to include patient age and sex.

After this, factors such as the number of chronic conditions with which the patient has been diagnosed, and the severity of these conditions, are added. In administrative models, ICD-10 codes are used to identify comorbidities and the primary and secondary admission diagnoses. As there are more than 68 000 ICD-10 codes, it is impractical to use them directly in risk models. ICD-10 codes are further limited because they do not assign a clinical risk weighting to a diagnosis.

A diagnosis of metastatic cancer is associated with a much higher chance of death than a diagnosis of an ingrown toenail, but ICD-10 codes do not capture this difference in risk. To adjust for these shortcomings, risk assignment tools have been developed.

These tools group key patient comorbidities into clusters, weighting diagnoses associated with a higher risk of death more heavily. The Charlson Comorbidity Index assigns risk points for 17-19 comorbidities to determine a patient’s estimated 10-year chance of survival. For example, the diagnosis of any severe liver disease will contribute three points to the Charlson risk score, while uncomplicated diabetes mellitus will contribute one point. Similarly, the Elixhauser Comorbidity Index uses 30 comorbidities to predict one-year mortality.
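The points-based approach of the Charlson index can be sketched in a few lines. The condition labels below are hypothetical stand-ins for the ICD-10 code groupings a real implementation would need, and the table is deliberately incomplete. The weights for severe liver disease and uncomplicated diabetes are those given in the text; the other two entries are included for illustration and should be checked against the published index before any real use.

```python
# Partial, illustrative weight table (hypothetical labels, not ICD-10 codes).
CHARLSON_WEIGHTS = {
    "diabetes_uncomplicated": 1,   # weight stated in the text
    "severe_liver_disease": 3,     # weight stated in the text
    "congestive_heart_failure": 1, # illustrative entry, verify before use
    "metastatic_solid_tumour": 6,  # illustrative entry, verify before use
}


def charlson_score(conditions, weights=CHARLSON_WEIGHTS):
    """Sum the weights of a patient's distinct comorbidities.
    Conditions missing from the weight table contribute zero points."""
    return sum(weights.get(condition, 0) for condition in set(conditions))


patient = ["severe_liver_disease", "diabetes_uncomplicated", "ingrown_toenail"]
print(charlson_score(patient))
```

Each condition is counted once however many times it was coded, and a diagnosis that is absent from the table, such as an ingrown toenail, adds nothing to the score.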

The Elixhauser Comorbidity Index, used as a simple or weighted score, is generally shown to outperform the Charlson Comorbidity Index. Models should seek to include patient-proximate variables with a sound theoretical or proven association with the outcome of interest, including the reason for, and severity of, the acute admission. Adding key condition-specific clinical data points at the time of admission adds significant value to any administrative risk model.

Ideally, these would be summary risk prediction scores like the EUROSCORE II for CABG surgery or the GRACE risk score for acute coronary syndrome (ACS). Alternatively, standardised baseline clinical data (e.g. heart rate, systolic blood pressure, electrocardiograph characteristics) or general scores like the sequential organ failure assessment score could be used. Engaging with physician societies to identify key variables or risk scores to be included in minimal clinical data sets would be valuable and contribute towards meaningful and transparent outcome reporting.

Structural differences (resource availability, geographical location) between hospitals are an important driver of patient mortality, and should be reflected either in model development or in result reporting.

Reporting risk prediction models

Reporting of clinical risk models should conform to published reporting standards. For machine learning-based models, the following are proposed: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD), Minimum Information for Medical AI Reporting (MINIMAR), and Recommendations for Reporting Machine Learning Analyses in Clinical Research.

The limitations of clinical risk adjustment models developed using administrative data should be clearly understood, as should the limitations of using machine learning prediction models in clinical medicine.

Testing risk prediction models

A sensitivity analysis tests the robustness of a model by conducting analyses under a plausible but different set of assumptions about the primary modelling process. In a GBM modelling process, as used in the Discovery Health model, rerunning an analysis that excludes one of the chronic risk variable sets will indicate the predictive value of that variable set, while also testing model robustness. Similarly, the impact of adding a variable representing hospital complexity or geographical distribution could be tested in the model.
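One way to organise such a leave-one-set-out sensitivity analysis is sketched below. The `evaluate` callable stands in for whatever refits the model on the chosen variable sets and returns a performance metric such as the C statistic. That callable, the variable set names and the toy scores are all hypothetical, and a real analysis would refit the actual model each time.

```python
def leave_one_set_out(variable_sets, evaluate):
    """Refit with each variable set removed in turn and record how far the
    performance metric falls relative to the full model."""
    baseline = evaluate(list(variable_sets))
    results = {}
    for name in variable_sets:
        reduced = [s for s in variable_sets if s != name]
        score = evaluate(reduced)
        results[name] = {"score": score, "drop": baseline - score}
    return baseline, results


# Toy stand-in for model refitting: each variable set adds a fixed amount
# to a chance-level score of 0.5. These numbers are invented.
TOY_CONTRIBUTION = {
    "age_and_sex": 0.10,
    "chronic_conditions": 0.15,
    "admission_diagnosis": 0.05,
}


def toy_evaluate(variable_sets):
    return 0.5 + sum(TOY_CONTRIBUTION[s] for s in variable_sets)


baseline, results = leave_one_set_out(list(TOY_CONTRIBUTION), toy_evaluate)
print(f"full model: {baseline:.2f}")
for name, outcome in results.items():
    print(f"without {name}: {outcome['score']:.2f} (drop {outcome['drop']:.2f})")
```

A large drop when a set is removed suggests the model depends heavily on it, while a negligible drop suggests the set is redundant. The same loop could be run in reverse to test the effect of adding a variable such as hospital complexity.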

 

SA Medical Journal article – A critical analysis of Discovery Health’s claims-based risk adjustment of mortality rates in South African private sector hospitals (Creative Commons Licence)

 
