Wednesday, 30 April, 2025

Shortcomings of new FDA pulse oximetry guidelines

Updated guidance for pulse oximeters aimed at reducing disparities in device performance related to skin pigment is a great step forward, but without refinement, it may have several unintended negative consequences.

Writing in JAMA Network, Michael Lipnick, Odinakachukwu Ehie, Elizabeth Igaga et al note that the first reports of oximeter performance problems in people with darker skin were published nearly 40 years ago, and that the last FDA update to regulatory guidance was more than 10 years ago.

In its 2013 update, the FDA attempted to encourage performance safety for all skin pigments by recommending oximeters be verified during controlled laboratory studies of induced hypoxemia (oxygen saturations of 70%-100%) in 10 or more healthy adults, of whom at least two, or 15% of the study cohort (whichever was larger), had “darkly pigmented” skin.

Among the many limitations of the prior guidance were the small sample size, use of only healthy participants, limited number of participants with dark skin, and subjectivity in the interpretation of what “darkly pigmented” means.

Data published during the pandemic linking oximeter performance with health disparities resulted in a surge of initiatives across many sectors, extending even to legal action against device manufacturers and resellers for misrepresentation and to state legislation to prevent insurance coverage denials based on pulse oximeter peripheral oxygen saturation (Spo2) levels.

Although the issue of Spo2 bias will require a continued multi-faceted response, many stakeholders have been anticipating new guidance from the FDA as central to any solution.

Strengths of new guidance

The 2025 draft guidance recommends improving the diversity of study cohorts used to verify device performance by increasing the number of participants from 10 to 150 and increasing the proportion with dark skin from 15% to 25%. It also requires that at least 30% of data points be from individuals with dark skin.

An equally important recommendation aims to address the prior subjectivity of measuring darkly pigmented skin by adding spectrophotometry (i.e., devices that directly measure colour) to objectively measure pigment via a surrogate metric for melanin called individual typology angle (ITA).

However, the guidance still could have done more to improve diversity of skin pigment.

It also proposes several new and relatively more stringent statistical criteria to define an acceptable device for medical use. The first significantly increases stringency by requiring a root mean square error of less than 3%, with the new addition of a two-sided 95% confidence interval (CI) (acceptance criterion 1).

Many devices that previously passed with point estimates of a root mean square error of approximately 2.5% to 3% (or up to 3.5% for ear clip or reflective devices) will no longer meet acceptance criteria.
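To illustrate why a device with a passing point estimate can fail once a confidence interval is added, here is a minimal sketch rather than the FDA's prescribed method. It assumes the criterion is read as requiring the upper bound of the 95% CI, not only the point estimate, to fall below 3%, and it uses a participant-level percentile bootstrap as one possible way to obtain that interval. The data are synthetic, and the function names `rmse` and `bootstrap_ci` are hypothetical.

```python
import math
import random


def rmse(errors):
    """Root mean square of (SpO2 - SaO2) differences, in percentage points."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def bootstrap_ci(errors_by_participant, n_boot=2000, seed=0):
    """Percentile bootstrap 95% CI for RMSE, resampling whole participants.

    Resampling participants (not single readings) is an assumption made here
    because readings from one person are not independent of each other.
    """
    rng = random.Random(seed)
    ids = list(errors_by_participant)
    stats = []
    for _ in range(n_boot):
        resampled = rng.choices(ids, k=len(ids))
        pooled = [e for pid in resampled for e in errors_by_participant[pid]]
        stats.append(rmse(pooled))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Synthetic example: 30 participants, 20 paired readings each (made-up numbers).
rng = random.Random(1)
data = {pid: [rng.gauss(0.5, 2.6) for _ in range(20)] for pid in range(30)}

point = rmse([e for errs in data.values() for e in errs])
low, high = bootstrap_ci(data)
print(f"RMSE {point:.2f}, 95% CI {low:.2f} to {high:.2f}")
print("passes on point estimate alone:", point < 3.0)
print("passes on upper CI bound:      ", high < 3.0)
```

A device whose point estimate sits just under 3% will usually have an upper bound above it, which is consistent with the authors' expectation that many previously cleared devices would no longer pass.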

The guidance also adds a new metric called non-disparate performance, which evaluates whether the oximeter performs differently in participants with light vs dark skin as assessed with both a standardised subjective method (Monk Skin Tone Scale [MST]) at the forehead (acceptance criterion 2) and an objective method (ITA) at the oximeter site (acceptance criterion 3).

To pass, an oximeter must demonstrate minimal bias (<3.5% for arterial oxygen saturation 70%-85% and <1.5% for arterial oxygen saturation 85%-100%) when light and dark MST groups are compared or when bias is calculated across a modelled ITA range of 100 (ie, effectively comparing the lightest skin theoretically possible with the darkest skin theoretically possible).
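The thresholds quoted above can be written as a simple check. This is a deliberately simplified sketch: it compares the mean bias of two groups against the limit for a saturation band, whereas the statistical procedure in the guidance itself is likely more involved. The names `LIMITS` and `non_disparate` are hypothetical, and the bands are passed as labels because the article does not say which band a saturation of exactly 85% belongs to.

```python
# Bias limits in percentage points, keyed by arterial oxygen saturation band,
# taken from the thresholds quoted in the article.
LIMITS = {"70-85": 3.5, "85-100": 1.5}


def non_disparate(bias_light, bias_dark, band):
    """Return True if the gap in mean bias between groups is under the limit.

    bias_light and bias_dark are mean (SpO2 - SaO2) values for each group
    within the given saturation band.
    """
    return abs(bias_dark - bias_light) < LIMITS[band]


# Made-up example: the same 1.8-point gap passes in the low band
# but fails in the high band, where the limit is tighter.
print(non_disparate(0.2, 2.0, "70-85"))
print(non_disparate(0.2, 2.0, "85-100"))
```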

The new guidance includes several more nuanced but important recommendations, such as clearly reporting enrolment details for verification studies. This is intended to prevent selective reporting of data only from participants for whom devices perform exceptionally well, which matters because manufacturers can oversee their own testing studies (i.e., there is no requirement that data be independently collected).

The updated guidance also recommends that paediatric devices provide clinical performance data from children aged 12 or younger. Only neonatal data were previously recommended.

Weaknesses of the guidance

The substantial increase in recommended verification study size is the biggest change and potentially the most problematic. Although there is broad consensus that the previous sample size of 10 was inadequate, there is no consensus on optimal sample size.

Sample size determination depends on the performance characteristics of each oximeter model; because performance varies widely among devices, the optimal sample size is unique to each device.

The proposed study size of 150 could have four unintended consequences:

1. the oximeter market may shrink considerably because probably only one or two manufacturers will be able to conduct these studies in the near term, given study cost and limited laboratory testing capacity worldwide;

2. dominant manufacturers may prioritise testing new models (over resubmitting 510[k] clearances for devices already on the market), thereby forcing costly, large-scale “upgrades”;

3. oximeter users may be falsely reassured that newly cleared devices are truly better and worth potential price increases; and

4. if oximeter prices increase, innovation slows, or access to devices decreases, then patient safety could be negatively affected.

An alternative and equally feasible approach to the FDA’s recommended one-size-fits-all strategy is an adaptive study design allowing for sample sizes smaller than 150, with appropriate justification. The new guidance does allude to such an approach, but needs more clarity, especially on the minimum recommended sample size when an adaptive approach is used.

More research is needed to refine and validate proposed approaches to sample size while prioritising elimination of disparate performance and also accounting for access to oximetry, a major global health challenge. Regardless of the sample size, oximeter performance in laboratory verification studies is not the same as performance in patients, and the new guidance does little to address this.

Manufacturers are still allowed to deploy methods that make some devices appear to perform better during verification testing than would be expected in actuality. A widely used practice involves warming participants’ extremities to artificially enhance perfusion and signal quality.

This practice may contribute to discrepancies in device performance not only between the laboratory and clinical settings but also between data from independent laboratory studies and data produced by manufacturers.

Some data suggest warming could even mask disparate performance across skin colours. Although the FDA partially acknowledged this problem by now recommending manufacturers disclose the range of pulsatility amplitude (a surrogate for perfusion) in study cohorts, not banning the practice of enhancing perfusion is a major missed opportunity.

The new guidance does not address stakeholder requests to require verification of oximeter performance in patients.

Such trials could be immensely useful but in reality face many challenges, including the rarity of stable hypoxemia in well-resourced clinical settings. Protocols for laboratory-based controlled desaturation studies could be optimised and standardised to better reflect clinical conditions, and in doing so, provide more useful data and a pragmatic alternative to clinical trials.

This scheme should be, but has never been, recommended by the FDA or the International Organisation for Standardisation, which publishes the global standard for pulse oximeters (ISO 80601-2-61), on which the FDA guidance relies heavily. Although the updated guidance improves diversity of skin pigment in verification study cohorts, it continues to rely too much on subjective methods.

It falls short of explicitly defining objective (e.g., ITA) cutoffs for light, medium, and dark pigmentation, something that has been called for by a broad range of stakeholders.

The point is a nuanced one: the draft guidance specifies an objective definition only for the darkest half of the “dark” cohort (i.e., >50% of participants in MST category H, I, or J having ITA less than −50). Thresholds for light, medium, and dark pigmentation remain defined exclusively by the subjective MST score.

Although the newly recommended MST has standardised colours and was designed for human skin (two advantages over some prior methods), its overemphasis on print optimisation is a distraction from the fundamental problem with any subjective scale: susceptibility to conscious and unconscious bias of investigators assigning colour categories to study participants.

The proposed guidance allows for a verification study cohort to be deemed adequately diverse by the MST and to show no bias related to either the MST or ITA, yet in fact have only 12.5% of the study population with objectively dark skin pigment.
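The 12.5% figure appears to follow from two numbers given earlier in the article: a recommended 25% of participants with dark skin by MST, of whom about half must meet the objective ITA threshold. The short calculation below reproduces it under that reading, which is an inference from the article rather than a formula stated in the guidance. The variable names are hypothetical, and the ">50%" requirement is approximated as exactly one half.

```python
cohort_size = 150          # recommended number of participants
dark_share_by_mst = 0.25   # recommended share with dark skin (MST H, I, or J)
ita_share_of_dark = 0.50   # approximate share of that group with ITA below -50

# Share of the whole cohort that must be objectively dark by ITA.
objectively_dark_share = dark_share_by_mst * ita_share_of_dark
objectively_dark_count = cohort_size * objectively_dark_share

print(f"{objectively_dark_share:.1%} of the cohort")
print(f"about {objectively_dark_count:.0f} of {cohort_size} participants")
```

Under these assumptions, fewer than 20 of 150 participants would need to have objectively dark skin for the cohort to qualify.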

Adding objective (ITA) cutoffs to define light, medium, and dark pigment is easy and would harmonise with the approach being proposed by new standards of the International Organisation for Standardisation.

Despite a reworking, the statistical methods and passing criteria (e.g., root mean square error and non-disparate bias) remain non-intuitive and not immediately useful at the bedside.

Users, for example, may not appreciate that an oximeter with a passing root mean square error of 2.3% can still have clinically relevant error. Furthermore, the non-disparate bias method and its cutoffs for passing require validation and refinement because they were largely based on multiple assumptions using preliminary data from only one to two oximeters.
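A rough calculation shows why a root mean square error of 2.3% can coexist with clinically relevant error. The sketch below assumes, purely for illustration, that reading errors are normally distributed with zero mean bias, so that the root mean square error equals the standard deviation. Real oximeter errors need not behave this way, and the function name `share_beyond` is hypothetical.

```python
from statistics import NormalDist


def share_beyond(rmse, threshold):
    """Share of readings whose absolute error exceeds `threshold`.

    Assumes zero-mean, normally distributed errors, so the standard
    deviation equals the RMSE. Both arguments are in percentage points.
    """
    dist = NormalDist(mu=0.0, sigma=rmse)
    # Two symmetric tails: below -threshold and above +threshold.
    return 2 * dist.cdf(-threshold)


for limit in (3.0, 4.6):
    share = share_beyond(2.3, limit)
    print(f"errors larger than {limit} points: {share:.1%} of readings")
```

Under these assumptions, roughly one reading in five would be off by more than 3 percentage points, even though the device passes.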

New recommendations on device labelling and a new reporting website are positive steps to address ambiguity in performance reporting but are likely insufficient.

Fundamentally, these communications still rely on imperfect data from verification studies that may not reflect clinical performance and thus may have limited relevance to the user. Additionally, clinicians are unlikely to see or read oximeter package inserts, let alone understand what a root mean square error is.

Without better communication strategies (e.g., standardised, plain-language performance summaries or devices that better communicate uncertainty in real time), many users will continue to underappreciate uncertainty in Spo2 readings.

Finally, the new guidance addresses only medical pulse oximeters, and without a better labelling strategy, it will remain difficult for homecare users or users from resource-variable communities (who frequently encounter inexpensive, non–FDA-cleared devices) to know whether they have a 510(k)-cleared device.

Standardised labelling of FDA clearance directly on the oximeter, including date of clearance and a QR code linking to an FDA repository with performance information, would be better than more extensive package inserts.

Conclusions

Pulse oximeters are essential but imperfect devices and will remain so, despite updated regulatory frameworks. The new FDA draft guidance is a significant improvement and, with the refinements proposed earlier, possibly the best that can be done, given evolving data.

Without refinements, the guidance could have significant unintended consequences.

Stakeholders should not assume that the updates will necessarily translate into improved performance or equity in the oximeter market, and the FDA’s list of newly cleared devices for procurement decisions or incentives programmes should be used with caution.

New data (from multiple laboratory and clinical studies) are imminently anticipated, and according to what has been shared thus far, these data may generate more questions than answers.

There remains no consensus on exactly how skin interacts with oximeters, why only some oximeters appear to be affected by pigment, and why laboratory and clinical performance diverge.

These questions are among many key ones that must be answered.

The update is currently in draft form. The final guidance will not be an FDA regulation (i.e., it will not require manufacturers to do anything) and will not be legally enforceable, although entities that manufacture, sell, or use non-compliant devices may be open to liability.

The effect of the final guidance will depend on complementary efforts across sectors, including educating users on optimal Spo2 use in clinical decision-making (e.g., not withholding care based on absolute Spo2 cutoffs).

Clinicians should continue to familiarise themselves with pulse oximeter limitations, which are likely to diminish but are not going away anytime soon.

Michael Lipnick, MD – University of California San Francisco, UCSF Hypoxia Research Laboratory; Odinakachukwu Ehie, MD – Department of Anaesthesia and Critical Care, Makerere University, Uganda.

 

JAMA Network article – Pulse Oximetry and Skin Pigmentation—New Guidance From the FDA (Open access)

 
