

I'm a Good Surgeon...or am I? (Part One)
When it comes to quality reporting metrics, scorecards, and physician performance ratings, who watches the watchers?
Talk of value-based care is everywhere. You can’t discuss healthcare reform or improving the quality and outcomes of treatment without hearing about VBC. As the argument goes, shifting the focus from fee-for-service (FFS) to pay-for-performance realigns incentives: no longer will doctors be rewarded for simply delivering more care; they now have to consider the appropriateness and effectiveness of that care. Despite the intense focus on VBC, efforts have gotten off to a slow and uneven start. Adoption of VBC programs has not been widespread, with most care still delivered under an FFS model. Existing value-based programs like CJR, BPCI, and various ACOs have consistently failed to show significant cost savings or quality improvements.
Despite these false starts, it is widely accepted that some form of VBC is here to stay. CMS recently reiterated its commitment to value/quality with the ambitious goal of shifting all Medicare FFS and most Medicaid beneficiaries to accountable care by 2030. Similarly, as employers continue to struggle with rising costs of care, multiple solutions have sprung up to help them control healthcare spend. Navigation services like Carrum Health and Transcarent help employers steer their employees to (ostensibly) high-quality, cost-effective providers who practice evidence-based medicine. Walmart and other large corporations have embraced the “Centers of Excellence” concept, forming relationships with high-value institutions and incentivizing their employees to seek treatment there (all while disincentivizing them from getting costly care elsewhere). Musculoskeletal care, a significant cost-driver for the government, commercial insurers, and employers alike, is a frequent area of focus. MSK disorders are common, treating them is expensive, and (as the argument goes) much of that treatment is low-value, non-evidence-based, and of highly variable quality. As a result, a number of MSK-focused digital health companies including Hinge Health, Sword Health, Kaia Health, Vori Health, and SpineZone/Livara have entered the fray with the goal of Avoiding Unnecessary Treatment™.
The key to improving quality and outcomes is data, data, data. Data has been called the “new oil,” and in no field is this truer than healthcare. As Big Tech continues to flounder its way through healthcare, trying desperately to figure out its value proposition, data remains an area of opportunity. Through cloud computing, data collection and analysis, and the application of artificial intelligence (AI) and machine learning (ML), leveraging data may be the “killer app” that makes VBC more viable and effective. Beyond Big Tech, a number of companies have arisen to serve as the front end of healthcare analytics, providing quality rankings, cost data, and Actionable Insights™ to steer patients towards value-focused healthcare. Companies such as Embold Health, Clarify, and Turquoise offer platforms that use AI/ML and data analysis to identify and rank high-performing providers. It should be noted that insurance companies have long had their own internal methods of tiering healthcare providers based on costs and quality, and CMS provides its own Care Compare tool.
Suffice it to say, if you are a healthcare provider, you would be wise to take notice of these efforts to collect data and rank the quality and cost-effectiveness of the care you provide. Being passive or blissfully ignorant of this trend is no longer an option. That said, we have yet to reconcile the inaccuracy and incompleteness of healthcare data with efforts to properly rate care. The limitations of EHRs as a source of good, clean data are well documented, and using claims data to draw conclusions about the quality of care has its limitations too. Garbage in, garbage out. I am absolutely a believer in the need for high quality, high value, cost-effective, evidence-based healthcare (and have written about such many times). I also believe that our best way to get there is through careful and accurate data collection and analysis. What I don’t believe is that we’ve gotten there yet.
As a surgeon who’s passionate about healthcare delivery innovation, I’ve long been fascinated by efforts to drive value through data. (Shameless plug: I’ve written about my vision for data-driven, high value MSK care here and the limitations of the Center of Excellence concept here). It recently came to my attention that one of the companies that uses data to rank physician care quality offers a free report to doctors upon request. Curious, I filled out the form to receive my report. What I found was both interesting and concerning. While I support these efforts to provide better decision-making tools to patients and employers, I’m not quite sure they’re achieving the intended goal. What follows is my own analysis of this report and its limitations (Part Two will detail thoughts on how to improve this process).
First, a few caveats. I’m not naming the company because the intent here is not to be hypercritical or to serve as an attack. The company should be applauded for providing this information transparently and free of charge upon request. Second, I cannot speak to whether this report is representative of what patients, employers, or insurers see. In the name of fairness, I realize that these efforts are still in their relative infancy and are only as good as the data upon which they are based. More thoughts on that in Part Two. Despite some digging, it was difficult to ascertain the exact methodology used in devising the rankings or the exact sources of the data. This particular company appears to use a third-party data vendor, and I assume the secret sauce of how they come up with their scores is proprietary. To their credit, the company does provide a link to a PDF file that lists the evidence upon which their metrics are based. That said, the list of studies is pretty short (2-3 per metric) and perhaps a bit curated to fit the scoring methodology.
Finally, a few words about me for context. I’m a private practice Orthopedic Surgeon with fellowship training in hip and knee replacement. I’ve been in practice for over 14 years and am sub-specialty focused both with the patients I see in clinic and with my operative practice (which is over 95% hip and knee replacements). My practice participates in the BPCI-A program (in which I performed well), I’m in Massachusetts with a pretty heavy ACO/PHO/IPA/Risk-sharing presence, and I practice in the community (suburbs of Boston). While I’m a believer in patient optimization and use some soft and hard stops for surgery, I try hard to avoid cherry-picking or lemon-dropping patients. I have a mature practice and, for what it’s worth, have a couple of awards for patient care, was recently named a Castle Connolly Top Doctor for 2023, and have built what I feel is a successful practice with strong referral patterns and robust word-of-mouth (admittedly all subjective measures). With those stipulations in mind, here’s how I did on this particular company’s rating scale.
Cutting to the chase, my overall performance score was 68 out of 100 (with 50 being market average). Not terrible (comfortably above average), but not great (I don’t consider “comfortably average” to be a career goal). The overall score is based on an adjusted average of scores from 11 categories broken into two domains for appropriateness (A) and effectiveness (E).
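Since the vendor’s exact weighting is proprietary, here is a minimal Python sketch of how eleven category scores (each scaled 0–100 with 50 as market average) might roll up into a single composite. The equal weighting within each domain, the 50/50 domain split, and the example numbers are all my assumptions for illustration, not the company’s actual method.

```python
from statistics import mean

def overall_score(appropriateness: list[float], effectiveness: list[float],
                  domain_weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Roll category scores (0-100, 50 = market average) into one composite.

    Equal weighting within each domain and a 50/50 split between the
    appropriateness and effectiveness domains are assumptions made purely
    for illustration; the vendor's actual adjustment is not public.
    """
    return (domain_weights[0] * mean(appropriateness)
            + domain_weights[1] * mean(effectiveness))

# Hypothetical inputs: a surgeon scoring mostly above the market average of 50
print(overall_score([100, 95, 90, 80, 75], [85, 70, 65, 60, 55, 40]))  # 75.25
```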
Let’s dig into the categories:
1. “Arthroscopy overuse in patients with new osteoarthritis”
I scored a perfect 100 here. The metric was likely chosen given pretty strong evidence that arthroscopic surgery is of little to no value in the treatment of knee arthritis. There’s also increasing evidence that prior arthroscopy can increase the risk of complications when knee replacement is eventually required. I like this metric both as an appropriate indication of evidence-based medicine and as a high-value focus for patients and employers/payers. Arthroscopy for knee arthritis was included as part of the Choosing Wisely campaign, and its routine use in cases of arthritis is not recommended by the American Academy of Orthopaedic Surgeons. In practice, this is often a challenging conversation to have with patients as arthroscopy is a much less invasive surgery with a quicker recovery than knee replacement. There are also many patients who fall into a gray area -- they have some arthritis that hasn’t responded to non-surgical treatment but not enough arthritis to warrant replacement. There are also isolated cases where knee arthroscopy might make sense. But these are few and far between.
2. “Hip or knee replacement within 1 year of new osteoarthritis diagnosis”
Again, did well here scoring a 99, but this is a bit of an odd one. I suppose the rationale is that a surgeon shouldn’t be quick to offer joint replacement without trying a year of non-surgical management. The problem here is that this is a vague and blanket metric lacking specificity and nuance. While many patients present with early symptoms of arthritis, mild limitations, and mild radiographic findings, others suffer mightily for years before seeking evaluation or treatment. In short, timing of diagnosis does not necessarily correlate with severity of arthritis or functional limitation. It can be appropriate, indicated, and evidence-based to offer replacement surgery within a year to someone who’s presenting for the first time with severe arthritis. In fact, waiting too long for surgery can lead to worse outcomes by unnecessarily delaying definitive treatment. I suppose this metric may help to filter out surgeons who are overly aggressive in offering surgery. But as a value-add to patients, I’m not sure it’s granular enough to guide decision-making. Perhaps employers/payers see some financial benefit in steering patients to more conservative surgeons – but that’s not always what’s best for the patient.
3. “Opioid prescribing within 28 days in patients with new joint pain”
Another sensible metric with good value for patients, employers, and payers. The opioid epidemic is well documented at this point. Narcotic pain medications are not indicated for the routine treatment of chronic hip and knee arthritis pain. Studies have also shown that patients who take opioids prior to joint replacement surgery are at increased risk of complications and poor outcomes. Not much else to say here.
4. “MRI within 4 months prior to hip and knee replacement”
Arthritis is primarily an x-ray diagnosis. Unfortunately, MRI scans have taken on an almost mythic quality in the diagnosis and treatment of musculoskeletal disorders. Patients diagnosed with arthritis (an irreversible condition) often wonder if an MRI would show some other, more easily addressed source of their pain. However, MRI is rarely indicated in cases where the presence of arthritis is clear on X-rays, and advanced imaging is not required for surgical planning in most cases. That said, I’m not sure how big of a value-add this metric is for patients. There are certain cases where an MRI might be indicated prior to joint replacement such as avascular necrosis of the hip, tibial stress fractures, osteonecrosis of the knee, or evaluation of abnormal x-ray findings. Does ordering an MRI prior to joint replacement make me a lower quality doctor? Maybe. Certainly, low yield scans should ideally be avoided. But this seems like an “all or none” metric that ignores appropriate use in edge cases and lacks nuance. I did fine here, scoring a 90.
5. “SNF admission after hip or knee replacement”
A reasonable metric. SNF admissions are one of the biggest cost-drivers associated with an episode of joint replacement care. Not only that, but they have also been associated with a higher risk of readmission and complications. Studies show that most patients can be safely discharged directly home following hip or knee replacement surgery. The significant trend towards reduction of SNF admissions after hip or knee replacement represents a major departure from historical practices. But, by most measures, this is to the benefit of patients, employers, payers, and surgeons. Still, there are some cases where direct discharge home isn’t the best option depending on the patient’s overall health status and social support system. This number can and should be reduced as much as possible but will probably never reach zero. When I started practicing, most patients expected discharge to a “rehab” facility following hip or knee replacement. That thinking has largely changed, and most patients are now receptive to going straight home. That’s a good thing and a trend I support — I scored 83 here.
6. “Surgical revision after hip or knee replacement”
An important metric that is reflective of both quality and costs. Patients, employers, and payers all benefit when surgical revision rates are low. To qualify for this metric, revision surgery had to take place within 90 days of the index procedure – a somewhat arbitrary but generally accepted timeframe. Revisions are costly and carry a high rate of morbidity and mortality. High value, high quality (and, in most cases, high volume) surgeons are able to reduce the need for surgical revision through experience and expertise. The one quarrel here might be that the data probably lacks risk factor adjustment and may punish those taking on higher risk patients while rewarding those who cherry pick. I scored a respectable 78 on this metric which is fine except when you consider how this number was reached (more on this in Part Two).
7. “Overly frequent use of preoperative stress testing”
Another odd one. Preoperative medical clearance is still considered a routine part of appropriate preparation for hip and knee replacement surgery. This process has evolved in recent years to include optimization (addressing modifiable risk factors such as diabetes and smoking) to reduce complication rates and improve outcomes. Preop testing has also evolved to eliminate low value tests such as routine urinalysis and chest x-rays, which are considered low yield in most cases. But, as an Orthopedic Surgeon, I defer to the medical doctors when it comes to the need for cardiac clearance and preoperative stress testing. If the cardiologist, primary care doctor, or anesthesiologist tells me a stress test is indicated, I trust their judgement. While I do believe preoperative stress tests are probably overused, I don’t feel qualified to make the decision as to whether or not they’re indicated. No test is without its risks, and overuse of stress testing can be costly. But I’m not sure this is a fair or useful metric upon which to rank or rate an Orthopedic Surgeon. Nor do I think it’s a particularly useful metric for patients seeking quality musculoskeletal care. In any event, I scored a 75 here.
8. “MRI in the first year after diagnosis of hip or knee pain”
See metric 4. Again, MRI is probably an overused test with low value add in many situations. But there are plenty of situations where an MRI is appropriate or indicated within the first year of diagnosing hip or knee pain. This is another metric that lacks enough granularity to make determinations about appropriateness or over testing. Perhaps this metric is meant to be viewed in relation to market average. But it’s not entirely clear if this is an absolute or relative measure. I did OK here – right at my overall performance score of 68 – so it didn’t pull me up or down. Do patients care how many MRIs a doctor orders within the first year of hip or knee pain diagnosis? Probably not (some might even view it as withholding treatment or dismissing symptoms). Payers and employers benefit from a more conservative approach here. I try to be rational and evidence-based when it comes to ordering MRIs. However, rules about when an MRI should be ordered aren’t hard and fast. Clinical judgement should count for something.
9. “Complication rate after hip or knee replacement”
Certainly, an important and reasonable metric for all parties when it comes to delivering value and high-quality care. Again, complications after joint replacement can be costly with high morbidity and increased mortality rates. For clarity, complications here were defined as: mechanical complication within 1 year (dislocation, loosening, fracture), AMI (heart attack) within 7 days, pneumonia within 7 days, sepsis within 7 days, bleeding within 30 days, PE (blood clot in the lungs) within 30 days, infection within 90 days, ED visit within 30 days of discharge, readmission within 30 days. It’s a comprehensive list, but I think a fair one. In the past, surgeons may not have felt responsible for medical complications, but that mindset doesn’t fly in a value-based system. As a result, more attention has been paid to optimizing patients for surgery which is a good thing. My disappointing score of 59 is slightly above average but perhaps misleading. More to come on this in Part Two.
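For readers curious how these time-windowed flags might actually be derived, here is a rough Python sketch built from the windows listed above. The field names, event list structure, and function are hypothetical; real claims logic (diagnosis/procedure code sets, exclusions, risk adjustment) is far more involved. This is only meant to make the definition concrete, and the same windowing idea applies to the 90-day revision metric discussed earlier.

```python
from datetime import date

# Lookback windows, in days, taken from the report's complication definition.
# ED visits and readmissions are counted from discharge; everything else from surgery.
WINDOWS = {
    "mechanical_complication": 365,
    "ami": 7,
    "pneumonia": 7,
    "sepsis": 7,
    "bleeding": 30,
    "pulmonary_embolism": 30,
    "infection": 90,
    "ed_visit": 30,
    "readmission": 30,
}

def complication_flags(surgery: date, discharge: date,
                       events: list[tuple[str, date]]) -> dict[str, bool]:
    """Flag whether each defined complication occurred within its window.

    `events` is a hypothetical list of (event_type, event_date) pairs pulled
    from claims; real pipelines map ICD/CPT code sets to these event types.
    """
    flags = {name: False for name in WINDOWS}
    for event_type, event_date in events:
        if event_type not in WINDOWS:
            continue
        anchor = discharge if event_type in ("ed_visit", "readmission") else surgery
        if 0 <= (event_date - anchor).days <= WINDOWS[event_type]:
            flags[event_type] = True
    return flags

# Hypothetical episode: a readmission 12 days after discharge gets flagged.
episode = complication_flags(date(2023, 3, 1), date(2023, 3, 2),
                             [("readmission", date(2023, 3, 14))])
print(episode["readmission"])  # True
```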
10. “PT in the first 4 months of new hip or knee pain”
Another somewhat random metric. It’s not that I don’t believe in the effectiveness of physical therapy — it’s just that I’m not sure this metric necessarily reflects quality or cost-effectiveness of treatment. I understand the emphasis on non-surgical treatment, and the supposition that lack of ordering PT means you tend to favor more invasive, aggressive interventions. That may not be true. I typically offer PT as a treatment option for hip and knee conditions along with bracing, activity modification, NSAIDs/Tylenol, injections, and surgery (when indicated). But patients’ reaction to the idea of PT can be surprisingly variable. Some are big believers in PT and are eager to do a course of supervised exercises. Others feel they are already active enough such that PT is of little value. Some don’t feel they have time. Others worry about PT aggravating their symptoms. In the age of high deductible plans, PT can be cost prohibitive for some patients. Some want a list of exercises they can do on their own. Yet others are skeptical of PT’s ability to fix their problem – after all, it’s not going to cure their arthritis. I’m a believer in shared decision-making. If I think PT would be beneficial, I say so and explain why. I suppose I could push PT harder to satisfy this metric, but would I be treating the patient or myself? The truth is, many MSK conditions will improve with time and conservative treatment. PT is a part of our armamentarium, but it’s not the only effective non-surgical modality. I scored a 54 here but, without broader context, am unclear about what this actually says about the quality of care I deliver.
11. “PT within 4 months prior to hip or knee replacement”
Like metric 10, but in my opinion, even less indicative of quality care. This seems to come from the payer side of things as some won’t approve joint replacement unless the patient has failed 3 months of physical therapy. Again, nothing against PT. It can be very effective for many patients. But forcing a patient with a painful, arthritic joint to do exercises and delay definitive treatment seems not only unnecessary but, in some cases, cruel. This metric also fails to take into account patients who have had years of worsening pain prior to replacement and performed PT in the past (outside the arbitrary 4-month window). One additional consideration is the idea of “pre-hab” or doing physical therapy prior to joint replacement to make recovery easier. While some studies show potential benefit to pre-hab, others have shown little effect on outcomes or recovery times. Unfortunately, I scored my lowest here (37), which dragged my total score down. Am I a lower quality doctor for not making more use of PT in the months prior to joint replacement? This metric would seem to say so.
So, there you have it: an interesting mix of metrics, some of which seem spot on and others of which seem arbitrary and of questionable value to patients. Overall, the metrics seem to skew more towards the side of cost-savings for employers and payers and less toward Actionable Insights™ for patients. What’s missing? A fair amount in my opinion. Nowhere are patient-reported outcomes reflected. The patient might hate their hip or knee replacement, but as long as I send them to PT and they don’t have a complication, revision, or MRI, I’m going to score pretty high. There is a slew of validated outcome measures in Orthopedics, with the HOOS and KOOS scores being commonly used in joint replacement surgery. None of that here. Another curiosity with any of these value-based or quality scoring systems when it comes to joint replacement is how much emphasis is placed on short-term outcomes. Full recovery from joint replacement takes at least a year and possibly as long as two years. Rarely do these measures extend past 90 days. Joint replacement surgery provides societal value that should be measured in years (maybe decades), not months. There’s also little here that gets to the actual cost of care, although there is a lot of dancing around it. Finally, the metrics don’t place any emphasis on how informative, engaging, receptive, or responsive the doctor is, nor do they take into account training or expertise (for instance, fellowship training in joint disorders). These factors would seem to be of great importance to patients.
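To make the patient-reported-outcomes point concrete: validated instruments like KOOS and HOOS score each subscale from 0 to 100, with 100 meaning no symptoms, using a simple transformation of the patient’s questionnaire responses. Below is a minimal sketch of that standard normalization (assuming the usual 0–4 Likert items); collecting something like this at one and two years out would tell patients far more than any 90-day claims window.

```python
def subscale_score(item_responses: list[int]) -> float:
    """Normalize a KOOS/HOOS-style subscale to 0-100 (100 = no symptoms).

    Items are answered 0 (none) to 4 (extreme); the standard transformation
    is 100 minus the mean item score rescaled to a 0-100 range.
    """
    if not all(0 <= r <= 4 for r in item_responses):
        raise ValueError("responses must be on the 0-4 Likert scale")
    mean_raw = sum(item_responses) / len(item_responses)
    return 100 - (mean_raw * 100 / 4)

# Example: mostly mild symptoms across a 9-item pain subscale
print(round(subscale_score([1, 1, 0, 2, 1, 1, 0, 1, 1]), 1))  # 77.8
```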
In Part Two, I’ll discuss the second part of the report that goes into a touch more detail regarding how the performance score for each category was calculated. The methodology again leaves a bit to be desired and further speaks to the limitations of this type of doctor rating system. I’ll conclude by offering some thoughts on how we can do a better job using data to inform decisions about high-value, high-quality care. More to come.
Ben, thank you for this thorough and thoughtful review of the report you received. I am one of the ones in the thick of creating these reports and trying to make them more useful. Your suggestions for how to make the measures more patient-centered and focused on longer-term functional outcomes make a lot of sense. If you know of ways to get that from claims data, I would be very interested.
Looking forward to part two of your musings. We need more clinicians like you who are trying to improve the measurement process, rather than rejecting it outright.
Thank you!
- Michael van Duren, MD, MBA