CHARGE Signal Newsletter 07/02/2026

Levi Miller

CHARGE Surge

CHARGE Surge is your home for continuous monitoring of the security and integrity of AI deployment. A surge is a sudden spike in voltage across a circuit. CHARGE Surge monitors voltage spikes across the healthcare circuit, providing the information you need to track AI security breaches, system failures, and critical emergent research.

SURGE: OpenEvidence lashes out at authors of Nature study which reported superior performance from frontier LLMs

OpenEvidence's clinical AI product is used by over 40% of U.S. physicians, according to its own estimates

Drama and accusations of impropriety in the world of agentic LLMs. A study published in Nature reported that frontier LLMs like GPT-5.2 and Claude Opus 4.6 “outperformed clinical AI tools” like OpenEvidence “in all three evaluations.” Those evaluations comprised “500 MedQA questions testing medical knowledge, 500 HealthBench items measuring alignment with clinicians” and “100 de-identified queries from physicians [...] in a live clinical environment.” OpenEvidence delivered an excoriating response.

In a strikingly combative LinkedIn post, OpenEvidence accused the Nature study’s authors of “coincidentally” (read: deliberately) publishing their paper after OpenEvidence refused to provide an API to power a “competing in-house medical AI” at NYU Langone Health. On the merits, OpenEvidence argued that the frontier LLMs tested in the study had been trained on MedQA and HealthBench questions with access to the official answers, and that performance was graded by AI models on “arbitrary/subjective stylistic choices.”

OpenEvidence is certainly correct that the study’s structural limitations should not be understated. Note, however, that in both the BenchMark and RCQ evaluations, clinicians (that is, humans) scored frontier LLMs above OpenEvidence on clarity. Those same clinicians rated the LLMs at least as highly as clinical AI tools on knowledge, clinical correctness, and safety. These are exactly the metrics where clinical AI tools should theoretically outperform frontier LLMs. Far from being disingenuous, the article’s authors acknowledged that the “models may have been exposed to MedQA or HealthBench during training” and that “industry-created benchmarks may systematically favor the systems developed by their creators.”

Bottom line: clinicians rated frontier LLM responses at least as highly as those of specialized clinical AI tools. Accusatory politicking, however, doesn’t inspire confidence. By OpenEvidence’s own estimate, over 40% of U.S. physicians use its AI for diagnostic inquiries. Because OpenEvidence’s architecture is inaccessible, every physician deserves clear, cooperative, unambiguous, and impartial research into its clinical accuracy. Proper evaluation should assess the real, post-deployment outcomes of integrated AI so that it accurately reflects clinical realities.

SURGE: Sentri7 failed to detect fentanyl diversions at a Tennessee hospital

Sentri7 failed to detect at least five instances of surgical fentanyl diversions

Wolters Kluwer’s Sentri7, an AI drug-monitoring software used by over 700 hospitals, failed to detect five instances of fentanyl diversion by a nurse, John A. Stevenson, at Erlanger Baroness between March and June of 2025. Most concerningly, state and hospital officials identified the diversions only through a retrospective investigation and a dedicated audit of Sentri7, which claims to monitor and flag 60 ‘attributions of [drug diversion] risk’ to hospital employees. The Sentri7 failure illustrates a paradigm emerging across the health care industry: pre-deployment scrutiny of even the most trusted AI systems is not enough.

Most hospitals have established AI governance committees that set guidelines and ethical standards for AI deployment. Those committees build the internal regulatory mechanisms that govern the implementation of AI systems, and they manually audit AI performance. Such committees, however, don’t provide the continuous operational monitoring that AI technologies demand. Moreover, because AI technologies often operate as proprietary black boxes, hospital officials may misunderstand how they function, and their errors can be difficult to scrutinize.

According to the Tennessee Board of Nursing, Stevenson testified that he retained “unused fentanyl that would have otherwise been wasted after surgical procedures.” The diversion was uncovered in the first place not through a drug inventory catalog or an intervention by the hospital’s governance committee, but because Stevenson exhibited behavior consistent with drug impairment.

The Sentri7 failure was an unknown unknown: when AI systems monitor their own performance, their malfunctions are often obscured. Here, a chance human observation uncovered the diversion that Sentri7 missed, a stroke of luck that cannot be expected to repeat at scale. The stakes are high: as many as 15% of all healthcare workers are believed to have diverted drugs at least once in their careers, and, according to the CDC, drug diversion has been responsible for at least 13 outbreaks since 1985. Rather than trusting the internal auditing of AI system vendors, hospitals should seek out methods that explain AI systems and provide continuous post-deployment monitoring of them.

CHARGE Current

CHARGE Current is a recurring editorial column that surveys AI development, deployment, and performance across healthcare. Current is the rate of positive charge flow past a particular point in a circuit. Think of healthcare as the circuit, and CHARGE as your point of measurement for unique insights into AI governance and risk management.

CURRENT: Abridge's AI partnership with NVIDIA surfaces the risk of 'recursive AI' models

Dr. Shiv Rao, Abridge's founder and CEO. Abridge is valued at $5.3 billion

In a sweeping keynote address on June 11, Abridge announced a strategic partnership with NVIDIA to build, on NVIDIA’s Nemotron, a frontier artificial intelligence model designed specifically for clinical conversations. The new model will ostensibly be tailored to Abridge’s expanded suite of operations, which it claims supports physicians at every stage of service: surfacing and generating pre-visit notes with clinical context, suggesting discussion topics during the visit, and coordinating post-visit summaries, claim adjudication, and billing. By adapting to the clinical domain earlier in the development cycle, Abridge frames its access to 100 million de-identified clinical conversations annually as the basis of a “foundation model” for a generation of AI that “reasons clinically from its foundation.” Abridge’s ambitious new posture as neutral health infrastructure also surfaces deep risk: the partnership could convert erroneous AI-generated notes from medical artifacts into the “foundation,” as Abridge puts it, of new AI.

Abridge's health infrastructure gambit

Though NVIDIA stresses that it is “not a healthcare company,” it is an investor in Abridge through its venture capital arm, NVentures, and is confident that Abridge provides the domain-specific “clinical conversation foundation model” needed to navigate all of the “complexity of healthcare and all of the workflows.” Crucially, Abridge is building its proprietary model on NVIDIA’s Blackwell AI infrastructure using Nemotron, NVIDIA’s open-source model, which gives users full access to model training data and weights.

Consider the context: in August, Epic, Abridge’s largest partner and then a shareholder, announced plans to launch its own clinical scribe and sold off its shares in Abridge. Last summer, Abridge announced a prior-authorization feature, later countered by UnitedHealth Group’s Optum. Last fall, Abridge also began its foray into clinical decision support. NVIDIA’s bullishness on Abridge’s latest move against entrenched health giants signals its confidence both in Abridge as a data custodian and in its own open-source Nemotron: if Abridge’s proprietary clinician-support AI functions “foundationally” as claimed, it becomes a portable, ostensibly neutral clinical infrastructure that could compete with the health giants.

The cost of doing business: the cost of data

NVIDIA is right: as stewards of raw clinical data, Abridge holds a considerable comparative advantage over other health AI developers. Consider, first, the price of data within the current regulatory landscape: last October, General Catalyst purchased Summa Health, an Ohio-based healthcare delivery system, for $515 million and transformed the health network into a data collection and testing ground for its new health venture, the Health Assurance Transformation Corporation. In Utah, startups Doctronic and Legion Health obtained regulatory approval for autonomous refills after building traditional telehealth networks that provide native data access. Abridge checks both boxes: its data and networks are native.

Error artifacts become health infrastructure

While that data underpins confidence in Abridge, it should also raise questions for health professionals across hospital networks. Ambient scribes occasionally generate incorrect or flawed transcriptions of clinical conversations. That is an inherent limitation of ambient transcription rather than a defect, and certainly not an indictment of Abridge’s technology. Governance committees, CIOs, CAIOs, and other AI leaders, of course, attempt to factor in clinical accuracy rates before deploying emerging health technologies. Under current deployment regimes, transcription errors are mere artifacts of deployment, factored into the risk calculus of industry professionals. A domain-specific model trained on flawed medical data, however, transmutes those artifacts into the foundation of an entire health infrastructure.

To be sure, health innovators have always struggled with incomplete or inaccurate data sets, and with the transition from limited sandbox data environments to operations on real clinical research data. Here, though, Abridge and NVIDIA are essentially building AI on top of AI. At the same keynote, Abridge disclosed a strategic investment from Eli Lilly. That investment recalls Eli Lilly’s relationship with OpenEvidence, which partially trains its clinical models on Eli Lilly’s pharmaceutical trials and testing. OpenEvidence delivers targeted advertising for Eli Lilly and other pharmaceutical companies at the point of care; when sponsored data surfaces sponsored pharmaceuticals, OpenEvidence presents explanations of Eli Lilly products to healthcare professionals (HCPs).

The risk, then, compounds: Abridge, NVIDIA, and Eli Lilly are building an AI on top of an AI, perhaps on top of a sponsored data set. The point is not to criticize Abridge’s product, which over 300 hospital networks use, nor is the risk of what we might call recursive models specific to Abridge. Rather, as domain-specific open-source models proliferate, industry professionals should keep a close eye on the opaque AI systems they interact with.

CURRENT: LLMs spearhead breakthrough in genomic reanalysis

AI-powered genomic reanalysis is revolutionizing local diagnosis of genetic diseases

A recent study from Harvard Medical School, Boston Children’s Hospital, and OpenAI offers a compelling argument for LLM integration in rare disease diagnosis. With AI-assisted genomic reanalysis, researchers identified 18 new local diagnoses across 376 previously unsolved cases spanning neurodevelopmental and neuromuscular diseases, early psychosis, and sudden unexpected pediatric deaths. The LLM ingested clinical notes, genetic variant data, and standardized symptom terminology, then proposed candidate hypotheses that clinicians adjudicated. Across all four cohorts, AI-assisted reanalysis produced an additional diagnostic yield of 4.8% (18 of 376 previously unsolved cases); yield was highest in the early psychosis cohort (13.3%).

Rare and undiagnosed or misdiagnosed genetic disorders affect millions of patients worldwide, and the clinical community has long recognized the promise of whole-genome sequencing across a range of applications. Clinicians have, for example, identified six emerging arenas for whole-genome sequencing: pre-conception identification of Mendelian disorder carriers; prenatal and pre-implantation testing; assessment of individual risk for Mendelian disorders; pharmacogenetic testing; direct tissue typing for transplantation; and identification of alleles that increase risk for common disorders.

Before widespread clinical application, however, significant barriers remain: costs are high, clinical validity is uneven, social utility is occasionally dubious, and, as advances in genomic sequencing supersaturate research pipelines with data, physicians struggle to identify and interpret lesser-known gene variants. Unsurprisingly, then, over half of physicians use genomic tests that fail to meet accuracy standards, and phenotype-gene associations for rare disorders remain underrepresented in databases like Online Mendelian Inheritance in Man (OMIM), forcing physicians to search literature databases like PubMed manually.

As such, some experts remain cautious. Robert Nussbaum of UCSF suggested that genomic interpretation for identifying risk-associated alleles is still, for now, “more in the realm of entertainment than medicine.”

In the study, the AI’s role was to generate explainable, biologically grounded pathogenic findings by connecting a patient’s clinical features with genomic data and existing biomedical evidence. By proposing candidate gene-disease links for clinician review and validation, the model functioned less as a decision-maker and more as a systematic hypothesis engine, capable of rapidly surfacing and prioritizing plausible disease mechanisms from vast genomic datasets.

That the LLM produced new, clinically relevant findings in 18 of 376 rare disease cases suggests that AI-assisted reanalysis may offer a meaningful mechanism for implementing whole-genome sequencing, especially in cases where standard diagnostic methods are exhausted.

Before anyone promises gains in, for example, allele identification or real-world calibration with clinical workflows, larger prospective, multicenter studies are needed. If validated at scale, however, the hypothesis-engine model could provide a durable paradigm for AI integration in medicine and research.

CHARGE Wave

CHARGE Wave invites health care professionals, researchers, and wonks to informational interviews where they share invaluable insight into AI adoption industry-wide. A wave is a disturbance carried through a medium by the coordinated motion of many particles. Expect insights from a community of AI leaders who drive, direct, and constitute the AI adoption wave.

Jason G. Cooper on keeping humans in the loop, skills that stay sticky, the AI space race, and the Dewey Decimal System

Jason G. Cooper is an AI integration expert, formerly of Paradigm, HMS, and Blue Cross Blue Shield.

CHARGE had the pleasure of speaking with Jason G. Cooper this past weekend in an interview that ranged from responsible AI deployment to NASA to the historical significance of the Dewey Decimal System. Throughout his career as a Chief Technology, Analytics, and AI Officer, Jason has managed the core data and analytics systems and teams for major health payers and providers, among them Paradigm, HMS and Blue Cross Blue Shield plans.

As AI deployment proliferates across major health networks, Jason is turning his attention to thought leadership, now advising .406 Ventures, Covenant HR, and the International Institute for Analytics as an AI and data analytics expert. On his active LinkedIn page, Jason sketches a path for responsible AI deployment in healthcare that reaps the benefits of AI while mitigating its potential dangers. Each post, like his insightful comments in this interview, speaks to Jason’s larger goal of preserving (and likely augmenting) the clinician-patient relationship, what he describes as “the basic unit of healthcare.” CHARGE thanks Jason for his thoughtful participation in this interview.

Thank you for joining us, Jason. You spent decades in analytics before AI became the industry's obsession. Looking back, what changed – and what didn't – as healthcare moved from predictive/data analytics to generative AI?

Jason: Ha - a long time, happy to answer that. First of all, I really appreciate the opportunity to chat with CHARGE. I was doing AI in healthcare back in the mid-nineties before it was really cool. A lot has changed over 30 years. We have far greater computational capabilities now, especially around unstructured data. Knowledge creation is so quick today that for a clinician to practice at the top of their license, they would have to read hours and hours a day. Clinical decision support has become a necessity.

But the bigger thing is AI governance. Robust governance is starting to address the "trust gap" between developers, providers, and patients. We need frameworks to decide where to place our bets: guardrails that aren't so bureaucratic they crush innovation, but not so light that we incur undue risk. Without trust, AI will lose every time.

On that point, you've written extensively about trust for AI deployment: between patients and providers, payers and members, providers and AI. Can you define trust and describe what it looks like? How can healthcare leaders cultivate trust as they implement AI systems?

Jason: First and foremost, trust is built on a foundation of mutual consent. Patients and professionals should consent to AI assistance, whether it's ambient listening in an EMR or a decision support system. That requires workforce development; professionals must understand why we're using AI. If it’s deployed without understanding, it immediately puts people back on their heels.

The other pillar is tool choice. First, if a good old spreadsheet will solve your problem, why use AI? You’re over-clubbing it. Second, human-in-the-loop is vital. The basic unit of healthcare is the physician-patient relationship. I’m a big believer that, in the majority of cases, AI is not mature enough to operate independently.

There has been a proliferation of no-human-in-the-loop sandboxes, like in Utah, where Legion Health and Doctronic offer diagnostic consultation and autonomous refills. You just stated that, broadly, the technology is not there yet. Are there instances where the technology is ready for autonomy?

Jason: I don't want to get my medical advice from ChatGPT or Claude. I still want a great relationship with my physician and with my GI doc. The complex things still require an interpersonal relationship.

But it's not too early in every case. Take automated pharmacy refills. If I receive an “it’s time for your refill” text, I’ll reply yes. And why not, right? I’m comfortable knowing a human isn't in the loop on the automatic refill. That is, until it comes time for delivery, where I still anticipate a person visually confirming the pill bottle. Can we anticipate that visual confirmation coming from machines? In our food supply chain, optical recognition filters my produce, and I feel comfortable not seeing a human filter it. We’ll get there. There are a whole host of administrative areas where we can reduce the experience burden and create efficiencies at scale.

Going back to the question of trust: when you spoke about workforce development and training, you mentioned that we need to tell healthcare workers what exactly the AI is gonna help with. There's an industry-wide fear right now that AI deployment is going to replace and degrade human skills, especially skills needed for fallback procedures in the case of system failures. As an AI leader, what skills do you think are becoming more important in the AI era? What traditional skills are proving sticky?

Jason: When people ask, “Is AI going to take my job?” I love to respond: “AI won't take your job, but someone who understands how to leverage AI as a tool may take your job.” To your point about system failures – whether a system is down, or we lose power, or god forbid we’re in a wartime situation – healthcare providers will always need fundamental fallback skills. We can't forget how to do what we've been doing forever without AI.

When hiring, it's not about coding skills; those can be taught. I look for three fundamental skills:

1. Problem Solving: To understand the true business problem that someone’s bringing to you – and leaders rarely present their exact problem – you need to peel back multiple layers of the onion. That requires consultative capabilities and the problem solving skills to dig.

2. Creativity: You can't rely on ChatGPT for your creativity. Humans are so immensely creative. Whether I’m interviewing someone that’s doing coding or architectural work, I’m still looking for creativity in our conversations.

3. Storytelling: I don’t mean marketing – I love to say, “tortured data will confess to anything,” and I’m not interested in torturing data. I mean imbuing a call to action. If you can’t convey your great analytics work to stakeholders, you’ve wasted your time.

What is the biggest difference between how payers and providers deploy, govern, and regulate AI?

Jason: It comes down to different business models and reimbursement. Payers deal with massive claims data, but they may not see the full picture if a patient pays out-of-pocket. That’s, by the way, similar on the provider side: networks don’t always share data.

At the end of the day, each business lens is for the same purpose. For providers, it’s about patients. For payers, it’s about members. So, from a governance model perspective – how you prioritize things, the right guardrails you put in place, even acceptable use policies and things of the like – I think those actually can be very similar.

On the point of reimbursements and losing information, prior authorization has become one of the most visible applications of AI on the payer side, and one of the most politically fraught. Just last week, a House Appropriations committee halted funding for CMS’s new WISeR model. What do you think responsible use of AI in utilization management looks like? Do you think the industry has gotten it wrong in any way?

Jason: I've personally had a chronic disease for over 30 years, so I've walked every hallway of the US healthcare system. Prior authorization is broken, but not because it's bad policy to ensure care is medically necessary. It's broken because we make it painful for members and providers. We can solve it two ways:

1. Data Interoperability: In countries with a lifelong medical record number, professionals can see your longitudinal health record and know you've been on a specialty medication for 15 years. They can provide you holistic care that’s not disjointed from the cradle to the grave. In the US, if you switch employers (thus insurers), your record is lost.

2. Portable Health Records: Health records shouldn’t be proprietary to provider systems. I should be able to walk around with an encrypted smart card or thumb drive containing my verified information so that when the time comes I could simply hand a professional my trusted, verified information. US law says you can access your health record, but you have to ask for it. The default position should be that the data is ours to own 24/7 without needing to ask.

You’ve argued that leaders shouldn’t respond with denial when patients use AI. What are payers and providers learning from the reality of "Dr. GPT"?

Jason: It’s just another leg of the information curve. Wind the tape back to the fifties: we had the Dewey Decimal System. Absent the internet, people walked into the library. Eventually, we went from the Dewey Decimal System to Dr. Google, and providers went, “Oy vey, patients are printing out notes!” Now we’ve moved to Dr. GPT.

I think that's a great thing. Don't you want your patients to be educated? Yes, it's fraught with potential error and what I call decision frustration. All I would ask, as a healthcare professional, is let me leverage my expertise to provide a full differential diagnosis before you demand a certain prescription. But I think in the majority of cases, all of us as patients are simply trying to arrive informed and make the most of the admittedly very limited time we have face to face with our providers. And if it makes us more efficient and therefore leads to a deeper human connection, I'm all for it.

Does AI resemble previous technical revolutions, or is it unique?

Jason: I don't think it's unique. Look at the aerospace industry. For biplanes, everything was manual. Jump to the early nineties: planes had GPS and radar. Today, planes fly-by-wire with autopilot and turbulence reduction. Look at what SpaceX accomplished, landing reusable vehicles upright in a pitching ocean. All of this tech provided for safer flight and less pilot burden.

We've come a long way in aerospace in terms of efficiency, safety, and experience. I think that parallels will be drawn to healthcare for all the same reasons: efficiency, patient safety, better provider experience, and reduction in provider burnout.

We are having a particular, oftentimes politically fraught, AI moment, especially as it pertains to integration in healthcare. Do you see an out for this? Do you think it's just gonna continue to become a potentially toxic political issue? Or will it resemble the obsession we had with the space race that tapered off over time?

Jason: Back in the mid-nineties, I did a 10-year stint in aerospace working with the International Space Station and autonomous spacecraft. In aerospace, we have robust, federally recognized standards organizations like the FAA, IEEE, ANSI, etc. I could go on and on. These standards have built a structure for aerospace development – look at what SpaceX or Blue Origin or Boeing have done from an innovation perspective.

Right now, AI in healthcare is the Wild West. At last count, there are about 250 different laws and regulations in the US alone, managed on a state-by-state basis. This tamps down innovation because of legal risk. National enterprises don’t want to get on the wrong side of a state legislature or a regulatory body, and so they might just bow out. We need a unified federal model. I named a bunch of standards organizations that are federally recognized by both commercial aeronautic entities and the government. We all use them. That’s the framework.

Is there one starting point you're looking at that we could build on to create a clear, unambiguous, and thoughtful national framework for AI regulation?

Jason: There are two organizations, to my knowledge, that are down the path from a maturity perspective on this. One is CHAI, the Coalition for Health AI. I’ve actually read a fair amount of the Coalition for Health AI's materials, and I think the concepts and the materials are very worthwhile as a starting point. The other is HL7, which has an artificial intelligence office, and they're starting to also think about AI standards from an interoperability and data interchange perspective. Perspectives on governance and technical application of standards are very important right now.

The National Academy of Medicine has also published a purple book, which I have no commercial interest in, called An Artificial Intelligence Code of Conduct for Health and Medicine. Healthcare leaders need to be reading this and considering it. To answer your question, I think we're still very much in a nascent stage of standardized AI governance and regulations. We're just not there yet. It's a very complex fabric in the United States at least, and it actually makes it much more difficult for organizations to do their work.

Jason, thank you so much for meeting with us.

Jason: Thank you, I appreciate it.

CHARGE Potential Job Roundup 07/02/2026

CHARGE Potential aggregates job listings across AI governance to trace the professional direction of the industry. Potential is the energy per unit charge required to move a charge through a field. At CHARGE, we believe that governance is interpersonal, and that professionals deliver the energy to transform the field.

OpenAI – Product Policy, Biosecurity Policy Manager

San Francisco, CA – hybrid, full-time.

$261k/yr - $290k/yr

OpenAI’s product policy team develops and implements governance policies for OpenAI’s suite of services, including ChatGPT, Codex, GPTs, and the OpenAI API. As OpenAI expands its operations into biological and clinical applications (this past week, the AI giant launched GPT-5.5 Instant as an improved delivery system for health information), its liability surface for clinically sensitive information expands. Biosecurity Policy Managers will help define how OpenAI enables biomedical research while mitigating what it defines as “biological harm.” Whether that “harm” refers to biological testing, safe custodianship of privileged clinical records, or the per se biological risk of erroneous health advice remains to be seen. It is clear, however, that OpenAI aims to entrench itself in health care.

Mayo Clinic – Senior Director for AI Implementation and Adoption

Rochester, MN – in person, full-time.

Mayo Clinic – Director: AI Governance Technologies

Rochester, MN – in person, full-time.

Mayo Clinic – Director: AI Validation and Monitoring

Rochester, MN – in person, full-time.

Mayo Clinic – Director: AI Governance Operations

Rochester, MN – in person, full-time.

We originally intended to highlight these four roles at Mayo Clinic, each posted on June 15th, as evidence of the growing demand for AI governance directors across major American care networks. As of June 25th, Mayo Clinic has closed applications for all four positions and removed them from its website, a remarkably fast turnaround for senior leadership positions. The speed of the application cycle suggests that demand for AI governance is bidirectional: hospitals need governance experts, professionals want to fill those roles, and the field moves quickly for professionals. As major health providers continue to implement AI technologies, and as those technologies overlap with increasing complexity, expect fast-moving application cycles to replicate across the industry.