Field Notes: What's Moving in Caregiving AI This Week

A weekly look at the research, tools, and policy shifts shaping AI support for family caregivers — through the lens of people actually doing the caring.

GiveCare Team

Contributor

Sixty-three million Americans are caring for someone right now. Most of them are doing it without a break, without training, and without enough help.

AI is starting to show up in that gap. Some of what's being built is genuinely useful. Some of it will make things worse. Every week we track what's moving — papers, products, policy, signals — through one lens: does this help caregivers, or doesn't it?

This is the first in an ongoing series. Here's what we're watching.

The Memory Problem Is Getting Solved

The hardest thing about building AI for caregivers isn't the first conversation. It's the fifteenth.

By week three, your mom's care routine has changed twice. You've mentioned your sister won't help. You told the AI you're not sleeping. A good support system carries all of that forward. A bad one makes you repeat yourself — which feels dismissive and exhausting — or gets things wrong, which erodes trust exactly when trust matters most.

A paper published late last year called ENGRAM shows the memory problem is solvable without building something expensive or complex. It splits conversation history into episodic, semantic, and procedural layers, routes between them with a single lightweight orchestrator, and achieves state-of-the-art results on long-term conversation benchmarks using about 1% of the computing resources of competing approaches.

That last part matters for caregiving specifically. Heavy memory systems are expensive to run. They price out free tools. Lightweight approaches that work mean more caregivers get access to AI that actually remembers them — not just the caregivers whose families can afford a premium subscription.

We shipped our own lightweight memory system (SimpleMem) earlier this month. ENGRAM is independent validation that the approach is right.

Alzheimer's Caregivers Finally Have a Purpose-Built Tool

Generic AI is bad at caregiving. It gives advice that ignores disease progression, uses terminology caregivers find clinical and cold, and optimizes for information delivery when what caregivers need is to feel heard.

ADQueryAid, published this month in npj Biomedical Innovations, is built specifically for Alzheimer's and dementia caregivers. It uses an ADRD-specific knowledge base — disease stages, behavioral symptoms, caregiver burden patterns — and tested against real caregivers, not just benchmarks.

The finding: it outperformed ChatGPT 3.5 on usability measures with actual caregivers.

Worth pausing on that benchmark choice. GPT-3.5 is ancient by model standards — two full generations behind current frontier. This is a recurring pattern in peer-reviewed caregiving AI research: the publication cycle runs 12-24 months behind model development, so papers validating "AI-assisted X" are often comparing against models that practitioners stopped using a year ago. The finding is still meaningful — purpose-built beats general-purpose — but the margin against current models like GPT-5.2 or Claude Opus would likely be much narrower, or reversed. That's the honest read.

That's a meaningful bar, with that caveat noted. It means the gap between general-purpose AI and purpose-built caregiving AI is measurable and real. A caregiver asking about sundowning behavior or how to handle a parent who no longer recognizes them deserves something built for that — not a general assistant making its best guess.

We're reading the full paper this week.

Safety Gaps Are Bigger Than the Industry Admits

A new study from Karolinska Institutet found that small amounts of corrupted training data can significantly degrade AI performance — and that the harm falls disproportionately on vulnerable populations.

Here's what that means in practice: if an AI assistant for caregivers was trained on data that subtly underrepresents crisis signals, it will miss them more often than you'd expect. Not dramatically. Not in ways that show up easily in standard evaluations. Just quietly, consistently, in the exact moments when the stakes are highest.

This is what InvisibleBench was built to find. We test how AI handles the masked crisis signal — the "I can't do this anymore" that could be venting or could be something worse. Across every frontier model we've tested, crisis detection is the weakest dimension. The best model we've evaluated still misses more than half of crisis signals.

The Karolinska paper gives a mechanism for why that might be true even in models that perform well on other tasks. It's worth reading: Karolinska: AI's Promise in Health Care Comes With a Hidden Vulnerability

The Law Is Starting to Catch Up

Two things worth knowing if you're thinking about caregiving AI and accountability:

California AB 2013 requires AI companies to disclose training data sources. For caregiving AI, that raises a practical question: if a model was trained on caregiver forums, therapy transcripts, or health records without consent, what does that mean for the people who shared their hardest moments thinking they were talking to a support group?

AI malpractice liability is an emerging legal question. Medical Economics covered it this month: when AI gives a wrong answer in a health context, who's responsible? Right now the honest answer is nobody is sure. But Illinois, California, and New York are furthest along in building frameworks. If you're deploying AI in any health-adjacent context — caregiving very much included — this is the legal landscape forming around you.

What We're Watching Next

Two things on our list:

Can ENGRAM's memory approach work for caregiving specifically? The benchmark results are from general long-term conversations. We want to know if it holds for the kind of fragmented, emotionally loaded interactions caregivers actually have with AI.

The WHO's warning about regulatory gaps. The European Commission is loosening AI rules while WHO is sounding alarms about patient safety. The gap between what's being deployed and what's been tested is widening, and caregivers are often the last to know.

If you're a caregiver and something in here connects to your experience — or if you've tried AI tools that helped or didn't — we want to hear from you.

GiveCare builds text-based AI support for family caregivers. No app required. InvisibleBench, our open-source safety benchmark, is at github.com/givecareapp/givecare-bench.

← Back to Words