
GiveCare Beta Retrospective: User Feedback, Data Insights, and Expert Advice

Reflecting on the GiveCare beta experience, highlighting key learnings from users, systematic evaluations, and expert insights on refining our AI-powered caregiving assistant.

GiveCare Team
Contributing Writer
5 min read

Beta Retrospective

As the GiveCare beta period wraps up (October–December 2025), we want to reflect on what we learned, celebrate successes, acknowledge challenges, and share how we're moving forward. The past two and a half months have been instrumental in shaping GiveCare, our AI-powered caregiving assistant, through direct user interactions, data-driven evaluations, and expert conversations.

Understanding Our Users: What We Learned

The beta involved 10 dedicated users who engaged with GiveCare regularly over SMS. Their feedback highlighted several areas where GiveCare excelled and pointed out clear opportunities for improvement:

Valuing Empathy and Personalization

Users consistently expressed appreciation for GiveCare’s empathetic interactions. Verbatim examples include:

  • "It's great you're considering new home health support."
    (Reflecting our assistant’s ability to validate and support user decisions effectively.)

  • "Focusing on music and simple stories can be wonderful."
    (Demonstrating practical caregiving advice tailored to specific situations.)

Caregivers were often grateful for the assistant's emotional sensitivity and its timely, empathetic responses, and they urged us to keep strengthening these capabilities.

Key Observations from the Data

Systematic evaluations during the beta period focused on key interaction metrics. Here’s what the data revealed:

  • Coherence: Responses were logically organized, averaging 3.72 out of 5.
  • Fluency: Interactions read naturally, scoring approximately 3.92 out of 5.
  • Groundedness: Responses stayed grounded in the context users provided, averaging 4.0 out of 5 and reflecting GiveCare’s strong contextual understanding.
  • Relevance: Responses averaged 3.22 out of 5 in accurately addressing queries, making this the clearest area for improvement.

Safety evaluations consistently returned "Very Low" risks for violence and self-harm across all interactions, underscoring GiveCare's safe and appropriate interaction design.
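
For readers curious about the mechanics, here is a minimal sketch of how per-interaction scores roll up into the averages above. It assumes each conversation is scored 1 to 5 on every dimension; the field names and sample values are illustrative placeholders, not our production evaluation schema.

```python
from statistics import mean

# Illustrative per-interaction scores on a 1-5 scale. The real beta
# covered far more conversations; these rows are placeholders.
interactions = [
    {"coherence": 4, "fluency": 4, "groundedness": 4, "relevance": 3},
    {"coherence": 3, "fluency": 4, "groundedness": 5, "relevance": 3},
    {"coherence": 4, "fluency": 4, "groundedness": 3, "relevance": 4},
]

# Average each metric across all scored interactions.
for metric in ("coherence", "fluency", "groundedness", "relevance"):
    print(f"{metric}: {mean(i[metric] for i in interactions):.2f} / 5")
```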

Challenges and Lessons Learned

One theme kept surfacing: automation helps, but it can't replace human empathy. Users liked GiveCare's automated resource recommendations. But in harder moments — grief, confusion, fear — they wanted a person. Or at least an AI that acknowledged its limits.

Transparency mattered more than we expected. Caregivers trusted the assistant more when it explained why it suggested something. "Because you mentioned X" landed better than unexplained recommendations.

Expert Insights from Hamal Hussein

A complementary perspective came from a conversation with Hamal Hussein, an expert in AI evaluations, who provided strategic guidance on effectively evaluating and improving GiveCare:

  • Bottom-Up Evaluation: Hussein advised starting from real user interactions rather than theoretical evaluations, stating:

    "Evaluate each interaction individually within the broader session context."

  • Error Analysis Matters Most: He highlighted the need to clearly categorize and directly analyze the errors observed in user interactions (see the sketch below), stating:

    "You don't even need to do evaluations initially. Sometimes the most effective approach is to look closely at user interactions and fix obvious problems immediately."

  • Balance Automation with Human Elements: Hussein affirmed the importance of maintaining a clear boundary where AI supports rather than replaces critical human connections, echoing our user feedback.

This guidance significantly clarified our evaluation strategy moving forward.
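
To make that advice concrete, here is a rough sketch of bottom-up error analysis in the spirit Hussein describes: read real transcripts, tag each failure with a category, and let the counts decide what to fix first. The category names and conversation IDs are hypothetical, not our actual taxonomy.

```python
from collections import Counter

# Hypothetical error tags assigned while reading real transcripts;
# each entry is (conversation_id, error_category).
tagged_errors = [
    ("conv-012", "missed_context"),
    ("conv-031", "generic_advice"),
    ("conv-044", "missed_context"),
    ("conv-058", "no_rationale_given"),
    ("conv-071", "missed_context"),
]

# Surface the most common failure modes first.
for category, count in Counter(cat for _, cat in tagged_errors).most_common():
    print(f"{category}: {count}")
```

The point is priority, not precision: whichever category tops the list is where fixes pay off fastest.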

Integrating User Feedback and Expert Guidance: A Balanced Approach

Taking into account both direct user experiences and Hussein’s recommendations, we identified clear strategies to refine GiveCare:

  • Better personalization — refine models to improve response relevance. Users noticed when we got it wrong.
  • Emotional intelligence — the hardest part. Recognizing grief, fear, exhaustion. We're not there yet.
  • Show our work — explain why we suggest what we suggest (see the sketch after this list). Trust comes from transparency.
  • Fix what's broken first — Hussein's advice: skip the metrics, look at actual conversations, fix obvious problems.
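
To illustrate the "show our work" idea, a recommendation can carry an explicit rationale tied to something the user actually said, in the "Because you mentioned X" shape that landed well during the beta. This is a sketch of the pattern, using a hypothetical Recommendation type rather than our production message format.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    suggestion: str
    mention: str  # the user detail that triggered this suggestion

    def render(self) -> str:
        # Pair every suggestion with the reason it was made.
        return f"{self.suggestion} (Because you mentioned {self.mention}.)"

rec = Recommendation(
    suggestion="A local respite-care program might give you a breather this week",
    mention="feeling stretched thin on weekends",
)
print(rec.render())
```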

Practical Takeaways and Moving Forward

The beta taught us where we're strong (safety, fluency, basic empathy) and where we're weak (relevance, emotional depth). Ten users. Hundreds of conversations. Enough to know what to fix.

What's next:

  • Keep the feedback loop open — users drive the roadmap
  • Analyze real conversations before running metrics
  • Don't break what works: safety risk ratings stayed "Very Low" throughout

Gratitude and Next Steps

Thank you to our beta users. You were honest about what worked and what didn't. That honesty shapes everything we build next.

And thanks to Hamal Hussein for the reality check on evaluation — sometimes the best approach is the simplest: look at the conversations, fix the problems.


Were you part of our beta or have additional feedback? Reach out at info@givecareapp.com.