90% model accuracy means nothing if frontline teams override 70% of recommendations. This example illustrates how an AI Adoption Lab can validate whether call centre staff in a motor insurance claims business will trust and act on the recommendations presented by their next-best-action (NBA) engine.
Are the recommendations understood?
Do staff trust and agree with the recommendations?
Are recommendations actioned correctly?
In this business, call centre staff often spent around 2–3 minutes reviewing case history and internal processes before deciding what to do next. This meant long periods with customers on hold, increased average call handling times, and high levels of frustration. For newer staff, the complexity of deciding on the next best action also led to inconsistency and rework by more experienced staff involved in the claim downstream.
In theory, the NBA engine promised to reduce decision time, improve consistency and guide staff through optimal claim pathways. But accurate recommendations would mean nothing if agents didn't trust or act on them.
Initial designs used a clear 'reason why' to help agents understand why each recommendation was being suggested. Links to the activity history would make it easy to check that the reason was true, with options to flag a recommendation as incorrect if needed.
A button would make it easy for agents to follow the recommended action if they wanted to.
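As a rough illustration of that design, the recommendation and the agent's response might be modelled as two small records. The field names below are assumptions made for this sketch, not the engine's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Recommendation:
    """One next-best-action suggestion as presented to an agent.

    Field names are illustrative assumptions, not the engine's real schema.
    """
    claim_id: str
    action: str                 # e.g. the suggested next step on the claim
    reason: str                 # the plain-language 'reason why' shown to the agent
    evidence_links: list[str] = field(default_factory=list)  # links into the activity history
    confidence: float = 0.0     # shown or hidden depending on the design variant

@dataclass
class AgentResponse:
    """How an agent responded to a single recommendation."""
    claim_id: str
    accepted: bool              # used the 'follow recommendation' button
    flagged_incorrect: bool = False
    flag_reason: str | None = None  # free-text reason when a flag is raised
```

Keeping the reason, the evidence links and the flag in one payload is what lets the later experiments tie each observed behaviour back to a specific recommendation.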
Rather than relying on interviews or surveys, the AI Adoption Lab ran a sequence of small, focused, behaviour-based experiments using working prototypes on realistic test data, each designed to explore a specific adoption challenge and progressively shape the solution:
Trust and understanding
Do agents understand why a recommendation is being made? Experiments compare different explanation styles, levels of detail, and confidence cues to see which designs increase acceptance without slowing decisions.
Taking action
Do recommendations prompt the expected action? Data shows what actions were taken and from which screens, e.g. using the follow button or finding a workaround (see the event-logging sketch after this list).
Flagging errors and disagreement
What happens when the AI is wrong? Data shows when agents follow incorrect recommendations, and when errors are flagged, along with the reasons why.
Seeking further information
When do agents need more context before acting? Different claim profiles highlight when additional information is sought, when it isn't, and what might increase agent autonomy in complex claims.
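Below is a minimal sketch of the behavioural event log these experiments could rely on, assuming a simple JSON-lines sink. The event names, fields and metrics are illustrative, not the lab's actual instrumentation.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json

# Event types the experiments care about (an assumed taxonomy, not the lab's real one).
RECOMMENDATION_SHOWN = "recommendation_shown"
ACTION_TAKEN = "action_taken"              # via the follow button or a workaround screen
RECOMMENDATION_FLAGGED = "recommendation_flagged"
CONTEXT_OPENED = "context_opened"          # agent drilled into the activity history first

@dataclass
class AdoptionEvent:
    event_type: str
    agent_id: str
    claim_id: str
    screen: str                            # which screen the event came from
    detail: dict = field(default_factory=dict)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_event(event: AdoptionEvent, sink) -> None:
    """Append one event as a JSON line to the experiment's event sink."""
    sink.write(json.dumps(asdict(event)) + "\n")

def summarise(events: list[AdoptionEvent]) -> dict:
    """Crude adoption metrics: how often recommendations are followed via the
    button, worked around on another screen, or flagged as incorrect."""
    shown = sum(e.event_type == RECOMMENDATION_SHOWN for e in events)
    followed = sum(e.event_type == ACTION_TAKEN and e.detail.get("via") == "follow_button"
                   for e in events)
    workaround = sum(e.event_type == ACTION_TAKEN and e.detail.get("via") != "follow_button"
                     for e in events)
    flagged = sum(e.event_type == RECOMMENDATION_FLAGGED for e in events)
    return {
        "recommendations_shown": shown,
        "acceptance_rate": followed / shown if shown else 0.0,
        "workaround_rate": workaround / shown if shown else 0.0,
        "flag_rate": flagged / shown if shown else 0.0,
    }
```

In practice these figures would be cut by claim profile, design variant and agent experience to show where trust and understanding break down.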
Each experiment is designed with a clear behavioural hypothesis, measurable success criteria, and simple mechanisms such as Strategyzer Test Cards (as shown) to capture results consistently across tests.
Designs are iterated and retested until behaviour stabilises or value is disproven.
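As an illustration of how a test card might be captured consistently across experiments, the sketch below mirrors the Strategyzer Test Card structure (hypothesis, test, metric, success criteria). The example content is invented for illustration rather than taken from the actual lab.

```python
from dataclasses import dataclass

@dataclass
class TestCard:
    """One experiment, captured in the spirit of a Strategyzer Test Card.

    The field structure follows the card format; the example values below
    are illustrative only.
    """
    hypothesis: str        # the behavioural belief being tested
    test: str              # the cheap, fast experiment that will test it
    metric: str            # what gets measured
    success_criteria: str  # the threshold that counts as validation
    result: str | None = None
    validated: bool | None = None

# Illustrative card for the 'trust and understanding' experiment.
explanation_card = TestCard(
    hypothesis="Agents will accept recommendations more often when a plain-language "
               "'reason why' is shown alongside the suggested action.",
    test="Run the prototype with and without the reason text on matched sets of test claims.",
    metric="Acceptance rate (follow-button clicks / recommendations shown) and decision time.",
    success_criteria="Acceptance rises measurably without increasing average decision time.",
)
```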