Interruption Handling

“Wait—” and the agent stops.

Convexa agents yield the instant the caller starts speaking. No 2-second tail, no “I’m sorry, can you repeat that?”, no awkward dance. And when the caller is done, the agent picks up the right thread.


Live call · barge-in handled at 1.6 s

agent: Mira v22 · target yield latency: 120 ms

AGENT “For the Thursday slot we have 2:15 with Sam or 4:30 with Riv…”

Caller starts at 1.6 s · agent yields in 118 ms

CALLER “Wait — is Sam available later, like 6?”

AGENT “Let me check — Sam has 6:15 open. Want that?”

Why this is hard

“Yield instantly” and “don’t yield to a cough” are the same problem.

Naive voice activity detection (VAD) yields to every breath, crying baby, or background TV. Our team trains and tunes the interrupter on your managed AI voice agents so they can tell intent, meaning the caller’s voice directed at the agent, apart from ambient noise.

Speaker-aware VAD

Identifies the primary speaker from the first 200 ms of speech, then ignores cross-talk from other voices. Babies, dogs, TVs: filtered.
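A minimal sketch of the enrollment idea, assuming each voiced 30 ms frame has already been turned into a speaker embedding by some model (not shown, and not necessarily how Convexa does it). Only the 200 ms enrollment window comes from the text; the `SpeakerGate` class, the centroid averaging, and the 0.75 similarity floor are hypothetical.

```python
import math
from typing import List, Optional

FRAME_MS = 30            # assumed frame size, matching the 30 ms VAD windows
ENROLL_MS = 200          # primary speaker is enrolled from the first 200 ms of speech
SIMILARITY_FLOOR = 0.75  # hypothetical threshold, would be tuned per deployment

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SpeakerGate:
    """Enrolls the first speaker heard, then rejects frames from other voices."""

    def __init__(self) -> None:
        self.enrolled: List[List[float]] = []
        self.centroid: Optional[List[float]] = None

    def accept(self, embedding: List[float]) -> bool:
        """Return True if this voiced frame should count as the primary speaker."""
        if self.centroid is None:
            # Still enrolling: assume these early frames are the caller.
            self.enrolled.append(embedding)
            if len(self.enrolled) * FRAME_MS >= ENROLL_MS:
                n = len(self.enrolled)
                self.centroid = [sum(col) / n for col in zip(*self.enrolled)]
            return True
        # After enrollment, only voices close to the enrolled centroid pass.
        return cosine(embedding, self.centroid) >= SIMILARITY_FLOOR
```

Seven 30 ms frames cover the 200 ms window. After that, a frame whose embedding points away from the enrolled centroid (a TV voice, say) is ignored instead of being treated as a barge-in.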

Intent gating

Tells “Wait—” (an intentional barge-in) apart from “Mhm” (a backchannel). The agent keeps talking through backchannels and yields on real interruptions.
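The gate can be pictured as a score over the caller’s partial transcript, compared against a threshold. The 0.62 cutoff is the figure given for the “Possible interrupt” state on this page; the keyword-based `intent_score` is a hypothetical stand-in for a learned model, and the word lists are illustrative only.

```python
import re

INTENT_THRESHOLD = 0.62  # threshold quoted for the "Possible interrupt" state

# Illustrative word lists; a production gate would use a trained classifier.
BACKCHANNELS = {"mhm", "mm", "uh-huh", "yeah", "ok", "okay", "right", "sure"}
BARGE_IN_OPENERS = {"wait", "stop", "hold", "actually", "sorry", "no"}

def tokenize(text: str) -> list:
    """Lowercase and keep runs of letters, hyphens, and apostrophes."""
    return re.findall(r"[a-z'-]+", text.lower())

def intent_score(partial_transcript: str) -> float:
    """Hypothetical stand-in for a learned intent model, returning 0..1."""
    words = tokenize(partial_transcript)
    if not words:
        return 0.0
    if all(w in BACKCHANNELS for w in words):
        return 0.1  # pure backchannel: keep talking
    if words[0] in BARGE_IN_OPENERS:
        return 0.9  # explicit barge-in cue
    # Longer speech aimed at the agent is more likely a real turn.
    return 0.5 + 0.05 * min(len(words), 5)

def should_yield(partial_transcript: str) -> bool:
    """Yield only when the score clears the gate threshold."""
    return intent_score(partial_transcript) > INTENT_THRESHOLD
```

With this toy scorer, “Mhm” or “Mhm, yeah.” stays below the threshold while “Wait—” clears it immediately.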

Resume bookmarks

When the caller is done, the agent picks up the unfinished thought without restarting the whole sentence.
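One way to sketch a resume bookmark, assuming the speech engine reports word-level timestamps: keep the words spoken in the 2 seconds before the yield, then resume from the last clause boundary inside that window. The `Word` record, the punctuation-based clause rule, and the function names are illustrative assumptions, not Convexa’s implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple

BOOKMARK_WINDOW_S = 2.0           # last 2 s of agent output are bookmarked
CLAUSE_ENDINGS = (",", ";", ":")  # assumed places where resuming sounds natural

@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the agent's utterance
    end: float

def bookmark_range(words: List[Word], yield_time: float) -> Tuple[int, int]:
    """Index range [first, cut) of words overlapping the 2 s before the yield."""
    idx = [i for i, w in enumerate(words)
           if w.end > yield_time - BOOKMARK_WINDOW_S and w.start < yield_time]
    return (idx[0], idx[-1] + 1) if idx else (0, 0)

def resume_text(words: List[Word], yield_time: float) -> str:
    """Resume at the last clause boundary inside the bookmark, not the sentence start."""
    first, cut = bookmark_range(words, yield_time)
    i = cut
    # Walk back from the cut-off word until a clause boundary or the bookmark start.
    while i > first and not words[i - 1].text.endswith(CLAUSE_ENDINGS):
        i -= 1
    return " ".join(w.text for w in words[i:])
```

If the agent is cut off mid-clause, it resumes at the start of that clause; if no boundary falls inside the 2-second window, it resumes from the start of the bookmark.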

Inside the loop

A turn, broken down.

Four states. Constant transitions. We tune the thresholds to your callers, and the yield path from speech onset to silence still completes in under 120 ms.

STATE 01

Speaking

The agent is producing audio while listening in 30 ms windows for speech onset above a learned floor.

VAD · every 30 ms

STATE 02

Possible interrupt

Speech detected. The agent softens its output by 8 dB and runs the intent gate. If the intent score exceeds 0.62, it transitions to Yielding.

gate · 80 ms

STATE 03

Yielding

Audio output fades to silence in 40 ms. The last 2 seconds of agent output are bookmarked for resume.

total · 118 ms

STATE 04

Listening & planning

Streaming ASR transcribes the new utterance while the agent reasons about how to respond, usually with about 240 ms of think time.

think · 240 ms
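The four states can be sketched as a small per-window state machine. The constants (30 ms windows, 8 dB ducking, the 0.62 intent threshold, the 40 ms fade) are the figures quoted above; the `TurnLoop` class, its method names, and the per-window linear fade are hypothetical simplifications. Because this sketch only advances once per 30 ms window, its 40 ms fade spans two windows; a real implementation would ramp gain per sample.

```python
from enum import Enum, auto
from typing import Optional

class TurnState(Enum):
    SPEAKING = auto()            # STATE 01
    POSSIBLE_INTERRUPT = auto()  # STATE 02
    YIELDING = auto()            # STATE 03
    LISTENING_PLANNING = auto()  # STATE 04

WINDOW_MS = 30           # VAD window while the agent is speaking
DUCK_DB = 8.0            # output softened by 8 dB while the intent gate runs
INTENT_THRESHOLD = 0.62  # yield when intent > 0.62
FADE_MS = 40             # fade-to-silence duration when yielding

def db_to_gain(db: float) -> float:
    """Convert an attenuation in dB to a linear amplitude multiplier."""
    return 10 ** (-db / 20)

class TurnLoop:
    """Hypothetical per-window driver for the four turn-taking states."""

    def __init__(self) -> None:
        self.state = TurnState.SPEAKING
        self.gain = 1.0
        self._fade_from = 0.0
        self._fade_left_ms = 0.0

    def step(self, speech_onset: bool = False, intent: Optional[float] = None) -> TurnState:
        """Advance one 30 ms window; `intent` is None until the gate has a score."""
        if self.state is TurnState.SPEAKING:
            if speech_onset:
                self.state = TurnState.POSSIBLE_INTERRUPT
                self.gain = db_to_gain(DUCK_DB)  # duck to about 0.40 amplitude
        elif self.state is TurnState.POSSIBLE_INTERRUPT:
            if intent is not None and intent > INTENT_THRESHOLD:
                self.state = TurnState.YIELDING
                self._fade_from = self.gain
                self._fade_left_ms = FADE_MS
            elif intent is not None:
                # Backchannel ("Mhm"): restore full volume and keep talking.
                self.state = TurnState.SPEAKING
                self.gain = 1.0
        elif self.state is TurnState.YIELDING:
            # Linear fade from the ducked gain to silence, one window at a time.
            self._fade_left_ms = max(0.0, self._fade_left_ms - WINDOW_MS)
            self.gain = self._fade_from * self._fade_left_ms / FADE_MS
            if self._fade_left_ms == 0.0:
                self.state = TurnState.LISTENING_PLANNING
        return self.state

    def response_ready(self) -> None:
        """Planner is done (STATE 04 back to STATE 01): speak again at full volume."""
        if self.state is TurnState.LISTENING_PLANNING:
            self.state = TurnState.SPEAKING
            self.gain = 1.0
```

Ducking by 8 dB multiplies amplitude by 10^(-8/20), roughly 0.40, so the agent audibly backs off while the gate decides, and snaps back to full volume if the speech turns out to be a backchannel.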

The recovery moment

What the agent does next is where it sounds human.

Same interrupt during an AI appointment booking call. Two different handlers. The right one feels effortless; the wrong one is why people hate voice bots.

Naive yield · what most bots do

AGENT “For the Thursday slot we have 2:15 with Sam or 4:30 with…”

CALLER “Wait — is Sam available later, like 6?”

AGENT “I’m sorry, can you repeat your question?”

CALLER (audibly frustrated) “Is. Sam. Available. At. Six.”

Convexa · bookmarked + intent-aware

AGENT “For the Thursday slot we have 2:15 with Sam or 4:30 with…”

CALLER “Wait — is Sam available later, like 6?”

AGENT “Let me check Sam — yes, 6:15 is open. Want that one?”

CALLER “Yeah, that’s great.”

What good turn-taking looks like.

~120 ms

Typical yield latency we target, from the caller’s first phoneme to silenced agent output.

~99%

Share of caller-initiated interrupts handled cleanly, without “sorry, can you repeat that?”

<1%

False-yield rate we tune toward — the agent stopping when nobody actually interrupted.
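To check a deployment against these targets, the latency and false-yield figures can be computed from labeled call logs. This sketch assumes a hypothetical `YieldEvent` record in which a reviewer has marked whether each yield followed a genuine interrupt. The page does not say which denominator the false-yield rate uses, so dividing by all yields here is an assumption.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

@dataclass
class YieldEvent:
    onset_ms: float       # caller's first phoneme
    silent_ms: float      # agent output fully silenced
    real_interrupt: bool  # reviewer label: did the caller actually mean to interrupt?

def yield_latencies(events: List[YieldEvent]) -> List[float]:
    """First-phoneme-to-silence latency, for genuine interrupts only."""
    return [e.silent_ms - e.onset_ms for e in events if e.real_interrupt]

def median_latency(events: List[YieldEvent]) -> Optional[float]:
    """Median yield latency, or None when there are no genuine interrupts."""
    latencies = yield_latencies(events)
    return median(latencies) if latencies else None

def false_yield_rate(events: List[YieldEvent]) -> float:
    """Share of yields where nobody actually interrupted (assumed denominator: all yields)."""
    if not events:
        return 0.0
    return sum(not e.real_interrupt for e in events) / len(events)
```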

Discover more features

human by design.

We tune interruption handling on every Convexa agent we run for you — no flag, no toggle.


Pair it with these.

Voice Agents

Multilingual

Sentiment Analysis