“Wait—” and the agent stops.
Convexa agents yield the instant the caller speaks: no 2-second tail, no “I’m sorry, can you repeat that,” no awkward dance. And the agent picks up the right thread when the caller is done.
“Yield instantly” and “don’t yield to a cough” are the same problem.
Naive VAD yields to every breath, baby, or background TV. Our team trains and tunes the interrupter on your managed AI voice agents so they distinguish intent — the caller’s voice meant for the agent — from ambient noise.
Speaker-aware VAD
Identifies the primary speaker from the first 200 ms and ignores cross-talk after that. Babies, dogs, TVs — filtered.
Intent gating
Differentiates “Wait—” (intentional barge-in) from “Mhm” (backchannel). The agent keeps going on backchannels; yields on real interruptions.
Resume bookmarks
When the caller is done, the agent picks up the unfinished thought — without re-starting the whole sentence.
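One way to picture a resume bookmark is as a pointer into the sentence the agent was speaking when it yielded. The sketch below is illustrative only: the `Bookmark` structure, the word-level granularity, and the `resume_text` helper are assumptions made for this example, not a description of Convexa's internals.

```python
from dataclasses import dataclass


@dataclass
class Bookmark:
    """Where the agent was cut off (hypothetical structure)."""
    sentence: str       # full sentence the agent was speaking
    words_spoken: int   # words already played before the agent yielded


def resume_text(bookmark: Bookmark) -> str:
    """Return only the unspoken remainder of the interrupted sentence."""
    words = bookmark.sentence.split()
    return " ".join(words[bookmark.words_spoken:])


bm = Bookmark("Your appointment is confirmed for Tuesday at three", words_spoken=4)
print(resume_text(bm))  # for Tuesday at three
```

A production agent would likely resume at a natural phrase boundary instead of the exact word, but the idea is the same: continue the thought, not the whole sentence.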
A turn, broken down.
Four states. Constant transitions. We tune the thresholds to your callers — and the whole machinery still runs in under 120 ms.
Speaking
Agent is producing audio. Listens with 30 ms windows for speech-onset above a learned floor.
VAD · every 30 ms
Possible interrupt
Speech detected. Agent softens output by 8 dB and runs the intent gate. If the intent score exceeds 0.62, it transitions to yielding.
gate · 80 ms
Yielding
Audio out fades to silence in 40 ms. Last 2 seconds of agent output get bookmarked for resume.
total · 118 ms
Listening & planning
Streaming ASR on the new utterance. Agent reasons about how to respond — usually 240 ms of think time.
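The four states can be sketched as a small state machine. This is a minimal illustration, not Convexa's implementation: the 0.62 threshold and the 8 dB softening come from the description above, while the `State` enum, `next_state`, `duck_gain`, and their parameters are hypothetical names chosen for the example.

```python
from enum import Enum, auto

INTENT_THRESHOLD = 0.62  # from the text: yield only if intent > 0.62
DUCK_DB = 8.0            # from the text: output is softened by 8 dB


class State(Enum):
    SPEAKING = auto()
    POSSIBLE_INTERRUPT = auto()
    YIELDING = auto()
    LISTENING = auto()


def duck_gain(db: float = DUCK_DB) -> float:
    """Linear amplitude multiplier for an attenuation of `db` decibels."""
    return 10 ** (-db / 20)


def next_state(state: State, speech_detected: bool = False,
               intent: float = 0.0, fade_done: bool = False) -> State:
    """Advance the turn-taking state by one step (illustrative only)."""
    if state is State.SPEAKING:
        return State.POSSIBLE_INTERRUPT if speech_detected else State.SPEAKING
    if state is State.POSSIBLE_INTERRUPT:
        # Low-scoring speech such as a backchannel ("Mhm") lets the agent continue.
        return State.YIELDING if intent > INTENT_THRESHOLD else State.SPEAKING
    if state is State.YIELDING:
        # Stay here until the audio fade-out has finished.
        return State.LISTENING if fade_done else State.YIELDING
    return State.LISTENING


s = next_state(State.SPEAKING, speech_detected=True)  # POSSIBLE_INTERRUPT
s = next_state(s, intent=0.9)                         # YIELDING
s = next_state(s, fade_done=True)                     # LISTENING
```

An 8 dB cut corresponds to multiplying the output amplitude by roughly 0.4, which keeps the agent audible while the gate decides whether the caller really meant to interrupt.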
think · 240 ms
What the agent does next is where it sounds human.
Same interrupt during an AI appointment booking call. Two different handlers. The right one feels effortless; the wrong one is why people hate voice bots.
What good turn-taking looks like.
We tune interruption handling on every Convexa agent we run for you — no flag, no toggle.