What happens if two people talk at the same time?

2 months ago

Dictalogic Support

2 minutes

Overview

This article explains how Conversation to Text handles overlapping speech, where two or more speakers are talking simultaneously.

Applies to

All Users

Overlapping speech is a known challenge

Simultaneous speech, where two or more people speak at the same time, is one of the most difficult scenarios for any automated transcription system to handle accurately. When multiple voices overlap in the audio, the AI engine must try to distinguish between them, which is inherently more difficult than processing one voice at a time.

What happens in the transcript

When overlapping speech occurs, the engine will typically do one of the following: attempt to transcribe whichever voice is dominant (clearest and loudest) in the overlapping segment; partially transcribe elements of both voices, which may result in fragmented or confused text; or mark the segment as unclear or assign it to whichever speaker it can most confidently identify.

Impact on accuracy

Overlapping speech will generally reduce transcription accuracy for the affected segments. This is a limitation of current AI transcription technology and is not specific to Dictalogic; it applies to all automated speech recognition systems.

Recommendations

Encourage participants to take turns speaking and avoid interrupting each other where possible, particularly in recorded interviews or formal meetings. During the transcript review stage, pay particular attention to segments where overlapping speech occurred and make manual corrections where the output is unclear or incorrect.
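
If you work with a transcript export that includes speaker labels and timestamps, the review step can be narrowed down by listing the places where two speakers' segments overlap in time. The following is a minimal Python sketch of that idea and is not a Dictalogic feature. The Segment structure, its field names, and the find_overlaps function are hypothetical and would need to be adapted to whatever format your transcript is actually exported in.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical transcript segment; field names are assumptions,
    # not a documented Dictalogic export format.
    speaker: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording
    text: str


def find_overlaps(segments):
    """Return (earlier, later, overlap_seconds) for every pair of
    segments from different speakers whose time ranges overlap."""
    ordered = sorted(segments, key=lambda s: s.start)
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start >= first.end:
                # Segments are sorted by start time, so nothing later
                # can overlap `first` either.
                break
            if second.speaker != first.speaker:
                overlap = min(first.end, second.end) - second.start
                overlaps.append((first, second, overlap))
    return overlaps


# Made-up sample data for illustration only.
segments = [
    Segment("Speaker 1", 0.0, 4.0, "So the main point is"),
    Segment("Speaker 2", 3.5, 6.0, "Sorry, can I just add"),
    Segment("Speaker 1", 6.0, 9.0, "Go ahead."),
]

for first, second, seconds in find_overlaps(segments):
    print(f"Review {second.start:.1f}s to {min(first.end, second.end):.1f}s: "
          f"{first.speaker} and {second.speaker} overlap for {seconds:.1f}s")
```

Segments that merely touch (one ends exactly when the next begins) are not reported, so only genuine overlaps are listed for manual review.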