Assembly AI for speaker diarisation

I’ve begun using the speaker identification feature of Assembly AI’s transcription service to make my podtube shows readable, not just listenable. YouTube’s own transcripts are useful enough for Ctrl-F purposes but are otherwise insufficient.

Assembly AI detects the number of speakers and labels them “A,” “B,” “C,” and so on. (It’s solidly reliable up to ten speakers, beyond which things get shaky.)
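For anyone who wants to try the same thing from code, here is a minimal sketch of requesting speaker labels through Assembly AI's Python SDK (`pip install assemblyai`). The environment variable name `ASSEMBLYAI_API_KEY`, the two function names, and the `Speaker A:` output layout are my own assumptions for illustration, not something the service dictates.

```python
import os


def transcribe_with_speakers(audio_source):
    """Return a list of (speaker_label, text) pairs for one recording.

    audio_source may be a local file path or a publicly reachable URL.
    """
    # Imported here so the formatting helper below works without the SDK installed.
    import assemblyai as aai

    # Assumption: the API key lives in this environment variable.
    aai.settings.api_key = os.environ["ASSEMBLYAI_API_KEY"]

    # speaker_labels=True asks the service to diarise the audio.
    config = aai.TranscriptionConfig(speaker_labels=True)
    transcript = aai.Transcriber().transcribe(audio_source, config=config)

    if transcript.status == aai.TranscriptStatus.error:
        raise RuntimeError(transcript.error)

    # Each utterance carries a letter label ("A", "B", ...) and its text.
    return [(u.speaker, u.text) for u in transcript.utterances]


def format_turns(turns):
    """Render (speaker_label, text) pairs as readable, blank-line-separated turns."""
    return "\n\n".join(f"Speaker {speaker}: {text}" for speaker, text in turns)
```

Calling `format_turns(transcribe_with_speakers("episode.mp3"))` (the file name is hypothetical) would yield one paragraph per turn, each opening with its letter label.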

Following that, I feed the transcript to ChatGPT, instructing it to replace those letters with the actual names of the discussants. Minor editing on my part ensues.
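I hand this step to ChatGPT, but when the letter-to-name mapping is already known, the substitution can also be done locally. The sketch below is that plain find-and-replace alternative, not the ChatGPT workflow itself. It assumes each turn in the transcript starts with a label such as `Speaker A:`, and the function name and example names are hypothetical.

```python
import re


def name_speakers(transcript_text, names):
    """Replace 'Speaker X:' prefixes with real names.

    names maps letter labels to names, e.g. {"A": "Alice"}.
    Labels missing from the mapping are left unchanged.
    """
    # Only match labels at the start of a line, so ordinary prose is untouched.
    pattern = re.compile(r"^Speaker ([A-Z]):", flags=re.MULTILINE)

    def swap(match):
        label = match.group(1)
        return f"{names.get(label, 'Speaker ' + label)}:"

    return pattern.sub(swap, transcript_text)
```

For example, `name_speakers(text, {"A": "Alice", "B": "Bob"})` renames the first two speakers and leaves any `Speaker C:` turns as they are. This only helps once you know who "A" and "B" are. Working that out from context is the part ChatGPT does for me.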