Voice Cloning
Module 2 · Section 5 of 7
Voice cloning is operationally simpler than video deepfakes and requires less source material. Current tools need as little as three seconds of audio to create a basic clone. High-quality clones require 20 to 30 seconds. ElevenLabs offers voice cloning for $5 per month.
The attack pattern that has proliferated most widely is what the FBI calls the “grandparent scam”: a caller synthesises the voice of a grandchild or other family member who claims to be in trouble (arrested, in an accident, hospitalised) and to need money urgently. In a 2023 Arizona case, Jennifer DeStefano received a call in which her daughter’s cloned voice claimed to have been kidnapped, and the caller demanded a ransom; her daughter was in fact safe at a ski resort. Eight Canadian seniors collectively lost $200,000 to similar schemes in 2023.
In corporate settings, voice cloning is used for what was historically called CEO fraud: calls purportedly from senior executives requesting urgent wire transfers or credential resets. The 2020 UAE bank incident — where a manager approved $35 million in transfers after a call using “deep voice technology” to clone a company director — predates the current generation of accessible tools. The same attack today requires a $5 monthly subscription.
Listeners distinguish cloned voices from real ones only about 54% of the time, barely better than chance. Family members frequently fail to recognise synthetic versions of their relatives’ voices.
Defence: establish pre-shared code words with family members and key executives: a specific word or phrase that only the real person knows, required before any unusual request is acted on. The Ferrari incident of July 2024 shows the principle in action: executives identified a deepfake caller by asking about a book recommendation the real person had mentioned privately. The deepfake caller immediately ended the call.
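For corporate settings where the code word has to be checked by software rather than memory (say, a help-desk verification step), the idea can be sketched in a few lines. This is a minimal illustration, not anything from the incidents above: it assumes the word is stored as a salted PBKDF2 hash so the secret never sits in plain text, and uses a constant-time comparison so a timing side channel reveals nothing. All names and parameters are illustrative.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor; tune for your hardware

def enroll(code_word: str, salt: bytes) -> bytes:
    """Derive a salted hash of the agreed code word (case-insensitive)."""
    normalised = code_word.strip().lower().encode()
    return hashlib.pbkdf2_hmac("sha256", normalised, salt, ITERATIONS)

def verify(answer: str, salt: bytes, stored: bytes) -> bool:
    """Hash the caller's answer and compare in constant time."""
    normalised = answer.strip().lower().encode()
    candidate = hashlib.pbkdf2_hmac("sha256", normalised, salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored)

# Enrolment happens once, in person or over a trusted channel.
salt = os.urandom(16)
stored = enroll("blue heron", salt)

# At call time, an unusual request is rejected unless the word checks out.
print(verify("Blue Heron", salt, stored))   # normalisation makes case irrelevant
print(verify("grandchild", salt, stored))
```

The normalisation step matters in practice: a stressed relative will not remember exact capitalisation, and a check that fails on trivial variations trains people to bypass it.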