Audio deepfake

Short description: Voice clips generated by AI

Audio deepfake technology, also referred to as voice cloning or deepfake audio, is an application of artificial intelligence designed to generate speech that convincingly mimics specific individuals, often synthesizing phrases or sentences they have never spoken.^[1]^[2]^[3]^[4]

Initially developed with the intent to enhance various aspects of human life, it has practical applications such as generating audiobooks and assisting individuals who have lost their voices due to medical conditions.^[5]^[6] Additionally, it has commercial uses, including the creation of personalized digital assistants, natural-sounding text-to-speech systems, and advanced speech translation services.^[7] As with other forms of deepfakes, audio deepfakes have faced criticism for their use in scams, phishing, and misinformation campaigns.

Detection methods

The audio deepfake detection task determines whether the given speech audio is real or fake.

This task has recently become an active topic in the forensic research community, which is trying to keep pace with the rapid evolution of counterfeiting techniques.

In general, deepfake detection methods can be divided into two categories based on the aspect they leverage to perform the detection task. The first focuses on low-level aspects, looking for artifacts introduced by the generators at the sample level. The second focuses instead on higher-level features that represent more complex aspects, such as the semantic content of the speech recording.

Many machine learning models have been developed using different strategies to detect fake audio. Most of the time, these algorithms follow a three-step procedure:

1. Each speech audio recording is preprocessed and transformed into appropriate audio features;
2. The computed features are fed into the detection model, which performs the operations (including the training process) needed to discriminate between real and fake speech audio;
3. The model output is fed into a final module that produces a prediction probability for the Fake class or the Real class. Following the ASVspoof^[26] challenge nomenclature, fake audio is labeled "Spoof" and real audio is labeled "Bonafide."
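The three steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the method of any cited system: the features (per-frame log energy and zero-crossing rate), the function names, and the weights are hypothetical placeholders, and a real detector would learn its parameters from labeled data.

```python
import math

def frame_features(samples, frame_len=400):
    """Step 1: turn raw samples into per-frame features (log energy, zero-crossing rate)."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        feats.append((math.log(energy + 1e-10), crossings / (frame_len - 1)))
    return feats

def model_score(feats, weights=(0.1, -2.0), bias=0.5):
    """Step 2: a stand-in linear model. The weights are placeholders, not trained values."""
    mean_feats = [sum(column) / len(column) for column in zip(*feats)]
    return sum(w * f for w, f in zip(weights, mean_feats)) + bias

def predict(samples):
    """Step 3: map the score to class probabilities using the ASVspoof label names."""
    p_spoof = 1.0 / (1.0 + math.exp(-model_score(frame_features(samples))))
    return {"spoof": p_spoof, "bonafide": 1.0 - p_spoof}

# Example input: one second of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
probs = predict(tone)
```

Because the weights are arbitrary, the resulting probabilities carry no meaning; the sketch only shows how features, model, and final probability module fit together.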

Over the years, many researchers have shown that classical machine learning approaches are more accurate than deep learning methods, regardless of the features used.^[14] However, the scalability of classical machine learning methods has not been confirmed, because they require extensive training and manual feature extraction, especially when many audio files are involved. Deep learning algorithms, in turn, require specific transformations of the audio files before they can process them.

Most detection studies focus on English-language deepfakes. Research on other widely spoken languages, such as Chinese, Spanish, Hindi, and Arabic, remains limited. Detection models trained primarily on English show reduced performance when tested on non-English audio. Accent variation also affects detection accuracy: different pronunciations associated with specific regions or populations influence how well models identify synthetic speech.^[27]

There are several open-source implementations of different detection methods,^[28]^[29]^[30] and research groups usually release them on public hosting services such as GitHub.

Concerns and countermeasures

The increasing accessibility of audio deepfake technology has raised concerns over its use in scams and misinformation campaigns.^[3]^[31] Audio deepfakes can serve as a logical access voice spoofing technique,^[32] and can be used to manipulate public opinion for propaganda, defamation, or terrorism. Vast amounts of voice recordings are transmitted over the Internet every day, and spoofing detection is challenging.^[10] Audio deepfake attackers have targeted individuals and organizations, including politicians and governments.^[33]

In 2019, scammers using AI impersonated the voice of the CEO of a German energy company and directed the CEO of its UK subsidiary to transfer €220,000.^[34] In early 2020, the same technique was used to impersonate a company director as part of an elaborate scheme that convinced a branch manager to transfer $35 million.^[35]

According to a 2023 global McAfee survey, one person in ten reported having been targeted by an AI voice cloning scam; 77% of these targets reported losing money to the scam.^[36]^[37] Audio deepfakes could also pose a danger to voice ID systems currently used by financial institutions.^[38]^[39] In March 2023, the United States Federal Trade Commission issued a warning to consumers about the use of AI to fake the voice of a family member in distress asking for money.^[40]

In October 2023, at the start of the British Labour Party's conference in Liverpool, an audio deepfake of Labour leader Keir Starmer was released that falsely portrayed him verbally abusing his staffers and criticizing Liverpool.^[41] That same month, an audio deepfake of Slovak politician Michal Šimečka falsely claimed to capture him discussing ways to rig the upcoming election.^[42]

During the campaign for the 2024 New Hampshire Democratic presidential primary, over 20,000 voters received robocalls from an AI-impersonated President Joe Biden urging them not to vote.^[43]^[44] The New Hampshire attorney general said this violated state election laws, and alleged involvement by Life Corporation and Lingo Telecom.^[45] In February 2024, the United States Federal Communications Commission banned the use of AI to fake voices in robocalls.^[46]^[47] That same month, political consultant Steve Kramer admitted that he had commissioned the calls for $500. He said that he wanted to call attention to the need for rules governing the use of AI in political campaigns.^[48] In May, the FCC said that Kramer had violated federal law by spoofing the number of a local political figure, and proposed a fine of $6 million. Four New Hampshire counties indicted Kramer on felony counts of voter suppression, and impersonating a candidate, a misdemeanor.^[49]

In 2024, YouTuber Jeff Geerling received attention after noticing his voice was cloned without consent by the electronics manufacturer Elecrow,^[50]^[51]^[52] and the company subsequently apologized and removed the offending content.^[53] Geerling's experience was cited as a case meant to be addressed by YouTube's likeness-detection technology when it was released in 2025.^[54]^[55]

State-sponsored propaganda

In February 2023, research firm Graphika documented the first known case of state-aligned actors using AI-generated news anchors for political disinformation. The Chinese propaganda network Spamouflage created two fictitious news presenters named "Alex" and "Anna" using Synthesia AI software. The deepfake anchors appeared in videos posted to Facebook, Twitter, and YouTube starting in late 2022. Videos promoted pro-China narratives and criticized the United States. The operation cost approximately $30 per month for the AI software subscription. Detection indicators included robotic speech patterns, lip-sync errors, and grammatical mistakes in video subtitles.^[56]

Open challenges and future research direction

Audio deepfakes are a very recent field of research. There are therefore many opportunities for development and improvement, as well as possible threats that adopting this technology could bring to daily life. The most important ones are listed below.

A 2023 comprehensive survey on audio deepfake detection found that detection systems face significant challenges in generalizing across different generation methods and acoustic conditions.^[27]

Deepfake generation

Regarding generation, the most significant aspect is how credible the result is to the victim, i.e., the perceptual quality of the audio deepfake.

Several metrics are used to assess the quality of audio deepfake generation. The most widely used is the mean opinion score (MOS), which is the arithmetic average of user ratings. Usually, the rated test involves the perceptual evaluation of sentences produced by different speech generation algorithms. This index showed that audio generated by algorithms trained on a single speaker has a higher MOS.^[25]^[15]^[57]^[58]^[20]

The sampling rate also plays an essential role in detecting and generating audio deepfakes. Currently available datasets have a sampling rate of around 16 kHz, which significantly reduces speech quality. An increase in the sampling rate could lead to higher-quality generation.^[18]
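As a minimal sketch of the MOS computation, the following Python snippet averages listener ratings for two speech generators. The system names and ratings are invented for illustration, and the five-point scale (1 = bad, 5 = excellent) is an assumption based on common listening-test practice rather than something stated in the cited works.

```python
from statistics import mean

def mean_opinion_score(ratings, low=1, high=5):
    """Return the arithmetic mean of listener ratings on an assumed low..high scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(not (low <= r <= high) for r in ratings):
        raise ValueError(f"ratings must lie between {low} and {high}")
    return mean(ratings)

# Hypothetical ratings from five listeners for two speech generators.
ratings_by_system = {
    "system_a": [4, 5, 3, 4, 4],
    "system_b": [3, 3, 4, 2, 3],
}
mos = {name: mean_opinion_score(r) for name, r in ratings_by_system.items()}
# mos == {"system_a": 4, "system_b": 3}
```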
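One well-known reason a low sampling rate limits quality is the Nyquist limit: a signal sampled at a given rate can only represent frequencies below half that rate, which is 8 kHz for 16 kHz audio. The Python sketch below is illustrative and not taken from the cited work; the tone frequencies are arbitrary choices. It shows that a 10 kHz tone sampled at 16 kHz yields the same samples, up to sign, as a 6 kHz tone, so the higher-frequency content is lost.

```python
import math

def sample_tone(freq_hz, rate_hz, n_samples):
    """Sample a unit-amplitude sine wave of freq_hz at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * n / rate_hz) for n in range(n_samples)]

RATE_HZ = 16_000
nyquist_hz = RATE_HZ / 2  # highest representable frequency: 8000.0 Hz

# A 10 kHz tone lies above the Nyquist limit and aliases to 16 kHz - 10 kHz = 6 kHz.
high = sample_tone(10_000, RATE_HZ, 160)
alias = sample_tone(6_000, RATE_HZ, 160)

# The two sample sequences are negatives of each other, up to rounding error.
max_diff = max(abs(h + a) for h, a in zip(high, alias))
```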

In March 2020, a Massachusetts Institute of Technology researcher demonstrated data-efficient audio deepfake generation through 15.ai, a web application capable of generating high-quality speech using only 15 seconds of training data,^[59]^[60] compared to previous systems that required tens of hours.^[61] The system implemented a unified multi-speaker model that enabled simultaneous training of multiple voices through speaker embeddings, allowing the model to learn shared patterns across different voices even when individual voices lacked examples of certain emotional contexts.^[62] The platform integrated sentiment analysis through DeepMoji for emotional expression and supported precise pronunciation control via ARPABET phonetic transcriptions.^[63] The 15-second data efficiency benchmark was later corroborated by OpenAI in 2024.^[64]

Deepfake detection

Regarding detection, one principal weakness of recent models is their limited language coverage.

Most studies focus on detecting audio deepfakes in English and pay little attention to other widely spoken languages such as Chinese and Spanish,^[65] as well as Hindi and Arabic.

It is also essential to consider accents, that is, the ways of pronouncing words that are associated with a particular individual, location, or nation. In other fields of audio, such as speaker recognition, accent has been found to influence performance significantly,^[66] so it is expected to affect model performance in this detection task as well.

In addition, excessive preprocessing of the audio data has led to a very high and often unsustainable computational cost. For this reason, many researchers have suggested a self-supervised learning approach,^[67] which works with unlabeled data, to improve the scalability of detection models while decreasing the computational cost. Moreover, a key factor in handling unknown spoofing attacks lies in the use of effective data augmentation strategies, which expose the model to a broader range of acoustic variability and enhance its ability to generalize to unseen attack conditions.^[68]

Training and testing models with real audio data is still an underdeveloped area. Indeed, using audio with real-world background noise can increase the robustness of fake audio detection models.

In addition, most of the effort is focused on detecting synthetic-based audio deepfakes, and few studies analyze imitation-based ones because of the intrinsic difficulty of generating them.^[10]
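One simple form of the augmentation discussed above is to mix background noise into clean speech at a chosen signal-to-noise ratio (SNR). The Python sketch below is a minimal illustration under stated assumptions: the function name, the synthetic "speech" tone, and the Gaussian noise are stand-ins, and it does not reproduce the augmentation pipeline of any cited paper.

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise must not be silent")
    # Solve p_speech / (gain**2 * p_noise) = 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Stand-in signals: a 220 Hz tone as "speech" and seeded Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * n / 16000) for n in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

In practice the noise would come from recordings of real environments, and the SNR would typically be drawn at random per training example so that the model sees a range of conditions.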

Defense against deepfakes

Over the years, there has been an increase in techniques aimed at defending against the malicious uses of audio deepfakes, such as identity theft and the manipulation of speeches by government leaders.

To prevent deepfakes, some suggest using blockchain and other distributed ledger technologies (DLT) to identify the provenance of data and track information.^[14]^[69]^[70]^[71]
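The core idea behind ledger-based provenance can be illustrated with a toy hash chain. The Python sketch below is not a blockchain and does not reproduce any of the cited proposals; the record fields, function names, and source labels are hypothetical. Each record stores the SHA-256 digest of an audio file together with the hash of the previous record, so later tampering with any record becomes detectable.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record

def _hash_record(body):
    """Hash a record body deterministically (sorted keys give stable JSON)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def add_record(chain, audio_bytes, source):
    """Append a provenance record that links to the previous record's hash."""
    body = {
        "source": source,
        "audio_sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "prev_hash": chain[-1]["record_hash"] if chain else GENESIS_HASH,
    }
    record = dict(body, record_hash=_hash_record(body))
    chain.append(record)
    return record

def verify(chain):
    """Return True only if every record is intact and correctly linked."""
    prev_hash = GENESIS_HASH
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != prev_hash or record["record_hash"] != _hash_record(body):
            return False
        prev_hash = record["record_hash"]
    return True

chain = []
add_record(chain, b"original interview audio", source="newsroom-recorder")
add_record(chain, b"edited broadcast audio", source="newsroom-editor")
chain_is_valid = verify(chain)
```

A real distributed ledger additionally replicates the chain across independent parties, so that no single participant can rewrite it unnoticed.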

Extracting and comparing affective cues corresponding to perceived emotions from digital content has also been proposed to combat deepfakes.^[72]^[73]^[74]

Another critical aspect concerns the mitigation of this problem. It has been suggested that it would be better to keep some proprietary detection tools only for those who need them, such as fact-checkers for journalists.^[8] That way, those who create the generation models, perhaps for nefarious purposes, would not know precisely what features facilitate the detection of a deepfake,^[8] discouraging possible attackers.

To improve detection, researchers are trying to generalize the process,^[75] looking for preprocessing techniques that improve performance and testing different loss functions for training.^[32]^[76]

In January 2023, China implemented regulations requiring deepfake content to be clearly labeled. The rules prohibit using deep synthesis technology to produce or disseminate fake news. Service providers must verify user identities and maintain content logs. The regulations represent one of the first comprehensive government frameworks specifically targeting deepfake technology.^[77]

Research programs

Numerous research groups worldwide are working to recognize media manipulations, that is, not only audio deepfakes but also image and video deepfakes. These projects are usually supported by public or private funding and are in close contact with universities and research institutions.

For this purpose, the Defense Advanced Research Projects Agency (DARPA) runs the Semantic Forensics (SemaFor) program.^[78]^[79] Leveraging some of the research from the Media Forensics (MediFor)^[80]^[81] program, also from DARPA, these semantic detection algorithms will have to determine whether a media object has been generated or manipulated, to automate the analysis of media provenance and uncover the intent behind the falsification of various content.^[82]^[78]

Another research program is the Preserving Media Trustworthiness in the Artificial Intelligence Era (PREMIER)^[83] program, funded by the Italian Ministry of Education, University and Research (MIUR) and run by five Italian universities. PREMIER will pursue novel hybrid approaches to obtain forensic detectors that are more interpretable and secure.^[84]

DEEP-VOICE^[85] is a publicly available dataset intended for research on systems that detect speech generated with neural networks through a process called Retrieval-based Voice Conversion (RVC). Preliminary research showed numerous statistically significant differences between features found in human speech and those found in speech generated by artificial intelligence algorithms.

Public challenges

In the last few years, numerous challenges have been organized to push this field of audio deepfake research even further.

The best-known international challenge is ASVspoof,^[26] the Automatic Speaker Verification Spoofing and Countermeasures Challenge. This challenge is a biennial community-led initiative that aims to promote the consideration of spoofing and the development of countermeasures.^[86]

Another recent challenge is the Audio Deepfake Detection (ADD) challenge,^[87] which considers fake audio in more realistic scenarios.^[88]

The Voice Conversion Challenge^[89] is also a biennial challenge, created to compare different voice conversion systems and approaches using the same voice data.

Extended use without permission

On 22 May 2025, it was claimed that ReadSpeaker, a product of Hoya Corporation, used recordings that the actress Gayanne Potter had made for the company in 2021. At the time she understood that they would be used only for accessibility and e-learning software, but the resulting voice is now generally available as "Iona" and is used as the announcer on ScotRail trains.^[90]^[91]^[92] This replaced older messages recorded by Fletcher Mathers without her permission.^[93] On 25 August 2025, ScotRail announced that it would replace the AI voice on trains, although it has not been confirmed whether the replacement will be a human recording or another AI-trained voice.^[94]

References

↑ Smith, Hannah; Mansted, Katherine (April 1, 2020). Weaponised deep fakes: National security and democracy. 28. Australian Strategic Policy Institute. pp. 11–13.
↑ Lyu, Siwei (2020). "Deepfake Detection: Current Challenges and Next Steps" (in en-US). 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). pp. 1–6. doi:10.1109/icmew46912.2020.9105991. ISBN 978-1-7281-1485-9.
↑ ^3.0 ^3.1 Diakopoulos, Nicholas; Johnson, Deborah (June 2020). "Anticipating and addressing the ethical implications of deepfakes in the context of elections" (in en). New Media & Society 23 (7): 2072–2098. doi:10.1177/1461444820925811. ISSN 1461-4448. http://journals.sagepub.com/doi/10.1177/1461444820925811.
↑ Murphy, Margi (20 February 2024). "Deepfake Audio Boom Exploits One Billion-Dollar Startup's AI". Bloomberg. https://www.bloomberg.com/news/articles/2024-02-21/biden-deepfake-and-other-audio-fakes-were-made-with-elevenlabs-ai.
↑ Chadha, Anupama; Kumar, Vaibhav; Kashyap, Sonu; Gupta, Mayank (2021), Singh, Pradeep Kumar; Wierzchoń, Sławomir T.; Tanwar, Sudeep et al., eds., "Deepfake: An Overview" (in en), Proceedings of Second International Conference on Computing, Communications, and Cyber-Security, Lecture Notes in Networks and Systems (Singapore: Springer Singapore) 203: pp. 557–566, doi:10.1007/978-981-16-0733-2_39, ISBN 978-981-16-0732-5, https://link.springer.com/10.1007/978-981-16-0733-2_39, retrieved 2022-06-29
↑ "AI gave Val Kilmer his voice back. But critics worry the technology could be misused." (in en-US). Washington Post. ISSN 0190-8286. https://www.washingtonpost.com/technology/2021/08/18/val-kilmer-ai-voice-cloning/.
↑ Etienne, Vanessa (August 19, 2021). "Val Kilmer Gets His Voice Back After Throat Cancer Battle Using AI Technology: Hear the Results" (in en). https://people.com/movies/val-kilmer-gets-his-voice-back-after-throat-cancer-battle-using-ai-technology-hear-the-results/.
↑ ^8.0 ^8.1 ^8.2 ^8.3 ^8.4 Khanjani, Zahra; Watson, Gabrielle; Janeja, Vandana P. (2021-11-28). "How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey". arXiv:2111.14203 [cs.SD].
↑ Pradhan, Swadhin; Sun, Wei; Baig, Ghufran; Qiu, Lili (2019-09-09). "Combating Replay Attacks Against Voice Assistants". Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3 (3): 100:1–100:26. doi:10.1145/3351258. https://doi.org/10.1145/3351258.
↑ ^10.0 ^10.1 ^10.2 Ballesteros, Dora M.; Rodriguez-Ortega, Yohanna; Renza, Diego; Arce, Gonzalo (2021-12-01). "Deep4SNet: deep learning for fake speech classification" (in en). Expert Systems with Applications 184. doi:10.1016/j.eswa.2021.115465. ISSN 0957-4174. https://www.sciencedirect.com/science/article/pii/S0957417421008770.
↑ Villalba, Jesus; Lleida, Eduardo (2011). "Preventing replay attacks on speaker verification systems" (in en-US). 2011 Carnahan Conference on Security Technology. pp. 1–8. doi:10.1109/CCST.2011.6095943. ISBN 978-1-4577-0903-6.
↑ Tom, Francis; Jain, Mohit; Dey, Prasenjit (2018-09-02). "End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention" (in en). Interspeech 2018 (ISCA): 681–685. doi:10.21437/Interspeech.2018-2279. https://www.isca-speech.org/archive/interspeech_2018/tom18_interspeech.html.
↑ Tan, Xu; Qin, Tao; Soong, Frank; Liu, Tie-Yan (2021-07-23). "A Survey on Neural Speech Synthesis". arXiv:2106.15561 [eess.AS].
↑ ^14.0 ^14.1 ^14.2 Almutairi, Zaynab; Elgibreen, Hebah (2022-05-04). "A Review of Modern Audio Deepfake Detection Methods: Challenges and Future Directions" (in en). Algorithms 15 (5): 155. doi:10.3390/a15050155. ISSN 1999-4893.
↑ ^15.0 ^15.1 Oord, Aaron van den; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (2016-09-19). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499 [cs.SD].
↑ Kuchaiev, Oleksii; Li, Jason; Nguyen, Huyen; Hrinchuk, Oleksii; Leary, Ryan; Ginsburg, Boris; Kriman, Samuel; Beliaev, Stanislav; Lavrukhin, Vitaly; Cook, Jack; Castonguay, Patrice (2019-09-13). "NeMo: a toolkit for building AI applications using Neural Modules". arXiv:1909.09577 [cs.LG].
↑ Wang, Yuxuan; Skerry-Ryan, R. J.; Stanton, Daisy; Wu, Yonghui; Weiss, Ron J.; Jaitly, Navdeep; Yang, Zongheng; Xiao, Ying; Chen, Zhifeng; Bengio, Samy; Le, Quoc (2017-04-06). "Tacotron: Towards End-to-End Speech Synthesis". arXiv:1703.10135 [cs.CL].
↑ ^18.0 ^18.1 Prenger, Ryan; Valle, Rafael; Catanzaro, Bryan (2018-10-30). "WaveGlow: A Flow-based Generative Network for Speech Synthesis". arXiv:1811.00002 [cs.SD].
↑ Vasquez, Sean; Lewis, Mike (2019-06-04). "MelNet: A Generative Model for Audio in the Frequency Domain". arXiv:1906.01083 [eess.AS].
↑ ^20.0 ^20.1 Ping, Wei; Peng, Kainan; Gibiansky, Andrew; Arik, Sercan O.; Kannan, Ajay; Narang, Sharan; Raiman, Jonathan; Miller, John (2018-02-22). "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning". arXiv:1710.07654 [cs.SD].
↑ Ren, Yi; Ruan, Yangjun; Tan, Xu; Qin, Tao; Zhao, Sheng; Zhao, Zhou; Liu, Tie-Yan (2019-11-20). "FastSpeech: Fast, Robust and Controllable Text to Speech". arXiv:1905.09263 [cs.CL].
↑ Ning, Yishuang; He, Sheng; Wu, Zhiyong; Xing, Chunxiao; Zhang, Liang-Jie (January 2019). "A Review of Deep Learning Based Speech Synthesis" (in en). Applied Sciences 9 (19): 4050. doi:10.3390/app9194050. ISSN 2076-3417.
↑ ^23.0 ^23.1 Rodríguez-Ortega, Yohanna; Ballesteros, Dora María; Renza, Diego (2020). "A Machine Learning Model to Detect Fake Voice". in Florez, Hector; Misra, Sanjay (in en). Applied Informatics. Communications in Computer and Information Science. 1277. Cham: Springer International Publishing. pp. 3–13. doi:10.1007/978-3-030-61702-8_1. ISBN 978-3-030-61702-8. https://link.springer.com/chapter/10.1007/978-3-030-61702-8_1.
↑ Zhang, Mingyang; Wang, Xin; Fang, Fuming; Li, Haizhou; Yamagishi, Junichi (2019-04-07). "Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet". arXiv:1903.12389 [eess.AS].
↑ ^25.0 ^25.1 Arık, Sercan Ö.; Chen, Jitong; Peng, Kainan; Ping, Wei; Zhou, Yanqi (2018). "Neural Voice Cloning with a Few Samples". Advances in Neural Information Processing Systems (NeurIPS 2018) 31: 10040–10050. https://papers.nips.cc/paper/2018/hash/4559912e7a94a9c32b09d894f2bc3c82-Abstract.html.
↑ ^26.0 ^26.1 "ASVspoof". https://www.asvspoof.org/.
↑ ^27.0 ^27.1 Yi, Jiangyan; Tao, Jianhua; Bai, Ye; Tian, Ziqiang; Fan, Chuaneng (2023). "Audio Deepfake Detection: A Survey". arXiv:2308.14970 [cs.SD].
↑ resemble-ai/Resemblyzer, Resemble AI, 2022-06-30, https://github.com/resemble-ai/Resemblyzer, retrieved 2022-07-01
↑ mendaxfz (2022-06-28), Synthetic-Voice-Detection, https://github.com/mendaxfz/Synthetic-Voice-Detection, retrieved 2022-07-01
↑ HUA, Guang (2022-06-29), End-to-End Synthetic Speech Detection, https://github.com/ghuawhu/end-to-end-synthetic-speech-detection, retrieved 2022-07-01
↑ Caramancion, Kevin Matthe (June 2022). "An Exploration of Mis/Disinformation in Audio Format Disseminated in Podcasts: Case Study of Spotify". 2022 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). pp. 1–6. doi:10.1109/IEMTRONICS55184.2022.9795760. ISBN 978-1-6654-8684-2.
↑ ^32.0 ^32.1 Chen, Tianxiang; Kumar, Avrosh; Nagarsheth, Parav; Sivaraman, Ganesh; Khoury, Elie (2020-11-01). "Generalization of Audio Deepfake Detection" (in en). The Speaker and Language Recognition Workshop (Odyssey 2020) (ISCA): 132–137. doi:10.21437/Odyssey.2020-19. https://www.isca-speech.org/archive/odyssey_2020/chen20_odyssey.html.
↑ Suwajanakorn, Supasorn; Seitz, Steven M.; Kemelmacher-Shlizerman, Ira (2017-07-20). "Synthesizing Obama: learning lip sync from audio". ACM Transactions on Graphics 36 (4): 95:1–95:13. doi:10.1145/3072959.3073640. ISSN 0730-0301. https://doi.org/10.1145/3072959.3073640.
↑ Stupp, Catherine. "Fraudsters Used AI to Mimic CEO's Voice in Unusual Cybercrime Case" (in en-US). WSJ. https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402.
↑ Brewster, Thomas. "Fraudsters Cloned Company Director's Voice In $35 Million Bank Heist, Police Find" (in en). https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/.
↑ "Generative AI is making voice scams easier to believe". Axios. 13 June 2023. https://www.axios.com/2023/06/13/generative-ai-voice-scams-easier-identity-fraud.
↑ Bunn, Amy (15 May 2023). "Artificial Imposters—Cybercriminals Turn to AI Voice Cloning for a New Breed of Scam". McAfee Blog. https://www.mcafee.com/blogs/privacy-identity-protection/artificial-imposters-cybercriminals-turn-to-ai-voice-cloning-for-a-new-breed-of-scam/.
↑ Cox, Joseph (23 February 2023). "How I Broke Into a Bank Account With an AI-Generated Voice" (in en). Vice. https://www.vice.com/en/article/how-i-broke-into-a-bank-account-with-an-ai-generated-voice/.
↑ Evershed, Nick; Taylor, Josh (16 March 2023). "AI can fool voice recognition used to verify identity by Centrelink and Australian tax office". The Guardian. https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai.
↑ "Scammers use AI to enhance their family emergency schemes" (in en). 2023-03-17. https://consumer.ftc.gov/consumer-alerts/2023/03/scammers-use-ai-enhance-their-family-emergency-schemes.
↑ "Deepfake audio of Sir Keir Starmer released on first day of Labour conference". https://news.sky.com/story/labour-faces-political-attack-after-deepfake-audio-is-posted-of-sir-keir-starmer-12980181.
↑ Meaker, Morgan. "Slovakia's Election Deepfakes Show AI is a Danger to Democracy". Wired. https://www.wired.com/story/slovakias-election-deepfakes-show-ai-is-a-danger-to-democracy/.
↑ "Political consultant behind fake Biden AI robocall faces charges in New Hampshire". https://www.cnn.com/2024/05/23/politics/new-hampshire-ai-robocall-biden-charges/index.html.
↑ "Political consultant accused of hiring magician to spam voters with Biden deepfake calls" (in en). 2024-03-15. https://lawandcrime.com/high-profile/political-consultant-hired-transient-magician-to-spam-voters-with-deepfake-calls-using-joe-bidens-voice-urging-them-not-to-cast-ballot-lawsuit/.
↑ David Wright; Brian Fung (February 6, 2024). "Fake Biden robocall linked to Texas-based companies, New Hampshire attorney general announces". CNN. https://www.cnn.com/2024/02/06/tech/nh-ag-robocall-update/index.html.
↑ Brian Fung (February 8, 2024). "FCC votes to ban scam robocalls that use AI-generated voices". CNN. https://www.cnn.com/2024/02/08/tech/fcc-scam-robocalls-ai-generated-voices/index.html.
↑ "FCC Makes AI-Generated Voices in Robocalls Illegal | Federal Communications Commission" (in en). 2024-02-08. https://www.fcc.gov/document/fcc-makes-ai-generated-voices-robocalls-illegal.
↑ Kramer, Marcia (2024-02-26). "Steve Kramer explains why he used AI to impersonate President Biden in New Hampshire - CBS New York" (in en-US). https://www.cbsnews.com/newyork/news/steve-kramer-explains-why-he-used-ai-to-impersonate-president-biden-in-new-hampshire/.
↑ "A political consultant faces charges and fines for Biden deepfake robocalls". https://www.npr.org/2024/05/23/nx-s1-4977582/fcc-ai-deepfake-robocall-biden-new-hampshire-political-operative.
↑ Reference text missing in the source (reference named "forbes").
↑ Growcoot, Matt (26 September 2024). "YouTuber Has His Voice AI-Cloned and Used by a Company Without Consent". PetaPixel. https://petapixel.com/2024/09/26/youtuber-has-his-voice-ai-cloned-and-used-by-a-company-without-consent-jeff-geerling/.
↑ "They stole my voice with AI (UPDATE: Elecrow responded)" (in en). 21 September 2024. https://www.youtube.com/watch?v=UMofZIT9FcQ.
↑ "The dark side of AI voice cloning" (in en). 25 September 2024. https://www.youtube.com/watch?v=vHuPWQz9AlI.
↑ Forristal, Lauren (21 October 2025). "YouTube's likeness-detection technology has officially launched". TechCrunch. https://techcrunch.com/2025/10/21/youtubes-likeness-detection-technology-has-officially-launched/.
↑ Desreumaux, Geoff (21 October 2025). "YouTube Officially Rolls Out Likeness-Detection Technology to Creators". We are Social Media. https://wersm.com/youtube-officially-rolls-out-likeness-detection-technology-to-creators/.
↑ Deepfake It Till You Make It (Report). Graphika. February 2023. https://public-assets.graphika.com/reports/graphika-report-deepfake-it-till-you-make-it.pdf.
↑ Kong, Jungil; Kim, Jaehyeon; Bae, Jaekyoung (2020-10-23). "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis". arXiv:2010.05646 [cs.SD].
↑ Kumar, Kundan; Kumar, Rithesh; de Boissiere, Thibault; Gestin, Lucas; Teoh, Wei Zhen; Sotelo, Jose; de Brebisson, Alexandre; Bengio, Yoshua; Courville, Aaron (2019-12-08). "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis". arXiv:1910.06711 [eess.AS].
↑ Ng, Andrew (April 1, 2020). "Voice Cloning for the Masses". https://www.deeplearning.ai/the-batch/voice-cloning-for-the-masses/.
↑ Chandraseta, Rionaldi (January 21, 2021). "Generate Your Favourite Characters' Voice Lines using Machine Learning". https://towardsdatascience.com/generate-your-favourite-characters-voice-lines-using-machine-learning-c0939270c0c6.
↑ "Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". 2018-08-30. https://google.github.io/tacotron/publications/semisupervised/index.html.
↑ Temitope, Yusuf (December 10, 2024). "15.ai Creator reveals journey from MIT Project to internet phenomenon". The Guardian (Lagos, Nigeria). https://guardian.ng/technology/15-ai-creator-reveals-journey-from-mit-project-to-internet-phenomenon/.
↑ Kurosawa, Yuki (January 19, 2021). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる" (in ja). https://automaton-media.com/articles/newsjp/20210119-149494/.
↑ "Navigating the Challenges and Opportunities of Synthetic Voices". March 9, 2024. https://openai.com/index/navigating-the-challenges-and-opportunities-of-synthetic-voices/.
↑ Babbel.com; GmbH, Lesson Nine. "The 10 Most Spoken Languages In The World" (in en). https://www.babbel.com/en/magazine/the-10-most-spoken-languages-in-the-world.
↑ Najafian, Maryam; Russell, Martin (September 2020). "Automatic accent identification as an analytical tool for accent robust automatic speech recognition" (in en). Speech Communication 122: 44–55. doi:10.1016/j.specom.2020.05.003. https://linkinghub.elsevier.com/retrieve/pii/S0167639317300043.
↑ Liu, Xiao; Zhang, Fanjin; Hou, Zhenyu; Mian, Li; Wang, Zhaoyu; Zhang, Jing; Tang, Jie (2021). "Self-supervised Learning: Generative or Contrastive". IEEE Transactions on Knowledge and Data Engineering 35 (1): 857–876. doi:10.1109/TKDE.2021.3090866. ISSN 1558-2191.
↑ Rimon, Inbal; Gal, Oren; Permuter, Haim (2025). "Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection". arXiv:2501.05545 [cs.SD].
↑ Rashid, Md Mamunur; Lee, Suk-Hwan; Kwon, Ki-Ryong (2021). "Blockchain Technology for Combating Deepfake and Protect Video/Image Integrity". Journal of Korea Multimedia Society 24 (8): 1044–1058. doi:10.9717/kmms.2021.24.8.1044. ISSN 1229-7771. http://koreascience.or.kr/article/JAKO202125761199587.page.
↑ Fraga-Lamas, Paula; Fernández-Caramés, Tiago M. (2019-10-20). "Fake News, Disinformation, and Deepfakes: Leveraging Distributed Ledger Technologies and Blockchain to Combat Digital Deception and Counterfeit Reality". IT Professional 22 (2): 53–59. doi:10.1109/MITP.2020.2977589.
↑ Ki Chan, Christopher Chun; Kumar, Vimal; Delaney, Steven; Gochoo, Munkhjargal (September 2020). "Combating Deepfakes: Multi-LSTM and Blockchain as Proof of Authenticity for Digital Media". 2020 IEEE / ITU International Conference on Artificial Intelligence for Good (AI4G). pp. 55–62. doi:10.1109/AI4G50087.2020.9311067. ISBN 978-1-7281-7031-2.
↑ Mittal, Trisha; Bhattacharya, Uttaran; Chandra, Rohan; Bera, Aniket; Manocha, Dinesh (2020-10-12), "Emotions Don't Lie: An Audio-Visual Deepfake Detection Method using Affective Cues", Proceedings of the 28th ACM International Conference on Multimedia (New York, NY, USA: Association for Computing Machinery): pp. 2823–2832, doi:10.1145/3394171.3413570, ISBN 978-1-4503-7988-5, https://doi.org/10.1145/3394171.3413570, retrieved 2022-06-29
↑ Conti, Emanuele; Salvi, Davide; Borrelli, Clara; Hosler, Brian; Bestagini, Paolo; Antonacci, Fabio; Sarti, Augusto; Stamm, Matthew C. et al. (2022-05-23). "Deepfake Speech Detection Through Emotion Recognition: A Semantic Approach". ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore, Singapore: IEEE. pp. 8962–8966. doi:10.1109/ICASSP43922.2022.9747186. ISBN 978-1-6654-0540-9.
↑ Hosler, Brian; Salvi, Davide; Murray, Anthony; Antonacci, Fabio; Bestagini, Paolo; Tubaro, Stefano; Stamm, Matthew C. (June 2021). "Do Deepfakes Feel Emotions? A Semantic Approach to Detecting Deepfakes Via Emotional Inconsistencies". 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, TN, USA: IEEE. pp. 1013–1022. doi:10.1109/CVPRW53098.2021.00112. ISBN 978-1-6654-4899-4.
↑ Müller, Nicolas M.; Czempin, Pavel; Dieckmann, Franziska; Froghyar, Adam; Böttinger, Konstantin (2022-04-21). "Does Audio Deepfake Detection Generalize?". arXiv:2203.16263 [cs.SD].
↑ Zhang, You; Jiang, Fei; Duan, Zhiyao (2021). "One-Class Learning Towards Synthetic Voice Spoofing Detection". IEEE Signal Processing Letters 28: 937–941. doi:10.1109/LSP.2021.3076358. ISSN 1558-2361. Bibcode: 2021ISPL...28..937Z.
↑ "China Cyberspace Administration Deep Synthesis Regulations". January 2023. http://www.cac.gov.cn/.
↑ "SAM.gov". https://sam.gov/opp/a8883be78ac1442e8a22924011fc13c4/view.
↑ "The SemaFor Program". https://www.darpa.mil/program/semantic-forensics.
↑ "The DARPA MediFor Program". https://govtribe.com/file/government-file/darpabaa1558-darpa-baa-15-58-medifor-dot-pdf.
↑ "The MediFor Program". https://www.darpa.mil/program/media-forensics.
↑ "DARPA Announces Research Teams Selected to Semantic Forensics Program". https://www.darpa.mil/news-events/2021-03-02.
↑ "PREMIER" (in en-US). https://sites.google.com/unitn.it/premier/.
↑ "PREMIER - Project" (in en-US). https://sites.google.com/unitn.it/premier/project.
↑ Bird, Jordan J.; Lotfi, Ahmad (2023). "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion". arXiv:2308.12734 [cs.SD].
↑ Yamagishi, Junichi; Wang, Xin; Todisco, Massimiliano; Sahidullah, Md; Patino, Jose; Nautsch, Andreas; Liu, Xuechen; Lee, Kong Aik; Kinnunen, Tomi; Evans, Nicholas; Delgado, Héctor (2021-09-01). "ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection". arXiv:2109.00537 [eess.AS].
↑ "Audio Deepfake Detection: ICASSP 2022" (in en). 2021-12-17. https://signalprocessingsociety.org/publications-resources/data-challenges/audio-deepfake-detection-icassp-2022.
↑ Yi, Jiangyan; Fu, Ruibo; Tao, Jianhua; Nie, Shuai; Ma, Haoxin; Wang, Chenglong; Wang, Tao; Tian, Zhengkun; Bai, Ye; Fan, Cunhang; Liang, Shan (2022-02-26). "ADD 2022: the First Audio Deep Synthesis Detection Challenge". arXiv:2202.08433 [cs.SD].
↑ "Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 - SynSIG". https://www.synsig.org/index.php/Joint_Workshop_for_the_Blizzard_Challenge_and_Voice_Conversion_Challenge_2020.
↑ "'Stop using my voice' - ScotRail's new announcer is my AI clone" (in en-GB). 2025-05-27. https://www.bbc.com/news/articles/cn4q7984nq1o.
↑ "Voiceover artist Gayanne Potter urging ScotRail to remove her voice from new AI announcements" (in en). https://news.sky.com/story/voiceover-artist-gayanne-potter-urging-scotrail-to-remove-her-voice-from-new-ai-announcements-13375535.
↑ Leask, David; English, Paul (2025-05-27). "Actress feels 'cheated' by ScotRail's new AI voice announcer" (in en). https://www.thetimes.com/uk/scotland/article/scotland-railway-ai-voice-iona-63gz9xdg9.
↑ "I've voiced ScotRail trains for 20 years and was replaced with AI without being told" (in en). 2025-05-30. https://www.thenational.scot/news/25204424.voice-scotrail-20-years-replaced-ai/.
↑ "ScotRail to replace controversial AI voice on trains" (in en-GB). 2025-08-25. https://www.bbc.com/news/articles/c5ypzzyjgego.

Original source: https://en.wikipedia.org/wiki/Audio deepfake.

[1] Smith, Hannah; Mansted, Katherine (April 1, 2020). Weaponised deep fakes: National security and democracy. 28. Australian Strategic Policy Institute. pp. 11–13.

[2] Lyu, Siwei (2020). "Deepfake Detection: Current Challenges and Next Steps" (in en-US). 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). pp. 1–6. doi:10.1109/icmew46912.2020.9105991. ISBN 978-1-7281-1485-9.

[:0-3] 3.0 ^3.1 Diakopoulos, Nicholas; Johnson, Deborah (June 2020). "Anticipating and addressing the ethical implications of deepfakes in the context of elections" (in en). New Media & Society 23 (7): 2072–2098. 2020-06-05. doi:10.1177/1461444820925811. ISSN 1461-4448. http://journals.sagepub.com/doi/10.1177/1461444820925811.

[4] Murphy, Margi (20 February 2024). "Deepfake Audio Boom Exploits One Billion-Dollar Startup's AI". Bloomberg. https://www.bloomberg.com/news/articles/2024-02-21/biden-deepfake-and-other-audio-fakes-were-made-with-elevenlabs-ai.

[:10-5] Chadha, Anupama; Kumar, Vaibhav; Kashyap, Sonu; Gupta, Mayank (2021), Singh, Pradeep Kumar; Wierzchoń, Sławomir T.; Tanwar, Sudeep et al., eds., "Deepfake: An Overview" (in en), Proceedings of Second International Conference on Computing, Communications, and Cyber-Security, Lecture Notes in Networks and Systems (Singapore: Springer Singapore) 203: pp. 557–566, doi:10.1007/978-981-16-0733-2_39, ISBN 978-981-16-0732-5, https://link.springer.com/10.1007/978-981-16-0733-2_39, retrieved 2022-06-29

[:11-6] "AI gave Val Kilmer his voice back. But critics worry the technology could be misused." (in en-US). Washington Post. ISSN 0190-8286. https://www.washingtonpost.com/technology/2021/08/18/val-kilmer-ai-voice-cloning/.

[7] Etienne, Vanessa (August 19, 2021). "Val Kilmer Gets His Voice Back After Throat Cancer Battle Using AI Technology: Hear the Results" (in en). https://people.com/movies/val-kilmer-gets-his-voice-back-after-throat-cancer-battle-using-ai-technology-hear-the-results/.

[:3-8] 8.0 ^8.1 ^8.2 ^8.3 ^8.4 Khanjani, Zahra; Watson, Gabrielle; Janeja, Vandana P. (2021-11-28). "How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey". arXiv:2111.14203 [cs.SD].

[9] Pradhan, Swadhin; Sun, Wei; Baig, Ghufran; Qiu, Lili (2019-09-09). "Combating Replay Attacks Against Voice Assistants". Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3 (3): 100:1–100:26. doi:10.1145/3351258. https://doi.org/10.1145/3351258.

[:1-10] 10.0 ^10.1 ^10.2 Ballesteros, Dora M.; Rodriguez-Ortega, Yohanna; Renza, Diego; Arce, Gonzalo (2021-12-01). "Deep4SNet: deep learning for fake speech classification" (in en). Expert Systems with Applications 184. doi:10.1016/j.eswa.2021.115465. ISSN 0957-4174. https://www.sciencedirect.com/science/article/pii/S0957417421008770.

[11] Villalba, Jesus; Lleida, Eduardo (2011). "Preventing replay attacks on speaker verification systems" (in en-US). 2011 Carnahan Conference on Security Technology. pp. 1–8. doi:10.1109/CCST.2011.6095943. ISBN 978-1-4577-0903-6.

[12] Tom, Francis; Jain, Mohit; Dey, Prasenjit (2018-09-02). "End-To-End Audio Replay Attack Detection Using Deep Convolutional Networks with Attention" (in en). Interspeech 2018 (ISCA): 681–685. doi:10.21437/Interspeech.2018-2279. https://www.isca-speech.org/archive/interspeech_2018/tom18_interspeech.html.

[13] Tan, Xu; Qin, Tao; Soong, Frank; Liu, Tie-Yan (2021-07-23). "A Survey on Neural Speech Synthesis". arXiv:2106.15561 [eess.AS].

[:2-14] 14.0 ^14.1 ^14.2 Almutairi, Zaynab; Elgibreen, Hebah (2022-05-04). "A Review of Modern Audio Deepfake Detection Methods: Challenges and Future Directions" (in en). Algorithms 15 (5): 155. doi:10.3390/a15050155. ISSN 1999-4893.

[:6-15] 15.0 ^15.1 Oord, Aaron van den; Dieleman, Sander; Zen, Heiga; Simonyan, Karen; Vinyals, Oriol; Graves, Alex; Kalchbrenner, Nal; Senior, Andrew; Kavukcuoglu, Koray (2016-09-19). "WaveNet: A Generative Model for Raw Audio". arXiv:1609.03499 [cs.SD].

[16] Kuchaiev, Oleksii; Li, Jason; Nguyen, Huyen; Hrinchuk, Oleksii; Leary, Ryan; Ginsburg, Boris; Kriman, Samuel; Beliaev, Stanislav; Lavrukhin, Vitaly; Cook, Jack; Castonguay, Patrice (2019-09-13). "NeMo: a toolkit for building AI applications using Neural Modules". arXiv:1909.09577 [cs.LG].

[17] Wang, Yuxuan; Skerry-Ryan, R. J.; Stanton, Daisy; Wu, Yonghui; Weiss, Ron J.; Jaitly, Navdeep; Yang, Zongheng; Xiao, Ying; Chen, Zhifeng; Bengio, Samy; Le, Quoc (2017-04-06). "Tacotron: Towards End-to-End Speech Synthesis". arXiv:1703.10135 [cs.CL].

[:7-18] 18.0 ^18.1 Prenger, Ryan; Valle, Rafael; Catanzaro, Bryan (2018-10-30). "WaveGlow: A Flow-based Generative Network for Speech Synthesis". arXiv:1811.00002 [cs.SD].

[19] Vasquez, Sean; Lewis, Mike (2019-06-04). "MelNet: A Generative Model for Audio in the Frequency Domain". arXiv:1906.01083 [eess.AS].

[:8-20] 20.0 ^20.1 Ping, Wei; Peng, Kainan; Gibiansky, Andrew; Arik, Sercan O.; Kannan, Ajay; Narang, Sharan; Raiman, Jonathan; Miller, John (2018-02-22). "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning". arXiv:1710.07654 [cs.SD].

[21] Ren, Yi; Ruan, Yangjun; Tan, Xu; Qin, Tao; Zhao, Sheng; Zhao, Zhou; Liu, Tie-Yan (2019-11-20). "FastSpeech: Fast, Robust and Controllable Text to Speech". arXiv:1905.09263 [cs.CL].

[22] Ning, Yishuang; He, Sheng; Wu, Zhiyong; Xing, Chunxiao; Zhang, Liang-Jie (January 2019). "A Review of Deep Learning Based Speech Synthesis" (in en). Applied Sciences 9 (19): 4050. doi:10.3390/app9194050. ISSN 2076-3417.

[:5-23] 23.0 ^23.1 Rodríguez-Ortega, Yohanna; Ballesteros, Dora María; Renza, Diego (2020). "A Machine Learning Model to Detect Fake Voice". in Florez, Hector; Misra, Sanjay (in en). Applied Informatics. Communications in Computer and Information Science. 1277. Cham: Springer International Publishing. pp. 3–13. doi:10.1007/978-3-030-61702-8_1. ISBN 978-3-030-61702-8. https://link.springer.com/chapter/10.1007/978-3-030-61702-8_1.

[24] Zhang, Mingyang; Wang, Xin; Fang, Fuming; Li, Haizhou; Yamagishi, Junichi (2019-04-07). "Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet". arXiv:1903.12389 [eess.AS].

[:21-25] 25.0 ^25.1 Sercan, Ö Arık; Jitong, Chen; Kainan, Peng; Wei, Ping; Yanqi, Zhou (2018). "Neural Voice Cloning with a Few Samples". Advances in Neural Information Processing Systems (NeurIPS 2018) 31: 10040–10050. 12 October 2018. https://papers.nips.cc/paper/2018/hash/4559912e7a94a9c32b09d894f2bc3c82-Abstract.html.

[:14-26] 26.0 ^26.1 "| ASVspoof". https://www.asvspoof.org/.

[Audio_Deepfake_Detection:_A_Survey-27] 27.0 ^27.1 Yi, Jiangyan; Tao, Jianhua; Bai, Ye; Tian, Ziqiang; Fan, Chuaneng (2023). "Audio Deepfake Detection: A Survey". arXiv:2308.14970 [cs.SD].

[28] resemble-ai/Resemblyzer, Resemble AI, 2022-06-30, https://github.com/resemble-ai/Resemblyzer, retrieved 2022-07-01

[29] xfz (2022-06-28), Synthetic-Voice-Detection, https://github.com/mendaxfz/Synthetic-Voice-Detection, retrieved 2022-07-01

[30] HUA, Guang (2022-06-29), End-to-End Synthetic Speech Detection, https://github.com/ghuawhu/end-to-end-synthetic-speech-detection, retrieved 2022-07-01

[31] Caramancion, Kevin Matthe (June 2022). "An Exploration of Mis/Disinformation in Audio Format Disseminated in Podcasts: Case Study of Spotify". 2022 IEEE International IOT, Electronics and Mechatronics Conference (IEMTRONICS). pp. 1–6. doi:10.1109/IEMTRONICS55184.2022.9795760. ISBN 978-1-6654-8684-2.

[ISCA-32] 32.0 ^32.1 Chen, Tianxiang; Kumar, Avrosh; Nagarsheth, Parav; Sivaraman, Ganesh; Khoury, Elie (2020-11-01). "Generalization of Audio Deepfake Detection" (in en). The Speaker and Language Recognition Workshop (Odyssey 2020) (ISCA): 132–137. doi:10.21437/Odyssey.2020-19. https://www.isca-speech.org/archive/odyssey_2020/chen20_odyssey.html.

[33] Suwajanakorn, Supasorn; Seitz, Steven M.; Kemelmacher-Shlizerman, Ira (2017-07-20). "Synthesizing Obama: learning lip sync from audio". ACM Transactions on Graphics 36 (4): 95:1–95:13. doi:10.1145/3072959.3073640. ISSN 0730-0301. https://doi.org/10.1145/3072959.3073640.

[34] Stupp, Catherine. "Fraudsters Used AI to Mimic CEO's Voice in Unusual Cybercrime Case" (in en-US). WSJ. https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402.

[:15-35] Brewster, Thomas. "Fraudsters Cloned Company Director's Voice In $35 Million Bank Heist, Police Find" (in en). https://www.forbes.com/sites/thomasbrewster/2021/10/14/huge-bank-fraud-uses-deep-fake-voice-tech-to-steal-millions/.

[36] "Generative AI is making voice scams easier to believe". Axios. 13 June 2023. https://www.axios.com/2023/06/13/generative-ai-voice-scams-easier-identity-fraud.

[37] Bunn, Amy (15 May 2023). "Artificial Imposters—Cybercriminals Turn to AI Voice Cloning for a New Breed of Scam". McAfee Blog. https://www.mcafee.com/blogs/privacy-identity-protection/artificial-imposters-cybercriminals-turn-to-ai-voice-cloning-for-a-new-breed-of-scam/.

[38] Cox, Joseph (23 February 2023). "How I Broke Into a Bank Account With an AI-Generated Voice" (in en). Vice. https://www.vice.com/en/article/how-i-broke-into-a-bank-account-with-an-ai-generated-voice/.

[39] Evershed, Nick; Taylor, Josh (16 March 2023). "AI can fool voice recognition used to verify identity by Centrelink and Australian tax office". The Guardian. https://www.theguardian.com/technology/2023/mar/16/voice-system-used-to-verify-identity-by-centrelink-can-be-fooled-by-ai.

[40] "Scammers use AI to enhance their family emergency schemes" (in en). 2023-03-17. https://consumer.ftc.gov/consumer-alerts/2023/03/scammers-use-ai-enhance-their-family-emergency-schemes.

[41] "Deepfake audio of Sir Keir Starmer released on first day of Labour conference". https://news.sky.com/story/labour-faces-political-attack-after-deepfake-audio-is-posted-of-sir-keir-starmer-12980181.

[42] Meaker, Morgan. "Slovakia's Election Deepfakes Show AI is a Danger to Democracy". Wired. https://www.wired.com/story/slovakias-election-deepfakes-show-ai-is-a-danger-to-democracy/.

[43] "Political consultant behind fake Biden AI robocall faces charges in New Hampshire". https://www.cnn.com/2024/05/23/politics/new-hampshire-ai-robocall-biden-charges/index.html.

[44] "Political consultant accused of hiring magician to spam voters with Biden deepfake calls" (in en). 2024-03-15. https://lawandcrime.com/high-profile/political-consultant-hired-transient-magician-to-spam-voters-with-deepfake-calls-using-joe-bidens-voice-urging-them-not-to-cast-ballot-lawsuit/.

[45] David Wright; Brian Fung; Brian Fung (February 6, 2024). "Fake Biden robocall linked to Texas-based companies, New Hampshire attorney general announces". CNN. https://www.cnn.com/2024/02/06/tech/nh-ag-robocall-update/index.html.

[46] Brian Fung (February 8, 2024). "FCC votes to ban scam robocalls that use AI-generated voices". CNN. https://www.cnn.com/2024/02/08/tech/fcc-scam-robocalls-ai-generated-voices/index.html.

[47] "FCC Makes AI-Generated Voices in Robocalls Illegal | Federal Communications Commission" (in en). 2024-02-08. https://www.fcc.gov/document/fcc-makes-ai-generated-voices-robocalls-illegal.

[48] Kramer, Marcia (2024-02-26). "Steve Kramer explains why he used AI to impersonate President Biden in New Hampshire - CBS New York" (in en-US). https://www.cbsnews.com/newyork/news/steve-kramer-explains-why-he-used-ai-to-impersonate-president-biden-in-new-hampshire/.

[49] "A political consultant faces charges and fines for Biden deepfake robocalls". https://www.npr.org/2024/05/23/nx-s1-4977582/fcc-ai-deepfake-robocall-biden-new-hampshire-political-operative.

[forbes-50] Cite error: Invalid <ref> tag; no text was provided for refs named forbes

[51] Growcoot, Matt (26 September 2024). "YouTuber Has His Voice AI-Cloned and Used by a Company Without Consent". Peta Pixel. https://petapixel.com/2024/09/26/youtuber-has-his-voice-ai-cloned-and-used-by-a-company-without-consent-jeff-geerling/.

[52] "They stole my voice with AI (UPDATE: Elecrow responded)" (in en). 21 September 2024. https://www.youtube.com/watch?v=UMofZIT9FcQ.

[53] "The dark side of AI voice cloning" (in en). 25 September 2024. https://www.youtube.com/watch?v=vHuPWQz9AlI.

[54] Forristal, Lauren (21 October 2025). "YouTube's likeness-detection technology has officially launched". Tech Crunch. https://techcrunch.com/2025/10/21/youtubes-likeness-detection-technology-has-officially-launched/.

[55] Desreumaux, Geoff (21 October 2025). "YouTube Officially Rolls Out Likeness-Detection Technology to Creators". We are Social Media. https://wersm.com/youtube-officially-rolls-out-likeness-detection-technology-to-creators/.

[56] Deepfake It Till You Make It (Report). Graphika. February 2023. https://public-assets.graphika.com/reports/graphika-report-deepfake-it-till-you-make-it.pdf.

[57] Kong, Jungil; Kim, Jaehyeon; Bae, Jaekyoung (2020-10-23). "HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis". arXiv:2010.05646 [cs.SD].

[58] Kumar, Kundan; Kumar, Rithesh; de Boissiere, Thibault; Gestin, Lucas; Teoh, Wei Zhen; Sotelo, Jose; de Brebisson, Alexandre; Bengio, Yoshua; Courville, Aaron (2019-12-08). "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis". arXiv:1910.06711 [eess.AS].

[59] Ng, Andrew (April 1, 2020). "Voice Cloning for the Masses". https://www.deeplearning.ai/the-batch/voice-cloning-for-the-masses/.

[60] Chandraseta, Rionaldi (January 21, 2021). "Generate Your Favourite Characters' Voice Lines using Machine Learning". https://towardsdatascience.com/generate-your-favourite-characters-voice-lines-using-machine-learning-c0939270c0c6.

[61] "Audio samples from "Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis"". 2018-08-30. https://google.github.io/tacotron/publications/semisupervised/index.html.

[62] Temitope, Yusuf (December 10, 2024). "15.ai Creator reveals journey from MIT Project to internet phenomenon". The Guardian (Lagos, Nigeria). https://guardian.ng/technology/15-ai-creator-reveals-journey-from-mit-project-to-internet-phenomenon/.

[63] Kurosawa, Yuki (January 19, 2021). "ゲームキャラ音声読み上げソフト「15.ai」公開中。『Undertale』や『Portal』のキャラに好きなセリフを言ってもらえる" (in ja). https://automaton-media.com/articles/newsjp/20210119-149494/.

[64] "Navigating the Challenges and Opportunities of Synthetic Voices". March 9, 2024. https://openai.com/index/navigating-the-challenges-and-opportunities-of-synthetic-voices/.

[65] Babbel.com; GmbH, Lesson Nine. "The 10 Most Spoken Languages In The World" (in en). https://www.babbel.com/en/magazine/the-10-most-spoken-languages-in-the-world.

[66] Najafian, Maryam; Russell, Martin (September 2020). "Automatic accent identification as an analytical tool for accent robust automatic speech recognition" (in en). Speech Communication 122: 44–55. doi:10.1016/j.specom.2020.05.003. https://linkinghub.elsevier.com/retrieve/pii/S0167639317300043.

[67] Liu, Xiao; Zhang, Fanjin; Hou, Zhenyu; Mian, Li; Wang, Zhaoyu; Zhang, Jing; Tang, Jie (2021). "Self-supervised Learning: Generative or Contrastive". IEEE Transactions on Knowledge and Data Engineering 35 (1): 857–876. doi:10.1109/TKDE.2021.3090866. ISSN 1558-2191.

[68] Rimon, Inbal; Gal, Oren; Permuter, Haim (2025). "Unmasking Deepfakes: Leveraging Augmentations and Features Variability for Deepfake Speech Detection". arXiv:2501.05545 [cs.SD].

[:17-69] Rashid, Md Mamunur; Lee, Suk-Hwan; Kwon, Ki-Ryong (2021). "Blockchain Technology for Combating Deepfake and Protect Video/Image Integrity". Journal of Korea Multimedia Society 24 (8): 1044–1058. doi:10.9717/kmms.2021.24.8.1044. ISSN 1229-7771. http://koreascience.or.kr/article/JAKO202125761199587.page.

[:18-70] Fraga-Lamas, Paula; Fernández-Caramés, Tiago M. (2019-10-20). "Fake News, Disinformation, and Deepfakes: Leveraging Distributed Ledger Technologies and Blockchain to Combat Digital Deception and Counterfeit Reality". IT Professional 22 (2): 53–59. doi:10.1109/MITP.2020.2977589.

[:19-71] Ki Chan, Christopher Chun; Kumar, Vimal; Delaney, Steven; Gochoo, Munkhjargal (September 2020). "Combating Deepfakes: Multi-LSTM and Blockchain as Proof of Authenticity for Digital Media". 2020 IEEE / ITU International Conference on Artificial Intelligence for Good (AI4G). pp. 55–62. doi:10.1109/AI4G50087.2020.9311067. ISBN 978-1-7281-7031-2.

[72] Mittal, Trisha; Bhattacharya, Uttaran; Chandra, Rohan; Bera, Aniket; Manocha, Dinesh (2020-10-12), "Emotions Don't Lie: An Audio-Visual Deepfake Detection Method using Affective Cues", Proceedings of the 28th ACM International Conference on Multimedia (New York, NY, USA: Association for Computing Machinery): pp. 2823–2832, doi:10.1145/3394171.3413570, ISBN 978-1-4503-7988-5, https://doi.org/10.1145/3394171.3413570, retrieved 2022-06-29

[73] Conti, Emanuele; Salvi, Davide; Borrelli, Clara; Hosler, Brian; Bestagini, Paolo; Antonacci, Fabio; Sarti, Augusto; Stamm, Matthew C. et al. (2022-05-23). "Deepfake Speech Detection Through Emotion Recognition: A Semantic Approach". ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Singapore, Singapore: IEEE. pp. 8962–8966. doi:10.1109/ICASSP43922.2022.9747186. ISBN 978-1-6654-0540-9.

[74] Hosler, Brian; Salvi, Davide; Murray, Anthony; Antonacci, Fabio; Bestagini, Paolo; Tubaro, Stefano; Stamm, Matthew C. (June 2021). "Do Deepfakes Feel Emotions? A Semantic Approach to Detecting Deepfakes Via Emotional Inconsistencies". 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Nashville, TN, USA: IEEE. pp. 1013–1022. doi:10.1109/CVPRW53098.2021.00112. ISBN 978-1-6654-4899-4.

[75] Müller, Nicolas M.; Czempin, Pavel; Dieckmann, Franziska; Froghyar, Adam; Böttinger, Konstantin (2022-04-21). "Does Audio Deepfake Detection Generalize?". arXiv:2203.16263 [cs.SD].

[76] Zhang, You; Jiang, Fei; Duan, Zhiyao (2021). "One-Class Learning Towards Synthetic Voice Spoofing Detection". IEEE Signal Processing Letters 28: 937–941. doi:10.1109/LSP.2021.3076358. ISSN 1558-2361. Bibcode: 2021ISPL...28..937Z.

[77] "China Cyberspace Administration Deep Synthesis Regulations". January 2023. http://www.cac.gov.cn/.

[:9-78] 78.0 ^78.1 "SAM.gov". https://sam.gov/opp/a8883be78ac1442e8a22924011fc13c4/view.

[79] "The SemaFor Program". https://www.darpa.mil/program/semantic-forensics.

[80] "The DARPA MediFor Program". https://govtribe.com/file/government-file/darpabaa1558-darpa-baa-15-58-medifor-dot-pdf.

[81] "The MediFor Program". https://www.darpa.mil/program/media-forensics.

[82] "DARPA Announces Research Teams Selected to Semantic Forensics Program". https://www.darpa.mil/news-events/2021-03-02.

[83] "PREMIER" (in en-US). https://sites.google.com/unitn.it/premier/.

[:20-84] "PREMIER - Project" (in en-US). https://sites.google.com/unitn.it/premier/project.

[85] Bird, Jordan J.; Lotfi, Ahmad (2023). "Real-time Detection of AI-Generated Speech for DeepFake Voice Conversion". arXiv:2308.12734 [cs.SD].

[86] Yamagishi, Junichi; Wang, Xin; Todisco, Massimiliano; Sahidullah, Md; Patino, Jose; Nautsch, Andreas; Liu, Xuechen; Lee, Kong Aik; Kinnunen, Tomi; Evans, Nicholas; Delgado, Héctor (2021-09-01). "ASVspoof 2021: accelerating progress in spoofed and deepfake speech detection". arXiv:2109.00537 [eess.AS].

[87] "Audio Deepfake Detection: ICASSP 2022" (in en). 2021-12-17. https://signalprocessingsociety.org/publications-resources/data-challenges/audio-deepfake-detection-icassp-2022.

[88] Yi, Jiangyan; Fu, Ruibo; Tao, Jianhua; Nie, Shuai; Ma, Haoxin; Wang, Chenglong; Wang, Tao; Tian, Zhengkun; Bai, Ye; Fan, Cunhang; Liang, Shan (2022-02-26). "ADD 2022: the First Audio Deep Synthesis Detection Challenge". arXiv:2202.08433 [cs.SD].

[89] "Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020 - SynSIG". https://www.synsig.org/index.php/Joint_Workshop_for_the_Blizzard_Challenge_and_Voice_Conversion_Challenge_2020.

[90] "'Stop using my voice' - ScotRail's new announcer is my AI clone" (in en-GB). 2025-05-27. https://www.bbc.com/news/articles/cn4q7984nq1o.

[91] "Voiceover artist Gayanne Potter urging ScotRail to remove her voice from new AI announcements" (in en). https://news.sky.com/story/voiceover-artist-gayanne-potter-urging-scotrail-to-remove-her-voice-from-new-ai-announcements-13375535.

[92] English, David Leask | Paul (2025-05-27). "Actress feels 'cheated' by ScotRail's new AI voice announcer" (in en). https://www.thetimes.com/uk/scotland/article/scotland-railway-ai-voice-iona-63gz9xdg9.

[93] "I've voiced ScotRail trains for 20 years and was replaced with AI without being told" (in en). 2025-05-30. https://www.thenational.scot/news/25204424.voice-scotrail-20-years-replaced-ai/.

[94] "ScotRail to replace controversial AI voice on trains" (in en-GB). 2025-08-25. https://www.bbc.com/news/articles/c5ypzzyjgego.
