With each passing day, it seems to be getting harder to trust what you see and hear on the internet. Creating deepfakes and fake audio has become as easy as pressing a button. New research by three School of Information students and alumni will make it easier to determine the authenticity of an audio clip.
Romit Barua, Gautam Kurma, and Sarah Barrington (all MIMS ’23) first presented their research on voice cloning as their final project for the Master of Information Management and Systems degree program. Barrington is now a Ph.D. student at the I School. Working with Professor Hany Farid, the team looked at various techniques to distinguish genuine voices from cloned voices designed to impersonate a specific person.
“When this team first contacted me in the early spring of 2022, I told them not to worry about deepfake audio because the voice cloning wasn’t very good and it would be some time before we had to worry about it. I was wrong, and a few months later, AI-powered voice cloning did surprisingly well, which showed how quickly this technology evolves,” said Professor Farid.
“The team has done important work in introducing a range of ideas to address the new threat of deepfake audio.”
To begin, the team first analyzed audio samples of real and simulated voices by looking for perceptual characteristics or patterns that can be visually identified. Through this lens, they focused on looking at audio waveforms and noticed that real human voices often have more pauses and variations in volume throughout the clip. This is because people have a tendency to use filler words and may move around and away from the microphone while recording.
By analyzing these characteristics, the team was able to pinpoint pauses and amplitude (consistency and variation in the voice) as key factors when attempting to determine the authenticity of a voice. However, they also found that this method – while easy to understand – could produce less accurate results.
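As a rough illustration of the perceptual approach described above, the following Python sketch counts pauses and measures volume variation in a mono signal. It is not the team's code: the function names, the 400-sample frame length, the 0.02 silence threshold, and the synthetic test tones are all assumptions chosen for the example.

```python
import math
import statistics


def frame_rms(samples, frame_len):
    """Return the RMS level of each non-overlapping frame of the signal."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def perceptual_features(samples, frame_len=400, silence_threshold=0.02):
    """Summarize pauses and loudness variation (thresholds are illustrative)."""
    levels = frame_rms(samples, frame_len)
    if not levels:
        raise ValueError("signal is shorter than one frame")
    silent = [level < silence_threshold for level in levels]
    # A pause is a run of consecutive silent frames; count where each run starts.
    pause_count = sum(
        1 for i, s in enumerate(silent) if s and (i == 0 or not silent[i - 1])
    )
    voiced = [level for level, s in zip(levels, silent) if not s]
    return {
        "pause_count": pause_count,
        "pause_ratio": sum(silent) / len(levels),
        "amplitude_std": statistics.pstdev(voiced) if voiced else 0.0,
    }


def tone(amplitude, n_samples, freq=200.0, rate=16000):
    """Synthetic sine tone standing in for speech (hypothetical test data)."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate)
            for t in range(n_samples)]


# A clip with one pause and a change in loudness versus a perfectly even clip.
varied = tone(0.5, 4000) + [0.0] * 2000 + tone(0.25, 4000)
steady = tone(0.5, 10000)
print(perceptual_features(varied))
print(perceptual_features(steady))
```

In this toy setup the varied clip shows one pause and a clearly larger amplitude spread than the even clip, the kind of difference the team describes seeing in real versus cloned voices.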
The team then took a more detailed approach, examining general spectral features using ‘off-the-shelf’ audio waveform analysis packages. The program extracts more than six thousand features, including summary statistics (mean, standard deviation, etc.), regression coefficients and more, before reducing the number to the twenty most important ones. By comparing these selected features across audio clips, Barrington, Barua, and Kurma arrived at a more accurate detection method.
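The reduction from thousands of extracted features to a handful of important ones can be sketched as a ranking problem. The article does not say how the team chose its twenty features, so the example below swaps in a simple Fisher-style separation score as an assumption; the function names and the tiny feature table are hypothetical.

```python
import statistics


def separation_score(real_vals, fake_vals):
    """Fisher-style score: squared gap between class means over summed variances."""
    gap = statistics.fmean(real_vals) - statistics.fmean(fake_vals)
    spread = statistics.pvariance(real_vals) + statistics.pvariance(fake_vals)
    return gap * gap / (spread + 1e-12)  # small constant avoids division by zero


def top_k_features(real_rows, fake_rows, k):
    """Return the indices of the k features that best separate the two classes."""
    n_features = len(real_rows[0])
    scored = []
    for j in range(n_features):
        real_col = [row[j] for row in real_rows]
        fake_col = [row[j] for row in fake_rows]
        scored.append((separation_score(real_col, fake_col), j))
    # Highest score first; ties are broken by the lower feature index.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [j for _, j in scored[:k]]


# Hypothetical feature table: one row per clip, one column per extracted feature.
real_rows = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.9], [0.8, 4.9, 0.5]]
fake_rows = [[1.1, 9.0, 0.4], [0.9, 9.2, 0.6], [1.0, 8.8, 0.5]]
print(top_k_features(real_rows, fake_rows, k=1))
```

With six thousand columns and k set to 20, the same ranking would keep only the columns whose values differ most between real and cloned clips.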
However, their most accurate results came from learned features, which involved training a deep-learning model. To do this, the team fed raw audio into the model, which processed it and extracted multidimensional representations called embeddings.
Once the embeddings are generated, the model uses them to separate real from synthetic audio. This method consistently outperformed the previous two techniques in accuracy, recording as little as 0% error in laboratory settings. Despite the high accuracy, the team notes that the method can be difficult to interpret without proper context.
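To make the idea of separating clips by their embeddings concrete, here is a minimal sketch that assumes the embeddings have already been produced by some deep-learning model, which is not shown. The two-dimensional vectors are made up, and the nearest-centroid rule with cosine similarity is a stand-in for whatever classifier the team actually trained.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(embedding, real_centroid, fake_centroid):
    """Label a clip by whichever class centroid its embedding is closer to."""
    if cosine(embedding, real_centroid) >= cosine(embedding, fake_centroid):
        return "real"
    return "synthetic"


# Hypothetical 2-D embeddings; real models produce far higher-dimensional vectors.
real_embeddings = [[1.0, 0.1], [0.9, 0.2]]
fake_embeddings = [[0.1, 1.0], [0.2, 0.9]]
real_c = centroid(real_embeddings)
fake_c = centroid(fake_embeddings)
print(classify([0.8, 0.3], real_c, fake_c))
print(classify([0.2, 0.7], real_c, fake_c))
```

The sketch also hints at the interpretability problem the team mentions: the individual numbers in an embedding have no obvious human meaning, unlike a pause count or a volume measurement.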
The team believes this research could address growing concerns about the use of voice cloning and deepfakes for nefarious purposes.
“Voice cloning is one of the first examples where we’re seeing deepfakes with real-world utility, whether it’s bypassing a bank’s biometric verification or calling a family member to ask for money,” Barrington explained.
“No longer are world leaders and celebrities the only ones at risk; everyday people are as well. This work represents an important step in developing and evaluating detection systems in a manner that is robust and scalable for the general public.”
After publishing this research online, Barrington, Barua and Kurma were invited to present their findings at several prestigious conferences, including the Nobel Prize Summit and the IEEE WIFS (Workshop on Information Forensics and Security) conference in Nuremberg, Germany.
“WIFS provided an excellent platform to connect with researchers in digital forensics, deepening our knowledge of cutting-edge forensic techniques through detailed presentations and rich peer discussions,” Kurma said.
“[It also] gave us a great opportunity to see the research of leaders in our field as well as find common ground for future collaboration in the field of deepfake detection,” Barua said.
As society grapples with the implications of deepfakes affecting not only world leaders and celebrities but also everyday individuals, this research provides a robust and scalable approach to protecting the general public.
Leveraging perceptual features, spectral analysis, and advanced deep-learning models has yielded promising results, and the team’s work is an important step toward restoring trust in online audio content and mitigating the risks posed by advancing technology.
Source: UC Berkeley