Step by Step Free AI Voice Cloning: Real-life Example

What is AI voice cloning? How can you clone your voice using free AI technology? Follow our comprehensive step-by-step guide with a real-life example to get answers to these questions and create your voice clone.

2024-02-29

In this guide, we show you how to clone a voice using free AI tools. It's a simple way to learn how voice cloning technology works. You can use it for voice phishing simulations, work, or just to see what's possible with AI. The rise of AI-driven voice cloning also presents significant cybersecurity risks, leading to financial losses, operational disruptions, and reputational damage.

In 2019, fraudsters used AI to mimic a CEO's voice, convincing a UK-based energy firm's CEO to transfer €220,000 (approximately $243,000) to a fraudulent account.
In September 2023, MGM Resorts faced a cyberattack where scammers impersonated tech support using AI-generated voices, leading to operational shutdowns and an estimated financial impact of $100 million.
In October 2023, a deepfake audio clip falsely portrayed UK Labour leader Keir Starmer making derogatory remarks, causing significant reputational harm and public confusion.

These incidents underscore the critical need for robust cybersecurity measures to counter the evolving threats posed by AI-enabled voice cloning technologies.

What is AI Voice Cloning?

AI voice cloning is a technology that lets us make a computer-generated copy of a person's voice. This process uses artificial intelligence to capture how someone sounds and recreate their voice without that person having to speak. This means you can have a digital voice that sounds like a real person's and can say anything you type or choose.

This technology works by analyzing a lot of voice recordings. It learns how a person's voice sounds – like how high or low it is, how fast they talk, and how they pronounce words. Then, the AI uses what it learned to generate new speech that sounds like the original voice.
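To make "how high or low it is" concrete, here is a minimal sketch of pitch estimation by autocorrelation, using only the Python standard library. It is an illustration of the idea, not the method any particular cloning tool uses; the function name, the frequency bounds, and the synthetic 200 Hz test tone are all assumptions chosen for the example.

```python
import math

def estimate_pitch_hz(samples, sample_rate, fmin=60.0, fmax=500.0):
    """Estimate the fundamental frequency of a short mono frame.

    Tries every lag between the periods of fmax and fmin and keeps the
    lag at which the signal best matches a shifted copy of itself.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    # A lag of N samples corresponds to a frequency of sample_rate / N.
    return sample_rate / best_lag if best_lag else 0.0

# Hypothetical input: 0.1 s of a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch_hz(tone, sample_rate)
print(round(pitch))  # 200
```

Real speech is far messier than a pure tone, which is why cloning tools rely on trained pitch extractors rather than a plain autocorrelation like this one.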

This is useful for different reasons. It can help digital assistants sound more authentic. It can also generate voices for game and animation characters. Additionally, it can aid individuals who have lost their ability to speak.

Voice cloning with AI is becoming more popular because it's getting better and easier to use. Some tools and apps, including free ones, now let anyone try it out. It's a thrilling field of technology that's creating new opportunities for using and engaging with digital voices.

How Does AI Voice Cloning Work?

AI voice cloning uses a special type of artificial intelligence called machine learning. Here's a simple breakdown of the process:

Picture 1: AI Voice Cloning Process

Collecting Voice Samples: First, the AI must listen to the voice it will clone. This means collecting recordings of the person's voice. These recordings should cover various sounds, words, and tones to capture the voice's uniqueness.
Analyzing the Voice: The AI then analyzes these recordings. It learns how the voice sounds in different situations, understanding how emotions affect pitch and pronunciation. This step is about understanding the patterns and characteristics of the voice.
Creating a Voice Model: The AI uses the learned information to create a digital voice model. This model is like a recipe the AI can use to produce new speech that sounds like the original voice.
Generating Speech: Once the model is set up, you can enter text and the AI converts it into speech that sounds like the cloned voice. The AI creates this new speech from scratch, following the patterns and rules it learned from the voice recordings.

This process relies heavily on advanced algorithms and computational power. A complex mix of science and technology makes it possible to recreate voices accurately. As AI improves, voice cloning becomes easier and more versatile, creating new opportunities for creative projects and accessibility tools.

Whose voices can you clone?

With enough audio samples, you can clone any voice. This includes:

Your Voice: People clone their voices for projects such as creating personal digital assistants and voiceovers for content creation.
Famous Voices: You can clone famous voices for fun, learning, or to make historical figures talk interactively. Just get permission and audio samples first.

However, it's important to consider the ethical implications and legal permissions when cloning voices, especially those of other individuals. Consent is key, and respecting individuals' rights and privacy should always be a top priority. For public figures or copyrighted characters, check the legal permissions and usage rights carefully to avoid infringement.

How to Clone Your Own Voice

Cloning any voice, including your own, follows the same steps. The advantage of using your own voice is that you can record as many samples as you need to train the AI accurately.

Recording yourself also gives you direct control over emotions and tones, which is essential for creating an authentic-sounding clone. This tutorial guides you through replicating any voice effectively and shows how to produce a high-quality voice clone.

Step by Step Voice Cloning Tutorial

This guide shows you step-by-step instructions on how to copy a voice. We'll teach you how to get voice recordings, clean them up, train the computer (AI), and finally make a copy of the voice. It's a simple way to learn about making a digital twin of any voice.

Whether you want to clone your voice or try with another (remember to ask for permission!), this tutorial is for you. Let's start and see how you can create a voice clone easily.

Finding a voice record

When starting a voice cloning project, the first step is to gather clear audio recordings of the voice you want to clone. You can use recordings from webinars, podcasts, online meetings, or personal videos. The only requirement is that the voice in the recording should be clear and easy to understand.

In our example, we'll use a webinar recording of my colleague Simon to demonstrate the process. This approach ensures we have a solid foundation of voice data to work with.

It's crucial to choose recordings where the voice is the main focus, with minimal background noise or music. The quality of the voice cloning greatly depends on the clarity and variety of the voice samples you collect. After recording, clean the voice to make it clear for the AI to analyze and replicate. You can improve the quality of your source material by running it through a noise cancellation app before using it for voice cloning. A good noise cancellation app will help remove distracting background sounds, making the voice clearer and more suitable for training.

Cleaning the voice

Picture 2: Cleaning the voice record

After finding a suitable voice recording, the next step is to clean it. This involves removing background noises, other people talking, and unnecessary silence. Cleaning up is important to ensure the AI focuses on the right voice, improving the cloned voice's accuracy.
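As a rough illustration of the "unnecessary silence" part of cleaning, the sketch below trims quiet samples from the start and end of a recording. It assumes the audio is already loaded as a list of signed 16-bit PCM values; the function name and the threshold of 500 are hypothetical and would need tuning for a real recording. Editors such as Audacity do this far more carefully, so treat it as a way to see the idea rather than a replacement for them.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold`.

    `samples` holds signed 16-bit PCM values. Quiet samples between
    loud ones are kept, so natural pauses inside speech survive.
    """
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

# Hypothetical recording: near-silence, then speech, then near-silence.
recording = [0, 12, -8, 4000, -3500, 20, 2800, -15, 3, 0]
trimmed = trim_silence(recording)
print(trimmed)  # [4000, -3500, 20, 2800]
```

Note that the quiet sample in the middle (20) is preserved: only the edges are trimmed.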

I use Adobe Audition for this task, a professional tool that excels at removing unwanted sounds. However, the software requires payment. As a free alternative, Audacity is available. It is open-source software that, while lacking some advanced features, is effective for basic voice cleaning.

The goal with either software is to prepare a clean, clear voice sample for the AI. This leads to a more successful voice cloning outcome.

Training the AI

Training the AI for voice cloning involves ensuring the cloned voice is as realistic as possible. Using Mangio-RVC, created by Cole Mangio, we can convert a voice recording into a functional voice duplicate.

Here’s how it’s done:

1. Process Data

Firstly, we introduce the cleaned voice recording to Mangio-RVC.

For this example, the dataset (a single cleaned .mp3 or .wav audio file) is in the C:\Users\onurk\datasets\simon-gen2 folder.

I named the AI Simon-gen2, pointed it to the directory C:\Users\onurk\datasets, and selected the version 2 (v2) setting with a 48k sample rate for data processing.

This initial step prepares the voice data for the AI to analyze.
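Before loading a file, it can be a useful sanity check to see what sample rate and length it actually has. The sketch below reads those properties with Python's standard `wave` module, which handles PCM .wav files only (not .mp3). The helper name `describe_wav` and the generated demo file are assumptions for illustration; in practice you would pass the path of your own cleaned recording.

```python
import tempfile
import wave
from pathlib import Path

def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        return rate, wav.getnchannels(), wav.getnframes() / rate

# Demo only: write one second of 16-bit mono silence at 48 kHz, then inspect it.
demo_path = Path(tempfile.mkdtemp()) / "demo.wav"
with wave.open(str(demo_path), "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)        # 2 bytes per sample = 16-bit audio
    wav.setframerate(48000)
    wav.writeframes(b"\x00\x00" * 48000)

rate, channels, seconds = describe_wav(demo_path)
print(rate, channels, seconds)  # 48000 1 1.0
```

A very short duration or an unexpectedly low sample rate is worth catching here, before spending hours on training.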

Picture 3: Setting the dataset

2. Feature Extraction

The AI uses features from the voice data to understand and mimic the unique qualities of the target voice. I chose the "rmvpe" pitch extraction algorithm because it captures the voice details effectively, and I left the other settings at their defaults.

Picture 4: Define features

3. Train Feature Index

The training process requires careful consideration of how often to save progress and the total duration of training. In this case, I saved the weight file every 20 epochs, aiming for a total of 500 epochs. This balance is crucial: too few epochs might not capture the voice accurately, while too many can produce a robotic voice. Finding the sweet spot for the highest-quality clone is a matter of experience.
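The arithmetic behind these two settings is easy to sketch. The snippet assumes a weight file is written at every multiple of the save interval, which is how the setting reads but is not something verified against the tool's source; the function name is hypothetical.

```python
def checkpoint_epochs(total_epochs, save_every):
    """List the epochs at which a weight file would be saved."""
    return list(range(save_every, total_epochs + 1, save_every))

saved = checkpoint_epochs(total_epochs=500, save_every=20)
print(len(saved), saved[:3], saved[-1])  # 25 [20, 40, 60] 500
```

Under that assumption, the settings used here leave 25 saved weight files to compare, which helps when looking for the point where the clone sounds accurate but not yet robotic.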

Picture 5: Train Feature Index

4. Train

With the setup complete, training the AI model is the final step. By clicking the "Train model" button, the process begins.

Training time varies with your computer's power, the quality and length of the voice sample, and the number of training epochs you pick. Depending on these factors, training can last between 1 and 10 hours.

Recording or Generating the speech

The trained model needs a source audio clip to convert into the cloned voice, and you have two options for producing one. The first is to record your own voice using Adobe Audition or your phone's voice recorder; the model then converts that recording so it sounds like Simon. This is especially useful if you're creating the content for the cloned voice yourself.

The second option is text-to-speech. You can use any text-to-speech solution to generate a voice sample.

Here is the voice sample I used in this tutorial.

Keepnet Labs · Onur Recorded Voice

Cloning the voice

In the final step of voice cloning, we'll combine everything using the trained AI model to clone the voice. Here's how to do it:

Picture 6: Clone Voice

Select Your Trained Model: Go to the "Inferencing voice" menu in your voice cloning tool and select the trained model you've worked on. If you don't see your model listed immediately, use the "refresh voice list" button to update the list.
Adjust Pitch Correction: This step is important, especially when changing from a male to a female voice or whenever the pitch of your recording differs from that of the target voice. You can adjust pitch correction from -12 to +12 to closely match the original voice's pitch.
Upload Your Voice File: Make sure to locate and select the file you intend to use for conversion.
Choose the Algorithm: In my experience "rmvpe" works well, but it is a good idea to try the other algorithms in the tool to find the best one for your voice file.
Convert: Hit the convert button to start the voice cloning process. The conversion time depends on your computer's power, the recording's length, and its quality, usually taking 1 to 5 minutes.
Preview and Download: Once the conversion is complete, you can listen to a preview of the cloned voice. You can download the cloned voice file if you are satisfied with the outcome.
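To get a feel for the pitch correction range in the steps above, the sketch below assumes the setting is measured in semitones, so that +12 raises the voice by one octave and -12 lowers it by one. That unit is an assumption here, and the function names and example pitches are hypothetical.

```python
import math

def semitones_to_ratio(semitones):
    """Frequency multiplier for a shift of `semitones` (12 per octave)."""
    return 2.0 ** (semitones / 12.0)

def suggested_shift(source_hz, target_hz, limit=12):
    """Nearest whole-semitone shift from source to target pitch,
    clamped to the -limit..+limit range offered by the setting."""
    shift = round(12 * math.log2(target_hz / source_hz))
    return max(-limit, min(limit, shift))

# Hypothetical average pitches: a 110 Hz recording, a 220 Hz target voice.
shift = suggested_shift(110.0, 220.0)
print(shift, semitones_to_ratio(shift))  # 12 2.0
```

In practice the average pitch of each voice is rarely known exactly, so a value computed this way is only a starting point for adjusting by ear.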

Picture 7: Preview Cloned Voice

Keepnet Labs · Simon Voice Cloning Sample

Please watch the video below and learn how to clone your own voice.

How to protect organizations from voice phishing?

The best way to protect your organization against voice cloning attacks is to build awareness of voice phishing (vishing) attacks. Vishing simulators are a necessary part of security awareness training under the State of California's Phishing Exercise Standard.

Vishing simulation is the safest way to rehearse voice phishing attacks, and it helps your organization meet its compliance requirements.

Keepnet's AI Voice Phishing Simulation tool helps organizations run voice phishing simulation tests on their employees. Here is a comprehensive vishing simulation video that shows:

How to create a voice phishing scenario.
How to edit voice phishing scenarios.
How to use AI in voice phishing simulation.
How to run a Vishing Simulation campaign.
How to enroll in Vishing Security Awareness.

Editor's Note: This blog was updated on December 4, 2024.

Frequently Asked Questions

Can I clone my voice with AI for free?

Yes, you can clone your voice using free AI voice cloning tools. These tools require you to provide voice samples, which the AI then analyzes to create a digital voice model that mimics your voice.

Can AI voice cloning create voices in different languages?

AI voice cloning technology can create voices in various languages, provided it can access enough quality voice samples in the target language to analyze and replicate.