Voice Cloning Tips
This guide explains how the Voice Cloning feature works and how you can generate an accurate, high-quality voice clone for your projects.
1. Differences Between the Instant and High Fidelity Cloning Methods
Instant Cloning
This method captures the most prominent qualities of a speaker's voice and imitates that voice profile in the generated results. Currently available only on PlayHT 2.0.
Requires very little audio (as little as 30 seconds), and the voice is cloned within a few seconds.
Because this method captures the most prominent qualities of a speaker's voice, you can also use it to create a customized voice style, emotional tone, or delivery for an existing voice clone.
Works well with almost all English accents (multilingual support coming soon!).
High Fidelity Cloning
This method builds a much deeper understanding of a voice’s nuances and accent, and therefore requires more training audio. The resulting voice is versatile, complex, and capable of changing tone according to the context of a sentence. Currently available only on PlayHT 1.0.
Requires at least 20 to 30 minutes of audio for decent results, but 1 to 2 hours can give significantly better results. Even higher accuracy and accent resemblance can be achieved with 4 to 6 hours of training audio.
Cloning may take anywhere from 20 minutes to a few hours, depending on the length of the training audio uploaded.
Works for almost any accent and produces a very close resemblance (multilingual support coming soon!).
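As a rough illustration of the difference between the two methods, the sketch below maps the amount of clean training audio you have to a suggested method. The thresholds come from the figures in this section; the function name, constants, and return strings are hypothetical and are not part of any PlayHT tool or API.

```python
# Thresholds taken from this guide. All names here are illustrative only.
INSTANT_MIN_SECONDS = 30               # "as little as 30 seconds"
HIGH_FIDELITY_MIN_SECONDS = 20 * 60    # lower bound of "20 to 30 minutes"
HIGH_FIDELITY_GOOD_SECONDS = 60 * 60   # "1 to 2 hours" for better results


def suggest_cloning_method(audio_seconds: float) -> str:
    """Suggest a cloning method from the amount of clean training audio."""
    if audio_seconds < INSTANT_MIN_SECONDS:
        return "not enough audio"
    if audio_seconds < HIGH_FIDELITY_MIN_SECONDS:
        return "Instant"
    if audio_seconds < HIGH_FIDELITY_GOOD_SECONDS:
        return "High Fidelity (decent results)"
    return "High Fidelity (better results)"


# Try a few durations: 10 s, 45 s, 25 min, 90 min.
for seconds in (10, 45, 25 * 60, 90 * 60):
    print(seconds, "->", suggest_cloning_method(seconds))
```

This only considers duration. Audio quality and the tone you want still matter, as the following sections explain.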
2. Improving Quality of Your Voice Clone
You can always delete your voice clones and create new ones with better training audio. Here are some guidelines for what kind of training audio will help you improve the quality of your voice clone.
Avoid audio that has a lot of background noise, music, or sound effects.
The Instant Cloning method uses only the first 30 seconds of the training audio you upload to create the voice clone, so make sure you upload a short but high-quality audio file.
As for High Fidelity Cloning, uploading 1 to 2 hours (the more, the better) of high-quality training audio is one of the most effective ways to improve the quality of your cloned voice.
Consider the amount of reverb and/or echo in the training audio, as it will likely show up in your voice clone as well. Generally, it is best to minimize the amount of reverb for better quality.
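Because Instant Cloning uses only the first 30 seconds of a file, it can help to cut out your cleanest 30 seconds before uploading. The sketch below does this for uncompressed WAV files using Python's standard `wave` module. The function name, file paths, and default values are assumptions made for illustration, and other formats such as MP3 would need a different tool.

```python
import wave


def extract_clip(src_path: str, dst_path: str,
                 start_seconds: float = 0.0,
                 clip_seconds: float = 30.0) -> float:
    """Copy up to clip_seconds of audio from a WAV file into a new WAV file.

    Returns the duration of the clip that was actually written, in seconds.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Clamp the start position so it never goes past the end of the file.
        start_frame = min(int(start_seconds * rate), src.getnframes())
        src.setpos(start_frame)
        # readframes returns fewer frames if the file ends early.
        frames = src.readframes(int(clip_seconds * rate))

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # the frame count in the header is fixed on close
        dst.writeframes(frames)

    bytes_per_frame = params.sampwidth * params.nchannels
    return len(frames) / bytes_per_frame / rate
```

For example, `extract_clip("recording.wav", "clip.wav", start_seconds=12.0)` would write the 30 seconds starting at the 12-second mark, assuming a file named `recording.wav` exists.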
3. Getting the Accent Right
The best cloning method for closer accent resemblance is High Fidelity. If you still have trouble getting the exact accent, try uploading higher-quality training audio of longer duration. The more training audio you provide, the better the resulting voice clone will be. Almost any accent can be accurately cloned with 4 to 6 hours of high-quality training audio.
4. Making Your Cloned Voice Energetic and Full of Life
If your cloned voice sounds bland and devoid of personality, take a closer look at the tone of voice in the audio you used for the cloning process. The most prominent tone of voice in the training audio is the one that will come through in the cloned voice. So, if you’re looking for an energetic and lively cloned voice, make sure you use training audio that reflects this tone of speech as well.
5. Reasons for the Cloning Process to Fail
The duration of the audio you submitted for the cloning process was too short: If you’re using Instant Cloning, make sure you upload at least 30 seconds of training audio. If you’re using High Fidelity, make sure you upload at least 30 minutes of training audio (the more training audio you provide, the better).
The training audio was in a language other than English: Currently, our AI Model only supports English (Multi-lingual coming soon!).
The training audio contained a lot of background noise, or had music or sound effects throughout most of it.
There were multiple speakers in the audio and you did not tell the AI which voice to clone (speaker selection is only available with High Fidelity Cloning, not with Instant Cloning).
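Of the failure reasons above, duration is the easiest to check before you upload. The sketch below reads the length of an uncompressed WAV file with Python's standard `wave` module and compares it with the minimums quoted in this section. The function names and the `"instant"` / `"high_fidelity"` keys are hypothetical, and the check does not detect noise, language, or multiple speakers.

```python
import wave
from typing import Optional

# Minimum durations quoted in this section, in seconds. Keys are illustrative.
MIN_SECONDS = {"instant": 30, "high_fidelity": 30 * 60}


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_problem(path: str, method: str) -> Optional[str]:
    """Return a description of the problem, or None if the file is long enough."""
    required = MIN_SECONDS[method]
    actual = wav_duration_seconds(path)
    if actual < required:
        return f"{actual:.0f}s of audio, but {method} needs at least {required}s"
    return None
```

A call such as `duration_problem("recording.wav", "instant")` returns `None` when the file is at least 30 seconds long, and a short message otherwise.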
6. What should the speaker be reading/talking about in the training audio?
There is no fixed requirement. It comes down to the kind of content you want to create with the cloned voice. If you want an audiobook narrated in your cloned voice, record yourself reading a book. If you want a more conversational tone, try using a recording from a podcast. The rule of thumb is that whatever tone you want your cloned voice to have, submit training audio that reflects the same tone of speech.
7. Using the API to Access Cloned Voices
To learn how to use your cloned voices through our API, please refer to our API Documentation.
Updated on: 09/02/2024