Creating a voice-over can be quite difficult. You often have to do numerous takes to achieve the desired result. There’s rarely enough time to practice and reach your intended tone and expression. You sift through countless audio editing software tutorials to ensure your voice is polished. Yet, even if you manage all of this perfectly, lacking access to a studio means your flawless performance may still be marred by background noise.
Should you give up and hire a voice actor? Not just yet: AI voice generators can produce remarkable results. These text-to-speech apps keep improving in quality, realism, and control, so you can generate a natural reading of your text without plugging a microphone into your computer.
After spending several weeks testing every AI voice generator I could get my hands on, I’ve narrowed the field down to the six best.
The best AI voice generators
- ElevenLabs for hundreds of realistic voices
- Speechify for human-like cadence
- WellSaid for word-by-word control
- Respeecher for engaging speech variations
- Altered for narration style variety
- Murf for emphasis control
What makes the best AI voice generator?
The top AI voice generators are easy to identify: the speech they produce sounds natural and lifelike, nearly (nearly!) as if a real person were saying the words. Beyond that straightforward test, each platform offers settings to customize generation, including pronunciation, pitch, volume, and speed. If you want maximum control over a fully AI-generated voice, you can use Speech Synthesis Markup Language (SSML) to specify how each word should be delivered. Be careful not to overuse these features, though: heavy-handed adjustments can make the output sound less natural and authentic.
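To give a sense of what SSML looks like, here is a minimal Python sketch that builds an SSML document with the standard library. It assumes the W3C SSML 1.1 elements `<prosody>`, `<break>`, `<s>`, and `<emphasis>`; each platform supports a different subset of SSML (and some use their own markup instead), so check your app’s documentation before relying on any tag. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def add_sentence(parent, text, emphasize=None):
    """Append an <s> element to `parent`, wrapping the first whole-word
    match of `emphasize` (if any) in <emphasis level="strong">."""
    s = ET.SubElement(parent, "s")
    match = re.search(rf"\b{re.escape(emphasize)}\b", text) if emphasize else None
    if match is None:
        s.text = text
        return s
    s.text = text[: match.start()]
    em = ET.SubElement(s, "emphasis", {"level": "strong"})
    em.text = match.group(0)
    em.tail = text[match.end():]
    return s


def build_ssml(sentences, rate="95%", pitch="-2st", pause="400ms"):
    """Build an SSML document from (text, word_to_emphasize_or_None) pairs.

    All sentences share one <prosody> setting (slightly slower, two semitones
    lower) and are separated by a <break> of length `pause`.
    """
    speak = ET.Element("speak", {"version": "1.1", "xmlns": SSML_NS, "xml:lang": "en-US"})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    for i, (text, emphasize) in enumerate(sentences):
        if i > 0:
            ET.SubElement(prosody, "break", {"time": pause})
        add_sentence(prosody, text, emphasize)
    return ET.tostring(speak, encoding="unicode")


print(build_ssml([
    ("Recording a voice-over is hard.", None),
    ("AI voice generators make it much easier.", "much"),
]))
```

Building the markup with an XML library rather than string concatenation keeps special characters in your script (like `&` or `<`) from breaking the document.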
Keeping that in mind, here are the criteria I considered while evaluating the top AI voice generators:
- Realism. The best text-to-speech apps produce authentic-sounding speech, with variation, natural tonal shifts, and well-placed pauses.
- Available controls. Features like pitch, volume, speed, and pronunciation adjustments allow you to customize the output to suit your requirements.
- Audio quality. I prioritized apps that export audio at the highest possible quality, so the voices can be used in any project.
- Voice library. A diverse selection of voices accommodates a broader array of projects—including options in different languages—providing you with enhanced flexibility during your work.
- Extras. I factored in any additional useful tools for voice generation offered by an app, such as audio-to-audio capabilities or AI model training. However, I excluded any AI video generation applications from this list, despite some providing text-to-voice as an additional feature.
I also took it a step further. Prior to my writing career, I spent ten years as an actor, and back in the day, I participated in a one-month workshop focused on voice acting and dubbing. I leveraged that experience to evaluate these voices based on several additional criteria:
- Narration pacing. Humans naturally vary their reading speed, which helps add emphasis or hold a listener’s attention. Weaker AI models tend to flatten everything to a single pace, so I focused on the models that offered the best variation.
- Intonation. Intonation refers to the changes in pitch throughout sentences. The least effective AI models render everything predictable, mechanical, and devoid of life—many were eliminated for this reason.
- Emotional performance. Some applications allow you to select renditions of the text that are sad, excited, or whispered. I ruled out those that lacked subtlety and either over- or under-acted the script significantly. However, achieving nuanced performances remains challenging for AI; thus, if you’re seeking something sophisticated, you might want to collaborate with a professional voice actor.
I spent more than three weeks signing up for every AI voice generator I could find. I used the same text on each platform so I could focus on the differences, and I experimented with the settings to see what they could do and whether they improved the final result. I collected a sample from every app.
When choosing an AI voice generator, remember that your audience will be paying attention to the rest of your content too, so minor flaws are perfectly acceptable. With that in mind, here are the best picks for this year.
The best AI voice generators at a glance
App | Best for | Pricing
ElevenLabs | Hundreds of realistic voices | Free plan available; paid plans start at $5/month
Speechify | Human-like cadence | Free plan available (no downloads); paid plans start at $24/user/month (billed annually)
WellSaid | Word-by-word control | From $44/month (billed annually)
Respeecher | Engaging speech variations | From $4/month
Altered | Narration style variety | Free plan available; paid plans start at $6/month
Murf | Emphasis control | Free plan available; paid plans start at $23/month (billed annually)
Best AI voice generator for hundreds of realistic voices
ElevenLabs (Web)
ElevenLabs is at the forefront with a voice library that boasts more than 300 voices, including AI-powered versions of real individuals available for licensing, such as Christy Carlson Romano, the TV actress known for Disney’s Kim Possible.
With numerous voices available, it’s wonderful to have effective search and filtering tools. Select Voices from the left-side menu, then click on the Voice Library tab at the top of your screen. If a friend or colleague recommended a particular voice, you can look it up by name. Alternatively, if you prefer browsing, utilize the categories to filter voices according to style or purpose: whether you’re looking for conversational tones or advertisement-focused options, there’s something for every type of project. On the right side of these categories, you can sort by four criteria, ranging from trending voices to those that have produced a high volume of outputs. Additionally, advanced filters are available nearby for further refining your search based on category, gender, age, language, and accent.
When you find voices you like, add them to Voice Lab. This makes them available in the speech generation tool, which you can open by clicking Speech. Paste your text or upload an audio file, pick your voice from the dropdown menu, and click Generate. If you’re not satisfied with the first result, there are two main ways to adjust it:
The first is to switch to a different AI model: some models are optimized for multilingual generation, while others prioritize low latency. The second is to tweak the voice settings, which vary depending on the model you choose. You can adjust stability (a lower setting produces more emotional variation), similarity (a lower setting lets the output drift further from the sample voice), style exaggeration (a higher setting increases overall variation), and speaker boost (which pushes the output closer to the original voice).
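The settings above are described for the web app, but ElevenLabs also exposes them through its REST API. The sketch below only builds the request and does not send it. The endpoint, the `xi-api-key` header, the `eleven_multilingual_v2` model ID, and the `voice_settings` field names (`stability`, `similarity_boost`, `style`, `use_speaker_boost`) reflect ElevenLabs’ public API documentation as I understand it, so check the current docs before relying on them; `build_tts_request` is a hypothetical helper and the voice ID is a placeholder.

```python
import json
import urllib.request

# Assumed endpoint shape; the voice ID goes in the path.
API_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"


def build_tts_request(api_key, voice_id, text, *, model_id="eleven_multilingual_v2",
                      stability=0.5, similarity=0.75, style=0.0, speaker_boost=True):
    """Build (but don't send) a text-to-speech request with explicit voice settings."""
    for name, value in (("stability", stability), ("similarity", similarity), ("style", style)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    payload = {
        "text": text,
        "model_id": model_id,
        "voice_settings": {
            "stability": stability,          # lower = more emotional variation
            "similarity_boost": similarity,  # lower = drifts further from the sample voice
            "style": style,                  # higher = more exaggerated delivery
            "use_speaker_boost": speaker_boost,
        },
    }
    return urllib.request.Request(
        API_URL.format(voice_id=voice_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json", "Accept": "audio/mpeg"},
        method="POST",
    )
```

To actually generate audio, you would pass the request to `urllib.request.urlopen` and save the response bytes to an MP3 file; that call needs a valid API key and counts against your plan’s quota.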
With a valuation of $1 billion at the time of writing, ElevenLabs has the resources to grow into an even more robust AI voice generation platform. It already offers plenty of flexibility and quality, even though its controls are less powerful than those of other platforms on this list.
ElevenLabs pricing: Free for 10 minutes of audio each month; paid plans begin at $5 per month (or $50 per year) for 30 minutes of audio and additional features such as voice cloning.
Best AI voice generator for human-like cadence
Speechify (Web, iOS, Android)
Cadence refers to the rhythm of reading a text, including the pauses between words and the overall pace. Speechify stands out from its competitors by delivering a smooth output in one go that resembles the voice of a skilled, creative voice actor. It is calm and well-paced, striking a good balance between variation and consistency.
The home page of the website might be somewhat perplexing as Speechify positions itself as a tool for reading text aloud, primarily targeting productivity scenarios. You can utilize it while driving or enjoying a walk outdoors. With voices like Snoop Dogg and Gwyneth Paltrow available, it’s entertaining to listen to your favorite digital marketing blogs narrated in the iconic style of D-O-double-G.
To generate and download voices for your projects, simply click the button at the top of the screen to access Speechify Studio. Although you won’t have access to the popular voices—unfortunately—you’ll find that the available options are excellent. As you paste your script and begin generating, you can adjust the speed, control pitch, modify volume, add custom pronunciations, and set pauses at various points in the text.
Speechify Studio has two more useful features. If you often create slide-based videos, you can use it to assemble a simple presentation: generate the voice, add a background music track, and export it. The second feature lets you upload your own voice to the platform, so you can generate speech in your own voice.
Speechify pricing: Free without download options; paid plans start at $24 per user per month (billed annually) or $69 per user per month (billed monthly).
Best AI voice generator for word-by-word control
WellSaid (Web)
While other platforms tend to be more general, WellSaid Labs provides complete control over specific sections of your script, even allowing for precise word adjustments if needed.
So how does it work? Simply open the editor and paste your script. On the right-side tab, click on Cues to access the controls. The words displayed will be highlighted: you can click on a single word or a group of words to select them and then modify their loudness or pace. If you choose a comma or period instead, you can set the duration of the pause.
Once you’ve finished editing one section, click anywhere in the central area of the screen to deselect it. You’ll see that your edits are now highlighted with color: changes in pace appear in green; adjustments in loudness show up as blue; and punctuation pauses are marked in purple. This serves as a helpful reference if you wish to revisit and make further modifications later on. A piece of advice: avoid making drastic changes—the most significant alterations can compromise overall realism.
Pronunciation controls are not found in the generation editor. Instead, check the left-side menu, click on Pronunciation, and input your replacements. Begin by entering the original word, then write out how it should be pronounced—even if it distorts the spelling. There’s a learning curve and experimentation process around this, so make sure to take a look at the respelling guide.
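WellSaid applies these replacements inside the app. If you prepare scripts for several tools, a minimal sketch of the same word-to-respelling idea, applied to the text before you paste it, might look like this. The respellings in the dictionary are hypothetical examples, not WellSaid’s conventions.

```python
import re

# Hypothetical respellings for illustration only; follow your app's own
# respelling guide for the exact conventions it expects.
PRONUNCIATIONS = {
    "quinoa": "KEEN-wah",
    "Respeecher": "ree-SPEE-cher",
}


def apply_pronunciations(script, replacements):
    """Swap whole-word, case-insensitive matches for their respellings."""
    if not replacements:
        return script
    # Try longer entries first so a multi-word entry beats a shorter one inside it.
    words = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    lookup = {w.lower(): spoken for w, spoken in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], script)


print(apply_pronunciations("Quinoa salad, read by Respeecher.", PRONUNCIATIONS))
```

The word-boundary match means "quinoas" is left alone rather than half-replaced. Respelling the script itself only makes sense for audio-only output, since the distorted spellings would show up in any on-screen text; an in-app pronunciation library keeps your script untouched.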
To fully utilize the tools available, you can refer to the Resources section, which provides access to key topics in the documentation. You’ll find detailed guides designed to assist you in enhancing your voice generation workflow or managing pronunciations. Additionally, if you’re working with a team, it’s easy to share a project link for feedback collection.
WellSaid Labs pricing: Free trial offered; subscription plans begin at $44/month (billed annually) or $49/month (billed monthly).
Best AI voice generator for engaging speech variations
Respeecher (Web)
Are you weary of listening to monotonous robotic speech that feels like an endless, dull line? Respeecher offers variations that enhance the narration, making it more engaging and ensuring each voice sounds more natural and realistic.
The best part is that you don’t need to do any engineering yourself. Just enter your text and experiment with different voices or narration styles. Each generated version is organized under the corresponding section of the script, with natural-sounding variations.
However, the user interface can be somewhat confusing; it was unexpected to find the generation controls tucked away from the main editing screen. To access them, click on the Settings tab located on the left side where you can adjust pitch calibration, emotional range, and overall audio properties. Keep in mind that any changes made here will affect all future outputs, so make sure to revisit this section if you require something different.
Besides pasting your text or uploading an audio file, you can also record live with your microphone. In that case, the app simply transforms your voice to match the selected voice template, giving you complete control over how the text is performed. If you have some acting skills or a natural talent for this, definitely give it a shot.
You can train an AI model on your own voice or on other people’s, letting you voice a whole cast of characters from your keyboard. Because this could be used to create deepfakes, Respeecher runs a security check to verify your identity, and the feature significantly increases the monthly subscription cost.
I experimented with various voices using the same text and found that there’s a more creative feel compared to others on this list. This style of enunciation and voice is well-suited for cartoons and more eccentric projects. While it’s not unsuitable for serious business applications, it may deter those seeking a more professional-sounding avatar. Whether this is seen as a drawback or an opportunity to stand out from competitors is up to you to decide.
Respeecher pricing: Starting at $4/month
Best AI voice generator for narration style variety
Altered (Web, Desktop)
The narration style serves as a general alteration in pitch and rhythm to impart a distinct atmosphere to the generated text. The application that offers the most extensive range of options in this regard is Altered. In addition to style, this platform provides more features than others on this list, so it may take some time for you to become acquainted with all its facets. Let’s explore everything you can accomplish here.
Real-time morphing powers the Altered Virtual Microphone, allowing your original voice to be transformed into that of an AI avatar instantly. This can be entertaining when you’re 14 and chatting online with your gamer friends, but professionals can utilize it to directly record this voice into another audio editing application, enhancing their workflow.
Post-production morphing is audio-to-audio generation. Upload a recording of your script, select the voice you want, and click generate. You can then download the result and drop it into your project.
Rapid voice creation lets you upload a clear clip of a voice, 4 to 8 seconds long, and clone it for generation. (Terms and conditions apply.)
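Before uploading a sample for rapid voice creation, it can help to confirm the clip falls inside that 4-to-8-second window. This minimal sketch uses Python’s standard `wave` module, so it assumes an uncompressed PCM WAV file (the module can’t read MP3), and it only measures length: it can’t tell whether the clip is clear enough. The function names are hypothetical.

```python
import wave

# Clip length window Altered mentions for rapid voice creation.
MIN_SECONDS = 4.0
MAX_SECONDS = 8.0


def clip_duration(source):
    """Return the length in seconds of a PCM WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clone_clip(source, lo=MIN_SECONDS, hi=MAX_SECONDS):
    """Return (is_within_window, duration_in_seconds) for a WAV clip."""
    duration = clip_duration(source)
    return lo <= duration <= hi, duration
```

For example, `ok, seconds = check_clone_clip("sample.wav")` (with a hypothetical file name) tells you whether to trim or re-record before uploading.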
The text-to-speech feature brings up the familiar editor where you can enter your script and choose your voice. The narration styles vary based on your selection, so explore each option to understand the key differences. The available styles range from “Just Below Neutral” for uniformity to “Positive, Shout” for added emphasis and energy. Keep in mind that depending on your script and chosen tone, the outcomes may be inconsistent, peculiar, humorous, or a mix of all three.
Lastly, Altered includes an Audio Editor with a wide array of controls. You can upload any type of audio and utilize features like transcription, speech generation, or noise removal among many other options. The learning curve is a bit steep here, as this screen has a real audio editor vibe: be sure to open the docs and use them as a companion.
Altered pricing: Limited free plan available; paid plans from $6/month
Best AI voice generator for emphasis control
Murf (Web)
Here’s a straightforward acting exercise for beginners: select a sentence from this article and read it aloud. Then, repeat the sentence while emphasizing a different word each time. As you do this, pay attention to how the overall meaning and tone of the sentence shifts. Murf allows you to perform this with your AI-generated voices.
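Murf’s emphasis control is point-and-click, but the exercise translates neatly to SSML’s standard `<emphasis>` element, which many (not all) text-to-speech engines accept. This sketch generates one variant per word of a sentence so you can hear each reading in an engine that supports SSML; it illustrates the exercise, not how Murf works internally, and `emphasis_variants` is a hypothetical helper.

```python
from xml.sax.saxutils import escape


def emphasis_variants(sentence, level="strong"):
    """Return one SSML string per word, each stressing a different word.

    Punctuation stays attached to its word, so a final "money." would be
    emphasized together with its period.
    """
    words = [escape(w) for w in sentence.split()]
    variants = []
    for i in range(len(words)):
        parts = list(words)
        parts[i] = f'<emphasis level="{level}">{parts[i]}</emphasis>'
        variants.append("<speak>" + " ".join(parts) + "</speak>")
    return variants


for ssml in emphasis_variants("I never said she stole my money"):
    print(ssml)
```

Listening to the seven outputs back to back is the same drill as reading the sentence aloud: "I never said" sounds like a denial, while "she stole" shifts the blame.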
The emphasis control button can be easily overlooked. When you’re working on a project, begin by adding text to the first block. While doing so, observe the icon located to the left of the play button—it resembles a comment icon—and click on it. A pop-up will show up displaying all the words in that block along with a high-medium-low scale: click anywhere on it to assign a point. The location where you click is important, so feel free to experiment with placing points along both axes—left/right and top/bottom.
In addition to these controls, you can modify general speed and pitch, insert pauses, or customize pronunciation. If you opt for the Ken voice, you’ll gain access to an extensive variety of narrative styles—nine in total—from Storytelling to Sad. I tested out the Sobbing setting expecting poor results but was pleasantly surprised by its subtle performance. Well done, Ken!
At the bottom of the screen, you have the option to expand the timeline to access additional features. You can directly incorporate video and music into the platform for content creation and export it straight from Murf AI, making it ready for sharing. As you advance your content strategy, feel free to invite your teammates to collaborate on voice generation projects; anyone can leave feedback on each script block, allowing you to make adjustments until you achieve the best possible outcome.
A final piece of advice: voices available in the paid plan are significantly better than those in the free tier. If you’re committed to voice generation and appreciate Murf AI’s controls, it may be wise to invest sooner rather than later.
Murf pricing: Free for 10 minutes of voice generation and 2 projects; paid plans begin at $23/month (billed annually) or $29/month (billed monthly).
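Several of these tools list two prices, one billed monthly and a lower one billed annually. A quick way to compare them is to compute the yearly difference; the figures below are the starting prices quoted in this article, and prices change often, so treat them as examples rather than current quotes.

```python
def annual_billing_savings(monthly_billed, annual_billed_per_month, months=12):
    """Yearly savings from paying annually instead of month to month."""
    return (monthly_billed - annual_billed_per_month) * months


# Starting prices quoted in this article: (billed monthly, billed annually per month).
STARTING_PRICES = {
    "Speechify (per user)": (69, 24),
    "WellSaid": (49, 44),
    "Murf": (29, 23),
}

for plan, (monthly, annual) in STARTING_PRICES.items():
    print(f"{plan}: ${annual_billing_savings(monthly, annual)} saved per year")
```

For Murf, that works out to $72 a year ($348 versus $276). Annual billing is cheaper, but it commits you for a full year, so it’s worth trying the free plan or trial first.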
Does OpenAI have an AI voice generation model?
Yes, the creators of ChatGPT are in the game. However, the only way to use OpenAI’s text-to-speech model is through its API, which takes a bit of technical know-how to set up.
They also have an AI voice cloning model that’s reportedly so powerful that it’s not available for general use. (Yikes.) There’s no estimate as to when a commercial version will pop up. Read more in the official blog post on the challenges and opportunities of synthetic voices.
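To give a sense of what using the OpenAI text-to-speech API involves, here is a minimal sketch that relies only on Python’s standard library. It assumes OpenAI’s speech endpoint (`POST https://api.openai.com/v1/audio/speech`) with the `model`, `input`, `voice`, and `response_format` fields, and the `tts-1` model and `alloy` voice, as documented at the time of writing; these names may change, and OpenAI’s official SDK is the more common route. You need an API key in the `OPENAI_API_KEY` environment variable, and API usage is billed.

```python
import json
import os
import urllib.request

SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, api_key, model="tts-1", voice="alloy", response_format="mp3"):
    """Build (but don't send) a text-to-speech request."""
    body = {"model": model, "input": text, "voice": voice, "response_format": response_format}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.mp3"):
    """Send the request and write the returned audio bytes to `out_path`."""
    req = build_speech_request(text, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

Calling `synthesize("Hello from my keyboard.")` would save an MP3 you can drop into your editor. Separating request construction from sending makes it easy to check the payload without spending API credits.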
Are AI-generated voices legal?
All the platforms on this list offer a collection of voices that were created by fine-tuning the training data or modeling a real person’s voice with their consent. Using these voices is legal, provided you remain within the service and licensing terms of the app you’re using.
The main problem lies with AI voice cloning. With just a few samples of a real person’s voice, anyone could tune an AI model to talk like anyone—including famous people. And including you. Creating and using these deepfakes can lead to identity theft, manipulation, misinformation, blackmail, or infringement of copyright laws (when talking about artists and their work).
Depending on where you are in the world, there may be legislation to control these kinds of uses, meaning there are legal consequences if consent isn’t secured or if the voice is used with criminal intent—or in a way that can be interpreted as such. If you’re cloning someone else’s voice and using it to generate with AI, always secure their (preferably written) consent before using the outputs.
Speaking without a mouth
With an AI voice generator, you can turn scripts into a flowing narrative, ready to add as a voice-over on a video, without dozens of takes and without hiring a production team.
All the platforms on this list offer ways to try out the features and voices, so pick one of your scripts and run your tests. It’s also important to find one that has controls that make sense to you, so take some time to feel how each one works. Now that you can speak using just your keyboard, what will you create next?