Text to speech

Learn how to generate speech from text

Alting offers a standardized API for interacting with text-to-speech models. This guide shows you how to generate speech from text using one of our supported models. Our text-to-speech models suit a range of use cases, such as:

Narrating a written story
Producing multilingual speech
Generating real-time speech using streaming

Quickstart

To generate speech from text, use the speech endpoint in the REST API, as seen in the examples below. We recommend calling the REST API with your HTTP client of choice, or using the OpenAI SDK for your language.

Note: MP3 is currently the only supported output format.
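
As a rough illustration of a speech request, here is a minimal sketch using only the Python standard library. Several details are assumptions that do not come from this guide: the base URL is a placeholder, the /audio/speech path and the model, input, and voice fields mirror the OpenAI speech API, and bearer-token authentication is assumed. Check the API reference for the actual values before relying on it.

```python
import json
import urllib.request

# Assumptions (not from this guide): the base URL, the /audio/speech path,
# the bearer-token header, and the OpenAI-style field names in the payload.
BASE_URL = "https://api.example.com/v1"  # placeholder, replace with the real base URL


def build_speech_request(text, model, voice, api_key, base_url=BASE_URL):
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, model, voice, api_key, out_path, base_url=BASE_URL):
    """Send the request and write the returned audio bytes (MP3) to out_path."""
    request = build_speech_request(text, model, voice, api_key, base_url)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written
```

A call such as synthesize_to_file("Hello, world.", "tts-1", "alloy", api_key, "hello.mp3") would then save the audio to hello.mp3. The model identifier "tts-1" and the voice name "alloy" are borrowed from OpenAI's naming and may differ here. Building the request separately from sending it makes the payload easy to inspect without a network call.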

Choosing a model

When making a text-to-speech request, the first thing you need to decide is which model to use. We currently support the OpenAI TTS-1 and TTS-1-HD models. ElevenLabs models will be available soon.

You can experiment with different models in our Text to Speech app.

Choosing a voice

When generating speech, you can choose from a variety of voices. Each voice is tied to specific models, so the voices available depend on the model you choose. To get a list of the voices a model supports, use the voices endpoint. See examples below.
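
A voices lookup might be sketched as follows. This guide does not give the endpoint's path, parameters, or response format, so everything specific here is hypothetical: the placeholder base URL, the /voices path, the model query parameter, the bearer-token header, and the assumption that the response body is JSON.

```python
import json
import urllib.parse
import urllib.request

# Assumptions (not from this guide): the base URL, the /voices path,
# the "model" query parameter, and the bearer-token header.
BASE_URL = "https://api.example.com/v1"  # placeholder, replace with the real base URL


def build_voices_request(model, api_key, base_url=BASE_URL):
    """Build (but do not send) a GET request for the voices of one model."""
    query = urllib.parse.urlencode({"model": model})
    return urllib.request.Request(
        url=f"{base_url}/voices?{query}",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_voices(model, api_key, base_url=BASE_URL):
    """Send the request and return the decoded JSON body as-is."""
    request = build_voices_request(model, api_key, base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the response schema is not documented here, list_voices returns the parsed JSON unchanged instead of guessing at field names.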