Hello everyone! In this article, we will look at a really interesting open-source text-to-speech (TTS) project from coqui-ai that is easy to use and free for commercial use. In less than 10 minutes, we will create a simple text-to-speech synthesizer that sounds really human-like. Let’s get started!
You can find the complete video below.
There are several interesting text-to-speech packages that you can use in your own projects. We have chosen to explore coqui in this series of two articles, and we may cover more packages later.
Let’s start by exploring the package, which you can find here: https://github.com/coqui-ai/TTS
You can install it with pip or directly from source. We will install it simply by typing:
pip install TTS
The module is now installed. We can start playing with it from the command line, for example:
tts --text "Hello from a machine" --out_path ./audio.wav
This produces an audio file, audio.wav, that we can listen to and reuse.
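As a quick sanity check on the generated file, here is a minimal sketch that reads its header with Python's standard `wave` module. It assumes the output is a standard PCM WAV file; the helper name `wav_info` is ours and is not part of the TTS package.

```python
import os
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        channels = wav_file.getnchannels()
        sample_rate = wav_file.getframerate()
        frames = wav_file.getnframes()
    # duration is the number of frames divided by frames per second
    return channels, sample_rate, frames / sample_rate


if __name__ == "__main__":
    # audio.wav is the file written by the command-line example above
    if os.path.exists("audio.wav"):
        channels, rate, seconds = wav_info("audio.wav")
        print(f"{channels} channel(s), {rate} Hz, {seconds:.2f} s")
    else:
        print("audio.wav not found; run the tts command first")
```

If the duration is close to zero, the synthesis probably failed and the file is not worth playing.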
The next step is to create a Python project and integrate the TTS module into it, so that you can reuse it in any Python project you have. This takes a few steps.
First, import the modules:
# import all the modules that we will need to use
from TTS.utils.manage import ModelManager
from TTS.utils.synthesizer import Synthesizer
Then download one of the pre-trained models (we can also train our own model, but that takes time and resources):
# path to the model catalog that ships with the TTS package
path = "/path/to/pip/site-packages/TTS/.models.json"
model_manager = ModelManager(path)
# download the English Tacotron2-DDC model trained on LJSpeech
model_path, config_path, model_item = model_manager.download_model("tts_models/en/ljspeech/tacotron2-DDC")
# download the default vocoder listed for that model
voc_path, voc_config_path, _ = model_manager.download_model(model_item["default_vocoder"])
Replace the path with the location where pip installs packages on your computer. You can find it by typing:
python -m site
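Instead of copying the path by hand, we can ask Python where a package lives. The sketch below uses the standard `importlib.util.find_spec` to locate a package's directory without importing it, then appends `.models.json`. The helper name `find_models_json` is ours, and the assumption that the file sits directly inside the package directory comes from the path used above; it may differ between TTS versions.

```python
import importlib.util
from pathlib import Path


def find_models_json(package="TTS"):
    """Return the expected path to .models.json inside `package`, or None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        # package is not installed (or has no __init__.py we can locate)
        return None
    # spec.origin points at the package's __init__.py
    return Path(spec.origin).parent / ".models.json"


if __name__ == "__main__":
    candidate = find_models_json()
    if candidate is None:
        print("TTS does not seem to be installed in this environment")
    else:
        status = "found" if candidate.is_file() else "missing"
        print(f"{candidate} ({status})")
```

The returned path can then be passed as a string to `ModelManager` in place of the hard-coded one.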
Then comes the last part. We instantiate the synthesizer and use it to read a given text, which can come from request parameters or wherever you need it:
syn = Synthesizer(
    tts_checkpoint=model_path,
    tts_config_path=config_path,
    vocoder_checkpoint=voc_path,
    vocoder_config=voc_config_path
)
text = "Hello from a machine"
outputs = syn.tts(text)
syn.save_wav(outputs, "audio-1.wav")
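When the text comes from outside, such as request parameters, it helps to wrap these two calls in a small function. The following is a sketch only: `speak_to_file` is a hypothetical helper of ours, and it relies solely on the `tts` and `save_wav` methods used above. It rejects empty text and picks the first unused file name so earlier recordings are not overwritten.

```python
from pathlib import Path


def speak_to_file(synthesizer, text, out_dir, stem="audio"):
    """Synthesize `text` and save it as a new WAV file inside `out_dir`."""
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # pick the first unused name: audio-1.wav, audio-2.wav, ...
    index = 1
    while (out_dir / f"{stem}-{index}.wav").exists():
        index += 1
    out_path = out_dir / f"{stem}-{index}.wav"
    # same two calls as above: synthesize, then write the waveform to disk
    wav = synthesizer.tts(text)
    synthesizer.save_wav(wav, str(out_path))
    return out_path
```

With the `syn` object created above, `speak_to_file(syn, "Hello from a machine", "./out")` would write `./out/audio-1.wav` on the first call and `./out/audio-2.wav` on the next.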
And here we are: you have a fully working text-to-speech synthesizer ready to be used.
In the next story, we will go further by making it sound better, creating a Flask app, and deploying it on a server using Docker. So stay tuned!