fr-tts

As a beginner in French, I want to master the correct pronunciation of French sentences. In this post I'll introduce some useful tools for automating French pronunciation, and share a convenient script at the end.

We can divide the process into two steps: getting the IPA transcription with eSpeak NG, and synthesizing speech with Coqui TTS.

Get IPA with eSpeak NG

First, install eSpeak NG, which converts sentences into IPA (International Phonetic Alphabet) transcriptions.

On Windows, we can find it with winget:

winget -v
winget search espeak

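Install the latest version listed in the search results with winget install, then check the installation by printing the version and transcribing a French sentence: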

espeak-ng --version
espeak-ng -v fr --ipa -q "Je voudrais faire votre connaissance."
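If you prefer to call eSpeak NG from Python, here is a minimal sketch that runs the same command. It assumes espeak-ng is on PATH and that it writes its IPA output as UTF-8 bytes; the helper names build_espeak_cmd, espeak_available and text_to_ipa are hypothetical, chosen for illustration.

```python
import shutil
import subprocess


def build_espeak_cmd(text: str, voice: str = "fr") -> list[str]:
    """Build the espeak-ng command shown above as an argument list."""
    # -v picks the voice, --ipa prints the IPA transcription, -q skips audio output
    return ["espeak-ng", "-v", voice, "--ipa", "-q", text]


def espeak_available() -> bool:
    """Return True if an espeak-ng executable is found on PATH."""
    return shutil.which("espeak-ng") is not None


def text_to_ipa(text: str, voice: str = "fr") -> str:
    """Run espeak-ng and return its IPA output; raises if espeak-ng is missing or fails."""
    result = subprocess.run(build_espeak_cmd(text, voice), capture_output=True, check=True)
    # assumption: espeak-ng writes its IPA output as UTF-8 bytes
    return result.stdout.decode("utf-8", errors="replace").strip()
```

Passing the arguments as a list avoids shell quoting problems with apostrophes and accents. When espeak_available() is True, text_to_ipa("Je voudrais faire votre connaissance.") returns the same IPA string the command line prints.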

Synthesize speech with Coqui TTS

In Python, we can use Coqui TTS to synthesize the pronunciation. It is a neural-network-based text-to-speech toolkit with many high-quality pretrained models. Install it with pip:

pip install -U TTS

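Next, list the available French models and pick one; here we use the CSS10 VITS model, tts_models/fr/css10/vits. The commands below are for PowerShell, where the backtick continues a command on the next line: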

tts --list_models | findstr /i "fr"

tts --text "Ravi de faire votre connaissance." `
    --model_name "tts_models/fr/css10/vits" `
    --out_path "\coqui.wav"
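findstr is a Windows command. As a cross-platform alternative, the model list can be filtered in Python. This sketch assumes French-only models appear under names starting with tts_models/fr/ (as tts_models/fr/css10/vits does) and keeps only those, so multilingual models such as xtts_v2 are not matched; the function names are hypothetical.

```python
import re
import subprocess

# Matches French-only model names such as "tts_models/fr/css10/vits"
FR_MODEL_RE = re.compile(r"tts_models/fr/\S+")


def french_models(listing: str) -> list[str]:
    """Return the tts_models/fr/... names found in `tts --list_models` output."""
    return FR_MODEL_RE.findall(listing)


def list_french_models() -> list[str]:
    """Run `tts --list_models` and keep only the French models (needs Coqui TTS)."""
    result = subprocess.run(
        ["tts", "--list_models"], capture_output=True, text=True, check=True
    )
    return french_models(result.stdout)
```

With Coqui TTS installed, list_french_models() returns the model names, ready to be passed to --model_name.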

In the script at the end of this post, we call the tts command the same way to convert sentences into speech and save the output as WAV files.
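To check that a generated file is usable, the standard-library wave module can report its format and duration. This is a sketch that assumes the output is a plain PCM WAV file, which wave can read; wav_duration_seconds and describe_wav are illustrative names.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        # duration = number of frames / frames per second
        return wav.getnframes() / wav.getframerate()


def describe_wav(path: Path) -> str:
    """Summarize channels, bit depth, sample rate and length of a WAV file."""
    with wave.open(str(path), "rb") as wav:
        return (
            f"{wav.getnchannels()} channel(s), "
            f"{wav.getsampwidth() * 8}-bit, "
            f"{wav.getframerate()} Hz, "
            f"{wav.getnframes() / wav.getframerate():.2f} s"
        )
```

Calling describe_wav on a file in the output folder gives a one-line summary, which is a quick way to spot an empty or truncated file.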

Script: from sentence to IPA and audio

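Finally, here is a convenient script that automates the whole process, from text input to IPA and audio output, and then tries to play the audio automatically. Besides eSpeak NG and Coqui TTS, it uses the playsound package (pip install playsound) for playback. Pass a sentence as command-line arguments, or run it without arguments to type sentences interactively.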

import subprocess
import sys
from pathlib import Path

# playsound is optional: without it, the script still generates the WAV file
try:
    from playsound import playsound
except ImportError:
    playsound = None

# ==============================================================================
# --- config ---
# ==============================================================================

# choose a model here (a single model name)
DEFAULT_MODEL = "tts_models/fr/css10/vits"

# leave empty for single-language models such as css10/vits;
# for a multilingual model such as xtts_v2, set the language code here, for example "fr"
LANGUAGE_CODE = ""

OUTPUT_DIR = Path("outputs_audio")

# ==============================================================================
# --- config end ---
# ==============================================================================


def get_ipa_espeak(text: str) -> str:
    """Get the IPA transcription of a French text from the eSpeak NG CLI."""
    cmd = ["espeak-ng", "-v", "fr", "--ipa", "-q", text]
    try:
        # decode espeak-ng's IPA output as UTF-8; undecodable bytes are replaced, not fatal
        ipa = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
        return ipa.decode("utf-8", errors="replace").strip()
    except FileNotFoundError:
        return "eSpeak NG not found, check the installation."
    except subprocess.CalledProcessError:
        return "eSpeak NG failed to produce IPA."


def generate_coqui_tts(text: str, model: str, lang_code: str, out_path: Path) -> bool:
    """Generate speech with the Coqui TTS CLI and save it as a WAV file."""
    print(f"\nGenerating audio with Coqui TTS, model: {model}")
    cmd = [
        "tts",
        "--text", text,
        "--model_name", model,
        "--out_path", str(out_path),
    ]
    # multilingual models need a language code; single-language models do not
    if lang_code:
        cmd.extend(["--language_idx", lang_code])

    try:
        # discard tts console output to avoid clutter
        subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except FileNotFoundError:
        print("Error: 'tts' not found, check the Coqui TTS installation.")
        return False
    except subprocess.CalledProcessError as e:
        # stderr is discarded above, so rerun the tts command by hand to see the cause
        print(f"Error: tts exited with code {e.returncode}.")
        return False
    except Exception as e:
        print(f"Error while generating audio: {e}")
        return False

    print(f"Saved to: {out_path}")
    return True


def play_audio(file_path: Path) -> None:
    """Play audio using the playsound library."""
    if not playsound:
        print("\nCannot play audio because the 'playsound' library could not be imported.")
        return
    try:
        print("Playing audio...")
        playsound(str(file_path))
    except Exception as e:
        print("\nError: cannot play audio, please open the file manually.")
        print(f"Details: {e}")

def main(text_to_pronounce: str) -> None:
    """Print the IPA of a sentence, synthesize it, and play the result."""
    OUTPUT_DIR.mkdir(exist_ok=True)

    print("-" * 40)
    print(f"text to pronounce: {text_to_pronounce}")
    print("-" * 40)

    # 1. get and print IPA
    ipa = get_ipa_espeak(text_to_pronounce)
    print(f"IPA (from eSpeak NG):\n{ipa}")

    # 2. generate Coqui TTS audio; the file name keeps only letters and digits,
    #    falling back to "output" if nothing is left (e.g. the input "...")
    safe_filename = "".join(c for c in text_to_pronounce if c.isalnum())[:40] or "output"
    coqui_out_path = OUTPUT_DIR / f"{safe_filename}.wav"

    success = generate_coqui_tts(text_to_pronounce, DEFAULT_MODEL, LANGUAGE_CODE, coqui_out_path)

    if success:
        play_audio(coqui_out_path)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # sentence passed as command-line arguments
        main(" ".join(sys.argv[1:]))
    else:
        print("Type a sentence and press Enter to hear it; type 'exit' or enter an empty line to quit.")
        while True:
            try:
                input_text = input("input: ")
            except (EOFError, KeyboardInterrupt):
                input_text = ""
            if not input_text.strip() or input_text.strip().lower() == "exit":
                print("exited")
                break
            main(input_text)
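The script builds file names by keeping only alphanumeric characters, so "Ravi de faire votre connaissance." becomes "Ravidefairevotreconnaissance" and accented letters stay in the name. If you would rather have readable ASCII-only names, here is an alternative sketch. The safe_filename function is hypothetical, not part of the script; it strips accents with Unicode NFKD normalization and joins words with underscores.

```python
import re
import unicodedata


def safe_filename(text: str, max_len: int = 40, fallback: str = "output") -> str:
    """Turn a sentence into a short ASCII-only file name stem (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus a combining mark;
    # encoding to ASCII with errors="ignore" then drops the combining marks
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # collapse every run of non-alphanumeric characters into a single underscore
    stem = re.sub(r"[^A-Za-z0-9]+", "_", ascii_text).strip("_")
    # truncate, tidy a trailing underscore, and fall back if nothing is left
    return stem[:max_len].rstrip("_") or fallback
```

For example, safe_filename("Je voudrais faire votre connaissance.") gives Je_voudrais_faire_votre_connaissance. Letters with no decomposition, such as œ, are dropped rather than transliterated.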