0%

fr-tts

As a beginner in French, I want to master the correct pronunciations of the French sentences. So in this post I’ll introduce some important tools for automating French pronunciation, and share a convenient script at the end.

We can divide this process into these steps:

get IPA by eSpeak NG

First, install eSpeak NG to convert sentences into IPA (International Phonetic Alphabet).

On windows, we can easily download it by using winget

1
2
winget -v
winget search espeak

and install the latest version, and check it by using

1
2
espeak-ng --version
espeak-ng -v fr --ipa -q "Je voudrais faire votre connaissance."

synthesize Speech with Coqui TTS

In python, we can use Coqui TTS to get the pronunciations. It’s a kind of neural network based tool with a lot of high quality pre-trained models.

1
pip install -U TTS

Then we can get all the models and choose one, here we use css10/VITS.

1
2
3
4
5
tts --list_models | findstr /i "fr"

tts --text "Ravi de faire votre connaissance." `
--model_name "tts_models/fr/css10/vits" `
--out_path "\coqui.wav"

In our script, we use it to convert sentences into sounds and store the output in WAV format.

script: from sentence to IPA and sounds

Finally, here is a convenient script that automates the entire process from text input to audio output and try to play it automatically.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
import os
import subprocess
import sys
from pathlib import Path

from playsound import playsound
playsound = playsound if playsound else None

# ==============================================================================
# --- config ---
# ==============================================================================

# choose a model here(single)
DEFAULT_MODEL = "tts_models/fr/css10/vits"

# if you want to use a multilingual model such as xtts_v2, name the value here, for example 'fr'
LANGUAGE_CODE = "fr"

OUTPUT_DIR = Path("outputs_audio")

# ==============================================================================
# --- config end ---
# ==============================================================================


def get_ipa_espeak(text: str) -> str:
"""get IPA by using eSpeak NG CLI"""
try:
cmd = ["espeak-ng", "-v", "fr", "--ipa", "-q", text]
encoding = 'utf-8' if sys.platform != 'win32' else 'oem'
ipa = subprocess.check_output(cmd, encoding=encoding, stderr=subprocess.DEVNULL)
return ipa.strip()
except (subprocess.CalledProcessError, FileNotFoundError):
return "eSpeak NG not found."

def generate_coqui_tts(text: str, model: str, lang_code: str, out_path: Path) -> bool:
"""generate sounds by using Coqui TTS CLI"""
try:
print(f"\ngenerate sounds by using Coqui TTS CLI model: ({model})")
cmd = [
"tts",
"--text", text,
"--model_name", model,
"--out_path", str(out_path)
]
# if specified language code, add it to the command
if lang_code:
cmd.extend(["--language_idx", lang_code])

# redirect tts output to avoid clutter
subprocess.check_call(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
print(f"成功保存到: {out_path}")
return True
except (subprocess.CalledProcessError, FileNotFoundError):
print("Error 'tts' not found, check Coqui TTS installation")
return False
except Exception as e:
print(f"generate sounds error: {e}")
return False
def play_audio(file_path: Path):
"""Play audio using playsound library."""
if not playsound:
print("\ncannot play audio because 'playsound' library import failed.")
return
try:
print("playing audio...")
playsound(str(file_path))
except Exception as e:
print(f"\n error: cannot play audio, please open the file manually.")
print(f"details: {e}")

def main(text_to_pronounce: str):
"""main"""
OUTPUT_DIR.mkdir(exist_ok=True)

print("-" * 40)
print(f"text to pronounce: {text_to_pronounce}")
print("-" * 40)

# 1. get and print IPA
ipa = get_ipa_espeak(text_to_pronounce)
print(f"IPA (from eSpeak NG):\n{ipa}")

# 2. generate Coqui TTS audio
safe_filename = "".join(c for c in text_to_pronounce if c.isalnum()).rstrip()[:40]
coqui_out_path = OUTPUT_DIR / f"{safe_filename}.wav"

success = generate_coqui_tts(text_to_pronounce, DEFAULT_MODEL, LANGUAGE_CODE, coqui_out_path)

if success:
play_audio(coqui_out_path)

if __name__ == "__main__":
if len(sys.argv) > 1:
input_text = " ".join(sys.argv[1:])
main(input_text)
else:
print("input sentence and press Enter to pronounce, input exit or empty line to quit.")
while True:
input_text = input("input:")
if not input_text.strip() or input_text.strip().lower() == "exit":
print("exited")
break
main(input_text)