7 Recommended Open-Source TTS (Text-to-Speech) Systems

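Preface: In television products, TTS helps blind and partially sighted people who cannot use a TV interface through standard visual means. Europe and the United States have already begun to define implementation standards and regulations for it.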

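TTS (Text To Speech) is an application of speech synthesis that converts text stored on a computer, such as help files or web pages, into natural-sounding speech output. TTS can help visually impaired people read information on a computer, or simply make text documents easier to consume. It is often used together with speech recognition programs.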

This article introduces seven open-source TTS systems that you can study or use in your own projects.

1. MARY - Text-to-Speech System

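MARY is a multilingual text-to-speech platform written in Java. Its supported languages include German, British and American English, Telugu, Turkish, and Russian.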

MaryTTS (the MARY Text-to-Speech System) is an open-source, multilingual Text-to-Speech Synthesis platform written in Java. It was originally developed as a collaborative project of DFKI’s Language Technology Lab and the Institute of Phonetics at Saarland University. It is now maintained by the Multimodal Speech Processing Group in the Cluster of Excellence MMCI and DFKI.

As of version 5.2, MaryTTS supports German, British and American English, French, Italian, Luxembourgish, Russian, Swedish, Telugu, and Turkish; more languages are in preparation. MaryTTS comes with toolkits for quickly adding support for new languages and for building unit selection and HMM-based synthesis voices.
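As a rough illustration of how a client might request speech from MaryTTS, the sketch below builds a request URL for a MaryTTS server. It assumes a server is running locally with its HTTP interface on port 59125 and exposing a `/process` endpoint with the parameter names shown; none of this comes from the article, so check it against the MaryTTS version you install. The helper name `build_marytts_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_marytts_url(text, locale="en_US", host="localhost", port=59125):
    """Build a URL asking a MaryTTS server to synthesize `text` as WAV audio.

    Assumption: the server exposes /process with these parameter names.
    """
    params = {
        "INPUT_TEXT": text,       # the text to speak
        "INPUT_TYPE": "TEXT",     # plain text input
        "OUTPUT_TYPE": "AUDIO",   # ask for synthesized audio
        "AUDIO": "WAVE_FILE",     # audio container format
        "LOCALE": locale,         # language/voice locale
    }
    # urlencode percent-encodes the text so it is safe inside a query string.
    return f"http://{host}:{port}/process?{urlencode(params)}"


print(build_marytts_url("Hello world"))
```

If a server is available, fetching this URL (for example with `urllib.request.urlopen`) should return the audio bytes, which can then be written to a `.wav` file.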

2. SpeakRight Framework - Helps to build Speech Recognition Applications

SpeakRight is a Java framework for writing speech recognition applications, based on VoiceXML. It uses the StringTemplate template engine to generate VoiceXML documents automatically.
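To make the idea of template-generated VoiceXML concrete, here is a minimal sketch in Python. It is not SpeakRight's API and does not use StringTemplate: it swaps in Python's standard `string.Template` to show the same general approach, filling a fixed VoiceXML skeleton with an escaped prompt. The function name `render_prompt_page` and the form layout are assumptions made for illustration.

```python
from string import Template
from xml.sax.saxutils import escape

# A minimal VoiceXML 2.1 page: one form that plays one prompt.
VXML_TEMPLATE = Template("""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="$form_id">
    <block>
      <prompt>$prompt</prompt>
    </block>
  </form>
</vxml>
""")


def render_prompt_page(form_id, prompt_text):
    """Fill the template, escaping XML special characters in both values."""
    return VXML_TEMPLATE.substitute(
        form_id=escape(form_id, {'"': "&quot;"}),  # attribute value: also escape quotes
        prompt=escape(prompt_text),                # element text: escape &, <, >
    )


print(render_prompt_page("welcome", "Welcome to the demo. Please say a city name."))
```

Escaping the prompt before substitution matters because prompts often contain user-supplied or database text, and an unescaped `&` or `<` would make the generated document invalid XML.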

3. Festival - Speech Synthesis System

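Festival provides a general framework for building speech synthesis systems and includes examples of various modules. It offers complete text-to-speech through several APIs, runs natively on Mac OS, and supports English and Spanish.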

Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, and through an Emacs interface. Festival is multi-lingual (currently English (British and American) and Spanish), though English is the most advanced. Other groups release new languages for the system. Full tools and documentation for building new voices are available through Carnegie Mellon's FestVox project (http://festvox.org).

The system is written in C++ and uses the Edinburgh Speech Tools Library for low level architecture and has a Scheme (SIOD) based command interpreter for control. Documentation is given in the FSF texinfo format, which can generate a printed manual, info files and HTML.

Festival is free software. Festival and the speech tools are distributed under an X11-type licence allowing unrestricted commercial and non-commercial use alike.

This distribution includes:

Full English (British and American English) text to speech
Full C++ source for modules, SIOD interpreter, and Scheme library
Lexicon based on CMULEX and OALD (OALD is restricted to non-commercial use only)
Edinburgh Speech Tools, low level C++ library
Full documentation (html, postscript and GNU info format)
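The shell-level and Scheme interfaces mentioned above can be sketched from Python as follows. This assumes a standard Festival installation that provides the `festival` binary (with its `--tts` mode reading text from standard input) and the bundled `text2wave` script; the helper names are hypothetical, and `say` does nothing if Festival is not on the PATH.

```python
import shutil
import subprocess


def festival_say_command():
    """Command that makes Festival speak text read from stdin."""
    return ["festival", "--tts"]


def text2wave_command(text_path, wav_path):
    """Command that renders a text file to a WAV file with text2wave."""
    return ["text2wave", text_path, "-o", wav_path]


def saytext_expression(text):
    """Build a Scheme (SayText "...") expression for the Festival prompt."""
    # Escape backslashes first, then double quotes, for a Scheme string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'(SayText "{escaped}")'


def say(text):
    """Speak `text` if Festival is installed; return True on success."""
    if shutil.which("festival") is None:
        return False  # Festival not available on this machine
    result = subprocess.run(festival_say_command(), input=text.encode("utf-8"))
    return result.returncode == 0


print(saytext_expression("Hello from Festival"))
```

The expression printed by `saytext_expression` can be typed at Festival's interactive Scheme prompt, while `say` and `text2wave_command` correspond to the shell-level route.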

4. FreeTTS - Speech Synthesizer in Java

FreeTTS is a speech synthesis system written entirely in Java. It is based on Flite, a small speech synthesis engine developed at Carnegie Mellon University.

5. Festvox - Builds New Synthetic Voices

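The Festvox project aims to make building new synthetic voices more systematic. Its voice-building tools underpin several of the engines listed here, including Festival and Flite.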

6. eSpeak - Text to Speech

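eSpeak is a small, open-source speech synthesis system that supports many languages. It uses formant synthesis, which keeps the language data files very small. It supports SAPI5 on Windows, so it can be used with screen readers and other programs that support the Windows SAPI5 interface. eSpeak can also translate text into phoneme codes, so it can serve as a front end for another speech synthesis engine.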

eSpeak is a compact open source software speech synthesizer for English and other languages, for Linux and Windows (http://espeak.sourceforge.net). eSpeak uses a "formant synthesis" method. This allows many languages to be provided in a small size. The speech is clear, and can be used at high speeds, but is not as natural or smooth as larger synthesizers which are based on human speech recordings.

eSpeak is available as:

A command line program (Linux and Windows) to speak text from a file or from stdin.
A shared library version for use by other programs. (On Windows this is a DLL).
A SAPI5 version for Windows, so it can be used with screen-readers and other programs that support the Windows SAPI5 interface.

eSpeak has been ported to other platforms, including Android, Mac OSX and Solaris.

Features:

Includes different voices, whose characteristics can be altered.
Can produce speech output as a WAV file.
SSML (Speech Synthesis Markup Language) is supported (not complete), and also HTML.
Compact size. The program and its data, including many languages, totals about 2 Mbytes.
Can be used as a front-end to MBROLA diphone voices, see mbrola.html. eSpeak converts text to phonemes with pitch and length information.
Can translate text into phoneme codes, so it could be adapted as a front end for another speech synthesis engine.
Potential for other languages. Several are included in varying stages of progress. Help from native speakers for these or other languages is welcome.
Development tools are available for producing and tuning phoneme data.
Written in C.
The eSpeak author reports regularly using it to listen to blogs and news sites, and preferring the sound through a domestic stereo system rather than small computer speakers, which can sound rather harsh.
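The command-line features listed above (voice selection, WAV output, phoneme translation) can be sketched as a small Python wrapper. It assumes the `espeak` binary is installed and uses the options `-v` (voice), `-s` (speed in words per minute), `-w` (write a WAV file), `-q` (quiet) and `-x` (print phoneme mnemonics); the function names are hypothetical, and `text_to_phonemes` returns `None` when eSpeak is not available.

```python
import shutil
import subprocess


def espeak_command(text, voice="en", words_per_minute=175,
                   wav_path=None, phonemes_only=False):
    """Build an espeak command line for the given text and options."""
    cmd = ["espeak", "-v", voice, "-s", str(words_per_minute)]
    if phonemes_only:
        cmd += ["-q", "-x"]        # no audio, print phoneme mnemonics instead
    if wav_path is not None:
        cmd += ["-w", wav_path]    # write speech to a WAV file instead of playing it
    cmd.append(text)
    return cmd


def text_to_phonemes(text, voice="en"):
    """Return eSpeak's phoneme mnemonics for `text`, or None if not installed."""
    if shutil.which("espeak") is None:
        return None
    result = subprocess.run(
        espeak_command(text, voice=voice, phonemes_only=True),
        capture_output=True, text=True,
    )
    return result.stdout.strip()


print(espeak_command("Hello world", voice="en", wav_path="hello.wav"))
```

The phoneme output is what makes eSpeak usable as a front end: another engine can consume the phoneme string instead of raw text.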

7. Flite - Fast Run time Synthesis Engine

Flite is a small, fast TTS system: a C counterpart to the well-known Festival speech synthesis system, suitable for embedded systems.

English original: http://www.findbestopensource.com/tagged/text-to-speech

Flite (festival-lite) is a small, fast run-time synthesis engine developed at CMU and primarily designed for small embedded machines and/or large servers. Flite is designed as an alternative synthesis engine to Festival for voices built using the FestVox suite of voice building tools. Flite 1.4-release is now released as source. Flite offers:

Completely in C (no C++ or Scheme) for portability, size and speed
Reimplementation of the core parts of the Festival architecture (HRG), allowing close compatibility between voices built for each system.
Support for compiling FestVox voices into Flite voices.
Thread safe
Scalable voice size with all data const so it can be in ROM
Target architectures: iPAQ (Linux/WinCE), Palm OS (Treo) and smaller

Flite is essentially written and is in its first stages of testing before release as free software. A small diphone voice based on the CMU KAL voice is included, along with a sample limited-domain talking clock.
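For a quick trial of Flite from a script, the sketch below renders text to a WAV file. It assumes the `flite` binary is installed and accepts `-t` (text to speak), `-o` (output file) and `-voice` (voice name, such as the bundled `kal` voice); verify these against your Flite version. The function names are hypothetical.

```python
import shutil
import subprocess


def flite_command(text, wav_path, voice=None):
    """Build a flite command that writes `text` as speech to `wav_path`."""
    cmd = ["flite"]
    if voice is not None:
        cmd += ["-voice", voice]   # assumed option; e.g. the bundled "kal" voice
    cmd += ["-t", text, "-o", wav_path]
    return cmd


def synthesize_to_wav(text, wav_path, voice=None):
    """Run flite if it is installed; return True when it exits successfully."""
    if shutil.which("flite") is None:
        return False  # Flite not available on this machine
    result = subprocess.run(flite_command(text, wav_path, voice))
    return result.returncode == 0


print(flite_command("The time is now ten past two.", "clock.wav", voice="kal"))
```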

Note from HSY75

A few TTS websites verified to be accessible:

Other references:

Architecture Walkthrough

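This article is licensed under the Creative Commons Attribution 4.0 International License. Unless marked as reposted or attributed to another source, articles on this site are original works or translations by this site; please credit the source when reposting. Last edited: 2020/06/28 03:16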