Speech dispatcher

From ArchWiki

Speech Dispatcher is a device-independent layer for speech synthesis. It provides a common, easy-to-use interface for both client applications (programs that want to speak) and software synthesizers (programs that actually convert text to speech).

It is part of the Free(b)soft project, which aims to allow blind and visually impaired people to work with computers and the Internet using free software.

Installation

Install the speech-dispatcher package. Next, install one of several supported speech synthesizers.

To use Festival, install festival-freebsoft-utils (AUR). Follow the configuration instructions in the dedicated section below.

To use Piper, a modern neural text-to-speech system, install piper-tts-bin (AUR) and one of the voice packages for your language, e.g. piper-voices-en-us (AUR). Configure speech-dispatcher to use Piper as described in the dedicated section below. As an alternative to the AUR packages above, Piper, its voices, and the speech-dispatcher configuration can be installed with Pied, an automated graphical installer distributed via Flatpak.

Configuration

The main configuration file is located at /etc/speech-dispatcher/speechd.conf. However, speech-dispatcher is usually run on a per-user basis so that different users can have different preferences. User configuration files are stored in ~/.config/speech-dispatcher/. Different speech synthesis engine clients can also have their own configurations.

Use the included spd-conf tool to change configuration options. By default it runs in interactive mode and asks a series of questions in order to generate the type of file you require. Creating a per-user configuration is recommended unless you are absolutely sure you will be the only user. Altering the system configuration requires root permissions.

Basic configuration

To use interactive mode and answer questions about what you need, run the following:

$ spd-conf

To create a per-user configuration, run the following:

$ spd-conf -uc

To edit the system-wide configuration file, run the following:

# spd-conf -C

Festival specific

The factual accuracy of this article or section is disputed.

Reason: Testing suggests that this step is unnecessary: as long as Festival is running as a server, speech-dispatcher seems to work without this edit. (Discuss in Talk:Speech dispatcher)

If you intend to use Festival as your speech synthesis engine then you should also do the following:

$ $EDITOR ~/.config/speech-dispatcher/speechd.conf

Find the following line and uncomment it by removing the leading #:

~/.config/speech-dispatcher/speechd.conf
...
#AddModule "festival"
...

Then save the file.
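
The same edit can be made non-interactively with sed. The following is a minimal sketch that works on a scratch file rather than the real configuration. The sample lines written to the scratch file, including the "espeak-ng" entry, are assumptions for illustration; check how the entry actually appears in your speechd.conf before adapting it.

```shell
# Work on a scratch file so the real configuration is not touched.
# The two sample lines are assumptions about what the file contains.
conf="$(mktemp)"
printf '%s\n' '#AddModule "espeak-ng"' '#AddModule "festival"' > "$conf"

# Remove the leading "#" (and any spaces after it) from the Festival line only.
sed -i 's/^#[[:space:]]*\(AddModule "festival"\)/\1/' "$conf"

# Show the result: only the Festival line should be uncommented.
cat "$conf"
```

To apply it to the real file, back up ~/.config/speech-dispatcher/speechd.conf first and point conf at that path instead of the scratch file.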

Piper specific

Speech-dispatcher supports Piper only through a generic interface module that runs a custom shell command. An audio player is required for this. For PulseAudio, install mpv; for ALSA, use aplay from alsa-utils.

In your user's speech-dispatcher configuration file (see earlier section for how to create it), add the module and configuration file for Piper:

~/.config/speech-dispatcher/speechd.conf
AddModule "piper-tts-generic" "sd_generic" "piper-tts-generic.conf"

Create the following module configuration file for Piper. In the shell command, set the path to the model for your desired voice (chosen from those in /usr/share/piper-voices/) and the audio player appropriate for your audio back end:

~/.config/speech-dispatcher/modules/piper-tts-generic.conf
GenericExecuteSynth "export XDATA=\'$DATA\'; echo \"$XDATA\" | sed -z 's/\\n/ /g' | piper-tts -q -m \"/usr/share/piper-voices/en/en_US/ryan/high/en_US-ryan-high.onnx\" -s 21 -f - | mpv --volume=80 --no-terminal --keep-open=no -"

AddVoice "en-US" "MALE1"   "en_US-ryan-high"

The shell command must filter out newlines, because piper-tts exits on a newline. Speech-dispatcher substitutes the input text into the shell command as a literal string in place of the $DATA placeholder. That text is first assigned to an environment variable, and the contents of the variable are then piped into sed, which replaces each newline character with a space.
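
The newline filter can be tried on its own, outside speech-dispatcher. This is a small sketch assuming GNU sed (the -z option is a GNU extension); the sample text is made up and stands in for whatever speech-dispatcher would pass as $DATA. Note that the sed expression is written here as it reaches the shell, with a single backslash, whereas the configuration file above needs the backslash doubled.

```shell
# Sample multi-line text standing in for the substituted $DATA (an assumption).
XDATA='First line.
Second line.'

# -z makes GNU sed read the whole input as one record, so \n can be matched
# and replaced like any other character.
flattened="$(printf '%s' "$XDATA" | sed -z 's/\n/ /g')"

# The text is now a single line that can be piped to piper-tts.
printf '%s\n' "$flattened"
```

Unlike the configuration above, this sketch uses printf instead of echo, so no trailing newline is added and no trailing space appears in the result.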

Usage

Using speech-dispatcher directly is not a common scenario, as it is intended to provide an access layer to other speech synthesis engines. That said, you can interact with it directly using the included spd-say binary:

$ spd-say "Arch Linux is the best"

The Firefox browser is one of the applications that support speech-dispatcher. Switch to reader view (Ctrl-Alt-R) and a narration button (headphones icon) should be visible in the small menu. You may need to restart Firefox whenever the speech-dispatcher daemon is started or restarted.

The Okular PDF viewer also supports speech-dispatcher. Select text in "Text Selection" mode, right-click it, and choose "Speak Text", or choose "Speak Current Page" in the Tools menu. You may need to restart Okular whenever the speech-dispatcher daemon is started or restarted.

Troubleshooting

Logs

Speech-dispatcher writes very little to the system journal, but it does write useful information to its own logs. You can find the location of these logs in the output of this command:

$ /usr/bin/speech-dispatcher -l 3

Spd-conf tests

spd-conf contains a routine to test the operation of speech-dispatcher. Run it with the following command:

$ spd-conf -d

Or use the following to get a very verbose log dump:

$ spd-conf -D

Other tests are available, for example for ALSA, PulseAudio and Festival. To see a full list of available options, run the following:

$ spd-conf --help

Most of the available tests will run as part of the test routine.

Speech-dispatcher fails to start

The tests above will not work if speech-dispatcher fails to start. If you want more information than the logs provide, you can try starting the server like this:

$ /usr/bin/speech-dispatcher -l 3

This will output information about the startup process to the terminal.

Using TTS causes the dummy output module to speak an error message

This article or section needs expansion.

Reason: It is probably possible to automate this using a systemd service file (Discuss in Talk:Speech dispatcher)

This happens when speech-dispatcher cannot connect to the speech synthesis engine. If you are using Festival, it needs to be running as a server, which can be achieved with the following command:

$ festival --server &
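
As the expansion note above suggests, starting the server could probably be automated with a systemd user service. The following is an untested sketch of that idea, not a unit shipped by any package: the unit name festival.service, its contents, and the /usr/bin/festival path are all assumptions. It writes a draft unit to a scratch directory so it can be reviewed before installation.

```shell
# Write a draft user unit to a scratch directory for review.
# The unit name and everything in it are assumptions, not a packaged unit.
unit_dir="$(mktemp -d)"
cat > "$unit_dir/festival.service" <<'EOF'
[Unit]
Description=Festival speech synthesis server

[Service]
# Assumes the festival binary is installed at /usr/bin/festival.
ExecStart=/usr/bin/festival --server
Restart=on-failure

[Install]
WantedBy=default.target
EOF

echo "Draft unit written to $unit_dir/festival.service"

# After reviewing the draft, copy it to ~/.config/systemd/user/ and run:
#   systemctl --user daemon-reload
#   systemctl --user enable --now festival.service
```

Running Festival as a user unit rather than a system unit matches the per-user way speech-dispatcher itself is normally run.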

See also