Auditioning your Voice

After you have finished recording your full inventory, we build six versions of your voice in our lab, using different software settings. Normally, we test the six versions ourselves to choose the one we believe is best, and package that as your final voice.

We are introducing a new feature that lets you use some of the same techniques we use in the lab to decide which version of your voice you personally prefer. We have two reasons for doing this. First, because your concept of how your synthetic voice should sound might be different from ours, we believe it is important for you to have a say in selecting the voice and setting some of its controls. Second, in preparation for the time when we will begin charging for voices (we are not doing this yet), we want to give you an opportunity to hear the voice and be certain you want to use it before you buy it.

There are three steps in the process of selecting your voice:

  1. Set synthesis parameters
  2. Compare the six voice versions
  3. Make a final choice

The sections below describe each step in a little more detail. When you are ready, click the Begin button below to begin Step 1.

1. Set synthesis parameters

We provide controls for three voice parameters: speaking rate, sentence intonation, and syllable timing. Speaking rate is the overall speed of the voice and can be adjusted from very slow to very fast. By default, your ModelTalker voice will speak at about the same rate that you spoke when recording the speech inventory.

Sentence intonation refers to the tonal pattern of your voice, for example, whether the pitch rises (as with some questions) or falls (as with most statements) at the end of a sentence. By default, ModelTalker attempts to find examples of your recorded speech that match the best intonation pattern for a sentence, but it does not attempt to modify the intonation of the speech, so sentences may sometimes have inappropriate or disjointed intonation. If you enable synthetic intonation, ModelTalker will use Digital Signal Processing (DSP) strategies to make the sentence match normal intonation more closely. However, that may also reduce the naturalness of your voice quality to some extent. You need to decide whether, on balance, enabling synthetic intonation helps or hurts your voice.

Syllable timing refers to the way the rhythm of your speech can vary within a phrase. For example, the same syllable may be spoken more rapidly at the beginning of a phrase and more drawn out at the end of a phrase. Similarly, stressed or emphasized syllables tend to be longer in duration than unstressed syllables. As with intonation, by default ModelTalker attempts to find syllables that match what is appropriate for normal syllable timing. If you enable synthetic syllable timing, ModelTalker will use DSP to modify syllable durations to try to make them match typical speech timing patterns more closely. In this case, as with intonation, the DSP may reduce the naturalness of your voice quality. However, it sometimes makes the speech easier to understand, particularly if you have some dysarthria or if your speaking rate varied a lot when you recorded your inventory.

In Step 1, there is a text box where you can enter one sentence, or try several different sentences, while experimenting with different combinations of the speaking rate, timing, and intonation controls. The point of this step is to determine the combination of settings that will be used in Step 2, when you compare different versions of your voice, all with the same synthesis parameters.

2. Compare the six voice versions

Step 2 consists of 30 sentence comparison “trials.” On each trial, you will be able to listen to the same sentence generated by two different versions of your voice and you must choose which of the two versions you prefer. The sentences we are using are chosen to challenge the limits of your synthetic voice. Some may be hard or nearly impossible to understand. More often it may be difficult to decide which of the two versions sounds better, but you must pick one version even if you feel the choice is random.

Over the 30 trials, each version of your voice will be paired twice with each of the other five versions (15 possible pairs, each presented twice), so you will hear 10 sentences made by each version. From these “paired comparison” trials, we will select the two versions that you most frequently chose as the preferred voice and use those two versions in the third step.
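For readers curious about how the trial counts work out, here is a minimal sketch in Python. It is only an illustration of the arithmetic described above, not the code ModelTalker actually runs. The version labels A through F and the function names build_trials and top_two are made up for this example, and the way ties are broken is an assumption, since this page does not say how ties are handled.

```python
from collections import Counter
from itertools import combinations

# Hypothetical labels for the six voice versions (not ModelTalker's real names).
VERSIONS = ["A", "B", "C", "D", "E", "F"]


def build_trials(versions, repeats=2):
    """Return every unordered pair of versions, each repeated `repeats` times."""
    return [pair for pair in combinations(versions, 2) for _ in range(repeats)]


def top_two(preferred):
    """Return the two versions chosen most often.

    Assumption: ties are broken by whichever version was chosen first.
    """
    counts = Counter(preferred)
    return [version for version, _ in counts.most_common(2)]


trials = build_trials(VERSIONS)

# How many trials each version takes part in.
appearances = Counter(version for pair in trials for version in pair)

# Simulate a listener who always prefers the alphabetically earlier version.
choices = [min(pair) for pair in trials]
finalists = top_two(choices)

print(len(trials))
print(dict(appearances))
print(finalists)
```

With six versions there are 15 distinct pairs, so presenting each pair twice gives the 30 trials, and each version takes part in 10 of them. In the simulated run, the two versions with the most wins are the ones that would move on to Step 3.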

3. Make a final choice

In Step 3, you will again have access to the speaking rate, timing, and intonation controls, along with a text box as in Step 1. You may enter short sentences in the text box, or copy and paste much longer passages (up to about 2000 characters). We encourage you to use this step to test your synthetic voice with things you may actually want to say.

In addition to the controls from Step 1, you will also have radio buttons that let you choose which of the two voices selected in Step 2 you want to use to render the text in the text box. You should try both voice versions with multiple texts, and also try modifying the rate, timing, and intonation controls again. When you have decided on the combination of controls and the voice that you prefer, click the Done button to save those settings as the ones we will use in building your voice installer.

Note that you must use the Chrome web browser for the audition process. Neither Internet Explorer nor Safari supports the audio features we use. Also, you may safely dismiss or ignore the warning message about using a USB headset, but we do recommend using headphones for the audition.