What is involved in creating a personal synthetic voice?
Is there any charge for building a ModelTalker synthetic voice?
Can you guarantee that I will be able to make a usable synthetic voice?
If I make a personal synthetic voice, will it sound just like me?
Some commercial synthetic voices sound almost perfectly human. Why won’t my ModelTalker voice sound that good?
I have ALS and I am starting to have trouble speaking. Can I still make a personal voice?
I cannot speak well enough to record full sentences. Is there anything I can do to get a personal voice?
If I make a voice, how can I use it?
What do I need to know about computers in order to create a personal voice?
What are the basic computer system requirements for creating a voice?
Do you offer any customer support/help, or am I completely on my own?
I submitted my screening sentences and never got a response. What should I do?
I am a Speech Language Pathologist and my client has asked for my assistance with voice banking, what do I need to know?
What microphone should I use?
How can I use the manual calibration tool in MTVR?
The Measure button in the web recorder Settings is not active! How can I fix this?
Why is the system making me redo the screening inventory even though you said I can do the full inventory?
My microphone is too loud/quiet for web recording. How do I adjust it?
I’m using the web recorder. How can I verify that Chrome is using my headset microphone?
The system keeps saying I am talking too slowly, but I’m not. What should I do?
How do I disable audio enhancements in Windows?
Can I rerecord a sentence if I make a mistake or think I could do a better job?
After I finish all sentences can I go back and redo a few?
Can I create custom sentences, for instance, with names of family and friends?
How can I improve the quality of my synthetic voice?
What if I can’t finish all 1600 sentences?
I installed my voice, now what?
What are those numbers that show up in the Listen button of the web recorder?

What is involved in creating a personal synthetic voice?

First, you must have a PC or laptop with audio capabilities and a good quality consumer grade or better head-mounted microphone. We now offer the possibility of using either a web-based recording tool or an installable Windows program called MTVR (an iOS recording app is in the works). When you are set up to record, you will then carefully record a short inventory of 10 sentences for us to review. The recording tool will guide you through that process by prompting you for each utterance that is needed. After you upload these test speech files to our server, we will look them over and possibly make additional suggestions for creating better recordings. If all is well, you will be able to continue recording the full inventory of about 1600 sentences. Although most sentences are fairly short and the total amount of speech is only about one hour, it is recorded one sentence at a time, and it may take more than one try to get a sentence right. Thus, you should expect that recording the full inventory will take at least 6 hours distributed over 3 or 4 days; for some people it can take a lot longer. When all of the phrases are recorded, we will convert your recordings to a synthetic voice. This may take several days. As soon as it is ready we will send you a web link to download a voice installer for your computer or device.

Back to top

Is there any charge for building a ModelTalker synthetic voice?

For over 10 years, we have been able to provide voice banking and creation of personal synthetic voices for free as a side activity associated with our ongoing research into speech synthesis technology for people with special oral communication needs. However, in the past two to three years, voice banking has become such an accepted technology for patients with neurodegenerative diseases that it has shifted from being merely an adjunct to our research to become a valuable service that we deliver to English-speaking people throughout the world. So that we can continue to provide and even expand our voice banking services for ALS/MND patients, we anticipate shifting to a fee for service model within the first quarter of 2017. For the moment, voice building is still free. We will post updates and provide users with plenty of warning before we charge for delivering their voice.

Although the voice banking technology that we have pioneered has been of benefit to many ALS/MND patients, our primary goal remains that of providing high quality personal voices for pediatric patients, many of whom have never had a voice of their own. Please consider making a contribution to the Nemours Speech Research Lab to support our research toward that goal. Your gift will support our efforts to deliver personalized synthetic voices, in particular to children who use communication devices.

Back to top

Can you guarantee that I will be able to make a usable synthetic voice?

Unfortunately not. Everyone’s voice as well as their recording equipment and environment is different, and so we cannot promise a successful outcome. We do know that we have produced some very good voices that are being used routinely by their owners for communication. However, a few people have not succeeded in creating a usable voice. Please listen to the samples on our Demo page to get a reasonable idea of the range of voices that have been created by users in the field.

Back to top

If I make a personal synthetic voice, will it sound just like me?

Your personal synthetic voice will probably capture your natural voice quality very well, but the speech will still not sound exactly like you because it is synthetic. The timing and intonation of sentences “spoken” by ModelTalker will probably sound much more robotic than your natural speech. Be sure to listen to the examples of natural and synthetic voices to get a realistic idea of how ModelTalker voices compare to natural speech.

Back to top

Some commercial synthetic voices sound almost perfectly human. Why won’t my ModelTalker voice sound that good?

The technology in ModelTalker is similar to that used for some of the very best commercially available voices, but there are also many important differences. The highest quality commercial voices are constructed from many hours of recordings made under studio conditions by professional speakers who work with technicians to record everything in exactly the best possible way. Even though it may take you several hours to record the sentences for a ModelTalker voice, there will only be about an hour of actual speech recorded for your voice. In a commercial system, there may be 20 times as much speech, or even more!

Back to top

I have ALS and I am starting to have trouble speaking. Can I still make a personal voice?

The quality of the personal synthetic voice you create with the ModelTalker Voice Recorder is very dependent on your natural voice and speech quality. If you have a progressive condition, you should record your voice before it is affected. Nonetheless, if your voice is just a little breathy or hoarse, it will probably still be possible to make a useful synthetic voice. The more trouble you have speaking, the more difficult it will be for you to record all the sentences that are needed, and the more difficult it will be for our software to find all the speech sounds it needs to make a useful synthetic voice. If you cannot repeat short sentences without pausing or slurring, you may find the recording process to be too taxing for you and the resulting synthetic voice may not be very usable.

Back to top

I cannot speak well enough to record full sentences. Is there anything I can do to get a personal voice?

Yes! There are a couple of options. First, if you have a close relative who sounds very much like you, they might be willing to do the recording for you and donate their voice for your use. Because personal synthetic voices are not perfectly natural sounding, it will almost always be possible for people to tell you apart, and yet no other augmented communicator will have that voice.

If you do not have a close relative who sounds like you and is willing to donate his/her voice, we may still be able to help using a new process that we are developing. Using this process, we can take the voice of any donor who is a fairly good match to you in terms of gender, voice pitch, and dialect. Someone like a friend or neighbor who grew up or has lived near you for most of his or her life would be a great candidate for this. We will have the voice donor record the list of sentences and will sample as much of your own speech range as possible to modify the donor’s recordings to sound more like you. If you are interested in trying this very experimental approach, please contact us at beta@modeltalker.org.

Back to top

If I make a voice, how can I use it?

Your ModelTalker voice can be installed for use by a variety of programs and apps on recent Microsoft Windows computers, notebooks, and Windows 8 mobile devices as well as Mac desktop and laptop computers. Voices can also be used on iOS and Android mobile devices. For iOS devices (iPad, iPhone) ModelTalker voices are exclusively available for apps from Therapy Box, who have made it possible to load ModelTalker voices into the latest versions of their Predictable and ChatAble apps.

Special-purpose AAC or Speech Generating Devices made by many manufacturers are based on either Windows or Android and may allow you to use your ModelTalker voice with the device. If you want to use an SGD that does not currently support your ModelTalker voice, be sure to tell the manufacturer or their representative. We would be happy to work with them if they want to expand their system to support ModelTalker voices.

Back to top

What do I need to know about computers in order to create a personal voice?

You should have, or be working with someone who has, relatively good computer skills. At the very minimum, you should be knowledgeable and confident doing the following:

  • Read and follow written instructions.
  • Send and receive email messages with attachments.
  • Upload and Download files from websites.
  • Locate files in a directory on your computer.
  • Cut, Copy, and Paste files from one location to another on your computer.
  • Install and Uninstall programs on your computer.
  • Work with features in your Windows Control Panel or OS X System Preferences.
  • Work with WinZip or a similar program to manage compressed archive files.

Back to top

What are the basic computer system requirements for creating a voice?

For running the MTVR program, most recent computers running Windows 7 or newer should work well. For our new web-based recording tool, most desktop or laptop computers running OS X or Windows will work, but you will need a recent version of the Google Chrome web browser (as of this writing Safari will not work; Firefox has been causing users a lot of trouble) and a stable high speed internet connection. We have had mixed results with tablet computers running Windows or Android. The Chrome browser on these devices does appear to work, but some tablets do not have enough CPU power for successful recording. For iOS devices such as iPads, we are developing a new recording app, but it is not presently available.

Back to top

Do you offer any customer support/help, or am I completely on my own?

We do offer help in a few ways. First, we always try to answer email questions from anyone who is interested in or using the ModelTalker Text-to-Speech system, the ModelTalker Voice Recorder (MTVR), or the web-based recording tool. In addition, there are several mailing lists dedicated to supporting beta testers. If you are a potential augmentative communication user who is trying to make a personalized synthetic voice, we provide assistance in the following ways: 1) we analyze the test inventory that you upload to our server and make recommendations for improvements to your recording environment and process; 2) we do the final conversion of field recorded speech files to a synthesis database using our voice creation software; 3) we can diagnose and fix some problems with the speech files that would be difficult or impossible for you to handle yourself; and 4) we return the synthetic voice in the form of an installable executable file that should simplify the installation process for you.

When we begin charging for voices, we will expand our support service to provide phone support during regular business hours (East Coast US time).

Back to top

I submitted my screening sentences and never got a response. What should I do?

First, check your email spam or junk folder, as our communications have on occasion ended up there. If you do not find an email from us in your spam or junk folder, then please contact us at beta@modeltalker.org.

Back to top

I am a Speech Language Pathologist and my client has asked for my assistance with voice banking, what do I need to know?

Please visit this page for specific FAQs for SLPs.

Back to top

What microphone should I use?

For home recording, we typically require a head-mounted USB microphone. The specific microphone we recommend is the Sennheiser PC 36. This microphone is usually available from online retailers such as Amazon.com for around $50. In our experience, it performs as well as or better than other microphones in its price range, and is often better than more expensive microphones. Very inexpensive microphones, built-in laptop, tablet, or smartphone microphones, headsets with 1/8 inch phone plug connections (non-USB), and Bluetooth “wearable” microphones that are designed strictly for telephony are unsuitable for voice banking.

If you are able to record in a sound studio, or with professional equipment in an appropriate setting, the thing to keep in mind is that the microphone should be shock mounted and have a pop screen/filter to isolate the microphone from air bursts as you speak. You should be recorded very close to the microphone, with the microphone slightly off to the side (avoiding direct airflow from your mouth) and you must maintain a very consistent distance and orientation relative to the microphone.

Back to top

My microphone is too loud/quiet for web recording. How do I adjust it?

It depends on your computer and operating system and there are many variants of each, but here are some general suggestions:

    • Mac OS X – Open System Preferences, click Sound, then Input, and select your microphone from the list of input devices. You can adjust the volume slider while speaking at a comfortable level and choose a setting that has the volume indicator rising to about 3/4 of the full scale.
    • Windows – Usually there is a speaker icon in the right side of the taskbar. You can right-click the speaker to bring up a menu. Select Recording Devices in the menu. In the settings panel, select your input device, then click Properties to get to a volume control. You can adjust the volume slider while speaking at a comfortable level and choose a setting that has the volume indicator rising to about 3/4 of the full scale. This page has more information for recent Windows versions.

Back to top

I’m using the web recorder. How can I verify that Chrome is using my headset microphone?

If you are using a USB microphone (as we recommend), or a professional grade microphone via a USB audio interface, you should be able to find and select the microphone (or the audio interface) in the Settings dialog Microphone drop down list. Once you have selected a microphone and used it to record, it will become the default and will show up as the selected microphone as long as it is available. Always check to see that it is the selected mic. If it is not, do not continue to record.

If you are not using a USB microphone or audio interface (not recommended), the Settings dialog microphone will probably show up as Default or have some other system-specific name. Since it is not always obvious what the current default is, you will need to check your system settings to ensure the correct microphone is actually being used. Be aware that computers are often prone to default to their built-in microphones as the audio input device. If this happens and is not corrected, your recordings will not be acceptable for voice banking!

Back to top

The system keeps saying I am talking too slowly, but I’m not. What should I do?

First, check your silence measure. Using the online tool, you should expect to see silence measures in the range from about -60 to -80. If you are seeing values outside that range, something in the audio configuration is suspect. Is the silence measure really silent? A dead giveaway would be a silence measure greater than -60 dB (these are negative numbers, so -50 is greater than -60). There is a Listen button right next to the Measure button so you can verify that nothing is being recorded when it’s supposed to be silent. Use headphones to listen.

If that is not the problem, maybe the silence measure is unbelievably low, e.g., a value much less than -80 such as -96 or -120. This could indicate that (a) your microphone is not working, or (b) your computer is doing background noise reduction. To check if your microphone is working, you can try speaking while doing the silence measure and verify that the system does record your speech; if it does not, look for a problem with your microphone. To check if your system is set to do background noise reduction (this is most likely on Windows computers), see our FAQ on disabling audio enhancements below.
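
The checks described above can be summarized in a short Python sketch. It is only an illustration: the function name is made up, and the -90 cutoff for an "unbelievably low" reading is an assumption (the text above gives only -60 to -80 as the expected range and -96 or -120 as examples of suspicious values).

```python
def interpret_silence_measure(level_db: float) -> str:
    """Classify a silence measure (in dB) read from the recorder Settings.

    The -60 and -80 bounds come from the FAQ text. SUSPICIOUSLY_LOW_DB is an
    assumed cutoff chosen only for this illustration.
    """
    EXPECTED_MAX_DB = -60.0      # above this, the "silence" is not really silent
    EXPECTED_MIN_DB = -80.0      # lower end of the expected range
    SUSPICIOUSLY_LOW_DB = -90.0  # assumption standing in for "much less than -80"

    if level_db > EXPECTED_MAX_DB:
        return "too_noisy"           # listen back and look for background noise
    if level_db < SUSPICIOUSLY_LOW_DB:
        return "suspiciously_quiet"  # dead microphone or noise reduction enabled?
    if level_db < EXPECTED_MIN_DB:
        return "borderline"          # a little below the expected range
    return "ok"


for reading in (-50, -70, -85, -96):
    print(reading, "->", interpret_silence_measure(reading))
```

A "too_noisy" result points to the first paragraph above (something is being recorded during silence), while "suspiciously_quiet" points to the microphone and noise reduction checks in the second.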

Back to top

How do I disable audio enhancements in Windows?

In recent versions of Windows, this is usually done via the recording properties control panel, which can be accessed by right clicking the speaker icon in the task bar and selecting “Recording devices.” Within the Recording devices panel, click on the image of the microphone you are using and then click “Properties.” In the properties panel, there should be several tabs. In the Levels tab, make sure the microphone level is turned up to full. If there is an Enhancements tab, make sure to select disable all enhancements. In the Advanced tab, select two channel 16-bit 44100 Hz (CD quality) as the default format. If you see checkboxes that say “allow applications to take exclusive control…” and/or “give exclusive mode applications priority” check them both.

For Windows Vista, follow the instructions here. For Windows 7, follow the instructions here.

Back to top

Can I rerecord a sentence if I make a mistake or think I could do a better job?

Yes. When logged into the web recorder, use the Listen Menu –> Recordings dialog to get a list of all the sentences you’ve recorded, select the sentence you want to redo from the dropdown list and click Rerecord. The interface will be reset to that sentence and you can redo it. Note that after doing this, you may need to use the Listen menu or Forward button to move back to where you were in the list before deciding to rerecord a sentence.

If you are using MTVR, simply click on any sentence in the list of sentences and you can redo it.

Back to top

After I finish all sentences can I go back and redo a few?

Yes. Using MTVR on a Windows system, you can jump to any sentence, listen to your recording and rerecord it. When using the web recorder, as long as you have not finalized the inventory you can review all your recorded sentences using the Listen menu and redo any sentence that does not sound right. Once you have reached the end of an inventory, or if you decide you cannot finish and request a voice before reaching the end, a special Finalize button will appear. After you have clicked Finalize, it will no longer be possible to do any recording.

Back to top

Can I create custom sentences, for instance, with names of family and friends?

One way to make sure that the names of people and places that are important to you are correctly pronounced is to record them in custom sentences. We have added a Custom Inventory Tool to our web recorder that will allow you to do this. You access the tool from the voice recorder Session > Custom Inventory menu. In the Custom Inventory Tool, you may enter names such as the names of people, places, or things that are important to you. You may also enter whole phrases or sentences that you would like to record. Sentences you record will usually sound just like your recorded speech when you later try to synthesize them with your ModelTalker voice. Each of the words you enter will be embedded in four different sentences for you to record. So be aware that if you enter, for example, 20 words, you will have 80 sentences to record with those words.

Back to top

How can I improve the quality of my synthetic voice?

When Starting Out:

    While doing the recordings the four things that will lead to the best quality for your synthetic voice are:

    1. Consistency
    2. Consistency
    3. Consistency
    4. Audio recording quality

    To elaborate, your speech should be consistent in vocal effort (loudness), in speech rate, and voice quality. For voice quality, think of the difference in the way your voice sounds when you are relaxed and speaking softly versus when you are tense or angry, or happy and excited. You might not be speaking any louder, but your speech may have a different quality; your pitch might be a bit higher and your voice less breathy when excited. Synthetic voices tend to come out sounding better when the speaker sounds relaxed and not tense, but even more importantly, try to use exactly the same voice quality throughout the recording process. Other things where consistency helps are with regard to (a) microphone position, (b) time of day when you record, (c) things you’ve been eating or drinking just before recording.

After Finishing Your Voice:

    Unless there were serious problems with some of the recordings you made, probably the best way to improve the quality of an existing synthetic voice is to add additional speech to it. If you would like to do this, please let us know and we can give you some instructions on adding new speech to an existing inventory then rebuilding your voice.

Back to top

What if I can’t finish all 1600 sentences?

While it is best to record all of our full standard 1600-sentence inventory, that sometimes turns out to be too difficult. We will try to build a voice from as many recordings as you are able to complete. Our sentence material is ordered so that the most important material is recorded earliest. In studies we’ve run with these sentences, we have found the following to be a rough guideline to the tradeoff between the number of sentences recorded and the intelligibility of the resulting TTS voice.

  1. 200 sentences — Using only the first 200 sentences, it is possible to get a voice that will work some of the time, but it will not be generally usable for communication, particularly with strangers
  2. 400 sentences — Voices made with the first 400 sentences can be usable, but there will still be many words that are mangled and hard/impossible to understand. The prosody (speech timing and intonation) will be quite robotic. This is the smallest number of sentences we recommend attempting to use as a real TTS voice.
  3. 800 sentences — With 800 sentences recorded, the synthetic voice will be approaching its maximum intelligibility. That is, recording more sentences will probably only slightly improve the intelligibility of the voice. However, speech prosody will still be awkward and frequently sound incorrect. For example, questions are more likely to sound like statements, or statements to sound like questions because the intonation is not appropriate.
  4. 1600 sentences — As you go from 800 to 1600 sentences, the majority of the changes in voice quality will be changes in the naturalness of the speech. Sentences will more frequently sound like they have the correct rhythm and intonation. Effects like the way we indicate phrase and sentence boundaries will more often be correct.
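
As a rough illustration, the breakpoints above can be read as a simple lookup from the number of sentences recorded to the expected outcome. The Python sketch below is not part of ModelTalker; the names are made up, the one-line summaries are paraphrases, and treating the breakpoints as hard steps is a simplification of what is only a rough guideline.

```python
# Breakpoints from the list above, largest first. The one-line summaries are
# paraphrases written for this sketch, not official ModelTalker wording.
GUIDELINE = [
    (1600, "full inventory: most natural rhythm and intonation"),
    (800, "near-maximum intelligibility, but prosody still awkward"),
    (400, "smallest recommended size: usable, many mangled words, robotic prosody"),
    (200, "works some of the time; not generally usable for communication"),
]


def expected_outcome(sentences_recorded: int) -> str:
    """Return the summary for the highest breakpoint the count has reached."""
    for minimum, summary in GUIDELINE:
        if sentences_recorded >= minimum:
            return summary
    return "fewer than 200 sentences: below the smallest breakpoint in the guideline"


for count in (150, 400, 1000, 1600):
    print(count, "->", expected_outcome(count))
```

In practice quality improves gradually between breakpoints, so a count such as 1000 falls somewhere between the 800 and 1600 descriptions rather than exactly on the 800 one.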

Note that studies we’ve conducted to determine these guidelines were run with voices created from speech recorded under studio conditions by American English speaking voice talent. For speakers of other English dialects, speech recorded under less than ideal audio conditions, and speech recorded by talkers who are dysarthric or less able to produce exactly the correct sentences with consistent speaking rate and style, these break points are likely to be optimistic. Your experience may differ considerably.

Back to top

I installed my voice, now what?

ModelTalker is a Text to Speech or TTS voice, not a communication device. On Windows systems, we do provide an app called ModelTalker2 that you can use to test your voice and adjust settings, but that is not the case for MacOS (i.e., Mac laptop and desktop systems), Android (non-Apple smartphones and tablets), or iOS (iPhones and iPads). For these other operating systems and devices, you may be able to play a short sentence within the system settings where the system default voice is selected, but to make real use of your voice, you will want to find an AAC app or an AAC device specifically designed for communication. Note that most special purpose AAC devices are actually Windows or Android tablets that have been specifically tailored for use as communication devices. We recommend that you speak with an AAC clinical specialist, Speech Language Pathologist, or Speech Language Therapist to get assistance in finding the best device or app for your needs.

Back to top

How can I use the manual calibration tool in MTVR?

Instead of doing the standard calibration, choose Manual Calibration and open the Calibration dialog box. Then:

  1. Make sure the correct mic is selected in the lower left.
  2. Start speaking at your normal comfortable level and use the slider in the right side of the dialog to increase the amplitude as far as possible without seeing any clipping.
  3. While remaining quiet, next click the Measure button in the left side. It will stop automatically after recording some silence and the silence level in dB will be updated in the box next to the button.
  4. Add 6 to the value in the silence level and write that into the box above it marked Auto Trim Threshold. Important note: these are negative numbers so if the silence level is -60, adding 6 will give you -54, which is what should go in the Auto Trim Threshold.
  5. Set the Pitch. Reasonable values to enter here range from 100 (for a low-pitched adult male voice) to 180 (low-pitched adult female) to 220 (high-pitched adult female) to 260 (child).
  6. Click OK — you’re done.
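
The arithmetic in step 4 is easy to get backwards because the numbers are negative, so here is a minimal Python sketch of it. The function and dictionary names are made up for illustration; the 6 dB margin and the pitch values are the ones given in the steps above.

```python
def auto_trim_threshold(silence_level_db: float, margin_db: float = 6.0) -> float:
    """Return the Auto Trim Threshold: the measured silence level plus 6 dB.

    Silence levels are negative, so adding 6 moves the value toward zero.
    """
    return silence_level_db + margin_db


# Starting points for the Pitch field, as listed in step 5.
PITCH_GUIDE = {
    "low-pitched adult male": 100,
    "low-pitched adult female": 180,
    "high-pitched adult female": 220,
    "child": 260,
}

print(auto_trim_threshold(-60))  # -54.0
```

For example, a measured silence level of -72 gives a threshold of -66, not -78.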

Back to top

The Measure button in the web recorder Settings is not active! How can I fix this?

Because we save separate settings information for each inventory you are asked to record, the Measure button only becomes active when you have entered a valid Inventory name under Inventory in the Settings dialog. Our instructions always tell you what Inventory name to use. When you are trying to do the 10 screening sentences, the inventory to use is called “screen,” so you should type the word “screen” (without quotes) into the Inventory field. When you have finished the screening process and are ready to record a complete 1600-sentence inventory, we will tell you what other name to enter as the Inventory. Usually, but not always, we will tell you to use the inventory named “full” when doing the 1600 sentence inventory. So, for that, you would enter the word “full” as the inventory.

Note that you cannot make up your own inventory name — you must use one that we have set up for use in the system. If you are not sure what the correct inventory name is, please ask us.

Back to top

Why is the system making me redo the screening inventory even though you said I can do the full inventory?

Sometimes we send users a special link that takes them directly to the online recorder and sets the Inventory name. For example, a link to do the screening inventory might have “?inv=screen” as the last part of the URL. When we do that, it locks the inventory name and it cannot be changed. We have seen cases where a browser might auto-complete the URL and include the ?inv=screen part even though you no longer want to do another screen inventory. The easiest way to fix this is to log in at the main page (https://modeltalker.org) and then use the Recording > Online recorder menu to get to the recorder. If your browser still insists on adding “?inv=screen” to the address, you may be able to delete that from the address bar.
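
To make clear which part of the address is the problem, the Python sketch below removes the inv parameter from a URL and leaves everything else alone. You do not need to run any code to fix this; deleting the text by hand in the address bar does the same thing. The recorder path in the example is hypothetical, and only the “?inv=screen” suffix comes from the explanation above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_inventory_param(url: str) -> str:
    """Return url without any 'inv' query parameter, keeping the rest intact."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "inv"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical recorder address; the real path may differ.
print(strip_inventory_param("https://modeltalker.org/recorder?inv=screen"))
# https://modeltalker.org/recorder
```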

Back to top

What are those numbers that show up in the Listen button of the web recorder?

As you add new recordings, the system occasionally builds examples of your synthetic voice for you to listen to. The number indicates how many of these example voices have been built for you to listen to and compare. New voices are built whenever we think you have added enough new material to make a perceptible difference in the voice quality. This amounts to about every 25 sentences at first, then less frequently as you go along. A new voice is built every 400 sentences for the last half of the inventory because after 800 sentences, it takes a lot of additional recording to make a difference you can easily hear. These example voices are built with default parameters that are not necessarily well-tuned to your speech and so the voice quality is probably not quite as good as the final voice we will build for you, but they will give you a reasonable sense of progress as you do the recording.

Back to top