Our voice banking process
If I make a voice, how can I use it?
I installed my voice, now what?
Is there any charge for building a ModelTalker synthetic voice?
What do I need to know about computers in order to create a personal voice?
What are the basic computer system requirements for creating a voice?
What to expect with a ModelTalker synthetic voice
Can you guarantee that I will be able to make a usable synthetic voice?
If I make a personal synthetic voice, will it sound just like me?
Some commercial synthetic voices sound almost perfectly human. Why won’t my ModelTalker voice sound that good?
Already have trouble speaking or lost your voice?
I have ALS and I am starting to have trouble speaking. Can I still make a personal voice?
I cannot speak well enough to record full sentences. Is there anything I can do to get a personal voice?
What if I can’t finish all 1600 sentences?
Getting started
I am a Speech Language Pathologist and my client has asked for my assistance with voice banking, what do I need to know?
What microphone should I use?
How can I use the manual calibration tool in MTVR?
Using the recorder
Can I rerecord a sentence if I make a mistake or think I could do a better job?
After I finish all sentences can I go back and redo a few?
Can I create custom sentences, for instance, with names of family and friends?
How can I improve the quality of my synthetic voice?
What are those numbers that show up in the Listen button of the web recorder?
I’m using the web recorder. How can I verify that Chrome is using my headset microphone?
Troubleshooting
Do you offer any customer support/help, or am I completely on my own?
I submitted my screening sentences and never got a response. What should I do?
The Measure button in the web recorder Settings is inactive! How do I fix this?
I’m getting an error that says: Error capturing audio: message=null name=OverconstrainedError constraintName=undefined. What’s wrong?!
Why is the system making me redo the screening inventory even though you said I can do the full inventory?
My microphone is too loud/quiet for web recording. How do I adjust it?
The system keeps saying I am talking too slowly, but I’m not. What should I do?
How do I disable audio enhancements in Windows?
Where can I go to select my voice as the default Windows 10 voice?
What is involved in creating a personal synthetic voice?
First, you must have a PC or laptop with audio capabilities and a good quality consumer grade or better head-mounted microphone. We now offer the possibility of using either a web-based recording tool or an installable Windows program called MTVR. When you are set up to record, you will then carefully record a short inventory of 10 sentences for us to review. The recording tool will guide you through that process by prompting you for each utterance that is needed. After you upload these test speech files to our server, we will look them over and possibly make suggestions for creating better recordings. If all is well, you will be able to start voice banking using our inventory of up to 3155 sentences. It is not necessary to record all 3155 sentences, but they are there for anyone who wants to create the very best possible sounding voice. Further, those sentences can be supplemented by added Custom sentences that you would like to record. Although most sentences are fairly short, the total amount of running speech is around two hours if you choose to record all 3155 sentences. Because the inventory is recorded one sentence at a time, and it may take more than one try to get a sentence right, you should expect that recording the full inventory will take 15 hours or more distributed over multiple days; for some people it can take a lot longer. When all of the phrases are recorded, we will convert your recordings to a synthetic voice. This may take several days. As soon as it is ready we will send you a web link to audition examples of your voice and select one you like.
Is there any charge for building a ModelTalker synthetic voice?
There is no up-front charge for registration or recording, but there is a $100 USD fee, payable when your final voice is accepted. You will not be able to download and install your completed voice until payment is received. At present, we are only able to accept credit card payments from individual clients. However, there are several organizations that are willing to help offset this cost. After registering, or when logged in at the website, you can view the information at Registration > Support Organizations to see if there is an option appropriate for you. For example, if you are living with ALS in the US, Team Gleason has volunteered to pay the cost of the voice for you.
The payment covers our assistance in helping you to get started with recording, in answering your questions if problems arise, in constructing your final synthetic voice, and in maintaining copies of your voice in our archive. At present, we are waiving this fee for those registering to assist others with voice banking (e.g., Clinicians) and for those who are registering to donate a voice for others to use in speech generating devices. To be eligible as a donor, you must have no speech disorder and must use high quality or professional grade recording hardware, ideally in a studio-like environment. We may also request that you do additional recording to ensure a high quality final voice. Donors will be allowed to download a single copy of their voice for personal use only.
Although the voice banking technology that we have pioneered has been of benefit to many ALS/MND patients, our primary goal remains that of providing high quality personal voices for pediatric patients, many of whom have never had a voice of their own. Please consider making a contribution to the Nemours Speech Research Lab to support our research toward that goal. Your gift will support our efforts to deliver personalized synthetic voices, in particular to children who use communication devices.
Can you guarantee that I will be able to make a usable synthetic voice?
Unfortunately not. Everyone’s voice as well as their recording equipment and environment is different, and so we cannot promise a successful outcome. We do know that we have produced some very good voices that are being used routinely by their owners for communication. However, a few people have not succeeded in creating a usable voice. Please listen to the samples on our Demo page to get a reasonable idea of the range of voices that have been created by users in the field.
If I make a personal synthetic voice, will it sound just like me?
Your personal synthetic voice will probably capture your natural voice quality very well, but the speech will still not sound exactly like you because it is synthetic. The timing and intonation of sentences “spoken” by ModelTalker will probably sound much more robotic than your natural speech. Be sure to listen to the examples of natural and synthetic voices to get a realistic idea of how ModelTalker voices compare to natural speech.
Some commercial synthetic voices sound almost perfectly human. Why won’t my ModelTalker voice sound that good?
The technology in ModelTalker is similar to that used for some of the very best commercially available voices, but there are also many important differences. The highest quality commercial voices are constructed from many hours of recordings made under studio conditions by professional speakers who work with technicians to record everything in exactly the best possible way. Even though it may take you several hours to record the sentences for a ModelTalker voice, there will only be about an hour of actual speech recorded for your voice. In a commercial system, there may be 20 times as much speech, or even more!
I have ALS and I am starting to have trouble speaking. Can I still make a personal voice?
The quality of the personal synthetic voice you create with the ModelTalker Voice Recorder is very dependent on your natural voice and speech quality. If you have a progressive condition, you should record your voice before it is affected. Nonetheless, if your voice is just a little breathy or hoarse, it will probably still be possible to make a useful synthetic voice. The more trouble you have speaking, the more difficult it will be for you to record all the sentences that are needed, and the more difficult it will be for our software to find all the speech sounds it needs to make a useful synthetic voice. If you cannot repeat short sentences without pausing or slurring, you may find the recording process to be too taxing for you and the resulting synthetic voice may not be very usable.
I cannot speak well enough to record full sentences. Is there anything I can do to get a personal voice?
Yes! There are a couple of options. First, if you have a close relative who sounds very much like you, they might be willing to do the recording for you and donate their voice for your use. Because personal synthetic voices are not perfectly natural sounding, it will almost always be possible for people to tell you apart and yet no other augmented communicator will have that voice.
If you do not have a close relative who sounds like you and is willing to donate his/her voice, we may still be able to help using a new process that we are developing. Using this process, we can take the voice of any donor who is a fairly good match to you in terms of gender, voice pitch, and dialect. Someone like a friend or neighbor who grew up or has lived near you for most of his or her life would be a great candidate for this. We will have the voice donor record the list of sentences and will sample as much of your own speech range as possible to modify the donor’s recordings to sound more like you. If you are interested in trying this very experimental approach, please contact us at staff@modeltalker.org.
If I make a voice, how can I use it?
Your ModelTalker voice can be installed for use by a variety of programs and apps on recent Microsoft Windows computers, notebooks, and Windows 8 mobile devices as well as Mac desktop and laptop computers. Voices can also be used on iOS and Android mobile devices. For iOS devices (iPad, iPhone) ModelTalker voices are exclusively available for apps from Therapy Box, who have made it possible to load ModelTalker voices into the latest versions of their Predictable and ChatAble apps.
Special-purpose AAC or Speech Generating Devices made by many manufacturers are based on either Windows or Android and may allow you to use your ModelTalker voice with the device. If you want to use an SGD that does not currently support ModelTalker voices, be sure to tell the manufacturer or their representative. We would be happy to work with them if they want to expand their system to support ModelTalker voices.
What do I need to know about computers in order to create a personal voice?
You should have, or be working with someone who has, relatively good computer skills. At the very minimum, you should be knowledgeable and confident doing the following:
- Read and follow written instructions.
- Send and receive email messages with attachments.
- Upload and Download files from websites.
- Locate files in a directory on your computer.
- Cut, Copy, and Paste files from one location to another on your computer.
- Install and Uninstall programs on your computer.
- Work with features in your Windows Control Panel or OS X System Preferences.
- Work with WinZip or a similar program to manage compressed archive files.
What are the basic computer system requirements for creating a voice?
High quality audio recording requires a reasonably powerful laptop or desktop computer for ideal results. If you have access to a Mac laptop or desktop system and a good internet connection, our users have generally had the least amount of trouble with those systems. Some of the least expensive Chromebooks, Android or Windows laptops, and tablets have been problematic for our users. More powerful Windows laptops or desktops with Core i5 or i7 CPUs have generally been acceptable. For iOS devices such as iPads, we are developing a new recording app, but it is not presently available.
Do you offer any customer support/help, or am I completely on my own?
We do offer help in a few ways. First, we always try to answer email questions from anyone who is interested in or who is using either the ModelTalker Text-to-Speech system, the ModelTalker Voice Recorder (MTVR), or our online recording tool. We also offer limited phone support (+1-302-651-6545) between the hours of 8am and 2pm EST (GMT – 5) Monday through Friday. Note that we are located on the US East Coast in Wilmington, Delaware (near Philadelphia, Pennsylvania) and those hours are local time. For calls outside that time frame, you may leave a voicemail, but you will probably get a faster response via email.
I submitted my screening sentences and never got a response. What should I do?
First, check your email spam or junk folder, as our communications have on occasion ended up there. If you do not find an email from us in your spam or junk folder, then please contact us at staff@modeltalker.org.
I am a Speech Language Pathologist or Therapist and my client has asked for my assistance with voice banking, what do I need to know?
Please visit this page for specific FAQs for SLPs.
What microphone should I use?
For home recording, we typically require a head-mounted USB microphone. One model we recommend is the Sennheiser PC 36. It is sometimes available from online retailers such as Amazon.com for around $50. In our experience, it performs as well as or better than other microphones in its price range, and is often better than more expensive microphones. However, we are now also encouraging users of the Sennheiser to purchase an additional foam windscreen because the microphone is very sensitive to airflow from your mouth and nose, which can cause audio distortion. Unfortunately, the PC 36 model is becoming more difficult to find. Other acceptable alternatives from Sennheiser include the PC-8, SC-160 or 165, SC-60, or SC-70. The Jabra UC Voice 550 Duo is another option, as are Jabra Evolve series headsets (e.g., Evolve 20 or 40).
Users have had mixed success with Logitech and Plantronics headsets, but if you already have one of these or another brand, you may try it with our software before buying a new headset. It is also worth noting that USB headsets are strongly preferred over the type with a small phone plug. If you are using a USB headset, our software can identify the microphone and make sure it is being used for every recording session. The phone plug type connection is not typically identifiable by our software and places an additional burden on the user to make sure that their audio is configured correctly before recording. Very inexpensive microphones, built-in laptop, tablet, or smartphone microphones, many headsets with small phone plug connections (non-USB), and Bluetooth “wearable” microphones that are designed strictly for telephony are typically unsuitable for voice banking and should be avoided.
If you are able to record in a sound studio, or with professional equipment in an appropriate setting, the thing to keep in mind is that the microphone should be shock mounted and have a pop screen/filter to isolate the microphone from air bursts as you speak. You should be recorded very close to the microphone, with the microphone slightly off to the side (avoiding direct airflow from your mouth) and you must maintain a very consistent distance and orientation relative to the microphone.
My microphone is too loud/quiet for web recording. How do I adjust it?
It depends on your computer and operating system and there are many variants of each, but here are some general suggestions:
- Mac OS X – Open System Preferences then click Sound → Input, and select your microphone from the list of input devices. You can adjust the volume slider while speaking at a comfortable level and choose a setting that has the volume indicator rising to about 3/4 of the full scale.
- Windows – Usually there is a speaker icon in the right side of the taskbar. You can right-click the speaker to bring up a menu. Select Recording Devices in the menu. In the settings panel, select your input device, then click Properties to get to a volume control. You can adjust the volume slider while speaking at a comfortable level and choose a setting that has the volume indicator rising to about 3/4 of the full scale. This page has more information for recent Windows versions.
I’m using the web recorder. How can I verify that Chrome is using my headset microphone?
If you are using a USB microphone (as we recommend), or a professional grade microphone via a USB audio interface, you should be able to find and select the microphone (or the audio interface) in the Settings dialog Microphone drop down list. Once you have selected a microphone and used it to record, it will become the default and will show up as the selected microphone as long as it is available. Always check to see that it is the selected mic. If it is not, do not continue to record.
If you are not using a USB microphone or audio interface (not recommended), the Settings dialog microphone will probably show up as Default or have some other system-specific name. Since it is not always obvious what the current default is, you will need to check your system settings to ensure the correct microphone is actually being used. Be aware that computers are often prone to default to their built-in microphones as the audio input device. If this happens and is not corrected, your recordings will not be acceptable for voice banking!
The system keeps saying I am talking too slowly, but I’m not. What should I do?
First, check your silence measure. Using the online tool, you should expect to see silence measures in the range from about -60 to -80 dB. If you are seeing values outside that range, something in the audio configuration is suspect. Is the silence measure really silent? A dead giveaway would be silence measures > -60 dB (these are negative numbers, so -50 is > -60). There is a Listen button right next to the Measure button so you can verify that nothing is being recorded when it’s supposed to be silent. Use headphones to listen.
If that is not the problem, maybe the silence measure is unbelievably low, e.g., a value much less than -80 such as -96 or -120. This could indicate that (a) your microphone is not working, or (b) your computer is doing background noise reduction. To check if your microphone is working, you can try speaking while doing the silence measure and verify that the system does record your speech; if it does not, look for a problem with your microphone. To check if your system is set to do background noise reduction (this is most likely on Windows computers), see our FAQ on disabling audio enhancements below.
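As a rough illustration of the ranges described in this answer, the check can be sketched in a few lines of Python. This is only a sketch, not part of the recorder: the function name and the labels are made up for the example, and because the answer does not give an exact cutoff for "much less than -80", the sketch simply treats anything below -80 dB as worth a second look.

```python
# Assumed cutoffs, taken from the "about -60 to -80" range quoted in the answer.
NOISY_ABOVE_DB = -60.0      # louder than this: something is being recorded during "silence"
TOO_QUIET_BELOW_DB = -80.0  # quieter than this: microphone or noise reduction is suspect


def classify_silence_measure(level_db: float) -> str:
    """Return a rough label for a silence measure given in dB (a negative number)."""
    if level_db > NOISY_ABOVE_DB:
        return "noisy"
    if level_db < TOO_QUIET_BELOW_DB:
        return "too quiet"
    return "ok"


for value in (-50, -70, -96):
    print(value, classify_silence_measure(value))
```

A "noisy" result points to background sound or a wrong input device, while a "too quiet" result points to a microphone that is not working or to background noise reduction.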
How do I disable audio enhancements in Windows?
In recent versions of Windows, this is usually done via the recording properties control panel, which can be accessed by right-clicking the speaker icon in the taskbar and selecting “Recording devices.” Within the Recording devices panel, click on the image of the microphone you are using and then click “Properties.” In the properties panel, there should be several tabs. In the Levels tab, make sure the microphone level is turned up to full. If there is an Enhancements tab, make sure to select “Disable all enhancements.” In the Advanced tab, select two channel 16-bit 44100 Hz (CD quality) as the default format. If you see checkboxes that say “allow applications to take exclusive control…” and/or “give exclusive mode applications priority,” check them both.
For Windows Vista, follow the instructions here. For Windows 7, follow the instructions here.
Can I rerecord a sentence if I make a mistake or think I could do a better job?
Yes. When logged into the web recorder, use the Listen menu > Recordings dialog to get a list of all the sentences you’ve recorded, select the sentence you want to redo from the dropdown list, and click Rerecord. The interface will be reset to that sentence and you can redo it. Note that after doing this, you may need to use the Listen menu or Forward button to move back to where you were in the list before deciding to rerecord a sentence.
If you are using MTVR, simply click on any sentence in the list of sentences and you can redo it.
After I finish all sentences can I go back and redo a few?
Yes. Using MTVR on a Windows system, you can jump to any sentence, listen to your recording and rerecord it. When using the web recorder, as long as you have not finalized the inventory you can review all your recorded sentences using the Listen menu and redo any sentence that does not sound right. Once you have reached the end of an inventory, or if you decide you cannot finish and request a voice before reaching the end, a special Finalize button will appear. After you have clicked Finalize, it will no longer be possible to do any recording.
Can I create custom sentences, for instance, with names of family and friends?
One way to make sure that the names of people and places that are important to you are correctly pronounced is to record them in custom sentences. We have added a Custom Inventory Tool to our web recorder that will allow you to do this. You access the tool from the voice recorder Session > Custom Inventory menu. In the Custom Inventory Tool, you may enter names such as the names of people, places, or things that are important to you. You may also enter whole phrases or sentences that you would like to record. Sentences you record will usually sound just like your recorded speech when you later try to synthesize them with your ModelTalker voice. Each of the words you enter will be embedded in four different sentences for you to record. So be aware that if you enter, for example, 20 words, you will have 80 sentences to record with those words.
You can also enter custom material if you are recording with MTVR. The instructions are located in the MTVR Help document. Start MTVR and select Help > Contents, then click the [+] next to “Recording your voice”. The next-to-last item in the list that appears describes how to enter custom sentences in MTVR.
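To plan how much extra recording the web recorder's Custom Inventory Tool will add, the four-sentences-per-word rule can be written out as a tiny Python sketch. The function name is hypothetical, and the sketch assumes the rule applies to individual words only; the answer does not say how whole phrases or sentences are counted.

```python
SENTENCES_PER_WORD = 4  # each entered word is embedded in four different sentences


def custom_sentence_count(word_count: int) -> int:
    """Estimate how many custom sentences must be recorded for the entered words."""
    if word_count < 0:
        raise ValueError("word_count must not be negative")
    return word_count * SENTENCES_PER_WORD


print(custom_sentence_count(20))  # 20 words -> 80 sentences
```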
How can I improve the quality of my synthetic voice?
When Starting Out:
While doing the recordings the four things that will lead to the best quality for your synthetic voice are:
- Consistency
- Consistency
- Consistency
- Audio recording quality
To elaborate, your speech should be consistent in vocal effort (loudness), in speech rate, and in voice quality. For voice quality, think of the difference in the way your voice sounds when you are relaxed and speaking softly versus when you are tense or angry, or happy and excited. You might not be speaking any louder, but your speech may have a different quality; your pitch might be a bit higher and your voice less breathy when excited. Synthetic voices tend to come out sounding better when the speaker sounds relaxed and not tense, but even more importantly, try to use exactly the same voice quality throughout the recording process. Consistency also helps with (a) microphone position, (b) the time of day when you record, and (c) what you have been eating or drinking just before recording.
After Finishing Your Voice:
Unless there were serious problems with some of the recordings you made, probably the best way to improve the quality of an existing synthetic voice is to add additional speech to it. If you would like to do this, please let us know and we can give you some instructions on adding new speech to an existing inventory then rebuilding your voice.
What if I can’t finish all 1600 sentences?
While it is best to record all of our full standard 1600-sentence inventory, that sometimes turns out to be too difficult. We will try to build a voice from as many recordings as you are able to complete. Our sentence material is ordered so that the most important material is recorded earliest. In studies we’ve run with these sentences, we have found the following to be a rough guideline to the tradeoff between the number of sentences recorded and the intelligibility of the resulting TTS voice.
- 200 sentences: Using only the first 200 sentences, it is possible to get a voice that will work some of the time, but it will not be generally usable for communication, particularly with strangers.
- 400 sentences: Voices made with the first 400 sentences can be usable, but there will still be many words that are mangled and hard or impossible to understand. The prosody (speech timing and intonation) will be quite robotic. This is the smallest number of sentences we recommend attempting to use as a real TTS voice.
- 800 sentences: With 800 sentences recorded, the synthetic voice will be approaching its maximum intelligibility. That is, recording more sentences will probably only slightly improve the intelligibility of the voice. However, speech prosody will still be awkward and frequently sound incorrect. For example, questions are more likely to sound like statements, or statements to sound like questions, because the intonation is not appropriate.
- 1600 sentences: As you go from 800 to 1600 sentences, the majority of the changes in voice quality will be changes in the naturalness of the speech. Sentences will more frequently sound like they have the correct rhythm and intonation. Effects like the way we indicate phrase and sentence boundaries will more often be correct.
Note that studies we’ve conducted to determine these guidelines were run with voices created from speech recorded under studio conditions by American English speaking voice talent. For speakers of other English dialects, speech recorded under less than ideal audio conditions, and speech recorded by talkers who are dysarthric or less able to produce exactly the correct sentences with consistent speaking rate and style, these break points are likely to be optimistic. Your experience may differ considerably.
I installed my voice, now what?
ModelTalker is a Text to Speech or TTS voice, not a communication device. On Windows systems, we do provide an app called ModelTalker2 that you can use to test your voice and adjust settings, but that is not the case for MacOS (i.e., Mac laptop and desktop systems), Android (non-Apple smartphones and tablets), or iOS (iPhones and iPads). For these other operating systems and devices, you may be able to play a short sentence within the system settings where the system default voice is selected, but to make real use of your voice, you will want to find an AAC app or an AAC device specifically designed for communication. Note that most special purpose AAC devices are actually Windows or Android tablets that have been specifically tailored for use as communication devices. We recommend that you speak with an AAC clinical specialist, Speech Language Pathologist, or Speech Language Therapist to get assistance in finding the best device or app for your needs.
How can I use the manual calibration tool in MTVR?
Instead of doing the standard calibration, choose Manual Calibration and open the Calibration dialog box. Then:
- Make sure the correct mic is selected in the lower left.
- Start speaking at your normal comfortable level and use the slider in the right side of the dialog to increase the amplitude as far as possible without seeing any clipping.
- While remaining quiet, next click the Measure button in the left side. It will stop automatically after recording some silence and the silence level in dB will be updated in the box next to the button.
- Add 6 to the value in the silence level and write that into the box above it marked Auto Trim Threshold. Important note: these are negative numbers so if the silence level is -60, adding 6 will give you -54, which is what should go in the Auto Trim Threshold.
- Set the Pitch. Reasonable values to enter here range from 100 (for a low-pitched adult male voice) to 180 (low-pitched adult female) to 220 (high-pitched adult female) to 260 (child).
- Click OK — you’re done.
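The Auto Trim Threshold step is simple arithmetic, but the negative numbers make it easy to get backwards. The following Python sketch shows the calculation from the steps above; the function name is hypothetical and the code is only an illustration, not something MTVR runs.

```python
TRIM_MARGIN_DB = 6.0  # the steps say to add 6 to the measured silence level


def auto_trim_threshold(silence_level_db: float) -> float:
    """Return the value to enter as Auto Trim Threshold for a measured silence level.

    Silence levels are negative, so adding 6 moves the value toward zero:
    a silence level of -60 gives a threshold of -54.
    """
    return silence_level_db + TRIM_MARGIN_DB


print(auto_trim_threshold(-60.0))  # -54.0
```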
The Measure button in the web recorder Settings is not active! How can I fix this?
Because we save separate settings information for each inventory you are asked to record, the Measure button only becomes active when you have entered a valid Inventory name under Inventory in the Settings dialog. Our instructions always tell you what Inventory name to use. When you are trying to do the 10 screening sentences, the inventory to use is called “screen,” so you should type the word “screen” (without quotes) into the Inventory field. When you have finished the screening process and are ready to record a complete 1600-sentence inventory, we will tell you what other name to enter as the Inventory. Usually, but not always, we will tell you to use the inventory named “full” when doing the 1600-sentence inventory. So, for that, you would enter the word “full” as the inventory.
Note that you cannot make up your own inventory name — you must use one that we have set up for use in the system. If you are not sure what the correct inventory name is, please ask us.
I’m getting an error that says: Error capturing audio: message=null name=OverconstrainedError constraintName=undefined. What’s wrong?!
This is due to a change that Google introduced in a recent version of Chrome, and we are working on a fix. Until we have completed testing the fix, there is a work-around you can use:
- Log in to https://modeltalker.org/vrec, but do not fill in the Settings dialog.
- Click the movie camera icon in the right side of the Chrome address bar. That will bring up a popup box to allow Microphone controls.
- Click the Manage button in the bottom left of the popup.
- At the top of the resulting Chrome Settings page, click the microphone selector and choose the entry that says: “Default <your_microphone_name>”, where <your_microphone_name> is the name of your microphone, such as “Sennheiser PC 36 USB Headset”, possibly followed by some numbers in parentheses.
- Close the Chrome Settings page, log out of the online recorder, and log back in again.
Thereafter, it should be possible for you to measure the silence and proceed with recording.
Why is the system making me redo the screening inventory even though you said I can do the full inventory?
Sometimes we send users a special link that takes them directly to the online recorder and sets the Inventory name. For example, a link to do the screening inventory might have “?inv=screen” as the last part of the URL. When we do that, it locks the inventory name and it cannot be changed. We have seen cases where a browser might auto-complete the URL and include the ?inv=screen part even though you no longer want to do another screen inventory. The easiest way to fix this is to log in at the main page (https://modeltalker.org) and then use the Recording > Online recorder menu to get to the recorder. If your browser still insists on adding “?inv=screen” to the address, you may be able to delete that from the address bar.
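For anyone curious about what "deleting that from the address bar" amounts to, the Python sketch below removes the inv parameter from a URL while leaving the rest untouched. The function name is hypothetical, and the example address only combines the recorder URL and the “?inv=screen” suffix mentioned in this FAQ; the exact form of a real link may differ.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_inventory_parameter(url: str) -> str:
    """Return the URL with any 'inv' query parameter removed."""
    parts = urlsplit(url)
    # Keep every query parameter except the one that locks the inventory name.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != "inv"]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Illustrative address only.
print(drop_inventory_parameter("https://modeltalker.org/vrec?inv=screen"))
```

In practice, simply deleting everything from the question mark onward in the address bar has the same effect when inv is the only parameter.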
What are those numbers that show up in the Listen button of the web recorder?
As you add new recordings, the system occasionally builds examples of your synthetic voice for you to listen to. The number indicates how many of these example voices have been built for you to listen to and compare. New voices are built whenever we think you have added enough new material to make a perceptible difference in the voice quality. This amounts to about every 25 sentences at first, then less frequently as you go along. A new voice is built every 400 sentences for the last half of the inventory because after 800 sentences, it takes a lot of additional recording to make a difference you can easily hear. These example voices are built with default parameters that are not necessarily well-tuned to your speech, so the voice quality is probably not quite as good as the final voice we will build for you, but they will give you a reasonable sense of progress as you do the recording.
Where can I go to select my voice as the default Windows 10 voice?
Go to Settings > Ease of Access > Narrator > Personalize Narrator’s Voice > ModelTalker <your voice name>