Microphone type and placement
- We strongly recommend that you record with a USB headset microphone. A USB headset has three important advantages: 1) the headset maintains a constant distance and orientation between your mouth and the microphone, which is extremely important for consistency from one recording to the next; 2) many computers, especially laptops, have less-than-optimal sound cards, and a USB mic has its own built-in sound processing hardware that bypasses your computer’s sound system; and 3) a headset mic is less likely to pick up ambient noise. Do not use a desktop or built-in microphone; the audio quality will not be acceptable.
- Be sure the microphone is not directly in front of your mouth. Position it to the side of your mouth so that it does not pick up direct airflow from your mouth or nose. People often don’t realize how sensitive these microphones are; you do not have to speak directly into one!
- Record in a room where you can eliminate intrusive background noise. No one should be speaking in the background, there should be no phones that could ring, and the television should be off. Make sure the windows are closed and no appliances are running (e.g., fans, air conditioners, dishwashers, or washing machines).
- In addition, even if the room you are recording in is shielded from outside noise, the room itself can cause your voice to reverberate off hard surfaces. Rooms with vaulted ceilings, kitchens, bathrooms, hallways, and large empty rooms with hardwood, tile, or marble floors are notorious for this. Carpeting helps a great deal; if the floor is bare, placing blankets on it while you record will absorb some of the sound.
Speaking style/speech noises
- It is important to speak as naturally and consistently as possible. Speak evenly, without overemphasizing any words for meaning or clarity. Your tone of voice should not be very expressive; do not try to make the phrases sound interesting. It is also important NOT to pause between words. Speak as smoothly as you would in normal conversation, and try to say each phrase the way it is spoken in the auditory prompt.
- Avoid taking an audible breath after you have pressed record. Likewise, be careful not to exhale audibly when you have finished speaking. Remember, the microphone is sensitive, so a breath doesn’t have to be deep to be audible.
- If your mouth gets dry and you start to make audible lip-smacking or mouth-opening sounds as you prepare to speak, try taking a sip of water before you record.
- Please be advised that the ModelTalker speech synthesis system is optimized for American English. This is because the mapping from phonemic transcriptions to speech sounds differs among dialects. We have had mixed results with speakers of other accents in the past, so we cannot guarantee a good outcome.
Time needed to record
- The minimum number of sentences recommended for a hybrid DNN voice (400 sentences) can usually be recorded in a single session of one to two hours, including breaks.
- Recording the full standard inventory (3155 sentences) for our best-quality unit selection voice typically takes a strong speaker about 12 to 16 hours. We suggest spreading this over several days, a few hours at a time, ideally at the same time each day. Most people’s voices are strongest in the morning. If you feel you are getting tired, or if your voice begins to waver or become inconsistent, stop for the day and continue the next day.
- Recording an intermediate-sized inventory will, of course, take somewhere between these two extremes. How quickly individuals progress also varies widely, largely because of mistakes that require re-recording sentences.