Advice for recording

NOTE: New users looking for instructions for how to get started recording with ModelTalker should begin at this link.

This page gives general recommendations for recording your voice. We also have more specific recommendations and instructions, in checklist form, for Windows and Mac users. You may find it useful to print these checklists and review them before each recording session.

Microphone type and placement

We strongly recommend that you record with a USB headset microphone. A USB headset has three important advantages:
1) The headset maintains a constant distance and orientation between your mouth and the microphone. This is extremely important for consistency from one recording to the next.
2) Many computers, especially laptops, have less-than-optimal sound cards. A USB mic has its own built-in sound processing hardware and bypasses your computer's sound system.
3) A headset mic is less likely to pick up ambient noise.
Do not use a desktop or built-in microphone; the audio quality will not be acceptable.
Be sure the microphone is not directly in front of your mouth. Position it to the side of your mouth so that it does not pick up direct airflow from your mouth or nose. People sometimes don’t realize how sensitive these microphones are; you do not have to speak directly into one!

Location

You should record in a room where you can eliminate intrusive background noises. There should be no one speaking in the background and no phones that could ring, and the television should be turned off. Make sure the windows are closed and no appliances are running (e.g., fans, air conditioners, dishwashers, or washing machines).
In addition, even if the room you are recording in is shielded from outside noise, the room itself can cause your voice to reverberate off hard surfaces (rooms with vaulted ceilings, kitchens, bathrooms, hallways, and large empty rooms with hardwood, tile, or marble floors are notorious for this). Carpet helps a lot. If the floor is bare, placing blankets on it while you record will absorb some of the sound.

Speaking style/speech noises

As our technology has improved over the years, the amount of recording needed has decreased significantly, and our instructions for how to speak when recording have changed as well. Our latest “Gen3” inventory consists of about 300 sentences designed to elicit more expressive speech, which we can model with our new generative AI technology. We can create very realistic personal voices from fewer than half that many recordings, but the full 300 sentences let us capture more of each individual’s expressive speech qualities. We also continue to provide a standard voice-banking inventory of over 3000 sentences, intended mainly for creating older-style concatenative voices, as well as a custom inventory that lets clients record sentences of their own design, e.g., for message banking.

Gen3 inventory — Listen to the way each prompt is spoken and try to convey the same meaning, matching how the intonation varies and which words are emphasized.
Standard inventory — Speak as naturally and consistently as possible. Speak evenly, without overemphasizing any words for meaning or clarity. Do not try to make the phrases sound interesting or expressive, as you might when reading aloud to someone. It is also important NOT to pause between words; speak as smoothly as you would in normal conversation. Try to say the phrases just like the auditory prompt.
Custom sentences — We provide a form that lets you create your own sentences, such as those you might want to record for message banking. The custom form also lets you specify words, such as names of people or places or technical terms, for which you want to be sure we have an accurate pronunciation. You only need to add these words if they are unusual or often mispronounced.
Be careful not to take an audible breath after you have hit the Record button. Likewise, be careful not to exhale audibly when you’ve finished speaking. Remember, the microphone is sensitive, so a breath doesn’t have to be deep to be audible.
If your mouth gets dry and you start to make audible lip-smacking or mouth-opening sounds as you prepare to speak, try taking a sip of water before you record.
Our pronunciation meter is optimized for an American dialect of English, because the mapping from phoneme transcriptions to speech sounds differs among dialects. If your dialect is not General American English, pay less attention to the pronunciation score and just focus on speaking the words in the printed order without disfluencies.

Time needed to record

Gen3 — Recording all of the Gen3 sentences should take about 1 – 2 hours and can be completed in a single day.
Standard — Recording all of the standard inventory (3155 sentences), which is recommended for our best-quality concatenative voices, typically takes a strong speaker about 12 – 16 hours. We suggest spreading this over several days, a few hours at a time, ideally at the same time each day. People’s voices are usually strongest in the morning. If you feel you are getting tired, or if your voice begins to waver or become inconsistent, stop for the day and continue the next day.
Recording only the first 250 – 300 sentences of the Standard inventory should take about the same amount of time as the Gen3 inventory, but is less likely to capture some of the more expressive elements of your speech.
Custom — This depends entirely on how much custom material you would like to record.