Complexity of Language and ASR Challenges: How Cultural Variation Affects Transcription
Technical
In a previous blog post, Catherine talked about what we linguists do here at Voci. (If you haven’t read it, click here!) In this post, I’m going to expand on that and talk about some of the challenges we run into every day, challenges that are just part of how language works.
Regional and cultural variation is a big one. Even if two people speak the same language, where they are from can affect not only the words they use, but how they use the very same words. In some regions — New York, for example — overtalk, or talking at the same time as someone else, isn’t always seen as rude.
If you’re very engaged with what someone is saying, you might jump into the conversation before they’re finished speaking. That just means you’re interested! But in other parts of the United States, and the English-speaking world more generally, overtalk is often seen as very rude rather than as a sign of engagement.
So overtalk can have different implications in a conversation, depending on where the speaker is from.
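To make the idea of overtalk concrete, here is a minimal sketch of how overlapping speech could be flagged from speaker-labeled timestamps. It assumes each segment is a hypothetical (speaker, start, end) tuple in seconds; this is an illustration only, not how Voci's system represents or detects overtalk.

```python
from typing import List, Tuple

# Hypothetical segment shape: (speaker label, start in seconds, end in seconds).
Segment = Tuple[str, float, float]


def find_overtalk(segments: List[Segment]) -> List[Tuple[str, str, float, float]]:
    """Return (speaker_a, speaker_b, overlap_start, overlap_end) for every
    pair of segments from different speakers that overlap in time."""
    ordered = sorted(segments, key=lambda s: s[1])  # sort by start time
    overlaps = []
    for i, (spk_a, start_a, end_a) in enumerate(ordered):
        for spk_b, start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # every later segment starts after this one ends
            if spk_a != spk_b:
                overlaps.append((spk_a, spk_b, start_b, min(end_a, end_b)))
    return overlaps


segments = [("agent", 0.0, 4.2), ("caller", 3.5, 6.0), ("agent", 6.5, 8.0)]
print(find_overtalk(segments))
```

Note that the sketch only finds where two people talk at once. Whether that overlap signals rudeness or enthusiasm is exactly the part that depends on where the speakers are from.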
Another way that cultural and regional variation manifests itself is in sentiment and emotion. Sentiment and emotion are features that Voci’s ASR system includes in transcripts. This can create a challenge when speakers from different places express different emotions in the same way. Consider raising your voice. In some parts of the United States, you wouldn’t yell unless you were upset or angry, which would be a negative emotion. However, in other parts of the US, yelling can be a sign of excitement and engagement, which are positive!
A related issue is that of second (or third, or fourth...) languages. Features of the first language tend to influence how the second language is used. This happens not only in obvious ways, like word choice, but also in more subtle, nuanced ways, like the sounds you use when saying a word.
For example, consider Spanish and English. Spanish has a more limited set of vowels than English — about five for Caribbean Spanish, compared to about 12 for American English. One simple exercise to illustrate how a speaker’s native Spanish could impact English is the “America test”. This test highlights the effect of vowel reduction that occurs in some North American English varieties.
It’s pretty simple. Ask someone to say the word "America". If they say something like "Uh-meh-ruh-kuh" /əˈmɛrəkə/ (in other words, with the unstressed central vowel, technically called a “schwa”, or other reduced vowels throughout), then you’re probably dealing with someone from North America who speaks English natively.
This tendency towards "schwa", which is called vowel reduction, does not happen in Spanish. Spanish speakers produce mostly unreduced vowels in their native language, so each vowel sound is noticeably different from the others. As a result, they tend to pronounce the word "America" with markedly different vowels in each syllable: "ah-mEH-rih-kah" /aˈmɛrɪka/.
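As a rough illustration of the “America test”, the sketch below counts how many vowels in an IPA transcription are reduced. The vowel inventories and the function name are assumptions made for this example, each vowel symbol is counted separately (so a diphthong counts as two vowels), and a real system would work from audio rather than from hand-written transcriptions.

```python
# Assumed symbol sets for illustration; not a complete IPA inventory.
REDUCED_VOWELS = {"ə", "ɨ", "ɚ"}
FULL_VOWELS = set("aeiouɛɪɔʊæɑʌ")


def reduced_vowel_ratio(ipa: str) -> float:
    """Return the share of vowel symbols in `ipa` that are reduced vowels."""
    vowels = [ch for ch in ipa if ch in REDUCED_VOWELS or ch in FULL_VOWELS]
    if not vowels:
        return 0.0
    reduced = sum(1 for ch in vowels if ch in REDUCED_VOWELS)
    return reduced / len(vowels)


# The two pronunciations of "America" discussed above.
print(reduced_vowel_ratio("əˈmɛrəkə"))  # three of four vowels are schwa
print(reduced_vowel_ratio("aˈmɛrɪka"))  # no reduced vowels
```

A high ratio is consistent with the North American pattern described above and a ratio of zero with the unreduced pattern, though a single word is far too little evidence to say anything firm about a speaker.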
And that’s just in North America. When you include other parts of the English-speaking world, like Australia and the UK, it gets even more complex.
As you can see, analyzing language accurately is not an easy task. It can make our jobs a challenge, but it’s also what makes them fun!