Checking Automatic Transcriptions and Not Getting Coffee: On Being a Language Intern
Developing an automatic speech recognition system, as we do here at Voci, takes a lot of different types of contributions, from experts in both language and technology. We work on the linguistics side, as language interns. And, while we may not be experts (yet!), we’re still an important part of the process, helping build language models from the ground up.
The three of us — Annika Puskar, Marleigh Bickel and John Starr — are linguistics students at the University of Pittsburgh, currently working on our second internship at Voci. We originally came to Voci through Pitt’s Linguistics Internship Program, run by Dr. Abdesalam Soudi. We enjoyed it the first time, so we jumped at the opportunity to come back for the summer!
Voci’s Wayne Ramprashad (left) and John Kominek (right) receiving the 2019 Outstanding Industry Partner Award from Dr. Soudi
There are a few reasons we came back. For all of us, this is basically our first “corporate” or “business” job, where you have to be at a particular place for 40 hours a week, Monday to Friday. So, it’s good to be in an environment where everyone is working hard, but is also friendly and approachable.
Also, it’s a great opportunity to explore what career paths are available after we’ve finished our degrees. You don’t always get a clear sense of what a linguistics degree can do when you’re in the middle of classes, exams, and everything else. Here, we get the chance to see our training and skills put to use in building a product that helps other businesses.
And finally, sometimes interns aren’t seen as valuable members of the team. We can be asked to fetch coffee or make copies, and never really learn what the business does and how we can fit into the business world. At Voci, we are treated with respect. We do the same work as other members of the linguistics team, even though they have more years of experience and a lot more training.
Primarily, our work involves annotating linguistic data sets. We aren’t just doing this in English. John and Annika are involved in research work on Spanish language data, and Marleigh has worked on German, with a focus on number rules. Most of what we do focuses on labelling sentiment and emotion, listening to audio recordings and annotating the transcripts appropriately.
We have pretty strict guidelines to follow for labelling sentiment. We also review our annotations together, to make sure that we’re applying the rules consistently. When it comes to emotion, however, the labelling is based on our pragmatic knowledge as native speakers of English. We use that to identify acoustic properties, such as pitch, intonation, cadence, and volume, that indicate the presence of strong positive or negative emotion.
Since we are preparing data that will be used for training software, we need to make sure that it is labelled as accurately and consistently as possible. Each transcript is annotated individually by multiple people. John wrote software to consolidate and programmatically compare everyone’s annotations. We then go through a group review process to verify and validate the quality of the annotations. It’s a much more involved process than we had anticipated!
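To give a rough idea of what consolidating and comparing annotations can look like, here is a minimal Python sketch. It is not John’s actual tool. The annotator names, utterance IDs, label set, and function names are all invented for illustration. It assumes each annotator’s work can be reduced to one label per utterance.

```python
from collections import Counter


def consolidate(annotations):
    """Merge per-annotator labels into one record per utterance.

    `annotations` maps an annotator name to {utterance_id: label}.
    (This input shape is an assumption made for the example.)
    """
    utterance_ids = sorted({uid for labels in annotations.values() for uid in labels})
    merged = {}
    for uid in utterance_ids:
        # Collect every label given to this utterance, keyed by annotator.
        labels = {name: labs[uid] for name, labs in annotations.items() if uid in labs}
        counts = Counter(labels.values()).most_common()
        top_label, top_count = counts[0]
        # A consensus label exists only if the top label strictly beats the runner-up.
        tied = len(counts) > 1 and counts[1][1] == top_count
        merged[uid] = {
            "labels": labels,
            "consensus": None if tied else top_label,
            "agreed": len(counts) == 1,  # True when everyone chose the same label
        }
    return merged


def needs_review(merged):
    """Return the utterance IDs on which annotators disagreed."""
    return [uid for uid, record in merged.items() if not record["agreed"]]


def percent_agreement(merged):
    """Return the fraction of utterances with unanimous labels."""
    if not merged:
        return 0.0
    return sum(record["agreed"] for record in merged.values()) / len(merged)


# Made-up example data: three annotators, three utterances.
example = {
    "annotator_1": {"u1": "positive", "u2": "negative", "u3": "neutral"},
    "annotator_2": {"u1": "positive", "u2": "neutral", "u3": "neutral"},
    "annotator_3": {"u1": "positive", "u2": "negative", "u3": "negative"},
}

merged = consolidate(example)
review_queue = needs_review(merged)
agreement = percent_agreement(merged)
```

In a setup like this, the utterances in `review_queue` would be the ones to bring to the group review, while a plain agreement rate gives a quick sense of how consistently the guidelines are being applied.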
It can be a challenge to be here! Going from university classes to staying focused at a desk for 40 hours a week is a transition. But seeing that we’re making a contribution to Voci’s software really helps — and some of the phone calls can be really entertaining. (Of course, we can’t share details!)
When projects just aren’t working the way we want them to, it can be frustrating. But the experience of working through it, and getting to the result we need, is really rewarding. Sometimes, when you’re in a classroom, you can lose sight of why you’re here and what you’re trying to accomplish. Seeing it all work in the business world makes it more tangible and concrete.