Voci is the ASR Engine for
Contact Center Solutions
Built specifically for the scale of the contact center, Voci’s modern, future-proof ASR engine rewrites the rules of what's possible for contact center solutions.
Lightning Fast
Industry-leading speed, efficiency, and time to results.
Highly Accurate
Leading accuracy out of the box, and tunable for any business or industry.
Open and Flexible
Numerous native integrations, and compatible with virtually any tech stack.
Smart Transcription
Auto-formatting, speaker separation, gender, emotion, sentiment, and more.
Safe and Secure
PCI DSS-compatible automatic redaction of sensitive information.
Deployment Options
Both in-cloud and on-premises deployments available.
One Giant Leap Forward for Your Solution
Leading Accuracy
Data-rich Transcripts
Low Infrastructure Costs
Under the Hood:
- State-of-the-art NVIDIA® GPUs that increase performance and speech-to-text (STT) conversion speed
- Optimized cloud solutions powered by AWS
- Latest-generation Intel® Xeon® or AMD processors
- High-speed DDR4 SDRAM for high-bandwidth data transfers
- Containerized machines for VMware, AWS AMIs, and others
Transcription Features:
Auto Punctuation
Adds automatic punctuation and capitalization in the output transcription file.
Number Formatting
Formats spoken numbers into a human-readable numeric format (e.g., "twelve thirty" becomes "12:30"), localized for the given language.
Transcoding
Allows users to upload most audio formats directly into the engine.
Callbacks
Callbacks enable another application to receive and directly interact with the produced transcripts, allowing for automated production workflows for speech transcription.
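As a rough sketch of the receiving side of a callback, the handler below accepts a transcript posted as JSON and passes it to a downstream step. The payload fields (`call_id`, `utterances`, `text`), the helper names, and the port are hypothetical placeholders for illustration, not Voci's actual API schema.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_transcript(payload: dict) -> str:
    """Join utterance texts from a (hypothetical) transcript payload."""
    utterances = payload.get("utterances", [])
    return " ".join(u.get("text", "") for u in utterances).strip()


class TranscriptCallbackHandler(BaseHTTPRequestHandler):
    """Minimal receiver for transcripts POSTed to a callback URL."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject anything that is not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Hand the transcript to downstream workflow steps here.
        print(payload.get("call_id"), summarize_transcript(payload))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Start the receiver; the port is an arbitrary choice for this sketch."""
    HTTPServer(("", port), TranscriptCallbackHandler).serve_forever()
```

Calling `serve()` starts the listener; in a real deployment the callback URL registered with the engine would point at this endpoint.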
Speaker Separation [Diarization]
Automatic speaker separation of customer and agent voices when both are recorded on one channel, enabling their utterances to be analyzed independently.
Acoustic Emotion
Classifies emotion and tracks its trend over time, based on acoustic features, for a given call, utterance, or audio file.
Emotional Intelligence
Uses a combination of acoustic emotion and text-based sentiment scores to determine if a given utterance is Positive, Improving, Neutral, Worsening, or Negative.
Sentiment Analysis
Classifies sentiment based on the text of the call or utterance as negative, mostly negative, neutral, mostly positive, or positive.
Confidence Scores
Scores words, utterances, and calls with the system's confidence in the transcription results.
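One common use of word-level confidence is to flag words for human review. The sketch below assumes each word arrives as a record with hypothetical `word` and `confidence` fields, with confidence on a 0 to 1 scale; the threshold and the use of a simple mean for the utterance score are illustrative assumptions, not the engine's own scoring method.

```python
from statistics import mean


def low_confidence_words(words, threshold=0.6):
    """Return the words whose confidence falls below the threshold."""
    return [w["word"] for w in words if w["confidence"] < threshold]


def utterance_confidence(words):
    """Average word confidence; an assumed way to roll up to utterance level."""
    return mean(w["confidence"] for w in words) if words else 0.0


words = [
    {"word": "refund", "confidence": 0.95},
    {"word": "pleas", "confidence": 0.40},
]
flagged = low_confidence_words(words)  # words to route for review
```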
Language Identification
Automatically predicts and tags the language being spoken, and uses the corresponding language model for the duration of the call or audio file.
Age ID
Acoustic AI model that estimates the age of a given speaker.
Agent ID
Predictive model that identifies which audio channel (speaker) is the agent rather than the customer.
Music Detection
Classifies whether a given utterance is music, so that music and hold time are not sent to the engine for transcription.
Silence and Overtalk
Percentage of overtalk that occurred in a given call or audio file.
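To make the metric concrete, here is a minimal sketch of one way an overtalk percentage could be computed from speaker-separated segments. It assumes each speaker's speech is given as `(start, end)` pairs in seconds that do not overlap within the same speaker, and that the percentage is taken relative to total call duration; both the input shape and the choice of denominator are assumptions, since the exact definition is not stated here.

```python
def overlap_seconds(a, b):
    """Total time during which a segment from a and a segment from b overlap.

    Both lists must be sorted by start time and non-overlapping internally.
    """
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever segment finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overtalk_percentage(agent, customer, call_duration):
    """Share of the call (in percent) where both speakers talk at once."""
    if call_duration <= 0:
        return 0.0
    overlap = overlap_seconds(sorted(agent), sorted(customer))
    return 100.0 * overlap / call_duration


agent = [(0, 5), (10, 15)]
customer = [(4, 11)]
pct = overtalk_percentage(agent, customer, call_duration=20)  # 2 s of overlap in a 20 s call
```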
Credit Card Detection
Adds a tag to the transcript marking numbers predicted to be credit card numbers (even if they have been redacted).
Call Analysis
Call metrics covering the total number of words spoken, speaker turns, speech time, and number of substitutions, per call and per speaker.
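As an illustration of how such metrics can be derived from a speaker-separated transcript, the sketch below counts words, speaker turns, and speech time per speaker and per call. The utterance fields (`speaker`, `start`, `end`, `text`) are hypothetical, a turn is assumed to begin whenever the speaker changes, and substitutions are left out because their definition is not given here.

```python
from collections import defaultdict


def call_metrics(utterances):
    """Compute word count, turns, and speech time per speaker and per call."""
    per_speaker = defaultdict(
        lambda: {"words": 0, "turns": 0, "speech_time": 0.0}
    )
    previous_speaker = None
    for u in sorted(utterances, key=lambda u: u["start"]):
        stats = per_speaker[u["speaker"]]
        stats["words"] += len(u["text"].split())
        stats["speech_time"] += u["end"] - u["start"]
        # A new turn starts whenever the speaker changes.
        if u["speaker"] != previous_speaker:
            stats["turns"] += 1
            previous_speaker = u["speaker"]
    totals = {
        key: sum(s[key] for s in per_speaker.values())
        for key in ("words", "turns", "speech_time")
    }
    return {"call": totals, "speakers": dict(per_speaker)}


utterances = [
    {"speaker": "agent", "start": 0.0, "end": 2.0, "text": "hello how can I help"},
    {"speaker": "customer", "start": 2.0, "end": 5.0, "text": "I need a refund"},
    {"speaker": "customer", "start": 5.0, "end": 6.0, "text": "please"},
    {"speaker": "agent", "start": 6.0, "end": 8.0, "text": "sure thing"},
]
metrics = call_metrics(utterances)
```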
Ready to see the world’s most efficient ASR in action?
Access our API or request a demo now.