The traditional call centre QA process works like this: a team of quality analysts manually listens to a random sample of recorded calls -- typically 1-3% of total volume -- scores them against a rubric, and provides feedback to agents days or weeks after the interaction occurred. Everyone involved knows this is inadequate. A 2% sample means 98% of customer interactions receive zero quality oversight. Feedback that arrives two weeks late is too disconnected from the original interaction to drive meaningful behavioural change.
Large language models have fundamentally changed this equation. With modern speech-to-text pipelines and LLM-powered analysis, it is now possible -- and economically viable -- to analyse 100% of customer interactions in near real time. This is not an incremental improvement. It is a category shift in how contact centres operate.
The Technical Pipeline
Understanding the technology stack behind AI-powered call analytics is important for evaluating its capabilities and limitations.
The stack layers -- speech-to-text, LLM analysis, and integration -- each require distinct engineering attention.
Speech-to-Text
The first step is converting audio to text. Modern speech-to-text engines (Whisper, Deepgram, AssemblyAI, Google Speech-to-Text) achieve word error rates below 5% for clear audio in supported languages, with speaker diarisation (identifying who said what) as a standard feature. For South African contact centres, multilingual support is critical -- agents and customers frequently switch between English, Afrikaans, Zulu, and other languages within a single call. The best engines now handle this code-switching reasonably well, though accuracy does degrade compared to monolingual conversations.
At Pepla, our Voice AI platform processes audio through a pipeline that includes noise reduction, speaker diarisation, language detection, and transcription. The output is a structured transcript with speaker labels, timestamps, and confidence scores for each segment.
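As an illustration, a structured transcript of this kind might look like the following sketch. The field names and flagging threshold here are invented for the example, not Pepla's actual schema:

```python
from dataclasses import dataclass

@dataclass
class TranscriptSegment:
    """One diarised utterance from the transcription pipeline."""
    speaker: str        # e.g. "agent" or "customer"
    start: float        # seconds from call start
    end: float
    text: str
    language: str       # detected language code, e.g. "en", "af", "zu"
    confidence: float   # engine confidence for this segment, 0.0 to 1.0

def low_confidence_segments(transcript, threshold=0.7):
    """Flag segments the engine was unsure about, e.g. for human review."""
    return [s for s in transcript if s.confidence < threshold]

transcript = [
    TranscriptSegment("agent", 0.0, 4.2, "Goeie môre, how can I help?", "af", 0.91),
    TranscriptSegment("customer", 4.5, 9.0, "My account is blocked.", "en", 0.62),
]
flagged = low_confidence_segments(transcript)  # → the customer segment only
```

Carrying per-segment confidence scores downstream is what lets later stages discount or re-check analysis built on shaky transcription.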
LLM Analysis
Once you have a transcript, the LLM does the heavy lifting. A single pass through a well-prompted model can extract multiple dimensions of analysis simultaneously: compliance adherence, sentiment trajectory, topic classification, resolution status, agent performance metrics, and identified issues. The key architectural decision is whether to run multiple focused prompts (one for compliance, one for sentiment, one for quality) or a single comprehensive prompt. We have found that a hybrid approach works best -- a comprehensive initial analysis followed by targeted deep-dives on flagged interactions.
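The routing half of that hybrid approach can be sketched in a few lines. The first-pass keys and thresholds below are illustrative placeholders, not production values:

```python
def needs_deep_dive(first_pass):
    """Decide whether a call warrants targeted follow-up prompts after the
    comprehensive first pass. Keys and thresholds here are illustrative."""
    reasons = []
    if first_pass.get("compliance_risk", 0.0) > 0.5:
        reasons.append("compliance")
    if first_pass.get("sentiment_min", 0.0) < -0.4:
        reasons.append("sentiment")
    if not first_pass.get("resolved", True):
        reasons.append("resolution")
    return reasons

# Hypothetical first-pass result for one call:
first_pass = {"compliance_risk": 0.8, "sentiment_min": -0.1, "resolved": True}
print(needs_deep_dive(first_pass))  # → ['compliance']
```

The economic point is that the expensive, focused prompts only run on the minority of calls the cheap comprehensive pass flags.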
Integration Layer
Raw analysis is useless without integration into existing operational systems. Results need to flow into workforce management platforms, CRM systems, coaching tools, and management dashboards. This integration layer is often the most engineering-intensive part of the system, requiring careful attention to data formats, API rate limits, and real-time versus batch processing decisions.
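For instance, pushing results into a CRM usually means respecting that API's documented batch-size and rate quotas. A small planning sketch, with hypothetical limits:

```python
import math

def plan_batches(records, max_batch_size=100, max_batches_per_minute=30):
    """Split analysis results into batches that respect a downstream API's
    limits (the sizes here are illustrative -- check your CRM's documented
    quotas). Returns (batches, minutes_needed)."""
    batches = [records[i:i + max_batch_size]
               for i in range(0, len(records), max_batch_size)]
    minutes = math.ceil(len(batches) / max_batches_per_minute) if batches else 0
    return batches, minutes

batches, minutes = plan_batches(list(range(250)), max_batch_size=100)
# 250 records → 3 batches (100, 100, 50); well under 30 batches/minute
```

The same planning logic is where the real-time versus batch decision surfaces: a nightly batch can tolerate a long drain time, a near real-time feed cannot.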
QA Automation: From Sampling to Census
The shift from sampling 2% of calls to analysing 100% of calls has implications that go beyond simply having more data.
When you analyse every interaction, you stop looking for problems in a sample and start seeing patterns across the entire operation. The questions you can answer change fundamentally.
With sampling, you can answer: "Is this agent generally performing well based on a handful of calls?" With census-level analysis, you can answer: "What specific types of calls does this agent handle exceptionally well, and where do they consistently struggle? Which product issues generate the most customer frustration? Which scripts are associated with the highest resolution rates? At what time of day does service quality degrade, and why?"
Practical examples from production deployments:
- Compliance coverage goes to 100%. In financial services, every call must include certain disclosures. With manual QA, compliance violations in the unsampled 98% go undetected until a customer complaint or regulatory audit surfaces them. Automated analysis catches every violation in real time.
- Emerging issues surface in hours, not weeks. When a product defect starts generating customer calls, the pattern appears in the analytics within hours. A manual QA team might take weeks to notice the same trend in its 2% sample.
- Agent coaching becomes specific and timely. Instead of generic feedback based on a few calls, supervisors can provide targeted coaching based on patterns across hundreds of interactions. "You handle billing queries well, but your average handle time on technical support calls is 40% above the team average -- let us look at why."
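Census-level questions like these reduce to straightforward aggregations once every call carries structured analysis. A minimal sketch, with invented record fields:

```python
from collections import defaultdict
from statistics import mean

def handle_time_by_agent_and_type(calls):
    """calls: dicts with 'agent', 'call_type', 'handle_time' (seconds).
    Returns per-(agent, call_type) averages plus the team average per call
    type -- the kind of question a 2% sample cannot answer reliably."""
    per_agent = defaultdict(list)
    per_type = defaultdict(list)
    for c in calls:
        per_agent[(c["agent"], c["call_type"])].append(c["handle_time"])
        per_type[c["call_type"]].append(c["handle_time"])
    agent_avg = {k: mean(v) for k, v in per_agent.items()}
    team_avg = {k: mean(v) for k, v in per_type.items()}
    return agent_avg, team_avg

calls = [
    {"agent": "thandi", "call_type": "billing", "handle_time": 300},
    {"agent": "thandi", "call_type": "tech", "handle_time": 900},
    {"agent": "sipho", "call_type": "tech", "handle_time": 500},
]
agent_avg, team_avg = handle_time_by_agent_and_type(calls)
# thandi's tech average (900s) vs the team tech average (700s)
```

In production the same aggregation would run over hundreds of thousands of analysed calls, sliced by time of day, script version, or product line.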
Real-Time Sentiment Analysis
Sentiment analysis in contact centres is not new. What is new is the granularity and accuracy that LLMs bring to it. Traditional keyword-based sentiment analysis was crude -- it might catch that a customer used the word "angry" but miss sarcasm, frustration expressed through questions, or the subtle shift from patience to irritation that happens mid-call.
LLMs understand context. They can track sentiment as a trajectory over the course of a conversation, identifying the precise moment a customer's mood shifts and what triggered the change. Was it a long hold time? An unhelpful response? A policy the customer perceives as unfair? This level of detail is actionable in ways that a simple positive/negative classification is not.
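Finding the trigger moment can be as simple as scanning a per-segment sentiment trajectory produced by the model. A sketch, assuming one score in [-1, 1] per transcript segment:

```python
def sharpest_sentiment_drop(trajectory):
    """trajectory: list of (timestamp_seconds, score) pairs in call order.
    Returns (timestamp, drop) for the steepest decline between consecutive
    segments -- a candidate 'trigger moment' to inspect in the transcript."""
    worst_t, worst_drop = None, 0.0
    for (t0, s0), (t1, s1) in zip(trajectory, trajectory[1:]):
        drop = s0 - s1
        if drop > worst_drop:
            worst_t, worst_drop = t1, drop
    return worst_t, worst_drop

trajectory = [(10, 0.3), (45, 0.2), (90, -0.5), (120, -0.4)]
t, drop = sharpest_sentiment_drop(trajectory)
# steepest drop lands at the 90-second segment -- pull that part of the
# transcript to see what the agent said just before it
```

Pairing the timestamp with the transcript segment is what turns a sentiment curve into a coachable moment.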
Real-Time Applications
When sentiment analysis runs in real time (on the live call, not the recording), it enables intervention before a situation escalates. A supervisor dashboard can flag calls where sentiment is deteriorating rapidly, allowing a senior agent to join or take over the call. Some systems provide real-time prompts to the agent: suggested responses, relevant knowledge base articles, or de-escalation techniques tailored to the specific scenario.
The latency requirements for real-time analysis are significant. The pipeline -- audio capture, transcription, analysis, and delivery -- needs to complete in under 10-15 seconds to be useful for in-call intervention. This typically requires streaming transcription (processing audio as it arrives, not waiting for the call to end) and lightweight, fast-inference models for the real-time layer, with deeper analysis happening asynchronously after the call.
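The windowing step of such a streaming pipeline can be sketched as follows: incremental transcript chunks are grouped into small analysis windows so the lightweight model works on increments instead of waiting for the call to end. The 10-second window is an illustrative budget, not a recommendation:

```python
def window_stream(chunks, window_seconds=10.0):
    """Group streaming transcript chunks into analysis windows.

    chunks: iterable of (end_timestamp_seconds, text) in arrival order.
    Yields the joined text each time ~window_seconds of audio has
    accumulated; the fast in-call model analyses each window as it closes.
    """
    buffer, window_start = [], 0.0
    for end_ts, text in chunks:
        buffer.append(text)
        if end_ts - window_start >= window_seconds:
            yield " ".join(buffer)
            buffer, window_start = [], end_ts
    if buffer:  # flush any trailing partial window at call end
        yield " ".join(buffer)

chunks = [(3, "hi"), (7, "my account"), (11, "is blocked"),
          (18, "very frustrated"), (22, "please help")]
windows = list(window_stream(chunks))
# → ["hi my account is blocked", "very frustrated please help"]
```

The deeper asynchronous pass then re-analyses the full transcript after the call, so nothing hinges on the real-time layer being exhaustive.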
Compliance Checking
For regulated industries -- financial services, insurance, healthcare, telecommunications -- compliance is not optional. Agents must follow prescribed scripts, make required disclosures, obtain proper consent, and avoid certain types of statements. The consequences of non-compliance range from fines to licence revocation.
LLM-based compliance checking is remarkably effective because compliance rules can be expressed as natural language instructions that the model evaluates against the transcript. "Did the agent inform the customer that the call is being recorded? Did the agent verify the customer's identity before discussing account details? Did the agent read the required terms and conditions disclosure before processing the application?"
These are not simple keyword matches. The model understands paraphrasing ("This call may be recorded for quality purposes" is equivalent to "I should let you know we record our calls"), handles interruptions (the disclosure was started but the customer interrupted and it was never completed), and evaluates completeness (the agent mentioned three of four required risk factors).
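One way to implement this is to put the rules directly in the prompt and require a structured JSON verdict per rule. A sketch -- the rules, prompt wording, and simulated model reply below are all illustrative:

```python
import json

# Illustrative rule set; real rules come from your compliance team.
RULES = [
    "Did the agent inform the customer that the call is being recorded?",
    "Did the agent verify the customer's identity before discussing account details?",
]

def build_compliance_prompt(rules, transcript_text):
    """Compose a prompt that asks the model to evaluate each rule against
    the transcript and reply in strict JSON. Because the model reads the
    rule as natural language, paraphrased disclosures still count."""
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(rules))
    return (
        "Evaluate each compliance rule against the call transcript below.\n"
        f"Rules:\n{numbered}\n"
        'Reply with JSON: {"verdicts": [{"rule": <number>, "met": true|false, '
        '"evidence": "<quote>"}]}\n'
        f"Transcript:\n{transcript_text}"
    )

def parse_verdicts(raw_response, rule_count):
    """Parse and sanity-check the model's JSON reply."""
    data = json.loads(raw_response)
    verdicts = data["verdicts"]
    assert len(verdicts) == rule_count, "model must address every rule"
    return {v["rule"]: v["met"] for v in verdicts}

# Simulated model reply, for illustration only:
reply = ('{"verdicts": [{"rule": 1, "met": true, "evidence": "we record our '
         'calls"}, {"rule": 2, "met": false, "evidence": ""}]}')
violations = [r for r, met in parse_verdicts(reply, len(RULES)).items() if not met]
# → rule 2 flagged for review
```

Requiring quoted evidence alongside each verdict also gives reviewers a fast way to spot-check the model's judgement.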
Agent Coaching and Development
Perhaps the highest-value application of call centre AI is in agent development. Traditional coaching is limited by the supervisor's ability to listen to calls, which means most agents receive infrequent, surface-level feedback. AI-powered analysis enables a fundamentally different coaching model.
- Individualised development plans based on patterns across all of an agent's interactions, not a handful of randomly sampled calls.
- Skill-specific feedback with concrete examples: "Here is a call where your objection handling was excellent. Here is a similar call where you could have used the same technique but defaulted to escalation instead."
- Peer benchmarking that identifies top performers for specific call types and extracts the patterns that make them effective, creating a data-driven best practices library.
- Progress tracking that measures coaching effectiveness over time. Did the agent's first-call resolution rate improve after the coaching intervention? Did average handle time decrease for the targeted call type?
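The progress-tracking step above can be sketched as a simple before/after comparison. Field names and dates here are invented for the example:

```python
from statistics import mean

def coaching_effect(calls, agent, coaching_date):
    """Compare an agent's first-call-resolution rate before and after a
    coaching intervention. calls: dicts with 'agent', 'date' (ISO string),
    and 'resolved_first_call' (bool)."""
    before = [c["resolved_first_call"] for c in calls
              if c["agent"] == agent and c["date"] < coaching_date]
    after = [c["resolved_first_call"] for c in calls
             if c["agent"] == agent and c["date"] >= coaching_date]
    fcr = lambda xs: mean(map(float, xs)) if xs else None
    return fcr(before), fcr(after)

calls = [
    {"agent": "sipho", "date": "2024-03-01", "resolved_first_call": False},
    {"agent": "sipho", "date": "2024-03-05", "resolved_first_call": True},
    {"agent": "2024-04-02".join([]) or "sipho", "date": "2024-04-02", "resolved_first_call": True},
    {"agent": "sipho", "date": "2024-04-09", "resolved_first_call": True},
]
before, after = coaching_effect(calls, "sipho", "2024-04-01")
# FCR moved from 0.5 before the coaching date to 1.0 after
```

With census-level data the same comparison can be restricted to the targeted call type, which is what distinguishes measuring a coaching intervention from measuring noise.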
The shift is from punitive QA -- catching agents doing things wrong -- to developmental QA -- helping agents get better at what they do. AI makes this possible at scale.
Integration Patterns
For organisations considering implementing AI-powered call analytics, the integration architecture matters as much as the AI models themselves.
Batch Processing
The simplest pattern: recordings are uploaded after calls complete, transcribed, and analysed in batch. Results are available within hours, typically overnight. This is sufficient for QA reporting, trend analysis, and next-day coaching. It is the easiest to implement and the most cost-effective, as batch processing allows for optimised throughput.
Near Real-Time
Recordings are processed as they complete, with results available within minutes. This enables same-day coaching and rapid identification of emerging issues. It requires a more robust processing pipeline but does not require the low-latency infrastructure of true real-time systems.
Real-Time Streaming
Audio is processed as the call is happening, with analysis delivered to supervisors and agents during the conversation. This is the most technically demanding pattern, requiring streaming speech-to-text, fast inference, and a delivery mechanism (usually WebSocket-based) to push insights to agent desktops or supervisor dashboards. The reward is the ability to intervene during interactions rather than only learning from them afterwards.
Practical Considerations
Before diving into implementation, organisations should consider several practical factors.
- Data privacy. Call recordings contain personal information. Processing them through AI systems -- whether cloud-hosted or on-premise -- must comply with POPIA, GDPR, or relevant local regulations. Customer consent for recording and analysis must be clearly obtained.
- Audio quality. AI analysis is only as good as the audio it processes. Noisy environments, poor headset quality, or compression artefacts all degrade transcription accuracy. Invest in audio infrastructure before investing in AI analysis.
- Change management. Agents may perceive AI monitoring as surveillance. Framing the system as a development tool rather than a punishment mechanism is essential for adoption. Involve agents in the design process and share how the insights will be used.
- Cost modelling. Processing every call through an LLM has a per-call cost. For high-volume contact centres, this cost is significant. Model the economics carefully, considering that smaller models may be sufficient for initial screening, with detailed analysis reserved for flagged interactions.
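A two-tier cost model of the kind described can be sketched in a few lines. All prices and rates below are placeholders, not vendor quotes:

```python
def monthly_analysis_cost(calls_per_month, flag_rate,
                          screen_cost_per_call, deep_cost_per_call):
    """Every call gets a cheap screening pass; only flagged calls get the
    expensive detailed analysis. Substitute your provider's actual
    per-token or per-minute pricing."""
    screening = calls_per_month * screen_cost_per_call
    deep = calls_per_month * flag_rate * deep_cost_per_call
    return screening + deep

# Illustrative: 500k calls/month, 8% flagged, $0.002 screen, $0.03 deep dive
cost = monthly_analysis_cost(500_000, 0.08, 0.002, 0.03)
# screening $1,000 + deep dives $1,200 → $2,200/month
```

Running the same numbers with the detailed model on every call makes the value of the screening tier obvious: the bill scales with the flag rate, not the call volume alone.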
The transformation of call centres through LLM technology is not theoretical -- it is happening now, across industries and geographies. The organisations that move early gain operational advantages that compound over time: better agent performance, higher customer satisfaction, tighter compliance, and deeper operational visibility. The technology is mature enough for production. The question is no longer whether to adopt it, but how quickly and at what scale.




