Evaluating AI Applications for Customer Service
A comprehensive resource to guide customer service leaders as they incorporate AI into their departments.
The Challenge
You've been tasked with bringing AI into your Customer Service organization. First, you get a sense of which investments would be most impactful to customers and the business: increasing the number of conversations customers can resolve independently, or perhaps reducing the mean time to resolution. You've even thought about some of the solutions that could move those metrics: conversational chatbots, workflow automation, agent assist, or a "super tool."
But then it starts to get overwhelming. The number of vendors in the space is mind-boggling, and information that isn't sales-oriented, on Reddit or anywhere else, is nearly impossible to find. Crickets.
If that's you — this resource was written just for you. Here's what it covers:
  1. Gain insight into the qualities of Machine Learning and Generative AI-based features that are different from what you're used to.
  2. Learn about the unique factors to consider when evaluating AI applications for different customer service scenarios.
  3. Discover key questions to ask vendors when assessing AI solutions for your organization.
  4. Explore ways to gather feedback and evaluate AI features and products effectively.
What Makes AI Unique
Before diving in, it's helpful to understand what's different about Machine Learning (ML) and Generative AI-based features and how this may affect your evaluation.
Ethical Concerns
Both ML and Generative AI raise ethical considerations, such as bias and fairness. Responsible AI use requires organizations to consider how users can cause harm or be harmed, and to mitigate that harm.
Probabilistic
Generative AI responses are based on patterns found in the data they're trained on and presented with. This means the results they produce are not certain; they are based on probabilities. Use common sense and assess whether the results make sense rather than taking them as fact.
Explainability
Understanding the "why" behind an output is difficult because the technology identifies patterns by observing many data points; it isn't configured with rules or specific parameters that precisely control the output.
Adaptability
AI models deployed with adaptive learning techniques can leverage feedback to improve over time with new data.
Understanding these unique qualities can help you make informed decisions when selecting AI applications for your organization.
Use Case Specific Evaluation
How you evaluate a customer-facing chatbot will be different from how you evaluate a tool built to assist an internal team of agents. Each use case carries different risks and, therefore, different preparatory work, testing considerations, launch strategies, and performance measures.

Chatbots

Risk Level: Highest

Pre-Work
  • Refine the knowledge base
  • Align terminology and brand voice
  • Define a resource strategy for the chatbot administrator, QA, and knowledge management

Testing Strategy
  • Use a test bot, ideally one that can be exercised programmatically with a variety of customer personas and impressions

Launch Strategy
  • Canary launch by segment or traffic percentage
  • Increase traffic based on KPIs

Impact Assessment
  • Gauge the CX impact
  • Measure deflection, dissatisfaction, relevance, brand voice, and helpfulness
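The programmatic testing idea above can be sketched in a few lines. This is a minimal, hypothetical harness, not any vendor's API: `ask_bot` is a stand-in for a real test-bot call, and the personas and acceptance terms are illustrative.

```python
# Minimal sketch of programmatic chatbot testing with customer personas.
# `ask_bot` is a placeholder: in practice it would call your vendor's test-bot API.

def ask_bot(question: str) -> str:
    # Stand-in for a real chatbot call; returns canned answers for illustration.
    canned = {
        "password": "You can reset your password from Settings > Security.",
        "refund": "Refunds are issued within 5 business days of approval.",
    }
    for topic, answer in canned.items():
        if topic in question.lower():
            return answer
    return "I'm sorry, I don't know."

# Each persona pairs a question phrased in a particular style with the
# terms any acceptable answer must mention.
PERSONAS = [
    {"persona": "frustrated customer",
     "question": "I STILL can't reset my password!!",
     "must_mention": ["password"]},
    {"persona": "new user",
     "question": "hi, how do refunds work?",
     "must_mention": ["refund"]},
]

def grade(cases):
    """Run every persona question through the bot and grade the answers."""
    results = []
    for case in cases:
        answer = ask_bot(case["question"])
        passed = all(term in answer.lower() for term in case["must_mention"])
        results.append({"persona": case["persona"], "passed": passed, "answer": answer})
    return results

if __name__ == "__main__":
    for r in grade(PERSONAS):
        print(r["persona"], "->", "PASS" if r["passed"] else "FAIL")
```

A real suite would cover far more personas and phrasings, and grading would likely use human review or an auto-scoring feature rather than simple keyword checks, but the shape — question set in, graded transcript out — is the same.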

Agent Assist

Risk Level: Lower

Pre-Work
  • Refine SOPs and guidebooks to remove outdated content
  • Identify the data strategy and metadata needed to navigate content effectively
  • API access for actions and data enrichment from external sources
  • Resource strategy for an administrator, internal knowledge management, and integrations

Testing Strategy
  • Test with a small group of experts
  • Capture bad suggestions and harmful content that, if used, would negatively impact CX and should be disabled
  • Ensure brand voice consistency for suggested replies
  • Validate integrations

Launch Strategy
  • Train agents on new features and processes
  • Provide access to early adopters in each team to foster a group of internal champions
  • Launch broadly
  • Improve documentation and configuration based on errors and feedback

Impact Assessment
  • Measure average handle time
  • Quantify the harmfulness of suggestions and the error rate

Automation

Risk Level: Varies

Pre-Work
  • Develop a monitoring strategy
  • Separate high- and low-risk actions
  • Resource strategy for an administrator and integrations

Testing Strategy
  • Manual workflow testing
  • Programmatically test a wide range of scenarios
  • Grade results and identify failure scenarios and the adjustments needed

Launch Strategy
  • Context-specific launch based on risk to the CX, reversibility of the action, and the ability to measure, audit, and troubleshoot
  • Launch workflows individually based on ROI

Impact Assessment
  • Context-specific: deflection, wait time, average handle time, or another metric
  • Impact is cumulative as additional workflows are enabled
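Separating high- and low-risk actions often comes down to a routing rule: low-risk actions execute automatically, while high-risk or irreversible ones go to a human approval queue. Here's one way that could look, as a sketch only — the action names and tiers are made-up examples, not any product's feature set.

```python
# Sketch of risk-tiered routing for automated actions: low-risk actions run
# immediately; high-risk (or unknown) actions are queued for human approval.
# Every action is recorded for auditing either way.

LOW_RISK = {"send_status_update", "add_internal_note"}
HIGH_RISK = {"issue_refund", "close_account"}  # costly or hard to reverse

approval_queue = []  # (action, payload) pairs awaiting human sign-off
audit_log = []       # (action, outcome) pairs for troubleshooting

def execute(action: str, payload: dict) -> str:
    """Route an action by risk tier and record it for auditing."""
    if action in LOW_RISK:
        audit_log.append((action, "executed"))
        return "executed"
    # High-risk and unrecognized actions both take the safe path.
    approval_queue.append((action, payload))
    audit_log.append((action, "queued"))
    return "queued_for_approval"
```

Defaulting unknown actions to the approval queue is the important design choice: a new workflow stays human-in-the-loop until someone explicitly classifies it as low-risk.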

Voice Assistants

Risk Level: Highest

Pre-Work
  • Ensure robustness across accents and issue types
  • Resource strategy for an administrator, QA, and knowledge management

Testing Strategy
  • Grade calls
  • Collect user ratings
  • Test with diverse use cases

Launch Strategy
  • Canary rollout
  • A/B testing for changes

Impact Assessment
  • Compare the AI agent to BPO agents
  • Example: higher ratings for the AI agent in a telecom provider case

Data Q&A

Risk Level: Varies

Pre-Work
  • Access management to limit access to PII
  • Identify the data sources and views most relevant to issue types and agent needs
  • Document data quality, lineage, and business rules
  • Resource strategy for an administrator and integrations

Testing Strategy
  • Manual data analysis and exploration
  • Validate accuracy and improve documentation around usage and limitations

Launch Strategy
  • Internal training with a focus on usage and limitations
  • Launch to more senior team members to validate data quality and risk
  • Define access based on risk (e.g., limited to tier 2 support, or broad access)

Impact Assessment
  • Measure average handle time or mean time to resolution to understand the impact on CX; potentially the number of participants to measure impact internally
  • Continuously expand the views available based on identified needs

Questions to Ask Vendors
When choosing an AI application for your customer service team, it's important to ask the right questions to make sure the solution meets your specific needs. If this list seems overwhelming, focus on the questions that best match your needs.
These questions are designed to help you assess the maturity of the AI solution, both operationally and from a data science perspective. They also probe the maturity of the data model and data management features, which affect onboarding time, result quality, and the application's ability to adapt to your organization's changes over the long run.

AI Capabilities and Differentiation
  1. What specific customer service challenges does your AI application address?
  2. How is your product different from ChatGPT?
  3. What types of data does your AI solution support? Can it handle structured data like Excel files, reports, or images, in addition to unstructured text? How about database tables?
  4. How does your AI handle complex, multi-part questions?
  5. Can your AI handle objections and demonstrate emotional intelligence with upset customers?
  6. What is your approach to mitigating bias and ensuring the ethical use of AI?
Integration and Customization
  1. How does your AI solution integrate with our existing ticketing system and other internal and third-party tools? What work is needed on our side to enable integrations?
  2. How customizable is your AI solution to fit our specific business requirements?
  3. Can your AI be trained on industry-specific and company-specific terminology?
  4. What support do you offer for incorporating brand voice, canned responses, and topic guardrails?
  5. Does your AI offer translation or multi-language support?
  6. In what ways can prompts be modified?
Implementation and Support
  1. How should we assess our current data readiness for implementing your AI solution?
  2. What work is needed beforehand by our team to use this tool for USE_CASE successfully?
  3. What does the onboarding process look like, and how long does it typically take?
  4. What kind of training and support do you offer during onboarding?
  5. How do you handle model updates, and are they documented anywhere?
  6. What resources are needed to manage and support the solution on an ongoing basis?
Data Management and Security
  1. How is our data used, and is it used for model training specifically?
  2. Do you have SOC 2 Type 2 or ISO 27001 certification?
  3. Can you provide a list of subprocessors?
  4. What data security and privacy measures are in place to protect our data?
  5. What kind of data is available to export?
Training and Learning Mechanisms
  1. Is there a learning mechanism or adaptive learning capability in your AI system?
  2. How is feedback on AI-generated results used to improve the system?
  3. What capabilities exist for adjusting and improving the quality of results? Does the AI critique and improve its own responses?
Evaluation and Trials
  1. How can we evaluate your product before making a decision?
  2. Do you have a recorded demo of your AI solution in action?
  3. Can you provide case studies or examples of successful implementations in similar industries?
Performance Evaluation and Quality Assurance
  1. Does your product include a way to evaluate the quality of AI output? Is it customizable?
  2. What does pre-launch testing look like? Can we test the AI with a predetermined list of questions or instructions in bulk?
  3. Does your solution support A/B testing to validate how changes affect key metrics?
  4. Can your solution be rolled out in stages, for instance to a subset of users? How can we control access?
  5. Is there any auto-scoring or auto-evaluation feature in your AI system?
  6. Can we see the information the AI retrieved during its decision-making process, such as documents or other inputs, to better understand what led to a response?
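If a vendor supports A/B testing, it's worth being able to sanity-check the results yourself. A back-of-the-envelope version is a two-proportion z-test on a rate such as deflection; the numbers below are made up for illustration.

```python
# Two-proportion z-test: is the variant's deflection rate really different
# from the control's, or could the gap be noise?

import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference between two rates."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Example: control deflects 400 of 1,000 conversations; variant deflects 460.
z, p = two_proportion_z(400, 1000, 460, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p suggests the improvement is real
```

A vendor's built-in A/B tooling will be more sophisticated (sequential tests, guardrail metrics), but if its reported significance disagrees wildly with a check like this, ask why.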
Evaluation Methods
To evaluate AI products effectively, adopt a comprehensive strategy that incorporates multiple information sources oriented around the most important problem you're trying to solve with an AI solution.
Online research
  • Use tools like Perplexity to summarize discussions from forums such as Reddit.
First-hand experiences
  • Request customer references from your sales representatives.
  • Engage with professional communities like Support Driven to gain first-hand insights from peers.
Analyze company materials
  • Watch demo videos to visualize product capabilities.
  • Review documentation to assess support quality and implementation processes.
Conduct a proof of concept
  • Perform hands-on evaluations or trials to test the AI in your specific environment, understand its nuances, and identify strengths and limitations.
By combining these methods, you'll gain a well-rounded understanding of the AI product's strengths, limitations, and suitability for your organization. This approach ensures a thorough evaluation, helping you make an informed decision based on diverse perspectives and real-world applications.
Final Thoughts
It is impossible to eliminate all risk or to know with full certainty that a solution will do what you need. The goal is to reduce risk enough that you can make a bet with confidence, with a clear sense of what you can and can't do with the product, and of the resources and timeline required to achieve your business goals.