Create custom evaluation suites to batch test your agent’s performance and ensure consistent, high-quality responses across different scenarios.
How evaluations work
- Define evaluation questions - Build a set of test questions for each agent. You can either:
  - Manually create questions that represent common use cases
  - Select responses from existing agent conversations in the admin page to add to your evaluation set
- Run batch tests - Execute all prompts in your evaluation set against the agent to see how it responds (a minimal scripting sketch follows this list)
- Review results - Manually review the agent’s responses to ensure they meet your quality standards and expectations
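The batch-run step is straightforward to script. The sketch below is a minimal illustration, not the platform's actual API: `query_agent` is a hypothetical stand-in for however your deployment exposes the agent, and the sample questions are placeholders.

```python
# Minimal batch-testing sketch. `query_agent` is a hypothetical
# stand-in for your real agent client (e.g., an HTTP call), not a
# documented API.

def query_agent(question: str) -> str:
    """Placeholder for the real agent call; replace with your client."""
    raise NotImplementedError("Wire this up to your agent's API.")

def run_batch(evaluation_set: list[str]) -> list[dict[str, str]]:
    """Run every prompt in the evaluation set and collect the
    question/response pairs for manual review."""
    return [
        {"question": q, "response": query_agent(q)}
        for q in evaluation_set
    ]

# Example evaluation set mixing manually written questions with ones
# pulled from real conversations.
evaluation_set = [
    "How do I reset my password?",
    "What is our refund policy for annual plans?",
    "Summarize the onboarding checklist for new hires.",
]
```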
Using feedback to improve evaluations
Encourage your team to actively use the thumbs-up/thumbs-down feature when interacting with AI agents. This feedback helps admins in two key ways:
- Identify improvement areas - Thumbs-down responses highlight where the agent needs work
- Build better evaluation sets - Filter thumbs-down responses and add them to your evaluation suite to test fixes and prevent regressions (one way to script this is sketched below)
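The record shape and `feedback` field in the sketch below are illustrative assumptions about an exported conversation, not a documented schema; adapt them to whatever your admin page actually exports.

```python
# Illustrative only: the record shape and "feedback" field are assumed,
# not a documented export format.

def thumbs_down_questions(conversations: list[dict]) -> list[str]:
    """Collect questions whose responses were rated thumbs-down, so
    fixes can be re-tested and regressions caught."""
    return [
        record["question"]
        for record in conversations
        if record.get("feedback") == "down"
    ]

conversations = [
    {"question": "Where do I file expenses?", "feedback": "up"},
    {"question": "How do I escalate a ticket?", "feedback": "down"},
]

evaluation_set = thumbs_down_questions(conversations)
# -> ["How do I escalate a ticket?"]
```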
This systematic testing approach helps you:
- Verify agent performance before deploying changes (see the regression sketch after this list)
- Ensure consistency across common queries
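To act on the first point, one option is to diff a fresh batch run against a saved baseline before deploying a change. The sketch below uses exact string comparison, which is a deliberate simplification; in practice, changed responses would be flagged for the manual review described above rather than treated as automatic failures.

```python
# Simple regression check: flag any question whose response differs
# from the saved baseline so it can be reviewed before the change
# ships. Exact string comparison is an assumption; a looser or
# human-in-the-loop comparison is usually more appropriate.

def find_changed_responses(
    baseline: dict[str, str], current: dict[str, str]
) -> list[str]:
    """Return questions whose responses changed since the baseline."""
    return [
        question
        for question, old_response in baseline.items()
        if current.get(question) != old_response
    ]

baseline = {"How do I reset my password?": "Go to Settings > Security."}
current = {"How do I reset my password?": "Ask your admin to reset it."}

for question in find_changed_responses(baseline, current):
    print(f"Review before deploy: {question}")
```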