Getting the most out of your Atla evaluator
Our tools are designed to be highly customizable to fit your specific needs. The guidance below is compiled from common user feedback and our internal benchmarks.
Use built-in fields when possible
Our Evaluation Components guide describes several optional fields designed for common use cases (e.g., the model_context field for RAG applications). We recommend checking if these built-in fields meet your needs.
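For example, a RAG evaluation can pass retrieved documents through the built-in model_context field instead of stitching them into a custom prompt. The sketch below is illustrative only: it assumes a client interface of the form client.evaluation.create with model_input, model_output, and model_context parameters, an "atla-selene" model identifier, and a response.result.evaluation.score response shape; check the SDK reference for the exact names.

```python
from atla import Atla  # assumed import path; see the SDK reference

client = Atla()  # assumes ATLA_API_KEY is set in the environment

# Hypothetical RAG evaluation: the retrieved documents go in the built-in
# model_context field rather than being embedded in a custom prompt.
response = client.evaluation.create(
    model_id="atla-selene",  # assumed model identifier
    model_input="What is the refund window?",
    model_output="Customers can request a refund within 30 days of purchase.",
    model_context=(
        "Refund policy: refunds are available within 30 days of purchase "
        "with proof of payment."
    ),
    evaluation_criteria="Is the response faithful to the provided context?",
)
print(response.result.evaluation.score)  # response shape is an assumption
```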
Customize your metrics
When you need custom evaluation criteria, you can use our SDK to create and iterate on your evaluation metrics.
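A lightweight way to iterate is to score a small, hand-labelled set against a draft criterion and compare the evaluator's scores with your own labels. The sketch below reuses the assumed client.evaluation.create interface from the previous example; the criterion wording and the labelled examples are placeholders.

```python
from atla import Atla  # assumed import path

client = Atla()  # assumes ATLA_API_KEY is set in the environment

# Draft criterion to iterate on (placeholder wording).
criteria = (
    "Score 1-5 for conciseness: 5 means the answer contains no redundant "
    "information; 1 means it is mostly filler."
)

# A few hand-labelled interactions to sanity-check the criterion against.
labelled_examples = [
    {"input": "What is 2 + 2?", "output": "4", "expected_score": 5},
    {"input": "What is 2 + 2?", "output": "Great question! Math is fun. 4.", "expected_score": 2},
]

for example in labelled_examples:
    response = client.evaluation.create(
        model_id="atla-selene",  # assumed model identifier
        model_input=example["input"],
        model_output=example["output"],
        evaluation_criteria=criteria,
    )
    score = response.result.evaluation.score  # response shape is an assumption
    print(f"expected {example['expected_score']}, got {score}")
```

If the evaluator's scores drift from your labels, tighten the criterion wording and rerun; adding few-shot examples, as described next, often helps.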
Provide few-shot examples when possible
Including a few high-quality examples in your evaluation criteria can significantly improve the consistency and accuracy of evaluations. These examples help calibrate the evaluator and provide clear benchmarks for what constitutes good or poor performance.
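One simple way to do this is to embed two or three scored examples directly in the criteria text, as in the placeholder sketch below; if your SDK version exposes a dedicated few-shot parameter, prefer that instead.

```python
# Placeholder criterion with embedded few-shot examples; adapt the wording
# and examples to your own task.
evaluation_criteria = """
Score the response from 1 to 5 for helpfulness.

Examples:
- Input: "How do I reset my password?"
  Response: "Go to Settings > Security > Reset password."
  Score: 5 (direct, actionable answer)
- Input: "How do I reset my password?"
  Response: "Passwords are important for security."
  Score: 1 (does not answer the question)
"""

# Pass this string as evaluation_criteria when calling the evaluator.
```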
Add conversation history to your model input when applicable
For chatbot evaluations, include relevant conversation history in your model input. This context helps the evaluator better understand the full scope of the interaction and provide more accurate assessments of response quality and relevance.
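A common pattern is to flatten the prior turns into the model input string so the evaluator sees the same context the chatbot did. The helper and log format below are hypothetical; adapt the formatting to whatever your application records.

```python
def format_history(turns: list[dict[str, str]], latest_user_message: str) -> str:
    """Flatten prior turns plus the latest user message into one model input string.

    `turns` is a list of {"role": ..., "content": ...} dicts (hypothetical log format).
    """
    lines = [f"{turn['role']}: {turn['content']}" for turn in turns]
    lines.append(f"user: {latest_user_message}")
    return "\n".join(lines)


history = [
    {"role": "user", "content": "I ordered a blue jumper last week."},
    {"role": "assistant", "content": "Thanks, I can see that order. How can I help?"},
]
model_input = format_history(history, "Can I exchange it for a red one?")
# Pass `model_input` to the evaluator so the reply is judged in context.
```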
Use our Async Client for large datasets
When evaluating large numbers of interactions, use the async client in our Python SDK to improve performance. The async client can process multiple evaluations concurrently, significantly reducing overall processing time for batch evaluations.
To prevent abuse and maintain performance, we enforce rate limits. Adjust the number of concurrent requests to avoid hitting these limits. For more information about rate limits for your usage tier, visit our website.
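A minimal sketch, assuming the SDK exposes an AsyncAtla client with the same evaluation.create method: a semaphore caps the number of in-flight requests so you stay under your rate limit. The dataset, model identifier, concurrency limit, and response shape are all placeholders or assumptions.

```python
import asyncio

from atla import AsyncAtla  # assumed async client; see the SDK reference

client = AsyncAtla()  # assumes ATLA_API_KEY is set in the environment

MAX_CONCURRENCY = 10  # placeholder; tune to your usage tier's rate limit
semaphore = asyncio.Semaphore(MAX_CONCURRENCY)


async def evaluate_one(item: dict) -> float:
    # The semaphore keeps concurrent requests under the rate limit.
    async with semaphore:
        response = await client.evaluation.create(
            model_id="atla-selene",  # assumed model identifier
            model_input=item["input"],
            model_output=item["output"],
            evaluation_criteria="Is the response accurate and relevant?",
        )
    return response.result.evaluation.score  # response shape is an assumption


async def main(dataset: list[dict]) -> list[float]:
    return await asyncio.gather(*(evaluate_one(item) for item in dataset))


dataset = [{"input": "What is 2 + 2?", "output": "4"}]  # placeholder data
scores = asyncio.run(main(dataset))
print(scores)
```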
Use Selene Mini for faster evals
Selene Mini is 2x faster than Selene. If you need to speed up your evals, consider using it instead.
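Switching typically only requires changing the model identifier in your evaluation calls. The sketch below reuses the assumed client interface from the earlier examples, and the "atla-selene-mini" identifier is an assumption; check the models page for the exact name.

```python
from atla import Atla  # assumed import path

client = Atla()  # assumes ATLA_API_KEY is set in the environment

response = client.evaluation.create(
    model_id="atla-selene-mini",  # assumed identifier for Selene Mini
    model_input="What is 2 + 2?",
    model_output="4",
    evaluation_criteria="Is the response correct?",
)
```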
Use the preferred template with Selene Mini
Selene Mini is optimized for a specific prompt template. Make sure you follow these instructions for best results.