Best Practices
Getting the most out of your Atla evaluator
Our tools are designed to be highly customizable to fit your specific needs. We have compiled guidance based on common feedback and internal benchmarks.
Our Evaluation Components guide describes several optional fields designed for common use cases (e.g., the model_context field for RAG applications). We recommend checking whether these built-in fields meet your needs before defining custom criteria.
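For example, a RAG evaluation can pass retrieved documents through model_context rather than pasting them into the criteria. The sketch below assumes the Python SDK's Atla client and an evaluation.create method with the field names used in this guide; treat the exact parameter names, model id, and response shape as assumptions and check the SDK reference for specifics.

```python
# Illustrative sketch of a RAG evaluation using the built-in context field.
from atla import Atla

client = Atla()  # assumes your API key is available in the environment

result = client.evaluation.create(
    model_id="atla-selene",  # placeholder; use the model id from your dashboard
    evaluation_criteria="Is the response faithful to the retrieved context?",
    model_input="What is our refund window?",
    model_output="Refunds are accepted within 30 days of purchase.",
    # Built-in field for RAG: pass retrieved documents here instead of
    # embedding them in the criteria or the input.
    model_context="Policy: customers may return items within 30 days of purchase.",
)
print(result)
```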
When you need custom evaluation criteria, you can use our SDK or Atla Eval Copilot to create and iterate on your evaluation metrics.
Including a few high-quality examples in your evaluation criteria can significantly improve the consistency and accuracy of evaluations. These examples help calibrate the evaluator and provide clear benchmarks for what constitutes good or poor performance.
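As a rough sketch, custom criteria and a few calibration examples might be passed together as below. The few_shot_examples parameter name and the shape of each example are assumptions for illustration; align them with the metric you create via the SDK or Atla Eval Copilot.

```python
# Sketch: custom criteria plus calibration examples that show the evaluator
# what a low score and a high score look like.
from atla import Atla

client = Atla()

criteria = (
    "Score 1-5 for how actionable the support reply is. "
    "5 = concrete next steps, 1 = vague or off-topic."
)

few_shot_examples = [
    {
        "model_input": "My export keeps failing.",
        "model_output": "Try clearing your cache.",
        "score": "2",
        "critique": "Generic advice with no diagnosis steps.",
    },
    {
        "model_input": "My export keeps failing.",
        "model_output": "Open Settings > Exports, note the error code, and share it with support.",
        "score": "5",
        "critique": "Specific, ordered steps the user can follow.",
    },
]

result = client.evaluation.create(
    model_id="atla-selene",
    evaluation_criteria=criteria,
    few_shot_examples=few_shot_examples,
    model_input="My export keeps failing.",
    model_output="Please restart the app and retry the export from the file menu.",
)
```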
For chatbot evaluations, include relevant conversation history in your model input. This context helps the evaluator better understand the full scope of the interaction and provide more accurate assessments of response quality and relevance.
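One way to do this is to flatten the prior turns into model_input and keep the turn under evaluation in model_output, as sketched below. The role-prefixed formatting is a convention chosen for illustration, not a requirement of the API.

```python
# Sketch: include conversation history so the evaluator sees the full exchange.
from atla import Atla

client = Atla()

turns = [
    ("user", "Can I change my flight?"),
    ("assistant", "Yes, changes are free up to 24 hours before departure."),
    ("user", "And after that?"),
]
latest_reply = "After the 24-hour window a change fee applies."

# Flatten prior turns into a single role-prefixed transcript.
conversation = "\n".join(f"{role}: {text}" for role, text in turns)

result = client.evaluation.create(
    model_id="atla-selene",
    evaluation_criteria="Is the reply consistent with the earlier turns of the conversation?",
    model_input=conversation,   # full conversation history
    model_output=latest_reply,  # the turn being evaluated
)
```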
When evaluating large numbers of interactions, use the async client in our Python SDK to improve performance. The async client can process multiple evaluations concurrently, significantly reducing overall processing time for batch evaluations.
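A minimal sketch of a concurrent batch run, assuming the async client (named AsyncAtla here) mirrors the synchronous evaluation.create signature:

```python
# Sketch: run a batch of evaluations concurrently with the async client.
import asyncio

from atla import AsyncAtla


async def main() -> None:
    client = AsyncAtla()
    pairs = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Paris"),
        ("What is the largest planet?", "Jupiter"),
    ]
    # Fire all evaluations at once and wait for them to complete.
    results = await asyncio.gather(*[
        client.evaluation.create(
            model_id="atla-selene",
            evaluation_criteria="Is the answer factually correct?",
            model_input=question,
            model_output=answer,
        )
        for question, answer in pairs
    ])
    print(f"{len(results)} evaluations completed")


asyncio.run(main())
```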
To prevent abuse and maintain performance, we enforce rate limits. Adjust the number of concurrent requests to avoid hitting these limits. For more information about rate limits for your usage tier, visit our website.
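To stay under the limit while still running concurrently, you can cap the number of in-flight requests with a semaphore. The MAX_CONCURRENCY value below is a placeholder; set it from the limits published for your usage tier.

```python
# Sketch: cap in-flight requests so batch jobs respect rate limits.
import asyncio

from atla import AsyncAtla

MAX_CONCURRENCY = 5  # placeholder; tune to your tier's rate limit


async def evaluate_all(pairs):
    client = AsyncAtla()
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def evaluate_one(question, answer):
        async with sem:  # at most MAX_CONCURRENCY requests in flight
            return await client.evaluation.create(
                model_id="atla-selene",
                evaluation_criteria="Is the answer factually correct?",
                model_input=question,
                model_output=answer,
            )

    return await asyncio.gather(*(evaluate_one(q, a) for q, a in pairs))
```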
Selene Mini is 2x faster than Selene. If you need to speed up your evals, consider using it.
Selene Mini is optimized for a specific prompt template, so make sure you follow these instructions for best results.