Best Practices
Getting the most out of your Atla evaluator
Our tools are highly customizable to fit your specific needs. The guidance below is compiled from common user feedback and our internal benchmarks.
Our Evaluation Components guide describes several optional fields designed for common use cases (e.g., the model_context field for RAG applications). We recommend checking whether these built-in fields meet your needs before defining custom criteria.
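As a rough illustration, the sketch below passes retrieved documents through model_context when evaluating a RAG answer. The client construction, the evaluation.create call, and the response fields shown are assumptions about the Python SDK's shape rather than a verbatim reference; check the SDK documentation for the exact signature.

```python
from atla import Atla  # assumed client name from the Atla Python SDK

client = Atla()  # assumes the API key is read from the ATLA_API_KEY environment variable

# Evaluate a RAG answer, passing the retrieved documents as model_context
# so the evaluator can judge whether the answer is grounded in them.
result = client.evaluation.create(
    model_id="atla-selene",  # assumed evaluator model identifier
    model_input="What is our refund window?",
    model_output="Refunds are accepted within 30 days of purchase.",
    model_context="Policy excerpt: Customers may request a refund within 30 days of delivery.",
    evaluation_criteria="Is the response fully supported by the provided context?",
)
print(result.result.evaluation.score, result.result.evaluation.critique)  # response shape is also an assumption
```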
When you need custom evaluation criteria, use the Atla Alignment Platform to create and iterate on your metrics. The platform provides a structured way to define, test, and refine your evaluation criteria while maintaining consistency across your organization.
Including a few high-quality examples in your evaluation criteria can significantly improve the consistency and accuracy of evaluations. These examples help calibrate the evaluator and provide clear benchmarks for what constitutes good or poor performance.
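One lightweight way to do this is to fold the examples directly into the criteria text, as sketched below with a hypothetical password-reset scenario; whether your SDK version also exposes a dedicated few-shot field is something to check in the Evaluation Components guide.

```python
# Calibrated examples embedded directly in the evaluation criteria give the
# evaluator concrete anchors for what each score should look like.
FEW_SHOT_EXAMPLES = """
Example of a score 5 response:
  Input: "How do I reset my password?"
  Output: "Go to Settings > Security > Reset password, then follow the emailed link."
  Reason: Accurate, complete, and directly actionable.

Example of a score 1 response:
  Input: "How do I reset my password?"
  Output: "Passwords are important for account security."
  Reason: Does not answer the question.
"""

evaluation_criteria = (
    "Score the response from 1 to 5 for helpfulness and accuracy.\n"
    + FEW_SHOT_EXAMPLES
)
```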
For chatbot evaluations, include relevant conversation history in your model input. This context helps the evaluator better understand the full scope of the interaction and provide more accurate assessments of response quality and relevance.
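One simple approach, sketched below with made-up conversation data, is to flatten prior turns into the model input string before sending the evaluation request.

```python
# Flatten the prior turns into the model input so the evaluator sees the
# whole conversation rather than only the final exchange.
history = [
    {"role": "user", "content": "My order arrived damaged."},
    {"role": "assistant", "content": "Sorry to hear that. Could you share your order number?"},
    {"role": "user", "content": "It's #48213."},
]
latest_response = "Thanks! I've opened a replacement request for order #48213."

model_input = "\n".join(f"{turn['role']}: {turn['content']}" for turn in history)
# model_input and latest_response are then passed to the evaluator as the
# model input and model output, respectively.
```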
When evaluating large numbers of interactions, use the async client in our Python SDK to improve performance. The async client can process multiple evaluations concurrently, significantly reducing overall processing time for batch evaluations.
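The sketch below shows one way a batch might be fanned out with asyncio.gather. The AsyncAtla client name, the evaluation.create parameters, and the model identifier are assumptions about the SDK's shape, so treat this as a pattern rather than a copy-paste reference.

```python
import asyncio

from atla import AsyncAtla  # assumed async client exported by the Python SDK

client = AsyncAtla()  # assumes the API key is read from the environment

async def evaluate(item: dict):
    # Each call is awaited, so many evaluations can be in flight at once.
    return await client.evaluation.create(
        model_id="atla-selene",  # assumed evaluator model identifier
        model_input=item["input"],
        model_output=item["output"],
        evaluation_criteria="Is the response helpful and accurate?",
    )

async def main(items: list[dict]):
    # Launch the whole batch concurrently and collect results in input order.
    return await asyncio.gather(*(evaluate(item) for item in items))

results = asyncio.run(main([
    {"input": "What is 2 + 2?", "output": "4"},
    {"input": "What is the capital of France?", "output": "Paris"},
]))
```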
To prevent abuse and maintain performance, we enforce rate limits. Tune the number of concurrent requests your client makes to stay within the limits for your usage tier; see our website for details.
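A simple way to cap concurrency is an asyncio semaphore, as sketched below. The MAX_CONCURRENT_REQUESTS value is illustrative (the right limit depends on your tier), and evaluate() refers to the helper from the previous sketch.

```python
import asyncio

MAX_CONCURRENT_REQUESTS = 10  # tune this to stay under your tier's rate limit

semaphore = asyncio.Semaphore(MAX_CONCURRENT_REQUESTS)

async def evaluate_with_limit(item: dict):
    # The semaphore caps how many requests are in flight at once, keeping the
    # batch under the rate limit while still benefiting from concurrency.
    async with semaphore:
        return await evaluate(item)  # reuses the evaluate() helper sketched above

async def run_batch(items: list[dict]):
    return await asyncio.gather(*(evaluate_with_limit(item) for item in items))
```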