Custom Metrics
Creating and using custom metrics
Custom evaluation metrics let you define the dimensions of quality you want to evaluate against.
They are defined by evaluation prompts that can be iteratively refined to get the best possible results.
Building Custom Metrics
Initialize Atla
Initialize Atla by creating an instance of the Atla class.
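A minimal sketch of this step, assuming a Python SDK whose client class is named Atla and reads an API key; the stand-in class below and the ATLA_API_KEY variable name are assumptions, not the SDK's actual interface.

```python
import os

# Hypothetical stand-in for the SDK's client class; the real
# constructor and its argument names may differ.
class Atla:
    def __init__(self, api_key: str):
        self.api_key = api_key

# Typical pattern: read the API key from an environment variable.
client = Atla(api_key=os.environ.get("ATLA_API_KEY", "test-key"))
```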
Create a metric
Create a metric by specifying its name, type, and (optionally) a description.
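The fields involved can be sketched with an in-memory model; the field names and the example type values below mirror the text but are assumptions, not the SDK's schema.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative in-memory model of a metric: a name, a type, and an
# optional description. Exact type values are assumed for the example.
@dataclass
class Metric:
    name: str
    metric_type: str                 # e.g. "binary" or "likert_1_to_5"
    description: Optional[str] = None

metric = Metric(
    name="hallucination",
    metric_type="binary",
    description="Does the response contain unsupported claims?",
)
```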
Create a prompt
Create an evaluation prompt that describes how to evaluate the metric, and set it as the active version.
A metric will use the “active” version of its evaluation prompt unless otherwise specified. This must be set before running an evaluation.
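The versioning behavior described above can be illustrated with a small stand-in store (the class and method names are hypothetical): each new prompt receives the next version number, and evaluation uses whichever version is marked active.

```python
# Illustrative model of versioned evaluation prompts. A metric
# evaluates with the "active" version unless one is pinned explicitly.
class PromptStore:
    def __init__(self):
        self.versions = {}   # version number -> prompt text
        self.active = None   # must be set before running an evaluation

    def add(self, text, set_active=False):
        version = len(self.versions) + 1
        self.versions[version] = text
        if set_active:
            self.active = version
        return version

    def get(self, version=None):
        # Fall back to the active version when none is specified.
        return self.versions[version or self.active]

prompts = PromptStore()
v1 = prompts.add("Score 1 if the answer is grounded, else 0.", set_active=True)
```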
(Optional) Add few-shot examples
Add few-shot examples to help the model understand how to evaluate the metric.
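Few-shot examples pair sample inputs and responses with the score (and ideally a short critique) the judge should produce for them; the exact record fields below are an assumption for illustration.

```python
# Illustrative few-shot examples for a binary metric: one positive and
# one negative case, each with the expected score and a critique.
few_shot_examples = [
    {
        "input": "What is the capital of France?",
        "response": "Paris.",
        "score": 1,
        "critique": "Directly answers the question with a correct fact.",
    },
    {
        "input": "What is the capital of France?",
        "response": "France is a country in Europe.",
        "score": 0,
        "critique": "Does not answer the question asked.",
    },
]
```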
Run an evaluation
Run an evaluation using the metric.
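The shape of an evaluation call can be sketched as follows; in the real SDK this would send the metric's active prompt plus the input/response pair to a judge model, so the stub function and result fields here are assumptions.

```python
# Stub evaluation: a real implementation would call an LLM judge with
# the evaluation prompt and the input/output pair; this just shows the
# shape of the request and result.
def evaluate(metric_name, prompt, model_input, model_output):
    # Assumed result shape: a score plus a free-text critique.
    return {"metric": metric_name, "score": 1, "critique": "stub"}

result = evaluate(
    "hallucination",
    "Score 1 if the answer is grounded, else 0.",
    model_input="What is 2 + 2?",
    model_output="4",
)
```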
(Optional) Iteratively refine the metric
Iteratively refine the metric’s evaluation prompt. New prompts are automatically versioned.
You can still use the old version of the prompt by specifying the version number:
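Pinning an older version can be sketched like this (the lookup helper and version numbers are illustrative): version 2 is active, but an evaluation can still request version 1 explicitly.

```python
# Two versions of an evaluation prompt; the newer one is active.
prompt_versions = {
    1: "Score 1 if the answer is grounded, else 0.",
    2: "Score 1 only if every claim is supported by the input context.",
}
active_version = 2

def prompt_for(version=None):
    # Use the explicitly requested version, else the active one.
    return prompt_versions[version if version is not None else active_version]
```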
Custom metrics can also be created without any coding via the Eval Copilot (beta).
Learn more about using custom metrics in our API Reference.