Atla supports two approaches for adding custom evaluation metrics to your traces:
  1. Automated Metrics - Configure metrics in the Atla UI that we run automatically and scalably using LLMs-as-a-Judge.
  2. Programmatic Metrics - Log metrics directly from your code using the Atla SDK.
Both approaches allow you to track specific performance indicators and evaluation results that matter to your application.

Supported Data Types

The permitted data_type values are:
Data Type      Description          Expected Value
likert_1_to_5  A numeric 1-5 scale  int between 1 and 5
boolean        A true/false value   bool (True/False)
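
For reference, here is a minimal sketch showing the expected value type for each data_type, using the set_custom_metrics call described under Programmatic Metrics below. The metric names are purely illustrative:

from atla_insights import instrument, set_custom_metrics

@instrument()
def log_example_metrics():
    # One metric per supported data type; names here are illustrative.
    set_custom_metrics({
        "passed_check": {"data_type": "boolean", "value": True},       # bool
        "answer_quality": {"data_type": "likert_1_to_5", "value": 4},  # int between 1 and 5
    })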

Automated Metrics

Automated metrics let you configure custom evaluations directly in the Atla UI without modifying your code. These metrics run automatically on your traces at the trace level, using LLMs-as-a-Judge (LLMJ) to evaluate the entire conversation or interaction flow. Atla has a research specialization in building frontier LLMJs; check out our Selene model family. You can create any number of custom automated metrics. To create one:
  1. Configure in UI: Navigate to the Metrics tab in the Atla UI, and select Create Metric.
  2. Define attributes: Set the metric's name, data type, and the fraction of traces on which it should run.
  3. Define evaluation criteria: Provide your custom evaluation prompt.
  4. Advanced attributes: Optionally define metadata filters to control which traces the metric applies to.

Programmatic Metrics

Use programmatic metrics when you want to evaluate and log metrics directly in your code at runtime. This approach is ideal for custom evaluations that benefit from access to the live runtime environment.

Usage

from atla_insights import instrument, set_custom_metrics

@instrument()
def my_function():
    # Attach a custom metric to the current trace.
    set_custom_metrics({"my_metric": {"data_type": "boolean", "value": False}})
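
As in the examples below, set_custom_metrics is called inside a function decorated with @instrument(), so the logged metrics attach to the surrounding trace.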

Examples

Boolean Metric

from atla_insights import instrument, set_custom_metrics

@instrument()
def validate_response():
    response = get_ai_response()
    is_valid = validate_output(response)

    set_custom_metrics({
        "output_validation": {
            "data_type": "boolean",
            "value": is_valid
        }
    })

    return response

Likert Scale Metric

from atla_insights import instrument, set_custom_metrics

@instrument()
def rate_response_quality():
    response = get_ai_response()
    quality_score = evaluate_quality(response)  # Returns 1-5

    set_custom_metrics({
        "response_quality": {
            "data_type": "likert_1_to_5",
            "value": quality_score
        }
    })

    return response

Multiple Metrics

from atla_insights import instrument, set_custom_metrics

@instrument()
def comprehensive_evaluation():
    response = get_ai_response()

    # Evaluate multiple aspects
    accuracy = check_accuracy(response)  # Boolean
    helpfulness = rate_helpfulness(response)  # 1-5 scale

    set_custom_metrics({
        "accuracy": {
            "data_type": "boolean",
            "value": accuracy
        },
        "helpfulness": {
            "data_type": "likert_1_to_5",
            "value": helpfulness
        }
    })

    return response