You can add custom evaluation metrics to your traces with the Atla SDK to track the performance indicators and evaluation results that matter to your application. You can also configure metrics to run automatically from the “Metrics” section of the Atla UI.

Usage

from atla_insights import instrument, set_custom_metrics

@instrument()
def my_function():
    # Some GenAI logic here
    eval_result = False  # the outcome of your own eval, as a bool
    set_custom_metrics({"my_metric": {"data_type": "boolean", "value": eval_result}})

Supported Data Types

The permitted values for the data_type field are:

| Data type | Description | Expected value |
|---|---|---|
| likert_1_to_5 | A numeric scale from 1 to 5 | int between 1 and 5 (inclusive) |
| boolean | A binary (pass/fail) result | bool (True or False) |
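Because a mistyped value (for example a float score, or `1` instead of `True`) is easy to pass by accident, it can help to check payloads locally before logging them. The sketch below is a hypothetical helper: `check_metric` and `check_metrics` are not part of `atla_insights`, and it is an assumption that you want a hard `ValueError` rather than silently dropping the metric. It encodes only the rules from the table above.

```python
# Hypothetical helpers, not part of atla_insights: they encode the data_type
# rules from the table above so bad values fail before set_custom_metrics runs.
ALLOWED_DATA_TYPES = {"boolean", "likert_1_to_5"}


def check_metric(data_type: str, value: object) -> None:
    """Raise ValueError if `value` does not match the documented `data_type`."""
    if data_type not in ALLOWED_DATA_TYPES:
        raise ValueError(f"Unsupported data_type: {data_type!r}")
    if data_type == "boolean":
        if not isinstance(value, bool):
            raise ValueError(f"boolean metric expects True/False, got {value!r}")
    else:  # likert_1_to_5
        # Python treats True/False as ints, so exclude bools explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"likert_1_to_5 metric expects an int, got {value!r}")
        if not 1 <= value <= 5:
            raise ValueError(f"likert_1_to_5 value must be 1-5, got {value}")


def check_metrics(metrics: dict) -> None:
    """Check every entry of a set_custom_metrics-style payload."""
    for name, spec in metrics.items():
        try:
            check_metric(spec["data_type"], spec["value"])
        except (KeyError, ValueError) as exc:
            # Re-raise with the metric name so the failing entry is obvious.
            raise ValueError(f"Invalid metric {name!r}: {exc}") from exc
```

You would call `check_metrics(payload)` immediately before `set_custom_metrics(payload)`.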

Use Cases

The primary intended use case is logging custom code evals that benefit from running in the active runtime environment, for example checks that need the function's inputs, outputs, or intermediate state. However, you can log any metric, including the results of custom LLM-as-a-judge (LLMJ) evals.
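When logging an LLMJ result, the judge usually returns free text, while a likert_1_to_5 metric needs an int. The sketch below shows one way to bridge that gap. It assumes a hypothetical judge prompt that asks the model to finish with a line such as `Score: 4`; the pattern and the function name `parse_judge_score` are illustrative, not part of the Atla SDK.

```python
import re
from typing import Optional

# Hypothetical format: assumes the judge prompt asks for a line like "Score: 4".
# [1-5] followed by a word boundary rejects values such as "Score: 10".
_SCORE_PATTERN = re.compile(r"score\s*:\s*([1-5])\b", re.IGNORECASE)


def parse_judge_score(judge_output: str) -> Optional[int]:
    """Extract a 1-5 score from an LLM judge's free-text output.

    Returns None when no valid score is found, so the caller can skip
    logging instead of recording a made-up value.
    """
    matches = _SCORE_PATTERN.findall(judge_output)
    if not matches:
        return None
    # Use the last match in case the judge restates the rubric before answering.
    return int(matches[-1])
```

The parsed value can then be passed as the `value` of a `likert_1_to_5` metric, and the metric skipped when the function returns `None`.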

Examples

Boolean Metric

from atla_insights import instrument, set_custom_metrics

@instrument()
def validate_response():
    # get_ai_response() and validate_output() stand in for your own code
    response = get_ai_response()
    is_valid = validate_output(response)  # must return a bool

    set_custom_metrics({
        "output_validation": {
            "data_type": "boolean",
            "value": is_valid
        }
    })

    return response
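The example above leaves `validate_output` undefined. As one possible code eval, the sketch below assumes (hypothetically) that the model is asked to reply with a JSON object containing `answer` and `sources` keys, and checks exactly that. Note that it always returns a real `bool`, which is what the boolean data type expects.

```python
import json

# Hypothetical stand-in for validate_output(): assumes the model is asked to
# reply with a JSON object containing "answer" and "sources".
REQUIRED_KEYS = ("answer", "sources")


def validate_output(response: str) -> bool:
    """Return True if `response` is a JSON object with all required keys."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    # A JSON array or scalar parses fine but still fails this check.
    return isinstance(parsed, dict) and all(key in parsed for key in REQUIRED_KEYS)
```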

Likert Scale Metric

from atla_insights import instrument, set_custom_metrics

@instrument()
def rate_response_quality():
    # get_ai_response() and evaluate_quality() stand in for your own code
    response = get_ai_response()
    quality_score = evaluate_quality(response)  # must return an int from 1 to 5

    set_custom_metrics({
        "response_quality": {
            "data_type": "likert_1_to_5",
            "value": quality_score
        }
    })

    return response
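Many scoring functions produce a float in [0, 1] rather than the int from 1 to 5 that likert_1_to_5 expects. If your `evaluate_quality` works that way, a mapping like the one below can convert its output. This is a sketch under that assumption: `to_likert_1_to_5` is a hypothetical name, and equal-width buckets are only one reasonable choice.

```python
# Hypothetical converter: assumes your scorer returns a float in [0, 1].
# Splits [0, 1] into five equal-width buckets and returns an int from 1 to 5.
def to_likert_1_to_5(score: float) -> int:
    """Map a score in [0, 1] to an int in 1-5; out-of-range input is clamped."""
    clamped = min(max(score, 0.0), 1.0)
    # [0, 0.2) -> 1, [0.2, 0.4) -> 2, ..., [0.8, 1.0) -> 5; 1.0 is capped at 5.
    return min(int(clamped * 5) + 1, 5)
```

If your rubric is not evenly spaced, replace the bucketing with explicit thresholds.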

Multiple Metrics

from atla_insights import instrument, set_custom_metrics

@instrument()
def comprehensive_evaluation():
    # get_ai_response(), check_accuracy() and rate_helpfulness() stand in for your own code
    response = get_ai_response()

    # Evaluate multiple aspects of the same response
    accuracy = check_accuracy(response)  # must return a bool
    helpfulness = rate_helpfulness(response)  # must return an int from 1 to 5

    set_custom_metrics({
        "accuracy": {
            "data_type": "boolean",
            "value": accuracy
        },
        "helpfulness": {
            "data_type": "likert_1_to_5",
            "value": helpfulness
        }
    })

    return response
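When you log many metrics at once, writing the nested dictionaries by hand gets repetitive. The hypothetical helper below (not part of `atla_insights`) builds the same structure from plain `{"name": value}` pairs, inferring `boolean` for bools and `likert_1_to_5` for ints from 1 to 5. The inference rule is an assumption of this sketch; anything else is rejected rather than guessed.

```python
from typing import Dict, Union

MetricValue = Union[bool, int]


def build_custom_metrics(results: Dict[str, MetricValue]) -> Dict[str, dict]:
    """Turn {"name": value} pairs into the nested dict shown in the examples.

    bools become "boolean" metrics and ints from 1 to 5 become
    "likert_1_to_5" metrics; any other value raises ValueError.
    """
    payload = {}
    for name, value in results.items():
        # Check bool first: isinstance(True, int) is also True in Python.
        if isinstance(value, bool):
            data_type = "boolean"
        elif isinstance(value, int) and 1 <= value <= 5:
            data_type = "likert_1_to_5"
        else:
            raise ValueError(f"Cannot infer a data_type for {name!r}={value!r}")
        payload[name] = {"data_type": data_type, "value": value}
    return payload
```

In the example above, the call would become `set_custom_metrics(build_custom_metrics({"accuracy": accuracy, "helpfulness": helpfulness}))`.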