Alignment Platform
Using the Alignment Platform
Our Alignment Platform helps users create and refine custom evaluation criteria. Think of it as a way to prompt-engineer Selene effectively.
Different use cases naturally require different ways of evaluating AI responses, so we trained Selene to be steerable: you can add few-shot examples, make Selene stricter or more lenient, and much more.
For example, if you’re building an AI therapy chatbot, you can edit the atla_default_helpfulness metric to ensure responses show empathy while avoiding therapeutic claims.
Walkthrough
Generate your evaluation prompt
Define your own eval or adapt one of our templates.
- Tailor the evaluation criteria to your domain / use case
- Select your desired scoring format in the Metric Type dropdown
- Select your input variables in the Input Variables dropdown
Review your generated prompt
- Ensure the generated evaluation criteria and scoring rubric align with your objective
- Add a metric name (used later via the Atla API as a Custom Metric)
Add test data
There are two main ways you can add test data to the Alignment Platform:
Upload your CSV
- Upload your own data via Upload CSV
- Map each column name to its corresponding input variable
If you don’t have ‘Expected Score’ labels already, you can add these in the UI once your data has been uploaded.
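For reference, a test-data CSV is simply one column per input variable plus an optional ‘Expected Score’ column. The sketch below builds one with Python’s standard library; the column names are illustrative placeholders rather than required names, and you map them to your own input variables on upload.

```python
# A minimal sketch of a test-data CSV for upload. The column names
# ("model_input", "model_output", "Expected Score") are illustrative
# placeholders; map them to your own input variables after uploading.
import csv

rows = [
    {
        "model_input": "I've been feeling anxious about work lately.",
        "model_output": "That sounds really hard. Talking it through with someone you trust can help.",
        "Expected Score": 5,  # optional -- can also be added in the UI after upload
    },
    {
        "model_input": "Can you diagnose my anxiety?",
        "model_output": "Yes, you clearly have an anxiety disorder.",
        "Expected Score": 1,
    },
]

with open("test_cases.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
```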
Generate test cases
- Use Generate a test case to return a synthetic test case
- Review and adjust the ‘Expected Score’ label as needed
Test out the metric
- Click Run evaluations to return Selene scores on your test data
The Alignment Score measures how closely Selene’s predictions match expected scores. For reliable deployment in CI/CD pipelines or monitoring systems, aim for moderate (50-75%) or high (≥75%) alignment scores.
The Alignment Score is derived from the normalized Mean Absolute Error (MAE) between Selene’s predictions and your expected scores: the lower the error, the higher the Alignment Score.
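As a rough illustration of that relationship, the sketch below computes an alignment-style score from normalized MAE on a 1–5 scale. The exact normalization the platform applies may differ, so treat this as an approximation rather than the platform’s formula.

```python
# A minimal sketch of an alignment score based on normalized Mean Absolute
# Error (MAE), assuming scores on a known scale (e.g. 1-5). The platform's
# exact normalization may differ.

def alignment_score(predicted, expected, score_min=1, score_max=5):
    """Return alignment as a percentage: 100 means predictions match exactly."""
    assert predicted and len(predicted) == len(expected)
    score_range = score_max - score_min
    mae = sum(abs(p - e) for p, e in zip(predicted, expected)) / len(predicted)
    normalized_mae = mae / score_range   # 0.0 (perfect) to 1.0 (worst possible)
    return (1 - normalized_mae) * 100    # higher is better

# Example: Selene's predictions vs. your expected scores on a 1-5 scale
print(alignment_score([4, 2, 5, 3], [5, 2, 4, 3]))  # 87.5
```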
Align your eval metric
There are two main ways you can align your eval metric to your expected scores:
Adjust your prompt
- Access your prompt via the Show prompt toggle
- Directly edit your prompt, OR use the Describe how to edit the prompt functionality to let AI make the edit for you
Your prompts are versioned in the Alignment Platform, so you can revert whenever you wish.
Add few-shot examples
- Click the icon to directly add misaligned test cases (highlighted in amber or red) as few-shot examples
OR
- Select Edit few-shot examples (beneath your prompt) to access your few-shot library
- Click Add few-shot to add your own example
- Use Generate few-shot to return a synthetic example
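To make the idea concrete, a few-shot example typically pairs an input and response with the score you expect and a short rationale. The field names in the sketch below are illustrative only, not the platform’s exact schema.

```python
# A hypothetical few-shot example for a therapy-chatbot helpfulness metric.
# Field names are illustrative, not the platform's exact schema.
few_shot_example = {
    "model_input": "I feel overwhelmed and don't know who to talk to.",
    "model_output": (
        "I'm sorry you're going through this. I'm not a therapist, but talking to "
        "someone you trust or a mental health professional could really help."
    ),
    "expected_score": 5,
    "critique": "Empathetic, avoids therapeutic claims, and signposts real support.",
}
```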
Deploy!
When you are confident that your evaluation metric is calibrated, you can deploy it to be used with the Atla API.
This evaluation metric is custom, so only you will be able to access it via your API key.
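Once deployed, the metric can be referenced by name from your code. The sketch below assumes the Atla Python SDK and a hypothetical custom metric called therapy_helpfulness; the method and parameter names are assumptions about the SDK surface, so check the Atla API reference for the exact call.

```python
# A minimal sketch of calling a deployed custom metric via the Atla API.
# The client method and parameter names below are assumptions -- consult
# the Atla API reference for the exact call signature.
import os

from atla import Atla  # assumes the Atla Python SDK is installed

client = Atla(api_key=os.environ["ATLA_API_KEY"])

evaluation = client.evaluation.create(
    model_id="atla-selene",
    metric_name="therapy_helpfulness",  # hypothetical custom metric name
    model_input="I've been feeling anxious about work lately.",
    model_output="That sounds really hard. Would it help to talk through what's driving it?",
)
print(evaluation)
```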