You can use the alignment tool in Eval Copilot to align Selene with expert scores. The process has four steps:

  1. Create your eval task using Atla metrics: Choose one of the default metrics or create a custom one. A few lines of description are enough for the Eval Copilot to generate a prompt, which you can adjust to your preference or accept as is. When you save your task, make sure you have selected all the fields described in the “Develop your test cases” section.
  2. Upload your test cases: You can create them manually in the Eval Copilot, upload a CSV of test cases you have already created, or ask the Eval Copilot to generate some for you. Again, we recommend adding about 20 test cases for each metric.
  3. Align evaluations: The goal of this exercise is to achieve an alignment score of 80% or above between Selene and the experts. When you run evaluations, you can see the scores and critiques generated by Selene. You can iterate on the eval prompt, or add edge cases as few-shot examples, until you are satisfied with the alignment score.
  4. Run your evaluations: Once you are happy with the alignment, deploy that version of the prompt to make it active. The metric is then available to use from the Python SDK.
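
To make the 80% target in step 3 concrete, here is a minimal Python sketch of one way to check alignment offline. It does not use the Atla SDK, and this guide does not state how Eval Copilot computes its alignment score; the sketch assumes the score is the fraction of test cases where Selene's score exactly matches the expert's. The names `ScoredCase`, `alignment_score`, and `disagreements`, as well as the sample scores, are hypothetical and only for illustration.

```python
from dataclasses import dataclass

# Target from step 3: 80% agreement between Selene and the experts.
TARGET_ALIGNMENT = 0.80


@dataclass
class ScoredCase:
    """One test case with an expert score and Selene's score (hypothetical shape)."""
    case_id: str
    expert_score: int
    selene_score: int


def alignment_score(cases):
    """Fraction of cases where Selene's score equals the expert's score.

    Assumption: alignment is exact-match agreement. Eval Copilot may use a
    different definition.
    """
    if not cases:
        raise ValueError("need at least one scored test case")
    matches = sum(1 for c in cases if c.expert_score == c.selene_score)
    return matches / len(cases)


def disagreements(cases):
    """Cases where the scores differ; candidates for prompt edits or few-shot examples."""
    return [c for c in cases if c.expert_score != c.selene_score]


# Made-up scores for five test cases.
cases = [
    ScoredCase("case-1", expert_score=1, selene_score=1),
    ScoredCase("case-2", expert_score=0, selene_score=0),
    ScoredCase("case-3", expert_score=1, selene_score=0),
    ScoredCase("case-4", expert_score=1, selene_score=1),
    ScoredCase("case-5", expert_score=0, selene_score=0),
]

score = alignment_score(cases)
print(f"alignment: {score:.0%}")
print("meets target:", score >= TARGET_ALIGNMENT)
for case in disagreements(cases):
    print("review:", case.case_id)
```

The cases returned by `disagreements` are the natural place to start when iterating: they show where the eval prompt is ambiguous, or which edge cases are worth adding as few-shot examples.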

Read a detailed walkthrough of how to do this on our Eval Copilot.