Development Workflow Steps
1
Create custom metrics
Navigate to your Metrics screen and create custom metrics that matter to your domain. While we automatically flag error types and failure patterns, you can add domain-specific metrics like tool_call_efficiency.
2
Configure metadata for test runs
Set up metadata to track different test configurations. For example, you might run three versions of a prompt: concise, balanced, and verbose.
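One way to record these configurations is sketched below in plain Python. The "experiment" key mirrors the metadata tag mentioned in this workflow. The "prompt_version" key, the prompt wording, and the helper name build_run_metadata are hypothetical and chosen only for illustration. How the metadata is attached to a run depends on your SDK setup and is not shown here.

```python
# Hypothetical prompt variants; the wording is illustrative only.
PROMPT_VARIANTS = {
    "concise": "Answer in one or two sentences.",
    "balanced": "Answer clearly, with brief supporting detail.",
    "verbose": "Answer thoroughly, explaining each step of your reasoning.",
}


def build_run_metadata(variant: str, experiment: str = "prompt-length") -> dict[str, str]:
    """Return the metadata to attach to a test run for one prompt variant.

    "prompt_version" is an assumed key name, not a required field.
    """
    if variant not in PROMPT_VARIANTS:
        raise ValueError(f"unknown prompt variant: {variant!r}")
    return {"experiment": experiment, "prompt_version": variant}


# One metadata dict per configuration, in the order the variants are defined.
run_configs = [build_run_metadata(variant) for variant in PROMPT_VARIANTS]
```

Keeping the metadata keys identical across all three runs is what makes the configurations comparable later; only the values should differ.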
3
Run your tests
Execute your test suite with the configured metadata and custom metrics enabled.
4
Compare results
Navigate to the Compare screen to analyze the relative error rates and performance on your custom metrics across different configurations.
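To make the comparison concrete, the sketch below computes the same two quantities locally: an error rate and a mean custom-metric score per configuration. It is not how the Compare screen is implemented. The record fields ("prompt_version", "has_error") and all sample values are made up for illustration; only the metric name tool_call_efficiency comes from the earlier step.

```python
from collections import defaultdict
from statistics import mean

# Made-up sample records for illustration; real values would come from your own test runs.
runs = [
    {"prompt_version": "concise", "has_error": False, "tool_call_efficiency": 0.9},
    {"prompt_version": "concise", "has_error": True, "tool_call_efficiency": 0.6},
    {"prompt_version": "balanced", "has_error": False, "tool_call_efficiency": 0.8},
    {"prompt_version": "balanced", "has_error": False, "tool_call_efficiency": 0.7},
    {"prompt_version": "verbose", "has_error": True, "tool_call_efficiency": 0.5},
    {"prompt_version": "verbose", "has_error": True, "tool_call_efficiency": 0.4},
]


def summarize(records, key="prompt_version", metric="tool_call_efficiency"):
    """Group records by a metadata key and report error rate and mean metric per group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return {
        config: {
            # Fraction of runs in this configuration that were flagged with an error.
            "error_rate": sum(r["has_error"] for r in group) / len(group),
            # Average custom-metric score across the configuration's runs.
            "mean_metric": mean(r[metric] for r in group),
        }
        for config, group in grouped.items()
    }


summary = summarize(runs)
for config, stats in summary.items():
    print(f"{config}: error_rate={stats['error_rate']:.2f}, mean_metric={stats['mean_metric']:.2f}")
```

Reading both numbers side by side helps separate a configuration that merely fails less often from one that also scores better on the metric you care about.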
5
Deep dive into issues
Click “View” on any column to inspect the step-level errors in its traces and understand the underlying issues.
6
Continue iterating
Repeat this cycle along other dimensions. Use the “experiment” metadata tag to track the performance of each experiment, such as tests of different architectures and setups.