Creating an Experiment
To create an experiment, you’ll need:

- A base runnable to test variations against
- Previous runs of the base runnable from which to source realistic input resources
- A set of automated evaluators and/or gold labels from previous runs
- Configuration for candidate runnables to test
Experiment Configuration
Key fields in the experiment configuration:

- runnable_id: The ID of the base runnable being tested
- evaluator_ids: Array of evaluator IDs to use for assessment
- run_filters: Optional filters to select specific runs for evaluation
- candidate_runnables: Configuration for variations to test
- timeout_seconds: Maximum time allowed per run (-1 for no timeout)
- max_runs: Maximum number of runs to evaluate
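As an illustration, an experiment configuration with these fields might look like the following Python dict. All IDs, values, and the dict-based shape are assumptions for illustration, not taken from this document:

```python
# Hypothetical experiment configuration illustrating the fields above.
# Every ID and value here is made up for illustration.
experiment_config = {
    "runnable_id": "runnable-123",                       # base runnable being tested
    "evaluator_ids": ["eval-accuracy", "eval-latency"],  # evaluators to apply
    "run_filters": {"status": "succeeded"},              # optional: restrict source runs
    "candidate_runnables": [                             # variations to test
        {"name": "higher-temperature", "overrides": {"temperature": 0.9}},
    ],
    "timeout_seconds": 300,                              # per-run limit; -1 disables the timeout
    "max_runs": 50,                                      # cap on evaluated runs
}

# A tiny sanity check one might run before submitting the configuration.
required = {"runnable_id", "evaluator_ids", "candidate_runnables"}
missing = required - experiment_config.keys()
assert not missing, f"missing fields: {missing}"
```

Optional fields such as run_filters can simply be omitted if the defaults are acceptable.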
Managing Experiments
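Listing the experiments for a runnable can be sketched with an in-memory stand-in for the real client; the class and method names below are assumptions, not the documented API:

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    id: str
    runnable_id: str

@dataclass
class ExperimentClient:
    """In-memory stand-in for the real experiments API client."""
    experiments: list = field(default_factory=list)

    def list_experiments(self, runnable_id: str) -> list:
        # Filter experiments down to those created for the given runnable.
        return [e for e in self.experiments if e.runnable_id == runnable_id]

client = ExperimentClient(experiments=[
    Experiment(id="exp-1", runnable_id="runnable-123"),
    Experiment(id="exp-2", runnable_id="runnable-456"),
    Experiment(id="exp-3", runnable_id="runnable-123"),
])

mine = client.list_experiments("runnable-123")
print([e.id for e in mine])  # ['exp-1', 'exp-3']
```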
You can list the experiments associated with a runnable to review and compare them.

Experiment Results
Results are available through the run evaluations associated with each experiment run. You can analyze these to compare performance across different configurations.

Best Practices
- Evaluator Selection: Choose evaluators that measure relevant aspects of performance for your use case.
- Timeout Configuration: Set appropriate timeouts based on expected processing time.
- Run Filters: Use filters to focus evaluation on specific types of inputs or scenarios.
- Resource Management: Monitor resource usage when running large experiments.

