pydantic_evals.lifecycle
Case lifecycle hooks for pydantic evals.
This module provides the CaseLifecycle class, which allows defining setup, context preparation, and teardown hooks that run at different stages of case evaluation.
CaseLifecycle
Bases: Generic[InputsT, OutputT, MetadataT]
Per-case lifecycle hooks for evaluation.
A new instance is created for each case during evaluation. Subclass and override any methods you need — all methods are no-ops by default.
The evaluation flow for each case is:
1. setup(): called before task execution
2. The task runs
3. prepare_context(): called after the task, before evaluators; can enrich metrics/attributes
4. Evaluators run
5. teardown(): called after evaluators complete; receives the full result
Exceptions raised by setup() or prepare_context() are caught and recorded as
a ReportCaseFailure; teardown() is still called afterward so you can clean up.
Exceptions raised by teardown() propagate to the caller and may abort the evaluation.
If your teardown may raise and you don't want it to crash the evaluation run,
handle exceptions within your teardown() implementation itself.
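The ordering and exception contract above can be sketched as a plain-Python runner loop. This is an illustrative stand-in, not the real runner: `run_case`, `Recorder`, and the dict results are hypothetical substitutes for the library's internals.

```python
import asyncio


async def upper_task(inputs):
    # Stand-in task; the real runner builds an EvaluatorContext from the run.
    return inputs.upper()


async def failing_task(inputs):
    raise RuntimeError('boom')


class Recorder:
    """Stand-in lifecycle: records which hooks fire, and in what order."""

    def __init__(self):
        self.calls = []

    async def setup(self):
        self.calls.append('setup')

    async def prepare_context(self, ctx):
        self.calls.append('prepare_context')
        return ctx

    async def teardown(self, result):
        self.calls.append('teardown')


async def run_case(lifecycle, task, inputs):
    # Hypothetical sketch of the documented per-case flow.
    try:
        await lifecycle.setup()                     # 1. before task execution
        ctx = await task(inputs)                    # 2. task runs
        ctx = await lifecycle.prepare_context(ctx)  # 3. after task, before evaluators
        result = {'ok': True, 'output': ctx}        # 4. evaluators would run here
    except Exception as exc:
        result = {'ok': False, 'error': exc}        # recorded as a failure, not re-raised
    # 5. teardown always runs and receives the full result;
    #    exceptions raised *here* propagate to the caller.
    await lifecycle.teardown(result)
    return result


ok = Recorder()
success = asyncio.run(run_case(ok, upper_task, 'hi'))

bad = Recorder()
failure = asyncio.run(run_case(bad, failing_task, 'hi'))
```

Note that on the failure path prepare_context is skipped (the task never produced a context), but teardown still fires.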
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| case | Case[InputsT, OutputT, MetadataT] | The case being evaluated. Available as self.case. | required |
Example

```python
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators.context import EvaluatorContext
from pydantic_evals.lifecycle import CaseLifecycle


class EnrichMetrics(CaseLifecycle):
    async def prepare_context(self, ctx: EvaluatorContext) -> EvaluatorContext:
        ctx.metrics['custom_metric'] = 42
        return ctx


dataset = Dataset(cases=[Case(name='test', inputs='hello')])
report = dataset.evaluate_sync(lambda inputs: inputs.upper(), lifecycle=EnrichMetrics)
print(report.cases[0].metrics['custom_metric'])
#> 42
```
Source code in pydantic_evals/pydantic_evals/lifecycle.py
setup (async)

```python
setup() -> None
```
Called before task execution.
Override to perform per-case resource setup (e.g., create a test database,
start a service). The case metadata is available via self.case.metadata.
Source code in pydantic_evals/pydantic_evals/lifecycle.py
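As an illustration, a setup that provisions a per-case scratch directory might look like the sketch below. It is library-free: StubLifecycle is a hypothetical stand-in for CaseLifecycle so the example runs on its own, and TempDirLifecycle is an assumed name, not part of the library.

```python
import asyncio
import os
import shutil
import tempfile


class StubLifecycle:
    """Stand-in for CaseLifecycle: all hooks are no-ops by default."""

    async def setup(self) -> None: ...

    async def prepare_context(self, ctx):
        return ctx

    async def teardown(self, result) -> None: ...


class TempDirLifecycle(StubLifecycle):
    async def setup(self) -> None:
        # Provision per-case resources before the task runs.
        self.workdir = tempfile.mkdtemp(prefix='case-')

    async def teardown(self, result) -> None:
        # Clean up regardless of success or failure.
        shutil.rmtree(self.workdir, ignore_errors=True)


lc = TempDirLifecycle()
asyncio.run(lc.setup())
existed = os.path.isdir(lc.workdir)
asyncio.run(lc.teardown(None))
gone = not os.path.isdir(lc.workdir)
```

Because a new lifecycle instance is created per case, storing the resource on self keeps cases isolated from one another.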
prepare_context (async)

```python
prepare_context(
    ctx: EvaluatorContext[InputsT, OutputT, MetadataT],
) -> EvaluatorContext[InputsT, OutputT, MetadataT]
```
Called after the task completes, before evaluators run.
Override to enrich the evaluator context with additional metrics or attributes derived from the task output, span tree, or external state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| ctx | EvaluatorContext[InputsT, OutputT, MetadataT] | The evaluator context produced by the task run. | required |
Returns:
| Type | Description |
|---|---|
| EvaluatorContext[InputsT, OutputT, MetadataT] | The (possibly modified) evaluator context to pass to evaluators. |
Source code in pydantic_evals/pydantic_evals/lifecycle.py
teardown (async)

```python
teardown(
    result: (
        ReportCase[InputsT, OutputT, MetadataT]
        | ReportCaseFailure[InputsT, OutputT, MetadataT]
    ),
) -> None
```
Called after evaluators complete.
Override to perform per-case resource cleanup. The result is provided so that teardown logic can vary based on success/failure (e.g., keep resources up for inspection on failure).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | ReportCase[InputsT, OutputT, MetadataT] \| ReportCaseFailure[InputsT, OutputT, MetadataT] | The evaluation result: a ReportCase on success or a ReportCaseFailure on failure. | required |
Source code in pydantic_evals/pydantic_evals/lifecycle.py
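One pattern the result parameter enables is keeping resources alive for inspection when a case fails. A minimal, library-free sketch of that branching, where the stub classes are hypothetical stand-ins for ReportCase and ReportCaseFailure:

```python
import asyncio


class ReportCaseStub:
    """Stand-in for ReportCase (a successful case result)."""


class ReportCaseFailureStub:
    """Stand-in for ReportCaseFailure (a failed case result)."""


class KeepOnFailure:
    def __init__(self):
        self.cleaned_up = None

    async def teardown(self, result) -> None:
        if isinstance(result, ReportCaseFailureStub):
            # Leave resources up so the failure can be inspected later.
            self.cleaned_up = False
        else:
            # Success: tear everything down as usual.
            self.cleaned_up = True


on_success = KeepOnFailure()
asyncio.run(on_success.teardown(ReportCaseStub()))

on_failure = KeepOnFailure()
asyncio.run(on_failure.teardown(ReportCaseFailureStub()))
```

With the real types, an `isinstance(result, ReportCaseFailure)` check would play the same role as the stub check above.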