Dataset Serialization

Learn how to save and load datasets in different formats, with support for custom evaluators and IDE integration.

Overview

Pydantic Evals supports serializing datasets to files in two formats:

  • YAML (.yaml, .yml) - Human-readable, great for version control
  • JSON (.json) - Structured, machine-readable

Both formats support:

  • Automatic JSON schema generation for IDE autocomplete and validation
  • Custom evaluator serialization/deserialization
  • Type-safe loading with generic parameters

YAML Format

YAML is the recommended format for most use cases due to its readability and compact syntax.

Basic Example

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected, IsInstance

# Create a dataset with typed parameters
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[
        Case(
            name='test_1',
            inputs='hello',
            expected_output='HELLO',
        ),
    ],
    evaluators=[
        IsInstance(type_name='str'),
        EqualsExpected(),
    ],
)

# Save to YAML
dataset.to_file('my_tests.yaml')

This creates two files:

  1. my_tests.yaml - The dataset
  2. my_tests_schema.json - JSON schema for IDE support
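
Both files should exist after the call; a quick sanity check (a minimal sketch using pathlib, with the filenames from the example above):

from pathlib import Path

# Confirm both the dataset and its generated schema were written
assert Path('my_tests.yaml').exists()
assert Path('my_tests_schema.json').exists()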

YAML Output

# yaml-language-server: $schema=my_tests_schema.json
name: my_tests
cases:
- name: test_1
  inputs: hello
  expected_output: HELLO
evaluators:
- IsInstance: str
- EqualsExpected

JSON Schema for IDEs

The first line references the schema file:

# yaml-language-server: $schema=my_tests_schema.json

This enables:

  • ✅ Autocomplete in VS Code, PyCharm, and other editors
  • ✅ Inline validation while editing
  • ✅ Documentation tooltips for fields
  • ✅ Error highlighting for invalid data

Editor Support

The yaml-language-server comment is supported by:

  • VS Code (with YAML extension)
  • JetBrains IDEs (PyCharm, IntelliJ, etc.)
  • Most editors with YAML language server support

See the YAML Language Server docs for more details.

Loading from YAML

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected, IsInstance

# First create and save the dataset
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[Case(name='test_1', inputs='hello', expected_output='HELLO')],
    evaluators=[IsInstance(type_name='str'), EqualsExpected()],
)
dataset.to_file('my_tests.yaml')

# Load the dataset with type parameters
dataset = Dataset[str, str, Any].from_file('my_tests.yaml')


def my_task(text: str) -> str:
    return text.upper()


# Run evaluation
report = dataset.evaluate_sync(my_task)

JSON Format

JSON format is useful for programmatic generation or when strict structure is required.

Basic Example

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[
        Case(name='test_1', inputs='hello', expected_output='HELLO'),
    ],
    evaluators=[EqualsExpected()],
)

# Save to JSON
dataset.to_file('my_tests.json')

JSON Output

{
  "$schema": "my_tests_schema.json",
  "name": "my_tests",
  "cases": [
    {
      "name": "test_1",
      "inputs": "hello",
      "expected_output": "HELLO"
    }
  ],
  "evaluators": [
    "EqualsExpected"
  ]
}

The $schema key at the top enables IDE support similar to YAML.

Loading from JSON

from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import EqualsExpected

# First create and save the dataset
dataset = Dataset[str, str, Any](
    name='my_tests',
    cases=[Case(name='test_1', inputs='hello', expected_output='HELLO')],
    evaluators=[EqualsExpected()],
)
dataset.to_file('my_tests.json')

# Load from JSON
dataset = Dataset[str, str, Any].from_file('my_tests.json')

Schema Generation

Automatic Schema Creation

By default, to_file() creates a JSON schema file alongside your dataset:

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Creates both my_tests.yaml AND my_tests_schema.json
dataset.to_file('my_tests.yaml')

Custom Schema Location

from pathlib import Path
from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Create directories
Path('data').mkdir(exist_ok=True)

# Custom schema filename (relative to dataset file location)
dataset.to_file(
    'data/my_tests.yaml',
    schema_path='my_schema.json',
)

# No schema file
dataset.to_file('my_tests.yaml', schema_path=None)

Schema Path Templates

Use {stem} to reference the dataset filename:

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Creates: my_tests.yaml and my_tests.schema.json
dataset.to_file(
    'my_tests.yaml',
    schema_path='{stem}.schema.json',
)

Manual Schema Generation

Generate a schema without saving the dataset:

import json
from typing import Any

from pydantic_evals import Dataset

# Get schema as dictionary for a specific dataset type
schema = Dataset[str, str, Any].model_json_schema_with_evaluators()

# Save manually
with open('custom_schema.json', 'w') as f:
    json.dump(schema, f, indent=2)

Custom Evaluators

Custom evaluators require special handling during serialization and deserialization.

Requirements

Custom evaluators must:

  1. Be decorated with @dataclass
  2. Inherit from Evaluator
  3. Be passed to both to_file() and from_file()

Complete Example

from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomThreshold(Evaluator):
    """Check if output length exceeds a threshold."""

    min_length: int
    max_length: int = 100

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        length = len(str(ctx.output))
        return self.min_length <= length <= self.max_length


# Create dataset with custom evaluator
dataset = Dataset[str, str, Any](
    cases=[
        Case(
            name='test_length',
            inputs='example',
            expected_output='long result',
            evaluators=[
                CustomThreshold(min_length=5, max_length=20),
            ],
        ),
    ],
)

# Save with custom evaluator types
dataset.to_file(
    'dataset.yaml',
    custom_evaluator_types=[CustomThreshold],
)

Saved YAML

# yaml-language-server: $schema=dataset_schema.json
cases:
- name: test_length
  inputs: example
  expected_output: long result
  evaluators:
  - CustomThreshold:
      min_length: 5
      max_length: 20

Loading with Custom Evaluators

from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomThreshold(Evaluator):
    """Check if output length exceeds a threshold."""

    min_length: int
    max_length: int = 100

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        length = len(str(ctx.output))
        return self.min_length <= length <= self.max_length


# First create and save the dataset
dataset = Dataset[str, str, Any](
    cases=[
        Case(
            name='test_length',
            inputs='example',
            expected_output='long result',
            evaluators=[CustomThreshold(min_length=5, max_length=20)],
        ),
    ],
)
dataset.to_file('dataset.yaml', custom_evaluator_types=[CustomThreshold])

# Load with custom evaluator registry
dataset = Dataset[str, str, Any].from_file(
    'dataset.yaml',
    custom_evaluator_types=[CustomThreshold],
)

Important

You must pass custom_evaluator_types to both to_file() and from_file().

  • to_file(): Includes the evaluator in the JSON schema
  • from_file(): Registers the evaluator for deserialization

Evaluator Serialization Formats

Evaluators can be serialized in three forms:

1. Name Only (No Parameters)

evaluators:
- EqualsExpected

2. Single Parameter (Short Form)

evaluators:
- IsInstance: str
- Contains: "required text"
- MaxDuration: 2.0

3. Multiple Parameters (Dict Form)

evaluators:
- CustomThreshold:
    min_length: 5
    max_length: 20
- LLMJudge:
    rubric: "Response is accurate"
    model: "openai:gpt-4o"
    include_input: true
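
These YAML forms correspond to ordinary constructor calls in Python. The sketch below is a rough guide: the IsInstance, EqualsExpected, and LLMJudge parameters come from the examples above, while the Contains and MaxDuration field names are assumptions based on the dict form, so check them against your installed version.

from pydantic_evals.evaluators import (
    Contains,
    EqualsExpected,
    IsInstance,
    LLMJudge,
    MaxDuration,
)

# Rough Python equivalents of the YAML forms above
evaluators = [
    EqualsExpected(),                 # name only
    IsInstance(type_name='str'),      # single parameter (short form)
    Contains(value='required text'),  # single parameter (short form; field name assumed)
    MaxDuration(seconds=2.0),         # single parameter (short form; field name assumed)
    LLMJudge(                         # multiple parameters (dict form)
        rubric='Response is accurate',
        model='openai:gpt-4o',
        include_input=True,
    ),
]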

Format Comparison

Feature            YAML              JSON
Human readable     ✅ Excellent      ⚠️ Good
Comments           ✅ Yes            ❌ No
Compact            ✅ Yes            ⚠️ Verbose
Machine parsing    ✅ Good           ✅ Excellent
IDE support        ✅ Yes            ✅ Yes
Version control    ✅ Clean diffs    ⚠️ Noisy diffs

Recommendation: Use YAML for most cases, JSON for programmatic generation.
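
In practice the choice is just the file extension you pass to to_file(), since the format is inferred from it. A minimal sketch saving the same dataset both ways (the filenames are illustrative):

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test', expected_output='TEST')])

# The format is inferred from the extension
dataset.to_file('review_me.yaml')  # human-friendly: comments, clean diffs, easy review
dataset.to_file('pipeline.json')   # strict structure for programmatic generation and parsing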

Advanced: Evaluator Serialization Name

Customize how your evaluator appears in serialized files:

from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class VeryLongDescriptiveEvaluatorName(Evaluator):
    @classmethod
    def get_serialization_name(cls) -> str:
        return 'ShortName'

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True

In YAML:

evaluators:
- ShortName  # Instead of VeryLongDescriptiveEvaluatorName
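
Loading works the same as for any custom evaluator: register the class via custom_evaluator_types and the serialized name is resolved back to it. A minimal sketch, assuming the dataset above was saved as dataset.yaml and using the class defined in the previous example:

from typing import Any

from pydantic_evals import Dataset

# 'ShortName' in the file is mapped back to VeryLongDescriptiveEvaluatorName
dataset = Dataset[str, str, Any].from_file(
    'dataset.yaml',
    custom_evaluator_types=[VeryLongDescriptiveEvaluatorName],
)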

Troubleshooting

Schema Not Found in IDE

Problem: YAML file doesn't show autocomplete

Solutions:

  1. Check the schema path in the first line of YAML:

    # yaml-language-server: $schema=correct_schema_name.json
    

  2. Verify schema file exists in the same directory

  3. Restart the language server in your IDE

  4. Install YAML extension (VS Code: "YAML" by Red Hat)
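
If the schema file itself is missing (deleted, renamed, or not committed), re-saving the dataset regenerates it. A minimal sketch using the filenames from the examples above:

from typing import Any

from pydantic_evals import Dataset

# Re-save to regenerate my_tests_schema.json next to the dataset
dataset = Dataset[str, str, Any].from_file('my_tests.yaml')
dataset.to_file('my_tests.yaml', schema_path='my_tests_schema.json')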

Custom Evaluator Not Found

Problem: ValueError: Unknown evaluator name: 'CustomEvaluator'

Solution: Pass custom_evaluator_types when loading:

from dataclasses import dataclass
from typing import Any

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext


@dataclass
class CustomEvaluator(Evaluator):
    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True


# First create and save with custom evaluator
dataset = Dataset[str, str, Any](
    cases=[Case(inputs='test', evaluators=[CustomEvaluator()])],
)
dataset.to_file('tests.yaml', custom_evaluator_types=[CustomEvaluator])

# Load with custom evaluator types
dataset = Dataset[str, str, Any].from_file(
    'tests.yaml',
    custom_evaluator_types=[CustomEvaluator],  # Required!
)

Format Inference Failed

Problem: ValueError: Cannot infer format from extension

Solution: Specify format explicitly:

from typing import Any

from pydantic_evals import Case, Dataset

dataset = Dataset[str, str, Any](cases=[Case(inputs='test')])

# Explicit format for unusual extensions
dataset.to_file('data.txt', fmt='yaml')
dataset_loaded = Dataset[str, str, Any].from_file('data.txt', fmt='yaml')

Schema Generation Error

Problem: Custom evaluator causes schema generation to fail

Solution: Ensure evaluator is a proper dataclass:

from dataclasses import dataclass

from pydantic_evals.evaluators import Evaluator, EvaluatorContext


# ✅ Correct
@dataclass
class MyEvaluator(Evaluator):
    value: int

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True


# ❌ Wrong: Missing @dataclass
class BadEvaluator(Evaluator):
    def __init__(self, value: int):
        self.value = value

    def evaluate(self, ctx: EvaluatorContext) -> bool:
        return True

Next Steps