🐣 Building Your First Schema

In this tutorial, you'll learn the fundamentals of Flycatcher by building a complete schema for a simple blog system. By the end, you'll understand how to:

Define a schema with multiple field types
Add constraints to validate data
Generate Pydantic models for API validation
Generate Polars validators for bulk operations
Generate SQLAlchemy tables for database access

Time to complete: ~15 minutes

🔨 What We're Building

We'll create a schema for blog posts with the following requirements:

Unique post IDs
Titles between 5 and 200 characters
Content with a minimum length of 100 characters
Optional tags
Author emails (validated format)
View counts (non-negative)
Publication timestamps

This is a realistic example that demonstrates most of Flycatcher's core features.

📦 Prerequisites

Make sure you have Flycatcher installed:

pip install flycatcher
# or
uv add flycatcher

1️⃣ Step 1: Create Your First Schema

Let's start with a minimal schema. Create a new Python file called blog.py:

blog.py

from flycatcher import Schema, Field

class PostSchema(Schema):
    id: int = Field(primary_key=True)
    title: str
    content: str

That's it! You've defined your first schema with three fields.

Understanding the Code

Schema - Base class for all schemas. Uses metaclass magic to collect fields.
id: int = Field(primary_key=True) - Integer column marked as the primary key (useful for databases).
title: str / content: str - Simple string fields with no constraints.

2️⃣ Step 2: Add Field Constraints

Let's make our schema more robust by adding validation constraints:

from datetime import datetime
from flycatcher import Schema, Field

class PostSchema(Schema):
    id: int = Field(primary_key=True)
    title: str = Field(min_length=5, max_length=200)
    content: str = Field(min_length=100)
    author_email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    view_count: int = Field(ge=0, default=0)
    published_at: datetime
    tags: str | None = Field(default=None)

What Changed?

min_length / max_length - Validates string length (titles must be 5-200 chars)
pattern - Regex validation for email format
ge=0 - "Greater than or equal" constraint (view count can't be negative)
default=0 - Default value when none is provided
str | None with default=None - Allows null/None values (tags are optional)
index=True - Creates a database index on this field (for faster queries)
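To make the constraint semantics concrete, here is a plain-Python sketch of what min_length, max_length, pattern, and ge check for a single value. The check_value helper is hypothetical; Flycatcher translates these constraints into Pydantic and Polars checks rather than running code like this.

```python
import re
from typing import Any

EMAIL_PATTERN = r"^[^@]+@[^@]+\.[^@]+$"  # the author_email regex from the schema


def check_value(value: Any, **constraints: Any) -> list[str]:
    """Return the violated constraints for one value (an empty list means valid)."""
    errors = []
    if "min_length" in constraints and len(value) < constraints["min_length"]:
        errors.append(f"shorter than {constraints['min_length']} characters")
    if "max_length" in constraints and len(value) > constraints["max_length"]:
        errors.append(f"longer than {constraints['max_length']} characters")
    if "pattern" in constraints and re.search(constraints["pattern"], value) is None:
        errors.append("does not match pattern")
    if "ge" in constraints and value < constraints["ge"]:
        errors.append(f"less than {constraints['ge']}")
    return errors


print(check_value("Hi", min_length=5, max_length=200))         # ['shorter than 5 characters']
print(check_value("alice@example.com", pattern=EMAIL_PATTERN))  # []
print(check_value("not-an-email", pattern=EMAIL_PATTERN))       # ['does not match pattern']
print(check_value(-5, ge=0))                                    # ['less than 0']
```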

Field Constraints

All constraints work across both Pydantic and Polars! Define once, validate everywhere.

3️⃣ Step 3: Generate a Pydantic Model

Now let's use our schema to validate individual blog posts (perfect for APIs):

from datetime import datetime

# Generate Pydantic model
Post = PostSchema.to_pydantic()

# Create and validate a post
post = Post(
    id=1,
    title="My First Blog Post",
    content="This is the content of my blog post. " * 10,  # Make it long enough
    author_email="alice@example.com",
    view_count=42,
    published_at=datetime(2024, 1, 15, 10, 30),
    tags="python, tutorial"
)

# Access fields
print(f"Post ID: {post.id}")
print(f"Title: {post.title}")

# Serialize to JSON
print(post.model_dump_json(indent=2))

What Happens Here?

to_pydantic() generates a Pydantic BaseModel from your schema
All field constraints are translated to Pydantic validators
You get full Pydantic functionality: validation, serialization, type hints
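The general idea of generating a validating class at runtime can be shown with the standard library alone. This sketch swaps in dataclasses.make_dataclass for Pydantic; SPECS and build_model are hypothetical, and the code illustrates the technique, not what to_pydantic() does internally.

```python
from dataclasses import MISSING, field, make_dataclass
from typing import Any

# Hypothetical field specs: (name, type, constraints, default); MISSING = required.
SPECS = [
    ("id", int, {}, MISSING),
    ("title", str, {"min_length": 5}, MISSING),
    ("view_count", int, {"ge": 0}, 0),
]


def build_model(name: str, specs: list[tuple[str, type, dict[str, Any], Any]]) -> type:
    """Create a class at runtime whose __post_init__ enforces the constraints."""

    def __post_init__(self) -> None:
        problems = []
        for field_name, _type, constraints, _default in specs:
            value = getattr(self, field_name)
            if "min_length" in constraints and len(value) < constraints["min_length"]:
                problems.append(f"{field_name}: too short")
            if "ge" in constraints and value < constraints["ge"]:
                problems.append(f"{field_name}: must be >= {constraints['ge']}")
        if problems:
            # Report every failing field at once instead of stopping at the first.
            raise ValueError("; ".join(problems))

    fields = [(n, t, field(default=d)) for n, t, _c, d in specs]
    return make_dataclass(name, fields, namespace={"__post_init__": __post_init__})


Post = build_model("Post", SPECS)
post = Post(id=1, title="My First Blog Post")
print(post)  # Post(id=1, title='My First Blog Post', view_count=0)

error_message = ""
try:
    Post(id=2, title="Hi", view_count=-5)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # title: too short; view_count: must be >= 0
```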

Try Breaking It!

What happens if you violate a constraint?

# This will raise a ValidationError
invalid_post = Post(
    id=1,
    title="Hi",  # Too short! (min_length=5)
    content="Short",  # Too short! (min_length=100)
    author_email="not-an-email",  # Invalid format
    view_count=-5,  # Negative! (ge=0)
    published_at=datetime.now(),
)

Running the code above fails with an error like:

pydantic_core._pydantic_core.ValidationError: 4 validation errors for Post
title
  String should have at least 5 characters [type=string_too_short, input_value='Hi', input_type=str]
content
  String should have at least 100 characters [type=string_too_short, input_value='Short', input_type=str]
author_email
  String should match pattern '^[^@]+@[^@]+\.[^@]+$' [type=string_pattern_mismatch, input_value='not-an-email', input_type=str]
view_count
  Input should be greater than or equal to 0 [type=greater_than_equal, input_value=-5, input_type=int]

You'll get a detailed list of exactly which fields failed and why!

4️⃣ Step 4: Generate a Polars Validator

For bulk operations (like validating 10,000 blog posts from a CSV), use the Polars validator:

import polars as pl

# Generate Polars validator
PostValidator = PostSchema.to_polars_validator()

# Create sample data (imagine this came from a CSV)
df = pl.DataFrame({
    "id": [1, 2, 3],
    "title": ["First Post", "Second Post", "Another Great Article"],
    "content": ["This is long enough content for validation. " * 10] * 3,
    "author_email": ["alice@example.com", "bob@example.com", "charlie@example.com"],
    "view_count": [10, 25, 100],
    "published_at": [datetime(2024, 1, i) for i in range(1, 4)],
    "tags": ["python", "rust", None]  # Third post has no tags
})

# Validate the entire DataFrame
validated_df = PostValidator.validate(df, strict=True)

print(f"✓ Validated {len(validated_df)} posts")
print(validated_df)

What's Happening?

to_polars_validator() generates a validator optimized for DataFrames
validate() checks all constraints in bulk (much faster than row-by-row!)
strict=True raises an error if any row fails validation
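Bulk validation means evaluating each rule over a whole column and combining the results into one pass/fail flag per row. This stdlib sketch mimics that on a column-oriented dict of lists; the rules table and row_mask helper are hypothetical, and a Polars-based validator would do this work with vectorized column expressions rather than Python loops.

```python
import re

# Column-oriented data, like a DataFrame: one list per column.
columns = {
    "title": ["First Post", "Hi", "Another Great Article"],
    "author_email": ["alice@example.com", "bob@example.com", "not-an-email"],
    "view_count": [10, 25, -1],
}

EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")

# Hypothetical rule table: one predicate per constrained column.
rules = {
    "title": lambda v: 5 <= len(v) <= 200,
    "author_email": lambda v: EMAIL.search(v) is not None,
    "view_count": lambda v: v >= 0,
}


def row_mask(columns: dict[str, list], rules: dict) -> list[bool]:
    """Evaluate each rule over its whole column, then AND the per-column results."""
    n_rows = len(next(iter(columns.values())))
    mask = [True] * n_rows
    for name, predicate in rules.items():
        column_ok = [predicate(v) for v in columns[name]]  # one column at a time
        mask = [a and b for a, b in zip(mask, column_ok)]
    return mask


print(row_mask(columns, rules))  # [True, False, False]
```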

Validation Modes

# Strict mode (default): Raise on errors
validated_df = PostValidator.validate(df, strict=True)

# Non-strict mode: Filter out invalid rows
valid_df = PostValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = PostValidator.validate(df, strict=True, show_violations=True)
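The two modes differ only in what happens to rows that fail: strict mode raises, non-strict mode drops them. This sketch assumes a precomputed per-row pass/fail mask; apply_mode is a hypothetical helper that mirrors the behavior described above, not Flycatcher's implementation.

```python
def apply_mode(rows: list[dict], mask: list[bool], strict: bool = True) -> list[dict]:
    """Raise if any row failed (strict) or keep only passing rows (non-strict)."""
    failing = [i for i, ok in enumerate(mask) if not ok]
    if strict and failing:
        raise ValueError(f"{len(failing)} row(s) failed validation at indices {failing}")
    return [row for row, ok in zip(rows, mask) if ok]


rows = [
    {"id": 1, "view_count": 10},
    {"id": 2, "view_count": -5},
    {"id": 3, "view_count": 0},
]
mask = [row["view_count"] >= 0 for row in rows]  # the ge=0 constraint

kept = apply_mode(rows, mask, strict=False)
print([row["id"] for row in kept])  # [1, 3]

error_message = ""
try:
    apply_mode(rows, mask, strict=True)
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # 1 row(s) failed validation at indices [1]
```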

Performance

Because the Polars validator checks constraints over whole columns in bulk instead of building one Pydantic object per row, it is typically much faster than row-by-row Pydantic validation. Use it for large datasets!

5️⃣ Step 5: Generate a SQLAlchemy Table

Finally, let's create a database table for storing our posts:

from sqlalchemy import create_engine, MetaData

# Generate SQLAlchemy table
metadata = MetaData()
PostTable = PostSchema.to_sqlalchemy(table_name="posts", metadata=metadata)

# Create an in-memory SQLite database
engine = create_engine("sqlite:///:memory:", echo=True)

# Create the table
metadata.create_all(engine)

# Insert data
with engine.connect() as conn:
    # Insert from our validated DataFrame
    conn.execute(PostTable.insert(), validated_df.to_dicts())
    conn.commit()

    # Query posts by author
    result = conn.execute(
        PostTable.select()
        .where(PostTable.c.author_email == "alice@example.com")
    )

    for row in result:
        print(f"Found post: {row.title}")

What's Happening?

to_sqlalchemy() generates a SQLAlchemy Table object
All relevant field attributes (primary keys, indexes, types) are translated
You get type-safe database operations with SQLAlchemy Core
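Passing a list of dictionaries to conn.execute(PostTable.insert(), ...) makes SQLAlchemy run the INSERT once per dictionary (an "executemany"). The same pattern with the standard library's sqlite3 module looks like this; the hand-written, simplified posts table is an assumption for illustration, not the DDL Flycatcher generates.

```python
import sqlite3

# Rows shaped like validated_df.to_dicts(): one dict per row.
rows = [
    {"id": 1, "title": "First Post", "author_email": "alice@example.com"},
    {"id": 2, "title": "Second Post", "author_email": "bob@example.com"},
    {"id": 3, "title": "Another Great Article", "author_email": "alice@example.com"},
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL, author_email TEXT NOT NULL)"
)
# executemany runs the INSERT once per dict, filling the :named placeholders.
conn.executemany(
    "INSERT INTO posts (id, title, author_email) VALUES (:id, :title, :author_email)",
    rows,
)
conn.commit()

cursor = conn.execute(
    "SELECT title FROM posts WHERE author_email = ? ORDER BY id", ("alice@example.com",)
)
titles = [title for (title,) in cursor]
print(titles)  # ['First Post', 'Another Great Article']
conn.close()
```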

Field Attributes in SQL

The schema's field attributes map to SQL features:

primary_key=True → PRIMARY KEY
unique=True → UNIQUE constraint
index=True → Creates an index
nullable=True → Allows NULL values
default=X → DEFAULT X
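The sketch below writes that mapping out as raw SQL and runs it on an in-memory SQLite database with the standard sqlite3 module. It is an illustration, not the DDL that to_sqlalchemy() emits: the slug column (unique=True) is hypothetical, and in SQLAlchemy a plain Column default is applied by Python at insert time, so whether DEFAULT appears in the generated DDL depends on whether a server_default is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (
        id           INTEGER PRIMARY KEY,         -- primary_key=True
        slug         TEXT NOT NULL UNIQUE,        -- unique=True (hypothetical column)
        author_email TEXT NOT NULL,
        view_count   INTEGER NOT NULL DEFAULT 0,  -- default=0
        tags         TEXT                         -- nullable: NULL allowed
    );
    CREATE INDEX ix_posts_author_email ON posts (author_email);  -- index=True
    """
)

# Omit view_count and tags: the DEFAULT and NULL rules fill them in.
conn.execute(
    "INSERT INTO posts (id, slug, author_email) VALUES (1, 'first-post', 'alice@example.com')"
)
row = conn.execute("SELECT view_count, tags FROM posts WHERE id = 1").fetchone()
print(row)  # (0, None)

# A second row with the same slug violates the UNIQUE constraint.
error_message = ""
try:
    conn.execute(
        "INSERT INTO posts (id, slug, author_email) VALUES (2, 'first-post', 'bob@example.com')"
    )
except sqlite3.IntegrityError as exc:
    error_message = str(exc)
print(error_message)  # UNIQUE constraint failed: posts.slug
conn.close()
```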

6️⃣ Step 6: Putting It All Together

Here's a complete example showing all three outputs in action:

from datetime import datetime
import polars as pl
from sqlalchemy import create_engine, MetaData
from flycatcher import Schema, Field

# 1. Define schema once
class PostSchema(Schema):
    id: int = Field(primary_key=True)
    title: str = Field(min_length=5, max_length=200)
    content: str = Field(min_length=100)
    author_email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    view_count: int = Field(ge=0, default=0)
    published_at: datetime
    tags: str | None = Field(default=None)

# 2. Validate single record with Pydantic
Post = PostSchema.to_pydantic()
post = Post(
    id=1,
    title="Understanding Flycatcher",
    content="Flycatcher makes schema management easy. " * 10,
    author_email="author@example.com",
    published_at=datetime.now()
)
print(f"✓ Validated single post: {post.title}")

# 3. Validate bulk data with Polars
PostValidator = PostSchema.to_polars_validator()
df = pl.read_csv("posts.csv", try_parse_dates=True)  # Imagine you have this file; parse dates so published_at is a datetime
validated_df = PostValidator.validate(df, strict=True)
print(f"✓ Validated {len(validated_df)} posts from CSV")

# 4. Store in database with SQLAlchemy
metadata = MetaData()
PostTable = PostSchema.to_sqlalchemy(table_name="posts", metadata=metadata)
engine = create_engine("sqlite:///blog.db")
metadata.create_all(engine)

with engine.connect() as conn:
    conn.execute(PostTable.insert(), validated_df.to_dicts())
    conn.commit()
    print("✓ Stored posts in database")

✅ Result: One schema definition → Three optimized outputs → Complete data pipeline!

🐾 Next Steps

Congratulations! 🎉 You've built your first Flycatcher schema and learned how to use all three outputs.

Go Deeper (Coming Soon!)

[Custom Validators] - Add cross-field validation
[Field Types Reference] - Explore all available field types
[Why Flycatcher?] - Understand how it compares to alternatives

Try These Exercises

Add a rating field (rating: int = Field(ge=1, le=5))
Make title unique (unique=True)
Add a slug field (URL-safe version of title, with pattern validation)
Create a CommentSchema that references PostSchema
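For the slug exercise, one possible approach (an assumption, not the only answer) is to derive the slug with a small helper and validate it with a pattern that accepts lowercase words joined by single hyphens. The slugify function and SLUG_PATTERN below are hypothetical and stdlib-only.

```python
import re
import unicodedata

# Lowercase alphanumeric words joined by single hyphens; usable as a pattern constraint.
SLUG_PATTERN = r"^[a-z0-9]+(?:-[a-z0-9]+)*$"


def slugify(title: str) -> str:
    """Strip accents, lowercase, and join the alphanumeric runs with hyphens."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words)


for title in ["My First Blog Post", "Café & Crème: 10 Tips!"]:
    slug = slugify(title)
    print(slug, bool(re.match(SLUG_PATTERN, slug)))
# my-first-blog-post True
# cafe-creme-10-tips True
```

Note that a title with no letters or digits produces an empty slug, which the pattern correctly rejects.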

Get Help

💬 GitHub Discussions
🐛 Report Issues

Full Example Code

Here's the complete working example from this tutorial:

blog.py

"""Complete blog schema example."""
from datetime import datetime
import polars as pl
from sqlalchemy import create_engine, MetaData
from flycatcher import Schema, Field


class PostSchema(Schema):
    """Schema for blog posts."""

    id: int = Field(
        primary_key=True,
        description="Unique identifier for the post",
    )
    title: str = Field(
        min_length=5,
        max_length=200,
        description="Post title",
    )
    content: str = Field(
        min_length=100,
        description="Post content body",
    )
    author_email: str = Field(
        pattern=r'^[^@]+@[^@]+\.[^@]+$',
        index=True,
        description="Author's email address",
    )
    view_count: int = Field(
        ge=0,
        default=0,
        description="Number of views",
    )
    published_at: datetime = Field(
        description="Publication timestamp",
    )
    tags: str | None = Field(
        default=None,
        description="Comma-separated tags",
    )


def main():
    """Demonstrate all three outputs."""

    # 1. Pydantic for single record validation
    Post = PostSchema.to_pydantic()
    post = Post(
        id=1,
        title="My First Post",
        content="This is a great blog post! " * 15,
        author_email="author@example.com",
        published_at=datetime.now(),
        tags="python, tutorial"
    )
    print(f"✓ Created post: {post.title}")

    # 2. Polars for bulk validation
    PostValidator = PostSchema.to_polars_validator()
    df = pl.DataFrame({
        "id": [1, 2, 3],
        "title": ["First Post", "Second Post", "Third Post"],
        "content": ["Long content here! " * 15] * 3,
        "author_email": ["a@ex.com", "b@ex.com", "c@ex.com"],
        "view_count": [10, 25, 50],
        "published_at": [datetime.now()] * 3,
        "tags": ["python", "rust", None]
    })
    validated_df = PostValidator.validate(df, strict=True)
    print(f"✓ Validated {len(validated_df)} posts")

    # 3. SQLAlchemy for database operations
    metadata = MetaData()
    PostTable = PostSchema.to_sqlalchemy(table_name="posts", metadata=metadata)
    engine = create_engine("sqlite:///blog.db")
    metadata.drop_all(engine)  # start from an empty table so reruns don't hit duplicate ids
    metadata.create_all(engine)

    with engine.connect() as conn:
        conn.execute(PostTable.insert(), validated_df.to_dicts())
        conn.commit()
        print("✓ Stored posts in database")

        # Query example
        result = conn.execute(
            PostTable.select().where(PostTable.c.view_count > 20)
        )
        print(f"✓ Found {len(result.fetchall())} posts with >20 views")


if __name__ == "__main__":
    main()

Run it with:

python blog.py

Happy schema building! 🚀