Flycatcher

Define your schema once. Validate at scale. Stay columnar.

Built for DataFrames, powered across Pydantic, Polars, and SQLAlchemy.

CI · codecov · PyPI version · Python 3.12+ · License: MIT · Documentation


Flycatcher is a DataFrame-native schema layer for Python. Define your data model once and generate optimized representations for every part of your stack:

  • 🎯 Pydantic models for API validation & serialization
  • ⚡ Polars validators for blazing-fast bulk validation
  • 🗄️ SQLAlchemy tables for typed database access

Built for modern data workflows: Validate millions of rows at high speed, keep schema drift at zero, and stay columnar end-to-end.

❓ Why Flycatcher?

Modern Python data projects need row-level validation (Pydantic), efficient bulk operations (Polars), and typed database queries (SQLAlchemy). But maintaining separate schemas across this stack leads to duplication, drift, and manual juggling of row-oriented and columnar paradigms.

Flycatcher solves this: One schema definition → three optimized outputs.

from flycatcher import Schema, Field, col, model_validator

class ProductSchema(Schema):
    id: int = Field(primary_key=True)
    name: str = Field(min_length=3, max_length=100)
    price: float = Field(gt=0)
    discount_price: float | None = Field(default=None, gt=0, nullable=True)

    @model_validator
    def check_discount():
        # Cross-field validation with DSL
        return (
            col('discount_price') < col('price'),
            "Discount price must be less than regular price"
        )

# Generate three optimized representations
ProductModel = ProductSchema.to_pydantic()         # → Pydantic BaseModel
ProductValidator = ProductSchema.to_polars_validator() # → Polars DataFrame validator
ProductTable = ProductSchema.to_sqlalchemy()       # → SQLAlchemy Table

Flycatcher lets you stay DataFrame-native without giving up the speed of Polars, the ergonomic validation of Pydantic, or the Pythonic power of SQLAlchemy.


🚀 Quick Start

Installation

pip install flycatcher
# or
uv add flycatcher

Define Your Schema

from datetime import datetime
from flycatcher import Schema, Field

class UserSchema(Schema):
    id: int = Field(primary_key=True)
    username: str = Field(min_length=3, max_length=50, unique=True)
    email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', unique=True, index=True)
    age: int = Field(ge=13, le=120)
    is_active: bool = Field(default=True)
    created_at: datetime

Use Pydantic for Row-Level Validation

Perfect for APIs, forms, and single-record validation:

from datetime import datetime, timezone

User = UserSchema.to_pydantic()

# Validates constraints automatically via Pydantic
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    age=25,
    created_at=datetime.now(timezone.utc)  # utcnow() is deprecated since Python 3.12
)

# Serialize to JSON/dict
print(user.model_dump_json())
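Invalid input fails with Pydantic's standard `ValidationError`. As a rough sketch of what the generated model corresponds to in plain Pydantic (the hand-written class below is an illustration, not Flycatcher's actual output):

```python
from datetime import datetime
from pydantic import BaseModel, Field, ValidationError

# Hand-written approximation of what UserSchema.to_pydantic() might generate
class User(BaseModel):
    id: int
    username: str = Field(min_length=3, max_length=50)
    email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$')
    age: int = Field(ge=13, le=120)
    is_active: bool = True
    created_at: datetime

try:
    # "al" is too short, "bad" fails the pattern, 5 is below ge=13
    User(id=1, username="al", email="bad", age=5, created_at=datetime(2024, 1, 1))
except ValidationError as exc:
    n_errors = exc.error_count()
print(n_errors)  # → 3
```

Pydantic collects all constraint violations in one pass, so a single bad record reports every failing field at once.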

Use Polars for Bulk Validation

Perfect for DataFrame-level validation:

import polars as pl

UserValidator = UserSchema.to_polars_validator()

# Validate 1M+ rows with blazing speed
df = pl.read_csv("users.csv")
validated_df = UserValidator.validate(df, strict=True)

validated_df.write_parquet("validated_users.parquet")

Use SQLAlchemy for Database Operations

Perfect for typed queries and database interactions:

from sqlalchemy import create_engine

UserTable = UserSchema.to_sqlalchemy(table_name="users")

engine = create_engine("postgresql://localhost/mydb")

# Type-safe queries
with engine.connect() as conn:
    result = conn.execute(
        UserTable.select()
        .where(UserTable.c.is_active == True)
        .where(UserTable.c.age >= 18)
    )
    for row in result:
        print(row)
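As a sketch of what the generated table roughly corresponds to in plain SQLAlchemy Core (the column mapping below is an assumption; shown against an in-memory SQLite database for illustration):

```python
from sqlalchemy import (Table, Column, Integer, String, Boolean, DateTime,
                        MetaData, create_engine, insert, select)

# Hand-written approximation of UserSchema.to_sqlalchemy(table_name="users")
metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("username", String(50), unique=True),
    Column("email", String, unique=True, index=True),
    Column("age", Integer),
    Column("is_active", Boolean, default=True),
    Column("created_at", DateTime),
)

engine = create_engine("sqlite:///:memory:")  # throwaway DB for the example
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(insert(users), [{"id": 1, "username": "alice",
                                  "email": "alice@example.com", "age": 25}])
    rows = conn.execute(select(users).where(users.c.age >= 18)).fetchall()
print(len(rows))  # → 1
```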

✨ Key Features

Rich Field Types & Constraints

Use standard Python types with Field(...) constraints:

| Python Type | Constraints | Example |
|---|---|---|
| `int` | `ge`, `gt`, `le`, `lt`, `multiple_of` | `age: int = Field(ge=0, le=120)` |
| `float` | `ge`, `gt`, `le`, `lt` | `price: float = Field(gt=0)` |
| `str` | `min_length`, `max_length`, `pattern` | `email: str = Field(pattern=r'^[^@]+@...')` |
| `bool` | - | `is_active: bool = Field(default=True)` |
| `datetime` | `ge`, `gt`, `le`, `lt` | `created_at: datetime = Field(ge=datetime(2020, 1, 1))` |
| `date` | `ge`, `gt`, `le`, `lt` | `birth_date: date` |

All fields additionally support: nullable, default, description

SQLAlchemy-specific: primary_key, unique, index, autoincrement

Custom & Cross-Field Validation

Use the col() DSL for powerful field-level and cross-field validation that works across both Pydantic and Polars:

from datetime import datetime
from flycatcher import Schema, Field, col, model_validator

class BookingSchema(Schema):
    email: str
    phone: str
    check_in: datetime = Field(ge=datetime(2024, 1, 1))
    check_out: datetime = Field(ge=datetime(2024, 1, 1))
    nights: int = Field(ge=1)

    @model_validator
    def check_dates():
        return (
            col('check_out') > col('check_in'),
            "Check-out must be after check-in"
        )

    @model_validator
    def check_phone_format():
        cleaned = col('phone').str.replace(r'[^\d]', '')
        return (cleaned.str.len_chars() == 10, "Phone must have 10 digits")

    @model_validator
    def check_minimum_stay():
        # For operations not yet in DSL (like .is_in()), use explicit Polars format
        # Note: .dt.month() is available in DSL, but .is_in() is not yet supported
        import polars as pl
        return {
            'polars': (
                (~pl.col('check_in').dt.month().is_in([7, 8])) | (pl.col('nights') >= 3),
                "Minimum stay in July and August is 3 nights"
            ),
            'pydantic': lambda v: (
                v.check_in.month not in [7, 8] or v.nights >= 3,
                "Minimum stay in July and August is 3 nights"
            )
        }

Validation Modes

Polars validation supports flexible error handling:

# Strict mode: Raise on validation errors (default)
validated_df = UserValidator.validate(df, strict=True)

# Non-strict mode: Filter out invalid rows
valid_df = UserValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = UserValidator.validate(df, strict=True, show_violations=True)

🎯 Complete Example: ETL Pipeline

import polars as pl
from datetime import datetime
from flycatcher import Schema, Field, col, model_validator
from sqlalchemy import create_engine, MetaData

# 1. Define schema once
class OrderSchema(Schema):
    order_id: int = Field(primary_key=True)
    customer_email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    amount: float = Field(gt=0)
    tax: float = Field(ge=0)
    total: float = Field(gt=0)
    created_at: datetime

    @model_validator
    def check_total():
        return (
            col('total') == col('amount') + col('tax'),
            "Total must equal amount + tax"
        )

# 2. Extract & Validate with Polars (handles millions of rows)
OrderValidator = OrderSchema.to_polars_validator()
df = pl.read_csv("orders.csv")
validated_df = OrderValidator.validate(df, strict=True)

# 3. Load to database with SQLAlchemy
OrderTable = OrderSchema.to_sqlalchemy(table_name="orders")
engine = create_engine("postgresql://localhost/analytics")

with engine.connect() as conn:
    conn.execute(OrderTable.insert(), validated_df.to_dicts())
    conn.commit()

Result: Validated millions of rows, enforced business rules, and loaded to database — all from one schema definition.


🏗️ Design Philosophy

One schema, three representations. Each optimized for its use case.

        Schema Definition
    ┌──────────┼──────────┐
    ↓          ↓          ↓
Pydantic    Polars    SQLAlchemy
   ↓          ↓          ↓
 APIs       ETL      Database

What Flycatcher Does

✅ Single source of truth for schema definitions
✅ Generate optimized representations for different use cases
✅ Keep runtimes separate (no ORM ↔ DataFrame conversions)
✅ Use stable public APIs (Pydantic, Polars, SQLAlchemy)

What Flycatcher Doesn't Do

❌ Mix row-oriented and columnar paradigms
❌ Create a "unified runtime" (that would be slow)
❌ Reinvent validation logic (delegates to proven libraries when possible)
❌ Depend on internal APIs


⚠️ Current Limitations (v0.1.0)

Flycatcher v0.1.0 is an alpha release. The core functionality is stable, but some advanced features are planned for future versions:

Polars DSL

The col() DSL supports basic operations (>, <, ==, +, etc.), numeric math operations (.abs(), .round(), .floor(), .ceil(), .sqrt(), .pow()), limited string operations (.str.contains(), .str.starts_with(), .str.len_chars(), etc.), and a limited datetime accessor (.dt.year(), .dt.month(), .dt.total_days(other), etc.).

The col() DSL does not support the full range of Polars operations. However, additional operations will be added in future versions to better support the full functionality of Polars.

Workaround: Use the explicit format in @model_validator:

@model_validator
def check():
    return {
        'polars': (pl.col('field').is_null(), "Message"),
        'pydantic': lambda v: (v.field is None, "Message")
    }

Pydantic Features

  • ❌ @field_validator - Only @model_validator is supported (coming in v0.2.0)
  • ❌ Field aliases and computed fields (coming in v0.2.0+)
  • ❌ Custom serialization options (coming in v0.2.0+)

Workaround: Use @model_validator for all validation needs.

SQLAlchemy Features

  • ❌ Foreign key relationships - Must be added manually after table generation (coming in v0.3.0+)
  • ❌ Composite primary keys - Only single-field primary keys supported (coming in v0.3.0+)
  • ❌ Function-based defaults (e.g., default=func.now()) - Only literal defaults supported

Workaround: Add relationships and composite keys manually in SQLAlchemy after table generation.
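Adding a relationship by hand might look like this in plain SQLAlchemy (the two tables below are hypothetical stand-ins for generated ones; normally they would come from `to_sqlalchemy()`):

```python
from sqlalchemy import Table, Column, Integer, ForeignKey, MetaData

metadata = MetaData()
# Stand-ins for generated tables (hypothetical)
users = Table("users", metadata, Column("id", Integer, primary_key=True))
orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    # The relationship is declared by hand until native foreign-key support lands
    Column("user_id", Integer, ForeignKey("users.id")),
)

fk_targets = [fk.column.table.name for fk in orders.c.user_id.foreign_keys]
print(fk_targets)  # → ['users']
```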

Field Types

  • ❌ Enum, UUID, JSON, Array field types (coming in v0.3.0+)
  • ❌ Numeric/Decimal field type (coming in v0.3.0+)

Workaround: Use String with pattern validation or manual handling.
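For example, a UUID can be carried as a validated string until native UUID support lands. A sketch of the pattern-validation workaround (the regex would go into `Field(pattern=...)` on a `str` field):

```python
import re

# RFC 4122-style UUID as a plain string pattern (workaround sketch)
UUID_PATTERN = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'

ok = bool(re.match(UUID_PATTERN, "123e4567-e89b-12d3-a456-426614174000"))
bad = bool(re.match(UUID_PATTERN, "not-a-uuid"))
print(ok, bad)  # → True False
```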


📊 Comparison

| Feature | Flycatcher | SQLModel | Patito |
|---|---|---|---|
| Pydantic support | ✅ | ✅ | ✅ |
| Polars support | ✅ | ❌ | ✅ |
| SQLAlchemy support | ✅ | ✅ | ❌ |
| DataFrame-level DB ops | 🚧 (v0.2) | ❌ | ❌ |
| Cross-field validation | ✅ | ⚠️ (Pydantic only) | ⚠️ (Polars only) |
| Single schema definition | ✅ | ⚠️ (Pydantic + ORM hybrid) | ⚠️ (Pydantic + Polars hybrid) |

Flycatcher is the only library that generates optimized representations for all three systems while keeping them properly separated.


📚 Documentation


🛣️ Roadmap

v0.1.0 (Released) 🚀

  • Core schema definition with metaclass
  • Field types with constraints (Integer, String, Float, Boolean, Datetime, Date)
  • Pydantic model generator
  • Polars DataFrame validator with bulk validation
  • SQLAlchemy table generator
  • Cross-field validators with DSL (col())
  • Test suite with 70%+ coverage
  • Complete documentation site
  • PyPI publication

v0.2.0 (In Progress) 🚧

Theme: Enhanced validation and database operations

  • @field_validator support in addition to existing @model_validator
  • Enhanced Polars DSL: .is_null(), .is_not_null(), .str.contains(), .str.starts_with(), .dt.month(), .dt.year(), .is_in([...]), .is_between()
  • Pydantic enhancements: field aliases, computed fields, custom serialization
  • Enable inheritance of Schema to create subclasses with different fields
  • For more details, see the GitHub Milestone for v0.2.0

v0.3.0 (Planned)

  • DataFrame-level queries (Schema.query())
  • Bulk write operations (Schema.insert(), Schema.update(), Schema.upsert())
  • Complete ETL loop staying columnar end-to-end
  • Add PascalCase metaclass
  • Additional Pydantic validation modes (mode='before', mode='wrap')
  • For more details, see the GitHub Milestone for v0.3.0

v0.4.0+ (Future)

Theme: Advanced field types and relationships

  • Additional field types: Enum, UUID, JSON, Array, Numeric/Decimal, Time, Binary, Interval
  • SQLAlchemy relationships: Foreign keys, composite primary keys
  • SQLAlchemy function-based defaults (e.g., default=func.now())
  • JOIN support in queries
  • Aggregations (GROUP BY, COUNT, SUM)
  • Schema migrations helper

🤝 Contributing

Contributions are welcome! Please see our [Contributing Guide] for details.


📄 License

MIT License - see LICENSE for details.


💬 Community


Built with ❤️ for the DataFrame generation

⭐ Star us on GitHub  |  📖 Read the docs  |  🐛 Report a bug