Flycatcher

Define your schema once. Validate at scale. Stay columnar.

Built for DataFrames, powered across Pydantic, Polars, and SQLAlchemy.

CI · codecov · PyPI version · Python 3.12+ · License: MIT · Documentation


Flycatcher is a DataFrame-native schema layer for Python. Define your data model once and generate optimized representations for every part of your stack:

  • 🎯 Pydantic models for API validation & serialization
  • ⚡ Polars validators for blazing-fast bulk validation
  • 🗄️ SQLAlchemy tables for typed database access

Built for modern data workflows: Validate millions of rows at high speed, keep schema drift at zero, and stay columnar end-to-end.

❓ Why Flycatcher?

Modern Python data projects need row-level validation (Pydantic), efficient bulk operations (Polars), and typed database queries (SQLAlchemy). But maintaining separate schemas across this stack leads to duplication, drift, and manual juggling of row-oriented and columnar paradigms.

Flycatcher solves this: One schema definition → three optimized outputs.

from flycatcher import Schema, Field, col, model_validator

class ProductSchema(Schema):
    id: int = Field(primary_key=True)
    name: str = Field(min_length=3, max_length=100)
    price: float = Field(gt=0)
    discount_price: float | None = Field(default=None, gt=0, nullable=True)

    @model_validator
    def check_discount():
        # Cross-field validation with DSL
        return (
            col('discount_price') < col('price'),
            "Discount price must be less than regular price"
        )

# Generate three optimized representations
ProductModel = ProductSchema.to_pydantic()         # → Pydantic BaseModel
ProductValidator = ProductSchema.to_polars_validator() # → Polars DataFrame validator
ProductTable = ProductSchema.to_sqlalchemy()       # → SQLAlchemy Table

Flycatcher lets you stay DataFrame-native without giving up the speed of Polars, the ergonomic validation of Pydantic, or the Pythonic power of SQLAlchemy.


🚀 Quick Start

Installation

pip install flycatcher
# or
uv add flycatcher

Define Your Schema

from datetime import datetime
from flycatcher import Schema, Field

class UserSchema(Schema):
    id: int = Field(primary_key=True)
    username: str = Field(min_length=3, max_length=50, unique=True)
    email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', unique=True, index=True)
    age: int = Field(ge=13, le=120)
    is_active: bool = Field(default=True)
    created_at: datetime

Use Pydantic for Row-Level Validation

Perfect for APIs, forms, and single-record validation:

from datetime import datetime, timezone

User = UserSchema.to_pydantic()

# Validates constraints automatically via Pydantic
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    age=25,
    created_at=datetime.now(timezone.utc)  # utcnow() is deprecated since Python 3.12
)

# Serialize to JSON/dict
print(user.model_dump_json())
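Invalid input fails with Pydantic's standard `ValidationError`. As a rough sketch of what the generated model corresponds to in plain Pydantic (the hand-written class below is an illustration, not Flycatcher's actual output):

```python
from datetime import datetime
from pydantic import BaseModel, Field, ValidationError

# Hand-written approximation of what UserSchema.to_pydantic() might generate
class User(BaseModel):
    id: int
    username: str = Field(min_length=3, max_length=50)
    email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$')
    age: int = Field(ge=13, le=120)
    is_active: bool = True
    created_at: datetime

try:
    # "al" is too short, "bad" fails the pattern, 5 is below ge=13
    User(id=1, username="al", email="bad", age=5, created_at=datetime(2024, 1, 1))
except ValidationError as exc:
    n_errors = exc.error_count()
print(n_errors)  # → 3
```

Pydantic collects all constraint violations in one pass, so a single bad record reports every failing field at once.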

Use Polars for Bulk Validation

Perfect for DataFrame-level validation:

import polars as pl

UserValidator = UserSchema.to_polars_validator()

# Validate 1M+ rows with blazing speed
df = pl.read_csv("users.csv")
validated_df = UserValidator.validate(df, strict=True)

validated_df.write_parquet("validated_users.parquet")

Use SQLAlchemy for Database Operations

Perfect for typed queries and database interactions:

from sqlalchemy import create_engine

UserTable = UserSchema.to_sqlalchemy(table_name="users")

engine = create_engine("postgresql://localhost/mydb")

# Type-safe queries
with engine.connect() as conn:
    result = conn.execute(
        UserTable.select()
        .where(UserTable.c.is_active == True)
        .where(UserTable.c.age >= 18)
    )
    for row in result:
        print(row)
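As a sketch of what the generated table roughly corresponds to in plain SQLAlchemy Core (the column mapping below is an assumption; shown against an in-memory SQLite database for illustration):

```python
from sqlalchemy import (Table, Column, Integer, String, Boolean, DateTime,
                        MetaData, create_engine, insert, select)

# Hand-written approximation of UserSchema.to_sqlalchemy(table_name="users")
metadata = MetaData()
users = Table(
    "users", metadata,
    Column("id", Integer, primary_key=True),
    Column("username", String(50), unique=True),
    Column("email", String, unique=True, index=True),
    Column("age", Integer),
    Column("is_active", Boolean, default=True),
    Column("created_at", DateTime),
)

engine = create_engine("sqlite:///:memory:")  # throwaway DB for the example
metadata.create_all(engine)
with engine.begin() as conn:
    conn.execute(insert(users), [{"id": 1, "username": "alice",
                                  "email": "alice@example.com", "age": 25}])
    rows = conn.execute(select(users).where(users.c.age >= 18)).fetchall()
print(len(rows))  # → 1
```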

✨ Key Features

Rich Field Types & Constraints

Use standard Python types with Field(...) constraints:

| Python Type | Constraints | Example |
|---|---|---|
| `int` | `ge`, `gt`, `le`, `lt`, `multiple_of` | `age: int = Field(ge=0, le=120)` |
| `float` | `ge`, `gt`, `le`, `lt` | `price: float = Field(gt=0)` |
| `str` | `min_length`, `max_length`, `pattern` | `email: str = Field(pattern=r'^[^@]+@...')` |
| `bool` | - | `is_active: bool = Field(default=True)` |
| `datetime` | `ge`, `gt`, `le`, `lt` | `created_at: datetime = Field(ge=datetime(2020, 1, 1))` |
| `date` | `ge`, `gt`, `le`, `lt` | `birth_date: date` |

All fields additionally support: nullable, default, description

SQLAlchemy-specific: primary_key, unique, index, autoincrement

Custom & Cross-Field Validation

Use the col() DSL for powerful field-level and cross-field validation that works across both Pydantic and Polars:

from datetime import datetime
from flycatcher import Schema, Field, col, model_validator

class BookingSchema(Schema):
    email: str
    phone: str
    check_in: datetime = Field(ge=datetime(2024, 1, 1))
    check_out: datetime = Field(ge=datetime(2024, 1, 1))
    nights: int = Field(ge=1)

    @model_validator
    def check_dates():
        return (
            col('check_out') > col('check_in'),
            "Check-out must be after check-in"
        )

    @model_validator
    def check_phone_format():
        cleaned = col('phone').str.replace(r'[^\d]', '')
        return (cleaned.str.len_chars() == 10, "Phone must have 10 digits")

    @model_validator
    def check_minimum_stay():
        # For operations not yet in DSL (like .is_in()), use explicit Polars format
        # Note: .dt.month() is available in DSL, but .is_in() is not yet supported
        import polars as pl
        return {
            'polars': (
                (~pl.col('check_in').dt.month().is_in([7, 8])) | (pl.col('nights') >= 3),
                "Minimum stay in July and August is 3 nights"
            ),
            'pydantic': lambda v: (
                v.check_in.month not in [7, 8] or v.nights >= 3,
                "Minimum stay in July and August is 3 nights"
            )
        }

Validation Modes

Polars validation supports flexible error handling:

# Strict mode: Raise on validation errors (default)
validated_df = UserValidator.validate(df, strict=True)

# Non-strict mode: Filter out invalid rows
valid_df = UserValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = UserValidator.validate(df, strict=True, show_violations=True)

🎯 Complete Example: ETL Pipeline

import polars as pl
from datetime import datetime
from flycatcher import Schema, Field, col, model_validator
from sqlalchemy import create_engine, MetaData

# 1. Define schema once
class OrderSchema(Schema):
    order_id: int = Field(primary_key=True)
    customer_email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    amount: float = Field(gt=0)
    tax: float = Field(ge=0)
    total: float = Field(gt=0)
    created_at: datetime

    @model_validator
    def check_total():
        return (
            col('total') == col('amount') + col('tax'),
            "Total must equal amount + tax"
        )

# 2. Extract & Validate with Polars (handles millions of rows)
OrderValidator = OrderSchema.to_polars_validator()
df = pl.read_csv("orders.csv")
validated_df = OrderValidator.validate(df, strict=True)

# 3. Load to database with SQLAlchemy
OrderTable = OrderSchema.to_sqlalchemy(table_name="orders")
engine = create_engine("postgresql://localhost/analytics")

with engine.connect() as conn:
    conn.execute(OrderTable.insert(), validated_df.to_dicts())
    conn.commit()

Result: Validated millions of rows, enforced business rules, and loaded to database — all from one schema definition.


🏗️ Design Philosophy

One schema, three representations. Each optimized for its use case.

        Schema Definition
    ┌──────────┼──────────┐
    ↓          ↓          ↓
Pydantic    Polars    SQLAlchemy
   ↓          ↓          ↓
 APIs       ETL      Database

What Flycatcher Does

✅ Single source of truth for schema definitions
✅ Generate optimized representations for different use cases
✅ Keep runtimes separate (no ORM ↔ DataFrame conversions)
✅ Use stable public APIs (Pydantic, Polars, SQLAlchemy)

What Flycatcher Doesn't Do

❌ Mix row-oriented and columnar paradigms
❌ Create a "unified runtime" (that would be slow)
❌ Reinvent validation logic (delegates to proven libraries when possible)
❌ Depend on internal APIs


⚠️ Current Limitations (v0.1.0)

Flycatcher v0.1.0 is an alpha release. The core functionality is stable, but some advanced features are planned for future versions:

Polars DSL

The col() DSL supports basic operations (>, <, ==, +, etc.), numeric math operations (.abs(), .round(), .floor(), .ceil(), .sqrt(), .pow()), limited string operations (.str.contains(), .str.starts_with(), .str.len_chars(), etc.), and a limited datetime accessor (.dt.year(), .dt.month(), .dt.total_days(other), etc.).

The col() DSL does not support the full range of Polars operations. However, additional operations will be added in future versions to better support the full functionality of Polars.

Workaround: Use the explicit format in @model_validator:

@model_validator
def check():
    return {
        'polars': (pl.col('field').is_null(), "Message"),
        'pydantic': lambda v: (v.field is None, "Message")
    }

Pydantic Features

  • ❌ @field_validator - Only @model_validator is supported (coming in v0.2.0)
  • ❌ Field aliases and computed fields (coming in v0.2.0+)
  • ❌ Custom serialization options (coming in v0.2.0+)

Workaround: Use @model_validator for all validation needs.

SQLAlchemy Features

  • ❌ Foreign key relationships - Must be added manually after table generation (coming in v0.3.0+)
  • ❌ Composite primary keys - Only single-field primary keys supported (coming in v0.3.0+)
  • ❌ Function-based defaults (e.g., default=func.now()) - Only literal defaults supported

Workaround: Add relationships and composite keys manually in SQLAlchemy after table generation.
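Adding a relationship by hand might look like this in plain SQLAlchemy (the two tables below are hypothetical stand-ins for generated ones; normally they would come from `to_sqlalchemy()`):

```python
from sqlalchemy import Table, Column, Integer, ForeignKey, MetaData

metadata = MetaData()
# Stand-ins for generated tables (hypothetical)
users = Table("users", metadata, Column("id", Integer, primary_key=True))
orders = Table(
    "orders", metadata,
    Column("id", Integer, primary_key=True),
    # The relationship is declared by hand until native foreign-key support lands
    Column("user_id", Integer, ForeignKey("users.id")),
)

fk_targets = [fk.column.table.name for fk in orders.c.user_id.foreign_keys]
print(fk_targets)  # → ['users']
```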

Field Types

  • ❌ Enum, UUID, JSON, Array field types (coming in v0.3.0+)
  • ❌ Numeric/Decimal field type (coming in v0.3.0+)

Workaround: Use String with pattern validation or manual handling.
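For example, a UUID can be carried as a validated string until native UUID support lands. A sketch of the pattern-validation workaround (the regex would go into `Field(pattern=...)` on a `str` field):

```python
import re

# RFC 4122-style UUID as a plain string pattern (workaround sketch)
UUID_PATTERN = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'

ok = bool(re.match(UUID_PATTERN, "123e4567-e89b-12d3-a456-426614174000"))
bad = bool(re.match(UUID_PATTERN, "not-a-uuid"))
print(ok, bad)  # → True False
```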


📊 Comparison

| Feature | Flycatcher | SQLModel | Patito |
|---|---|---|---|
| Pydantic support | ✅ | ✅ | ✅ |
| Polars support | ✅ | ❌ | ✅ |
| SQLAlchemy support | ✅ | ✅ | ❌ |
| DataFrame-level DB ops | 🚧 (v0.2) | ❌ | ❌ |
| Cross-field validation | ✅ | ⚠️ (Pydantic only) | ⚠️ (Polars only) |
| Single schema definition | ✅ | ⚠️ (Pydantic + ORM hybrid) | ⚠️ (Pydantic + Polars hybrid) |

Flycatcher is the only library that generates optimized representations for all three systems while keeping them properly separated.


📚 Documentation


🛣️ Roadmap

v0.1.0 (Released) 🚀

  • Core schema definition with metaclass
  • Field types with constraints (Integer, String, Float, Boolean, Datetime, Date)
  • Pydantic model generator
  • Polars DataFrame validator with bulk validation
  • SQLAlchemy table generator
  • Cross-field validators with DSL (col())
  • Test suite with 70%+ coverage
  • Complete documentation site
  • PyPI publication

v0.2.0 (In Progress) 🚧

Theme: Enhanced validation and database operations

  • @field_validator support in addition to existing @model_validator
  • Enhanced Polars DSL: .is_null(), .is_not_null(), .str.contains(), .str.starts_with(), .dt.month(), .dt.year(), .is_in([...]), .is_between()
  • Pydantic enhancements: field aliases, computed fields, custom serialization
  • Enable inheritance of Schema to create subclasses with different fields
  • For more details, see the GitHub Milestone for v0.2.0

v0.3.0 (Planned)

  • DataFrame-level queries (Schema.query())
  • Bulk write operations (Schema.insert(), Schema.update(), Schema.upsert())
  • Complete ETL loop staying columnar end-to-end
  • Add PascalCase metaclass
  • Additional Pydantic validation modes (mode='before', mode='wrap')
  • For more details, see the GitHub Milestone for v0.3.0

v0.4.0+ (Future)

Theme: Advanced field types and relationships

  • Additional field types: Enum, UUID, JSON, Array, Numeric/Decimal, Time, Binary, Interval
  • SQLAlchemy relationships: Foreign keys, composite primary keys
  • SQLAlchemy function-based defaults (e.g., default=func.now())
  • JOIN support in queries
  • Aggregations (GROUP BY, COUNT, SUM)
  • Schema migrations helper

🤝 Contributing

Contributions are welcome! Please see our [Contributing Guide] for details.


📄 License

MIT License - see LICENSE for details.


💬 Community


Built with ❤️ for the DataFrame generation

⭐ Star us on GitHub  |  📖 Read the docs  |  🐛 Report a bug