Define your schema once. Validate at scale. Stay columnar.
Built for DataFrames, powered by Pydantic, Polars, and SQLAlchemy.
Flycatcher is a DataFrame-native schema layer for Python. Define your data model once and generate optimized representations for every part of your stack:
- 🎯 Pydantic models for API validation & serialization
- ⚡ Polars validators for blazing-fast bulk validation
- 🗄️ SQLAlchemy tables for typed database access
Built for modern data workflows: Validate millions of rows at high speed, keep schema drift at zero, and stay columnar end-to-end.
❓ Why Flycatcher?¶
Modern Python data projects need row-level validation (Pydantic), efficient bulk operations (Polars), and typed database queries (SQLAlchemy). But maintaining multiple schemas across this stack can lead to duplication, drift, and manually juggling row-oriented and columnar paradigms.
Flycatcher solves this: One schema definition → three optimized outputs.
```python
from flycatcher import Schema, Field, col, model_validator

class ProductSchema(Schema):
    id: int = Field(primary_key=True)
    name: str = Field(min_length=3, max_length=100)
    price: float = Field(gt=0)
    discount_price: float | None = Field(default=None, gt=0, nullable=True)

    @model_validator
    def check_discount():
        # Cross-field validation with the col() DSL
        return (
            col('discount_price') < col('price'),
            "Discount price must be less than regular price"
        )

# Generate three optimized representations
ProductModel = ProductSchema.to_pydantic()              # → Pydantic BaseModel
ProductValidator = ProductSchema.to_polars_validator()  # → Polars DataFrame validator
ProductTable = ProductSchema.to_sqlalchemy()            # → SQLAlchemy Table
```
Flycatcher lets you stay DataFrame-native without giving up the speed of Polars, the ergonomic validation of Pydantic, or the Pythonic power of SQLAlchemy.
🚀 Quick Start¶
Installation¶
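Flycatcher is published on PyPI (see the roadmap below), so installation should be the usual pip invocation — assuming the package name matches the import name:

```shell
pip install flycatcher
```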
Define Your Schema¶
```python
from datetime import datetime

from flycatcher import Schema, Field

class UserSchema(Schema):
    id: int = Field(primary_key=True)
    username: str = Field(min_length=3, max_length=50, unique=True)
    email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', unique=True, index=True)
    age: int = Field(ge=13, le=120)
    is_active: bool = Field(default=True)
    created_at: datetime
```
Use Pydantic for Row-Level Validation¶
Perfect for APIs, forms, and single-record validation:
```python
from datetime import datetime, timezone

User = UserSchema.to_pydantic()

# Constraints are validated automatically via Pydantic
user = User(
    id=1,
    username="alice",
    email="alice@example.com",
    age=25,
    created_at=datetime.now(timezone.utc)  # datetime.utcnow() is deprecated
)

# Serialize to JSON/dict
print(user.model_dump_json())
```
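Conceptually, the generated model enforces the `Field` constraints at construction time and raises on bad input. A rough stdlib analogue of that behavior (illustrative only, not Flycatcher's implementation):

```python
from dataclasses import dataclass

@dataclass
class UserSketch:
    id: int
    username: str
    age: int

    def __post_init__(self):
        # Rough analogue of Field(min_length=3) and Field(ge=13, le=120)
        if len(self.username) < 3:
            raise ValueError("username must have at least 3 characters")
        if not (13 <= self.age <= 120):
            raise ValueError("age must be between 13 and 120")

UserSketch(id=1, username="alice", age=25)   # constructs fine
try:
    UserSketch(id=2, username="al", age=25)  # fails the min_length check
except ValueError as e:
    print(e)
```

The real generated model delegates all of this to Pydantic rather than hand-written checks; the sketch only shows the construct-or-raise contract.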
Use Polars for Bulk Validation¶
Perfect for DataFrame-level validation:
```python
import polars as pl

UserValidator = UserSchema.to_polars_validator()

# Validate 1M+ rows with blazing speed
df = pl.read_csv("users.csv")
validated_df = UserValidator.validate(df, strict=True)
validated_df.write_parquet("validated_users.parquet")
```
Use SQLAlchemy for Database Operations¶
Perfect for typed queries and database interactions:
```python
from sqlalchemy import create_engine

UserTable = UserSchema.to_sqlalchemy(table_name="users")
engine = create_engine("postgresql://localhost/mydb")

# Type-safe queries
with engine.connect() as conn:
    result = conn.execute(
        UserTable.select()
        .where(UserTable.c.is_active == True)
        .where(UserTable.c.age >= 18)
    )
    for row in result:
        print(row)
```
✨ Key Features¶
Rich Field Types & Constraints¶
Use standard Python types with Field(...) constraints:
| Python Type | Constraints | Example |
|---|---|---|
| `int` | `ge`, `gt`, `le`, `lt`, `multiple_of` | `age: int = Field(ge=0, le=120)` |
| `float` | `ge`, `gt`, `le`, `lt` | `price: float = Field(gt=0)` |
| `str` | `min_length`, `max_length`, `pattern` | `email: str = Field(pattern=r'^[^@]+@...')` |
| `bool` | — | `is_active: bool = Field(default=True)` |
| `datetime` | `ge`, `gt`, `le`, `lt` | `created_at: datetime = Field(ge=datetime(2020, 1, 1))` |
| `date` | `ge`, `gt`, `le`, `lt` | `birth_date: date` |
All fields support the validation options `nullable`, `default`, and `description`.
SQLAlchemy-specific options: `primary_key`, `unique`, `index`, `autoincrement`.
Custom & Cross-Field Validation¶
Use the col() DSL for powerful field-level and cross-field validation that works across both Pydantic and Polars:
```python
from datetime import datetime

import polars as pl

from flycatcher import Schema, Field, col, model_validator

class BookingSchema(Schema):
    email: str
    phone: str
    check_in: datetime = Field(ge=datetime(2024, 1, 1))
    check_out: datetime = Field(ge=datetime(2024, 1, 1))
    nights: int = Field(ge=1)

    @model_validator
    def check_dates():
        return (
            col('check_out') > col('check_in'),
            "Check-out must be after check-in"
        )

    @model_validator
    def check_phone_format():
        cleaned = col('phone').str.replace(r'[^\d]', '')
        return (cleaned.str.len_chars() == 10, "Phone must have 10 digits")

    @model_validator
    def check_minimum_stay():
        # For operations not yet in the DSL (like .is_in()), use the explicit
        # per-backend format. Note: .dt.month() is available in the DSL,
        # but .is_in() is not yet supported.
        return {
            'polars': (
                (~pl.col('check_in').dt.month().is_in([7, 8])) | (pl.col('nights') >= 3),
                "Minimum stay in July and August is 3 nights"
            ),
            'pydantic': lambda v: (
                v.check_in.month not in [7, 8] or v.nights >= 3,
                "Minimum stay in July and August is 3 nights"
            )
        }
```
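A DSL like `col()` typically works by building a small expression tree at definition time, which each backend then evaluates in its own way: row-by-row for Pydantic, translated to `pl.col()` expressions for Polars. A minimal stdlib sketch of the idea (illustrative only, not Flycatcher's implementation):

```python
class Col:
    """Column reference: comparisons build expression nodes instead of evaluating."""
    def __init__(self, name):
        self.name = name

    def __gt__(self, other):
        # Defer evaluation: capture the comparison as a closure over a row dict
        return Expr(lambda row: row[self.name] > row[other.name])

class Expr:
    def __init__(self, fn):
        self.fn = fn

    def evaluate(self, row):
        # Row-oriented evaluation (a Pydantic-style backend); a columnar backend
        # would instead walk the same tree and emit vectorized expressions.
        return self.fn(row)

rule = Col('check_out') > Col('check_in')
print(rule.evaluate({'check_in': 1, 'check_out': 5}))  # True
print(rule.evaluate({'check_in': 5, 'check_out': 5}))  # False
```

The key design point is that one rule definition yields two execution strategies, which is why the same `@model_validator` can serve both Pydantic and Polars.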
Validation Modes¶
Polars validation supports flexible error handling:
```python
# Strict mode: raise on validation errors (default)
validated_df = UserValidator.validate(df, strict=True)

# Non-strict mode: filter out invalid rows
valid_df = UserValidator.validate(df, strict=False)

# Show violations for debugging
validated_df = UserValidator.validate(df, strict=True, show_violations=True)
```
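The two modes differ only in what happens to rows that fail a constraint. A plain-Python sketch of the semantics, assuming a per-row boolean mask like the one a columnar validator would compute (not Flycatcher's actual code):

```python
def validate(rows, predicate, strict=True):
    # Compute a pass/fail mask once, columnar-style
    mask = [predicate(row) for row in rows]
    if strict:
        # Strict mode: any failure aborts the whole batch
        if not all(mask):
            bad = [row for row, ok in zip(rows, mask) if not ok]
            raise ValueError(f"{len(bad)} row(s) failed validation: {bad}")
        return rows
    # Non-strict mode: silently keep only the rows that pass
    return [row for row, ok in zip(rows, mask) if ok]

rows = [{"age": 25}, {"age": 7}]
print(validate(rows, lambda r: r["age"] >= 13, strict=False))  # [{'age': 25}]
```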
🎯 Complete Example: ETL Pipeline¶
```python
from datetime import datetime

import polars as pl
from sqlalchemy import create_engine

from flycatcher import Schema, Field, col, model_validator

# 1. Define the schema once
class OrderSchema(Schema):
    order_id: int = Field(primary_key=True)
    customer_email: str = Field(pattern=r'^[^@]+@[^@]+\.[^@]+$', index=True)
    amount: float = Field(gt=0)
    tax: float = Field(ge=0)
    total: float = Field(gt=0)
    created_at: datetime

    @model_validator
    def check_total():
        return (
            col('total') == col('amount') + col('tax'),
            "Total must equal amount + tax"
        )

# 2. Extract & validate with Polars (handles millions of rows)
OrderValidator = OrderSchema.to_polars_validator()
df = pl.read_csv("orders.csv")
validated_df = OrderValidator.validate(df, strict=True)

# 3. Load to the database with SQLAlchemy
OrderTable = OrderSchema.to_sqlalchemy(table_name="orders")
engine = create_engine("postgresql://localhost/analytics")
with engine.connect() as conn:
    conn.execute(OrderTable.insert(), validated_df.to_dicts())
    conn.commit()
```
✅ Result: Validated millions of rows, enforced business rules, and loaded to database — all from one schema definition.
🏗️ Design Philosophy¶
One schema, three representations. Each optimized for its use case.
```
         Schema Definition
                ↓
     ┌──────────┼──────────┐
     ↓          ↓          ↓
  Pydantic    Polars   SQLAlchemy
     ↓          ↓          ↓
    APIs       ETL      Database
```
What Flycatcher Does¶
✅ Single source of truth for schema definitions
✅ Generate optimized representations for different use cases
✅ Keep runtimes separate (no ORM ↔ DataFrame conversions)
✅ Use stable public APIs (Pydantic, Polars, SQLAlchemy)
What Flycatcher Doesn't Do¶
❌ Mix row-oriented and columnar paradigms
❌ Create a "unified runtime" (that would be slow)
❌ Reinvent validation logic (delegates to proven libraries when possible)
❌ Depend on internal APIs
⚠️ Current Limitations (v0.1.0)¶
Flycatcher v0.1.0 is an alpha release. The core functionality is stable, but some advanced features are planned for future versions:
Polars DSL¶
The `col()` DSL supports basic operations (`>`, `<`, `==`, `+`, etc.), numeric math operations (`.abs()`, `.round()`, `.floor()`, `.ceil()`, `.sqrt()`, `.pow()`), a limited set of string operations (`.str.contains()`, `.str.starts_with()`, `.str.len_chars()`, etc.), and a limited datetime accessor (`.dt.year()`, `.dt.month()`, `.dt.total_days(other)`, etc.). It does not yet cover the full range of Polars operations; additional operations will be added in future versions.
Workaround: Use the explicit format in @model_validator:
```python
@model_validator
def check():
    return {
        'polars': (pl.col('field').is_null(), "Message"),
        'pydantic': lambda v: (v.field is None, "Message")
    }
```
Pydantic Features¶
- ❌ `@field_validator` - Only `@model_validator` is supported (coming in v0.2.0)
- ❌ Field aliases and computed fields (coming in v0.2.0+)
- ❌ Custom serialization options (coming in v0.2.0+)

Workaround: Use `@model_validator` for all validation needs.
SQLAlchemy Features¶
- ❌ Foreign key relationships - Must be added manually after table generation (coming in v0.3.0+)
- ❌ Composite primary keys - Only single-field primary keys supported (coming in v0.3.0+)
- ❌ Function-based defaults (e.g., `default=func.now()`) - Only literal defaults supported

Workaround: Add relationships and composite keys manually in SQLAlchemy after table generation.
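For instance, a foreign key can be attached with standard SQLAlchemy Core. The `users`/`orders` tables below are illustrative and built directly rather than via `to_sqlalchemy()`, but the same `Column(..., ForeignKey(...))` applies to a generated `Table`:

```python
from sqlalchemy import Column, ForeignKey, Integer, MetaData, Table

metadata = MetaData()

users = Table("users", metadata, Column("id", Integer, primary_key=True))

# Stand-in for a generated table, plus a foreign key added by hand
orders = Table(
    "orders",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("user_id", Integer, ForeignKey("users.id")),  # manual foreign key
)

print(list(orders.c.user_id.foreign_keys))
```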
Field Types¶
- ❌ Enum, UUID, JSON, Array field types (coming in v0.3.0+)
- ❌ Numeric/Decimal field type (coming in v0.3.0+)
Workaround: Use String with pattern validation or manual handling.
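For example, until a dedicated UUID type lands, a UUID column can be modeled as a `str` with a `pattern` constraint. The regex below matches the canonical lowercase 8-4-4-4-12 hex form (a stdlib-only sketch; the `Field(pattern=...)` usage mirrors the examples above):

```python
import re

# Canonical UUID text form: 8-4-4-4-12 lowercase hex digits
UUID_PATTERN = r'^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'

print(bool(re.match(UUID_PATTERN, "123e4567-e89b-12d3-a456-426614174000")))  # True
print(bool(re.match(UUID_PATTERN, "not-a-uuid")))  # False
```

In a schema this would look like `order_ref: str = Field(pattern=UUID_PATTERN)`, at the cost of storing the value as text rather than a native UUID column.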
📊 Comparison¶
| Feature | Flycatcher | SQLModel | Patito |
|---|---|---|---|
| Pydantic support | ✅ | ✅ | ✅ |
| Polars support | ✅ | ❌ | ✅ |
| SQLAlchemy support | ✅ | ✅ | ❌ |
| DataFrame-level DB ops | 🚧 (v0.2) | ❌ | ❌ |
| Cross-field validation | ✅ | ⚠️ (Pydantic only) | ⚠️ (Polars only) |
| Single schema definition | ✅ | ⚠️ (Pydantic + ORM hybrid) | ⚠️ (Pydantic + Polars hybrid) |
Flycatcher is the only library that generates optimized representations for all three systems while keeping them properly separated.
📚 Documentation¶
- Getting Started - Installation and basics
- Tutorials - Step-by-step guides
- How-To Guides - Solve specific problems
- API Reference - Complete API documentation
- Explanations - Deep dives and concepts
🛣️ Roadmap¶
v0.1.0 (Released) 🚀¶
- Core schema definition with metaclass
- Field types with constraints (Integer, String, Float, Boolean, Datetime, Date)
- Pydantic model generator
- Polars DataFrame validator with bulk validation
- SQLAlchemy table generator
- Cross-field validators with DSL (`col()`)
- Test suite with 70%+ coverage
- Complete documentation site
- PyPI publication
v0.2.0 (In Progress) 🚧¶
Theme: Enhanced validation and database operations
- `@field_validator` support in addition to existing `@model_validator`
- Enhanced Polars DSL: `.is_null()`, `.is_not_null()`, `.str.contains()`, `.str.starts_with()`, `.dt.month`, `.dt.year`, `.is_in([...])`, `.is_between()`
- Pydantic enhancements: field aliases, computed fields, custom serialization
- Enable inheritance of `Schema` to create subclasses with different fields
- For more details, see the GitHub Milestone for v0.2.0
v0.3.0 (Planned)¶
- DataFrame-level queries (`Schema.query()`)
- Bulk write operations (`Schema.insert()`, `Schema.update()`, `Schema.upsert()`)
- Complete ETL loop staying columnar end-to-end
- Add PascalCase metaclass
- Additional Pydantic validation modes (`mode='before'`, `mode='wrap'`)
- For more details, see the GitHub Milestone for v0.3.0
v0.4.0+ (Future)¶
Theme: Advanced field types and relationships
- Additional field types: Enum, UUID, JSON, Array, Numeric/Decimal, Time, Binary, Interval
- SQLAlchemy relationships: Foreign keys, composite primary keys
- SQLAlchemy function-based defaults (e.g., `default=func.now()`)
- JOIN support in queries
- Aggregations (GROUP BY, COUNT, SUM)
- Schema migrations helper
🤝 Contributing¶
Contributions are welcome! Please see our Contributing Guide for details.
📄 License¶
MIT License - see LICENSE for details.
💬 Community¶
- GitHub Issues - Bug reports and feature requests
- GitHub Discussions - Questions and community discussion
- Documentation - Full guides and API reference
Built with ❤️ for the DataFrame generation