julien777z/pydantic-encryption

pydantic-encryption

Field-level encryption, hashing, and blind indexing for Pydantic models with SQLAlchemy integration.

Installation

pip install pydantic-encryption

Optional Extras

pip install "pydantic-encryption[sqlalchemy]"  # SQLAlchemy integration
pip install "pydantic-encryption[aws]"         # AWS KMS encryption
pip install "pydantic-encryption[all]"         # All optional dependencies

Quick Start

Mix DeferredDecryptMixin into any model with encrypted columns. The first read of an encrypted attribute on any loaded row batch-decrypts that column across every sibling instance in the session. Columns you never read stay encrypted and incur no decryption cost:

import asyncio

from sqlalchemy import select
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from pydantic_encryption import DeferredDecryptMixin, SQLAlchemyEncryptedValue


class Base(DeclarativeBase):
    pass


class User(Base, DeferredDecryptMixin):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[bytes] = mapped_column(SQLAlchemyEncryptedValue())


engine = create_async_engine("sqlite+aiosqlite:///:memory:")
Session = async_sessionmaker(engine, expire_on_commit=False)


async def main() -> None:
    # Create the table first; the in-memory database starts empty.
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

    async with Session() as session:
        session.add(User(email="john@example.com"))
        await session.commit()

        result = await session.execute(select(User))
        user = result.scalar_one()
        print(user.email)  # "john@example.com" (decrypted on first read)


asyncio.run(main())

SQLAlchemy Integration

Install with pip install "pydantic-encryption[sqlalchemy]".

from sqlalchemy import create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, Session

from pydantic_encryption import (
    SQLAlchemyEncryptedValue,
    SQLAlchemyHashedValue,
    SQLAlchemyBlindIndexValue,
    BlindIndexMethod,
)


class Base(DeclarativeBase):
    pass


class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str]
    email: Mapped[bytes] = mapped_column(SQLAlchemyEncryptedValue())
    password: Mapped[bytes] = mapped_column(SQLAlchemyHashedValue())
    blind_index_email: Mapped[bytes] = mapped_column(
        SQLAlchemyBlindIndexValue(BlindIndexMethod.HMAC_SHA256)
    )


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

with Session(engine) as session:
    user = User(
        username="john",
        email="john@example.com",
        password="secret123",
        blind_index_email="john@example.com",
    )
    session.add(user)
    session.commit()

    # Query by blind index — automatically hashed
    found = session.query(User).filter(
        User.blind_index_email == "john@example.com"
    ).first()
    print(found.email)  # decrypted

Supported Types

SQLAlchemyEncryptedValue preserves the Python type of your data:

str, bytes, bool, int, float, Decimal, UUID, date, datetime, time, timedelta

Array Support (PostgreSQL)

from pydantic_encryption import SQLAlchemyPGEncryptedArray

tags: Mapped[list[str] | None] = mapped_column(SQLAlchemyPGEncryptedArray(), nullable=True)

Each element is individually encrypted. Requires PostgreSQL.

Async Decryption

TypeDecorator is sync by contract, so slow backends (AWS KMS) can block the event loop. Two paths:

  • Default. Under AsyncSession, decryption uses SQLAlchemy's greenlet bridge so each call yields the event loop. Argon2 hashing and blind-indexing use the same bridge.
  • On-access batch decrypt. DeferredDecryptMixin defers each encrypted column until the first read, then batch-decrypts that column across every sibling instance loaded into the same session via a single asyncio.gather. Columns the caller never reads stay encrypted and cost nothing.

Mix the helper into any model with encrypted columns and read as usual:

from pydantic_encryption import DeferredDecryptMixin, SQLAlchemyEncryptedValue


class User(Base, DeferredDecryptMixin):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[bytes] = mapped_column(SQLAlchemyEncryptedValue())


Session = async_sessionmaker(engine, expire_on_commit=False)

async with Session() as session:
    result = await session.execute(select(User))
    users = result.scalars().all()

    # First read of `email` batch-decrypts it across every user in the session.
    for user in users:
        print(user.email)
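
The on-access batch behaviour described above can be pictured with a plain Python descriptor. This is only a simplified, synchronous sketch and not the library's implementation: `LazyDecrypted`, `Row`, and `fake_decrypt` are hypothetical names, base64 stands in for real decryption, and the real mixin is described as using an async gather rather than a loop.

```python
import base64


def fake_decrypt(ciphertext: bytes) -> str:
    """Stand-in for a real decrypt call (base64 is NOT encryption)."""
    return base64.b64decode(ciphertext).decode()


class LazyDecrypted:
    """Descriptor: the first read decrypts this column for every sibling row."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        if self.name not in instance.plain:
            # Batch step: decrypt this one column on all rows in the registry.
            for row in instance.siblings:
                row.plain[self.name] = fake_decrypt(row.raw[self.name])
        return instance.plain[self.name]


class Row:
    """Hypothetical loaded row; `siblings` plays the role of the session."""

    email = LazyDecrypted()
    phone = LazyDecrypted()

    def __init__(self, siblings: list, **raw: bytes):
        self.raw = raw      # ciphertext per column
        self.plain = {}     # decrypted values, filled on first read
        self.siblings = siblings
        siblings.append(self)
```

Reading `email` on one row fills `plain["email"]` on every sibling, while `phone` stays untouched until something reads it.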

decrypt_pending_fields(session) is an optional escape hatch when you need to pre-warm every encrypted column on every loaded row before leaving the session context (e.g. serializing outside a greenlet spawn):

from pydantic_encryption import decrypt_pending_fields

async with Session() as session:
    users = (await session.execute(select(User))).scalars().all()

    # Decrypt every encrypted column on every row loaded so far.
    await decrypt_pending_fields(session)

    payload = [{"id": u.id, "email": u.email} for u in users]

finalize_sqlalchemy_session(session) combines the above with a commit(), returning the pooled connection before response construction. Handy on read endpoints that would otherwise hold a DB connection through descriptor-driven KMS decryption:

from pydantic_encryption import finalize_sqlalchemy_session

async def list_users() -> list[dict]:  # hypothetical read endpoint
    async with Session() as session:
        users = (await session.execute(select(User))).scalars().all()
        await finalize_sqlalchemy_session(session)  # decrypt pending + commit; connection released
        return [{"id": u.id, "email": u.email} for u in users]

Manual helpers for rows loaded outside a session or flat ciphertext lists:

from sqlalchemy import select
from sqlalchemy.ext.asyncio import AsyncSession

from pydantic_encryption import decrypt_rows, decrypt_values


async with AsyncSession(engine) as session:
    users = (await session.execute(select(User))).scalars().all()
    ciphertexts = [u.email for u in users]

    await users[0].decrypt()                              # one mixin instance
    await User.decrypt_many(users)                        # batch of one class
    await decrypt_rows(users, User.email, concurrency=8)  # InstrumentedAttribute or column names
    await decrypt_values(ciphertexts, concurrency=8)      # flat ciphertexts; preserves None positions
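
To make the `concurrency` argument and the "preserves None positions" note concrete, here is a minimal stdlib sketch of bounded concurrent decryption. It is an assumption about the general technique, not the library's code: `decrypt_all` and `fake_decrypt` are hypothetical names, and base64 stands in for a slow backend call.

```python
import asyncio
import base64


async def fake_decrypt(ciphertext: bytes) -> str:
    """Stand-in for a slow backend call (base64 is NOT encryption)."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return base64.b64decode(ciphertext).decode()


async def decrypt_all(values, concurrency: int = 8):
    """Decrypt a flat list, keeping None entries at their original positions."""
    limit = asyncio.Semaphore(concurrency)  # at most `concurrency` calls in flight

    async def one(value):
        if value is None:
            return None  # nothing to decrypt; keep the slot
        async with limit:
            return await fake_decrypt(value)

    # gather returns results in input order, so positions line up.
    return list(await asyncio.gather(*(one(v) for v in values)))
```

Because `asyncio.gather` returns results in the order of its inputs, the output list can be zipped back onto the original rows without extra bookkeeping.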

Safety: Catching Accidental Ciphertext Access

Reads go through the on-access descriptor. When the underlying cell is still an EncryptedValue, the descriptor prefers an async batch decrypt over the session's pending siblings (via SQLAlchemy's greenlet bridge), and transparently falls back to a synchronous decrypt either when the read happens outside a greenlet or when the instance is detached from any session.

An EncryptedValue only reaches user code if something bypasses the descriptor entirely (raw state.dict[col], a logged row). Coercing it via str(value) / f"{value}" / "%s" % value raises EncryptedValueAccessError. repr(value) is a safe <EncryptedValue: N bytes> marker, and bytes(value) returns the raw ciphertext. Use is_encrypted(value) to guard at a boundary.
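
The guard behaviour described above (string coercion raises, `repr` is safe, `bytes` returns the ciphertext) can be sketched in a few lines. The class below is an illustration under assumed names, `GuardedCiphertext` and `CiphertextAccessError`, and is not the library's `EncryptedValue`.

```python
class CiphertextAccessError(Exception):
    """Raised when ciphertext is about to be rendered as text (hypothetical)."""


class GuardedCiphertext:
    """Wraps raw ciphertext and refuses accidental string coercion."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def __str__(self) -> str:
        # str(value), f"{value}" and "%s" % value all end up here.
        raise CiphertextAccessError("refusing to render ciphertext as text")

    def __repr__(self) -> str:
        # Safe marker for logs and debuggers; reveals only the length.
        return f"<GuardedCiphertext: {len(self._raw)} bytes>"

    def __bytes__(self) -> bytes:
        # Explicit opt-in to the raw ciphertext.
        return self._raw
```

The f-string and `%s` paths raise because the default `__format__` and `%s` formatting both delegate to `__str__`.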

Manual Encryption or Hashing

Fields annotated with Encrypted are encrypted and fields annotated with Hashed are hashed during model initialization:

from typing import Annotated
from pydantic_encryption import BaseModel, Encrypted, Hashed

class User(BaseModel):
    name: str
    address: Annotated[bytes, Encrypted]
    password: Annotated[str, Hashed]

user = User(name="John Doe", address="123 Main St", password="secret123")

print(user.name)      # "John Doe"
print(user.address)   # encrypted bytes
print(user.password)  # argon2 hash bytes

Decrypting

Call decrypt_data() to decrypt all Encrypted fields in-place. It returns self, so it can be chained:

user = User(name="John", address="123 Main St", password="secret")
user.decrypt_data()
print(user.address)  # "123 Main St"

Async Support

Use async_init() to construct models with async encryption, hashing, and blind indexing, and async_decrypt_data() for async decryption:

user = await User.async_init(name="John", address="123 Main St", password="secret")
await user.async_decrypt_data()

All phases (encrypt, hash, blind-index) run concurrently via asyncio.gather, and nested BaseModel instances — including those inside list, tuple, dict, and set containers — are processed recursively.

Encryption Methods

Set the encryption method via environment variable:

ENCRYPTION_METHOD=fernet   # Fernet symmetric encryption (requires ENCRYPTION_KEY)
ENCRYPTION_METHOD=aws      # AWS KMS (requires AWS_KMS_KEY_ARN, AWS_KMS_REGION, etc.)

There is no default — you must explicitly set ENCRYPTION_METHOD if using Encrypted fields.

Fernet Setup

# Generate a key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Set environment variables
ENCRYPTION_METHOD=fernet
ENCRYPTION_KEY=your_generated_key
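
A Fernet key is 32 random bytes in URL-safe base64, so a placeholder such as `your_generated_key` will be rejected. The following stdlib check is a small sketch for validating the variable at startup; `is_valid_fernet_key` is a hypothetical helper, not part of this package.

```python
import base64
import os


def is_valid_fernet_key(key: str) -> bool:
    """Return True if `key` decodes to exactly 32 bytes of URL-safe base64."""
    try:
        raw = base64.urlsafe_b64decode(key.encode("ascii"))
    except (ValueError, UnicodeEncodeError):
        # binascii.Error (bad padding) is a ValueError subclass.
        return False
    return len(raw) == 32


if __name__ == "__main__":
    key = os.environ.get("ENCRYPTION_KEY", "")
    print("ENCRYPTION_KEY looks valid:", is_valid_fernet_key(key))
```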

AWS KMS Setup

ENCRYPTION_METHOD=aws
AWS_KMS_KEY_ARN=arn:aws:kms:us-east-1:123456789012:key/your-key-id
AWS_KMS_REGION=us-east-1
AWS_KMS_ACCESS_KEY_ID=your_access_key
AWS_KMS_SECRET_ACCESS_KEY=your_secret_key

As an alternative to AWS_KMS_KEY_ARN, separate encrypt/decrypt keys are supported for key rotation or read-only scenarios:

AWS_KMS_ENCRYPT_KEY_ARN=arn:aws:kms:...encrypt-key
AWS_KMS_DECRYPT_KEY_ARN=arn:aws:kms:...decrypt-key

Use one mode or the other — combining AWS_KMS_KEY_ARN with either split variant raises a validation error. A decrypt-only key alone is allowed (read-only workloads).
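
The rule above can be expressed as a small resolution function. This is a sketch of the stated rule only, with a hypothetical name (`resolve_kms_keys`) and an assumed `ValueError`; the library's actual validation error type and its handling of other combinations are not described here.

```python
def resolve_kms_keys(env: dict) -> tuple:
    """Return (encrypt_arn, decrypt_arn) from a mapping of environment values."""
    single = env.get("AWS_KMS_KEY_ARN")
    encrypt = env.get("AWS_KMS_ENCRYPT_KEY_ARN")
    decrypt = env.get("AWS_KMS_DECRYPT_KEY_ARN")

    if single and (encrypt or decrypt):
        # Mixing the single-key mode with either split variable is rejected.
        raise ValueError("use AWS_KMS_KEY_ARN or the split variables, not both")
    if single:
        return single, single  # one key serves both directions
    # Split mode; a decrypt-only setup leaves the encrypt key as None.
    return encrypt, decrypt
```

A read-only worker would then set only `AWS_KMS_DECRYPT_KEY_ARN` and get `(None, decrypt_arn)` back.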

Plaintext Cache (Opt-In)

For read-heavy workloads that repeatedly decrypt the same ciphertexts, AWS KMS round-trips dominate. An in-process LRU of ciphertext → plaintext is available as opt-in:

AWS_KMS_PLAINTEXT_CACHE_ENABLED=true      # default: false
AWS_KMS_PLAINTEXT_CACHE_CAPACITY=2048     # default: 2048 entries

Disabled by default because cache entries hold decrypted sensitive data in a process-wide cachetools.LRUCache for the lifetime of the process. Enable it when the perf win outweighs keeping plaintext resident in memory.
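
For intuition about what such a cache does, here is a minimal ciphertext-to-plaintext LRU built on the standard library. It is a stand-in sketch: the text above names cachetools.LRUCache, whereas this version uses `collections.OrderedDict`, and `PlaintextLRU` with its `decrypt` callback are hypothetical names.

```python
from collections import OrderedDict
from typing import Callable


class PlaintextLRU:
    """Least-recently-used cache keyed by ciphertext."""

    def __init__(self, capacity: int, decrypt: Callable[[bytes], str]):
        self.capacity = capacity
        self._decrypt = decrypt
        self._entries = OrderedDict()  # ciphertext -> plaintext, oldest first

    def get(self, ciphertext: bytes) -> str:
        if ciphertext in self._entries:
            # Cache hit: mark as most recently used and skip the backend call.
            self._entries.move_to_end(ciphertext)
            return self._entries[ciphertext]
        plaintext = self._decrypt(ciphertext)  # cache miss: one backend call
        self._entries[ciphertext] = plaintext
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return plaintext
```

The trade-off is visible in `_entries`: every cached value is decrypted data that stays in memory until it is evicted or the process exits.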

Model-Level Config

Override encryption settings per model instead of relying on environment variables:

from pydantic_encryption import BaseModel, Encrypted, EncryptionMethod
from typing import Annotated

class SpecialUser(BaseModel, encryption_method=EncryptionMethod.FERNET, encryption_key="my-key"):
    email: Annotated[bytes, Encrypted]

Supported kwargs: encryption_method, encryption_key, blind_index_key. Falls back to env vars if not set.
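
Class keyword arguments like these are a standard Python feature. The sketch below shows the general mechanism with `__init_subclass__` and an environment fallback; it is not how this package is implemented, and `ConfigurableBase` and `resolve_key` are hypothetical names used for illustration.

```python
import os


class ConfigurableBase:
    """Accepts per-class settings as keyword arguments in the class statement."""

    _encryption_key = None

    def __init_subclass__(cls, encryption_key=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if encryption_key is not None:
            cls._encryption_key = encryption_key  # model-level override

    @classmethod
    def resolve_key(cls):
        # Prefer the model-level value, then fall back to the environment.
        return cls._encryption_key or os.environ.get("ENCRYPTION_KEY")


class SpecialUser(ConfigurableBase, encryption_key="my-key"):
    pass


class DefaultUser(ConfigurableBase):
    pass
```

With this shape, `SpecialUser.resolve_key()` returns the class-level key, while `DefaultUser.resolve_key()` reads `ENCRYPTION_KEY` from the environment.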

Blind Indexes

Blind indexes enable equality searches on encrypted data by storing a deterministic keyed hash alongside the ciphertext.

Configuration: Set BLIND_INDEX_SECRET_KEY via environment variable.
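
As a rough illustration of the idea, a keyed hash such as HMAC-SHA256 maps equal inputs to equal digests, which is what makes equality lookups possible without storing plaintext. The helper below is a stdlib sketch with a hypothetical name (`blind_index`); it does not show how the library derives or encodes its index values.

```python
import hashlib
import hmac


def blind_index(value: str, key: bytes) -> bytes:
    """Deterministic keyed hash of `value`; equal inputs give equal digests."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()


if __name__ == "__main__":
    key = b"example-secret-key"  # assumption: stands in for BLIND_INDEX_SECRET_KEY
    stored = blind_index("john@example.com", key)
    lookup = blind_index("john@example.com", key)
    # Constant-time comparison of the stored index and the query index.
    print(hmac.compare_digest(stored, lookup))
```

Without the key an attacker cannot precompute digests for guessed values, which is the difference from a plain unkeyed hash.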

Pydantic Models

from typing import Annotated
from pydantic_encryption import BaseModel, BlindIndex, BlindIndexMethod

class User(BaseModel):
    email_index: Annotated[bytes, BlindIndex(BlindIndexMethod.HMAC_SHA256)]

Normalization

Normalize values before hashing to ensure consistent lookups:

email_index: Annotated[bytes, BlindIndex(
    BlindIndexMethod.HMAC_SHA256,
    normalize_to_lowercase=True,
    strip_whitespace=True,
)]

Available options:

| Option | Effect |
| --- | --- |
| strip_whitespace | Strip leading/trailing whitespace, collapse internal whitespace |
| strip_non_characters | Remove all non-letter characters (keep only a-zA-Z) |
| strip_non_digits | Remove all non-digit characters (keep only 0-9) |
| normalize_to_lowercase | Convert to lowercase |
| normalize_to_uppercase | Convert to uppercase |
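
The effects listed above can be approximated with a few string operations. This is a sketch based only on the descriptions in the table: `normalize` is a hypothetical function, and the order in which the library applies the options is an assumption.

```python
import re


def normalize(
    value: str,
    *,
    strip_whitespace: bool = False,
    strip_non_characters: bool = False,
    strip_non_digits: bool = False,
    normalize_to_lowercase: bool = False,
    normalize_to_uppercase: bool = False,
) -> str:
    """Apply the selected normalizations before a value is hashed."""
    if strip_whitespace:
        # Trim the ends and collapse internal runs of whitespace to one space.
        value = " ".join(value.split())
    if strip_non_characters:
        value = re.sub(r"[^a-zA-Z]", "", value)
    if strip_non_digits:
        value = re.sub(r"[^0-9]", "", value)
    if normalize_to_lowercase:
        value = value.lower()
    if normalize_to_uppercase:
        value = value.upper()
    return value
```

Applying the same options at write time and at query time is what keeps `" John@Example.COM "` and `"john@example.com"` on the same index value.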

Methods

| Method | Description |
| --- | --- |
| BlindIndexMethod.HMAC_SHA256 | Fast HMAC-SHA256 keyed hash. Standard choice. |
| BlindIndexMethod.ARGON2 | Memory-hard Argon2 hash with deterministic salt. Better brute-force resistance. |

Custom Encryption or Hashing

Subclass BaseModel and override any of encrypt_data, hash_data, blind_index_data (or their async variants) to plug in your own logic. The post-init hook runs automatically:

from pydantic_encryption import BaseModel

class MyModel(BaseModel):
    def encrypt_data(self) -> None:
        # your encryption logic (mutate self in-place)
        ...

To implement a new backend instead of replacing the per-model path, subclass one of the adapter ABCs (EncryptionAdapter, HashingAdapter, BlindIndexAdapter) and register it via register_encryption_backend / register_blind_index_backend. Async variants are inherited by default — override async_encrypt / async_decrypt only for natively-async backends.

Run Tests

pip install -e ".[dev]"
pytest -v
