
[FEATURE] Point-in-Time (PIT) Mode for EntityFacts.to_dataframe() #697

@amcamc92

Description

Feature Category

  • New API functionality
  • Performance improvement
  • Developer experience improvement
  • Documentation enhancement
  • Tool/utility addition

Problem Statement

Is your feature request related to a problem? Please describe.
EntityFacts.to_dataframe() deduplicates by period, keeping only the latest filing for each (concept, period_end) combination. As a result, the Q1 FY2024 value reported as a prior-year comparative in the Q1 FY2025 10-Q (filed 2025-05-02) overwrites the value from the original Q1 FY2024 10-Q (filed 2024-05-01).

For point-in-time (PIT) backtesting, you need to answer: "What data was publicly known at date T?" If the dedup always keeps the latest filing, you get lookahead bias, one of the most common mistakes in financial backtesting.

Example of the problem:

AAPL Revenue Q1 FY2024:
  - Original filing: 10-Q filed 2024-02-02 → Revenue = $119.6B
  - Re-reported in:  10-Q filed 2025-02-01 → Revenue = $119.6B (unchanged here, but comparatives are sometimes restated)

If you filter to_dataframe() for "as of 2024-06-01":
  - Current behavior: Only the 2025-02-01 version exists → LOOKAHEAD BIAS
  - PIT behavior: Both versions exist → filter filing_date <= 2024-06-01 → get 2024-02-02 version
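
The difference can be reproduced on a toy DataFrame. This is only a sketch: the column names (concept, period_end, filing_date, value) and the period_end date are assumptions for illustration, and the dedup shown is a stand-in for what the issue describes, not the library's actual code.

```python
import pandas as pd

# Toy rows mirroring the example above (values in billions of USD).
facts = pd.DataFrame(
    {
        "concept": ["Revenue", "Revenue"],
        "period_end": pd.to_datetime(["2023-12-30", "2023-12-30"]),  # illustrative
        "filing_date": pd.to_datetime(["2024-02-02", "2025-02-01"]),
        "value": [119.6, 119.6],
    }
)
as_of = pd.Timestamp("2024-06-01")

# Stand-in for the current behavior: keep only the latest filing per period.
latest_only = (
    facts.sort_values("filing_date")
    .drop_duplicates(subset=["concept", "period_end"], keep="last")
)
# The surviving row was filed after as_of, so an as-of filter finds nothing.
visible_current = latest_only[latest_only["filing_date"] <= as_of]

# PIT behavior: keep every version, then filter by filing date.
visible_pit = facts[facts["filing_date"] <= as_of]

print(len(visible_current), len(visible_pit))
```

With the deduplicated frame, the period is either missing under an as-of filter or, if no filter is applied, represented by a filing that was not yet public at date T.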

Who would benefit from this feature?

  • Beginner Python users working with SEC filings
  • Financial analysts and researchers
  • Advanced developers building financial applications
  • Data scientists working with financial datasets

Proposed Solution

Describe the solution you'd like
Add a pit_mode parameter to to_dataframe() that preserves all filing versions:

class EntityFacts:
    def to_dataframe(
        self,
        include_metadata: bool = False,
        columns: list[str] | None = None,
        pit_mode: bool = False,  # NEW
    ) -> pd.DataFrame:
        """Export facts to DataFrame.

        Args:
            pit_mode: If True, include filing_date and preserve all filing
                versions (don't deduplicate by period). Enables point-in-time
                analysis by filtering: df[df['filing_date'] <= as_of_date].
        """

When pit_mode=True:

  1. Skip the period-based dedup step that keeps only the latest filing
  2. Include a filing_date column in the output
  3. Deduplicate on (concept, period_start, period_end, filing_date) instead, which removes exact duplicates but preserves every filing version
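
The key change in the steps above could look roughly like the following. The Fact dataclass and dedupe function are hypothetical stand-ins written for this sketch (they are not the library's FinancialFact or its internal dedup), and the period dates are illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """Hypothetical stand-in for a fact record, not the library's FinancialFact."""
    concept: str
    period_start: date
    period_end: date
    filing_date: date
    value: float


def dedupe(facts, pit_mode=False):
    """Keep one fact per key; the key includes filing_date only in PIT mode."""
    kept = {}
    # Process in filing order so a later filing overwrites an earlier one
    # that shares the same key.
    for fact in sorted(facts, key=lambda f: f.filing_date):
        if pit_mode:
            key = (fact.concept, fact.period_start, fact.period_end, fact.filing_date)
        else:
            key = (fact.concept, fact.period_end)
        kept[key] = fact
    return list(kept.values())


q1 = dict(concept="Revenue", period_start=date(2023, 10, 1),
          period_end=date(2023, 12, 30), value=119.6)
facts = [
    Fact(filing_date=date(2024, 2, 2), **q1),
    Fact(filing_date=date(2024, 2, 2), **q1),  # exact duplicate
    Fact(filing_date=date(2025, 2, 1), **q1),  # later filing, same period
]
default_rows = dedupe(facts)             # one row: the 2025 filing
pit_rows = dedupe(facts, pit_mode=True)  # two rows: one per filing date
```

In PIT mode the exact duplicate is still removed, but the two filing versions survive as separate rows.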

Describe alternatives you've considered

  1. Always preserve filing versions: Would change default behavior and increase DataFrame size. Not backward-compatible.
  2. Separate to_pit_dataframe() method: Possible but adds API surface. A parameter is simpler.
  3. Include filing_date by default: Minimal change but doesn't solve the dedup issue.

Use Case Example

How would you use this feature?

from edgar import Company
import pandas as pd

company = Company("AAPL")
ef = company.get_facts()

# Standard mode (current behavior — for latest-value analysis)
df_latest = ef.to_dataframe()

# PIT mode (for backtesting — preserves all filing versions)
df_pit = ef.to_dataframe(pit_mode=True)

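# Simulate "what was known on 2024-06-01?"
as_of = pd.Timestamp("2024-06-01")
# Normalize filing_date in case it is exported as date objects or strings
filing_dates = pd.to_datetime(df_pit["filing_date"])
known_facts = df_pit[filing_dates <= as_of]

# Get the latest-known value for each concept/period combination.
# period_start is part of the key so that quarterly and year-to-date
# values sharing the same period_end are not collapsed together.
pit_latest = (known_facts
    .sort_values("filing_date")
    .drop_duplicates(subset=["concept", "period_start", "period_end"], keep="last"))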

# Now pit_latest contains only data that was publicly available by 2024-06-01
# No lookahead bias!
revenue = pit_latest[pit_latest["concept"].str.contains("Revenue")]
print(f"Known revenue data points as of {as_of.date()}: {len(revenue)}")
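
For a backtest that steps through many dates, the same lookup can be done in one pass with pandas.merge_asof. This is a sketch under assumptions: pit_series is a hypothetical helper, the column names (including value) are assumed to match the PIT output, and the numbers in the toy frame are made up.

```python
import pandas as pd


def pit_series(df_pit, concept, period_end, as_of_dates):
    """For each as-of date, return the most recently filed value known by then."""
    versions = (
        df_pit[(df_pit["concept"] == concept) & (df_pit["period_end"] == period_end)]
        .assign(filing_date=lambda d: pd.to_datetime(d["filing_date"]).astype("datetime64[ns]"))
        .sort_values("filing_date")
    )
    dates = pd.DataFrame(
        {"as_of": pd.to_datetime(as_of_dates).astype("datetime64[ns]")}
    ).sort_values("as_of")
    # The default direction="backward" matches each as_of with the last
    # filing_date <= as_of; dates before the first filing get NaN.
    return pd.merge_asof(
        dates, versions[["filing_date", "value"]],
        left_on="as_of", right_on="filing_date",
    )


# Toy PIT frame with made-up numbers: an original value, then a restated one.
toy = pd.DataFrame(
    {
        "concept": ["Revenue", "Revenue"],
        "period_end": pd.to_datetime(["2023-12-31", "2023-12-31"]),
        "filing_date": ["2024-02-01", "2025-02-01"],
        "value": [100.0, 101.0],
    }
)
result = pit_series(
    toy, "Revenue", pd.Timestamp("2023-12-31"),
    ["2024-01-15", "2024-06-01", "2025-03-01"],
)
print(result)
```

The first as-of date precedes any filing, so its value is missing; the second sees the original figure and the third sees the restated one.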

Implementation Considerations

Complexity Level:

  • Simple (minor API addition)
  • Moderate (new functionality with existing patterns)
  • Complex (significant architectural changes)

Backwards Compatibility:

  • This feature maintains backwards compatibility
  • This feature might break existing code (please explain below)
  • Unsure about compatibility impact

The default pit_mode=False preserves current behavior exactly.

Additional Context

  • PIT analysis is critical for backtesting, academic research, and compliance
  • Lookahead bias is one of the most common sources of invalid backtesting results in quantitative finance
  • No new data is needed: filing_date is already available on FinancialFact and only needs to be preserved through the export
  • The implementation is essentially: skip one dedup step and add one column

Related Issues/Features:


Feature requests are evaluated based on EdgarTools' core principles: Simple yet powerful, accurate financials, beginner-friendly, and joyful UX.

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request
