Provide Reliable Company Categorization for Accurate Financial Analysis #561

@amcamc92

Feature Category

  • New API functionality
  • Performance improvement
  • Developer experience improvement
  • Documentation enhancement
  • Tool/utility addition

Problem Statement

Is your feature request related to a problem? Please describe.
Yes. Financial analysis at scale depends on classifying each company by its primary business type. Applying the same analytical models and financial metrics (e.g., P/E ratio, revenue growth, EPS) to every entity that files with the SEC is fundamentally flawed, because different company types require different analytical frameworks.

For example, a screener based on a methodology like CAN SLIM (focused on operational growth) is designed for operating companies. Its metrics are largely irrelevant or misleading for investment funds, ETFs, REITs, or Business Development Companies (BDCs).

Currently, EdgarTools provides no reliable, standardized attribute or method to distinguish these fundamental company types. Users are left to implement their own heuristic-based classification systems, which are prone to error and inconsistency. A simplistic check (e.g., merely detecting a 13F-HR filing) is insufficient, as many operating companies also have investment arms and file 13Fs. This leads to misclassification of operating companies as "Investment" firms, distorting analytical results.

Who would benefit from this feature?

  • Beginner Python users working with SEC filings
  • Financial analysts and researchers
  • Advanced developers building financial applications
  • Data scientists working with financial datasets

Proposed Solution

Describe the solution you'd like
I propose enhancing the Company object in EdgarTools to include a new, reliable classification method or attribute, such as company.get_business_category() or company.entity_type. This would return a standardized string representing the company's primary business type (e.g., "Operating", "Investment Manager", "Fund", "ETF", "REIT", "SPAC").

This classification should be derived using a multi-factor, priority-based algorithm that leverages various data points already accessible through EdgarTools:

  1. Primary Signal: SIC Code Analysis: The company's Standard Industrial Classification (SIC) code is a strong indicator. The algorithm should check company.sic against known ranges for specific entity types (e.g., SIC codes in 67xx for holding/investment offices, 65xx for real estate, 62xx for brokers).
  2. Secondary Signal: SEC Form Type Analysis: The portfolio of SEC forms a company files is highly indicative. Companies primarily filing 10-K and 10-Q are typically operating entities. Companies filing forms like N-CSR, N-CEN, N-PORT, N-MFP are registered investment companies or funds.
  3. Tertiary Signal: Company Name Keyword Analysis: Keywords in company.name (e.g., "Fund", "ETF", "Trust", "REIT", "Acquisition Corp") can serve as reinforcing signals, especially when SIC codes are broad or ambiguous.
  4. Priority-Based Decision Logic: These signals should be combined in a prioritized manner. For instance, N-* form filings might take precedence over a less specific SIC code, or a specific SIC code might override a keyword in the name. A final "Other" or "Unclassified" category could handle edge cases.
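
The four signals above could be combined roughly as follows. This is only a minimal sketch, not an existing EdgarTools API: `classify_entity`, the constant names, and the priority order are hypothetical, and it takes the SIC code (as an int), form types, and name as plain arguments rather than reading them from a `Company` object. The SIC mappings for 6798 (real estate investment trusts) and 6770 (blank check companies) are standard SIC assignments; treating the rest of 67xx as "Investment Manager" is a simplifying assumption. The form labels are taken from the list above, and the exact form-type strings used on EDGAR may differ.

```python
import re
from typing import Iterable, Optional

# Form labels taken from the proposal text (assumed, not exact EDGAR strings).
FUND_FORMS = {"N-CSR", "N-CEN", "N-PORT", "N-MFP"}
OPERATING_FORMS = {"10-K", "10-Q"}

# Specific SIC codes that identify an entity type on their own.
SPECIFIC_SIC = {6798: "REIT", 6770: "SPAC"}

# Name keywords, checked in order; word boundaries avoid partial matches.
NAME_PATTERNS = [
    (re.compile(r"\bACQUISITION CORP\b"), "SPAC"),
    (re.compile(r"\bETF\b"), "ETF"),
    (re.compile(r"\bREIT\b"), "REIT"),
    (re.compile(r"\bFUND\b"), "Fund"),
]


def keyword_category(name: str) -> Optional[str]:
    """Return the category suggested by the company name, if any."""
    upper = name.upper()
    for pattern, category in NAME_PATTERNS:
        if pattern.search(upper):
            return category
    return None


def classify_entity(sic: Optional[int], forms: Iterable[str], name: str) -> str:
    """Hypothetical priority-based classifier combining the three signals."""
    form_set = {form.upper() for form in forms}
    by_name = keyword_category(name)

    # 1. Fund-specific forms take precedence over everything else.
    if form_set & FUND_FORMS:
        return "ETF" if by_name == "ETF" else "Fund"
    # 2. A specific SIC code overrides name keywords.
    if sic in SPECIFIC_SIC:
        return SPECIFIC_SIC[sic]
    # 3. Name keywords act as a reinforcing signal.
    if by_name is not None:
        return by_name
    # 4. A broad 67xx SIC code (assumption: treated as investment-related).
    if sic is not None and 6700 <= sic <= 6799:
        return "Investment Manager"
    # 5. Periodic-report filers default to operating companies.
    if form_set & OPERATING_FORMS:
        return "Operating"
    return "Unclassified"


print(classify_entity(3571, ["10-K", "10-Q", "13F-HR"], "Example Devices Inc."))
print(classify_entity(None, ["N-CEN", "N-CSR"], "Example Index ETF Trust"))
print(classify_entity(6798, ["10-K"], "Example Growth Fund"))
```

Note that the first example files a 13F-HR yet still resolves to "Operating", and the third resolves to "REIT" because the specific SIC code outranks the "Fund" keyword. A real implementation would need the SIC ranges, form lists, and ordering validated against actual filer data.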

This would provide users with a clean, programmatic way to correctly segment their universe of companies, enabling them to apply the appropriate analytical models and avoid misinterpretations.

Describe alternatives you've considered

  1. Manual Implementation by Users: This is the current state, where each user building an application on top of EdgarTools has to implement their own heuristic-based classification logic. This is inefficient, error-prone, and leads to inconsistent results across different applications.
  2. Relying on a Single Factor (e.g., 13F-HR): As discussed, checking solely for the presence of a 13F-HR filing is insufficient and often misleading, leading to misclassification.
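
To illustrate why the single-factor check falls short, the sketch below compares it with a slightly stricter variant on two made-up filing histories. Both function names and the sample form lists are hypothetical, and the two-factor version is still only a heuristic: an asset manager that also files 10-K reports would slip past it, which is part of the case for the multi-factor approach proposed above.

```python
from typing import Iterable

PERIODIC_REPORTS = {"10-K", "10-Q"}


def naive_is_investment(forms: Iterable[str]) -> bool:
    """Single-factor check: any 13F-HR filing marks the filer as 'Investment'."""
    return "13F-HR" in set(forms)


def thirteen_f_without_periodic_reports(forms: Iterable[str]) -> bool:
    """Two-factor check: files 13F-HR and no 10-K/10-Q periodic reports."""
    form_set = set(forms)
    return "13F-HR" in form_set and not (form_set & PERIODIC_REPORTS)


# Made-up filing histories for illustration only.
operating_with_investment_arm = ["10-K", "10-Q", "10-Q", "10-Q", "8-K", "13F-HR"]
pure_13f_filer = ["13F-HR", "13F-HR", "13F-HR", "13F-HR"]

for label, forms in [
    ("operating company with investment arm", operating_with_investment_arm),
    ("13F-only filer", pure_13f_filer),
]:
    print(
        label,
        "| naive:", naive_is_investment(forms),
        "| two-factor:", thirteen_f_without_periodic_reports(forms),
    )
```

The naive check flags both filers as investment firms, while the two-factor check flags only the 13F-only filer.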

Use Case Example

How would you use this feature?

    # Provide a code example of how you envision using this feature
    from edgar import Company

    # Example 1: An operating company
    company_aapl = Company("AAPL")
    category_aapl = company_aapl.get_business_category()  # Proposed new method
    print(f"{company_aapl.name} ({company_aapl.ticker}) is classified as: {category_aapl}")
    # Expected output: Apple Inc. (AAPL) is classified as: Operating

    # Example 2: A financial institution (e.g., a bank or asset manager)
    company_bk = Company("BK")  # Bank of New York Mellon Corp
    category_bk = company_bk.get_business_category()
    print(f"{company_bk.name} ({company_bk.ticker}) is classified as: {category_bk}")
    # Expected output might be "Bank" or "Investment Manager"

    # Example 3: An ETF (e.g., SPY)
    company_spy = Company("SPY")
    category_spy = company_spy.get_business_category()
    print(f"{company_spy.name} ({company_spy.ticker}) is classified as: {category_spy}")
    # Expected output: SPDR S&P 500 ETF Trust (SPY) is classified as: ETF

    # Example 4: A REIT
    company_o = Company("O")  # Realty Income Corporation
    category_o = company_o.get_business_category()
    print(f"{company_o.name} ({company_o.ticker}) is classified as: {category_o}")
    # Expected output: Realty Income Corporation (O) is classified as: REIT

    # This allows for accurate segmentation in analytical pipelines
    def analyze_company(company: Company):
        category = company.get_business_category()
        if category == "Operating":
            # Apply CAN SLIM or traditional fundamental analysis
            print(f"Applying CAN SLIM for {company.name}...")
        elif category in ["Fund", "ETF", "Investment Manager"]:
            # Apply NAV-based analysis, compare expense ratios, etc.
            print(f"Analyzing portfolio for {company.name}...")
        elif category == "REIT":
            # Analyze FFO (Funds From Operations)
            print(f"Analyzing FFO for {company.name}...")
        else:
            print(f"Unknown analysis framework for {company.name}.")

    # analyze_company(company_aapl)
    # analyze_company(company_spy)

Implementation Considerations

Complexity Level:

  • Simple (minor API addition)
  • Moderate (new functionality with existing patterns)
  • Complex (significant architectural changes)

Reasoning: Implementing a robust, multi-factor classification algorithm requires careful design, data analysis of SIC codes and filing patterns, and ongoing maintenance. It's more than a minor API addition but fits within existing patterns of data processing and attribute generation.

Backwards Compatibility:

  • This feature maintains backwards compatibility
  • This feature might break existing code (please explain below)
  • Unsure about compatibility impact

Additional Context

Accurate company segmentation is a foundational step for any meaningful financial analysis. Providing this capability directly within EdgarTools would make the library considerably more useful for advanced users, who would no longer each need to build and maintain their own classification heuristics. It would also give applications built on the library a consistent, shared classification. This feature aligns with EdgarTools' core principles of accurate financials and a powerful developer experience.

Related Issues/Features:


Feature requests are evaluated based on EdgarTools' core principles: Simple yet powerful, accurate financials, beginner-friendly, and joyful UX.

Metadata

Assignees

No one assigned

Labels

enhancement: New feature or request
