[PULL REQUEST] New actual/implied hhp balancing methodology by Eric-Liu-SANDAG · Pull Request #208 · SANDAG/Estimates-Program

Eric-Liu-SANDAG · 2026-03-20T23:27:34Z

Describe this pull request. What changes are being made?

New actual/implied HHP balancing methodology. This change was made mostly to speed up the module.

What issues does this pull request address?

Resolves [FEATURE] Speed up the Household Characteristics module #178

Additional context

See the issue for the old and new timings.

Copilot

Pull request overview

Introduces a new methodology for balancing actual vs. implied household population (HHP) in the Household Characteristics module, aiming to improve runtime performance while keeping MGRA household-size distributions consistent with MGRA-level HHP controls.
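To make "consistent with MGRA-level HHP controls" concrete, here is a minimal sketch of an implied min/max HHP check. It assumes household sizes are bucketed 1 through 7+, that a 7+ household holds at least 7 people, and that there is some upper cap on persons per 7+ household. `MAX_7PLUS_SIZE`, `implied_hhp_bounds`, and `validate` are hypothetical names, and the cap value is a placeholder, not the value used in `python/hh_characteristics.py`.

```python
# Household-size buckets 1..7, where the last bucket stands for "7+".
SIZES = range(1, 8)
# Hypothetical cap on persons in a 7+ household (an assumption, not from the PR).
MAX_7PLUS_SIZE = 11


def implied_hhp_bounds(hh_by_size):
    """Return (min, max) household population implied by counts of 1..7+ households."""
    implied_min = sum(size * count for size, count in zip(SIZES, hh_by_size))
    # Only 7+ households can hold more people than their bucket label.
    implied_max = implied_min + hh_by_size[-1] * (MAX_7PLUS_SIZE - 7)
    return implied_min, implied_max


def validate(rows):
    """rows maps MGRA id -> (hh_by_size, hhp_total); raise if any MGRA is out of bounds."""
    bad = []
    for mgra, (hh_by_size, hhp_total) in rows.items():
        lo, hi = implied_hhp_bounds(hh_by_size)
        if not lo <= hhp_total <= hi:
            bad.append(mgra)
    if bad:
        raise ValueError(f"implied HHP does not bracket hhp_total for MGRAs: {bad}")


print(implied_hhp_bounds([10, 5, 0, 0, 0, 0, 1]))  # (27, 31)
validate({1: ([10, 5, 0, 0, 0, 0, 1], 30)})  # passes silently
```

An MGRA passes when its control total falls between the two bounds; every MGRA outside them is reported in a single error, in the spirit of the raise-on-failure check described in the overview.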

Changes:

- Refactors MGRA HHP alignment from a deterministic stepwise shifting loop to a weighted-random adjustment routine applied per MGRA row.
- Adds post-adjustment validation to ensure implied min/max HHP aligns with MGRA hhp_total, raising an error on failure.
- Reshapes the adjusted wide household-size table back into the long output format via melt.
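The wide-to-long step can be illustrated with `pandas.DataFrame.melt`. This is a sketch with hypothetical column names (`mgra` plus one column per household-size bucket, only three buckets shown); the real module's column and metric labels may differ.

```python
import pandas as pd

# Hypothetical wide table: one row per MGRA, one column per household-size bucket.
wide = pd.DataFrame({
    "mgra": [1, 2],
    "Household Size - 1": [10, 4],
    "Household Size - 2": [5, 6],
    "Household Size - 7+": [1, 0],
})

# melt() turns each size column into a (metric, value) row keyed by mgra.
long = wide.melt(id_vars="mgra", var_name="metric", value_name="value")
long = long.sort_values(["mgra", "metric"]).reset_index(drop=True)
print(long)
```

Keeping the balancing step in wide form (one row per MGRA) and melting only at the end means the per-row adjustment sees all size buckets of an MGRA at once.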


python/hh_characteristics.py

Eric-Liu-SANDAG · 2026-03-23T16:28:21Z

Runtime is now approximately 14 minutes per year, even with the employment module included:

```sql
SELECT *, [end_date] - [start_date] AS [runtime]
FROM [EstimatesProgram].[metadata].[run]
WHERE [run_id] = 187
```

Eric-Liu-SANDAG · 2026-03-23T16:29:04Z

I just need to compare outputs between the old and new methodologies before this PR is ready.

Eric-Liu-SANDAG · 2026-03-23T18:49:47Z

The following dynamic SQL query (dynamic SQL my beloved 😍) compares the 2024 Estimates ([run_id] = 82) with the new-methodology test run ([run_id] = 187):

```sql
-- Compare household-size totals, aggregated to @group_geo, between two runs
DECLARE @base_run_id NVARCHAR(MAX) = '82';
DECLARE @other_run_id NVARCHAR(MAX) = '187';
DECLARE @year NVARCHAR(MAX) = '2020';
-- Geography column to aggregate MGRAs to (e.g. 'jurisdiction')
DECLARE @group_geo NVARCHAR(MAX) = 'jurisdiction';

DECLARE @query NVARCHAR(MAX) = '
WITH [base] AS (
    SELECT 
        [run_id],
        [year],
        [' + @group_geo + '],
        [metric],
        SUM([value]) AS [' + @base_run_id + '_value]
    FROM [EstimatesProgram].[outputs].[hh_characteristics]
    INNER JOIN [demographic_warehouse].[dim].[mgra]
        ON [hh_characteristics].[mgra] = [mgra].[mgra]
        AND [series] = 15
    INNER JOIN [demographic_warehouse].[dim].[mgra_xref]
        ON [mgra].[mgra_id] = [mgra_xref].[mgra_id]
        AND [xref_year] = 9999
    WHERE [run_id] = ' + @base_run_id + '
        AND [year] = ' + @year + '
        AND [metric] LIKE ''%Household Size%''
    GROUP BY [run_id], [year], [' + @group_geo + '], [metric]
),
[other] AS (
    SELECT 
        [run_id],
        [year],
        [' + @group_geo + '],
        [metric],
        SUM([value]) AS [' + @other_run_id + '_value]
    FROM [EstimatesProgram].[outputs].[hh_characteristics]
    INNER JOIN [demographic_warehouse].[dim].[mgra]
        ON [hh_characteristics].[mgra] = [mgra].[mgra]
        AND [series] = 15
    INNER JOIN [demographic_warehouse].[dim].[mgra_xref]
        ON [mgra].[mgra_id] = [mgra_xref].[mgra_id]
        AND [xref_year] = 9999
    WHERE [run_id] = ' + @other_run_id + '
        AND [year] = ' + @year + '
        AND [metric] LIKE ''%Household Size%''
    GROUP BY [run_id], [year], [' + @group_geo + '], [metric]
)

SELECT 
    [base].[year],
    [base].[' + @group_geo + '],
    [base].[metric],
    [' + @base_run_id + '_value],
    [' + @other_run_id + '_value]
FROM [base]
INNER JOIN [other]
    ON [base].[year] = [other].[year]
    AND [base].[' + @group_geo + '] = [other].[' + @group_geo + ']
    AND [base].[metric] = [other].[metric]
ORDER BY [base].[year], [base].[' + @group_geo + '], [base].[metric]
';

-- Run the assembled query
EXEC sp_executesql @query;
```

Eric-Liu-SANDAG · 2026-03-23T18:53:41Z

I think the changes are for the better, but I still need to compare against the ACS. I think they are better because of how the old methodology worked: it always shifted households in a fixed order, starting from size 1 toward 7+ or from 7+ toward 1. For the most part, the changes were increases, which is why run 82 is much lower in HHS1 and mostly higher in HHS2+, especially in 7+.
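A simplified sketch of why a fixed-order shift skews results: assuming the old loop raised implied HHP by bumping households up one size at a time and always started from size 1, the smallest bucket is drained first. `deterministic_shift` is a hypothetical function for illustration, not the module's actual code, and only the increase direction is shown.

```python
def deterministic_shift(hh_by_size, deficit):
    """Raise implied HHP by `deficit` persons by bumping households up one size,
    always scanning size buckets in the same order: 1, 2, ..., 6."""
    hh = list(hh_by_size)
    for i in range(len(hh) - 1):
        if deficit == 0:
            break
        moved = min(hh[i], deficit)  # each bumped household adds exactly one person
        hh[i] -= moved
        hh[i + 1] += moved
        deficit -= moved
    if deficit > 0:
        raise ValueError("could not absorb the full deficit in one pass")
    return hh


print(deterministic_shift([10, 5, 0, 0, 0, 0, 0], 12))  # [0, 13, 2, 0, 0, 0, 0]
```

In this toy row, all ten 1-person households are moved before any larger household is touched, which is the kind of HHS1 depletion described for run 82.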

The new methodology uses the same weighted-random shifting technique as the 1D integerizer, which I think makes the output of run 187 less skewed. But again, I think the ACS will be the deciding factor here, depending on whether we match it better or worse with the new methodology.
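For contrast, a minimal sketch of weighted-random shifting, assuming each shift bumps one household up one size (adding one person) and the source bucket is chosen with probability proportional to its household count. This illustrates the general idea only, not the 1D integerizer's or the module's actual implementation; `weighted_random_shift` and its `seed` parameter are hypothetical, and only the increase direction is shown.

```python
import random


def weighted_random_shift(hh_by_size, deficit, seed=None):
    """Raise implied HHP by `deficit` persons, one household at a time, picking
    the source size bucket at random with weight equal to its household count."""
    rng = random.Random(seed)
    hh = list(hh_by_size)
    movable = range(len(hh) - 1)  # buckets 1..6; the 7+ bucket cannot move up
    for _ in range(deficit):
        weights = [hh[i] for i in movable]
        if sum(weights) == 0:
            raise ValueError("no households left that can be moved up a size")
        i = rng.choices(movable, weights=weights, k=1)[0]
        hh[i] -= 1      # only buckets with weight > 0 are chosen, so this stays >= 0
        hh[i + 1] += 1
    return hh


print(weighted_random_shift([10, 5, 3, 0, 0, 0, 1], 7, seed=42))
```

Because fuller buckets are picked more often, the adjustment is spread roughly in proportion to the existing distribution instead of draining HHS1 first. Passing a fixed `seed` keeps the result reproducible.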

Eric-Liu-SANDAG · 2026-03-23T18:54:17Z

Actually, I'm not even sure the ACS is the best final check, since the whole point of this processing is to correct a known error in ACS data... but we'll see.

Commit 30280c6: #178: New actual/implied hhp balancing methodology

Eric-Liu-SANDAG self-assigned this Mar 20, 2026

Eric-Liu-SANDAG requested a review from Copilot March 20, 2026 23:36

Copilot started reviewing on behalf of Eric-Liu-SANDAG March 20, 2026 23:36

Copilot AI reviewed Mar 20, 2026

Commit 5dde2af: #178: Fixed spelling

Eric-Liu-SANDAG wants to merge 2 commits into main from actual-implied-hhp-balancing