feat: SATP circumplex SEM module — functional API and architecture refactor #130

Merged
MitchellAcoustics merged 14 commits into dev from analysis/satp-circe-notebook
Mar 1, 2026

@MitchellAcoustics MitchellAcoustics commented Mar 1, 2026

Summary

Replaces the SATP class-based API with a clean functional design and delivers a canonical verification notebook confirming numerical parity with the original R analysis.

  • New fit_circe(data, language, datasource) — primary public API; validates, ipsatizes, fits all four circumplex model types, and returns a tidy DataFrame directly
  • Deleted SATP class and ModelType class — replaced with stateless function; equal_ang/equal_com folded into CircModelE enum properties
  • CircE converted from pydantic dataclass → stdlib dataclasses.dataclass — removes dead BeforeValidator machinery (already handled by extract_bfgs_fit)
  • polar_angles: pd.Series | None (was pd.DataFrame | None) — PAQ_IDS index, estimates only; gdiff property computes RMSD against ideal circumplex
  • ipsatize() promoted to public module-level function
  • Listwise deletion (complete.dropna() before correlation) — consistent with R's na.omit
  • Unified case-insensitive column normalization via _COLUMN_ALIASES constant; handles PAQ label names ("Pleasant" → "PAQ1"), PAQ IDs ("paq1" → "PAQ1"), and participant field ("PARTICIPANT" → "participant")
  • SATP CircE Analysis notebook — Quarto .qmd verifying 16 languages × 4 models against canonical R CSV; confirms RMSEA.L/U swap bug in canonical data
  • 41 tests covering numerical regression anchors, all new API paths, edge cases (n=0, error rows, ipsatize_data=False, RMSEA bounds ordering, gdiff, case-insensitive schema)

Test plan

  • uv run pytest test/satp/test_circe.py -v — 41 tests pass
  • uv run quarto render docs/tutorials/SATP_CircE_Analysis.qmd — all 12 cells execute cleanly
  • Verify fit_circe(data, language=..., datasource=...) returns a 4-row DataFrame with correct columns
  • Verify CircModelE.UNCONSTRAINED.equal_ang is False, CircModelE.CIRCUMPLEX.equal_ang is True
  • Verify "PARTICIPANT" and "Pleasant" column names are accepted by SATPSchema
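
The enum checks in the test plan can be illustrated with a small stand-in. This is a sketch only, not the module's actual definition: the member values and the `EQUAL_COM` member name are assumptions, chosen to match the four model types and the `equal_ang`/`equal_com` properties described above.

```python
from enum import Enum


class CircModelE(Enum):
    """Illustrative stand-in; member values and EQUAL_COM are assumptions."""

    UNCONSTRAINED = "unconstrained"
    EQUAL_ANG = "equal_ang"
    EQUAL_COM = "equal_com"
    CIRCUMPLEX = "circumplex"

    @property
    def equal_ang(self) -> bool:
        # Angles are constrained to equal spacing in these two models.
        return self in (CircModelE.EQUAL_ANG, CircModelE.CIRCUMPLEX)

    @property
    def equal_com(self) -> bool:
        # Communalities are constrained to be equal in these two models.
        return self in (CircModelE.EQUAL_COM, CircModelE.CIRCUMPLEX)
```

With the constraints stored on the enum itself, a caller can read `model.equal_ang` directly, with no separate wrapper class.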

Andrew Mitchell and others added 14 commits March 1, 2026 02:06
…ebook

- Add `gdiff` computed property to `CircE` dataclass: RMSD between fitted
  polar angles and ideal 45°-spaced circumplex positions. Returns None for
  models with fixed angles (EQUAL_ANG, CIRCUMPLEX). Adds module-level
  `_IDEAL_ANGLES` and `_IDEAL_ANGLES_REV` constants mirroring the R
  `sem_funcs.R` implementation.

- Add `test/satp/fixtures/sem-fit-ipsatized-canonical.csv`: canonical
  reference output from the original R analysis (2024-06-13, SATP v1.4,
  16 languages × 4 models). Documents a known RMSEA.L/RMSEA.U swap bug
  in the original R CSV export code.

- Add `docs/tutorials/SATP_CircE_Analysis.qmd`: Quarto notebook replicating
  the SATP circumplex SEM analysis using Soundscapy. Confirms numerical
  consistency against the canonical: all df values match exactly, RMSEA
  bounds are correctly ordered, and 6 reflected (equivalent) angular
  solutions are detected and documented.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rchitecture

- Delete SATP class and ModelType class; replace with fit_circe() function
  returning a tidy DataFrame directly (one row per model)
- Fold equal_ang/equal_com boolean properties into CircModelE enum directly,
  removing the redundant ModelType wrapper
- Convert CircE from pydantic dataclass to stdlib dataclasses.dataclass;
  remove dead BeforeValidator/length_1_array_to_number machinery (already
  handled by extract_bfgs_fit())
- Change polar_angles: pd.DataFrame|None → pd.Series|None with PAQ_IDS index;
  fix extraction to correctly use pd.DataFrame(raw_pa).T.iloc[0] for R matrix
  orientation (variables × stats)
- Add CircE.to_dict() with PAQ angle columns expanded for DataFrame construction
- Add public ipsatize() function (was private SATP._ipsatize_df())
- Fix n/correlation to use listwise deletion (complete cases), consistent with
  R's na.omit — resolves n discrepancies for languages with NaN PAQ values
- Update exports: fit_circe, ipsatize added; SATP, ModelType removed
- Rewrite test suite: preserve all numerical regression anchors in
  TestBfgsWrapper unchanged; replace TestSATP with TestFitCirce using new API;
  add TestCircModelEProperties; add to_dict and listwise deletion tests
- Update SATP_CircE_Analysis.qmd: use fit_circe() loop, normalize canonical
  CSV columns to lowercase/snake_case for comparison (language, model, chisq_can
  etc.), remove mixed-case column gymnastics
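
The listwise-deletion change can be illustrated with toy data. This sketch only shows the pandas pattern (`dropna()` before `corr()`); the column values are invented for illustration and the variable names are not taken from the module.

```python
import pandas as pd

# Toy data: one missing PAQ1 response in the third row.
responses = pd.DataFrame(
    {
        "PAQ1": [1.0, 2.0, None, 4.0],
        "PAQ2": [2.0, 1.0, 3.0, 5.0],
        "PAQ3": [1.0, 3.0, 2.0, 4.0],
    }
)

# Listwise deletion: drop any row with a missing value before correlating.
complete = responses.dropna()
n = len(complete)
listwise_corr = complete.corr()

# pandas' default is pairwise deletion: PAQ2 vs PAQ3 would use all four rows.
pairwise_corr = responses.corr()
```

Here `n` counts complete cases only, so it and every correlation entry come from the same set of rows.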

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- circe.py: generalize SATPSchema.column_alias to normalize all schema field
  names case-insensitively via a lowercase→canonical mapping dict, covering
  PAQ_IDS and 'participant' without hardcoded special cases
- circe.py: add pre-validation empty-data guard in fit_circe() — raises
  ValueError immediately rather than producing 4 cryptic R error rows
- circe.py: add post-ipsatization n=0 guard for cases where validation passes
  but no complete PAQ rows survive listwise deletion
- circe.py: fix to_dict() return annotation dict → dict[str, Any]
- _circe_wrapper.py: fix docstring example (sspy.spi.bfgs → sspyr.bfgs),
  add Any import, fix extract_bfgs_fit() return annotation dict → dict[str, Any]
- test_circe.py: add 8 new tests — gdiff None/float for constrained/free-angle
  models, rmsea_l≤rmsea≤rmsea_u invariant, ipsatize_data=False path,
  models=[] returns empty DataFrame, error row structure via mock, n=0 raises
  ValueError, case-insensitive PARTICIPANT → participant schema normalization

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ions

- Extract _COLUMN_ALIASES module-level constant combining PAQ label names,
  PAQ IDs, and participant field into a single lowercase→canonical lookup.
  Built once at import time instead of inside column_alias on every call.
- Extend case-insensitive normalization to PAQ label names: 'Pleasant',
  'PLEASANT' etc. now correctly map to 'PAQ1' (previously only exact-match
  lowercase labels like 'pleasant' were handled).
- Simplify column_alias parser to a single dict comprehension over _COLUMN_ALIASES
  replacing the two-pass rename_dict construction.
- Fix CircE dataclass field type annotations: m, chisq, d, p, cfi, gfi, agfi,
  srmr, mcsc, rmsea, rmsea_l, rmsea_u declared as T|None to match from_bfgs()
  which uses .get(key, None) for all fit statistics.
- Add test_satp_schema_paq_label_case_insensitive: verifies title-cased PAQ
  label names ('Pleasant', 'Vibrant', ...) are normalized to PAQ_IDS.
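
A rough sketch of the lowercase-to-canonical lookup described above. The label order (pleasant through calm) and the helper name `build_rename_map` are assumptions for illustration; the real `_COLUMN_ALIASES` and `column_alias` parser may differ in detail.

```python
PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]
# Assumed label order for PAQ1..PAQ8.
PAQ_LABELS = [
    "pleasant", "vibrant", "eventful", "chaotic",
    "annoying", "monotonous", "uneventful", "calm",
]

# Built once at import time: lowercase alias -> canonical column name.
_COLUMN_ALIASES = {
    **{label.lower(): paq for label, paq in zip(PAQ_LABELS, PAQ_IDS)},
    **{paq.lower(): paq for paq in PAQ_IDS},
    "participant": "participant",
}


def build_rename_map(columns):
    """Map each recognised column to its canonical name, ignoring case."""
    return {
        col: _COLUMN_ALIASES[col.lower()]
        for col in columns
        if col.lower() in _COLUMN_ALIASES
    }
```

The resulting dict could be passed to `DataFrame.rename(columns=...)`; unrecognised columns are left out and therefore stay unchanged.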

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
SATP class was replaced by fit_circe() in the refactor; the smoke test
hadn't been updated and was failing in CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pant optional

- polar_angles extraction: use label-based column access ("estimates") with
  iloc[:, 0] fallback instead of fragile positional .T.iloc[0]
- extract_bfgs_fit: explicit int() cast for m/d/dfnull stats to guarantee
  annotation holds regardless of rpy2 storage type
- fit_circe error rows: populate all expected columns with None to prevent
  pandas from promoting numeric dtypes across successful rows
- SATPSchema: make participant Optional so ipsatize_data=False callers do not
  need a participant column; add runtime ValueError if ipsatize_data=True
  without participant
- Tests: 3 new tests covering ipsatize_data=False sans participant, the
  ValueError path, and dtype preservation under partial failure

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Initial plan

* fix: code review fixes — install_r_packages empty list, docstring format, error message

Co-authored-by: MitchellAcoustics <22335636+MitchellAcoustics@users.noreply.github.com>

---------

Co-authored-by: MitchellAcoustics <22335636+MitchellAcoustics@users.noreply.github.com>
The old name clashed with circumplex.ipsatize(), which implements a
different operation (row-wise centering within a single observation vs.
column-wise centering per participant across observations).

Changes:
- Rename ipsatize() -> person_center() throughout; update module header,
  __init__.py exports, and all test references
- Rename fit_circe() parameter ipsatize_data -> center_by_participant
- Expand docstring with psychometric background: explains the distinction
  between column-wise within-person centering (SATP) and row-wise
  ipsatization (circumplex package), and the rationale for each
- Improve implementation: replace lambda-based groupby.transform with
  explicit column selection + transform("mean") + vectorised subtraction
  (~2.3x faster, forward-compatible with pandas 3.x include_groups changes)
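
The implementation change above (explicit column selection, `transform("mean")`, then vectorised subtraction) might look roughly like this. The function name, signature, and defaults are hypothetical, not the package's `person_center()`.

```python
import pandas as pd


def person_center_sketch(df, value_cols, participant_col="participant"):
    """Subtract each participant's per-column mean from their responses."""
    # Select the value columns explicitly so the grouping column is never
    # passed to the aggregation.
    means = df.groupby(participant_col)[value_cols].transform("mean")
    centered = df.copy()
    # Vectorised subtraction; `means` is aligned to df's index.
    centered[value_cols] = df[value_cols] - means
    return centered


example = pd.DataFrame(
    {
        "participant": ["a", "a", "b"],
        "PAQ1": [1.0, 3.0, 5.0],
        "PAQ2": [2.0, 2.0, 4.0],
    }
)
centered = person_center_sketch(example, ["PAQ1", "PAQ2"])
```

After centering, each participant's mean in every value column is zero, which is the column-wise within-person centering the commit message distinguishes from row-wise ipsatization.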

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
#1 — RRuntimeError added to fit_exceptions
  rpy2's RRuntimeError inherits from Exception, not RuntimeError, so
  R-level convergence failures were escaping the per-model except block
  and crashing the entire fit_circe() call. Added import and included it
  in the fit_exceptions tuple.
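
A toy illustration of why the exception tuple matters. `FakeRRuntimeError` is a hypothetical stand-in for rpy2's `RRuntimeError`, and the other members of `fit_exceptions` are assumptions; only the pattern of catching per model and recording an error row is taken from the description above.

```python
class FakeRRuntimeError(Exception):
    """Hypothetical stand-in for rpy2's RRuntimeError (subclasses Exception)."""


def fit_model(model: str) -> dict:
    """Toy fitter: fails for one model the way an R-level error would surface."""
    if model == "bad":
        raise FakeRRuntimeError(f"optimiser failed for {model}")
    return {"chisq": 1.23}


# Without FakeRRuntimeError in this tuple, the failure would propagate,
# because the class is not a subclass of RuntimeError.
fit_exceptions = (RuntimeError, ValueError, FakeRRuntimeError)


def fit_all(models: list[str]) -> list[dict]:
    rows = []
    for model in models:
        try:
            rows.append({"model": model, "error": None, **fit_model(model)})
        except fit_exceptions as exc:
            # Record an error row and continue with the remaining models.
            rows.append({"model": model, "error": str(exc), "chisq": None})
    return rows
```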

#2 — Guard p-value key access in extract_bfgs_fit
  Direct py_res["chisq"] / py_res["d"] bracket access replaced with
  .get() + None guard so a missing or None key raises a clear diagnostic
  rather than a bare KeyError or scipy TypeError.

#4 — dtype preservation for d and m in mixed error/success DataFrames
  numpy int64 cannot hold NaN, so any error row with "d": None promoted
  the entire column to float64. Fixed by casting n, d, m to
  pd.Int64Dtype() (pandas nullable integer) after building the DataFrame.
  Extended test_fit_circe_error_row_preserves_numeric_dtypes to assert
  integer dtype for all three columns (n, d, m), not just n.
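
The dtype issue can be reproduced with a two-row toy frame. The row contents here are invented; the sketch only shows the cast to pandas' nullable integer type after the DataFrame is built.

```python
import pandas as pd

# One successful row and one error row whose integer stats are None.
rows = [
    {"model": "ok", "n": 120, "d": 10, "m": 3},
    {"model": "failed", "n": 120, "d": None, "m": None},
]
results = pd.DataFrame(rows)
# At this point "d" and "m" are no longer plain integer columns, since a
# NumPy int64 column cannot represent the missing value.

int_cols = ["n", "d", "m"]
results = results.astype({col: pd.Int64Dtype() for col in int_cols})
```

After the cast the successful row keeps integer values and the error row holds `<NA>` rather than forcing a float column.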

Also: pandera participant field comment (| None vs nullable=True semantics)
and models=[] edge case noted in fit_circe docstring.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
CircE's BFGS optimisation can converge to a reflected solution where
polar angles decrease (clockwise) rather than increase (counter-clockwise).
Add normalize_polar_angles() which detects this via a monotonicity check
(PAQ2 < PAQ3 < PAQ4) and corrects by applying 360 - angle to PAQ2-PAQ8.

Apply normalization automatically in CircE.from_bfgs() so polar_angles
are always in canonical orientation. Simplify gdiff to compare directly
against _IDEAL_ANGLES without the fragile sum-threshold heuristic.
Remove _IDEAL_ANGLES_REV and _ANGLE_REV_THRESHOLD constants.

Export normalize_polar_angles from soundscapy.satp and soundscapy
top-level (lazy-loaded) so downstream analyses can use the same
correction without re-implementing it.
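
A minimal sketch of the reflection check described above, using a plain list of eight angles (PAQ1 to PAQ8, in degrees) instead of a `pd.Series`. The function name is hypothetical and the real `normalize_polar_angles()` may handle edge cases differently.

```python
def normalize_polar_angles_sketch(angles):
    """Return angles in counter-clockwise (increasing) orientation.

    `angles` is assumed to hold eight values ordered PAQ1..PAQ8.
    """
    paq2, paq3, paq4 = angles[1], angles[2], angles[3]
    if paq2 < paq3 < paq4:
        # Already increasing: canonical orientation, nothing to do.
        return list(angles)
    # Reflected solution: mirror PAQ2..PAQ8 and leave PAQ1 untouched.
    return [angles[0]] + [360.0 - a for a in angles[1:]]


reflected = [0.0, 315.0, 270.0, 225.0, 180.0, 135.0, 90.0, 45.0]
canonical = normalize_polar_angles_sketch(reflected)
```

Applying the function to an already canonical solution returns it unchanged, so it is safe to call unconditionally.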

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The notebook lives in a separate analysis repo and should not be
part of the soundscapy package. Add gitignore entries for both the
.qmd source and .html output so they stay excluded going forward.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Integer Series input produces integer output; the doctest expected
floats. Switching to float literals in the example makes got/want match.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@MitchellAcoustics MitchellAcoustics merged commit 6fa45e6 into dev Mar 1, 2026
17 checks passed

2 participants