
Causal inference analysis #1

Open
xuxoramos wants to merge 12 commits into softwareguru:main from xuxoramos:main


@xuxoramos

This pull request adds comprehensive documentation, analysis, and communication materials for a causal inference study on Mexican IT salary survey data. The main contributions include a detailed execution summary mapping the project to its specification, a formal specification document, and a set of social media scripts designed to communicate key findings. Additionally, the original README is updated to clarify data usage permissions.

Documentation and Analysis:

  • Added CAUSAL_ANALYSIS_SUMMARY.md, providing a detailed mapping of how each item in the project specification was addressed, including methodology, confounder controls, key findings, limitations, and actionable insights.
  • Created SPECIFICATION.md, outlining the requirements and constraints for the causal analysis, including data sources, periods, exclusions, and the main analytical goal.
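
The confounder controls mentioned above boil down to comparing a naive estimate with one that adjusts for a common cause. The following is a minimal regression-adjustment sketch on toy data. It assumes a hypothetical data-generating process in which experience drives both English proficiency and salary. The effect sizes, units, and the `simulate`/`ols` helpers are illustrative and are not taken from causal_analysis.ipynb.

```python
import numpy as np

def simulate(n: int = 6000, seed: int = 2026):
    """Toy data (hypothetical DGP): experience confounds English -> salary."""
    rng = np.random.default_rng(seed)
    experience = rng.uniform(0, 20, n)  # years of experience
    # More experienced respondents are more likely to report English proficiency.
    p_english = 1 / (1 + np.exp(-(experience - 8) / 3))
    english = (rng.uniform(size=n) < p_english).astype(float)
    # Assumed true effects: English +10 (thousand MXN/month), +3 per year of experience.
    salary = 20 + 10 * english + 3 * experience + rng.normal(0, 8, n)
    return experience, english, salary

def ols(y, *covariates):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    X = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

experience, english, salary = simulate()
naive = ols(salary, english)[1]                 # absorbs part of the experience effect
adjusted = ols(salary, english, experience)[1]  # controls for the confounder
print(f"naive: {naive:.1f}, adjusted: {adjusted:.1f}")
```

The naive coefficient is inflated because English speakers in this toy setup are also more experienced. Adding experience as a covariate recovers an estimate close to the assumed effect of 10.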

Communication Materials:

  • Added four Instagram reel scripts (ig_scripts/01_intro_estudio.md, 02_experiencia_vale.md, 03_ingles_48k.md, 04_brecha_genero.md) to communicate the study’s main findings, methodology, and societal implications in an accessible, engaging way. Each script includes a storyboard, required assets, music suggestions, hashtags, and captions. [1] [2] [3] [4]

Repository Information:

  • Added README_original.md to clarify the academic/research-only license and provide guidance for commercial use requests.

REDESIGN_2026.md: Full redesign document with three goals:
  - Goal 1: Product separation from BP2C (retained, removed, and modified fields; a role-first tech stack architecture; improvements to existing questions)
  - Goal 2: Policy-driving artifact via AMITI (6 policy blocks + advocacy brief)
  - Goal 3: BP2C enrollment hook (eNPS, leave reason, job search)
  - 62 active items (12 retained, 12 redesigned, 13 tech stack, 25 new)
  - ~165 items removed (benefits, COVID, checkbox matrices)
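
The Goal 3 hook includes an eNPS item. REDESIGN_2026.md does not spell out how it is scored here, so this sketch assumes the standard Net Promoter convention. Answers use a 0-10 scale, promoters answer 9-10, and detractors answer 0-6. The score is the percentage of promoters minus the percentage of detractors. The `enps` function name is hypothetical.

```python
from collections.abc import Iterable

def enps(scores: Iterable[int]) -> float:
    """Employee Net Promoter Score, assuming the standard 0-10 convention.

    Promoters answer 9-10 and detractors 0-6; passives (7-8) count only in
    the denominator. Returns a value in [-100, 100].
    """
    scores = list(scores)
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be on a 0-10 scale")
    promoters = sum(s >= 9 for s in scores)
    detractors = sum(s <= 6 for s in scores)
    return 100 * (promoters - detractors) / len(scores)

print(enps([10, 9, 9, 8, 7, 6, 3, 10]))  # 4 promoters, 2 detractors, n=8 -> 25.0
```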

question_inventory_2026.csv: Excel-friendly question list with bilingual question text, types, options, skip logic, goal mapping, and section references. Includes 18 removed-field rows documenting what was dropped and why.
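
Because active and removed items share one file, a typical first step is to split them and count items per goal. The sketch below assumes hypothetical column names (`id`, `status`, `goal`, `question_es`, `question_en`) and a hypothetical `summarize` helper. The real headers in question_inventory_2026.csv may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; the real column names and values may differ.
SAMPLE = """id,status,goal,question_es,question_en
Q01,retained,1,¿Cuál es tu salario mensual?,What is your monthly salary?
Q02,new,3,¿Recomendarías tu empresa?,Would you recommend your company?
R01,removed,,Beneficios COVID,COVID benefits
"""

def summarize(fh):
    """Return (#active items, #removed rows, active items per goal)."""
    rows = list(csv.DictReader(fh))
    active = [r for r in rows if r["status"] != "removed"]
    removed = [r for r in rows if r["status"] == "removed"]
    per_goal = Counter(r["goal"] for r in active)
    return len(active), len(removed), per_goal

print(summarize(io.StringIO(SAMPLE)))
```

For the real file, `open("question_inventory_2026.csv", newline="", encoding="utf-8-sig")` is a reasonable choice. The `utf-8-sig` codec tolerates the byte-order mark that Excel often writes, and it also reads files that have no mark.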

Monte Carlo simulation (n=6000, seed=2026) comparing the old ~130-item, checkbox-heavy design with the new 62-item structured design:

Key findings:
  - R² improves from 0.34 to 0.49 (+14.9 pp)
  - seniority_level alone adds +12.4 pp of R² (the largest single gap closed)
  - Information efficiency (R² per minute) roughly triples (+208%)
  - Coefficient stability improves by 87% (measured by bootstrap coefficient of variation)
  - 629 more usable responses, thanks to the higher completion rate
  - 62% more effective information (R² × N)
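
The derived metrics in this list are simple ratios and products. Here is a minimal sketch of how they can be computed from per-design summaries. It assumes that "R²/min" means R² divided by completion time in minutes and that bootstrap CV is the standard deviation over the absolute mean of a coefficient across refits. The class, the function names, and the toy inputs are hypothetical and do not reproduce the PR's results.

```python
import statistics
from dataclasses import dataclass

@dataclass
class DesignResult:
    r2: float       # R² of the salary model under this questionnaire design
    minutes: float  # typical completion time in minutes
    n_usable: int   # completed, usable responses

    @property
    def efficiency(self) -> float:
        """Information efficiency: R² per minute of respondent time."""
        return self.r2 / self.minutes

    @property
    def effective_information(self) -> float:
        """Effective information: R² × N."""
        return self.r2 * self.n_usable

def pct_change(new: float, old: float) -> float:
    return 100 * (new - old) / old

def coefficient_cv(draws: list[float]) -> float:
    """Coefficient of variation of one coefficient across bootstrap refits."""
    return statistics.stdev(draws) / abs(statistics.fmean(draws))

# Toy numbers for illustration only; not the simulation's outputs.
old = DesignResult(r2=0.30, minutes=30.0, n_usable=1000)
new = DesignResult(r2=0.45, minutes=15.0, n_usable=1200)
eff_gain = pct_change(new.efficiency, old.efficiency)
info_gain = pct_change(new.effective_information, old.effective_information)
print(f"efficiency: {eff_gain:+.0f}%, effective information: {info_gain:+.0f}%")
```

A lower CV means a more stable coefficient, so a stability improvement corresponds to a drop in `coefficient_cv` between designs.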

Files:
  simulation_old_vs_new.py       — reproducible simulation script
  SIMULATION_FINDINGS.md         — interpreted findings document
  simulation_results/*.csv       — raw numeric outputs
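
The core mechanism behind the seniority_level finding can be shown in a few lines. A salary model that observes an extra, informative covariate explains more variance. This sketch is not the contents of simulation_old_vs_new.py. The data-generating process, coefficients, and helper names are assumptions chosen only to illustrate a with/without comparison at n=6000, seed=2026.

```python
import numpy as np

def r_squared(y, X):
    """In-sample R² of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1 - resid.var() / y.var()

def simulate(n=6000, seed=2026):
    """Hypothetical DGP: log salary depends on years, seniority level, and English."""
    rng = np.random.default_rng(seed)
    years = rng.uniform(0, 20, n)
    # Seniority level (0-3) tracks years of experience but is not identical to it.
    seniority = np.clip(np.round(years / 6 + rng.normal(0, 0.8, n)), 0, 3)
    english = rng.integers(0, 2, n)
    log_salary = (9.5 + 0.02 * years + 0.25 * seniority + 0.2 * english
                  + rng.normal(0, 0.4, n))
    old = r_squared(log_salary, np.column_stack([years, english]))             # no seniority item
    new = r_squared(log_salary, np.column_stack([years, english, seniority]))  # with seniority_level
    return old, new

old_r2, new_r2 = simulate()
print(f"R² without seniority: {old_r2:.2f}, with seniority: {new_r2:.2f}")
```

Because the two models are nested, in-sample R² cannot decrease when a covariate is added. The size of the gain depends entirely on the assumed coefficients.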

Presentation: 16 slides covering the rationale, the 3 goals, removals, additions, seniority_level impact, the tech stack redesign, policy blocks, the BP2C hook, simulation evidence, information efficiency, respondent experience, the roadmap, and decisions needed.

Source: slides_redesign_2026.md (Pandoc Markdown)
Output: slides_redesign_2026.html (reveal.js, CDN-loaded)

New directory layout:
- data/          — raw survey CSVs + options/ lookup tables
- notebooks/     — salarios.ipynb, causal_analysis.ipynb
- output/        — figures/, simulation_results/, model outputs
- docs/          — writeups, ig_scripts/
- redesign-2026/ — REDESIGN_2026.md, question inventory, simulation,
                   slidev-deck/ (18-slide presentation in Mexican Spanish)
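
With this layout, notebooks can build their input and output paths from the repository root instead of hard-coding relative strings. The sketch below assumes that notebooks run from inside notebooks/, one level below the root, which is Jupyter's usual working directory for a kernel. The `project_paths` helper and its keys are hypothetical.

```python
from pathlib import Path

def project_paths(notebook_dir: Path) -> dict[str, Path]:
    """Hypothetical helper: map layout folders to absolute paths.

    Assumes notebook_dir is <repo>/notebooks, so the repository root is its parent.
    """
    root = Path(notebook_dir).resolve().parent
    return {
        "data": root / "data",
        "options": root / "data" / "options",
        "figures": root / "output" / "figures",
        "simulation_results": root / "output" / "simulation_results",
        "redesign": root / "redesign-2026",
    }

paths = project_paths(Path.cwd())
print(paths["data"])
```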

Cleanup:
- Remove voila-demo.ipynb (unused)
- Remove slides_redesign_2026.html/md (superseded by Slidev)
- Remove tablas/ (empty, outputs now in output/)
- Remove README_original.md
- Update notebook paths to match new layout
- Update .gitignore to track slidev-deck source, ignore node_modules/dist

Redesigns the salary survey for 2026 with improved structure and analytics.