Auptimize

Auptimize optimizes spatial audio layouts for improved perceptual sound localization. It adjusts sound source positions based on empirical human perception data to compensate for localization errors such as front-back confusion and elevation misjudgment.
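To make front-back confusion concrete, here is a minimal sketch of how an azimuth maps to its front-back mirror image. It assumes the azimuth convention listed under Coordinate System (front=0, left=90, back=180, right=270). The function name `front_back_mirror` is hypothetical and is not part of the Auptimize code.

```python
def front_back_mirror(theta_deg: float) -> float:
    """Mirror an azimuth across the left-right (interaural) axis.

    Assumes front=0, left=90, back=180, right=270 (degrees).
    A listener who confuses front and back may hear a source at
    theta_deg as if it were at the returned angle.
    """
    return (180.0 - theta_deg) % 360.0


# Print a few azimuths next to their mirrored counterparts
for theta in (0, 30, 90, 270, 330):
    print(theta, "->", front_back_mirror(theta))
```

Sources directly to the left or right (90 and 270) map to themselves, which is why front-back confusion matters most near the front and back of the listener.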

Overview

Given a set of visual element positions on a sphere around a listener, Auptimize uses integer programming (Gurobi) to find optimal sound source positions that maximize the likelihood of correct auditory localization. The system uses frequency distributions derived from collected participant data to model how humans perceive sound directions.
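The idea of choosing sound positions that maximize the likelihood of correct localization can be illustrated with a toy example. The sketch below is not the Gurobi model in optimization.py: it swaps the integer program for brute-force enumeration, assumes each element must receive a distinct candidate position, and uses a made-up probability table `P` rather than participant data.

```python
import itertools
import math

# Hypothetical perception model: P[s][t] is the probability that a sound
# played from candidate position s is heard as coming from target t.
# These numbers are invented for illustration only.
P = [
    [0.70, 0.20, 0.10],  # candidate 0
    [0.15, 0.60, 0.25],  # candidate 1
    [0.05, 0.15, 0.80],  # candidate 2
    [0.75, 0.10, 0.15],  # candidate 3
]


def best_assignment(P, num_targets):
    """Pick a distinct candidate for each target, maximizing total log-likelihood."""
    best_score, best_choice = -math.inf, None
    for choice in itertools.permutations(range(len(P)), num_targets):
        # choice[t] is the candidate position assigned to target t
        score = sum(math.log(P[s][t]) for t, s in enumerate(choice))
        if score > best_score:
            best_score, best_choice = score, choice
    return best_choice, best_score


choice, score = best_assignment(P, num_targets=3)
print(choice, round(math.exp(score), 3))
```

With this table the search selects candidates 3, 1, and 2 for targets 0, 1, and 2. Enumeration grows factorially with the number of elements, which is one reason a real solver is used for larger layouts.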

Requirements

  • Python 3.9
  • numpy
  • pandas
  • scipy
  • gurobipy (Gurobi optimizer)

    conda create -n auptimize python=3.9
    conda activate auptimize
    python -m pip install numpy pandas scipy gurobipy

Note: Using Gurobi requires a valid license, which is free for academics (https://www.gurobi.com/solutions/licensing).

Getting Started

Option A: Use Pre-computed User Study Layouts in Unity

If you want to test Auptimize using the outputs from our user study, the pre-computed results are already included in simulation_data_userstudy/. You can convert them to Unity-compatible coordinate files and export them directly to a Unity project.

  1. Run generate_unity_layouts.py to convert the user study data into Unity coordinate files:

    python generate_unity_layouts.py

    This reads the CSV files from simulation_data_userstudy/ and outputs .txt layout files to ../unity/Assets/Resources/TrialCoordinates/. Each file contains pairs of visual and sound coordinates in radians, ready for use in Unity.

Option B: Run the Full Simulation Pipeline

If you want to run the optimization yourself on new layout configurations, you will need the participant data to build the perception model.

  1. Download participant data. If the data/ directory is not included in this repository, download it from this link.

    Place the downloaded participant folders into ./data/ so the directory structure looks like:

    python/
      data/
        p1/
          headposition.csv
          main.csv
        p2_.../
          headposition.csv
          main.csv
        ...
    
  2. Run Auptimize optimization. This generates layouts, optimizes sound positions using integer programming, and saves the results to CSV:

    python simulation.py --layout_type random --num_elements 2 --num_layouts 1000

    Options:

    • --layout_type: random, side_by_side, or cone_of_confusion
    • --num_elements: Number of audio-visual elements (only for random and side_by_side layouts)
    • --num_layouts: Number of layouts to generate (only for random layouts; all possible layouts are generated for side_by_side and cone_of_confusion layouts)

    Results are saved to auptimize_simulation/.

  3. Export to Unity (optional). Convert your new simulation results to Unity layout files by updating the in_dir path in generate_unity_layouts.py to point to your output directory, then run:

    python generate_unity_layouts.py
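Before running step 2, it can help to confirm that ./data/ matches the structure shown in step 1. The following is a small sketch, not part of the repository: the helper name `find_incomplete_participants` is hypothetical, and it only checks that each participant folder contains the two CSV files named above.

```python
from pathlib import Path

# File names taken from the directory structure shown in step 1
REQUIRED_FILES = ("headposition.csv", "main.csv")


def find_incomplete_participants(data_dir):
    """Return {folder_name: [missing files]} for participant folders under data_dir."""
    problems = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the participant folders
        missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
        if missing:
            problems[folder.name] = missing
    return problems


if __name__ == "__main__":
    data_dir = Path("data")
    if not data_dir.is_dir():
        print("data/ not found; download the participant data first")
    else:
        print(find_incomplete_participants(data_dir) or "all participant folders look complete")
```

Run it from the python/ directory so that the relative path data/ resolves to the participant folders.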

File Descriptions

| File | Description |
| --- | --- |
| optimization.py | Integer programming optimization using Gurobi. Computes frequency distributions from participant data and solves for optimal sound positions |
| simulation.py | Main CLI entry point. Runs optimization, predicts perceived locations, and computes evaluation metrics |
| utils.py | Data loading (setup()) and coordinate conversion between rectangular and spherical coordinate systems |
| evaluation_metrics.py | Distance and error functions: circular difference, Haversine angular distance, geodesic distance, L1/L2 norms, cone-of-confusion distance, and adjusted error metrics |
| generate_layouts.py | Generates layout configurations: random, side-by-side, cone-of-confusion, and exhaustive grid combinations |
| generate_unity_layouts.py | Converts simulation result CSVs to Unity-compatible coordinate files (radians) |
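Two of the metrics named for evaluation_metrics.py have standard textbook definitions, sketched below for reference. This is an illustration under assumptions, not the repository's implementation: the function names and signatures are hypothetical, and angles are taken in degrees as azimuth (theta) and elevation (phi).

```python
import math


def circular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def haversine_angle(theta1_deg, phi1_deg, theta2_deg, phi2_deg):
    """Great-circle angle in degrees between two (azimuth, elevation) directions."""
    t1, p1, t2, p2 = map(math.radians, (theta1_deg, phi1_deg, theta2_deg, phi2_deg))
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((t2 - t1) / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def geodesic_distance(theta1_deg, phi1_deg, theta2_deg, phi2_deg, radius=1.5):
    """Arc length on a sphere of the given radius (default matches the README)."""
    return radius * math.radians(haversine_angle(theta1_deg, phi1_deg, theta2_deg, phi2_deg))


print(circular_difference(350, 10))
print(haversine_angle(0, 0, 90, 0))
```

The circular difference treats 350 and 10 degrees as 20 degrees apart rather than 340, which is the behavior needed when comparing azimuths that wrap around at 360.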

Coordinate System

  • Theta (azimuth): 0 to 360 degrees (front=0, left=90, back=180, right=270)
  • Phi (elevation): -60 to +60 degrees (down=-60, horizontal=0, up=+60)
  • Radius: Fixed at 1.5 (listener-centric sphere)
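As a rough illustration of this convention, the sketch below converts between (theta, phi) in degrees and rectangular coordinates. The axis orientation is an assumption made for the example (x forward, y left, z up); the conversion in utils.py and the axes used in Unity may differ. Both function names are hypothetical.

```python
import math

RADIUS = 1.5  # fixed listener-centric radius from the list above


def spherical_to_cartesian(theta_deg, phi_deg, radius=RADIUS):
    """Convert (azimuth, elevation) in degrees to (x, y, z).

    Assumed axes (not confirmed by the source): x forward, y left, z up.
    """
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    x = radius * math.cos(phi) * math.cos(theta)
    y = radius * math.cos(phi) * math.sin(theta)
    z = radius * math.sin(phi)
    return x, y, z


def cartesian_to_spherical(x, y, z):
    """Inverse conversion, returning (theta_deg in [0, 360), phi_deg, radius)."""
    radius = math.sqrt(x * x + y * y + z * z)
    theta = math.degrees(math.atan2(y, x)) % 360.0
    phi = math.degrees(math.asin(z / radius))
    return theta, phi, radius


# A source directly to the left at ear level lands on the +y axis
print(spherical_to_cartesian(90, 0))
```

Under these assumed axes, front (theta=0) lies on +x, left (theta=90) on +y, and positive elevation raises z.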