
3. Getting Started

AHdezS edited this page Jan 23, 2026 · 2 revisions

Prepare your data: Dataset structure and format

sarlib is designed to operate on data structured as:

  • A one-dimensional array for the response variable
  • A two-dimensional array for the predictor variables

For sarlib to function correctly, both the response and predictor variables must have the same number of observations, including any NaN values.
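The layout described above can be sketched with NumPy. This is an illustrative check only: the values are synthetic, and `check_shapes` is a hypothetical helper written for this page, not a function provided by sarlib.

```python
import numpy as np

# Synthetic values for illustration only; NaN marks a missing observation.
y = np.array([1.2, 0.7, np.nan, 2.1, 1.5])   # response: shape (5,)
X = np.array([
    [0.3, 10.0],
    [0.1, 12.5],
    [0.4, np.nan],
    [0.9, 11.0],
    [0.6, 9.5],
])                                            # predictors: shape (5, 2)


def check_shapes(y, X):
    """Hypothetical helper: require a 1-D response and 2-D predictors with equal row counts."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if y.ndim != 1:
        raise ValueError(f"response must be 1-D, got {y.ndim}-D")
    if X.ndim != 2:
        raise ValueError(f"predictors must be 2-D, got {X.ndim}-D")
    if y.shape[0] != X.shape[0]:
        raise ValueError(f"{y.shape[0]} responses vs {X.shape[0]} predictor rows")
    return y, X


y, X = check_shapes(y, X)
print(y.shape, X.shape)
```

Rows containing NaN are kept in place, so both arrays still have the same number of observations. A single predictor stored as a one-dimensional array can be turned into a two-dimensional one with `x.reshape(-1, 1)`.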

If using the GUI, the input data must be provided as a .csv file.

Graphical User Interface (GUI)

After installing sarlib, you can launch the graphical user interface (GUI) by running the following command in your terminal or command prompt:

python sar_gui.py

The initial sar_gui window should appear as shown in the figure below:

[Figure: Initial GUI Window]


Loading Data Files

As noted above, the GUI accepts input data only as a .csv file. The file must contain a table (dataframe) in which every column has the same number of rows (samples). Missing entries in any column must be explicitly set to NaN for sarlib to function properly.
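One way to produce such a file is sketched here using only the Python standard library. The column names, the values, and the file name `example_data.csv` are made up for illustration. The sketch also assumes that the GUI expects a header row and reads the literal string NaN as a missing value, which this page does not state explicitly.

```python
import csv
import math
import os
import tempfile

# Synthetic rows; None marks a missing measurement.
rows = [
    {"response": 1.2, "x1": 0.3, "x2": 10.0},
    {"response": 0.7, "x1": None, "x2": 12.5},
    {"response": 2.1, "x1": 0.9, "x2": None},
]

path = os.path.join(tempfile.mkdtemp(), "example_data.csv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["response", "x1", "x2"])
    writer.writeheader()
    for row in rows:
        # Write the literal string NaN instead of leaving the cell empty.
        writer.writerow({k: "NaN" if v is None else v for k, v in row.items()})

# Read the file back: every row must have a numeric (or NaN) value in every column.
with open(path, newline="") as f:
    table = [{k: float(v) for k, v in row.items()} for row in csv.DictReader(f)]

n_missing = sum(math.isnan(v) for row in table for v in row.values())
print(len(table), "rows,", n_missing, "NaN entries")
```

Writing NaN explicitly keeps every column at the same length, so a missing value never shortens a column or shifts later rows.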


Frame for Data Selection

  • Select a single column from the dataframe as the response variable
  • Select at least one additional column as a predictor variable
  • For analyses with multiple predictors, select only the corresponding columns to be used as predictors

Show Scatter Plots

Click the ‘Show scatter plots’ button to generate a scatter plot of each selected predictor against the response variable.

[Figure: Scatter Plots]


Frame for Parameter Selection

Set the parameters that will determine how the SAR validation is performed:

  • Realizations (int): Number of Monte Carlo realizations (default: 100)
  • Norm ('rmse', 'epsins'): Loss function norm. 'epsins' for epsilon-insensitive, 'rmse' for root mean squared error (default: 'rmse')
  • Training mode ('resusb', 'kfold', 'leaveoo'): 'resusb' for resubstitution, 'kfold' for k-fold CV, 'leaveoo' for leave-one-out CV (default: 'resusb')
  • Threshold eps value (float, None): Threshold parameter for loss. If None, uses SVR epsilon (default: None)
  • Alpha (float): Significance level for hypothesis testing (default: 0.05)
  • Upper Bound ('pacbayes', 'vapnik', 'igp', 'igp_approx'): Bound type (default: 'pacbayes')
  • Eta (float): Confidence parameter for bounds (default: 0.5)
  • Dropout rate (float): Dropout rate parameter for Bayesian bound (default: 0.5)
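The options and defaults listed above can be collected in a small configuration object, for example to record the settings used for a run. `SarGuiParams` and its field names are hypothetical and are not part of sarlib. Only the allowed values and defaults come from the list above; the range check on alpha is an added assumption based on it being a significance level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SarGuiParams:
    """Hypothetical container mirroring the GUI fields; not part of sarlib."""
    realizations: int = 100            # number of Monte Carlo realizations
    norm: str = "rmse"                 # 'rmse' or 'epsins'
    training_mode: str = "resusb"      # 'resusb', 'kfold' or 'leaveoo'
    eps: Optional[float] = None        # None means: use the SVR epsilon
    alpha: float = 0.05                # significance level
    upper_bound: str = "pacbayes"      # 'pacbayes', 'vapnik', 'igp' or 'igp_approx'
    eta: float = 0.5                   # confidence parameter for bounds
    dropout_rate: float = 0.5          # dropout rate for the Bayesian bound

    def validate(self):
        """Check each field against the options listed in the GUI."""
        if self.realizations < 1:
            raise ValueError("realizations must be a positive integer")
        if self.norm not in ("rmse", "epsins"):
            raise ValueError(f"unknown norm: {self.norm!r}")
        if self.training_mode not in ("resusb", "kfold", "leaveoo"):
            raise ValueError(f"unknown training mode: {self.training_mode!r}")
        if self.upper_bound not in ("pacbayes", "vapnik", "igp", "igp_approx"):
            raise ValueError(f"unknown upper bound: {self.upper_bound!r}")
        # Assumption: a significance level lies strictly between 0 and 1.
        if not 0.0 < self.alpha < 1.0:
            raise ValueError("alpha must lie strictly between 0 and 1")
        return self


defaults = SarGuiParams().validate()
custom = SarGuiParams(training_mode="kfold", realizations=500).validate()
print(defaults)
print(custom)
```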

Run Analysis

Once the data and parameters have been selected, click on ‘Run analysis’ to start the SAR method.

After the analysis is completed on the full sample (all available data):

  • SAR validation results are displayed on screen
  • Results from the sample size analysis are saved for later visualization

Analysis Results

Validation outcomes based on the complete dataset are displayed in a window showing:

  • Sample size
  • Expected losses
  • Estimated flatness threshold
  • Statistical power of the SAR method

Sample Size Analysis

Three plots are provided to evaluate metrics relevant for regression model validation:

  1. Plot Loss
    Shows the mean and standard deviation of the Monte Carlo realizations for expected losses as a function of sample size. Also includes the estimated flatness threshold (mean and standard deviation).
    [Figure: Plot Loss]

  2. Plot P-value
    Shows p-values from the SAR method based on the R-statistic (expected loss) across different sample sizes, along with SAR statistical power.
    [Figure: Plot P-value]

  3. Plot Coefficients
    Displays coefficients of the regression hyperplane obtained via SAR as a function of sample size.
    [Figure: Plot Coefficients]
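To make the summary behind the loss plot concrete, the sketch below computes the mean and standard deviation across Monte Carlo realizations for each sample size. It is not sarlib's implementation: the loss values are synthetic random numbers, the sample sizes are arbitrary, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
sample_sizes = np.array([20, 40, 80, 160])   # arbitrary sizes for illustration
n_realizations = 100

# Synthetic losses with shape (n_realizations, n_sizes).
# The noise shrinks as the sample size grows, purely to mimic a typical curve.
noise = rng.normal(size=(n_realizations, sample_sizes.size))
losses = 1.0 + noise / np.sqrt(sample_sizes)

# Summarize across realizations (axis 0): one mean and one std per sample size.
mean_loss = losses.mean(axis=0)
std_loss = losses.std(axis=0, ddof=1)

for n, m, s in zip(sample_sizes, mean_loss, std_loss):
    print(f"n={int(n):4d}  mean={m:.3f}  std={s:.3f}")
```

Plotting `mean_loss` against `sample_sizes` with `std_loss` as error bars gives a curve of the same general form as the loss plot described above.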
