3. Getting Started
sarlib is designed to operate on data structured as:
- A one-dimensional array for the response variable
- A two-dimensional array for the predictor variables
For sarlib to function correctly, both the response and predictor variables must have the same number of observations, including any NaN values.
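This layout can be checked before the data reach sarlib. The sketch below assumes the response is a plain list and the predictors are a list of rows. sarlib itself works on arrays, and plain lists are used here only so the sketch needs no dependencies. `check_sar_inputs` is a hypothetical helper, not part of sarlib.

```python
import math
from typing import Sequence


def check_sar_inputs(y: Sequence[float], X: Sequence[Sequence[float]]) -> int:
    """Check that y is 1-D, X is 2-D, and both have the same number of rows.

    NaN entries are kept and counted, so a missing value still occupies
    its row. Returns the number of observations.
    """
    # y must be flat: one value per observation
    if any(isinstance(v, (list, tuple)) for v in y):
        raise ValueError("response must be one-dimensional")
    # X must be a list of rows
    if not X or not all(isinstance(row, (list, tuple)) for row in X):
        raise ValueError("predictors must be two-dimensional (rows x columns)")
    # Every row must hold the same, non-zero number of predictors
    n_cols = len(X[0])
    if n_cols == 0 or any(len(row) != n_cols for row in X):
        raise ValueError("every predictor row must have the same, non-zero length")
    # Rows are matched by position, so the counts must agree (NaNs included)
    if len(y) != len(X):
        raise ValueError(f"{len(y)} responses vs {len(X)} predictor rows")
    return len(y)


y = [1.2, math.nan, 3.1, 4.0]
X = [[0.5, 1.0], [0.7, math.nan], [1.1, 2.0], [1.6, 2.4]]
print(check_sar_inputs(y, X))  # 4
```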
If using the GUI, the input data must be provided as a .csv file.
After installing sarlib, you can launch the graphical user interface (GUI) by running the following command in your terminal or command prompt:
python sar_gui.py

The initial sar_gui window should appear as shown in the figure below:

The .csv file must contain a table (dataframe) in which all columns have the same number of rows (samples). Any missing entries must be explicitly set to NaN for sarlib to function properly. In the GUI:
- Select a single column as the response variable
- Select at least one other column as a predictor variable
- For analyses with multiple predictors, select only the columns that should be used as predictors
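The column selection the GUI performs can be pictured as splitting one table into a response vector and a predictor matrix. The sketch below is an illustration under assumptions, not sarlib's loader: it uses the standard-library `csv` module, maps blank cells to NaN (Python's `float()` already parses the string "NaN"), and the helper `load_columns` and the toy column names are hypothetical.

```python
import csv
import io
import math


def load_columns(csv_text: str, response: str, predictors: list[str]):
    """Return (y, X) from CSV text; blank cells and 'NaN' strings become NaN."""
    if not predictors:
        raise ValueError("select at least one predictor column")
    if response in predictors:
        raise ValueError("the response column cannot also be a predictor")

    def to_float(cell: str) -> float:
        cell = cell.strip()
        # Blank cells are mapped to NaN; float() already handles "NaN"
        return math.nan if cell == "" else float(cell)

    rows = list(csv.DictReader(io.StringIO(csv_text)))
    y = [to_float(row[response]) for row in rows]                 # one value per row
    X = [[to_float(row[c]) for c in predictors] for row in rows]  # rows x predictors
    return y, X


# Toy data for illustration only
data = "temp,rain,yield\n20.1,3.2,5.0\n21.4,,5.6\n19.8,2.9,NaN\n"
y, X = load_columns(data, response="yield", predictors=["temp", "rain"])
print(len(y), len(X))  # 3 3
print(X[1])            # [21.4, nan]
```

Because every row yields exactly one response value and one predictor row, the two outputs always have the same number of observations, NaNs included.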
Scatter plots of each predictor’s projection with respect to the response variable can be generated by clicking the ‘Show scatter plots’ button.

Set the parameters that will determine how the SAR validation is performed:
- Realizations (int): Number of Monte Carlo realizations (default: 100)
- Norm ('rmse', 'epsins'): Loss function norm. 'epsins' for epsilon-insensitive, 'rmse' for root mean squared error (default: 'rmse')
- Training mode ('resusb', 'kfold', 'leaveoo'): 'resusb' for resubstitution, 'kfold' for k-fold CV, 'leaveoo' for leave-one-out CV (default: 'resusb')
- Threshold eps value (float, None): Threshold parameter for loss. If None, uses SVR epsilon (default: None)
- Alpha (float): Significance level for hypothesis testing (default: 0.05)
- Upper Bound ('pacbayes', 'vapnik', 'igp', 'igp_approx'): Bound type (default: 'pacbayes')
- Eta (float): Confidence parameter for bounds (default: 0.5)
- Dropout rate (float): Dropout rate parameter for Bayesian bound (default: 0.5)
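The Norm and Training mode options correspond to standard definitions, sketched below. Several assumptions apply. The epsilon-insensitive loss is averaged over samples. `eps` is passed explicitly, whereas sarlib falls back to the SVR epsilon when the threshold is None. The k-fold splits use contiguous folds without shuffling. The function names are hypothetical; only the mode strings come from the option list above.

```python
import math
from typing import Iterator, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """'rmse' norm: root mean squared error."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def epsins(y_true: Sequence[float], y_pred: Sequence[float], eps: float) -> float:
    """'epsins' norm: mean epsilon-insensitive loss; errors within eps cost nothing."""
    return sum(max(0.0, abs(t - p) - eps) for t, p in zip(y_true, y_pred)) / len(y_true)


def splits(n: int, mode: str = "resusb", k: int = 5) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_idx, test_idx) pairs for the three training modes."""
    idx = list(range(n))
    if mode == "resusb":      # resubstitution: fit and evaluate on all samples
        yield idx, idx
    elif mode == "kfold":     # k contiguous folds, each held out once (no shuffling)
        if not 2 <= k <= n:
            raise ValueError("k must satisfy 2 <= k <= n")
        for f in range(k):
            test = idx[f * n // k:(f + 1) * n // k]
            held_out = set(test)
            yield [i for i in idx if i not in held_out], test
    elif mode == "leaveoo":   # leave-one-out is k-fold with k = n
        yield from splits(n, "kfold", k=n)
    else:
        raise ValueError(f"unknown training mode: {mode!r}")


print(round(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]), 4))   # 1.1547
print(epsins([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], eps=0.5))  # 0.5
print(len(list(splits(4, "leaveoo"))))                    # 4
```

Resubstitution evaluates the model on the same data it was fitted to, so it tends to give the most optimistic loss. The two cross-validation modes instead hold data out for testing.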
Once the data and parameters have been selected, click on ‘Run analysis’ to start the SAR method.
After the analysis is completed on the full sample (all available data):
- SAR validation results are displayed on screen
- Results from the sample size analysis are saved for later visualization
Validation outcomes based on the complete dataset are displayed in a window showing:
- Sample size
- Expected losses
- Estimated flatness threshold
- Statistical power of the SAR method
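The sample size analysis can be pictured as a Monte Carlo loop. For each sample size, the loop draws many random subsamples, fits a model to each, records the loss, and summarizes the losses by their mean and standard deviation. The sketch below is a simplified stand-in and not sarlib's procedure. It uses a single predictor and ordinary least squares, whereas sarlib appears to use an SVR-based model given the parameter list's reference to an SVR epsilon. It also scores each fit with resubstitution RMSE and simply drops incomplete rows. `fit_line` and `loss_curve` are hypothetical names.

```python
import math
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x (a simple stand-in regressor)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def loss_curve(x, y, sizes, realizations=100, seed=0):
    """Map each sample size n to (mean, std) of the resubstitution RMSE
    over Monte Carlo subsamples of n observations."""
    rng = random.Random(seed)
    # This sketch simply drops rows with a NaN in either variable
    pairs = [(u, v) for u, v in zip(x, y) if not (math.isnan(u) or math.isnan(v))]
    curve = {}
    for n in sizes:
        losses = []
        for _ in range(realizations):
            sub = rng.sample(pairs, n)  # n observations drawn without replacement
            xs, ys = [p[0] for p in sub], [p[1] for p in sub]
            a, b = fit_line(xs, ys)
            sq = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))
            losses.append(math.sqrt(sq / n))
        curve[n] = (statistics.fmean(losses), statistics.stdev(losses))
    return curve


# Synthetic data for illustration: y = 1 + 2x + Gaussian noise
gen = random.Random(42)
x = [gen.uniform(0, 10) for _ in range(300)]
y = [1 + 2 * xi + gen.gauss(0, 1) for xi in x]
for n, (mean, std) in loss_curve(x, y, sizes=[10, 50, 200]).items():
    print(f"n={n:4d}  loss={mean:.3f} +/- {std:.3f}")
```

Plotting the mean with a band of plus or minus one standard deviation against `n` produces the kind of curve that Plot Loss displays.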
Three plots are provided to evaluate metrics relevant for regression model validation:
- Plot Loss: Shows the mean and standard deviation of the Monte Carlo realizations for expected losses as a function of sample size. Also includes the estimated flatness threshold (mean and standard deviation).
- Plot P-value: Shows p-values from the SAR method based on the R-statistic (expected loss) across different sample sizes, along with SAR statistical power.
- Plot Coefficients: Displays coefficients of the regression hyperplane obtained via SAR as a function of sample size.
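This page does not describe how sarlib computes the R-statistic p-values or its power estimate. The sketch below shows only one common way to summarize Monte Carlo p-values per sample size for a plot like Plot P-value. It assumes that power is estimated as the fraction of realizations rejecting at level Alpha (p below alpha). The function `pvalue_summary` and the p-values in `demo` are hypothetical.

```python
import statistics


def pvalue_summary(pvals_by_size, alpha=0.05):
    """For each sample size, return (mean p, std p, empirical power).

    Empirical power is taken here as the fraction of Monte Carlo
    realizations whose p-value falls below alpha.
    """
    summary = {}
    for n in sorted(pvals_by_size):
        pvals = pvals_by_size[n]
        power = sum(p < alpha for p in pvals) / len(pvals)
        summary[n] = (statistics.fmean(pvals), statistics.stdev(pvals), power)
    return summary


# Hypothetical p-values from 4 realizations at two sample sizes
demo = {20: [0.30, 0.04, 0.12, 0.01], 80: [0.001, 0.02, 0.03, 0.20]}
for n, (mean_p, sd_p, power) in pvalue_summary(demo).items():
    print(f"n={n}: mean p={mean_p:.3f}, sd={sd_p:.3f}, power={power:.2f}")
```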
