3. Getting Started

Prepare your data: Dataset structure and format

sarlib is designed to operate on data structured as:

A one-dimensional array for the response variable
A two-dimensional array for the predictor variables

For sarlib to function correctly, the response and predictor variables must have the same number of observations (rows), counting any NaN entries as observations.
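You can check that your arrays match this layout before passing them to sarlib with a simple shape check. The following is a minimal sketch using NumPy. `check_inputs` is a hypothetical helper and is not part of sarlib's API.

```python
import numpy as np

def check_inputs(y, X):
    """Hypothetical pre-flight check: 1-D response y, 2-D predictors X,
    one row of X per observation in y (NaN entries included)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        # A single predictor given as a flat vector becomes an (n, 1) column.
        X = X.reshape(-1, 1)
    if y.ndim != 1:
        raise ValueError(f"response must be 1-D, got shape {y.shape}")
    if X.ndim != 2:
        raise ValueError(f"predictors must be 2-D, got shape {X.shape}")
    if y.shape[0] != X.shape[0]:
        # NaN entries still count as observations, so lengths must match exactly.
        raise ValueError(f"{y.shape[0]} responses vs {X.shape[0]} predictor rows")
    return y, X

y = [1.2, 2.3, np.nan, 4.1]
X = [[0.5, 1.0], [0.7, 1.1], [0.9, np.nan], [1.2, 1.6]]
y_arr, X_arr = check_inputs(y, X)
print(y_arr.shape, X_arr.shape)  # (4,) (4, 2)
```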

When using the GUI, the input data must be provided as a .csv file.

Graphical User Interface (GUI)

After installing sarlib, you can launch the graphical user interface (GUI) by running the following command in your terminal or command prompt:

python sar_gui.py

The initial sar_gui window should appear as shown in the figure below.

Figure: Initial GUI Window

Loading Data Files

As mentioned earlier, when the GUI is used, the input data must be provided as a .csv file. The file must contain a table (loaded as a dataframe) in which all columns have the same number of rows (samples). If any column contains missing data, those entries must be explicitly set to NaN for sarlib to function properly.
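To illustrate the expected file layout, the following sketch writes a small table with Python's standard `csv` module. Missing entries are stored as the literal token `NaN` instead of being left as empty cells. The column names `y`, `x1` and `x2` and the helper `fmt` are hypothetical. In practice you would write to a file such as `data.csv` rather than to an in-memory buffer.

```python
import csv
import io
import math

FIELDS = ["y", "x1", "x2"]  # hypothetical column names
rows = [
    {"y": 1.2, "x1": 0.5, "x2": 1.0},
    {"y": 2.3, "x1": 0.7, "x2": 1.1},
    {"y": math.nan, "x1": 0.9, "x2": math.nan},
    {"y": 4.1, "x1": 1.2, "x2": 1.6},
]

def fmt(value):
    """Write a missing entry as the explicit token NaN rather than an empty cell."""
    return "NaN" if math.isnan(value) else str(value)

buf = io.StringIO()  # stands in for open("data.csv", "w", newline="")
writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()
for row in rows:
    writer.writerow({name: fmt(row[name]) for name in FIELDS})
text = buf.getvalue()

# Read the table back and confirm that every column has a value in every row.
parsed = list(csv.DictReader(io.StringIO(text)))
assert all(record[name] not in ("", None) for record in parsed for name in FIELDS)
print(text)
```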

Frame for Data Selection

Select a single column from the dataframe as the response variable
Select at least one additional column as a predictor variable
For analyses with multiple predictors, select only the columns you intend to use as predictors
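In practice, these selection rules split the table into a 1-D response and a 2-D predictor matrix. The following sketch performs the same step outside the GUI. `select_columns` and the column names are hypothetical, and the GUI's own implementation may work differently.

```python
import csv
import io
import numpy as np

CSV_TEXT = """y,x1,x2,id
1.2,0.5,1.0,1
2.3,0.7,1.1,2
NaN,0.9,NaN,3
4.1,1.2,1.6,4
"""

def select_columns(text, response, predictors):
    """Hypothetical helper mimicking the data-selection frame: one response
    column -> 1-D array, one or more predictor columns -> 2-D array."""
    if not predictors:
        raise ValueError("select at least one predictor column")
    if response in predictors:
        raise ValueError("the response column cannot also be a predictor")
    records = list(csv.DictReader(io.StringIO(text)))
    # float("NaN") parses to nan, so explicit NaN cells survive as missing values.
    y = np.array([float(r[response]) for r in records])
    X = np.array([[float(r[p]) for p in predictors] for r in records])
    return y, X

y, X = select_columns(CSV_TEXT, "y", ["x1", "x2"])  # the id column is not selected
print(y.shape, X.shape)  # (4,) (4, 2)
```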

Show Scatter Plots

Clicking the ‘Show scatter plots’ button generates a scatter plot of each predictor’s projection against the response variable.

Frame for Parameter Selection

Set the parameters that will determine how the SAR validation is performed:

Realizations (int): Number of Monte Carlo realizations (default: 100)
Norm ('rmse', 'epsins'): Loss function norm. 'rmse' for root mean squared error, 'epsins' for epsilon-insensitive loss (default: 'rmse')
Training mode ('resusb', 'kfold', 'leaveoo'): 'resusb' for resubstitution, 'kfold' for k-fold CV, 'leaveoo' for leave-one-out CV (default: 'resusb')
Threshold eps value (float or None): Threshold parameter for the loss function. If None, the SVR epsilon is used (default: None)
Alpha (float): Significance level for hypothesis testing (default: 0.05)
Upper Bound ('pacbayes', 'vapnik', 'igp', 'igp_approx'): Bound type (default: 'pacbayes')
Eta (float): Confidence parameter for bounds (default: 0.5)
Dropout rate (float): Dropout rate parameter for Bayesian bound (default: 0.5)
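If you script your analyses or want to keep a record of each run, you can store the same parameter set in a small dataclass that enforces the allowed values listed above. The class `SarParams` and its field names are hypothetical illustrations and are not sarlib's actual API. Only the option strings and default values come from the list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SarParams:
    """Hypothetical container mirroring the GUI's parameter frame and defaults.
    Field names are illustrative, not sarlib's actual API."""
    realizations: int = 100
    norm: str = "rmse"             # 'rmse' or 'epsins'
    training_mode: str = "resusb"  # 'resusb', 'kfold' or 'leaveoo'
    eps: Optional[float] = None    # None -> fall back to the SVR epsilon
    alpha: float = 0.05
    bound: str = "pacbayes"        # 'pacbayes', 'vapnik', 'igp' or 'igp_approx'
    eta: float = 0.5
    dropout_rate: float = 0.5

    def __post_init__(self):
        # Reject option strings outside the documented choices.
        choices = {
            "norm": ("rmse", "epsins"),
            "training_mode": ("resusb", "kfold", "leaveoo"),
            "bound": ("pacbayes", "vapnik", "igp", "igp_approx"),
        }
        for name, allowed in choices.items():
            if getattr(self, name) not in allowed:
                raise ValueError(f"{name} must be one of {allowed}")
        if self.realizations < 1:
            raise ValueError("realizations must be a positive integer")
        if not 0 < self.alpha < 1:
            raise ValueError("alpha must lie in (0, 1)")

params = SarParams(norm="epsins", training_mode="kfold")
print(params.realizations, params.bound)  # 100 pacbayes
```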

Run Analysis

Once the data and parameters have been selected, click ‘Run analysis’ to start the SAR method.

After the analysis is completed on the full sample (all available data):

SAR validation results are displayed on screen
Results from the sample size analysis are saved for later visualization
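The three Training mode options correspond to standard evaluation schemes. The following generic sketch shows which observations each scheme uses for fitting and which it uses for evaluating the loss. It is not sarlib's code. The function `train_test_splits`, the fold count `k=5` and the shuffling seed are assumptions; the parameter list above does not include a fold-count setting.

```python
import numpy as np

def train_test_splits(n, mode="resusb", k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for the three training modes.
    A generic sketch; sarlib's own splitting may differ."""
    idx = np.arange(n)
    if mode == "resusb":
        # Resubstitution: fit and evaluate on the same observations.
        yield idx, idx
    elif mode == "kfold":
        # k-fold CV: shuffle once, then hold out each fold in turn.
        rng = np.random.default_rng(seed)
        folds = np.array_split(rng.permutation(n), k)
        for i, test in enumerate(folds):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            yield train, test
    elif mode == "leaveoo":
        # Leave-one-out: every observation is the test set exactly once.
        for i in range(n):
            yield np.delete(idx, i), idx[i:i + 1]
    else:
        raise ValueError(f"unknown training mode {mode!r}")

for mode in ("resusb", "kfold", "leaveoo"):
    splits = list(train_test_splits(10, mode, k=5))
    print(mode, len(splits))  # resusb 1, kfold 5, leaveoo 10
```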

Analysis Results

Validation outcomes based on the complete dataset are displayed in a window showing:

Sample size
Expected losses
Estimated flatness threshold
Statistical power of the SAR method
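The expected loss reported here depends on the selected Norm. The following sketch computes both norms in their textbook form: root mean squared error, and the mean epsilon-insensitive loss max(0, |r| - eps). How sarlib aggregates these losses internally is not documented in this section. `expected_loss` is hypothetical, and dropping pairs that contain NaN is an assumption. The `svr_epsilon=0.1` fallback is scikit-learn's `SVR` default, used here as a stand-in for the epsilon of the fitted SVR.

```python
import numpy as np

def expected_loss(y_true, y_pred, norm="rmse", eps=None, svr_epsilon=0.1):
    """Empirical loss under the two norms named in the parameter frame.
    svr_epsilon stands in for the fitted SVR's epsilon when eps is None."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ok = ~(np.isnan(y_true) | np.isnan(y_pred))  # ignore pairs with missing values
    resid = y_true[ok] - y_pred[ok]
    if norm == "rmse":
        return float(np.sqrt(np.mean(resid ** 2)))
    if norm == "epsins":
        # Epsilon-insensitive: absolute errors below eps cost nothing.
        eps = svr_epsilon if eps is None else eps
        return float(np.mean(np.maximum(0.0, np.abs(resid) - eps)))
    raise ValueError(f"unknown norm {norm!r}")

y_true = [1.0, 2.0, 3.0, np.nan]
y_pred = [1.5, 2.0, 2.0, 4.0]
print(expected_loss(y_true, y_pred, "rmse"))             # sqrt((0.25 + 0 + 1) / 3)
print(expected_loss(y_true, y_pred, "epsins", eps=0.2))  # (0.3 + 0 + 0.8) / 3
```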

Sample Size Analysis

Three plots are provided to evaluate metrics relevant to regression model validation:

Plot Loss
Shows the mean and standard deviation of the expected losses across the Monte Carlo realizations as a function of sample size. The plot also includes the estimated flatness threshold (mean and standard deviation).
Plot P-value
Shows p-values from the SAR method based on the R-statistic (expected loss) across different sample sizes, along with SAR statistical power.
Plot Coefficients
Displays coefficients of the regression hyperplane obtained via SAR as a function of sample size.
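The Plot Loss and Plot Coefficients curves summarize Monte Carlo realizations at increasing sample sizes. The following sketch applies that idea to synthetic data. For each sample size, it draws random subsamples, fits a model to each, and records the mean and standard deviation of the loss together with the mean coefficients. Ordinary least squares (`numpy.linalg.lstsq`) replaces sarlib's own regressor, and the loss is the resubstitution RMSE. `sample_size_curves` is a hypothetical name. The sketch does not reproduce the flatness threshold, p-values or statistical power.

```python
import numpy as np

def sample_size_curves(X, y, sizes, realizations=100, seed=0):
    """Monte Carlo sketch of a sample size analysis.

    For each sample size n, draw `realizations` random subsamples without
    replacement, fit a linear model on each, and summarise the resubstitution
    RMSE (mean and std) plus the mean fitted coefficients.
    Ordinary least squares is a stand-in for sarlib's own regressor.
    """
    rng = np.random.default_rng(seed)
    keep = ~(np.isnan(y) | np.isnan(X).any(axis=1))  # drop rows containing NaN
    X, y = X[keep], y[keep]
    A = np.column_stack([X, np.ones(len(y))])  # last column is the intercept
    loss_mean, loss_std, coef_mean = [], [], []
    for n in sizes:
        losses, coefs = [], []
        for _ in range(realizations):
            idx = rng.choice(len(y), size=n, replace=False)
            coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
            resid = y[idx] - A[idx] @ coef
            losses.append(np.sqrt(np.mean(resid ** 2)))
            coefs.append(coef)
        loss_mean.append(np.mean(losses))
        loss_std.append(np.std(losses))
        coef_mean.append(np.mean(coefs, axis=0))
    return np.array(loss_mean), np.array(loss_std), np.array(coef_mean)

# Synthetic data with known slopes 1.5 and -0.7 and intercept 0.3.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.7]) + 0.3 + rng.normal(scale=0.5, size=200)
sizes = [10, 25, 50, 100, 200]
loss_mean, loss_std, coef_mean = sample_size_curves(X, y, sizes, realizations=50)
for n, m, s in zip(sizes, loss_mean, loss_std):
    print(f"n={n:4d}  loss={m:.3f} +/- {s:.3f}")
```

Every subsample of size 200 drawn from 200 rows contains all of the rows, so the spread at that size collapses to zero.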
