VIGILANT is a measurement toolkit for performance assessment of adaptive AI systems. The three included measurements — learning, potential, and retention — help to disentangle performance changes due to model adaptations from those caused by shifts in the evaluation environment.
> **Tip:** For a more detailed description of the measurements provided in this repository, see our [open-access paper](Link to be added).
Adaptive AI refers to artificial intelligence models developed in multiple discrete versions over time. This differs from locked models, which remain unchanged after training, and continually learning models, which treat all incoming data as training data.
The adaptive AI paradigm introduces challenges for performance assessment because both (a) the model and (b) the evaluation dataset may change simultaneously. Consider the example below: if performance improves from Performance 1 to Performance 2, the cause may be either (a) an increase in model capability or (b) a change in the difficulty of the evaluation dataset.
```mermaid
flowchart TD
m1(("Model 1"))
e1["Evaluation<br>dataset 1"]
p1[["Performance 1"]]
m1 --> e1 ==> p1
m2(("Model 2"))
e2["Evaluation<br>dataset 2"]
p2[["Performance 2"]]
m2 --> e2 ==> p2
```
VIGILANT provides three measurements to help separate performance changes due to model updates from those caused by variations in the evaluation data: learning, potential, and retention.
All measurements assume a sequential modification paradigm in which each model version is paired with a corresponding evaluation dataset:
```mermaid
flowchart TD
classDef invisible fill:transparent,stroke:transparent;
m1(("Model 1"))
e1["Evaluation<br>dataset 1"]
m1 ~~~ e1
m2(("Model 2"))
e2["Evaluation<br>dataset 2"]
m2 ~~~ e2
m3(("..."))
e3["...."]
m3 ~~~ e3
class m3 invisible
class e3 invisible
mn(("Model V"))
en["Evaluation<br>dataset V"]
mn ~~~ en
```
Learning: Improvement in performance from the previous step, measured with respect to the current evaluation dataset.
$learning(M_V) = S(M_V|D_V) - S(M_{V-1}|D_V)$
Potential: Change in performance resulting from changes to the evaluation dataset.
$potential(M_V) = S(M_{V-1}|D_{V-1}) - S(M_{V-1}|D_V)$
Retention: The model's maintained performance on previous datasets.
$retention(M_V)=\sum_{v=0}^{V-1}S(M_V|D_v)\times W((V-1)-v)$
where $S(M|D)$ denotes the performance score of model $M$ evaluated on dataset $D$, and $W$ is a weighting function applied according to dataset age.
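The three formulas above can be illustrated with a small plain-Python sketch. This is not the VIGILANT package API; the score table `S`, the 1-based version indexing, and the uniform weighting function are illustrative assumptions.

```python
# Illustrative score table: S[model_version][dataset_version].
# The values are made up for demonstration purposes.
S = {
    1: {1: 0.80, 2: 0.70},
    2: {1: 0.78, 2: 0.85},
}

def learning(S, V):
    # Improvement over the previous model, on the current dataset.
    return S[V][V] - S[V - 1][V]

def potential(S, V):
    # Performance change attributable to the dataset update alone.
    return S[V - 1][V - 1] - S[V - 1][V]

def retention(S, V, weight):
    # Weighted performance of the current model on all earlier datasets.
    return sum(S[V][v] * weight((V - 1) - v) for v in range(1, V))

print(learning(S, V=2))                            # ≈ 0.15
print(potential(S, V=2))                           # ≈ 0.10
print(retention(S, V=2, weight=lambda age: 1.0))   # ≈ 0.78
```

With a uniform weight, retention of model 2 is simply its score on dataset 1; a decaying weight would emphasize more recent datasets.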
For more detailed information, visit our project documentation.
VIGILANT can be used either as a Python package (by cloning the source repository) or through your browser. Instructions and examples for the browser version are provided within the interface.
This toolkit works with adaptive AI systems developed in discrete model versions, each paired with a corresponding evaluation dataset. The required input is the performance of every model version evaluated on every dataset version.
For example, for model versions $1, \dots, V$ and dataset versions $1, \dots, V$, the input is a table of the form:

| Model version | Dataset version | Performance |
|---|---|---|
| 1 | 1 | $S(M_1\|D_1)$ |
| 1 | 2 | $S(M_1\|D_2)$ |
| ... | ... | ... |
| V | V | $S(M_V\|D_V)$ |
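The input table can be constructed as a long-format pandas DataFrame, with one row per (model version, dataset version) pair. The column names follow the defaults described in the usage example below; the performance values here are made-up placeholders.

```python
import pandas as pd

# One row per (model version, dataset version) pair;
# the performance values are illustrative only.
data = pd.DataFrame({
    "model":       [1, 1, 2, 2],
    "dataset":     [1, 2, 1, 2],
    "performance": [0.80, 0.70, 0.78, 0.85],
})
print(data)
```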
Clone the source repository, then `cd` into the cloned directory (`cd VIGILANT/`). From this directory, the VIGILANT package can be installed with:

```shell
pip install .
```
```python
import vigilant
import pandas as pd

# Point to the file containing performance data
data_file = "performance_data.csv"
data = pd.read_csv(data_file)

"""
By default, vigilant assumes that model version, dataset version, and performance are
in columns named "model", "dataset", and "performance", respectively.
This behavior can be changed by adjusting the appropriate keys in the config object.
The example below indicates that the performance will be found in a column named "AUROC".
"""
vigilant.config.performance_key = 'AUROC'

# Calculate individual measurements
L = vigilant.learning(data)
P = vigilant.potential(data)
R = vigilant.retention(data)
```

The output of each of the measurement functions is a two-column dataframe (version and the name of the measurement).
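Given that each output frame shares a version column, the three results can be combined into a single summary table. The frames below are hypothetical stand-ins for real outputs, and the column names ("version" plus the measurement name) are assumptions based on the description above.

```python
import pandas as pd
from functools import reduce

# Hypothetical stand-ins for the outputs of the three measurement functions
L = pd.DataFrame({"version": [2, 3], "learning": [0.15, 0.02]})
P = pd.DataFrame({"version": [2, 3], "potential": [0.10, -0.01]})
R = pd.DataFrame({"version": [2, 3], "retention": [0.78, 0.80]})

# Merge on the shared "version" column into one summary table
summary = reduce(lambda a, b: a.merge(b, on="version"), [L, P, R])
print(summary)
```

A single table like this makes it easy to compare how much of each version-to-version change is explained by learning versus dataset shift.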
