The current underlying implementation of BERT score (the bert-score package) supports a limited set of transformer models, and FMEval further truncates this list to microsoft/deberta-xlarge-mnli and roberta-large-mnli.
torchmetrics provides a more generic BERT score implementation. The specific request here is to stop limiting which transformer models users can configure. The behavior should be as follows:
- If microsoft/deberta-xlarge-mnli or roberta-large-mnli is specified, use the bert-score implementation to avoid regressions for existing customers.
- Otherwise, use the torchmetrics implementation of BERT score.
Use cases for a broader set of models underlying BERT score:
- Monolingual BERT models have been shown to outperform multilingual BERT models on certain tasks. https://aclanthology.org/2021.acl-long.243.pdf
- Customers may fine-tune their own transformers, which can be downloaded into the container running FMEval and passed into the torchmetrics BERT score implementation.
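The requested dispatch could be sketched as below. This is a minimal illustration of the proposed behavior, not FMEval's actual API; the function name and return values are hypothetical.

```python
# Sketch of the proposed backend selection (hypothetical helper, not FMEval code).

# The two models FMEval currently supports via the bert-score package.
LEGACY_BERT_SCORE_MODELS = {
    "microsoft/deberta-xlarge-mnli",
    "roberta-large-mnli",
}


def choose_bertscore_backend(model_name_or_path: str) -> str:
    """Return which BERT score backend to use for a given model identifier."""
    # Keep the bert-score implementation for the currently supported models,
    # so existing customers see no regression in scores.
    if model_name_or_path in LEGACY_BERT_SCORE_MODELS:
        return "bert-score"
    # Any other Hugging Face model name, or a local path to a fine-tuned
    # checkpoint, falls through to the generic torchmetrics implementation.
    return "torchmetrics"
```

With the torchmetrics backend, the configured identifier would be forwarded to `torchmetrics.text.bert.BERTScore(model_name_or_path=...)`, which accepts arbitrary Hugging Face model names or local checkpoint paths.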