Model Cards for BCI Essentials

Model cards are brief markdown files that accompany machine learning models to aid user comprehension. According to Hugging Face, these cards include relevant information on the model, its intended uses and limitations, the training parameters, the datasets used for training, and the evaluation results.

There are many resources available to help build these model cards (see the links below; thanks, Anup), but first we need to decide what a model card looks like for a Bessy model.

https://www.evidentlyai.com/blog/ml-model-card-tutorial

https://cloud.google.com/blog/products/ai-machine-learning/create-a-model-card-with-scikit-learn

https://towardsdatascience.com/create-a-custom-model-card-with-googles-model-card-toolkit-a1e89a7887b5

https://www.tensorflow.org/responsible_ai/model_card_toolkit/examples/Scikit_Learn_Model_Card_Toolkit_Demo

https://huggingface.co/docs/huggingface_hub/guides/model-cards

https://pypi.org/project/model-card-toolkit/

The main problem with using the typical model card layout is that ML models for BCI do not generalize well across users, because EEG data are heterogeneous and non-stationary. For this reason, BCI models are typically trained on calibration data collected within the session.
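As a toy illustration of that within-session workflow, here is a minimal sketch. The nearest-centroid "classifier" and the 1-D feature values are stand-ins (assumptions, not part of any Bessy pipeline); a real pipeline would use, e.g., spatially filtered EEG features.

```python
# Toy sketch: fit on calibration trials from the start of a session,
# then classify the remaining trials from the same session.
# A nearest-centroid rule stands in for a real BCI classifier.

def fit_centroids(features, labels):
    """Mean feature value per class, computed from calibration trials."""
    centroids = {}
    for label in set(labels):
        values = [f for f, l in zip(features, labels) if l == label]
        centroids[label] = sum(values) / len(values)
    return centroids

def predict(centroids, feature):
    """Assign the class whose centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))

# One session's trials: the first block is calibration, the rest is "online" use.
session_features = [0.1, 0.2, 0.9, 1.0, 0.15, 0.95]
session_labels = ["rest", "rest", "move", "move", "rest", "move"]

calib_n = 4  # calibration trials collected at the start of the session
model = fit_centroids(session_features[:calib_n], session_labels[:calib_n])
predictions = [predict(model, f) for f in session_features[calib_n:]]
```

Because the model is refit from each session's calibration block, any reported performance is tied to that session, which is exactly why the card needs to distinguish population-level from target-level numbers.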

This means that a model card could represent performance in two ways:

  1. The performance of the model on the population as a whole. (population-level evaluation)
  2. The performance of the model for a specific target user. (target-level evaluation)

If the model does not use transfer learning, these two evaluations are independent. If it does, the target-level evaluation depends on the population-level performance.

Population-level evaluation can help the reader contextualize the target-level evaluation, much as a DEXA body composition scan places your results in the context of the population. In a BCI context, this could tell you whether a poor result stems from the classifier or from the individual/EEG recordings.
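The DEXA-style contextualization above could be as simple as a percentile rank. A sketch, with hypothetical accuracy numbers throughout:

```python
# Sketch: place a target user's within-session accuracy in the context
# of population-level results (all numbers are made up for illustration).

def percentile_rank(population_scores, target_score):
    """Fraction of population scores at or below the target's score."""
    at_or_below = sum(1 for s in population_scores if s <= target_score)
    return at_or_below / len(population_scores)

population_acc = [0.62, 0.70, 0.74, 0.78, 0.81, 0.85, 0.88, 0.90]
target_acc = 0.80

rank = percentile_rank(population_acc, target_acc)
# A low rank for a model that performs well population-wide suggests the
# issue lies with the individual/EEG recordings rather than the classifier.
```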

All this is to say that while much of the information is shared between the population and target levels, the performances could be quite different. The fields on our model card could look something like this.

Common Fields

Preprocessing steps - describe the filter/resampling that was applied

Feature selection steps - any channel selection, spatial filtering, etc.

Classifier information - describe the steps of the classifier in plain language; this section will vary heavily from model to model

Individual Fields

Performance for a specific individual.

Channels selected, channels rejected, other analysis of EEG data.

Population Fields

Table of performance on different datasets (accuracy +/- SEM, etc.).
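The fields above could be sketched as a small data structure that renders to markdown. The class name, field names, and layout here are assumptions for illustration, not a fixed schema:

```python
# Sketch of the proposed card fields; renders to a markdown model card.
from dataclasses import dataclass, field

@dataclass
class BessyModelCard:  # hypothetical name
    # Common fields
    preprocessing: str      # filter/resampling that was applied
    feature_selection: str  # channel selection, spatial filtering, etc.
    classifier: str         # plain-language description of the pipeline
    # Individual fields
    target_performance: str = ""
    channels_selected: list = field(default_factory=list)
    channels_rejected: list = field(default_factory=list)
    # Population fields: dataset name -> "accuracy +/- SEM" string
    population_results: dict = field(default_factory=dict)

    def to_markdown(self):
        lines = [
            "## Common Fields",
            f"Preprocessing steps: {self.preprocessing}",
            f"Feature selection steps: {self.feature_selection}",
            f"Classifier information: {self.classifier}",
            "## Individual Fields",
            f"Performance: {self.target_performance}",
            f"Channels selected: {', '.join(self.channels_selected)}",
            f"Channels rejected: {', '.join(self.channels_rejected)}",
            "## Population Fields",
            "| Dataset | Accuracy +/- SEM |",
            "| --- | --- |",
        ]
        lines += [f"| {name} | {acc} |"
                  for name, acc in self.population_results.items()]
        return "\n".join(lines)

card = BessyModelCard(
    preprocessing="5-30 Hz bandpass, resampled to 128 Hz",  # example values
    feature_selection="8-channel subset, CSP spatial filtering",
    classifier="LDA on log-variance CSP features",
    target_performance="0.80 accuracy on within-session hold-out",
    channels_selected=["C3", "C4", "Cz"],
    channels_rejected=["Fp1"],
    population_results={"Dataset A": "0.81 +/- 0.02"},
)
markdown = card.to_markdown()
```

Keeping the individual and population fields in one structure makes it easy to emit either view (or both) depending on whether the card accompanies a deployed, user-specific model or a published, population-evaluated one.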