Machine Learning

Machine Learning combines data, software, models, and workflows. These elements need to be harmonized and connected to obtain a full picture of a Machine Learning approach from a metadata perspective.


Objectives

Describe training datasets, including the characterization of features and attributes that can be used for training (e.g., number of data points, classes, target variable).
Describe software used for training, including elements related to the optimization process.
Describe ML models together with their evaluation.
Clearly and explicitly describe the links among the different elements involved in ML approaches.

Profiles

Dataset

Latest Release: (version 1.0-RELEASE)
Latest Draft: (version 1.1-DRAFT)

ComputationalTool

Latest Release: (version 1.0-RELEASE)
Latest Draft: (version 1.1-DRAFT)

Group Leader(s)

Leyla Jael Castro

Fotis E. Psomopoulos

Daniel S. Katz

Other team members

Alban Gaignard

Dietrich Rebholz-Schuhmann

Ivan Mičetić

Further Details

Machine Learning (ML) is nowadays a common approach in data-driven research, due to the amount of available data and the resources required to process it and make sense of it. In addition to data, software also plays an important role in ML. Models produced by an ML training process also become entities in their own right, which can be seen as similar to software (e.g., a prediction model that is executed on some input and produces a prediction as output) or to data (e.g., clusters resulting from a clustering approach). Furthermore, the training software has to be tuned and optimized, while the model has to be evaluated, either intrinsically or extrinsically. Ideally, all of this information should be reported and represented as metadata of the ML process. However, this is not always the case. This group, a joint effort of the Research Data Alliance FAIR4ML Interest Group, the ELIXIR Machine Learning Focus Group, and NFDI4DataScience, aims to provide common ground for the metadata needed to describe ML approaches.

To achieve its objectives, this group uses Machine Learning Cards for models and datasets as a starting point. Other efforts will also be taken into account, e.g., the Data, Optimization, Model and Evaluation (DOME) recommendations, the AIMe registry for artificial intelligence in biomedical research, and HuggingFace.