Christophe Biernacki gave a Seminar@SystemX on 15 December 2022

Scientific seminars

Published on 9 November 2022

Christophe Biernacki (Professor of Statistics at the University of Lille and Deputy Scientific Director at Inria) gave a Seminar@SystemX on "Frugal Gaussian clustering of huge imbalanced datasets" on 15 December 2022, from 2:00 pm to 3:30 pm.

Abstract:

Clustering shows its full value when the dataset size grows considerably, since it becomes possible to discover tiny but potentially high-value clusters that were out of reach with more modest sample sizes. In practice, however, clustering runs into computational limits at such data volumes, since it may require extremely large memory and computation resources. In addition, the classical subsampling strategy, often adopted to overcome these limitations, is expected to fail badly at discovering clusters when cluster sizes are highly imbalanced. Our proposal first consists in drastically compressing the data volume by preserving only its bin-marginal values, thus discarding the bin-cross ones. Despite this extreme information loss, we then prove an identifiability property for the diagonal mixture model and introduce a specific EM-like algorithm associated with a composite likelihood approach. The latter is far more frugal than the regular but infeasible EM algorithm that would otherwise be used on our bin-marginal data, while preserving all consistency properties. Finally, numerical experiments show that the proposed method outperforms subsampling both in controlled simulations and in various real applications where imbalanced clusters typically appear, such as image segmentation, hazardous asteroid recognition and fraud detection.
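To make the compression step concrete, here is a minimal Python sketch of one possible reading of the abstract. It is not the speaker's implementation. It assumes equal-width bins, keeps only one histogram per dimension (the bin-marginal counts), and evaluates a composite log-likelihood for a diagonal Gaussian mixture as a sum of per-dimension multinomial terms. The function names, the bin count and the simulated data are all hypothetical, and the EM-like fitting procedure itself is not shown.

```python
import math
import numpy as np

def bin_marginal_counts(X, edges):
    """One histogram per dimension; joint (bin-cross) counts are never stored."""
    return [np.histogram(X[:, j], bins=edges[j])[0] for j in range(X.shape[1])]

# math.erf is scalar, so wrap it to apply it elementwise to arrays.
_erf = np.vectorize(math.erf)

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + _erf((x - mu) / (sigma * math.sqrt(2.0))))

def composite_loglik(counts, edges, weights, means, sds):
    """Sum over dimensions of the binned marginal log-likelihoods.

    Assumption: each dimension is treated separately, which is what a
    composite (marginal) likelihood on bin-marginal data would look like.
    Mass outside the outermost bin edges is ignored in this sketch.
    """
    total = 0.0
    for j, (n_j, e_j) in enumerate(zip(counts, edges)):
        # cdf[k, b]: CDF of component k in dimension j at bin edge b
        cdf = normal_cdf(e_j[None, :], means[:, j][:, None], sds[:, j][:, None])
        bin_prob = weights @ np.diff(cdf, axis=1)  # mixture mass per bin
        total += float(np.sum(n_j * np.log(np.clip(bin_prob, 1e-300, None))))
    return total

# Hypothetical imbalanced data: one large cluster and one tiny cluster (1%).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([0.0, 0.0], [1.0, 1.0], size=(19_800, 2)),
    rng.normal([6.0, 6.0], [0.3, 0.3], size=(200, 2)),
])

# 50 equal-width bins per dimension, spanning the observed range.
edges = [np.linspace(X[:, j].min(), X[:, j].max(), 51) for j in range(2)]
counts = bin_marginal_counts(X, edges)

weights = np.array([0.99, 0.01])
means = np.array([[0.0, 0.0], [6.0, 6.0]])
sds = np.array([[1.0, 1.0], [0.3, 0.3]])

ll_true = composite_loglik(counts, edges, weights, means, sds)
ll_wrong = composite_loglik(counts, edges, weights, means + 3.0, sds)
```

In this toy setting the 20,000 x 2 data matrix is replaced by 2 x 50 counts, and the storage cost no longer depends on the sample size. The tiny cluster still contributes to the marginal histograms, whereas a small random subsample could miss it almost entirely.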

Biography:

Christophe Biernacki is a Professor of Statistics at the University of Lille, in the Paul Painlevé Laboratory of Mathematics (UMR CNRS 8524, France), and a Senior Researcher at INRIA. He formerly led the Modal research team (MOdels for Data Analysis and Learning, https://team.inria.fr/modal/), which he created in 2011 at INRIA Lille Nord-Europe. He is currently Deputy Scientific Director of INRIA in charge of Applied Mathematics, Computation and Simulation. His research interests include statistical learning, model-based clustering and classification, mixture modeling, EM algorithms and model selection. His seminal contributions include the widely used ICL (Integrated Completed Likelihood) criterion for choosing the number of clusters in a mixture model and the MIXMOD (MIXture MODelling) software for fitting mixture models, which is publicly available for several platforms (Linux, Unix, Windows) at https://github.com/mixmod/mixmod

