Publications by IRD scientists

Comignani U., Novelli N., Berti-Equille Laure. (2020). Data quality checking for machine learning with MeSQuaL [demonstration paper]. In : Bonifati A. (ed.), Zhou I. (ed.), Vaz Salles M. A. (ed.), Böhm A. (ed.), Olteanu D. (ed.), Fletcher G. (ed.), Khan A. (ed.), Yang B. (ed.). Advances in database technology : EDBT 2020. Konstanz : Open Proceedings, p. 591-594. (Open Proceedings ; 23). International Conference on Extending Database Technology, 23., Copenhagen (DNK), 2020/03/30-2020/04/02. ISBN 978-3-89318-083-7. ISSN 2367-2005.

Document title
Data quality checking for machine learning with MeSQuaL [demonstration paper]
Publication year
2020
Document type
Book chapter
Authors
Comignani U., Novelli N., Berti-Equille Laure
In
Bonifati A. (ed.), Zhou I. (ed.), Vaz Salles M. A. (ed.), Böhm A. (ed.), Olteanu D. (ed.), Fletcher G. (ed.), Khan A. (ed.), Yang B. (ed.) Advances in database technology : EDBT 2020
Source
Konstanz : Open Proceedings, 2020, p. 591-594 (Open Proceedings ; 23). ISBN 978-3-89318-083-7 ISSN 2367-2005
Conference
International Conference on Extending Database Technology, 23., Copenhagen (DNK), 2020/03/30-2020/04/02
This demo proposes MeSQuaL, a system for profiling and checking data quality before further tasks, such as data analytics and machine learning. MeSQuaL extends SQL for querying relational data with constraints on data quality and facilitates the verification of statistical tests. The system includes: (1) a query interpreter for SQuaL, the SQL-extended language we propose for declaring and querying data with data quality checks and statistical tests; (2) an extensible library of user-defined functions for profiling the data and computing various data quality indicators; and (3) a user interface for declaring data quality constraints, profiling data, monitoring data quality with SQuaL queries, and visualizing the results via data quality dashboards. We showcase our system in action with various scenarios on real-world datasets and show its usability for monitoring data quality over time and checking the quality of data on demand.
Classification scheme
Computer science [122]
Location
IRD collection [F B010078830]
IRD identifier
fdi:010078830