Publications by IRD scientists

Jradeh C. K., Raoufi E., David J., Larmande Pierre, Scharffe F., Todorov K., Trojahn C., Association for Computing Machinery. (2025). Graph embeddings meet link keys discovery for entity matching. Proceedings of the ACM Web Conference 2025, WWW 2025, p. 3344-3353.

Document title
Graph embeddings meet link keys discovery for entity matching
Year of publication
2025
Document type
Article indexed in the Web of Science WOS:001505285200277
Authors
Jradeh C. K., Raoufi E., David J., Larmande Pierre, Scharffe F., Todorov K., Trojahn C., Association for Computing Machinery
Source
Proceedings of the ACM Web Conference 2025, WWW 2025, 2025, p. 3344-3353
Entity Matching (EM) automates the discovery of identity links between entities in different Knowledge Graphs (KGs). Link keys are crucial for EM: they serve as rules that identify identity links across different KGs, which may be described using different ontologies. However, the approach for extracting link keys struggles to scale to large KGs. Embedding-based EM methods handle large KGs efficiently, but they lack explainability. This paper proposes a novel hybrid EM approach to guarantee the scalability of the link key extraction approach and to improve the explainability of embedding-based EM methods. First, embedding-based EM approaches are used to sample the KGs based on the identity links they generate, thereby reducing the search space to the sub-graphs relevant for link key extraction. Second, rules (in the form of link keys) are extracted to explain the generation of identity links by the embedding-based methods. Experimental results demonstrate that the proposed approach allows link key extraction to scale to large KGs while preserving the quality of the extracted link keys. They also show that link keys can improve the explainability of the identity links generated by embedding-based methods, allowing for the regeneration of 77% of the identity links produced for a specific EM task, thereby providing an approximation of the reasons behind their generation.
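To make the notion of a link key concrete, here is a minimal Python sketch; it is not the authors' implementation. It assumes a simplified reading in which a link key is a list of property pairs, and two entities are linked when they share at least one value for every pair. The toy KGs, the property names, the `embedding_links` set and the helper names `links_from_key` and `regenerated_fraction` are all hypothetical and serve only to illustrate how the share of embedding-generated links regenerated by a link key could be measured.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for a KG: entity id -> property name -> set of values.
KG = Dict[str, Dict[str, Set[str]]]
# Simplified link key: property pairs (p in KG1, q in KG2) that must share a value.
LinkKey = List[Tuple[str, str]]
Link = Tuple[str, str]


def links_from_key(kg1: KG, kg2: KG, key: LinkKey) -> Set[Link]:
    """Link every entity pair that shares a value on each property pair of the key."""
    links: Set[Link] = set()
    for id1, e1 in kg1.items():
        for id2, e2 in kg2.items():
            if all(e1.get(p, set()) & e2.get(q, set()) for p, q in key):
                links.add((id1, id2))
    return links


def regenerated_fraction(embedding_links: Set[Link], key_links: Set[Link]) -> float:
    """Share of embedding-generated links that the link key also produces."""
    if not embedding_links:
        return 0.0
    return len(embedding_links & key_links) / len(embedding_links)


# Hypothetical data: two tiny KGs described with different property names.
kg1: KG = {
    "a1": {"name": {"Ada Lovelace"}, "born": {"1815"}},
    "a2": {"name": {"Alan Turing"}, "born": {"1912"}},
}
kg2: KG = {
    "b1": {"label": {"Ada Lovelace"}, "birthYear": {"1815"}},
    "b2": {"label": {"Alan Turing"}, "birthYear": {"1912"}},
    "b3": {"label": {"Grace Hopper"}, "birthYear": {"1906"}},
}

key: LinkKey = [("name", "label"), ("born", "birthYear")]
key_links = links_from_key(kg1, kg2, key)

# Hypothetical output of an embedding-based matcher, including one link
# that the link key cannot explain.
embedding_links: Set[Link] = {("a1", "b1"), ("a2", "b2"), ("a2", "b3")}
fraction = regenerated_fraction(embedding_links, key_links)
```

In this toy setting the link key explains two of the three embedding-generated links. The nested loop compares every entity pair, which is exactly the cost that becomes prohibitive on large KGs and that sampling the KGs beforehand is meant to reduce.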
Classification scheme
Basic sciences / Analysis and research techniques [020]; Computer science [122]
Location
IRD collection [F B010094312]
IRD identifier
fdi:010094312