%0 Journal Article
%9 ACL: Articles in peer-reviewed journals listed by AERES
%A Sklab, Youcef
%A Ariouat, H.
%A Chenin, Eric
%A Prifti, Edi
%A Zucker, Jean-Daniel
%T SIM-Net: a multimodal fusion network using inferred 3D object shape point clouds from RGB images for 2D classification
%D 2025
%L fdi:010095035
%G ENG
%J IET Computer Vision
%@ 1751-9632
%K botany ; computer vision ; image classification ; image processing ; image representation ; learning (artificial intelligence) ; neural net ; architecture ; neural nets ; object recognition
%M ISI:001576587900001
%N 1
%P e70036 [18]
%R 10.1049/cvi2.70036
%U https://www.documentation.ird.fr/hor/fdi:010095035
%> https://horizon.documentation.ird.fr/exl-doc/pleins_textes/2025-11/010095035.pdf
%V 19
%W Horizon (IRD)
%X We introduce the shape-image multimodal network (SIM-Net), a novel 2D image classification architecture that integrates 3D point cloud representations inferred directly from RGB images. Our key contribution lies in a pixel-to-point transformation that converts 2D object masks into 3D point clouds, enabling the fusion of texture-based and geometric features for enhanced classification performance. SIM-Net is particularly well-suited for the classification of digitised herbarium specimens, a task made challenging by heterogeneous backgrounds, nonplant elements, and occlusions that compromise conventional image-based models. To address these issues, SIM-Net employs a segmentation-based preprocessing step to extract object masks prior to 3D point cloud generation. The architecture comprises a CNN encoder for 2D image features and a PointNet-based encoder for geometric features, which are fused into a unified latent space. Experimental evaluations on herbarium datasets demonstrate that SIM-Net consistently outperforms ResNet101, achieving gains of up to 9.9% in accuracy and 12.3% in F-score. It also surpasses several transformer-based state-of-the-art architectures, highlighting the benefits of incorporating 3D structural reasoning into 2D image classification tasks.
%$ 122 ; 076