<?xml version="1.0"?>
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
  <dc:title>SIM-Net: a multimodal fusion network using inferred 3D object shape point clouds from RGB images for 2D classification</dc:title>
  <dc:creator>Sklab, Youcef</dc:creator>
  <dc:creator>Ariouat, H.</dc:creator>
  <dc:creator>Chenin, Eric</dc:creator>
  <dc:creator>Prifti, Edi</dc:creator>
  <dc:creator>Zucker, Jean-Daniel</dc:creator>
  <dc:subject>botany</dc:subject>
  <dc:subject>computer vision</dc:subject>
  <dc:subject>image classification</dc:subject>
  <dc:subject>image processing</dc:subject>
  <dc:subject>image representation</dc:subject>
  <dc:subject>learning (artificial intelligence)</dc:subject>
  <dc:subject>neural net architecture</dc:subject>
  <dc:subject>neural nets</dc:subject>
  <dc:subject>object recognition</dc:subject>
  <dc:description>We introduce the shape-image multimodal network (SIM-Net), a novel 2D image classification architecture that integrates 3D point cloud representations inferred directly from RGB images. Our key contribution lies in a pixel-to-point transformation that converts 2D object masks into 3D point clouds, enabling the fusion of texture-based and geometric features for enhanced classification performance. SIM-Net is particularly well-suited for the classification of digitised herbarium specimens, a task made challenging by heterogeneous backgrounds, non-plant elements, and occlusions that compromise conventional image-based models. To address these issues, SIM-Net employs a segmentation-based preprocessing step to extract object masks prior to 3D point cloud generation. The architecture comprises a CNN encoder for 2D image features and a PointNet-based encoder for geometric features, which are fused into a unified latent space. Experimental evaluations on herbarium datasets demonstrate that SIM-Net consistently outperforms ResNet101, achieving gains of up to 9.9% in accuracy and 12.3% in F-score. It also surpasses several transformer-based state-of-the-art architectures, highlighting the benefits of incorporating 3D structural reasoning into 2D image classification tasks.</dc:description>
  <dc:date>2025</dc:date>
  <dc:type>text</dc:type>
  <dc:identifier>https://www.documentation.ird.fr/hor/fdi:010095035</dc:identifier>
  <dc:identifier>fdi:010095035</dc:identifier>
  <dc:identifier>Sklab Youcef, Ariouat H., Chenin Eric, Prifti Edi, Zucker Jean-Daniel. SIM-Net: a multimodal fusion network using inferred 3D object shape point clouds from RGB images for 2D classification. 2025, 19 (1), e70036 [18 p.]</dc:identifier>
  <dc:language>EN</dc:language>
</oai_dc:dc>
