<?xml version="1.0" encoding="UTF-8"?>
<xml>
  <records>
    <record>
      <source-app name="Horizon">Horizon</source-app>
      <rec-number>1</rec-number>
      <foreign-keys>
        <key app="Horizon" db-id="fdi:010095035">1</key>
      </foreign-keys>
      <ref-type name="Journal Article">17</ref-type>
      <work-type>ACL: Articles in peer-reviewed journals listed by AERES</work-type>
      <contributors>
        <authors>
          <author>
            <style face="bold" font="default" size="100%">Sklab, Youcef</style>
          </author>
          <author>
            <style face="normal" font="default" size="100%">Ariouat, H.</style>
          </author>
          <author>
            <style face="bold" font="default" size="100%">Chenin, Eric</style>
          </author>
          <author>
            <style face="bold" font="default" size="100%">Prifti, Edi</style>
          </author>
          <author>
            <style face="bold" font="default" size="100%">Zucker, Jean-Daniel</style>
          </author>
        </authors>
      </contributors>
      <titles>
        <title>SIM-Net: a multimodal fusion network using inferred 3D object shape point clouds from RGB images for 2D classification</title>
        <secondary-title>IET Computer Vision</secondary-title>
      </titles>
      <pages>e70036 [18 p.]</pages>
      <keywords>
        <keyword>botany</keyword>
        <keyword>computer vision</keyword>
        <keyword>image classification</keyword>
        <keyword>image processing</keyword>
        <keyword>image representation</keyword>
        <keyword>learning (artificial intelligence)</keyword>
        <keyword>neural net architecture</keyword>
        <keyword>neural nets</keyword>
        <keyword>object recognition</keyword>
      </keywords>
      <dates>
        <year>2025</year>
      </dates>
      <call-num>fdi:010095035</call-num>
      <language>ENG</language>
      <periodical>
        <full-title>IET Computer Vision</full-title>
      </periodical>
      <isbn>1751-9632</isbn>
      <accession-num>ISI:001576587900001</accession-num>
      <number>1</number>
      <electronic-resource-num>10.1049/cvi2.70036</electronic-resource-num>
      <urls>
        <related-urls>
          <url>https://www.documentation.ird.fr/hor/fdi:010095035</url>
        </related-urls>
        <pdf-urls>
          <url>https://horizon.documentation.ird.fr/exl-doc/pleins_textes/2025-11/010095035.pdf</url>
        </pdf-urls>
      </urls>
      <volume>19</volume>
      <remote-database-provider>Horizon (IRD)</remote-database-provider>
      <abstract>We introduce the shape-image multimodal network (SIM-Net), a novel 2D image classification architecture that integrates 3D point cloud representations inferred directly from RGB images. Our key contribution lies in a pixel-to-point transformation that converts 2D object masks into 3D point clouds, enabling the fusion of texture-based and geometric features for enhanced classification performance. SIM-Net is particularly well-suited for the classification of digitised herbarium specimens, a task made challenging by heterogeneous backgrounds, nonplant elements, and occlusions that compromise conventional image-based models. To address these issues, SIM-Net employs a segmentation-based preprocessing step to extract object masks prior to 3D point cloud generation. The architecture comprises a CNN encoder for 2D image features and a PointNet-based encoder for geometric features, which are fused into a unified latent space. Experimental evaluations on herbarium datasets demonstrate that SIM-Net consistently outperforms ResNet101, achieving gains of up to 9.9% in accuracy and 12.3% in F-score. It also surpasses several transformer-based state-of-the-art architectures, highlighting the benefits of incorporating 3D structural reasoning into 2D image classification tasks.</abstract>
      <custom6>122 ; 076</custom6>
      <custom1>UR209</custom1>
    </record>
  </records>
</xml>
