<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
  <mods>
    <titleInfo>
      <title>FloraNER : a new dataset for species and morphological terms named entity recognition in French botanical text</title>
    </titleInfo>
    <name type="personal">
      <namePart type="family">Nainia</namePart>
      <namePart type="given">A.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Vignes-Lebbe</namePart>
      <namePart type="given">R.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Chenin</namePart>
      <namePart type="given">Eric</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Sahraoui</namePart>
      <namePart type="given">M.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Mousannif</namePart>
      <namePart type="given">H.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Zahir</namePart>
      <namePart type="given">J.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <typeOfResource>text</typeOfResource>
    <genre authority="local">journalArticle</genre>
    <language>
      <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>
    <physicalDescription>
      <internetMediaType>application/pdf</internetMediaType>
      <digitalOrigin>reformatted digital</digitalOrigin>
      <reformattingQuality>access</reformattingQuality>
    </physicalDescription>
    <abstract>FloraNER is a distantly supervised named entity recognition (NER) dataset. The dataset is built from botanical French literature extracted from the OCR-preprocessed flora of New Caledonia, provided by the National Museum of Natural History in France (MNHN), and distantly annotated with a botanical French corpus created by merging botanical lexicons available online. FloraNER comprises separate subdatasets for the recognition of plant species names, as well as coarse-grained and fine-grained botanical morphological terms. The resulting datasets are in CSV format, displaying textual data, identified named entities, and their annotations, covering one named entity type "Species" (Espèce in French) for species name identification, two named entity types "Organ" and "Descriptor" for coarse-grained morphological term identification, and eight named entity types for fine-grained morphological term identification: Organ, Descriptor, Form, Color, Development, Structure, Surface, Position, Disposition, and Measure. This dataset can be utilized to train and evaluate named entity recognition models for extracting information from botanical French literature.</abstract>
    <targetAudience authority="marctarget">specialized</targetAudience>
    <subject>
      <topic>NER Dataset</topic>
      <topic>Biodiversity dataset</topic>
      <topic>Species identification dataset</topic>
      <topic>Plant morphology dataset</topic>
    </subject>
    <subject authority="local">
      <geographic>NOUVELLE CALEDONIE</geographic>
    </subject>
    <classification authority="local">076</classification>
    <classification authority="local">124</classification>
    <relatedItem type="host">
      <titleInfo>
        <title>Data in Brief</title>
      </titleInfo>
      <part>
        <detail type="volume">
          <number>56</number>
        </detail>
        <extent unit="pages">
          <list>110824 [10 p.]</list>
        </extent>
      </part>
      <originInfo>
        <dateIssued>2024</dateIssued>
      </originInfo>
      <identifier type="issn">2352-3409</identifier>
    </relatedItem>
    <identifier type="uri">https://www.documentation.ird.fr/hor/fdi:010091277</identifier>
    <identifier type="doi">10.1016/j.dib.2024.110824</identifier>
    <identifier type="issn">2352-3409</identifier>
    <location>
      <shelfLocator>[F B010091277]</shelfLocator>
      <url usage="primary display" access="object in context">https://www.documentation.ird.fr/hor/fdi:010091277</url>
      <url access="raw object">https://horizon.documentation.ird.fr/exl-doc/pleins_textes/2024-10/010091277.pdf</url>
    </location>
    <recordInfo>
      <recordContentSource>IRD - Base Horizon / Pleins textes</recordContentSource>
      <recordCreationDate encoding="w3cdtf">2024-10-23</recordCreationDate>
      <recordChangeDate encoding="w3cdtf">2025-11-06</recordChangeDate>
      <recordIdentifier>fdi:010091277</recordIdentifier>
      <languageOfCataloging>
        <languageTerm authority="iso639-2b">fre</languageTerm>
      </languageOfCataloging>
    </recordInfo>
  </mods>
</modsCollection>
