%0 Journal Article
%9 ACL: Articles in peer-reviewed journals listed by AERES
%A Nainia, A.
%A Vignes-Lebbe, R.
%A Chenin, Eric
%A Sahraoui, M.
%A Mousannif, H.
%A Zahir, J.
%T FloraNER: a new dataset for species and morphological terms named entity recognition in French botanical text
%D 2024
%L fdi:010091277
%G ENG
%J Data in Brief
%@ 2352-3409
%K NER Dataset ; Biodiversity dataset ; Species identification dataset ; Plant morphology dataset
%K NOUVELLE CALEDONIE
%M ISI:001299243700001
%P 110824 [10]
%R 10.1016/j.dib.2024.110824
%U https://www.documentation.ird.fr/hor/fdi:010091277
%> https://horizon.documentation.ird.fr/exl-doc/pleins_textes/2024-10/010091277.pdf
%V 56
%W Horizon (IRD)
%X FloraNER is a distantly supervised named entity recognition (NER) dataset. It is built from French botanical literature extracted from the OCR-preprocessed flora of New Caledonia, provided by the National Museum of Natural History in France (MNHN), and distantly annotated with a French botanical corpus created by merging botanical lexicons available online. FloraNER comprises separate subdatasets for the recognition of plant species names and of coarse-grained and fine-grained botanical morphological terms. The resulting datasets are in CSV format and contain the textual data, the identified named entities, and their annotations. They cover one named entity type, "Species" (Espèce in French), for species name identification; two named entity types, "Organ" and "Descriptor", for coarse-grained morphological term identification; and eight named entity types for fine-grained morphological term identification: Organ, Descriptor, Form, Color, Development, Structure, Surface, Position, Disposition, and Measure. The dataset can be used to train and evaluate named entity recognition models that extract information from French botanical literature.
%$ 076 ; 124