Nainia A., Vignes-Lebbe R., Chenin Eric, Sahraoui M., Mousannif H., Zahir J. (2024). FloraNER: a new dataset for species and morphological terms named entity recognition in French botanical text. Data in Brief, 56, p. 110824 [10 p.]. ISSN 2352-3409.
Document title
FloraNER: a new dataset for species and morphological terms named entity recognition in French botanical text
Nainia A., Vignes-Lebbe R., Chenin Eric, Sahraoui M., Mousannif H., Zahir J.
Source
Data in Brief, 2024, 56, p. 110824 [10 p.]. ISSN 2352-3409
FloraNER is a distantly supervised named entity recognition (NER) dataset. The dataset is built from French botanical literature extracted from the OCR-preprocessed flora of New Caledonia, provided by the National Museum of Natural History in France (MNHN), and distantly annotated with a French botanical corpus created by merging botanical lexicons available online. FloraNER comprises separate subdatasets for the recognition of plant species names, as well as coarse-grained and fine-grained botanical morphological terms. The resulting datasets are in CSV format, displaying textual data, identified named entities, and their annotations, covering one named entity type "Species" (Espèce in French) for species name identification, two named entity types "Organ" and "Descriptor" for coarse-grained morphological term identification, and eight named entity types for fine-grained morphological term identification: Organ, Descriptor, Form, Color, Development, Structure, Surface, Position, Disposition, and Measure. This dataset can be used to train and evaluate named entity recognition models for extracting information from French botanical literature.
Classification scheme
Plant sciences [076]; Documentation [124]