<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
  <mods>
    <titleInfo>
      <title>An approach to optimizing abstaining area for small sample data classification</title>
    </titleInfo>
    <name type="personal">
      <namePart type="family">Hanczar</namePart>
      <namePart type="given">B.</namePart>
      <role>
        <roleTerm type="text">author</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Zucker</namePart>
      <namePart type="given">Jean-Daniel</namePart>
      <role>
        <roleTerm type="text">author</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <typeOfResource>text</typeOfResource>
    <genre authority="local">journalArticle</genre>
    <language>
      <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>
    <physicalDescription>
      <internetMediaType>application/pdf</internetMediaType>
      <digitalOrigin>born digital</digitalOrigin>
      <reformattingQuality>access</reformattingQuality>
    </physicalDescription>
    <abstract>Given a classification task, one approach to improving accuracy relies on abstaining classifiers. These classifiers are trained to reject observations whose predicted values are not reliable enough; the rejected observations belong to an abstaining area in the feature space. Two equivalent methods exist to theoretically compute the optimal abstaining area for a given classification problem. The first is based on the posterior probability computed by the model, and the second is based on the derivative of the ROC function of the model. Although the second method has been shown to give the best results, in small-sample settings such as those found in omics data, the estimates of the posterior probabilities and of the derivative of the ROC curve both lack precision, leading to far-from-optimal abstaining areas. As a consequence, neither method brings the expected improvement in accuracy. We propose five alternative algorithms for computing the abstaining area that are adapted to small-sample problems. The idea of these algorithms is to compute an accurate and robust estimate of the ROC curve and its derivatives. These estimates are mainly based on the assumption that the distribution of the classifier output for each class is normal or a mixture of normal distributions. These distributions are estimated by a kernel density estimator or a Bayesian semiparametric estimator. Another method works on the approximation of the convex hull of the ROC curve. Once the derivatives of the ROC curve are estimated, the optimal abstaining area can be computed directly. The performance of our algorithms is directly related to their capacity to compute an accurate estimate of the ROC curve. A sensitivity analysis of our methods with respect to dataset size and rejection cost was carried out in a set of experiments. We show that our methods improve the performance of abstaining classifiers on several real datasets and for different learning algorithms.</abstract>
    <targetAudience authority="marctarget">specialized</targetAudience>
    <subject>
      <topic>Supervised learning</topic>
      <topic>Reject option</topic>
      <topic>Small-sample setting</topic>
      <topic>Abstaining classifier</topic>
      <topic>ROC curve estimation</topic>
    </subject>
    <classification authority="local">020</classification>
    <relatedItem type="host">
      <titleInfo>
        <title>Expert Systems with Applications</title>
      </titleInfo>
      <part>
        <detail type="volume">
          <number>95</number>
        </detail>
        <extent unit="pages">
          <list>153-161</list>
        </extent>
      </part>
      <originInfo>
        <dateIssued>2018</dateIssued>
      </originInfo>
      <identifier type="issn">0957-4174</identifier>
    </relatedItem>
    <identifier type="uri">https://www.documentation.ird.fr/hor/fdi:010072026</identifier>
    <identifier type="doi">10.1016/j.eswa.2017.11.013</identifier>
    <identifier type="issn">0957-4174</identifier>
    <location>
      <shelfLocator>[F B010072026]</shelfLocator>
      <url usage="primary display" access="object in context">https://www.documentation.ird.fr/hor/fdi:010072026</url>
      <url access="raw object">https://www.documentation.ird.fr/intranet/publi/2018/02/010072026.pdf</url>
    </location>
    <accessCondition type="restriction on access" displayLabel="Restricted access">Restricted access (IRD intranet)</accessCondition>
    <recordInfo>
      <recordContentSource>IRD - Base Horizon / Pleins textes</recordContentSource>
      <recordCreationDate encoding="w3cdtf">2018-03-05</recordCreationDate>
      <recordChangeDate encoding="w3cdtf">2023-07-11</recordChangeDate>
      <recordIdentifier>fdi:010072026</recordIdentifier>
      <languageOfCataloging>
        <languageTerm authority="iso639-2b">fre</languageTerm>
      </languageOfCataloging>
    </recordInfo>
  </mods>
</modsCollection>
