<?xml version="1.0" encoding="UTF-8"?>
<modsCollection xmlns="http://www.loc.gov/mods/v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd">
  <mods>
    <titleInfo>
      <title>Analysis of feature selection stability on high dimension and small sample data</title>
    </titleInfo>
    <name type="personal">
      <namePart type="family">Dernoncourt</namePart>
      <namePart type="given">D.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Hanczar</namePart>
      <namePart type="given">B.</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <name type="personal">
      <namePart type="family">Zucker</namePart>
      <namePart type="given">Jean-Daniel</namePart>
      <role>
        <roleTerm type="text">auteur</roleTerm>
        <roleTerm type="code" authority="marcrelator">aut</roleTerm>
      </role>
      <affiliation>IRD</affiliation>
    </name>
    <typeOfResource>text</typeOfResource>
    <genre authority="local">journalArticle</genre>
    <language>
      <languageTerm type="code" authority="iso639-2b">eng</languageTerm>
    </language>
    <physicalDescription>
      <internetMediaType>application/pdf</internetMediaType>
      <digitalOrigin>born digital</digitalOrigin>
      <reformattingQuality>access</reformattingQuality>
    </physicalDescription>
    <abstract>Feature selection is an important step when building a classifier on high-dimensional data. As the number of observations is small, feature selection tends to be unstable. It is common for two feature subsets, obtained from different datasets but dealing with the same classification problem, not to overlap significantly. Although this is a crucial problem, little work has been done on selection stability. The behavior of feature selection is analyzed under various conditions, with a focus on, but not limited to, t-score based feature selection approaches and small sample data. The analysis proceeds in three steps: the first is theoretical, using a simple mathematical model; the second is empirical and based on artificial data; and the last is based on real data. These three analyses lead to the same results and give a better understanding of the feature selection problem in high-dimensional data.</abstract>
    <targetAudience authority="marctarget">specialized</targetAudience>
    <subject>
      <topic>Feature selection</topic>
      <topic>Small sample</topic>
      <topic>Stability</topic>
      <topic>Low N/D ratio</topic>
    </subject>
    <classification authority="local">020</classification>
    <relatedItem type="host">
      <titleInfo>
        <title>Computational Statistics and Data Analysis</title>
      </titleInfo>
      <part>
        <detail type="volume">
          <number>71</number>
        </detail>
        <detail type="issue">
          <number>SI</number>
        </detail>
        <extent unit="pages">
          <list>681-693</list>
        </extent>
      </part>
      <originInfo>
        <dateIssued>2014</dateIssued>
      </originInfo>
      <identifier type="issn">0167-9473</identifier>
    </relatedItem>
    <identifier type="uri">https://www.documentation.ird.fr/hor/fdi:010061389</identifier>
    <identifier type="doi">10.1016/j.csda.2013.07.012</identifier>
    <identifier type="issn">0167-9473</identifier>
    <location>
      <shelfLocator>[F B010061389]</shelfLocator>
      <url usage="primary display" access="object in context">https://www.documentation.ird.fr/hor/fdi:010061389</url>
      <url access="raw object">https://www.documentation.ird.fr/intranet/publi/2014/01/010061389.pdf</url>
    </location>
    <accessCondition type="restriction on access" displayLabel="Accès réservé">Accès réservé (Intranet de l'IRD)</accessCondition>
    <recordInfo>
      <recordContentSource>IRD - Base Horizon / Pleins textes</recordContentSource>
      <recordCreationDate encoding="w3cdtf">2014-02-05</recordCreationDate>
      <recordChangeDate encoding="w3cdtf">2017-08-23</recordChangeDate>
      <recordIdentifier>fdi:010061389</recordIdentifier>
      <languageOfCataloging>
        <languageTerm type="code" authority="iso639-2b">fre</languageTerm>
      </languageOfCataloging>
    </recordInfo>
  </mods>
</modsCollection>
