%0 Journal Article
%9 ACL: Articles in peer-reviewed journals listed by AERES
%A Paradis, Emmanuel
%T Probabilistic unsupervised classification for large-scale analysis of spectral imaging data
%D 2022
%L fdi:010084215
%G ENG
%J International Journal of Applied Earth Observation and Geoinformation
%@ 1569-8432
%K Unsupervised classification ; k-means ; Land cover ; Multivariate normal ; density ; Spectral imaging data
%M ISI:000748644800001
%P 102675 [13 ]
%R 10.1016/j.jag.2022.102675
%U https://www.documentation.ird.fr/hor/fdi:010084215
%> https://horizon.documentation.ird.fr/exl-doc/pleins_textes/2022-03/010084215.pdf
%V 107
%W Horizon (IRD)
%X Land cover classification of remote sensing data is a fundamental tool to study changes in the environment such as deforestation or wildfires. A current challenge is to quantify land cover changes with real-time, large-scale data from modern hyper- or multispectral sensors. A range of methods is available for this task, several of them based on the k-means classification method, which is efficient when classes of land cover are well separated. Here a new algorithm, called probabilistic k-means, is presented to address some of the limitations of the standard k-means. It is shown that the new algorithm performs better than the standard k-means when the data are noisy. If the number of land cover classes is unknown, an entropy-based criterion can be used to select the best number of classes. The proposed new algorithm is implemented in a combination of R and C code that is particularly efficient with large data sets: a whole image with more than 3 million pixels and covering more than 10,000 km² can be analysed in a few minutes. Four applications with hyperspectral and multispectral data are presented. For the data sets with ground truth data, the overall accuracy of the probabilistic k-means was substantially improved compared to the standard k-means.
One of these data sets includes more than 120 million pixels, demonstrating the scalability of the proposed approach. These developments open new perspectives for the large-scale analysis of remote sensing data. All computer code is available in an open-source package called sentinel.
%$ 126 ; 020