%0 Journal Article
%9 ACL: Articles in peer-reviewed journals listed by AERES
%A Amer-Yahia, S.
%A Berti-Equille, Laure
%A Chibah, A.
%T A framework for statistically-sound customer segment search
%B 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA)
%C Piscataway
%D 2021
%L fdi:010085545
%G ENG
%I IEEE
%@ 978-1-6654-2100-3
%M ISI:000783799800060
%P 1-10
%R 10.1109/DSAA53316.2021.9564199
%U https://www.documentation.ird.fr/hor/fdi:010085545
%> https://www.documentation.ird.fr/intranet/publi/2023-02/010085545.pdf
%W Horizon (IRD)
%X We develop S4, a Statistically-Sound Segment Search framework that combines principled data partitioning and sound statistical testing to verify common hypotheses in retail data and return interpretable customer data segments. Our framework accommodates one-sample, two-sample, and multiple-sample testing, to provide various aggregations and comparisons of customer transactions. To control the proportion of false discoveries in multiple hypothesis testing, we enforce an FDR-controlling procedure and formulate a unified optimization problem that returns customer data segments that satisfy the test for a given significance level, maximize coverage of the input data, and are within a risk capital. We develop a greedy algorithm to explore different data partitions and test multiple hypotheses in a sound manner. Our extensive experiments on four retail data sets examine the interaction between significance, risk and coverage, and demonstrate the expressivity, usefulness, and scalability of S4 in practice.
%B International Conference on Data Science and Advanced Analytics (DSAA)
%8 2021/10/06-09
%$ 122 ; 020