Face imagery can be analyzed to identify many properties at different levels of specificity. Among the most interesting are gender, ethnicity, and age. In this work, we refer to the combination of these three properties as demographic profiling.
More formally, the demographic face profiling task can be defined as follows: given one or more consecutive face images, produce a single prediction for a set of attributes (age, gender, and ethnicity).
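To make the input and output of this task definition concrete, the following is a minimal sketch of collapsing several per-frame predictions into a single profile. It is not the method proposed in this work: the `FramePrediction` type, the `aggregate_profile` function, the attribute labels, and the aggregation rule (median for age, majority vote for gender and ethnicity) are all assumptions chosen purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass(frozen=True)
class FramePrediction:
    """Hypothetical per-frame output of some face attribute predictor."""
    age: float
    gender: str
    ethnicity: str


def aggregate_profile(frames: Sequence[FramePrediction]) -> FramePrediction:
    """Collapse one or more per-frame predictions into a single profile.

    Assumed rule (illustrative only): median for the continuous attribute,
    majority vote for the categorical attributes.
    """
    if not frames:
        raise ValueError("at least one frame prediction is required")
    age = median(f.age for f in frames)
    gender = Counter(f.gender for f in frames).most_common(1)[0][0]
    ethnicity = Counter(f.ethnicity for f in frames).most_common(1)[0][0]
    return FramePrediction(age=age, gender=gender, ethnicity=ethnicity)


# Three consecutive frames of the same subject (labels are placeholders).
frames = [
    FramePrediction(age=30.0, gender="female", ethnicity="group_a"),
    FramePrediction(age=34.0, gender="female", ethnicity="group_a"),
    FramePrediction(age=50.0, gender="male", ethnicity="group_a"),
]
profile = aggregate_profile(frames)
```

The median and the majority vote are used here only because they tolerate a single outlying frame; any other rule that maps a sequence of per-frame outputs to one prediction would fit the same definition.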
In the last decade, demographic profiling from facial imagery has grown in importance in the computer vision field. The determination of gender, ethnicity, and age (disjoint, partially joint, or joint) has several application areas. For example, a person’s age can be verified to implement age-based access control before granting physical access to a place or product being sold, or virtual access to a website.
In targeted advertising, for example, a digital sign can display commercials based on the demographics of the audience walking past. Soft biometrics uses ethnicity, gender, and age to index face images in large-scale biometric databases for faster retrieval.
Furthermore, analyzing crowded environments to identify the age, gender, and ethnicity distributions of the people present is becoming strategic for retail chains. All the aforementioned applications have real-time requirements, since attribute estimation must be performed as fast as possible and, in certain cases, on devices with limited resources. This requirement becomes stricter when only a limited amount of time is available to make a decision. In the targeted advertising case, for example, the temporal window coincides with the time a person takes to walk past the digital sign.
We propose a carefully engineered computer vision and image processing pipeline for demographic profiling that is suitable for real-time embedded environments. First, we address the full demographic profiling problem by predicting age, ethnicity, and gender. Ethnicity and gender predictors are often learned on imbalanced datasets; we therefore apply the Truncated Isotropic Principal Component Analysis Classifier, which allows us to avoid re-sampling the data. Second, we present a novel method to incorporate semantic predictions from video sequences. Finally, we report results on all three tasks on two large-scale datasets.