This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
1. Introduction
The human face is a basic biological feature of human beings, and a face image carries a great deal of useful information, such as age, gender, identity, race, and emotion [1]. Face age estimation aims to predict accurate age values for given facial images using computer technology. However, variations in skull shape, the position of facial features, wrinkles, lighting, expression, and motion in videos are likely to bias predictions in the wild [2]. Accuracy is generally low, particularly when only a small amount of training data is available.
Although age estimation has been actively studied in recent years, performance remains limited, mainly for two reasons. On the one hand, existing datasets are incomplete, and most methods are trained in a supervised way, which requires manual annotation. On the other hand, the relationship between face data and age labels is usually heterogeneous and nonlinear [3, 4]. This motivates robust and accurate facial age estimation methods, particularly for unconstrained environments.
Conventional age estimation methods can be roughly divided into two components: feature representation and age prediction. Feature representation-based methods [5–7] seek discriminative age-related feature descriptors from face images. Age predictor-based methods [8, 9], in turn, learn a classifier or ranker for age on top of the input feature representation. Beyond these, label distribution learning has emerged as a widely used, state-of-the-art approach [10–12]. It typically encodes each age label as a symmetric distribution over a range of neighboring ages, e.g., a Gaussian or triangular distribution, which reflects the smoothness of aging and improves estimation performance. Nevertheless, such methods are restricted to a fixed structural form when modeling the ambiguity of age labels, which is usually not robust across complex cross-population face data domains. To address this problem, most researchers adopt feature fusion methods such as [13, 14], but these methods seldom consider the high correlation between adjacent samples and often require a large amount of annotated data. We therefore propose a flexible age estimation method based on unsupervised contrastive label distribution learning, which addresses both problems.
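As an illustration of the fixed-form encoding used by conventional label distribution methods, the following minimal sketch turns a scalar age into a discrete Gaussian distribution over integer ages and recovers an age estimate as the expectation of that distribution. The function names, the age range 0 to 100, and the standard deviation of 2 are illustrative assumptions and are not taken from this article.

```python
import math


def gaussian_label_distribution(age, sigma=2.0, min_age=0, max_age=100):
    """Encode a scalar age as a normalized discrete Gaussian over integer ages.

    sigma, min_age and max_age are illustrative choices, not values from the paper.
    """
    ages = list(range(min_age, max_age + 1))
    # Unnormalized Gaussian weight for every candidate age.
    weights = [math.exp(-((a - age) ** 2) / (2.0 * sigma ** 2)) for a in ages]
    total = sum(weights)
    # Normalize so the weights form a probability distribution.
    return ages, [w / total for w in weights]


def expected_age(ages, probs):
    """Decode a label distribution back to a scalar age via its expectation."""
    return sum(a * p for a, p in zip(ages, probs))


ages, probs = gaussian_label_distribution(30)
print(round(expected_age(ages, probs), 3))
```

Because the shape of the distribution is fixed in advance (here by `sigma`), every sample receives the same kind of label smoothing regardless of how it relates to its neighbors, which is the rigidity the paragraph above refers to.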
In this article, we propose a label distribution learning method based on unsupervised contrastive learning, dubbed UCLD, which models heterogeneous face aging data for robust face age estimation. The design is loosely analogous to a wireless sensor network, in which spatially distributed sensors monitor and record the physical conditions of the environment and the collected data are organized at a central location. Compared with traditional fixed and inflexible label distribution methods, our method not only takes into account the high correlation between adjacent samples but also reduces the model's dependence on data. We hypothesize that the learned distribution is determined by the relationships between samples, as shown in Figure 1. Technically, we first construct an embedding space for each anchor sample based on facial appearance information. The age feature is then extracted under the constraints of two projection layers and a contrastive loss. Our network uses an improved VGG-16 [15] for effective feature learning. Figure 2 illustrates the overall pipeline. To evaluate the effectiveness of the proposed method, we conduct extensive experiments on two benchmark datasets. Compared with existing facial age estimation methods, it achieves superior performance.
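The exact form of the contrastive loss is not given in this excerpt. As a hedged sketch, the block below assumes a SimCLR-style normalized temperature-scaled cross-entropy (NT-Xent) loss in the spirit of [19], computed on the outputs of a projection layer for two augmented views of each face. The function names, the temperature of 0.5, and the toy embeddings are assumptions for illustration only and should not be read as the ConAge implementation.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """SimCLR-style NT-Xent loss (an assumed stand-in for the paper's contrastive loss).

    view_a[i] and view_b[i] are projected embeddings of two augmentations of
    sample i; every other embedding in the batch acts as a negative.
    """
    n = len(view_a)
    z = [l2_normalize(v) for v in view_a + view_b]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same sample
        sims = [sum(p * q for p, q in zip(z[i], z[k])) / temperature
                for k in range(2 * n)]
        # The denominator covers all embeddings except the anchor itself.
        log_denominator = math.log(
            sum(math.exp(s) for k, s in enumerate(sims) if k != i))
        total += log_denominator - sims[positive]
    return total / (2 * n)


anchors = [[1.0, 0.0], [0.0, 1.0]]
augmented = [[1.0, 0.1], [0.1, 1.0]]
print(nt_xent_loss(anchors, augmented) < nt_xent_loss(anchors, augmented[::-1]))
```

In this toy batch the loss is lower when each anchor is paired with its own augmentation than when the pairs are swapped, which is the behavior that pulls positive samples together and pushes negatives apart.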
[figure omitted; refer to PDF]
We then switch to a weakly supervised training setting that uses only 25% of the labeled data, and test both the optimal ThinAge network architecture in DLDL-v2 and the ConAge network architecture proposed in this article. The experimental results are shown in Table 3.
Table 3
Weakly supervised face age estimation results.
Method | Network | Dataset | MAE |
DLDL-v2 (baseline) | ThinAge | FGNET (25%) | 6.4146 |
UCLD | ConAge | FGNET (25%) | 6.3342 |
DLDL-v2 (baseline) | ThinAge | MORPH (25%) | 2.8834 |
UCLD | ConAge | MORPH (25%) | 2.6545 |
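For readers who want to relate the numbers in Table 3 to the metric, the sketch below computes the mean absolute error (MAE) between predicted and true ages and the relative MAE reduction of UCLD over the DLDL-v2 baseline. The helper names and the two-sample toy input are hypothetical; only the four MAE values are taken from Table 3.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and ground-truth ages."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def relative_reduction(baseline, ours):
    """Fraction by which `ours` lowers the baseline error."""
    return (baseline - ours) / baseline


# Toy example with made-up ages, only to show how MAE is computed.
print(mean_absolute_error([30, 42], [28, 45]))

# (baseline MAE, UCLD MAE) pairs copied from Table 3.
table3 = {"FGNET (25%)": (6.4146, 6.3342), "MORPH (25%)": (2.8834, 2.6545)}
for dataset, (baseline, ucld) in table3.items():
    print(f"{dataset}: {100 * relative_reduction(baseline, ucld):.2f}% lower MAE")
```

Under these figures the reduction is roughly 1.25% on FGNET and 7.94% on MORPH, so the gain in the weakly supervised setting is concentrated on MORPH.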
The experimental results show that our method outperforms the DLDL-v2 framework in both the fully supervised and the weakly supervised settings. In addition, we draw three conclusions. (1) Traditional methods, such as DEX [24] and ODFL [25], process each age label independently without considering the correlation between labels. Our unsupervised contrastive method mimics the way humans observe things and can flexibly account for the relationships between age samples. (2) Some label distribution learning methods, such as LDL [11] and CPNN [11], impose a fixed structural model on the age label distribution, which may adapt poorly to real-world facial aging data. Thanks to the contrastive learning module, our method obtains more accurate semantic information, which makes subsequent test results more accurate. (3) In the weakly supervised setting, even when only a quarter of the labeled data is used, UCLD performs better than most existing methods. This is mainly because our model is less dependent on data.
4. Conclusion
In this article, motivated by the high correlation between adjacent age samples and the strong dependence of existing methods on data, we combine a contrastive loss with label distribution learning to learn abstract representations in an unsupervised manner. The resulting unsupervised contrastive label distribution (UCLD) learning method is similar in its processing form to wireless sensor networks. Extensive experiments on two datasets demonstrate the effectiveness of the method, and the results on the MORPH dataset in particular show its advantage. In future work, we will focus on efficiently distinguishing similar images to further improve age prediction accuracy.
Acknowledgments
This work was supported in part by the National Science Foundation of China under Grants 61806104 and 62076142, in part by the West Light Talent Program of the Chinese Academy of Sciences under Grant XAB2018AW05, and in part by the Youth Science and Technology Talents Enrolment Projects of Ningxia under Grant TJGC2018028.
[1] R. Angulu, J. R. Tapamo, A. O. Adewumi, "Age estimation via face images: a survey," EURASIP Journal on Image and Video Processing, vol. 2018 no. 1, DOI: 10.1186/s13640-018-0278-6, 2018.
[2] N. Ramanathan, R. Chellappa, S. Biswas, "Age progression in human faces: a survey," Journal of Visual Languages and Computing, vol. 15, pp. 3349-3361, 2009.
[3] W. Li, J. Lu, J. Feng, C. Xu, J. Zhou, Q. Tian, "Bridgenet: a continuity-aware probabilistic network for age estimation," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1145-1154, DOI: 10.1109/cvpr.2019.00124.
[4] W. Shen, Y. Guo, Y. Wang, K. Zhao, B. Wang, A. L. Yuille, "Deep regression forests for age estimation," 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2304-2313, DOI: 10.1109/cvpr.2018.00245.
[5] X. Geng, Z.-H. Zhou, K. Smith-Miles, "Automatic age estimation based on facial aging patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29 no. 12, pp. 2234-2240, DOI: 10.1109/TPAMI.2007.70733, 2007.
[6] Y. Fu, G. Guo, T. S. Huang, "Age synthesis and estimation via faces: a survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32 no. 11, pp. 1955-1976, DOI: 10.1109/TPAMI.2010.36, 2010.
[7] J. Lu, V. E. Liong, J. Zhou, "Cost-sensitive local binary feature learning for facial age estimation," IEEE Transactions on Image Processing, vol. 24 no. 12, pp. 5356-5368, DOI: 10.1109/TIP.2015.2481327, 2015.
[8] Z. Yu, D.-Y. Yeung, "Multi-task warped Gaussian process for personalized age estimation," 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2622-2629, DOI: 10.1109/cvpr.2010.5539975.
[9] K.-Y. Chang, C.-S. Chen, Y.-P. Hung, "Ordinal hyperplanes ranker with cost sensitivities for age estimation," CVPR, pp. 585-592, DOI: 10.1109/cvpr.2011.5995437.
[10] B.-B. Gao, C. Xing, C.-W. Xie, J. Wu, X. Geng, "Deep label distribution learning with label ambiguity," IEEE Transactions on Image Processing, vol. 26 no. 6, pp. 2825-2838, 2017.
[11] X. Geng, C. Yin, Z.-H. Zhou, "Facial age estimation by learning from label distributions," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35 no. 10, pp. 2401-2412, DOI: 10.1109/TPAMI.2013.51, 2013.
[12] Z. He, X. Li, Z. Zhang, F. Wu, X. Geng, Y. Zhang, M.-H. Yang, Y. Zhuang, "Data-dependent label distribution learning for age estimation," IEEE Transactions on Image Processing, vol. 26 no. 8, pp. 3846-3858, 2017.
[13] Z. Deng, M. Zhao, H. Liu, Z. Yu, F. Feng, "Learning neighborhood-reasoning label distribution (NRLD) for facial age estimation," 2020 IEEE International Conference on Multimedia and Expo (ICME), DOI: 10.1109/icme46284.2020.9102953.
[14] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, "Attention is all you need," Advances in Neural Information Processing Systems 30, Annual Conference on Neural Information Processing Systems 2017.
[15] K. Simonyan, A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 3rd International Conference on Learning Representations, ICLR 2015.
[16] A. P. Dempster, N. M. Laird, D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39 no. 1, DOI: 10.1111/j.2517-6161.1977.tb01600.x, 1977.
[17] S. Ioffe, C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," International Conference on Machine Learning, pp. 448-456.
[18] B. B. Gao, H. Y. Zhou, J. Wu, X. Geng, "Age estimation using expectation of label distribution learning," IJCAI, pp. 712-718, 2018.
[19] T. Chen, S. Kornblith, M. Norouzi, G. Hinton, "A simple framework for contrastive learning of visual representations," International Conference on Machine Learning, pp. 1597-1607.
[20] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, D. Krishnan, "Supervised contrastive learning," Advances in Neural Information Processing Systems, vol. 33, 2020.
[21] Y. Tian, D. Krishnan, P. Isola, "Contrastive multiview coding," Computer Vision–ECCV 2020: 16th European Conference, pp. 776-794.
[22] A. Lanitis, C. J. Taylor, T. F. Cootes, "Toward automatic simulation of aging effects on face images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24 no. 4, pp. 442-455, DOI: 10.1109/34.993553, 2002.
[23] K. Ricanek, T. Tesafaye, "MORPH: a longitudinal image database of normal adult age-progression," 7th International Conference on Automatic Face and Gesture Recognition (FGR06), pp. 341-345.
[24] R. Rothe, R. Timofte, L. Van Gool, "Deep expectation of real and apparent age from a single image without facial landmarks," International Journal of Computer Vision, vol. 126 no. 2-4, pp. 144-157, DOI: 10.1007/s11263-016-0940-3, 2018.
[25] H. Liu, J. Lu, J. Feng, J. Zhou, "Ordinal deep learning for facial age estimation," IEEE Transactions on Circuits and Systems for Video Technology, vol. 29 no. 2, pp. 486-501, DOI: 10.1109/TCSVT.2017.2782709, 2019.
Copyright © 2021 Qiyuan Li et al.
Abstract
Although label distribution learning has made significant progress in face age estimation, unsupervised learning has not been widely adopted in this field and remains an important and challenging task. In this work, we propose an unsupervised contrastive label distribution learning method (UCLD) for facial age estimation. The method extracts semantically meaningful information from raw faces while preserving the high-order correlation between adjacent ages. Similar to the processing approach of wireless sensor networks, we design the ConAge network with contrastive learning. As a result, our model maximizes the similarity of positive samples produced by data augmentation and simultaneously pushes the clusters of negative samples apart. Compared with state-of-the-art methods, we achieve compelling results on the widely used MORPH benchmark.
1 School of Information Engineering, Ningxia University, Yinchuan 750021, China; School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China
2 School of Information Engineering, Ningxia University, Yinchuan 750021, China; College of Computer Science, Sichuan University, Chengdu 610065, China
3 School of Information Engineering, Ningxia University, Yinchuan 750021, China; Collaborative Innovation Center for Ningxia Big Data and Artificial Intelligence Co-founded by Ningxia Municipality and Ministry of Education, Yinchuan 750021, China