Face Aging with Contextual Generative Adversarial Nets

Si Liu, SKLOIS, IIE, CAS, [email protected]
Yao Sun*, SKLOIS, IIE, CAS, [email protected]
Defa Zhu, SKLOIS, IIE, CAS; School of Cyber Security, UCAS, [email protected]
Renda Bao, SKLOIS, IIE, CAS; School of Cyber Security, UCAS, roger [email protected]
Wei Wang, University of Trento, Italy, [email protected]
Xiangbo Shu, Nanjing University of Science and Technology, [email protected]
Shuicheng Yan, Qihoo 360 AI Institute, Beijing, China; National University of Singapore, [email protected]

ABSTRACT

Face aging, which renders aging faces for an input face, has attracted extensive attention in multimedia research. Recently, several conditional Generative Adversarial Nets (GANs) based methods have achieved great success. They can generate images fitting the real face distributions conditioned on each individual age group. However, these methods fail to capture the transition patterns, e.g., the gradual shape and texture changes between adjacent age groups. In this paper, we propose a novel Contextual Generative Adversarial Nets (C-GANs) to specifically take this into consideration. The C-GANs consists of a conditional transformation network and two discriminative networks. The conditional transformation network imitates the aging procedure with several specially designed residual blocks. The age discriminative network guides the synthesized face to fit the real conditional distribution. The transition pattern discriminative network is novel, aiming to distinguish the real transition patterns from the fake ones. It serves as an extra regularization term for the conditional transformation network, ensuring the generated image pairs fit the corresponding real transition pattern distribution. Experimental results demonstrate that the proposed framework produces appealing results compared with the state of the art and the ground truth. We also observe a performance gain for cross-age face verification.

KEYWORDS

Face Aging, Generative Adversarial Nets, Contextual Modeling

∗corresponding author

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MM ’17, Mountain View, CA, USA
© 2017 ACM. 978-1-4503-4906-2/17/10 . . . $15.00
DOI: 10.1145/3123266.3123431

[Figure 1 graphic: an input face transformed into the seven age groups (0-10, 11-18, 19-29, 30-39, 40-49, 50-59, 60+), with ground-truth and synthetic rows, the conditional transformation network, and the two discriminative networks.]

Figure 1: The proposed C-GANs algorithm for face aging. The input image can be transformed to any specific age group. The synthesized results of C-GANs are natural due to the two discriminative networks: the age discriminative network models the distribution of each individual age group, while the transition pattern discriminative network models the correlations between adjacent groups.

ACM Reference format:
Si Liu, Yao Sun, Defa Zhu, Renda Bao, Wei Wang, Xiangbo Shu, and Shuicheng Yan. 2017. Face Aging with Contextual Generative Adversarial Nets. In Proceedings of MM ’17, Mountain View, CA, USA, October 23–27, 2017, 9 pages.
DOI: 10.1145/3123266.3123431

1 INTRODUCTION

Face aging, also known as age progression [28, 35], is attracting more and more research interest. It has plenty of applications in various domains, including cross-age face recognition [23], finding lost children, and entertainment [40]. In recent years, face aging has witnessed various breakthroughs, and a number of face aging models have been proposed [9]. Face aging, however, is still a very challenging task in practice for various reasons. First, faces may have many different expressions and lighting conditions, which

arXiv:1802.00237v1 [cs.CV] 1 Feb 2018


MM ’17, October 23–27, 2017, Mountain View, CA, USA · Si Liu, Yao Sun, Defa Zhu, Renda Bao, Wei Wang, Xiangbo Shu, and Shuicheng Yan

[Figure 2 graphic: the conditional transformation network maps an image and age label to a transformed face; the age discriminative network judges real/fake faces from (image, age) inputs; the transition pattern discriminative network judges real/fake face pairs from (paired image, age) inputs.]

Figure 2: The structure of the proposed C-GANs.

pose great challenges to modeling the aging patterns. Besides, the training data are usually very limited, and the face images for the same person only cover a narrow range of ages.

Traditional face aging approaches can be roughly split into two classes, i.e., prototyping ones [15, 38] and modeling ones [36, 37]. However, these approaches often require face aging sequences of the same person covering a wide range of ages, which are very costly to collect. Generative Adversarial Networks (GANs) [10] better deal with age progression. Many GAN-based methods [2, 46] can generate plausible and realistic images that are hard to distinguish from real data conditioned on the age. However, none of these methods makes full use of the sequential data. Therefore, they cannot explicitly consider the transition patterns, which are defined as the facial feature correlations between different age groups for one person. As a result, their outputs are usually unable to maintain face identity, or cannot satisfy the cross-age transition rules well.

In this paper, we mainly consider the cross-age transition pattern. Specifically, a transition pattern contains two aspects: identity consistency and appearance change. Identity preservation is critical in face aging based applications, e.g., cross-age face verification. Appearance changes include texture and shape alterations. The transition pattern is age-aware. For example, when one grows from a baby to a teenager, the main appearance difference is that the face becomes larger. When one grows from the age of 50 to 60, the main facial changes lie in texture alterations, such as gradually developed eye bags, senile plaques and wrinkles. Different from traditional GANs, which only model the real data distribution of each individual age, we focus on the higher-order cross-age correlations, which make the face aging results more appealing. To model the above-mentioned transition patterns, we propose a Contextual Generative Adversarial Nets (C-GANs). Figure 1 illustrates C-GANs briefly. For an input face, C-GANs can generate faces for any target age group. To keep the generated images real, C-GANs uses two discriminative networks that model the distribution of each individual age group and the transition patterns of two adjacent groups, respectively.

More specifically, C-GANs consists of three networks, as shown in Figure 2. The conditional transformation network transforms the input face to the desired age; the age discriminative network assists in generating images indistinguishable from real ones; the transition pattern discriminative network regularizes the generated images to satisfy the cross-age aging rules. The proposed C-GANs can be trained end-to-end and is easy to reproduce. To simplify the presentation, we only mention face aging/progression in this paper; C-GANs can also achieve face regression without any further modification.

The contributions of this paper are summarized as follows.

(1) We design an effective and efficient contextual GANs based face aging system whose aging results are significantly better than those of existing face aging methods. The source code of our method will be released for further academic research.

(2) We introduce a novel transition pattern discriminative network in the C-GANs to regularize the synthesized faces to satisfy the cross-age face aging rules.

(3) The conditional face transformation network in C-GANs differs from existing GAN generators in that it is much deeper, with several specially designed skip layers that preserve both high-level semantics and low-level information. This makes the generated images more natural and real.

2 RELATED WORKS

2.1 Face Aging

Traditional face aging models can be roughly divided into physical model approaches and prototype approaches. Physical model approaches explicitly or implicitly model shape and texture parameters for each age group. For example, Suo et al. [36] present a hierarchical And-Or graph based dynamic model for face aging. Other model-based age progression approaches include the active appearance model [16], support vector regression [25] and implicit functions [4]. The prototype approach [15] aims at constructing a


Face Aging with Contextual Generative Adversarial Nets MM ’17, October 23–27, 2017, Mountain View, CA, USA

[Figure 3 graphic: the generator stacks convolutional layers with skip connections; layer specifications include k3n3s1, k3n64s1, k1n64s1, k3n64s2 and k1n3s1 blocks with leakyReLU and batch normalization, a concatenation after each block, and a tanh output.]

Figure 3: Architecture of the conditional transformation network, with the corresponding kernel size (k), number of feature maps (n) and stride (s) indicated for each convolutional layer.

relighted average face as the prototype for each age group, and transferring the texture difference between the prototypes to the test image. However, the limitation of this model is that it is based on general rules and totally discards personalized information. Recently, Shu et al. propose a coupled dictionary learning (CDL) model [35]. It encodes the aging patterns with dictionary bases, and every two neighboring dictionaries are learned jointly. However, this method still has ghosting artifacts, as the reconstruction residual does not evolve over time. Wang et al. [40] introduce a recurrent face aging (RFA) framework based on a recurrent neural network. They employ a two-layer gated recurrent unit as the basic recurrent module, whose bottom layer encodes a young face to a latent representation and whose top layer decodes the representation to a corresponding older face. Generally, these techniques require sufficient age sequences as training data, which limits their practicality.

2.2 Generative Adversarial Networks

Recently, GANs [10] have achieved great success in many image synthesis applications, including super resolution [17], image-to-image translation by pix2pix [13] and CycleGAN [48], inpainting [24], and visual manipulation of images [47].

Antipov et al. [2] propose a GAN-based method for automatic face aging. They particularly emphasize preserving the original person's identity by introducing an "Identity-Preserving" optimization of the GAN's latent vectors. Zhang et al. [46] propose a conditional adversarial autoencoder (CAAE) that learns a face manifold, traversing which smooth age progression and regression can be realized simultaneously. Two adversarial networks are imposed on the encoder and generator, respectively, forcing them to generate more photo-realistic faces. Li et al. [18] present a deep convolutional network model for Identity-Aware Transfer (DIAT) of facial attributes. However, these GAN-based methods independently model the distribution of each age group, without capturing the cross-age transition patterns.

3 APPROACH

The architecture of the proposed C-GANs is shown in Figure 2. The input image x is first aligned and parsed (Section 3.1). Then x is paired with an arbitrary age label y and fed into the conditional transformation network G (Section 3.2). The synthesized face G(x, y) is judged by the age discriminative network D_a to be real/fake (Section 3.3). Moreover, the age pair composed of a real image and its fake counterpart is fed into the transition pattern discriminative network D_t, which predicts whether it is from the real image pair distribution (Section 3.4). Finally, the objective function and the training strategy are introduced (Section 3.5).

3.1 Image Preprocessing

The input image x is aligned via face alignment techniques that locate 68 landmarks on the face; the landmarks are used to align the faces. Then we use DeepLab v2 [6] to parse the human face into facial and non-facial regions. The non-facial region, containing the background, hair and clothes, is masked with gray color to facilitate the GAN training.
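As a concrete illustration of this masking step, the sketch below (ours, not the authors' released pipeline; the function and variable names are hypothetical) grays out every pixel the parser labeled non-facial:

```python
import numpy as np

def mask_non_facial(image: np.ndarray, face_mask: np.ndarray,
                    gray: int = 128) -> np.ndarray:
    """Replace non-facial pixels (background, hair, clothes) with gray.

    image:     H x W x 3 uint8 RGB face image (already aligned).
    face_mask: H x W boolean array, True where the parser labeled
               the pixel as facial region.
    """
    out = image.copy()
    out[~face_mask] = gray  # gray value broadcasts over all 3 channels
    return out

# Toy usage: a 4 x 4 "image" whose left half is facial region.
img = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[:, :2] = True
masked = mask_non_facial(img, mask)
assert (masked[:, :2] == 200).all() and (masked[:, 2:] == 128).all()
```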

3.2 Conditional Transformation Network

Given the input face x and the desired age y, the conditional transformation network generates the synthesized face x_y = G(x, y). The architecture of the generator is shown in Figure 3. The input and output face images are 128 × 128 RGB images. The output is confined to the range [−1, 1] through the hyperbolic tangent function; normalizing the input may make the training process converge faster. The conditions of C-GANs are 7-dim one-hot age vectors, reshaped as 7-channel tensors with the same spatial dimensions as the input face. The input faces and the labels are concatenated and fed into

[Figure 4 graphic: the age discriminative network convolves the image and label separately, concatenates the features, and stacks stride-2 convolutions (k4n64s2, k4n128s2, k4n256s2, k4n512s2, k4n1024s2, k4n2s1), ending in a sigmoid.]

Figure 4: The structure of the age discriminative network.

the network for further processing. To make the concatenation fair, the elements of the label are also confined to [−1, 1], where −1 corresponds to 0.
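The label conditioning described above can be sketched in PyTorch as follows (our illustration, not the released code; `condition_input` and its signature are assumptions):

```python
import torch

def condition_input(face: torch.Tensor, age_group: int,
                    num_groups: int = 7) -> torch.Tensor:
    """Concatenate a face batch with a spatially tiled age label.

    face: N x 3 x 128 x 128 tensor scaled to [-1, 1].
    The one-hot age vector is rescaled so that 0 maps to -1 and
    1 maps to +1, tiled to N x 7 x 128 x 128, and concatenated
    with the face channel-wise.
    """
    n, _, h, w = face.shape
    onehot = torch.full((num_groups,), -1.0)  # -1 encodes "0"
    onehot[age_group] = 1.0
    label = onehot.view(1, num_groups, 1, 1).expand(n, num_groups, h, w)
    return torch.cat([face, label], dim=1)    # N x 10 x 128 x 128

x = torch.randn(2, 3, 128, 128).clamp(-1, 1)
inp = condition_input(x, age_group=3)
assert inp.shape == (2, 10, 128, 128)
```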

The conditional transformation network mainly contains several residual blocks [11] with several skip layers. Following DCGAN [27], convolutions of stride 2 are employed instead of pooling. The first three residual blocks downsize the feature maps to half the resolution of the input image. The "deconv" layer upsamples the feature maps to the original resolution; such layers have been used previously [34, 44]. Note that to perform the refinement, we adopt several skip layers. In this way we preserve both the high-level information passed from coarser feature maps and the fine local information provided in lower-layer feature maps.
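A minimal PyTorch sketch of such a generator is given below. The exact block counts, channel widths and skip layout of Figure 3 differ; this only illustrates the combination of stride-2 downsampling, residual blocks, skip concatenation and a tanh output that the paragraph describes:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block with two 3x3 convolutions (a sketch; the paper's
    exact kernel/channel layout follows Figure 3)."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))
    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Encoder-decoder with a skip connection, stride-2 convolutions
    instead of pooling, and a tanh output in [-1, 1]."""
    def __init__(self, in_ch: int = 10, ch: int = 64):
        super().__init__()
        self.head = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)          # 128 -> 64
        self.blocks = nn.Sequential(*[ResBlock(ch) for _ in range(3)])
        self.up = nn.ConvTranspose2d(ch, ch, 4, stride=2, padding=1)   # 64 -> 128
        self.tail = nn.Conv2d(ch * 2, 3, 3, padding=1)
    def forward(self, x):
        h0 = self.head(x)
        h = self.up(self.blocks(self.down(h0)))
        h = torch.cat([h, h0], dim=1)  # skip: fuse low-level detail back in
        return torch.tanh(self.tail(h))

g = Generator()
y = g(torch.randn(1, 10, 128, 128))
assert y.shape == (1, 3, 128, 128)
```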

3.3 Age Discriminative Network

The structure of the age discriminative network D_a(x, y) is shown in Figure 4, similar to the conditional GAN [22]. θ_{D_a} denotes the network parameters. More formally, the training can be expressed as an optimization of the function E(θ_G, θ_{D_a}), where θ_G and θ_{D_a} are the parameters of G and D_a, respectively:

\[
\min_G \max_{D_a} E(\theta_G, \theta_{D_a})
= \mathbb{E}_{x_y, y \sim p_{\text{data}}(x_y, y)}\left[\log D_a(x_y, y)\right]
+ \mathbb{E}_{x \sim p_x,\, y \sim p_y}\left[\log\left(1 - D_a(G(x, y), y)\right)\right].
\tag{1}
\]

Note that the age label is resized to a tensor in the same way as in the conditional transformation network. The image and the label each go through one convolution layer individually and are concatenated before being fed to D_a, making it discriminative on both age and face. During training, the positive samples are the real faces with their corresponding ages, {x_y, y}, while the negative samples pair a generated face with the label used to produce it, {G(x, ỹ), ỹ}. Note that we specifically sample the label ỹ at random for the fake images to enhance the generalization ability of the C-GANs model.

3.4 Transition Pattern Discriminative Network

For better face aging results, we specifically model the cross-age transition patterns, defined as the facial feature correlations between different age groups. In this paper, we only consider adjacent age groups for simplicity: a long-range transition pattern can be represented as the composition of a series of transition patterns between adjacent age groups. The transition pattern is age-aware. As shown in Figure 5, when a person grows from the age of 10 to 20, the facial shape alters as the skull grows. However, when one grows from the age of 50 to 60, the most obvious change is the gradually developed wrinkles.

[Figure 5 graphic: example transitions 10 -> 20 and 50 -> 60.]

Figure 5: Illustration of the transition patterns of different age ranges.

Despite the large appearance changes between adjacent age groups, the aging face should keep the identity of the input face. In other words, the learned transition pattern should also have the property of identity preservation.

To this end, C-GANs contains a transition pattern discriminative network, as shown in Figure 2. The network D_t(x_y, x_{y+1}, y, y+1) evaluates the transition pattern between the image x_y aged y and the image x_{y+1} at the next age group y+1. For notational simplicity, it is denoted as D_t(x_y, x_{y+1}, y). The network distinguishes the real joint distribution x_y, x_{y+1}, y ∼ p_data(x_y, x_{y+1}, y) from the fake one. The optimization objective is:

\[
\begin{aligned}
\min_G \max_{D_t} E(\theta_G, \theta_{D_t})
&= \mathbb{E}_{x_y, x_{y+1}, y \sim p_{\text{data}}(x_y, x_{y+1}, y)}\left[\log D_t(x_y, x_{y+1}, y)\right] \\
&\quad + \tfrac{1}{2}\,\mathbb{E}_{x_y, y \sim p_{\text{data}}(x_y, y)}\left[\log\left(1 - D_t(x_y, G(x_y, y+1), y)\right)\right] \\
&\quad + \tfrac{1}{2}\,\mathbb{E}_{x_y, y \sim p_{\text{data}}(x_y, y)}\left[\log\left(1 - D_t(G(x_y, y-1), x_y, y-1)\right)\right].
\end{aligned}
\tag{2}
\]

The second term of Equation 2 guides the transformation network to generate the fake pair {x_y, G(x_y, y+1)} so that it obeys the real transition pattern distribution. Similarly, the third term is imposed on the transformation network to generate convincing fake pairs {G(x_y, y−1), x_y}.
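The D_t side of Equation 2 can likewise be written as a weighted binary cross-entropy over real and fake pairs. The following is our sketch, not the released implementation; names are hypothetical:

```python
import torch
import torch.nn.functional as F

def transition_discriminator_loss(dt_real: torch.Tensor,
                                  dt_fake_next: torch.Tensor,
                                  dt_fake_prev: torch.Tensor) -> torch.Tensor:
    """Equation 2 for D_t, following the paper's weighting: one real-pair
    term plus two fake-pair terms at weight 1/2 each.

    dt_real:      D_t(x_y, x_{y+1}, y)       logits on real adjacent pairs
    dt_fake_next: D_t(x_y, G(x_y, y+1), y)   real face paired with aged fake
    dt_fake_prev: D_t(G(x_y, y-1), x_y, y-1) rejuvenated fake paired with real
    """
    real = F.binary_cross_entropy_with_logits(
        dt_real, torch.ones_like(dt_real))
    fake_next = F.binary_cross_entropy_with_logits(
        dt_fake_next, torch.zeros_like(dt_fake_next))
    fake_prev = F.binary_cross_entropy_with_logits(
        dt_fake_prev, torch.zeros_like(dt_fake_prev))
    return real + 0.5 * fake_next + 0.5 * fake_prev

loss = transition_discriminator_loss(
    torch.zeros(4, 1), torch.zeros(4, 1), torch.zeros(4, 1))
assert loss.item() > 0
```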

3.5 Objective Function & Training Strategy

Comprehensively considering the losses of the conditional transformation network G, the age discriminative network D_a, and the transition pattern discriminative network D_t, the overall objective function is:

\[
\begin{aligned}
\min_G \max_{D_a} \max_{D_t} E(\theta_G, \theta_{D_a}, \theta_{D_t})
&= E_a + E_t + \lambda\, TV \\
&= \mathbb{E}_{x_y, y \sim p_{\text{data}}(x_y, y)}\left[\log D_a(x_y, y)\right] \\
&\quad + \mathbb{E}_{x \sim p_x,\, y \sim p_y}\left[\log\left(1 - D_a(G(x, y), y)\right)\right] \\
&\quad + \mathbb{E}_{x_y, x_{y+1}, y \sim p_{\text{data}}(x_y, x_{y+1}, y)}\left[\log D_t(x_y, x_{y+1}, y)\right] \\
&\quad + \tfrac{1}{2}\,\mathbb{E}_{x_y, y \sim p_{\text{data}}(x_y, y)}\left[\log\left(1 - D_t(x_y, G(x_y, y+1), y)\right)\right] \\
&\quad + \tfrac{1}{2}\,\mathbb{E}_{x_y, y \sim p_{\text{data}}(x_y, y)}\left[\log\left(1 - D_t(G(x_y, y-1), x_y, y-1)\right)\right] \\
&\quad + \lambda\big(TV(G(x_y, y-1)) + TV(G(x_y, y+1)) + TV(G(x, y))\big).
\end{aligned}
\tag{3}
\]

where TV(·) denotes the total variation, which is effective in removing ghosting artifacts. The coefficient λ balances smoothness against high resolution. During training, D_a, D_t and G are optimized alternately. More specifically, in one iteration, D_a and G

[Figure 6 graphic: for each original image, generated faces for the seven age groups 0-10, 11-18, 19-29, 30-39, 40-49, 50-59 and 60+.]

Figure 6: The input face and the generated faces for 7 age groups.

are updated. In the next iteration, the parameters of D_t and G are refined.
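The TV(·) term of Equation 3 can be implemented as the anisotropic total variation, i.e., the sum of absolute differences between neighboring pixels; a minimal sketch:

```python
import torch

def total_variation(img: torch.Tensor) -> torch.Tensor:
    """Anisotropic total variation of an N x C x H x W image batch,
    used in Equation 3 to suppress ghosting artifacts."""
    dh = (img[..., 1:, :] - img[..., :-1, :]).abs().sum()  # vertical diffs
    dv = (img[..., :, 1:] - img[..., :, :-1]).abs().sum()  # horizontal diffs
    return dh + dv

flat = torch.ones(1, 3, 8, 8)
assert total_variation(flat).item() == 0.0  # a constant image has zero TV
```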

Testing Phase. The test image x_t and the desired label y are fed into the conditional transformation network described in Section 3.2, and the output face x_{t,y} is obtained.

In the future, we would like to explore face aging in videos by making use of state-of-the-art video processing [20, 43], face parsing [41], object tracking [21, 45] and video deblurring [29, 30] techniques.

4 EXPERIMENTS

4.1 Dataset Collection

Our C-GANs requires both sequential and non-sequential data. For sequential data, we select 575, 649, 1,962, 695 and 166 images from the CACD [5], FGNET [1], LFW [12], Morph [31] and SUP [40] datasets, respectively. The whole dataset contains 4,047 images with equal female/male and age distributions. We generate 3,992 positive pairs from the sequence data for training. Note that we only use 575 images from CACD for model training, and reserve 4,689 images for the face verification experiment in Section 4.6.

For non-sequential data, we use the public IMDB-Wiki dataset [32]. We manually delete images with heavy occlusion or low resolution. As the dataset contains very few senior people, we also

[Figure 7 graphic: qualitative comparisons, each showing an input age -> target age transformation (e.g. 10+ -> 20+, 20+ -> 60+, 40+ -> 60+) for FT demo, CDL, RFA/CDM/IAAP, CAAE and our method.]

Figure 7: Comparison with the State-of-the-arts.

selected some images from the webface dataset [19], adiencedb [8] and CACD [5]. Like [2], we divide the ages into seven categories: 0−10, 11−18, 19−29, 30−39, 40−49, 50−59 and 60+ years. The non-sequential data consists of 15,030 face images with a uniform distribution over both gender and age.

4.2 Implementation Details

The C-GANs network is trained via Torch, based on the publicly available DCGAN code [27]. Similarly, the learning rate is set to 0.0002 and beta1 to 0.5, with a mini-batch size of 28. Both faces and ages are fed to the network. The generator and discriminator networks are optimized alternately by the Adam method. Generally, the network needs 100 epochs to generate favorable images, which takes about 12 hours on an NVIDIA Titan X. Testing one image takes about 0.028 seconds.
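These hyper-parameters translate into the following optimizer setup (a sketch in PyTorch rather than the paper's Torch code; the placeholder modules merely stand in for G, D_a and D_t):

```python
import torch

# Hyper-parameters reported in the paper (DCGAN-style training):
lr, beta1, batch_size = 2e-4, 0.5, 28

# Placeholder modules standing in for the three networks.
G = torch.nn.Conv2d(10, 3, 3, padding=1)
D_a = torch.nn.Conv2d(10, 1, 3, padding=1)
D_t = torch.nn.Conv2d(20, 1, 3, padding=1)

opt_g = torch.optim.Adam(G.parameters(), lr=lr, betas=(beta1, 0.999))
opt_da = torch.optim.Adam(D_a.parameters(), lr=lr, betas=(beta1, 0.999))
opt_dt = torch.optim.Adam(D_t.parameters(), lr=lr, betas=(beta1, 0.999))

# Alternating schedule: one iteration updates (D_a, G),
# the next updates (D_t, G).
schedule = ['(D_a, G)' if it % 2 == 0 else '(D_t, G)' for it in range(4)]
```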

4.3 Face Aging Results

We show the face aging results in Figure 6, from which we can draw the following observations. First, the generated faces are quite real and natural. Second, the generated faces change gradually when getting older; for example, in the output faces of the last five age groups (in red boxes) of the first and second rows, beards appear and turn white. Third, C-GANs can synthesize images across large age gaps. For example, the input face of the third row is quite young, but the synthesized face in the 60+ group (in yellow box) is still quite real. For another example, the face in the fourth row is a senior lady; we can produce a very child-looking young face (in yellow box) for the 0−10 age group. Fourth, C-GANs can produce very detailed texture changes. For example, in the fifth, sixth and seventh rows, the synthesized faces in the red boxes contain convincing crow's feet, canthus wrinkles and eye bags. Fifth, the shapes of the face and facial features also change during aging/progression. For example, in the last two rows, when the seniors are transformed to babies, their faces become smaller, and their eyes and ocular distances become larger.

4.4 Comparison with the State-of-the-arts

Baseline methods: Some prior works on age progression have posted their best face aging results with inputs of different ages, including [36, 39]. We mainly compare with 9 baselines, including FT demo: an online fun demo, Face Transformer; IAAP: state-of-the-art illumination-aware age progression [15]; RFA: recurrent face aging [40]; CDL: coupled dictionary learning [35]; acGAN: face

[Figure 8 graphic: triplets of an input at Age Group 1, our transformed result, and the ground truth at Age Group 2, e.g. 40+ -> 60+, 30+ -> 50+, 60+ -> 30+.]

Figure 8: Comparison between our results and ground truth.

aging with conditional generative adversarial networks [2]; CAAE: conditional adversarial autoencoder [46]; CDM: Compositional and Dynamic Model [36]; and [33, 39]. There are 246 aging results with 72 inputs in total. For each input, our face aging is implemented to generate aging results with the same ages (ranges) as the posted results.

Qualitative Evaluation: Figure 7 plots the results of the comparison. Compared with other methods, the aged faces generated by our method show more realistic and noticeable transformations in appearance. For instance, the shape of the facial features changes obviously when children grow up (row 1, col 2; row 3, col 2). The aged face in row 2, col 1 gets more wrinkles, and her eyes become smaller during the process. We can also observe that eye bags and more wrinkles appear in row 5, col 1. Meanwhile, our method preserves identity very well, which can be observed in most images.

Quantitative Evaluation: To quantitatively evaluate the performance of the proposed method, we designed a user study with 43 volunteers. Each volunteer is shown three generated images at a time. The candidate images are generated from given age groups and are supposed to have the target ages. Among these images, one is generated by C-GANs and the other two are generated by FT demo, CAAE or other prior methods. Every volunteer is asked to choose one of three options. We add 1 point if our result is chosen as the best one, 0.5 when "all are likely" is chosen, and 0 if a result from prior work is chosen as the best. The score is normalized by the number of responses per cell and shown in Figure 9. The x-axis is the input age group and the y-axis is the target age group.
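The scoring rule above amounts to the following per-cell computation (our sketch; the vote labels are hypothetical names for the three options):

```python
def cell_score(votes):
    """Normalized preference score for one (input age, target age) cell.

    votes: list of responses, each 'ours' (1 pt), 'tie' (0.5 pt, i.e.
    "all are likely"), or 'prior' (0 pt); the point total is normalized
    by the number of responses in the cell.
    """
    points = {'ours': 1.0, 'tie': 0.5, 'prior': 0.0}
    return sum(points[v] for v in votes) / len(votes)

assert cell_score(['ours', 'ours', 'tie', 'prior']) == 0.625
```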

From Figure 9 we can see that the proposed method outperformsprior work almost all the time. Particularly, our approach performs

[Figure 9 graphic: a matrix of normalized user-study scores (roughly 0.5–1.0 per cell), with the input age group (1-10 through 60+) on the x-axis and the target age group on the y-axis.]

Figure 9: Comprehensive comparison to prior works.

very well when the input and target age groups are contiguous. We believe this mainly owes to the newly proposed transition pattern discriminative network. We also notice that our approach is somewhat weaker when generating 60+ faces from children. We attribute this to the significant appearance changes from children to the elderly. We will try to improve this in future work.

[Figure 10 graphic: age progression of two movie stars across the groups 10+, 20+, 40+, 50+ and 60+, with real images of the same ages for comparison.]

Figure 10: Movie stars. In each group, a single image on the far left is age progressed to different ages in the first row, and the real images of that age are shown in the second row.

4.5 Comparison with ground truth

Here we qualitatively compare the generated faces with the ground truth. The qualitative comparisons for both female and male subjects are shown in Figure 8, which shows appealing similarity. In each triplet, the first and third images are the ground truths from Age Group 1 and Age Group 2, while the second image is our aging result.

In this experiment, we first crop the face from the images of Age Group 1. Then we process these images with C-GANs to obtain aging faces with the same ages as Age Group 2. Finally, we run graph cuts to find an optimal seam, followed by Poisson blending to blend the aged face into the real head photo [7].
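The Poisson blending step can be illustrated with a minimal single-channel solver. This is only a sketch of the general technique, not the authors' pipeline (which also uses graph cuts for seam selection); the function name, Jacobi iteration scheme, and iteration count are our own choices.

```python
import numpy as np

def poisson_blend(source, target, mask, iters=500):
    """Blend `source` into `target` inside `mask` by solving the
    discrete Poisson equation with Jacobi iterations.

    source, target: 2-D float arrays of the same shape (one channel).
    mask: boolean array; True marks pixels to be replaced.
    The guidance field is the Laplacian of the source, so gradients
    inside the mask follow the source while boundary values come from
    the target, which hides the seam.
    """
    out = target.astype(np.float64).copy()
    src = source.astype(np.float64)
    # Discrete Laplacian of the source acts as the guidance field.
    lap = (-4.0 * src
           + np.roll(src, 1, axis=0) + np.roll(src, -1, axis=0)
           + np.roll(src, 1, axis=1) + np.roll(src, -1, axis=1))
    # Keep the image border fixed so every solved pixel has 4 neighbors.
    inner = mask.copy()
    inner[0, :] = inner[-1, :] = inner[:, 0] = inner[:, -1] = False
    for _ in range(iters):
        neigh = (np.roll(out, 1, axis=0) + np.roll(out, -1, axis=0)
                 + np.roll(out, 1, axis=1) + np.roll(out, -1, axis=1))
        out[inner] = (neigh[inner] - lap[inner]) / 4.0
    return out
```

For color images one would run the solver per channel; production code would typically use a faster solver (e.g., conjugate gradients) or a library routine such as OpenCV's seamless cloning.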

In Figure 8, we can observe that the generated aging faces show almost no difference from the real ones. This indicates that C-GANs can correctly synthesize the real age progression.

To better demonstrate the age progression capability of C-GANs, we collect images of two movie stars, Brad Pitt and Shirley Temple. For Temple, we find several real images throughout her life. For Pitt, we cut out frames from his movies and take them as his older appearance. The results are shown in Figure 10. The age groups from 20 to 50 are omitted due to the limitation of data. Note that our method can successfully simulate the transformation characteristics of the stars at all ages, especially in childhood and old age.

4.6 Cross Age Face Verification

The proposed age progression method can also improve the performance of cross-age face verification significantly. We collected 2,000 intra-person pairs and 2,000 inter-person pairs with cross ages on the FGNET database, using 2,044/2,645 images of males/females respectively. In both sets of pairs, the numbers of male and female pairs are equal, and the age spans of the pairs are all more than 20 years. The set of these 4,000 pairs is called "Original Pairs". In each original pair, we progress the younger face to an aging face with the same age as the older face by C-GANs, and assume the aging face indeed has that age. We replace the younger face in the original pairs with the newly generated aging face, and thus construct 4,000 new pairs, called "Our Synthetic Pairs". To evaluate the performance of our C-GANs, we also generate "CAAE Synthetic Pairs" using the state-of-the-art age progression method [46]. The state-of-the-art center-loss-based face verification [42] is used for testing on the above three sets of pairs.

Figure 11: Face verification results.
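The pair-construction protocol above can be sketched as follows. The record format, sampling strategy, and function name are illustrative assumptions, not the authors' code.

```python
import itertools
import random

def build_cross_age_pairs(images, n_pairs, min_age_gap=20, seed=0):
    """Sample intra-person and inter-person pairs whose age gap
    exceeds `min_age_gap` years.

    `images` is a list of (person_id, age, path) records, e.g. parsed
    from FGNET-style filenames. Returns (intra_pairs, inter_pairs),
    each truncated to at most `n_pairs` randomly ordered pairs.
    """
    rng = random.Random(seed)
    intra = [(a, b) for a, b in itertools.combinations(images, 2)
             if a[0] == b[0] and abs(a[1] - b[1]) > min_age_gap]
    inter = [(a, b) for a, b in itertools.combinations(images, 2)
             if a[0] != b[0] and abs(a[1] - b[1]) > min_age_gap]
    rng.shuffle(intra)
    rng.shuffle(inter)
    return intra[:n_pairs], inter[:n_pairs]
```

A real protocol would additionally balance male and female pairs, as the paper states; that step is omitted here for brevity.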

The FAR-FRR (false acceptance rate vs. false rejection rate) curves are illustrated in Figure 11. The EERs (equal error rates) on C-GANs Synthetic Pairs, CAAE Synthetic Pairs, and Original Pairs are 8.72%, 11.05%, and 17.41% respectively. We can see that face verification on C-GANs Synthetic Pairs achieves a better EER than on the other pairs. This implies the aging faces generated by C-GANs can effectively alleviate the face verification errors caused by age gaps.
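For reference, the EER can be computed from verification scores with a simple threshold sweep. This is a generic sketch of the metric, not the evaluation code used in the paper; the function name and score convention (higher score = more likely same person) are our own.

```python
import numpy as np

def compute_eer(genuine, impostor):
    """Equal Error Rate: sweep thresholds over all observed scores and
    return the operating point where FAR is closest to FRR.

    genuine: scores of same-person pairs; impostor: scores of
    different-person pairs (higher score = accepted as a match).
    """
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # impostor pairs wrongly accepted
        frr = np.mean(genuine < t)    # genuine pairs wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```

On perfectly separable scores the sweep finds a threshold with FAR = FRR = 0; overlapping score distributions yield the nonzero EERs reported above.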

5 CONCLUSION AND FUTURE WORKS

In this paper, we propose Contextual Generative Adversarial Nets (C-GANs) to tackle the face aging problem. Different from existing generative adversarial nets based methods, we explicitly model the transition patterns between adjacent age groups during the training procedure. From the baby to teenager period, the transition pattern manifests as the face becoming bigger, while from the age of 30 to the age of 50, the transition patterns include gradually developing wrinkles. To this end, the C-GANs consists of two discriminative networks, i.e., an age discriminative network and a transition pattern discriminative network. They collaboratively contribute to the appealing results.

Currently, our model is based on DCGAN [27]. In the future, we plan to employ other GANs to improve the performance, such as Wasserstein GAN [3], LS-GAN [26], and EB-GAN [14].



6 ACKNOWLEDGMENT

This work was supported by the National Natural Science Foundation of China (No. 61572493, Grant U1536203) and the Natural Science Foundation of Jiangsu Province (Grant No. BK20170856).

REFERENCES

[1] 2000. Face and Gesture Recognition. In Network: FG-NET Aging Database.
[2] Grigory Antipov, Moez Baccouche, and Jean-Luc Dugelay. 2017. Face Aging With Conditional Generative Adversarial Networks. arXiv:1702.01983 (2017).
[3] Martin Arjovsky, Soumith Chintala, and Léon Bottou. 2017. Wasserstein GAN. (2017).
[4] Alexandre Cruz Berg, Francisco José Perales López, and Manuel González. 2006. A Facial Aging Simulation Method Using Flaccidity Deformation Criteria. IV (2006).
[5] Bor-Chun Chen, Chu-Song Chen, and Winston H. Hsu. 2014. Cross-Age Reference Coding for Age-Invariant Face Recognition and Retrieval.
[6] Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille. 2016. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv:1606.00915 (2016).
[7] Dmitri Bitouk, Neeraj Kumar, Samreen Dhillon, Peter Belhumeur, and Shree K. Nayar. 2008. Face Swapping: Automatically Replacing Faces in Photographs. ACM Trans. on Graphics (also Proc. of ACM SIGGRAPH).
[8] E. Eidinger, R. Enbar, and T. Hassner. 2014. Age and Gender Estimation of Unfiltered Faces. TIFS (2014).
[9] Y. Fu, G. Guo, and T. S. Huang. 2010. Age Synthesis and Estimation via Faces: A Survey. TPAMI (2010).
[10] Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Networks. In NIPS.
[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. In CVPR.
[12] Gary B. Huang, Marwan Mattar, Tamara Berg, and Eric Learned-Miller. 2008. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. (2008).
[13] Phillip Isola, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. 2016. Image-to-Image Translation with Conditional Adversarial Networks. arXiv:1611.07004 (2016).
[14] Junbo Zhao, Michael Mathieu, and Yann LeCun. 2017. Energy-based Generative Adversarial Network. arXiv:1609.03126 (2017).
[15] Ira Kemelmacher-Shlizerman, Supasorn Suwajanakorn, and Steven M. Seitz. 2014. Illumination-Aware Age Progression. In CVPR.
[16] A. Lanitis, C. J. Taylor, and T. F. Cootes. 2002. Toward Automatic Simulation of Aging Effects on Face Images. TPAMI (2002).
[17] Christian Ledig, Lucas Theis, Ferenc Huszár, Jose Caballero, Andrew Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes Totz, and Zehan Wang. 2016. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. arXiv:1609.04802 (2016).
[18] Mu Li, Wangmeng Zuo, and David Zhang. 2016. Deep Identity-aware Transfer of Facial Attributes. arXiv:1610.05586 (2016).
[19] L. Liu, C. Xiong, H. Zhang, and Z. Niu. 2016. Deep Aging Face Verification With Large Gaps. TMM (2016).
[20] Si Liu, Changhu Wang, Ruihe Qian, Han Yu, Renda Bao, and Yao Sun. 2017. Surveillance Video Parsing with Single Frame Supervision. In CVPR.
[21] Si Liu, Tianzhu Zhang, Xiaochun Cao, and Changsheng Xu. 2016. Structural Correlation Filter for Robust Visual Tracking. In CVPR.
[22] Mehdi Mirza and Simon Osindero. 2014. Conditional Generative Adversarial Nets. Computer Science (2014).
[23] U. Park, Y. Tong, and A. K. Jain. 2010. Age-Invariant Face Recognition. TPAMI (2010).
[24] Deepak Pathak, Philipp Krähenbühl, Jeff Donahue, Trevor Darrell, and Alexei A. Efros. 2016. Context Encoders: Feature Learning by Inpainting. In CVPR.
[25] E. Patterson, A. Sethuram, M. Albert, and K. Ricanek. 2007. Aspects of Age Variation in Facial Morphology Affecting Biometrics. In ICB.
[26] Guo-Jun Qi. 2017. Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities. arXiv:1701.06264 (2017).
[27] Alec Radford, Luke Metz, and Soumith Chintala. 2015. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Computer Science (2015).
[28] Lejian Ren, Si Liu, Yao Sun, Jian Dong, Luoqi Liu, and Shuicheng Yan. 2017. Time Traveler: A Real-Time Face Aging System. In ACM MM.
[29] Wenqi Ren, Xiaochun Cao, Jinshan Pan, Xiaojie Guo, Wangmeng Zuo, and Ming-Hsuan Yang. 2016. Image Deblurring via Enhanced Low-Rank Prior. IEEE Transactions on Image Processing 25, 7 (2016), 3426–3437.
[30] Wenqi Ren, Si Liu, Hua Zhang, Jinshan Pan, Xiaochun Cao, and Ming-Hsuan Yang. 2016. Single Image Dehazing via Multi-Scale Convolutional Neural Networks. In ECCV. Springer, 154–169.
[31] Karl Ricanek and Tamirat Tesafaye. 2006. MORPH: A Longitudinal Image Database of Normal Adult Age-Progression. In FG.
[32] Rasmus Rothe, Radu Timofte, and Luc Van Gool. 2015. DEX: Deep EXpectation of Apparent Age from a Single Image. In CVPR Workshop.
[33] Amrutha Sethuram, Karl Ricanek, and Eric Patterson. 2010. A Hierarchical Approach to Facial Aging. (2010).
[34] Evan Shelhamer, Jonathon Long, and Trevor Darrell. 2017. Fully Convolutional Networks for Semantic Segmentation. TPAMI (2017).
[35] Xiangbo Shu, Jinhui Tang, Hanjiang Lai, Luoqi Liu, and Shuicheng Yan. 2015. Personalized Age Progression with Aging Dictionary. In ICCV.
[36] Jinli Suo, Song-Chun Zhu, Shiguang Shan, and Xilin Chen. 2010. A Compositional and Dynamic Model for Face Aging. TPAMI (2010).
[37] Yusuke Tazoe, Hiroaki Gohara, Akinobu Maejima, and Shigeo Morishima. 2012. Facial Aging Simulator Considering Geometry and Patch-Tiled Texture. In ACM SIGGRAPH.
[38] Bernard Tiddeman, Michael Burt, and David Perrett. 2001. Prototyping and Transforming Facial Textures for Perception Research. CGA (2001).
[39] Junyan Wang, Yan Shang, Guangda Su, and Xinggang Lin. 2006. Age Simulation for Face Recognition. In ICPR.
[40] Wei Wang, Zhen Cui, Yan Yan, Jiashi Feng, Shuicheng Yan, Xiangbo Shu, and Nicu Sebe. 2016. Recurrent Face Aging. In CVPR.
[41] Zhen Wei, Yao Sun, Jinqiao Wang, Hanjiang Lai, and Si Liu. 2017. Learning Adaptive Receptive Fields for Deep Image Parsing Network. In CVPR.
[42] Yandong Wen, Kaipeng Zhang, Zhifeng Li, and Yu Qiao. 2016. A Discriminative Feature Learning Approach for Deep Face Recognition. In ECCV.
[43] Han Yu, Guanghui Ren, Ruihe Qian, Yao Sun, Changhu Wang, Hanqing Lu, and Si Liu. 2017. RSVP: A Real-Time Surveillance Video Parsing System with Single Frame Supervision. In ACM MM.
[44] Matthew D. Zeiler and Rob Fergus. 2014. Visualizing and Understanding Convolutional Networks. In ECCV.
[45] Tianzhu Zhang, Si Liu, Changsheng Xu, Shuicheng Yan, Bernard Ghanem, Narendra Ahuja, and Ming-Hsuan Yang. 2015. Structural Sparse Tracking. In CVPR.
[46] Zhifei Zhang, Yang Song, and Hairong Qi. 2017. Age Progression/Regression by Conditional Adversarial Autoencoder. arXiv:1702.08423 (2017).
[47] Jun-Yan Zhu, Philipp Krähenbühl, Eli Shechtman, and Alexei A. Efros. 2016. Generative Visual Manipulation on the Natural Image Manifold. In ECCV.
[48] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. 2017. Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks. arXiv:1703.10593 (2017).