356 Part II • Predictive Analytics/Machine Learning
Face recognition, although seemingly similar to image recognition, is a much more complicated undertaking. The goal of face recognition is to identify the individual as opposed to the class it belongs to (human), and this identification task needs to be performed in a nonstatic (i.e., moving person) 3D environment. Face recognition has been an active research field in AI for many decades with limited success until recently. Thanks to the new generation of algorithms (i.e., deep learning) coupled with large data sets and computational power, face recognition technology is starting to make a significant impact on real-world applications. From security to marketing, face recognition and the variety of applications/use cases of this technology are increasing at an astounding pace.
Some of the premier examples of face recognition (both in technological advancement and in creative use of the technology) come from China. Today in China, face recognition is a very hot topic from both business development and application development perspectives. Face recognition has become a fruitful ecosystem with hundreds of start-ups in China. In personal and/or business settings, people in China are widely using and relying on devices whose security is based on automatic recognition of their faces.
As perhaps the largest-scale practical application case of deep learning and face recognition in the world today, the Chinese government recently started a project known as “Sharp Eyes” that aims at establishing a nationwide surveillance system based on face recognition. The project plans to integrate security cameras already installed in public places with private cameras on buildings and to utilize AI and deep learning to analyze the videos from those cameras. With millions of cameras and billions of lines of code, China is building a high-tech authoritarian future. With this system, cameras in some cities can scan train and bus stations as well as airports to identify and catch China’s most wanted suspected criminals. Billboard-size displays can show the faces of jaywalkers and list the names and pictures of people who do not pay their debts. Facial recognition scanners guard the entrances to housing complexes.
An interesting example of this surveillance system is the “shame game” (Mozur, 2018). An intersection south of Changhong Bridge in the city of Xiangyang previously was a nightmare. Cars drove fast, and jaywalkers darted into the street. Then, in the summer of 2017, the police put up cameras linked to facial recognition technology and a big outdoor screen. Photos of lawbreakers were displayed alongside their names and government identification numbers. People were initially excited to see their faces on the screen until propaganda outlets told them that this was a form of punishment. Under this system, citizens not only became subjects of the shame game but also were assigned negative citizenship points. Conversely, on the positive side, if people are caught on camera showing good behavior, like picking up a piece of trash from the road and putting it into a trash can or helping an elderly person cross an intersection, they get positive citizenship points that can be used for a variety of small awards.
China already has an estimated 200 million surveillance cameras, four times as many as the United States. The system is mainly intended to be used for tracking suspects, spotting suspicious behavior, and predicting crimes. For instance, to find a criminal, the image of a suspect can be uploaded to the system and matched against millions of faces recognized from videos of millions of active security cameras across the country. This can find individuals with a high degree of similarity. The system also is merged with a huge database of information on medical records, travel bookings, online purchases, and even social media activities of every citizen and can monitor practically everyone in the country (with 1.4 billion people), tracking where they are and what they are doing each moment (Denyer, 2018). Going beyond narrowly defined security purposes, the government expects Sharp Eyes to ultimately assign every individual in the country a “social credit score” that specifies to what extent she or he is trustworthy.
While such an unrestricted application of deep learning (i.e., spying on citizens) runs counter to the privacy and ethical norms and regulations of many western countries, including the United States, it is becoming a common practice in countries with less restrictive privacy laws and concerns, such as China. Even western countries have begun to plan on employing similar technologies on a limited scale, only for security and
Application Case 6.6 From Image Recognition to Face Recognition
Chapter 6 • Deep Learning and Cognitive Computing 357
Text Processing Using Convolutional Networks
In addition to image processing, which was in fact the main reason for the popularity and development of convolutional networks, they have been shown to be useful in some large-scale text mining tasks as well. Especially since 2013, when Google published its word2vec project (Mikolov et al., 2013; Mikolov, Sutskever, Chen, Corrado, and Dean, 2013), the applications of deep learning for text mining have increased remarkably.
Word2vec is a two-layer neural network that gets a large text corpus as the input and converts each word in the corpus to a numeric vector of any given size (typically ranging from 100 to 1,000) with very interesting features. Although word2vec itself is not a deep learning algorithm, its outputs (word vectors also known as word embeddings) already have been widely used in many deep learning research and commercial projects as inputs.
One of the most interesting properties of word vectors created by the word2vec algorithm is that they maintain the words’ relative associations. For example, the vector operations
vector (‘King’) – vector (‘Man’) + vector (‘Woman’)
and
vector (‘London’) – vector (‘England’) + vector (‘France’)
will result in a vector very close to vector (‘Queen’) and vector (‘Paris’), respectively. Figure 6.29 shows a simple vector representation of the first example in a two-dimensional vector space.
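The analogy arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not the word2vec implementation: the two-dimensional vectors in `toy_vectors` are hand-picked, hypothetical values (real word2vec vectors are learned from a corpus and have hundreds of dimensions), the helper name `analogy` is an assumption, and Euclidean distance is used for simplicity where word2vec tooling typically ranks candidates by cosine similarity.

```python
import math

# Hypothetical 2-D vectors chosen by hand for illustration only.
# Axis 0 loosely stands for "royalty", axis 1 for "maleness".
toy_vectors = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.5, 0.5],
}

def analogy(a, b, c, vectors):
    """Return the word whose vector is closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return min(candidates, key=lambda w: math.dist(vectors[w], target))

result = analogy("king", "man", "woman", toy_vectors)
print(result)
```

With these toy values, subtracting vector('man') from vector('king') removes the "maleness" component, and adding vector('woman') leaves a point that coincides with vector('queen').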
Moreover, the vectors are specified in such a way that words used in similar contexts are placed very close to each other in the n-dimensional vector space. For instance, in the word2vec model pretrained by Google using a corpus including about 100 billion words (taken from Google News), the closest vectors to the vector (‘Sweden’) in terms of cosine distance, as shown in Table 6.2, identify European country names near the Scandinavian region, the same region in which Sweden is located.
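A nearest-neighbor lookup of this kind can be sketched as follows. The three-dimensional vectors in `country_vectors` are made-up values for illustration and are not taken from the Google News model or from Table 6.2; the function names `cosine_similarity` and `most_similar` are likewise assumptions. Cosine similarity is the cosine of the angle between two vectors, and cosine distance is commonly defined as one minus that value, so the most similar words are those with the smallest cosine distance.

```python
import math

# Hypothetical 3-D vectors, invented for this sketch only.
country_vectors = {
    "sweden":  [0.90, 0.80, 0.10],
    "norway":  [0.85, 0.82, 0.12],
    "denmark": [0.80, 0.75, 0.20],
    "japan":   [0.10, 0.30, 0.90],
    "brazil":  [0.20, 0.10, 0.80],
}

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=2):
    """Rank all other words by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

neighbors = most_similar("sweden", country_vectors)
for name, score in neighbors:
    print(f"{name}: {score:.3f}")
```

In this toy space the two Scandinavian neighbors rank first because their vectors point in nearly the same direction as the vector for 'sweden', which mirrors the pattern the text describes for the pretrained model.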
Additionally, since word2vec takes into account the contexts in which a word has been used and the frequency of using it in each context in guessing the meaning of the word, it enables us to represent each term with its semantic context instead of just the syntactic/symbolic term itself. As a result, word2vec addresses several word variation issues that used to be problematic in traditional text mining activities. In other words,
crime prevention purposes. The FBI’s Next Generation Identification System, for instance, is a lawful application of facial recognition and deep learning that compares images from crime scenes with a national database of mug shots to identify potential suspects.
Questions for Case 6.6
1. What are the technical challenges in face recognition?
2. Beyond security and surveillance purposes, where else do you think face recognition can be used?
3. What are the foreseeable social and cultural problems with developing and using face recognition technology?
Sources: Mozur, P. (2018, June 8). “Inside China’s Dystopian Dreams: A.I., Shame and Lots of Cameras.” The New York Times. https://www.nytimes.com/2018/07/08/business/china-surveillance-technology.html; Denyer, S. (2018, January). “Beijing Bets on Facial Recognition in a Big Drive for Total Surveillance.” The Washington Post. https://www.washingtonpost.com/news/world/wp/2018/01/07/feature/in-china-facial-recognition-is-sharp-end-of-a-drive-for-total-surveillance/?noredirect=on&utm_term=.e73091681b31.