Source: Deep Learning on Medium
Privacy for Facial Image Recognition
Facial image recognition technology is an exciting breakthrough for computer vision, at least for some people; it has opened the door to many new opportunities for automation and improvements in security, law enforcement, and customer service, to name a few. For others, this technology is the latest in an onslaught of practices that violate personal privacy. The easy solution is to simply ban the use of the technology, but surely there are at least some use cases or approaches that will allow us to leverage the benefits of facial image recognition without sacrificing data privacy. StreamLogic aims to help companies be proactive in protecting customer privacy, even beyond regulatory requirements. In this article, I outline a three-layered approach to implementing facial image recognition for customer behavior analysis with very strong data privacy properties.
The Customer Behavior Use Case
In the customer behavior analytics scenario, one typically has multiple physical locations, each with multiple cameras that are used to track customers during their visit. Using these cameras, we would like to answer questions such as how much time does a customer spend at the location, where do they spend the most time, what path do they take, how often do they visit, do they visit multiple locations, etc. Facial image recognition is crucial to answering these questions because it allows us to assign an identity to individuals in the video feed in order to associate appearances of the same person across multiple cameras and sites over time.
For this use case, it is not necessary to associate images with a person’s real-world identity (i.e. name, customer ID, etc.) but simply to identify which images collected from multiple cameras over multiple days represent the same individual. However, if the data is not collected and stored carefully, an employee or attacker could use the information to connect the data with a real-world identity. The following sections describe a three-part approach to enable an organization to perform customer behavior analytics while protecting the privacy of their customers.
1. The Signature
Identifying an individual from an image has its challenges: varying image quality, lighting conditions and camera angle, to name a few. The basic idea behind today's image recognition technology is to extract some kind of fingerprint or signature from a facial image that is largely independent of these variations. Instead of comparing images directly, we can compare the signatures generated from two images and assess their similarity. The signatures generated from two images of the same person will not be identical, but they are similar enough to be compared much like you would compare hand-written signatures.
This facial image recognition signature is just a list of numbers. The graphs in figure 1 depict sample outputs from a face recognition system. As you can see in the top graph, there is significant similarity between the output from two different images of the same person. And as expected, the output generated for the images of two different people, as shown in the bottom graph, are quite dissimilar.
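Since a signature is just a list of numbers, comparing two of them reduces to a vector similarity measure. The sketch below uses cosine similarity on toy eight-number signatures; the vectors and the 0.9/0.5 cutoffs are illustrative assumptions (real systems produce embeddings of 128 or more dimensions, and thresholds are tuned per model):

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity between two signature vectors: 1.0 means identical direction."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 8-number signatures (hypothetical values for illustration).
person_a_img1 = [0.9, 0.1, -0.3, 0.5, 0.2, -0.7, 0.4, 0.0]
person_a_img2 = [0.8, 0.2, -0.2, 0.6, 0.1, -0.6, 0.5, 0.1]  # same person, different photo
person_b_img1 = [-0.5, 0.7, 0.6, -0.2, -0.8, 0.3, -0.1, 0.9]

same = cosine_similarity(person_a_img1, person_a_img2)
diff = cosine_similarity(person_a_img1, person_b_img1)
print(same > 0.9)   # signatures of the same person are close
print(diff < 0.5)   # signatures of different people are not
```

This mirrors the graphs in figure 1: outputs for the same person nearly overlap, while outputs for different people diverge.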
For customer behavior analysis, this could be implemented as follows: periodically capture images from each camera, detect the location of the faces in the image, generate a signature for each face detected, and finally record the event data (e.g. site, camera, face location, time, and signature). Once the signatures are generated there is no need to keep the images, so they can be discarded immediately. For the analysis, the signatures in the event data are compared and divided into groups of events with similar signatures. Each group is labeled with a random ID representing that unique individual. If this data is collected in a central place over time, then we can answer all the questions presented in the introduction. But have we adequately protected individual privacy?
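The grouping-and-labeling step could be sketched as below. This is a minimal greedy pass, not a production clustering algorithm, and the similarity threshold is an assumed value; real deployments would tune it against their recognition model:

```python
import uuid
import numpy as np

SIM_THRESHOLD = 0.9  # assumed cutoff; tuned per recognition model in practice

def cosine(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def assign_tokens(events):
    """Greedy grouping: give each event the token of the first group
    whose representative signature is similar enough; otherwise start
    a new group with a fresh random ID."""
    groups = []  # list of (representative_signature, token)
    for ev in events:
        for sig, token in groups:
            if cosine(ev["signature"], sig) >= SIM_THRESHOLD:
                ev["token"] = token
                break
        else:
            token = uuid.uuid4().hex
            groups.append((ev["signature"], token))
            ev["token"] = token
    return events

# Hypothetical event records: site, camera, time, and face signature.
events = [
    {"site": 1, "camera": 3, "time": "09:02", "signature": [0.9, 0.1, -0.3]},
    {"site": 1, "camera": 7, "time": "09:15", "signature": [0.8, 0.2, -0.2]},  # same visitor
    {"site": 2, "camera": 1, "time": "11:40", "signature": [-0.5, 0.7, 0.6]},  # someone else
]
assign_tokens(events)
print(events[0]["token"] == events[1]["token"])  # same individual shares one token
print(events[0]["token"] == events[2]["token"])  # different individual gets another
```

Note that the original images never appear in the event records; only signatures and tokens are retained.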
The signatures are not easily reversible, much like most doctors' signatures: you can compare them, but you could rarely guess a name from a signature alone. However, if you were interested in obtaining information about a specific person, you could easily get a photo of them, run it through the same facial recognition system to generate a signature, and compare it to the event data signatures. You could also take a large database of photos of known people and do the same. Thus, discarding the original images is not enough to protect privacy. Using a proprietary facial recognition signature generator could provide some additional protection, provided that you can protect access to that system; more on that in the next section.
2. The Separation of Data
We learned in the previous section that we can replace actual face images with a representative signature to avoid retaining images of people, but that alone does not provide much privacy protection. To take it a step further, we now introduce the idea of data separation. In particular, we employ three types of separation. The following paragraphs describe the three types, which are also summarized in figure 2.
Separating the Model from the Data
First, the model (i.e. the proprietary facial recognition signature generation software) is kept only on the devices near the cameras that collect the data, while the data is shipped to a server in a data center elsewhere. This protects against someone using a photo of a known person, using the model to generate their signature, and then matching it against the event data to acquire information about that person. Since the model and data are on different systems and different networks, an attacker would need to infiltrate two systems to accomplish that.
Separating the Signature Data from the Event Data
Second, the signature data is sent to one server and the event data (site, camera, face location, time) is sent to a second server. The random ID associated with all the events for the same person, also called a token, is kept with both sets of data in order to make use of the data later. Now, retrieving information about someone given a known photo of them requires infiltrating three systems: 1) the camera device, to access the model and generate a signature; 2) the signature server, to match that signature and obtain the associated token; and 3) the event server, to access the event data using the recovered token.
Separating the Data Over Time
There is another way besides using a known photo to recover information about an individual. If one has enough information about the dates and times that a person has been at the location, then one can scan the event data for tokens that have events at all those dates and times. With that token, they can then find out all the data about the person. However, if the process that groups event data by signature and assigns random IDs is run daily or more frequently, it will produce different tokens for the same individual over time, and information can no longer be recovered using this method.
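The effect of re-running the grouping each day can be sketched in a few lines. The names below are hypothetical stand-ins for the signature groups produced by a day's grouping run; the point is simply that a fresh random token is minted per run, so tokens from different days cannot be joined:

```python
import uuid

def daily_tokens(individuals):
    """Assign a fresh random token to each distinct individual.
    Running this once per day means the same person carries a
    different, unlinkable token each day."""
    return {person: uuid.uuid4().hex for person in individuals}

visitors = ["person_A", "person_B"]  # stand-ins for signature groups
monday = daily_tokens(visitors)
tuesday = daily_tokens(visitors)
print(monday["person_A"] != tuesday["person_A"])  # tokens do not link across days
```

An attacker scanning the event data for one token appearing at all known dates and times will therefore come up empty, since no single token spans multiple days.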
3. The Service Provider
Finally, one additional measure is to use a third-party identity management service to store and access the signature data. As shown in the diagram in figure 2, to answer questions about customer behavior, the analyst collects only the necessary subset of event data and sends it, along with the question, to the service provider. The data is then combined and analyzed in memory to provide answers; i.e., the combined dataset is never stored and at risk of being stolen.
This protects against access by employees, since access is limited to the third-party service. Also, an attacker would now need to infiltrate three separate systems, in three locations, managed by two distinct organizations.
I founded StreamLogic to help companies take advantage of the latest technologies in computer vision and image/video analytics. I have also spent many years working in health care and understand the need to protect customer data, as well as how that is done in practice. In this article, I described several technical solutions for reducing the privacy risk of using facial image recognition. The point here is that we have a powerful new technology that will improve the products and services companies offer, and there are ways to adopt it responsibly. However, companies that want to take advantage of facial image recognition do need to be proactive and think through privacy concerns before deploying solutions.