An Information-based Philosophical View of Machine Learning and the Digital Person

Original article can be found here (source): Artificial Intelligence on Medium

Identity in digital reality.

Photo by Ashwin Vaswani on Unsplash

Suffice it to say, the concept of a person is an extremely complex topic. Consider Christian Smith’s recent book What is a Person?: Rethinking Humanity, Social Life, and the Moral Good from the Person Up. In it, Smith lists five levels of distinctly human capacities related to the notion of personhood. These include interest formation, conscious awareness, moral values, creativity, self-reflection, identity formation, truth seeking, language use, long-term memory, inter-subjective understanding, use of technology, narrative formation, volition, emotional experience and about ten more unique capacities of persons.

As the line between reality and digital reality blurs, it’s no longer clear where one ends and the other begins. The Oxford philosopher of technology, Luciano Floridi, has termed this phenomenon Onlife. We live in a liminal space between the real and what Postmodern sociologist Jean Baudrillard would call the hyperreal. Our personal data move around in symbolic spaces and the topology of these digital, symbolic spaces can impact the kinds of jobs we are offered, the movies we are recommended, and the dates we go on. I will also argue that the account I present here can help us make sense of worries about how machine learning personalization may erode our sense of free will, as recent authors such as Shoshana Zuboff have claimed.

Nevertheless, in what follows I will try to lay out one possible interpretation of how persons, their personal data, and machine learning personalization are related. It is an unabashedly speculative interpretation, and draws on ideas from information theory that I think are helpful in understanding what makes us unique as persons.

You may have wondered what it means when Google or Facebook says, for instance, that “this advertisement was personalized based on your viewing preferences.”

Some of the ideas below are new and some of them were inspired by the work of Luciano Floridi and others, such as Mireille Hildebrandt. And in case you think these ideas are mere sophistry or “philosophizing for the sake of philosophizing,” questions surrounding the relation between persons and their digital representations are actively shaping the Digital Agenda of the EU. For example, in 2013 the European Commission published The Onlife Manifesto: Being Human in a Hyperconnected Era. At some point, probably sooner rather than later, we must confront these tricky questions of how the digital and physical worlds relate.

You are a Fundamentally Incompressible Structure of Information

My view is that personal data, in the form of information, are representations of persons which reveal their true essence.

A person is a fundamentally incompressible structure of information, a kind of unity of information. Let’s spend some time unpacking this claim in various ways. The Harvard philosopher Christine Korsgaard has spent her career advocating and extending Kantian views about personhood and agency. She claims that a kind of unified agency is necessary in order to act in accordance with practical reason and thus constitute oneself as a person, a unified entity with a particular set of values which motivate one to action. These sets of values differentiate us as unique persons, and can be thought of more generally as patterns. Patterns, as we know from Shannon and others, can be encoded and transmitted as messages to be decoded and interpreted by receptive observers or instruments.

Norbert Wiener, a founding father of information theory and cybernetics, expresses a similar view. Wiener, in his 1954 book The Human Use of Human Beings: Cybernetics and Society, writes that an “Organism is opposed to chaos, to disintegration, to death, as message is to noise… to describe an organism … is to answer certain questions about it which reveal its pattern.” For Wiener, persons are unified organisms which, through a kind of homeostasis, struggle to maintain their unity in the face of the universe’s blind march towards ever-increasing entropy. He writes “it is the pattern maintained by this homeostasis which is the touchstone of our personal identity” (pg. 96). We, as persons, are patterns, patterns which may be encoded and transmitted as messages. Wiener amazingly anticipated Elon Musk’s Neuralink by about 70 years:

“It is amusing as well as instructive to consider what would happen if we were to transmit the whole pattern of the human body, of the human brain with its memories and cross connections, so that a hypothetical receiving instrument could re-embody these messages in the appropriate matter, capable of continuing the processes already in the body and the mind, and of maintaining the integrity needed for this continuation by a process of homeostasis.”

Your Invariant (digital) Representation

In the language of deep learning and computational neuroscience, we could choose to call this basic unity of information an “invariant representation” of a person, if talk of essences or unities sounds too hokey-pokey for you. In image recognition, an invariant representation of an object is some latent representation (a lossy encoding) of an object which permits identification of the object under arbitrarily varying conditions, such as lighting or angle. In surveillance applications using CCTV, for instance, the goal is to find a robust feature representation of a person which can permit identification across varying camera views. In the language of digital marketing and data brokers, the goal is to build an “Omnichannel view” of a customer that can identify a person from both their physical retail and online behavior. You hopefully get the picture.

Digital Life DisIntegrates The Person

Perhaps less abstractly, in our everyday language we recognize the unity of persons by referring to persons as “individuals”: once divided, you no longer have the same thing. Deleuze and others were among the first to make note of this point, describing the fragmented, data-defined person as a “dividual.” Cultural theorists have latched on to this characterization of persons through personal data. John Cheney-Lippold’s highly cited paper titled A New Algorithmic Identity: Soft Biopolitics and the Modulation of Control says, “These dividuals become the axiom of control, the recipients through which power flows as subjectivity takes a deconstructed dive into the digital era.” I’m not sure what an axiom of control is, but these cultural theorists are correct in viewing persons as indivisible unities. The question is what exactly, when taken away from you, makes you not you?

Everyday Essentialism

Lastly, and before we move on to how this information-based view of the person is connected to worries about the erosion of free will, I think it’s worth pointing out another interesting connection to essentialism. Regardless of how we feel about the metaphysical claim, there is a large body of evidence in everyday experience and in social psychology showing that people do act as if something like an “essence” really existed. People will pay vastly greater sums of money for an identical sweater if they believe it to have been worn by George Clooney. People will cherish an autographed baseball from their favorite player, as if by magic the skill of the player could be stored inside the object. Parents often believe their baby’s shoes to be irreplaceably valuable, and so on. As Paul Bloom and Susan Gelman have demonstrated in many experiments, part of what it means to be a person is to experience this feeling of psychological essentialism. We can’t seem to quite escape it.

How does machine learning relate to our free will? Photo by Kyle Glenn on Unsplash

An Entropy-based Interpretation of Free Will

If persons indeed possess free will, then by definition they could choose to perform any action or behavior, or do otherwise; what to do next is entirely “up to them.” If you are completely free to act, then knowing something about your environment or current state of mind does not reduce any uncertainty we might have about what you will do next. In the terminology of information theory, the mutual information between your behavior and your environment or current mental state is effectively zero.
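To make the mutual-information claim concrete, here is a minimal sketch in Python. The two joint probability tables are made-up numbers, not measurements: one describes a hypothetical "free" agent whose behavior is independent of context, the other a hypothetical "habitual" agent whose behavior tracks context closely. The function name mutual_information is my own.

```python
import math

def mutual_information(joint):
    """Mutual information I(X; Y) in bits from a joint probability table.

    joint[i][j] is P(X = i, Y = j); rows index X, columns index Y.
    """
    px = [sum(row) for row in joint]           # marginal distribution of X
    py = [sum(col) for col in zip(*joint)]     # marginal distribution of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                        # skip zero cells: 0 * log(0) := 0
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Hypothetical tables: X = context (2 states), Y = behavior (2 options).
free_agent = [[0.25, 0.25],
              [0.25, 0.25]]      # behavior is independent of context
habitual_agent = [[0.45, 0.05],
                  [0.05, 0.45]]  # behavior mostly follows context

mi_free = mutual_information(free_agent)
mi_habit = mutual_information(habitual_agent)
print(f"free agent:     {mi_free:.3f} bits")
print(f"habitual agent: {mi_habit:.3f} bits")
```

For the independent table, knowing the context tells us nothing about the behavior, so the mutual information is zero. For the habitual table it is clearly positive, which is exactly the kind of dependence a predictive model can exploit.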

The maximally-free person could be envisioned as a kind of massively high dimensional joint uniform distribution over all possible discrete behaviors, thoughts, events, etc. As this distribution contains no redundancy (remember we said you could do anything with equal probability), the average number of bits needed to encode this distribution of events is equal to Shannon’s entropy. As you, a person purportedly possessing free will, are incompressible, there is no better, more efficient way to encode this information without sacrificing some aspect of your essence as a person. As a sidenote, I think there are definitely problems with viewing free will in this way, but I’ll have to save them for another post. For example, behaviors must always be interpreted under some description. But whose description: the doer’s or the interpreter’s?

The maximally-free person. Imagine a joint density function like this, but defined in n-dimensions.

An Example of Information Gain

Now suppose we observe you act in a certain way. Maybe you bought some product on Amazon, for instance. Maybe it was vegan cat food. We have now acquired information about you. The degree of informativeness can be quantified by the degree to which the possession of this information (personal data) allows me to reduce my uncertainty about your fundamental nature as a person. It turns out that knowing that you buy vegan cat food reveals a lot about what type of person you are, what your home environment might be like, and so on. As the oft-quoted and underappreciated philosopher and founder of Bubba Gump Shrimp Co., Forrest Gump, once said, “My mama always said you can tell a lot about a person by their shoes, where they going, where they been.” True indeed, Forrest. True indeed.

It might turn out, however, that an observation of your behavior does nothing to reduce the uncertainty about your true essence. In that case, we might consider it to be mere “noise.” But part of your fundamental nature or essence includes your predisposition to engage in certain behaviors. Thus, machine learning personalization (MLP) aims to collect as much personal data as possible so as to reduce the uncertainty about which behaviors you will exhibit, whether that means a click on an ad or the purchase of a product. You may now see why some worry about the dangers posed by MLP and AI for our free will.

Introducing Shannon’s Entropy

Let me unpack the claims I just made. Firstly, my view rests on Shannon’s notion of entropy, which explicitly formalizes the relation between the predictability of events and the probability that they occur. Here’s an example to clarify. A set of n events in which each event occurs with probability 1/n is the most unpredictable (in terms of predicting the outcome of any one experiment). In Shannon’s terms, this means its entropy is maximized. We can think of the entropy of a distribution as the theoretical speed limit for compression of information. When entropy is at its peak, we are maximally uncertain of the outcome of any one event occurring in a given experiment. As the probabilities of events shift away from this 1/n uniformity, we become more certain about which events will occur. Shannon’s genius was in formally expressing how we can take advantage of this redundancy to encode information in a more efficient way. Another major implication was realizing that entropy was in some sense the quantification of predictability.

The entropy (H) of a discrete random variable X, when using log base 2, is measured in bits. Since it’s an expectation, we can think of it as a lower bound on the average code word length (over all possible codes) needed to encode a distribution of events.
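As a rough illustration, the standard definition H(X) = -Σ p(x) log2 p(x) can be computed in a few lines of Python. The two distributions below are invented for the example: eight equally likely behaviors versus eight behaviors where one dominates. The function name shannon_entropy is my own.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(X) = -sum p(x) * log2 p(x), in bits."""
    # Outcomes with zero probability contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n                       # every behavior equally likely
skewed = [0.65] + [0.05] * (n - 1)          # one behavior dominates (made-up numbers)

h_uniform = shannon_entropy(uniform)        # equals log2(8) = 3 bits
h_skewed = shannon_entropy(skewed)          # strictly less than 3 bits

print(f"uniform: {h_uniform:.3f} bits")
print(f"skewed:  {h_skewed:.3f} bits")
```

The uniform case hits the maximum of log2(n) bits. Any departure from uniformity lowers the entropy, which is the redundancy a compressor, or a predictor, can take advantage of.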

Entropy Can Be Related to Bayesian Belief Updating

This notion of uncertainty can also be expressed within the context of Bayesian belief updating. Before we observe some event or behavior — in Heidegger’s terms, before a phenomenon presences — we have prior beliefs about the probability of some event occurring. These beliefs can be based on previous experience or cultural or scientific understanding of some phenomenon. But after we observe this event, we have collected information, information in the sense that now we can update the probability distribution represented by our prior beliefs. The difference between the prior probability distribution and the posterior (updated) distribution is what we might call the “information” or “information gain” of the observation. (In Bayesian machine learning, we oftentimes will use KL divergence to quantify the change in the uncertainty surrounding our parameter estimates when updating from our prior to posterior distribution).
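Here is a small sketch of that idea in Python, assuming three hypothetical "types" of person and made-up likelihoods for a single observed purchase. It applies Bayes' rule and then measures the information gain as the KL divergence from prior to posterior, in bits. The function names and all the numbers are illustrative assumptions.

```python
import math

def bayes_update(prior, likelihood):
    """Posterior over hypotheses via Bayes' rule: posterior ∝ prior * likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)            # P(observation)
    return [u / evidence for u in unnormalized]

def kl_divergence(p, q):
    """KL divergence D(p || q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Three hypothetical person "types", equally likely before any observation.
prior = [1 / 3, 1 / 3, 1 / 3]

# P(observed purchase | type): made-up numbers for an informative behavior...
likelihood_informative = [0.60, 0.05, 0.01]
# ...and for a behavior every type exhibits equally often ("noise").
likelihood_noise = [0.20, 0.20, 0.20]

posterior = bayes_update(prior, likelihood_informative)
gain = kl_divergence(posterior, prior)

posterior_noise = bayes_update(prior, likelihood_noise)
gain_noise = kl_divergence(posterior_noise, prior)

print(f"informative observation: {gain:.3f} bits gained")
print(f"uninformative observation: {gain_noise:.3f} bits gained")
```

An observation that every type is equally likely to produce leaves the posterior equal to the prior, so the gain is zero. That is the sense in which a behavior can be "mere noise."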

In essence, we are now more certain about which events or behaviors this object will exhibit after we have observed the given event or behavior. Behaviors that reduce uncertainty more than others contain more information about your underlying nature. This idea is illustrated below and taken from the book Uncertainty and Information by George Klir.

How information and Bayesian belief updating relate to each other. A phenomenon “presences” to us in the form of observational data and reduces our posterior uncertainty as to its essence (Klir, 2006).

Big Tech Wants to Reduce the Uncertainty Regarding your Future Behavior

This finally brings us back to the question of why marketers and social media companies want your personal data and why personalization is the Association of National Advertisers’ Word of the Year. The answer may seem obvious, but it is worth stating anyway.

Facebook, Google, Amazon, and Apple (to a lesser extent) want to accurately predict your future behavior. And what would permit them to predict your behavior with the least possible uncertainty? More personal data. Of course, different kinds of personal data will obviously be more useful in reducing the uncertainty of your true nature. For predicting future behavior, for example, knowledge of your past behavior would tend to reduce uncertainty. In everyday life, we would say that knowing you have a cigarette habit or addiction reduces the uncertainty about our predictions of whether you will smoke tomorrow. BF Skinner’s notion of “learning history” more formally captures this idea.

Nudges and Manipulation: Gaming the System

On top of all this, it’s also possible to manipulate a web app or device to “nudge” users towards certain predicted behaviors as another way of minimizing prediction errors, as Galit Shmueli has pointed out. A/B testing and push notifications might be examples of “tuning” or “herding” techniques that can artificially improve the prediction accuracy of MLP. The app that’s recording your usage data on your phone is, in some sense, just a pocket-sized digital Skinner box with a sleek design and well-chosen color palette. Consequently, we shouldn’t be surprised that the pragmatic behaviorism of Skinner gels so well with the kind of predictive modeling that dominates in industry. Remember when Zuckerberg told his employees to “Move fast and break things?” That’s pragmatism.

Scientific vs. Pragmatic Representations of Persons

We should also be clear about something else. The purpose of industry MLP is not to build the most scientifically accurate model of you, but instead to predict a very specific and narrowly-defined behavior within the already narrow behavioral confines of an app or website. A basic tenet of sampling theory is that your sample of measurements be properly representative of the system about which you are making inferences. Industry MLP doesn’t care about tenets of sampling theory. Your behavior, after all, is just one facet of who you are as a person. It’s not the full story. Christopher Reeve still led a meaningful life after an accident left him quadriplegic. Your behavior on an app is not representative of you as a person.

Our moral and legal codes recognize this difference when we punish those who kill with mens rea more harshly than those who accidentally kill someone. What goes on in your head matters in defining you as a unique person. I assume you also have beliefs, desires, short- and long-term plans, and various moral values. You also have a body and live in some kind of social community (unless you’re training to be the next Ted Kaczynski) and speak some kind of language. As thought experiments given by Locke and then much later by Daniel Dennett and others have shown, we could have mental representations with the opposite content yet exhibit indistinguishable behavior. This is the so-called “spectrum inversion” argument against behaviorism.

What’s more, the behavior apps and devices measure is actually just a small subset of all the behaviors you might exhibit. Even worse, most data collection fails to distinguish between conscious and nonconscious behavior. Most of what you do while using a computer falls under nonconscious goal-directed behavior. Think of moving a mouse to a button or clicking on a link. Did you have to plan that before you did it? So much of your behavior is excluded from consideration, a fact that many commentators, afraid that MLP is eroding our free will, have failed to realize. For instance, Instagram doesn’t know that you’re scrolling through your Instagram Explore page with a double cheesy Gordita crunch wrap in your left hand; these kinds of behaviors are not measured and so cannot be used to reduce the uncertainty of your true nature as someone who enjoys eating lukewarm, nacho-flavored dog food. You can now breathe a slow sigh of relief.

Don’t Fall into the Radical Behaviorism Trap

Although data scientists and researchers may claim that MLP works by predicting your “preferences” or “interests” or “needs,” I can give you a very simple argument to show you this can’t possibly be correct. Preferences cannot buy stuff. Needs don’t click things. Interests don’t add things to carts. The fallacy typically exhibited by researchers in the recommender systems literature is that they have, perhaps unconsciously, equated preferences with behavior. They have fallen into the trap of radical behaviorism whether they realize it or not. They must not have heard of the Inverted Spectrum argument. I can tell you from experience, when you predict an outcome for a person in your dataset, there is no outcome column labeled “preferences” or “needs.” Instead it will simply say “Buy” or “Add” or “Churn.” These are all very narrowly defined behaviors that are the result of a near infinity of prior mental states. To see for yourself, here’s an excerpt from a highly cited paper by Basu et al. (1998):

This paper presents an inductive learning approach to recommendation that is able to use both ratings information and other forms of information about each artifact in predicting user preferences.

So, Industry MLP is slightly confused but thoroughly pragmatic. It has no need to posit theoretical constructs about what really drives you. It doesn’t care about your authentic self. It doesn’t need to care. The Big Four tech companies have stumbled onto the realization that merely measuring and recording what you do, and when and where you do it (within the context of an app or device), is somehow enough to predict your actions on their platforms to their standards.

Just ‘Good Enough’ To Get What I Need From You

In other words, the predictions don’t need to be perfect; they just need to be good enough. What is “good enough”? Well, most likely anything that relates to key business metrics or KPIs such as return on investment, customer lifetime value, conversion rate, and perhaps most broadly at the CEO-level, maximizing shareholder value. Whether you actually have these beliefs or interests or whether these beliefs or interests actually are causally responsible for your behavior is beside the point. Unfortunately, exploring this point further would require another too-long and rambling blog post.

An astute reader at this point might realize that your social, moral, and gender identities could also serve to reduce uncertainty regarding your future behaviors, thoughts, beliefs, desires, and plans. Knowledge of your complete genome or Big Five personality traits would also reduce much uncertainty as to whether you are likely to support the views of a given political candidate, be open to new experiences, or metabolize caffeine faster than average (a fact that might be worth something to Red Bull or Monster, for example).

Luckily, the GDPR and other forthcoming regulations tend to set a high bar of explicit consent before companies can access these kinds of “sensitive personal data.” Be forewarned, however, that researchers have already begun implementing recommender systems that use psychological and emotional states to personalize recommendations for you (link to Google patent). I see no reason why such research would slow down.

Photo by Darius Bashar on Unsplash

What Can You Do To Add Entropy?

In short, personal data are the means through which data controllers can reduce uncertainty regarding your future behaviors. This reduction of uncertainty carries a clear monetary value for advertisers and marketers. Luckily, though, you can preserve your unpredictability and thus your autonomy, in the Kantian sense of the ability to legislate for yourself about which behaviors to exhibit. There are various privacy-enhancing techniques (PETs) that work by introducing precise amounts of statistical noise into behavioral data or by collapsing specific categories into more general ones, e.g., replacing New York City with New York state. In fact, there are metrics such as entropy-based information loss (EBIL) that are defined using the formula for conditional entropy. These kinds of measures tell us essentially how much entropy remains in some target variable given some transformation or aggregation in another related variable. We might focus on transformations that strike a balance: enough distortion of the original dataset to protect data subjects’ privacy, but not so much that useful analysis of the dataset becomes impossible.
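To give a feel for this kind of measure, here is a minimal Python sketch that computes the conditional entropy H(original | released) when cities are generalized to states. It is not the published EBIL definition, only the conditional-entropy idea it builds on, and the records, the state_of mapping, and the function name are all hypothetical.

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """H(X | Y) in bits, estimated from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                      # counts of each (x, y) pair
    marginal_y = Counter(y for _, y in pairs)   # counts of each y
    h = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n                        # P(X = x, Y = y)
        p_x_given_y = count / marginal_y[y]     # P(X = x | Y = y)
        h -= p_xy * math.log2(p_x_given_y)
    return h

# Hypothetical records: each person's true city of residence.
cities = ["New York City", "New York City", "Buffalo", "Albany",
          "Boston", "Boston", "Worcester", "Springfield"]

# Generalization step: collapse each city into its state before release.
state_of = {
    "New York City": "New York", "Buffalo": "New York", "Albany": "New York",
    "Boston": "Massachusetts", "Worcester": "Massachusetts",
    "Springfield": "Massachusetts",
}
released = [state_of[city] for city in cities]

# Uncertainty about the true city that remains after seeing only the state.
remaining = conditional_entropy(list(zip(cities, released)))
# Baseline: releasing the city itself leaves no uncertainty at all.
no_generalization = conditional_entropy(list(zip(cities, cities)))

print(f"entropy remaining after generalization: {remaining:.2f} bits")
print(f"entropy remaining with raw cities:      {no_generalization:.2f} bits")
```

The more uncertainty about the original value that survives the transformation, the more privacy is preserved, and the less precise any downstream analysis can be.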