On one level, this story sounds like yet another one of those Internet privacy stories -- Facebook is watching you online, Google is reading your email, or your computer may actually be a zombie in a botnet. For much of the technology-challenged world, the headline embodies all of the frightening possibility of a campfire ghost story. But, as with other big stories with complexity and depth, this story is far more nuanced than a simple black and white, right and wrong.
The reality is that we live in a data-driven world. From the moment you turn on a device and it connects to a network, there is an electronic discussion that takes place. Some of the communications may be innocuous, like a device handshake with the network to give it identity or your computer asking a server what time it is. Or, when you pull up a web page in your browser, your computer talks to a server that then sends back the data that your computer needs to build the page. In the process of sending you data, that server may verify who you are, then call a bunch of its friends, tell them your name, and ask them to send you data as well. And in a world full of servers and electronic logs, each of these transactions is logged in journals, and your history in each may affect the other.
This is the electronic ecosystem. Fundamentally, some of these things need to work like this in order for things to operate. Below the web sessions or the phone calls, the core back and forth of devices interacting requires identity, memory and structure.
Anonymity on a network is not true anonymity; what it really amounts to is a disconnect between identities. In the brick and mortar world, it is possible to have essentially anonymous transactions. You can have a conversation with another person in an isolated room. You can go to a store in a different area and purchase something in cash. But electronic transactions are different. Each electronic communication is like a phone call from one location to another. Electronic payments are essentially promises to transfer funds with a number used to identify the person writing the IOU. While we might want to imagine electronic activities conforming to the realities of our experiential world, they don't. This can have both costs and benefits.
Logs, Logs Everywhere - In Pursuit of Real Identity
Web marketers have long known that, while it's interesting to see what pages people visit, it can be even more interesting if you know where somebody came from and where they go after they visit your site. Is this someone interested in your product? Are they comparing your product to a competitive one? Have they been to your site multiple times? This is the type of data that can be extracted from a simple cross-site tracking cookie. Within that, typically, we try to weave together a tapestry of data points. Can we get the visitor to complete a form and give us some identity or contact info? Did they download files?
In their simplest form, these are elements that can be tracked from a basic web log on one site. Or, using something like Eloqua, Marketo or Pardot, they can be tracked across multiple sites and marketing deliverables. As marketers, we look for every bit of data that we can get, every touch point, in an effort to build an identity. We want to invest all of our selling resources into the process of converting that potential customer into revenue.
And yet, for all of our efforts, our tracking and our analysis, our best efforts are still just a sketch. Our simple tracking can easily be fooled by someone doing research at home, then going into the office or maybe using a different email address.
This problem of identity has always been an aspect of the web that's been both celebrated and loathed. While we're happy to be 'anonymous' when we're looking at things we might not want people to know about -- competitors' web sites, job listings, embarrassing medical conditions or even online porn -- anyone who has been in a chat room, a forum or the comments section of a blog knows the evil of anonymous trolls posting irrelevant or hateful comments. Real identity is often proposed as the solution for these issues, sort of a "you wouldn't post that if everyone knew who you were" approach.
But in that way, you can see where a government program that reaches across services and joins the various data streams is not particularly mind-blowing in terms of technical scope. For an organization like the NSA, it is not a stretch to sort through different emails and identify that, even though the email address for the crazy voice in alt.discussion.terrorist-bombing-plans isn't the same one as the guy who just ordered 10 pressure cookers on Amazon.com, and even though one uses Gmail and the other uses Yahoo, they both actually originate from the same IP address.
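As a rough illustration of how little machinery that kind of join needs, here is a minimal Python sketch. Everything in it is an assumption for the sake of the example: the record layout, the account names, and the addresses (taken from a range reserved for documentation) are all made up, and it does not describe how any real program works.

```python
from collections import defaultdict

# Hypothetical log records: (service, account, source IP).
# All values are invented; the IPs come from the reserved
# documentation range 203.0.113.0/24.
records = [
    ("gmail", "voice@example.com", "203.0.113.7"),
    ("yahoo", "shopper@example.org", "203.0.113.7"),
    ("yahoo", "someone.else@example.org", "203.0.113.42"),
]


def accounts_by_ip(records):
    """Group (service, account) pairs under the IP address they came from."""
    grouped = defaultdict(set)
    for service, account, ip in records:
        grouped[ip].add((service, account))
    return grouped


def shared_ips(records):
    """Keep only the IPs that were seen with more than one distinct account."""
    return {
        ip: accounts
        for ip, accounts in accounts_by_ip(records).items()
        if len(accounts) > 1
    }


linked = shared_ips(records)
for ip, accounts in linked.items():
    print(ip, sorted(accounts))
```

A shared address is still only a data point. A household, an office or a coffee shop can all sit behind a single IP, so a match like this suggests a link between two identities without proving one.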
Crafting Persona and The Importance of Story
If you were to look at your typical web site log, what you have is a series of events. Data points. But they are nothing without a story. Consider a typical goal path through your web site ending in someone filling out a registration form and downloading an electronic asset. If you have 100 people visiting the page with the registration form but only 50 downloading the file, you need to build a story that explains the two pools. For those that didn't download, maybe the form was too long. Maybe they just wanted to browse. Maybe they were competitors.
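To make the two pools concrete, here is a small Python sketch that splits a made-up event log into visitors who downloaded and visitors who didn't. The event names (`view_form`, `download`) and the visitor IDs are hypothetical; a real log would need its own parsing.

```python
def funnel(events):
    """Split visitors who saw the form into those who downloaded and those who didn't.

    `events` is an iterable of (visitor_id, event_name) pairs. The event
    names used here are illustrative assumptions, not a real log format.
    """
    viewed = {visitor for visitor, event in events if event == "view_form"}
    downloaded = {visitor for visitor, event in events if event == "download"}
    converted = viewed & downloaded   # saw the form and downloaded the file
    dropped = viewed - downloaded     # saw the form and left without the file
    rate = len(converted) / len(viewed) if viewed else 0.0
    return converted, dropped, rate


# Synthetic data matching the example: 100 form views, 50 downloads.
events = [(i, "view_form") for i in range(100)]
events += [(i, "download") for i in range(50)]

converted, dropped, rate = funnel(events)
print(len(converted), len(dropped), rate)
```

The numbers only tell you how big each pool is. Why 50 people left without the file is the part that still needs a story.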
While it may seem like an arbitrary process and difficult to imagine, we actually do this all the time in real life -- it's how we build an understanding of events. Think about when you're driving and you see another car use a turn signal. In simple terms, it's a directional indicator, a single data point that tells you that this car is planning to shift in that direction. But, in order to really understand what they intend to do, you need to put it into the context of a larger story.
- Do they intend to change lanes?
- Are they planning to exit the freeway?
- Are they making a turn?
- Did they forget their blinker, and are they just driving down the road with their turn signal on?
Understanding can be particularly challenging when you're looking at similar behaviors. Is this person weaving because they are drunk, dialing on their cell phone, or just being buffeted by crosswinds? In this context, one might be an ongoing threat, one a short term threat, and the other a broad-scale operational concern.
Building a story about online activity requires a much broader understanding of the landscape that the person is interacting in. Imagine the example of a single data point in your own system log, one where your system connected to an IP address in China. When your system connected to a server in China, did it go there because you loaded a web page with an ad network that pulled a file or a script from a server there? Did it connect there because your system has some advertising or tracking cookies on it from a previous visit? Did it go there because your system has malware running and it's compromised? Or did it just connect there because you're running Skype and there's a peer-to-peer link that connected there? Without having a broader tapestry of the transaction, this single data point is unintelligible.
Story and Data Correlation
In real-life, face-to-face interaction, understanding what's inside of someone's head can be difficult. It's potentially more problematic using electronic data. Even with a broad set of data points, algorithmically understanding intent and motivation often falls short. Consider Amazon. A visit to Amazon will get you follow-up emails with pricing deals on the things that you looked at. While this type of remarketing has higher clickthroughs than other programs, how often does it feel like you're being spammed? And, when those items that you searched for appear on your 'My Amazon' page, how often do they actually help you get to the thing that you were interested in? On the marketer's side of the equation, that value is greater than zero so it counts as a win, but on your side, the customer's side, it's far from a perfect match. If just having lots of personal tracking data were a slam dunk for understanding motivation, Facebook's advertising programs would be far more effective.
Ultimately, our story constructions are shaped by a variation of A/B testing and validation. First, there is the story, the hypothesis -- since this guy just activated his turn signal, I think he's going to exit the freeway. Next, we have the test -- does he get off at the exit? Once we've completed the test, we now have to evaluate the results and reinterpret our story.
Observations about data -- like the correlation between purchasing habits and pregnancy -- don't just bubble up from the data. They require a hypothesis and a framework for analysis. Consider this great article, "In Head-Hunting, Big Data May Not Be Such a Big Deal," in the New York Times interviewing Laszlo Bock, senior vice president of people operations at Google. In the interview, Bock talks about some of the practices that Google used during the hiring process, and how well they correlated with the actual job performance of the people hired. Essentially, he shoots down the value of famous Google practices like 'brain teasers' and asking all candidates for their GPA (I know what you're thinking, tell me something I didn't already know). Keep in mind, before they could evaluate these practices to see if they actually correlated with performance, somebody came up with the hypothesis that these things mattered. Google wanted to hire the smartest, best employees, so they defined a hypothetical profile of what those people should look like, then ran interview screening processes based on it. It's only after years of running this experiment that we see their hypothesis being shot down.
Secrets, Lies, and the World of Cyberspying
Arguably, the most sensitive point in this whole Snowden story surrounds the secret, classified nature of the programs. Admittedly, it's difficult to measure a secret program. On the one hand, you have this big reveal that the government monitors electronic communications and activities, the great NSA version of Eloqua. Yawn. On the other hand, you have the government claiming that it is a national secret and officials saying that they didn't monitor communications.
The government's interest in having access to this kind of data is not new. While it seems like a rather simplistic idea now, remember the Clipper chip? This was basically the government saying we can help American businesses with encryption, but we'll keep a key to the back door so that we can monitor the bad guys using it. After the Bush-era telecom monitoring stories, are you really surprised that the government has an ear on the Internet and is hoovering up your electronic communications?
At the same time, remember the environment that you live in. There are malware exploits out there in the wild that allow non-government entities to monitor your computer, log your keystrokes, even turn on the camera and microphone in your computer -- some criminal or 15-year-old pervert could be watching you through your laptop as you read your morning email. There are foreign governments that have exploited your electronic systems to gather intelligence on you, on your business, and on your technology. And the other day, as you drove home chatting with your significant other, you actually broadcast all of those secrets over the radio. Admittedly, it was a cellular radio designed with controls to make it more secure and more private, but did you really think that it was equivalent to the two of you speaking intimately in your bedroom?
This is the reality of the environment that we live in. The fundamental nature of these technologies means that electronic data is available and it can be monitored. It's just as in the real world, where you're unlikely to physically prevent an armed police officer from entering your house and searching your premises if he wants to force his way in -- saying "you can't come in" doesn't actually prevent a search. Instead, historically, we have opted to disincentivize forced entry by making any evidence collected without a warrant inadmissible in legal proceedings. With electronic data right now, we're essentially handing the review of this over to a secret process. Rather than simply handing over the decisions surrounding the implications of this to secret, hidden elements of the government, we need a more open discussion of the potential and the impact of these types of programs. We need to define a framework for what we establish as rights and protections in this modern data environment. Otherwise, what happens when the government starts sending you Target-like coupon books because you might be guilty of a thought-crime?
Again, the real problem here isn't the data, it's the application of the data and the potential for abuse. It's not the terrorist plots that you stop, it's what happens if you know what porn site Mitt Romney looked at. It's not just the government that you fear; instead, it's what happens if the non-profit that you work for finds out that you 'anonymously' publish a blog about your sex life. Or what will your neighbors think if you order birth control from an online pharmacy because your local pharmacist has moral reservations and doesn't fill those orders?
In the vast and expanding world of our digital breadcrumbs, we all have moments that we would rather not share with our friends, our colleagues, or our government. In the court that governs shame and embarrassment, there is no way to disincentivize moral outrage. There is no statute of limitations. Whether you're Paula Deen, Lance Armstrong, or (one of my favorite controversies) Sasha Grey, the perception of who you are and what you stand for lives in a sliding-scale world that changes over time. Sometimes that time period can be short, sometimes decades.
Our electronic data is a lot like our DNA. With the evolving understanding of DNA, we know more and more about a person from their DNA. We can understand their genealogy, we can diagnose what's wrong with them, and we can even make predictions about their future. This type of information is so powerful that, as a culture, we attempt to be very careful with the availability, distribution, and use of it. We fear -- and probably rightly so -- discrimination, exclusion, or a wide range of potential limitations to life, liberty and the pursuit of happiness. Because we can imagine the dark potential inherent in all of this, we are cautious. We need to approach our digital data in that same way.