Sunday, July 6, 2014

You the Outlier - Why Privacy/Anonymity is Important in a Big-Data World

In my previous piece, I argued that privacy was dead and that multi-persona anonymity needs to take its place.  That argument rests on a critical premise, though: that we need privacy (or anonymity) in the first place.  I hear many poor arguments in support of privacy.  Let's look at those first and then consider a better reason.

Being Held Accountable for Your Actions
Let's address the poor reasons first.  The obvious argument against privacy is, "Why do you need privacy if you have nothing to hide?"  There are multiple lukewarm responses:
  1. "BECAUSE" - The concept that it is something you should 'just have'.
  2. What if the acceptability of my actions changes with the progression of time or 'those in charge' think my actions are a problem when I do not?
  3. No one is perfect.  Should that be held against us?  In perpetuity?
  4. What about the insurance company that will raise our rates when it finds out what we've done?
These are all weak arguments against a lack of privacy, for one reason: they all assume that people shouldn't be held accountable for their actions.  While I think forgiveness is at the foundation of humanity, I don't think escaping accountability for one's actions can be held up as the reason we need privacy.

Being Held Accountable for Others' Actions
In a big-data world, we are not necessarily judged by our actions, but by the profiles we match.  This is nothing new.  But where in the past an employer might have required employees to sign a letter allowing it to inspect their driving records, and then fired anyone who received a ticket or a DUI, massive amounts of data now let this be taken to an unprecedented level.

Instead of inspecting a driving record, an employer may install monitoring devices in employees' personal vehicles.  The monitor has a database of speed limits.  If you drive more than 5 miles per hour over the limit, you receive a warning to slow down.  If you don't slow down within 6 seconds, the violation is reported, which could lead to your firing.
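For readers who want the monitoring rule spelled out, here is a minimal Python sketch of the logic as described above. It assumes the device samples your speed at regular intervals and already knows the posted limit for each sample; the `Sample` type and `monitor` function are hypothetical illustrations, not any vendor's actual firmware.

```python
from dataclasses import dataclass

TOLERANCE_MPH = 5   # from the post: more than 5 mph over the limit triggers a warning
GRACE_SECONDS = 6   # from the post: 6 seconds to slow down before a report


@dataclass
class Sample:
    t: float      # seconds since the trip started (hypothetical sampling scheme)
    speed: float  # measured speed, mph
    limit: float  # posted limit looked up in the device's database, mph


def monitor(samples):
    """Return (warning_times, report_times) for a trip.

    Hypothetical interpretation: a warning fires when you first exceed
    limit + 5 mph; if you are still over once 6 seconds have elapsed since
    that warning, one report is filed. Dropping back under resets the clock.
    """
    warnings, reports = [], []
    warned_at = None
    reported = False
    for s in samples:
        over = s.speed > s.limit + TOLERANCE_MPH
        if not over:
            # Back within tolerance: forget the pending warning.
            warned_at = None
            reported = False
            continue
        if warned_at is None:
            warned_at = s.t
            warnings.append(s.t)
        elif not reported and s.t - warned_at >= GRACE_SECONDS:
            reports.append(s.t)
            reported = True
    return warnings, reports
```

Note how binary this is: 61 mph in a 55 zone draws a warning, while 60 mph draws nothing, no matter the road or the traffic around you.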

The first case is a crude model with very bold, red lines not to cross.  The second is a much more subtle model, with ambiguous grey lines, fed with every speed you have ever driven.  It says that those who regularly drive more than 5 miles per hour over the speed limit are a liability.  However, where did that model come from?  How was it validated?  Was it validated at all?

The reason privacy (anonymity) is important is that every model has a large number of outliers, and there is a good chance you are that outlier in some model.

In a big-data world, we are judged against models.  "If a person exhibits A, B, and C, then they must be D."  Being D may mean being unemployable.  It may mean being paid less or paying more.  It may mean being excluded, untrusted, or any number of other things.  However, every such model will have outliers.  No one cares about them because, by definition, they are not the norm.  Yet everyone is probably an outlier in some model.  And being judged by a model for which you are an outlier is, inherently, being held accountable for others' actions.
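To make "If a person exhibits A, B, and C, then they must be D" concrete, here is a toy Python sketch. The rule, the `Person` records, and the `actually_d` field are all hypothetical: a real scoring system never sees the ground truth, which is exactly why the people it misjudges have no way to show they are outliers.

```python
from typing import Callable, Iterable, List, NamedTuple


class Person(NamedTuple):
    name: str
    traits: frozenset  # observed attributes, e.g. {"A", "B", "C"}
    actually_d: bool   # hypothetical ground truth the model never sees


def rule_model(traits: frozenset) -> bool:
    """Hypothetical rule: anyone showing A, B and C is labeled D."""
    return {"A", "B", "C"} <= traits  # subset test: all three traits present


def misjudged(people: Iterable[Person],
              model: Callable[[frozenset], bool]) -> List[str]:
    """Names of people the model labels D who are not actually D (the outliers)."""
    return [p.name for p in people if model(p.traits) and not p.actually_d]


# Made-up records for illustration only.
people = [
    Person("p1", frozenset({"A", "B", "C"}), True),
    Person("p2", frozenset({"A", "B", "C", "X"}), False),  # fits the profile, isn't D
    Person("p3", frozenset({"A", "B"}), False),
]
print(misjudged(people, rule_model))
```

Here `p2` matches the profile and is treated as D, despite never doing what D implies: judged by what others with the same traits did.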

In this case, you have done nothing wrong.  You will not do what the model accuses you of doing.  But you fit a model that you will not get to challenge, and that may never have been critically assessed in the first place.

This critique isn't meant to detract from the usefulness of models.  Models can co-exist with privacy and anonymity.  Models trained on real data still offer significant value in many areas including trends and decision analysis.

But we want to make sure models don't become the precogs in Minority Report.  Otherwise, the movie Gattaca could easily become our future.  Privacy, then, is not about escaping accountability for the things you did.  It's about not being held accountable for the things you didn't do.
