How Private Is It? Privacy Metrics and Preservation Techniques (1G1)

From IIW

Session Topic: How Private Is It? (1G1)

Convener: Dwight Irving

Notes-taker(s): Leon Thomas

They were trying to collect data, but people kept bailing out because they wanted to know, "What are you going to do with my data?" Even friends and family wouldn't trust them with things like phone records.

The decision was to let people control the data themselves, with hints on how they are doing. This led to the creation of privacy metrics.

Matrix: everything that could be known about a person runs across the top; people run down the left axis.

Within the "what could be known" there is info that is publicly known, and there is info that personal contacts would/could know. They created a linear algebraic equation for what can be known. This was really difficult.

Problem: an SSN and a dog's name are not equally weighted; this is where "feelings" come into play.
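A minimal sketch of the matrix idea with unequal weights, assuming a simple 0/1 "knows" matrix. The attribute names, the weights, the people, and the function `weighted_exposure` are all hypothetical; the session did not give the actual equation.

```python
# Columns: things that could be known. Rows: people who might know them.
ATTRIBUTES = ["ssn", "address", "dog_name"]

# Hypothetical sensitivity weights: an SSN counts far more than a dog's name.
WEIGHTS = {"ssn": 10.0, "address": 1.0, "dog_name": 0.1}

# 1 means the person in that row knows the attribute, 0 means they do not.
KNOWS = {
    "stranger": [0, 0, 0],
    "neighbor": [0, 1, 1],
    "employer": [1, 1, 0],
}

def weighted_exposure(row):
    """Sum the weights of every attribute this person knows."""
    return sum(WEIGHTS[a] * k for a, k in zip(ATTRIBUTES, row))

exposure = {person: weighted_exposure(row) for person, row in KNOWS.items()}
print(exposure)
```

Each row is a weighted sum, so a person who knows only low-weight facts contributes little even if they know many of them.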

Something useful came out of social networking: the concept of "weighted networks", where your social graph can be summed to indicate trust/privacy.

Your real privacy (RP) equals Identity + (Persona * Identity), where Persona is what people know about you: RP = I + (Pp * I) = I(1 + Pp)
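The two forms of the expression are algebraically the same, which a small sketch can confirm. The function name `real_privacy_term` and the sample values are hypothetical; the notes do not say what units Identity and Persona use or how the result is scaled to a score.

```python
def real_privacy_term(identity, persona):
    """Evaluate I + (Pp * I), which factors to I * (1 + Pp)."""
    return identity + persona * identity

# Sample inputs chosen only for illustration.
i, pp = 2.0, 0.5

# The expanded and factored forms agree.
assert real_privacy_term(i, pp) == i * (1 + pp)
print(real_privacy_term(i, pp))  # 3.0
```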

Once you start connecting things up, there is a "star". The "one goat theorem": if there is a certainty, you know what it is. Once someone knows the most important thing about you, the power law says that things can't get much worse.

Identity doesn't follow a power law.

  • A) Address - factor of one
  • B) Where I start my daily runs - same as address

Name is not terribly identifying.

How does this help the user understand how "private" they are? The RP (real privacy) is calculated similarly to a credit score (0-1000 scale).

840 is a good score for the general population

One of the first kinds of records they got was phone records:

  • Number
  • Time
  • Rate
  • Plan
  • Incoming/Outgoing
  • Calling City
  • Persona risk - who is contacting you/who can contact
  • Can extrapolate extra data from there

If you give out all of this, your real privacy goes down from 840 to 420.

If you don't give the last 4 digits of the phone number, your score goes up from 420 to 600.

If they are hidden in a group of 10,000 (the number of possible last-4-digit combinations), people feel better about their privacy.
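The group of 10,000 matches what you get by withholding 4 decimal digits. A minimal sketch, assuming a 10-digit phone number and treating every digit combination as possible (which overcounts real numbering plans); `group_size` is a hypothetical helper name.

```python
def group_size(digits_withheld):
    """Count the phone numbers that match once some digits are withheld."""
    return 10 ** digits_withheld

# Withholding the last 4 digits leaves 10,000 candidates.
print(group_size(4))

# Sharing only a 3-digit area code withholds the other 7 digits.
print(group_size(7))
```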

Giving out only the area code boosts the score just a bit, from 600 to 650.

If you also take out calling city, rate, and plan, the score goes from 650 to 800.
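The scores quoted in this example can be laid out as a sequence to see how much each step changes the result. This sketch only tabulates the figures reported in the session; it does not compute them. The labels are paraphrases, the last step is read as leaving out calling city, rate, and plan, and `score_changes` is a hypothetical helper.

```python
# Scores as reported in the session, in the order they were described.
REPORTED = [
    ("baseline (general population)", 840),
    ("all phone-record fields shared", 420),
    ("last 4 digits withheld", 600),
    ("area code only", 650),
    ("calling city, rate, plan also withheld", 800),
]

def score_changes(steps):
    """Return (label, score, change from the previous step) for each step."""
    out = []
    prev = None
    for label, score in steps:
        out.append((label, score, 0 if prev is None else score - prev))
        prev = score
    return out

for label, score, change in score_changes(REPORTED):
    print(f"{label:40} {score:4d} {change:+5d}")
```

Laid out this way, withholding the last 4 digits (+180) and dropping calling city, rate, and plan (+150) each recover far more than going from the masked number to area code only (+50).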

A user may have a shared secret, but it cannot be factored into the calculation for a specific person.

Reducing entropy:

Reduce resolution. For a phone number, you delete the last 4 digits so that the recipient does not have them.

Figure out context. Heartbeat: your heartbeat when running is one thing, but your heartbeat while in a hospital gives far greater insight.

Add noise. How can users show data about themselves? What does someone need to find out about you, and are you providing so much that they can figure out things you didn't authorize? "Monkey wrenching" means inserting disinformation about a user; can this be done for a user's benefit?
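One way to picture adding noise is to mix decoy entries into a list of called numbers so that a reader cannot tell which calls were real. This is a sketch under assumptions, not the method described in the session: `add_decoys`, the candidate pool, and the fictional 555 numbers are all hypothetical.

```python
import random

def add_decoys(calls, candidates, n, seed=None):
    """Return the real calls plus n decoy numbers, shuffled together."""
    rng = random.Random(seed)
    real = set(calls)
    # Only numbers that are not real calls can serve as decoys.
    pool = [c for c in candidates if c not in real]
    decoys = rng.sample(pool, n)  # raises ValueError if the pool is too small
    mixed = list(calls) + decoys
    rng.shuffle(mixed)
    return mixed

real_calls = ["415-555-0101", "415-555-0102"]
decoy_pool = ["415-555-0150", "415-555-0151", "415-555-0152", "415-555-0101"]

noisy = add_decoys(real_calls, decoy_pool, n=2, seed=1)
print(noisy)
```

The real calls are still present, so whoever holds the original list can separate them out again, while a recipient sees four equally plausible numbers.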

Phone number example: phone number minus the last 4 digits = Score 1; area code only = Score 2.
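The two resolution levels in the phone number example can be sketched as a masking function. This assumes a 10-digit number written like "415-555-0123"; the function name `reduce_resolution`, the level names, and the "X" placeholder are hypothetical choices for illustration.

```python
def reduce_resolution(phone, level):
    """Mask digits of a 10-digit phone number.

    level "last4": hide the last 4 digits
    level "area":  keep only the 3-digit area code
    """
    digits = [c for c in phone if c.isdigit()]
    if len(digits) != 10:
        raise ValueError("expected a 10-digit phone number")
    keep = {"last4": 6, "area": 3}[level]
    masked = "".join(digits[:keep] + ["X"] * (10 - keep))
    return f"{masked[:3]}-{masked[3:6]}-{masked[6:]}"

print(reduce_resolution("415-555-0123", "last4"))  # 415-555-XXXX
print(reduce_resolution("415-555-0123", "area"))   # 415-XXX-XXXX
```

Each level would then be scored separately, as in the Score 1 and Score 2 split above.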

End user will get a cut of the monies generated through the sale of their data. Potentially a user could enter their data, then as your data is sold you could:

  • 1) get a check each month
  • 2) have the money donated to charity of your choice
  • 3) have your phone bill (for example) be reduced as a result

They are trying to use games that provide rewards for participation to get users involved.

DataBank could be inserted instead of an ad revenue model where users could be compensated for a consumer's behavior on your site.

They try to strip out identity and use persona instead; you can, however, still use this info to identify who a user is.