Anonymized data is a myth, it seems. Esp location data.
Sorry, your data can still be identified even if it’s anonymized
Urban planners and researchers at MIT found that it’s shockingly easy to “reidentify” the anonymous data that people generate all day, every day in cities.
Thanks to the near-complete saturation of the city with sensors and smartphones, we humans are now walking, talking data factories. Passing through a subway turnstile, sending a text, even just carrying a phone in your pocket: we generate location-tagged data on an hourly basis. All that data can be a boon for urban planners and designers who want to understand cities–and, of course, for tech companies and advertisers who want to understand the people in them. Questions about data privacy are frequently met with a chorus of, It’s anonymized! Any identifying features are scrubbed from the data!
The reality, a group of MIT scientists and urban planners show in a new study, is that it’s fairly simple to figure out who is who anyway. In other words, anonymized data can be deanonymized pretty quickly when you’re working with multiple datasets within a city.
Carlo Ratti, the MIT Senseable City Lab founder who co-authored the study in IEEE Transactions on Big Data, says in a news release that the research process made them feel “a bit like ‘white hat’ or ‘ethical’ hackers.” First, they combined two anonymized datasets of people in Singapore, one of mobile phone logs and the other of transit trips, each containing “location stamps” detailing just the time and place of each data point. Then they used an algorithm to match users whose data overlapped closely between each set–in other words, they had phone logs and transit logs with similar time and location stamps–and tracked how closely those stamps matched up over time, eliminating false positives as they went. In the end, it took a week to match up 17% of the users and 11 weeks to get to a 95% rate of accuracy. (With the added GPS data from smartphones, it took less than a week to hit that number.)
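For anyone curious what that kind of matching might look like, here is a rough sketch. It is not the study's actual algorithm, which the article doesn't spell out. The function names, the bin sizes (`cell`, `window`) and the `min_hits` threshold are all made up for illustration. The idea is just to round each stamp into a coarse space-time bin and pair up users who keep landing in the same bins.

```python
from collections import Counter, defaultdict


def bin_stamp(lat, lon, t, cell=0.01, window=3600):
    """Round a (lat, lon, unix-time) stamp to a coarse space-time bin.

    cell and window are illustrative values, not taken from the study.
    """
    return (round(lat / cell), round(lon / cell), int(t // window))


def match_users(phone_logs, transit_logs, min_hits=3):
    """Pair each phone user with the transit user sharing the most bins.

    Both inputs are lists of (user_id, lat, lon, unix_time) tuples.
    """
    # Index transit users by the bins they appear in.
    bins = defaultdict(set)
    for uid, lat, lon, t in transit_logs:
        bins[bin_stamp(lat, lon, t)].add(uid)

    # Count distinct shared bins per (phone user, transit user) pair.
    seen = set()
    hits = defaultdict(Counter)
    for uid, lat, lon, t in phone_logs:
        b = bin_stamp(lat, lon, t)
        if (uid, b) in seen:
            continue
        seen.add((uid, b))
        for other in bins.get(b, ()):
            hits[uid][other] += 1

    matches = {}
    for uid, counts in hits.items():
        ranked = counts.most_common(2)
        best, n = ranked[0]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        # Keep only clear winners to cut down on false positives.
        if n >= min_hits and n > runner_up:
            matches[uid] = best
    return matches


# Tiny made-up example: p1 and t1 keep showing up in the same place and hour.
phone_logs = [
    ("p1", 1.30, 103.80, 100),
    ("p1", 1.31, 103.81, 4000),
    ("p1", 1.32, 103.82, 7300),
    ("p2", 1.40, 103.90, 100),
]
transit_logs = [
    ("t1", 1.30, 103.80, 200),
    ("t1", 1.31, 103.81, 4100),
    ("t1", 1.32, 103.82, 7400),
    ("t2", 1.30, 103.80, 300),
]
result = match_users(phone_logs, transit_logs)
```

The more weeks of stamps you feed in, the more bins a true pair shares and the fewer coincidental ties survive, which fits the article's point that accuracy climbs over time.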
The rest of the article is here
https://www.fastcompany.com/90278465/sorry-your-data-can-still-be-identified-even-its-anonymized
/giphy panopticon
/image “ministry of love”
/image “ministry of truth”
WTF is “Big Data”? Is that like Big Oil or Big Pharma? I hate those guys.
@therealjrn
Yeah like that. Except they own everything more.
@therealjrn Nah. “Big Data” is not a lobbying group. It’s just a generic term for really, really big datasets.
In the real world, it’s a lot easier to generate way more data than you can really process and do something useful with. It’s a big area of research to more effectively search and process extremely large data sets.
An example of “Big Data” would be large-scale astronomical observations.
A more controversial/popular/fear-stroking/troubling/lucrative example is data generated by humans as they go about their lives.
@Limewater @therealjrn
@therealjrn knows all that.
He was jes’ funnin’ with us.
@f00l @therealjrn Yeah, I should have scrolled slightly further down.
@f00l @Limewater No problem sir, we now have your response on file.
@f00l @Limewater @therealjrn If you see something, say something!!!
/giphy all seeing eye
@f00l @Limewater @reg036 @therealjrn
/giphy under his eye
Thinking about this a little more…my grubby little handprints are all over the internet after 3, or is it 4? decades of traveling around. I’ve been a fan of forums, etc. since the old IRC days. Actually, I still am on IRC, albeit through web interfaces nowadays…
I’ve frequently used the same “screen name” in different places as well. For example, this one @therealjrn has been in use since I joined Woot years ago. Now it is linked with them, Mediocre Corp and Big AMZN. I buy all sorts of shit so if “they” want to profile me, fine. It’s way, way too late for me to close the barn door now.
My name is publicly recorded at the courthouse going back probably 40 years. If you want to get really freaky about it, my birth records are probably out there too.
Nowadays, I just concern myself with my financial data and keeping my credit reports locked like they have been for 10 years now.
I’m easy to find, I’ve been in the same town, (REDACTED) for years and years and years. Come see me some time, we’ll go get a cuppa.
@therealjrn
I know approx where you are, in that inferior state you reside within.
I can always drive there, and then put a wicked curse on the plumbing of the residential units you supervise.
Then I’ll just “follow the water”.
@therealjrn
“Follow the money”
@f00l Uhg. You sure you didn’t start early? I’ve got a sewer line they’re fixing…well…not today of course, ground is too wet. They managed to dig a trench way, way down, cutting through a main sidewalk to get the line replaced. Then the rains came yesterday so it looks like the Western Front out there. Hopefully they’ll be back Monday to pour the sidewalk.
It’s covered under a line replacement warranty, so at least it isn’t anything out of pocket.
@therealjrn
Hmmm. Was it my “gaslight them” curse? Or not?
/giphy gaslighted
Ok
I guess there will be devices made that intentionally provide fake data to surveillance telemetry sources. We already have some today - VPNs, fake GPS phone apps, fake caller ID apps, proxies, etc. Spew enough fake data, you can’t tell the real data from the fake. That’s obfuscation rather than encryption proper, and it’s why it matters.
The problem is that there are always people who do these things to commit crimes or evade getting caught, and thus taint all who do so with legitimate privacy interests.
We want privacy in our homes with voice-activated devices, right up until we say “Alexa”, “Siri”, or “Hey, Google”. So how do these devices know that you’ve just said that unless they listen all the time? It’s not possible to have it both ways.
It’s not an easy problem to solve.
@mike808 It’s a very (technically) solvable problem. Companies may prefer to pretend like it is an unsolvable problem, because strongarming you is the point of the exercise.
Phones, for example, have little chips that only listen to recognize a keyword/phrase, before waking up the rest of the listening apparatus, to conserve battery life. You’d know if your phone was always listening, because your battery would be dead.
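A toy sketch of that gating idea, to make it concrete. This is not how any particular phone or speaker is implemented. The class name, the word tokens standing in for audio frames, and the buffer size are all invented for illustration. The point is only that the cheap always-on part keeps a tiny rolling buffer and throws it away, and nothing gets forwarded until the wake word shows up.

```python
from collections import deque


class WakeWordGate:
    """Hypothetical model of a low-power wake-word detector."""

    def __init__(self, wake_word="alexa", buffer_size=4):
        self.wake_word = wake_word
        # Short rolling buffer: old tokens fall off the end automatically.
        self.buffer = deque(maxlen=buffer_size)
        self.awake = False
        self.forwarded = []

    def hear(self, token):
        """Process one token (a stand-in for one audio frame)."""
        if self.awake:
            # Only after the wake word does anything get passed along.
            self.forwarded.append(token)
            return
        self.buffer.append(token)
        if token.lower() == self.wake_word:
            self.awake = True
            self.buffer.clear()  # the pre-wake audio is discarded

    def end_request(self):
        """Go back to sleep and return what was forwarded."""
        self.awake = False
        sent, self.forwarded = self.forwarded, []
        return sent


gate = WakeWordGate()
for word in "my bank pin is secret alexa play jazz".split():
    gate.hear(word)
sent = gate.end_request()
```

In this model only `play` and `jazz` ever leave the gate. Whether a real device behaves this way is exactly the part you have to take on trust.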
Apple markets how they spend effort and money trying to know less about you while delivering services based on your information.
This whole surveillance problem (I think it’d be useful to distinguish that from “privacy”) is a political one.
A note on that fake data thing: that’s how I recently got locked out of a Google account in spite of having the correct username and password – not keeping all the tracking / surveillance stuff around. Very suspicious. Bad sheep.
At one point I had to answer like 12 Google recaptchas in a row to get into a web site – I think it was … Paypal? that had recaptcha embedded in their phone app. Google doesn’t belong in my financial transactions. Anyways, they’ll punish you for not playing along. It is a political problem, with political solutions.
@InnocuousFarmer Somewhat. There’s also the erosion of 4th Amendment rights in that the government believes that just because data evidence of a crime might exist, it therefore does, and furthermore, the government has an unlimited right to access that data, irrespective of your 4th and 5th Amendment rights, which, btw, SCOTUS has settled law that clearly applies constitutional rights to all people, regardless of national origin / citizenship status. The supremacy of Constitutional law is what is at stake here.
You have every right to protect your “papers”, but no such rights to act in protecting anyone else’s. Which is why every attempt is made to make allegedly “your” information not yours. The fascist dystopian police state is nigh.
2 out of every 3 American Citizens live in a constitution-free zone.
First they came for my 4th Amendment.
Then they came for my 5th.
Then they came for my 2nd.
And then my 1st.
And when they repealed them all, there was no one left to save anyone.
This is fucking hilarious. I was watching a video at https://www.cnet.com/how-to/tips-for-your-new-google-home-speaker/ to learn about my new mini and the man was saying commands.
My little mini behind me started talking and doing what he was asking.
So, yup, it is always listening.
@therealjrn
/wootstalker https://shirt.woot.com/offers/were-listening
We’re Listening
Price: $19.00
Condition: Probably New
/wootstalker https://shirt.woot.com/offers/always-awake
Always Awake
Price: $19.00
Condition: Probably New
@therealjrn
I’m shocked. Shocked, I tell ya!
Who the hell leaves their GPS on when they aren’t actively using Google maps?!?
@unksol If you have a device like a Tile, your location service has to be on at all times and the Tile service actually uses everyone’s data to find your device. There are many reasons your location data would need to be on all the time. I don’t sell drugs so I’m not really worried about it.
@unksol
Even with loc data off, it seems that so-called “anonymized data” can be attached to individuals.
@f00l @unksol @Fuzzalini What about that Pokemon? heh.
@Fuzzalini @therealjrn @unksol
Pokemon is just too difficult. Fear I could never master it.
@f00l Funny you say that. I used to play the game Ingress for a long time, it preceded Pokemon Go. But I couldn’t master catching the Pokemon. It pissed me off so much I uninstalled it within hours.
@Fuzzalini
My remark was a bit of a joke. I almost never play mobile or computer games … Or only the solo quickie ones.
I presume if I were to play I would be terrible.
@f00l @Fuzzalini Ingress Prime is the new game from Niantic (the folks behind Pokemon Go).
https://www.ingress.com/game/
@f00l @Fuzzalini @mike808
I believe this predates Pokemon by several years… Maybe just 1 or 2…
@f00l @jst1ofknd @mike808 Yes, I was playing Ingress 3 years ago, so it’s not new and it predates Pokemon by at least two years.
@f00l @jst1ofknd @mike808 I really enjoyed it. But they made it too hard to level up to the highest levels, so I bailed out after playing for about a year. I lost weight and got walking every day and would probably still be playing if I didn’t have to play all the time to level up.
@f00l @Fuzzalini @jst1ofknd
They’ve revamped it. That’s why it’s called Ingress Prime. Yes, the OG Ingress predates the Pokemon deal.
/wootstalker https://www.woot.com/offers/echo-plus-1st-gen-with-built-in-hub
Echo Plus (1st Gen) with built-in Hub
Price: $79.99
Condition: Refurbished/Not New
@therealjrn
Oh gosh. I made an obvious moral error in getting rid of my Alexa devices.
I want Amazon to monitor all my speech.
I need Amazon to monitor all my speech.
Help me, Woot! After I finish creating the extra accounts, I’ll order 999 of these.
I just want to be near Alexa all the time.
@f00l Excellent. It sounds like your re-education is progressing nicely.
Another big problem is big medical data. I was part of a panel on that several years ago. A reporter was on that panel and he managed to identify people using just the info in one big data set and easily accessible other information. This means all insurance companies of any flavor, medicare, auto insurance, life insurance, disability insurance, nursing home insurance, etc. will be able to see your medical history and use that in underwriting if they so desire (right now there are only a few questions they ask that could cause you to “fail” - I can just see auto insurance use data mining to link health conditions and accident rates, already our credit scores are used to set rates except in the few states that forbid this).
This matters for numerous reasons, not to mention the pre-existing condition issue. While you are still, at least temporarily, protected while it is under appeal (could somebody please remove TX from the USA for what that judge just ruled that just torpedoed this while attempting to kill ACA/Obamacare so the rest of us aren’t affected), that is only for health insurance coverage and removal of the lifetime maximum for coverage.
The other unaddressed big problem is what is NOT protected by ACA and generally requires passing underwriting: long term and short term disability insurance (sometimes even when offered through an employer, especially long term), nursing home insurance, life insurance (except if the employer offers, say 1.5 times your income but then you may not be able to get more than that)…
AND… because medical underwriting is allowed connected with medicare this will affect you at 65 unless you have an alternative to medicare. You have 3 months before and 3 months after you turn 65 to choose a medigap policy and not be subjected to underwriting - do not screw this up and miss the deadline (by the way COBRA, by law, ends when you turn 65 even if you have months and months left of it). Miss that deadline, or miss payments long enough to get your policy canceled, and then you are subjected to medical underwriting and may have to pay higher rates and can be denied coverage.
In all states but three you are then married to the medigap plan you signed up for and the company that you are using. You want to change plans or companies? Well good luck with that (unless you live in one of a couple of states). Medical underwriting may mean you can’t or if you can then it may mean paying way more than you would have.
Being able to identify people in Big Data medical databases may mean that all sorts of things that currently don’t use medical information, or only use it in a very limited way, will have a way to access our entire medical history and then use big data fishing expeditions to find correlations between our medical information and whatever it is they are interested in. Marketing companies would have a field day. You have asthma? Well then let’s spam you in a targeted way. You have migraines? Let’s not hire you because you are likely to miss more work. You have angina? Let’s raise your auto rates because you might be distracted due to pain and be fumbling for a nitro pill while driving, and thus be more likely to be in an accident.
I am already screwed (3 cancers, one with no cure, and a couple of other things, including one that is likely genetic) so I decided to sign up for that million people research database (and I found out that my medical history is already part of a huge, multi-center cancer database because of where I get treated - so read what you sign). I am the one in the family getting tested for genetic stuff instead of my siblings because I am already screwed and they aren’t yet. If I turn up positive for anything then they have a hard decision to make. Do they then get tested and find out, for their own peace of mind and so they can potentially do things to decrease the risk (health insurance may need the risk to be known before it allows more frequent testing than is normally recommended)? And then potentially be screwed in the same way I am (I already fail medical underwriting so I can’t get anything that requires it)? Or buy the insurance first, wait a year, then get tested and then hope they never miss a payment? Or never get tested and cross their fingers and hope for the best?
If your medical system uses any of the major computer programs out there for medical records, etc. a number of medical centers can already see your information across any system that uses that. For example, I know from first hand experience, that 2 university medical centers in two different states that I have used plus MD Anderson Cancer Center use Epic EMR (one of the leaders in this market) and so anyone looking now has more data about me in one place. For researchers to use it I don’t even have to give permission if they say they will make it anonymous. They just have to get the OK from the Human Subjects Board. BUT since more and more journals require that the data set they used in their study (e.g. their anonymous data, not where they drew it from - oh wait this is what the reporter used to identify patients by name, including one woman on that panel I was on) be publicly available, it will become easier and easier for companies to link that medical information to what they already know about us (such as marketing companies, data aggregation companies…).
Been tested by 23andme? Guess what - the real purpose of their business is to build a genetic database to then rent out. They have managed to figure out how to get us to pay them to be in their database and pay them for the lab work needed to get us in there. We are not the customer, we are the raw materials.
In my opinion the cat is already out of the bag with big data and our laws aren’t even close to dealing with it.
@Kidsandliz
Yes.
Even if privacy laws (and laws governing which existing data can be used in what ways, or what data can be accessed by whom) get passed, how does one know whether a company or organization is obeying that law or not?
Companies can/do lie or blow smoke about this stuff.
Lawyers, legalese, obscuration of data used, having only small units within an organization knowing that this data or that data gets used - only when data has obviously and provably been used, or when a whistleblower brings proof of data access, would anyone know.
As our shopping experience and credit experiences are now customized by what they know, organizations (and prob data-savvy individual operators) will be - or can now - customize their uses of data to a degree that makes the data access and use unprovable within an individual experience.
And in the case of data that shouldn’t be out there in the commercial or organizational sphere, or in the wild, anyone know how to “get that fixed”?
: (
In short, we’re fucked.