Google sells my search terms? Quelle surprise!

2022-09-09

Mayank Sharma, writing for Lifewire, asked me whether I was surprised that Google sold on my search terms.

Reader, I was not at all surprised.

Mass Data Collection: A history of collation

The root of this issue lies in the business models of Google and Facebook.  Search data is a small part of the massive data empires that modern startups are encouraged to build.  The business model then requires them to sell that data on.

Users of these services search, but they also add data in the form of personal information and their contacts.  LinkedIn, Twitter, and Facebook encouraged users to find each other and connect their accounts.  More data gets collected and attached to identifiers like your Google Ad ID.  The entire ecosystem of applications and personal devices is designed to sell the potential of eyes on ads.

Private information is reduced to data points to sell.  This enables the ecosystem to continue with little thought to the damage that information leakage can cause to individuals.  This is the legacy of Web 2.0.

Shoshana Zuboff wrote in her book The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power that "in addition to keywords, each Google search query produces a wake of collateral data such as the number and pattern of search terms, how a query is phrased, spelling, punctuation, dwell times, click patterns, and location." [1]

As these systems grew they became more complex.

Fast forward to March 2022, when senior Facebook engineers were giving evidence in an ongoing lawsuit in the wake of Cambridge Analytica. [2]  The engineers were unable to articulate what data was stored in which of Facebook's systems.  Those systems were created to facilitate targeted advertising: to share more of that data and sell more impressions.

So this was not a surprise.  Data leakage isn't new.  The risk has been there from the beginning, ever since the companies started to collect and broker data.

In 2010 The Register broke the news that Google, Yahoo, and Bing were leaking lots of personal data, including search queries. [3]

In 2017 another ad tech company claimed that data leakage happens because of the way these systems were built.  "This system was not built for data protection. Instead, it was built to enable hundreds of businesses to trade personal data about the people visiting websites, to determine what ads to show them, and what advertisers should pay to show those ads." [4]

Should you be concerned? Can we actually do anything about it?

The more individual information they have on you, the more they can offer the potential of your eyeballs on a website you visited.  The utility of that data goes beyond selling to internet users, as we discovered with Cambridge Analytica and its relationship with Facebook.

If you've been to a website and not bought something, companies like Google offer "re-targeting ads" as a product, [5] so the website can keep trying to sell you its products.  Targeted advertising is stalking us through our browsing.

It's why enforcement of legislation like GDPR is vital.  Ads are now starting to appear on our desktops and in our browsers.  Our devices are hooked up to the Internet, feeding more data to these networks with little oversight.

Our online interactions are increasing.  We need to ask questions about how the data is being collected and processed.  What is that data? What entities and individuals is the data shared with?  Finally, when is it destroyed?

We are building our own virtual Panopticon.  Instead of being observed in a cell by a watcher who may or may not be there, to enforce good behaviour, we are observed through our data.

The Panoptic cell is our profile: all the various bits of metadata pooled together into shadow profiles while our data points get shared between different services and data brokers.  In effect, it is our digital twin.  The more data points are added, the more accurate targeted advertising gets.  We need to look beyond the nuisance of tracking ads and consider what these massive correlated datasets could be used for in the future.

The 2016 US election and the EU Referendum gave us a warning about how Cambridge Analytica and Palantir were used to profile and target voters.

So how can people limit the damage?  Is it pointless to try?  Much like any protective endeavour, it takes constant vigilance.

The online landscape is constantly evolving. The less information that an individual puts up on the services, the better.

Education is an important part of protecting your privacy.  One day to keep track of is Data Privacy Day on the 28th of January.  If you are thinking of using a service, search for information on it.  Take the time to look at those consent popups.  They are eye-opening.

There are various privacy extensions for Firefox.  Installing blockers like Privacy Possum and uBlock Origin is a good start.  Consider using the Tor Browser.

Consider giving up Google and Facebook services.  There are alternative email providers if you don't know how to host your own email.  There are also alternatives to Instagram, YouTube, and Twitter.  These alternative apps communicate with each other and are collectively called the Fediverse. [6]  Many accounts on the Fediverse are community and privacy-focused, so it's a good spot to find out more.

The EU is funding more privacy-focused open source projects as part of its Next Generation Internet initiative.  If you choose to use publicly funded open source software, you are choosing transparency in both funding and code.  EU funds like NGI ZERO [7] and its follow-up NGI Entrust [8] list all of their funded projects.

The EU has funded projects that are integrating with the Fediverse to provide alternatives to Google and Facebook.  There are VPN alternatives like WireGuard and several other privacy and transparency-focused projects.

[1] https://longreads.com/2019/09/05/how-google-discovered-the-value-of-surveillance/

[2] https://theintercept.com/2022/09/07/facebook-personal-data-no-accountability/

[3] https://www.theregister.com/2010/03/23/side_channel_attacks_web_apps/

[4] https://web.archive.org/web/20180331234847/https://pagefair.com/blog/2017/understanding-data-leakage/

[5] https://ads.google.com/intl/en_uk/home/resources/retargeting-ads/

[6] https://fediverse.observer/

[7] https://nlnet.nl/PET/

[8] https://nlnet.nl/assure/

https://web.archive.org/web/20230208035505/https://www.lifewire.com/your-favorite-websites-could-be-leaking-your-searches-to-the-highest-bidder-6561123