Google’s problematic Right to be forgotten algorithm
As I mentioned in my last blog post, in 17 years of working in SEO I can remember two severe, reproducible errors by Google. Now I have found a third one. At least it is an error from my point of view; Google may consider it “by design”.
There has been a lot of discussion about the ruling of the European Union’s highest court concerning Google and the right to be forgotten. The ruling has raised more questions than it has answered. Since June 26, 2014, Google has been removing results from its search results. A short note at the bottom of the results page explains that, due to European data protection regulations, some results “may have been removed”. The way all of this is being implemented is already under discussion as well.
Google has stated that it has already received several tens of thousands of requests. So when you see this notice at the bottom of the results for a name search, it suggests that something happened in this person’s past that they want forgotten.
The notice actually appears for the following people (click a link to see the full search results page):
And many, many more. The screenshots are in German because the effect cannot be reproduced in the same way in English; the results seem to differ by language. It is also possible that the differences exist because this new feature is still rolling out to all Google data centers.
Don’t know these people? Neither do I. However, they are all members of the European Parliament.
Going by the notice alone, about 20% of all members want to be forgotten.
UK names are more forgotten than German names
Strangely, the problem cannot be reproduced in the same way with the German names of European Parliament members. Google’s algorithm seems to favor English names.
Clearly something is wrong with the algorithm. The following names also show the notice at the bottom:
Fred Parker, Hans Parker, Geraldine Parker, Abbey Parker, Alicia Parker, Johannes Parker, Sven Parker, Timo Parker
Matt Parker, James Parker, Thomas Parker, Francis Parker, Mary Parker
It looks as if the notice appears whenever the first name and the last name each occur in a name on the list of forgotten people. The question remains whether any result is actually removed.
Google only states that results “may have been removed”, which leaves it unclear whether anything was removed at all. And why does it make a difference whether I search in English or German?
Matt Cutts, head of Google’s webspam team, told me: “Note that we show the notice for queries that look like names in Europe.” This sounds as if the notice appears at the bottom of the list whenever a name is similar to one for which a removal request was made. Still, it looks strange.
Let me ask you a question: what would you say if this notice appeared when you searched for your own name? And what if you had a rare, unique name, so that most of the results were about you? Everybody who searched for your name would see the notice too. This is exactly what has happened to Geraldine DeRuiter.
Geraldine DeRuiter is the wife of Rand Fishkin, a well-known figure in the SEO industry and the founder of Moz.com. He was surprised by this and told me that his wife had not submitted any removal request.
So I tried other names starting with “Geraldine”: sometimes the removal notice appears, sometimes not. It still looks as though “Geraldine” is the first name of someone who submitted a request, and combining it with the last name of another requester makes the notice appear. But not always, and not in every language.
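To make this guess concrete, here is a toy model of it in Python. It only encodes the hypothesis described above; Google’s actual algorithm is unknown, and the requester list and all function names are hypothetical. The rule flags a query when its first name matches the first name of any requester and its last name matches the last name of any (possibly different) requester.

```python
"""Toy model of the name-matching hypothesis (NOT Google's real algorithm).

Hypothesis: the notice appears when a query's first name matches the first
name of some requester and its last name matches the last name of some,
possibly different, requester. All names here are hypothetical.
"""


def split_name(full_name: str) -> tuple[str, str]:
    """Split 'First Last' into (first, last), lower-cased."""
    parts = full_name.lower().split()
    return parts[0], parts[-1]


def build_index(requesters: list[str]) -> tuple[set[str], set[str]]:
    """Collect the first names and the last names of all requesters."""
    firsts, lasts = set(), set()
    for name in requesters:
        first, last = split_name(name)
        firsts.add(first)
        lasts.add(last)
    return firsts, lasts


def shows_notice(query: str, firsts: set[str], lasts: set[str]) -> bool:
    """True if both name parts occur somewhere in the requester list."""
    first, last = split_name(query)
    return first in firsts and last in lasts


if __name__ == "__main__":
    # Hypothetical requesters, for illustration only.
    requesters = ["Geraldine Smith", "John Parker", "Anna Meyer"]
    firsts, lasts = build_index(requesters)
    for query in ["Geraldine Parker", "John Meyer", "Geraldine Jones"]:
        print(query, shows_notice(query, firsts, lasts))
    # Geraldine Parker True / John Meyer True / Geraldine Jones False
```

Under this rule, k requesters with distinct first and last names would flag k × k name combinations, of which k² − k belong to people who never filed a request: 6 for 3 requesters, already 9,900 for 100.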
The more people submit such requests, the more often the notice will appear in the results. In the future it might appear for most name searches. Does that make sense? To me it looks like an algorithmic error. Or could Google want it that way, as its own form of protest against the ruling?
If I were Geraldine DeRuiter and had not submitted such a request, I would see this as a false positive, and I would not want the notice below the search results for my name. The question is whether Google will change its algorithm, and whether somebody will take action against it. In any case, my personal opinion is that this is wrong.
Below I demonstrate how to check a bulk list of names for the removal notice. If you want a list checked, you can use Forecheck yourself or just drop me a line and I will take a look.
How to check bulk names
You can check all these names by hand, but here is how to automate the check for bulk lists of names in Forecheck:
First, save the list of names as a text file. Next, we need to turn it into a list of links to the Google results pages. In Excel, put the Google search URL (everything before the search term) in the first column and the names in the second column.
Then copy both columns into a text file and delete all tabs, so that each URL prefix and name are joined on one line. Now you have a list of search results page URLs, one per name.
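If you prefer a script to Excel, the same URL list can be produced in Python. This is a sketch under assumptions: the base URL https://www.google.de/search stands in for whichever Google domain you want to test, the `hl` parameter (Google’s interface-language setting) is added so the notice text comes out in a predictable language, and the file names are hypothetical. `urlencode` also escapes spaces and umlauts, so no manual clean-up of tabs or spaces is needed.

```python
"""Turn a list of names into Google search URLs, one per line.

Sketch only: BASE_URL and the default interface language are assumptions,
and the file names used with write_url_list are hypothetical.
"""
from urllib.parse import urlencode

# Assumption: the Google domain you want to test; change as needed.
BASE_URL = "https://www.google.de/search"


def search_url(name: str, hl: str = "en", base: str = BASE_URL) -> str:
    """Build a search URL for `name`; `hl` sets Google's interface language."""
    # urlencode escapes spaces as '+' and non-ASCII characters as UTF-8 %XX.
    return f"{base}?{urlencode({'q': name, 'hl': hl})}"


def search_urls(names: list[str], hl: str = "en") -> list[str]:
    """Return one URL per non-blank name, ignoring surrounding whitespace."""
    return [search_url(n.strip(), hl) for n in names if n.strip()]


def write_url_list(names_path: str, urls_path: str, hl: str = "en") -> int:
    """Read names (one per line), write the URLs, return how many were written."""
    with open(names_path, encoding="utf-8") as f:
        urls = search_urls(f.readlines(), hl)
    with open(urls_path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")
    return len(urls)


if __name__ == "__main__":
    print(search_url("Mary Parker"))
    # https://www.google.de/search?q=Mary+Parker&hl=en
```

A call such as `write_url_list("names.txt", "urls.txt")` would then produce a file that can be opened in Forecheck like the Excel-generated one.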
In Forecheck, you can open a list of URLs.
Next, open the generated text file. You will now see the list of all URLs in Forecheck. Click on the arrow at the top to start the analysis.
Now Forecheck will run through all the URLs.
Once Forecheck has finished the list, go to the Search tab and search for a string such as “may have been removed”.
Take care not to use a string that could appear on a results page without the removal notice. Searching for just “removed” could produce false positives whenever that word appears in the description text of a search result.
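The difference between the full phrase and a single word can be shown with a small Python sketch. The two page texts are invented snippets, not real search results, and `pages_with_notice` is a hypothetical helper that does a case-insensitive substring search, roughly what a text search over downloaded pages does.

```python
"""Compare an exact-phrase search with a single-word search.

The page texts are invented examples, not real Google results.
"""

NOTICE_PHRASE = "may have been removed"


def pages_with_notice(pages: dict[str, str], phrase: str = NOTICE_PHRASE) -> list[str]:
    """Return the names whose page text contains `phrase` (case-insensitive)."""
    needle = phrase.lower()
    return [name for name, text in pages.items() if needle in text.lower()]


if __name__ == "__main__":
    pages = {
        # Invented page with the removal notice at the bottom.
        "Mary Parker": "... Some results may have been removed ...",
        # Invented page without the notice, but with "removed" in a snippet.
        "Fred Parker": "Fred Parker removed as club chairman ...",
    }
    print(pages_with_notice(pages))             # ['Mary Parker']
    print(pages_with_notice(pages, "removed"))  # ['Mary Parker', 'Fred Parker']
```

The phrase also has to match the interface language of the results page: on a German-language page, an English phrase will never match.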
You can search the Content of all pages (which is the visible content) or the Source (source code). Searching the content will be faster, of course.
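How Forecheck separates visible content from source code is not described here, but the general idea can be sketched with Python’s standard `html.parser` module: collect the text nodes and skip the contents of `<script>` and `<style>`. The example HTML strings are made up. They show that a phrase split by markup is found only in the visible text, while a string hidden in a script is found only in the source.

```python
"""Search visible text versus raw HTML source for the notice phrase.

Illustrative sketch using only the standard library; the HTML is made up.
"""
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the page's visible text with whitespace collapsed."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


if __name__ == "__main__":
    phrase = "may have been removed"
    # Made-up page: the phrase is split by a <b> tag in the source.
    page_a = "<p>Some results may have been <b>removed</b>.</p>"
    # Made-up page: the phrase only occurs inside a script.
    page_b = '<script>var t = "may have been removed";</script><p>No notice.</p>'
    for page in (page_a, page_b):
        print(phrase in page, phrase in visible_text(page))
    # page_a: False True; page_b: True False
```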
I hope that Google will refine its algorithm and change the way it interprets names. Perhaps Google wants this discussion to happen because it probably considers the ruling a mistake. The ruling not only creates a lot of work for Google, it also raises many additional questions. And the problem will not hit only Google; it will affect other websites too, at least the popular ones. How will they handle it? Only time will tell.