Image Hijacking in Google Image Search: How It Works and How to Protect Your Images
Image hijacking in the Google search results is an issue with a long history. It is an easy way to redirect traffic to other sites, mainly spammy ones. It has looked like a Google bug ever since it first occurred in 2007, so let’s dive in to understand what is happening.
Reports of image hijacking go back to 2007, when Philipp Lenssen wrote about it on his Google Blogoscoped blog. Barry Schwartz also covered the phenomenon at Search Engine Land. So this is an old issue, but it still exists! Here is an example. It is in German, but there aren’t many cases like it out there, and the Google algorithm seems to fix each one automatically after about a day. However, Google doesn’t seem to have fixed the underlying algorithmic issue in how it indexes pictures.
When you search for “weihnachtskarten” (Christmas cards) on the German version of Google, you get something like this:
As you can see, we are not in the image search, but images from it still appear. Normally, the organic results show a few images like these, each of which leads to the website that contains it. The second image, for example, leads to trendstrom.de. In the picture below, you can see the same image at the bottom left.
Having said that, I do not know why people would buy a Christmas card with the rather risqué woman shown at the top right. However, it really is on that page… hey, it’s a free world.
The image hijacking effect
That is what we are used to. Now suddenly, from one day to the next, most of the images lead to another page:
You can see the same image on that page, as well as some additional ones. They are the first images that Google shows in the organic search results for “weihnachtskarten”. But what happened? First, let’s take a look at the link URL behind the image in the Google search results:
As you can see, the page embeds the image directly from its original source! The link URL still carries the same imgurl, which points to the original source. Clicking on it opens the image in full size. The question is: Why does Google use the URL of that page instead of the page where the image was originally used?
First, the page is in French, so it makes little sense to link to it from German search results. Second, the website http://golfdenantescarquefou.com/ has neither real trust nor high rankings in the search results. In short, there is no logical reason to link to that page.
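To check a result link like the one described above, it helps to pull the image URL and the landing page out of the query string and compare their hosts. The following is only a minimal sketch: the imgurl parameter is the one mentioned above, while the imgrefurl parameter name and the exact shape of the example link are assumptions for illustration, not taken from the screenshots.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def bare_host(url: str) -> str:
    """Return the host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def inspect_image_result(link: str) -> dict:
    """Compare the image host with the landing-page host of a result link."""
    params = parse_qs(urlparse(link).query)  # values come back percent-decoded
    img_url = params.get("imgurl", [""])[0]
    ref_url = params.get("imgrefurl", [""])[0]  # assumed parameter name
    img_host, ref_host = bare_host(img_url), bare_host(ref_url)
    return {
        "image_host": img_host,
        "landing_host": ref_host,
        # True when the image is served from a different site than the landing page
        "hosts_differ": bool(img_host and ref_host) and img_host != ref_host,
    }


# Hypothetical link, built only to illustrate the hijacked case.
EXAMPLE_LINK = "https://www.google.de/imgres?" + urlencode({
    "imgurl": "http://www.trendstrom.de/weihnachten/motive/weihnachtskarte-8.jpg",
    "imgrefurl": "http://golfdenantescarquefou.com/",
})

result = inspect_image_result(EXAMPLE_LINK)
```

If hosts_differ is true for an image from your own site, the result links to a page that merely embeds your image rather than to your page.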
Now, one day later, the situation has changed. Google seems to have corrected the problem, as the link goes to the original source again. Everything’s fine? No, that’s not the case! Two days later, it’s the same situation but with a different target:
Going back to the first spam page, I get this:
Obviously that page has been removed. Not only the page but the whole domain. This could be the reason for the change.
But the pages tell me that somebody is making a business out of this hijacking. So I looked at the WHOIS information of both domains and, BAM, they are the same. So let’s run a reverse domain check on that information:
The image hijacking hack
When you take a closer look at the source code of the spam page, you can see rel="prettyPhoto" within the <a> tag.
<a href='http://www.trendstrom.de/weihnachten/motive/weihnachtskarte-8.jpg' rel='prettyPhoto'>
<img src='http://www.trendstrom.de/weihnachten/motive/weihnachtskarte-8.jpg' style='height:100px;margin-right:5px;margin-bottom:5px;' /></a>
This could be what triggers the problem. There is clearly a bug in the Google algorithm, but the rel attribute may be what leads the algorithm to misinterpret the page. In SEO, the rel attribute is best known for its nofollow value, but officially there are other possible values as well:
alternate, author, bookmark, help, license, next, noreferrer, prefetch, prev, search and tag
The tag value marks a keyword for the current document. prettyPhoto is not one of these official values; it is the hook that the prettyPhoto jQuery lightbox plugin looks for, and browsers simply ignore rel values they do not know, so the markup still works. For our client, I will add the attribute to the original page’s source. Maybe this will help. I will update this blog article when I have new information. Most likely what matters is only that a rel attribute is present, not its value, but I will try the prettyPhoto value.
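The pattern on the spam page, an image loaded from a foreign host and wrapped in a link with a rel value, can also be spotted automatically. Here is a rough sketch using Python’s standard html.parser; the names HotlinkFinder and find_hotlinks are made up for this example, and the page URL passed in is only assumed to be where the snippet was found.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


def bare_host(url: str) -> str:
    """Return the host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


class HotlinkFinder(HTMLParser):
    """Collect images served from a different host than the page itself."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.page_host = bare_host(page_url)
        self.current_rel = None  # rel value of the enclosing <a>, if any
        self.hotlinks = []       # (image URL, rel of enclosing link) pairs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self.current_rel = attrs.get("rel")
        elif tag == "img" and attrs.get("src"):
            src = urljoin(self.page_url, attrs["src"])  # resolve relative paths
            if bare_host(src) != self.page_host:
                self.hotlinks.append((src, self.current_rel))

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_rel = None


def find_hotlinks(html: str, page_url: str) -> list:
    parser = HotlinkFinder(page_url)
    parser.feed(html)
    parser.close()
    return parser.hotlinks


# Markup as found on the spam page; the page URL is an assumption.
SPAM_SNIPPET = """
<a href='http://www.trendstrom.de/weihnachten/motive/weihnachtskarte-8.jpg' rel='prettyPhoto'>
<img src='http://www.trendstrom.de/weihnachten/motive/weihnachtskarte-8.jpg' style='height:100px;margin-right:5px;margin-bottom:5px;' /></a>
"""

found = find_hotlinks(SPAM_SNIPPET, "http://golfdenantescarquefou.com/")
```

This only shows that a page embeds someone else’s image. It says nothing about how Google weighs the rel value, which remains a guess.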
Another solution could be to change the URL of the image on the original website and hope that Google will fetch it soon and swap the image. But of course, spammy websites may run an automated process that checks the image results and updates their pages regularly. If you do change the URL, you might put an image at the original URL that contains some text like “Welcome to a crappy spammy website”. Or add the words “Sandra, I love you!” and show it to your beloved. Hey! Twenty years ago you sprayed the name on a wall and you were happy! Did you ever try to write something on a third-party website without hacking? Now is your chance!
And the last solution, of course, would be for Google to finally fix this issue, 8 years after it first appeared. Admittedly, it is not easy for an algorithm to work out which page is the origin. But the algorithm is evidently able to solve the problem; it just needs about 2 days to do so.
If you have had the same problem and you have other suggestions or ideas, please share them!