The onpage SEO problems of MetaFilter.com

This post is an onpage SEO analysis of MetaFilter.com. It details the difficulties and problems that arise during an SEO audit and explains the onpage SEO problems of MetaFilter.com. You will get an insight into how an SEO analysis is created; any SEO analysis can be built with different methods and focus on different elements, but a thorough site analysis should always include certain essential checks. In this analysis, we will take a closer look at what has happened at MetaFilter.com.

Tasks: Onpage SEO Analysis, basic SEO considerations, Broken Links, External and Internal (Near) Duplicate Content analysis

Patient: MetaFilter.com

Recently there have been discussions about MetaFilter.com and a possible Google penalty connected to a drop in the rankings. The founder Matt Haughey posted at the end of May 2014 that, due to the traffic decrease and plummeting revenue, three moderators would be laid off. He posted more details in his article at Medium.com. In short, he names Google as one of the primary reasons. “Live by the Google sword, die by the Google sword” sums it up. In his post at Medium.com, he talks about strange emails from other people about spammy links pointing from MetaFilter to their sites. Haughey believes that MetaFilter is suffering from collateral damage. However, his statements don’t really give any clues as to what happened. In fact, Matt Haughey can only guess.

Update: After finishing this analysis, Matt Cutts, head of the web spam team at Google, announced at SMX West that an undisclosed algorithm update of Google has hit MetaFilter. He does not give any details about it except that they will work on the algorithm within the next few weeks or months to find a solution to such false-positive cases. But still, the situation for MetaFilter remains dramatic. That’s why I have left this article unchanged. The results may help MetaFilter to find a way out of the negative effects regarding that algorithm or at least increase the rankings to ease the situation. And possibly there is no easy way out for MetaFilter, as this article shows the main onpage SEO problems pertaining to the site.

He is right to say that Google is a big black box and MetaFilter may have broken a (hidden) Google law. His situation is a difficult one, and we know of similar cases from the past. Danny Sullivan from Search Engine Land picked up the case and dug into the details. In his post, he gave another point of view on what happened to MetaFilter. However, Sullivan also couldn’t provide a real explanation. David Auerbach did some digging as well and also couldn’t provide an adequate answer. Barry Schwartz thinks it is an error on Google’s side. Magic, voodoo, or is there an explanation?

Metafilter Charts

SEO Analysis – What we know

I will perform an analysis of MetaFilter from my point of view. I have seen many strange things in my 17 years in the SEO business, and I can say that I have seen two cases of genuine, verifiable errors on Google’s side. This post will not explain what really happened, but I will show you in general what you should look at in a situation like this and I’ll provide some advice. I will add my comments on what I think happened. And: it’s the first post ever on this blog at Forecheck.com :-) (If you want to know more about the software tool Forecheck, you can start here.)

Matt Haughey posted two images, one from Google Analytics that shows a big drop in traffic in November 2012. This drop of about 40% is a hard hit. The peak around January 2014 comes from an article that went viral, says Matt. He also posted an image of the ad revenue of MetaFilter.com. First, I synchronized both images, and we can see that the 40% traffic drop led to a drop in ad revenue, but not to the same degree. In addition, the ad revenue has been decreasing continuously since the end of 2011. A traffic drop of about 40% would normally lead to nearly the same drop in ad revenue, at least when the revenue is mainly generated by AdSense, as in this case. Of course there may be other types of revenue, so we cannot judge this completely. But we can see that the decrease in ad revenue is not only due to a hit by Google; it has been an ongoing process over a long period of time. So at first glance, it is not so obvious that Google is to blame. Still, when the main traffic comes from Google and the main ad revenue comes from Google, that is a dangerous strategy. In general, any business consultant will tell you to diversify your sources of revenue.

Live by the Google sword, die by the Google sword

This sentence sounds a little bit as if Google were the problem. But everybody running a business that depends on Google should know that this dependency is also a business decision. Yes, of course Google dominates the market. But when you develop antivirus software and Microsoft decides to include an antivirus tool in Windows, this can hurt you; you know that up front. It is easy to say this from the outside, but I have run an SEO company for many years and I know about the dependency on Google. It is a risk, no question. This is one of the reasons why we decided to develop SEO software. But back to the specific case: in a situation like this, the first thing I would ask about such a drop in traffic is: “What were the last changes made to the website in the last few days or weeks?” And I mean any change you can think of. Sometimes it can be simple things like

  • changing the tracking code or tracking settings
  • changing the robots.txt file
  • changing a website template

It sounds simple, but in many cases there are simple answers to such decreases. Concerning Web Analytics, the first thing to check is if the tracking is ok. But how can you check that? Now we have to take a look at Forecheck.

Checking the Tracking

First I ran an analysis of Metafilter.com with Forecheck. I stopped the analysis at around 40,000 analyzed URLs, which took about an hour. This is normally enough to get a first impression. Afterwards, I checked the tracking code. For this, I used the full text search function in Forecheck, which enables searching within all data, including the source code of all pages.

analytics-id

First I grabbed the property ID of the Google Analytics tracking from the source code of the homepage. In this case it is UA-251263-1. Keep in mind that the last digit -1 can change within a website for several reasons, so it can also make sense to search only for UA-251263. This string is unique for this website and will show us whether there is tracking code on every page. For this I opened the Search tab and used the option “Show where search text is not found”. Be aware that in the current version of Forecheck you cannot combine the filter and the search function. Of course it makes sense to search only within internal pages.

search-analytics-results

The first thing we can see is that MetaFilter obviously uses different property IDs for its subdomains. There are several subdomains at MetaFilter, and we will talk about them later. To analyze the tracking, we should check every subdomain with its individual property ID. You can edit the settings of Forecheck before you start an analysis to skip other subdomains.

subdomains-settings

By deactivating “Check Subdomains”, Forecheck will only check the status of a URL on other subdomains to see if the URL works. You can also treat other subdomains as external URLs or even skip them (you can combine these options for your needs), since links to subdomains count as outbound links when you activate the option “Treat other subdomains as external URLs”.
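If you want to reproduce this kind of check outside of Forecheck, here is a minimal sketch in Python. It assumes the `requests` library and a list of internal URLs (for example, exported from a crawl); the function name and the sample URLs are my own choices, only the property ID UA-251263 comes from the homepage source discussed above.

```python
# Minimal sketch: flag pages that do not contain a Google Analytics property ID.
# Assumes the `requests` library and a list of internal URLs (e.g. exported from a crawl).
import requests

PROPERTY_ID = "UA-251263"  # base ID from the homepage; the trailing -1/-7 varies per subdomain

def pages_without_tracking(urls, property_id=PROPERTY_ID):
    """Return the URLs whose HTML source does not contain the property ID."""
    missing = []
    for url in urls:
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # unreachable pages are a separate (broken link) issue
        if property_id not in html:
            missing.append(url)
    return missing

if __name__ == "__main__":
    sample = [
        "http://www.metafilter.com/",
        "http://www.metafilter.com/user/17751/musicpostsrss",
    ]
    for url in pages_without_tracking(sample):
        print("no tracking code:", url)
```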

I ran another report, analyzed only the main domain www.metafilter.com, and checked the tracking again. 128 pages do not contain the tracking code, such as these:
http://www.metafilter.com/user/17751/musicpostsrss
http://www.metafilter.com/user/150108/metatalkpostsrss
Yes, they look strange – only some text exists. In fact, they are not (X)HTML pages, they are RSS feeds. But the server delivers the wrong content type, text/html.

wrong-content-type

As you can see, the column “Content Type” shows text/html. Forecheck is not yet able to automatically detect that these URLs have the wrong content type. I have never had such a case before, but since a feed can easily be detected from the source code, this is a useful option and I have added it to the feature list for Forecheck. So, in this case we have many feeds that are delivered as pages. It is questionable whether Google understands that they are feeds and not pages.
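Such a mismatch can also be spotted with a few lines of script. The following is a rough sketch, assuming the `requests` library; the feed heuristic (looking for `<rss` or `<feed` near the top of the body) is my own simplification, not how Forecheck or Google detect feeds.

```python
# Minimal sketch: spot documents delivered as text/html whose body actually looks like a feed.
import requests

def looks_like_feed(body):
    """Rough heuristic: feed markers near the top of the document, no <html> tag."""
    head = body.lstrip()[:300].lower()
    return ("<rss" in head or "<feed" in head) and "<html" not in head

def mislabeled_feeds(urls):
    """Yield (url, content_type) where the body is a feed but the header says text/html."""
    for url in urls:
        try:
            resp = requests.get(url, timeout=10)
        except requests.RequestException:
            continue
        content_type = resp.headers.get("Content-Type", "")
        if "text/html" in content_type and looks_like_feed(resp.text):
            yield url, content_type

if __name__ == "__main__":
    urls = ["http://www.metafilter.com/user/17751/musicpostsrss"]
    for url, ctype in mislabeled_feeds(urls):
        print(url, "is a feed but is served as", ctype)
```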

firefox-feed

By the way: When you open one of those links in Firefox, it shows this:

The same URL in Chrome looks like this:
URL in Chrome
Obviously Chrome cannot handle the feed due to the wrong content type! Internet Explorer, on the other hand, understands the format. So search engines might not understand it correctly either. Therefore, the first to-do we have is to fix this problem. The other question is: does it make sense to have so many feeds on one website? I did not take a deep look into this, but obviously all users who generate content have their own feeds. I am not sure if this makes sense, and I am not sure what a search engine does when it finds hundreds of different feeds on a website. I would first try to check whether there are users who really read the feeds of single content-generating users. But let’s continue the SEO analysis.

Google Search Results

So I entered the URL into Google to see what Google makes of it. The URL itself is not in the index because it carries a “noindex” meta tag (more details below in the robots chapter). The second result gets my attention. But before we click on it, I try another one of those URLs. Side note: the search result page below is from the German Google search – see the soccer doodle? In the search result page to the left, the US version, there is no soccer doodle :-(

google-result-1

When I click on that URL, I am redirected somewhere else. What is this?

hodor-alf-nu

It looks like a page from MetaFilter.com. And there are about 208 pages like that on this strange subdomain:

Pages at hodor.alf.nu

As I mentioned, these documents in the search engine results redirect to those pages. But there are also more pages from MetaFilter at hodor.alf.nu, on other subdomains and in subfolders. I am not going to analyze this in detail, but it shows that there may be a Duplicate Content problem. And there is much more. But hey! Didn’t we want to check the tracking code? Well, this is a nice example showing that an SEO analysis is a journey – you never know where you will end up. This is what makes SEO so fascinating; it is sometimes like an NSA job :-)!

So I checked another subdomain, ask.metafilter.com, which has another property ID (UA-251263-7). Normally you see two digits behind the last hyphen for a specific property number (like UA-251263-07, see also the Google Analytics Help), but as far as I know, one digit is also allowed. The subdomains use specific property IDs, which enables the individual analysis of each subdomain. But of course you must pay attention when you analyze the complete traffic of the domain, because you need to add up all of the properties.

I did not find any noteworthy problem with the Analytics code. And we only did a pattern search with Forecheck. If you want to analyze whether or not a tracking code works on a single page, you can use the Chrome Analytics Debugger. But please understand that with this tool you can only analyze one page at a time. At Forecheck, we are working on a solution to analyze the tracking code completely within Forecheck; see also the future feature page.

At this point I will skip further details of the analytics analysis and add some thoughts on the subdomain issue.

To subdomain or not to subdomain

Whenever you can, you should try to use one single domain instead of several subdomains. If you add a blog to a website, try to implement it at www.domain.com/blog and not blog.domain.com. The reason is that a subdomain technically is another domain. Therefore, links to one (sub)domain do not count as links to another domain.

If MetaFilter had started with subfolders instead of so many subdomains, it would be in a much better position today. In the past, other strategies were favored, and this has changed over time. Today, gathering backlinks (and of course I mean valuable, white hat, context-related backlinks) is important. When you keep all of the content within one domain, all pages profit from a backlink, not just the pages on one subdomain. Yes, of course the subdomains are linked to each other. But putting everything on just one domain is the better solution from an SEO point of view.

The structure of MetaFilter consists of many subdomains, some more important (like ask.metafilter.com) and some less important. Changing such a structure is complex, since you must redirect all pages to their new home (which can be done with a few regular expression redirect rules if you use the same URL structure at the new location; see the sketch below). I just want to stress that the subdomain structure that MetaFilter has today is a disadvantage concerning link popularity. Changing it today is, of course, a mess.
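Here is what such a regular expression mapping could look like, sketched in Python purely for illustration. In practice the rules would live in the web server configuration as rewrite/redirect directives; the subdomain list and the target paths are assumptions, not MetaFilter’s actual setup.

```python
# Illustration only: map old subdomain URLs to their hypothetical new subfolder home.
# In a real migration this logic would be expressed as 301 rewrite rules on the server.
import re

# The subdomain list here is illustrative, not a statement about MetaFilter's real structure.
SUBDOMAIN_PATTERN = re.compile(r"^https?://(ask|music|projects)\.metafilter\.com/(.*)$")

def new_location(old_url):
    """Map e.g. ask.metafilter.com/foo -> www.metafilter.com/ask/foo (the 301 target)."""
    match = SUBDOMAIN_PATTERN.match(old_url)
    if not match:
        return None
    subdomain, path = match.groups()
    return "https://www.metafilter.com/{}/{}".format(subdomain, path)

print(new_location("http://ask.metafilter.com/12345/some-question"))
# -> https://www.metafilter.com/ask/12345/some-question
```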

The external Duplicate Content problem

As we saw before, there are pages on other domains that are copies of MetaFilter.com pages. And this is not the only such domain. This is something you have to take a deeper look at. In my SEO history, I have had cases where somebody copied complete websites, which can hurt a website due to external Duplicate Content. In one case, it was a complete working copy of a shop, but the orders were going somewhere else and the money was lost. In that case they even promoted the copied shop with AdWords. Therefore, we always have to consider every possible reason why content may have been duplicated on the internet. I am not saying that in this case somebody wants to hurt MetaFilter, but you must think in all directions.

Another thing that I always do (and something we are currently working on implementing in Forecheck): I copied a piece of text from some pages and searched for that text in Google (which limits a search string to 32 words). This is just one example that is typical of a lot of the content on MetaFilter.com:

External Duplicate Content
The results show this: Scientific American is first, which makes sense since this is the URL the text was originally copied from (or an automated excerpt of). This is the URL from which I copied the string:

http://www.metafilter.com/archived.mefi/1/01/2014/

And the link above that string points to Scientific American. MetaFilter is listed at position 4. This is in general a big problem with user-generated content. Because copy & paste is so easy on the internet, automated excerpts from other pages are just a click away. Not only does news exist in various copies, so do comments (just think of the eBay feedback comments like “super ebayer”). Yes, this is an extreme example, but when you get user-generated content on your website, you should be careful about what type of content it is, whether it may be created in similar ways elsewhere, and whether it is likely to be copied from somewhere – even from your own website.

When I search for that string in quotes, I get this:

External Duplicate Content 2

Not Scientific American in this case; instead, the copy on Reddit ranks first, although Reddit links to the Scientific American page. Not good news for Scientific American! And the second result on that page shows another aggregator that has a copy from MetaFilter! This example shows:

Even when you are the source of specific content and others who copy your content link to you, you, the source, will not necessarily be #1 in Google when you search for that content!

This is not an SEO problem; rather, it is an algorithmic problem. And it is not an error, it is simply the fact that it is very difficult to identify the source. The canonical link is not used in such cases either. With this example, I just want to emphasize that this is something that has to be analyzed more deeply. With a detailed analysis and a good SEO concept, things like this cannot always be solved, but the impact on the search engines can be handled better.
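If you want to repeat this snippet check for many pages, the quoted search query itself is easy to generate. A small sketch; the snippet text is a placeholder you replace with your own copy, and the actual inspection of the results still happens by hand in the browser.

```python
# Minimal sketch: build a quoted Google search URL for a content snippet,
# truncated to roughly the first 32 words that Google considers.
from urllib.parse import quote_plus

def snippet_search_url(text, max_words=32):
    words = text.split()[:max_words]
    quoted = '"' + " ".join(words) + '"'
    return "https://www.google.com/search?q=" + quote_plus(quoted)

snippet = "paste a sentence copied from the page you want to check here"
print(snippet_search_url(snippet))
```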

The internal Duplicate Content problem

dc-internal-fc

First, let’s now take a look at the Forecheck Benchmark Report. Even after analyzing some ten thousand pages, the (internal) Duplicate Content problem is quite small. So we will skip this and look at another issue.
I copied some text from the Ask subdomain (where users post individual questions about real problems they have), where user-generated content really should be unique. I searched for that text snippet in Google and found three results within the MetaFilter domain:

dc-internal

This is quite normal on websites like this, and also within blogs. In most cases, this problem is handled with the robots.txt file or the Meta-Tag robots. Since excerpts and teasers are repeated within a website (just like a product that is listed in several categories), you must think about how to handle this problem. But using the robots.txt is not a good solution, and I will show why later, when we look at the robots information. Be aware that Forecheck can only find exact copies of content, not Near Duplicate Content. This is something that we are working on, but it has not yet been implemented (and it is quite complex!).
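For readers who want to see what “exact copies” means in practice, here is a minimal sketch of the idea: group pages by a hash of their normalized body text. This is my own simplification, not Forecheck’s implementation, and it deliberately ignores the much harder near-duplicate case (shingling, MinHash and the like).

```python
# Minimal sketch of exact-duplicate detection: pages with identical normalized text
# end up in the same hash bucket. Near-duplicate detection is not shown here.
import hashlib
import re
from collections import defaultdict

def normalize(html):
    """Very crude text extraction: drop scripts/styles/tags, collapse whitespace."""
    text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html, flags=re.S | re.I)
    text = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def exact_duplicates(pages):
    """pages: dict of url -> html source. Returns groups of URLs with identical normalized text."""
    groups = defaultdict(list)
    for url, html in pages.items():
        digest = hashlib.sha1(normalize(html).encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [urls for urls in groups.values() if len(urls) > 1]
```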

Broken Links – a simple SEO problem!?

Yes, broken links are normally a simple problem and they can easily be solved. So let us take a look at MetaFilter.

broken-links

In my analysis of around 38,000 pages there are about 850 issues with links. The main problems are external links: either the URL no longer exists or even the whole domain no longer exists (which is counted as an “Others” issue, as the domain cannot be resolved).

All those broken links should also show up in Google Webmaster Tools. 850 errors in 38,000 pages is not a big issue, but we always recommend making a website error-free – not only for search engines; users are also frustrated when they click on a link that does not work.

Of course, not every broken link is visible to every user. You can create different reports within Forecheck, including reports of broken links by page or by link, in order to identify easily where a problem originates. Some broken links are a problem that exists in a template and occurs on many pages; others are single problems on single pages. In this case, the biggest problems are outdated links on single pages.

MetaFilter has existed for a very long time, and a great deal of content was thrown into the website, including numerous links. How to deal with links that no longer work is a real problem. In many cases they were posted by a user and they made sense at the time they were posted. Should you edit the content? I would say yes! You could delete the link and write the URL in red with a tooltip stating that this link is dead. No question, it’s work, but I think it’s worth it. And possibly there is a plugin for your CMS that does exactly this. Of course, only links that are dead forever should be treated like this. But which link is dead forever? After checking a link three times within one week and still getting an error, you can assume that the link will remain dead.
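The re-checking rule from the last paragraph is easy to script. A minimal sketch, assuming the `requests` library; in practice the three attempts would be spread over a week rather than run back to back as in this simplified loop.

```python
# Minimal sketch: only declare a link permanently dead if it fails on every attempt.
import requests

def is_permanently_dead(url, attempts=3, timeout=10):
    """Return True only if every attempt fails or returns a 4xx/5xx status."""
    for _ in range(attempts):
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code == 405:       # some servers reject HEAD, so fall back to GET
                resp = requests.get(url, allow_redirects=True, timeout=timeout)
            if resp.status_code < 400:
                return False                  # the link worked at least once
        except requests.RequestException:
            pass                              # DNS error, timeout etc. count as a failed attempt
    return True
```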

At this point we not only have to say that there are more problems than we thought (which is quite normal for most SEO analyses); it also shows how complex and individual any SEO analysis can be. So now you understand why you should forget about an SEO analysis that only takes seconds or that fits into one HTML page. The Forecheck Benchmark, for example, is just a starting point for more digging.

I will now show you the whole Benchmark Report from Forecheck in order to see where the other main problems of MetaFilter are (please click on the image to see it in full size):

Forecheck benchmark report

Up to this point, we have only taken a look at the Duplicate Content problem and the broken links. We will now take a look at the other main issues.

Timeouts and Load Time

I assume that you know about the importance of Load Time and Timeouts. They can be a temporary problem. Forecheck is a desktop software program, so it mainly relies on your internet connection. That is why the load time is a relative value here, and to make it comparable, Forecheck gathers reference values during the analysis to classify the results. This means that a bad internet connection will not lead to bad load time results. You should read the help in Forecheck for more details.

I ran five analyses from the end of May until the end of June 2014, and I always had issues with load time and timeouts. The reasons can vary a lot. Some servers do not like too many requests from the same IP. And when I look into the robots.txt files, I can see several “Crawl-delay” entries, which tells me that load time is an issue the webmasters of MetaFilter have at least thought about.

To gather more extensive information on load time, you can use Google’s PageSpeed tool and other tools to find the problem. But there are many problems that can lead to bad load times. Be aware that when you just grab a few single pages from a domain and they “seem to be fast”, this does not mean there is no load time problem. From what I can see in my analysis, there are potential problems concerning this issue. And when I enter Metafilter.com into Google PageSpeed, the result tells you the same. At least the domain is quite responsive and works quite well on mobile devices.

Title Issues

The problems with the titles can easily be solved. They are not a big issue, and only a few patterns are responsible for the problem. I will show you one:

duplicate-title

All playlists have the same title: “MeFi Music”. Every playlist is from a user, so you could change that to “MeFi Music Playlist from xyz”. Solved.

Missing Description

As you can see in the Benchmark screenshot, there are more than 30,000 cases of no descriptions. It looks like all pages are missing a description.

missing-description

Forecheck analyzed 39,927 pages in this example, and it shows 11,260 errors and 21,088 warnings (= 32,348 in total) concerning missing descriptions. Errors are for pages up to Level 2, warnings for Levels 3–4. The Level is the shortest path from the homepage (to be exact, from the start URL you type into Forecheck); the shortest path is the smallest number of clicks needed to reach that page. The idea: pages with a higher level are less important. Sure, that is a simplification, but it makes sense.
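The level concept is nothing more than a breadth-first search over the internal link graph. Here is a small sketch of that idea, assuming a link graph built from a crawl (a plain dict from URL to linked URLs); this is my own illustration, not Forecheck’s code.

```python
# Minimal sketch: the level of a page is the minimum number of clicks from the start URL,
# computed with a breadth-first search over the internal link graph.
from collections import deque

def page_levels(link_graph, start_url):
    """link_graph: dict url -> list of linked urls. Returns {url: level}, start URL = level 0."""
    levels = {start_url: 0}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for target in link_graph.get(url, []):
            if target not in levels:          # first time reached = shortest path found
                levels[target] = levels[url] + 1
                queue.append(target)
    return levels
```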

Now I want to know why some of the pages do have a description. Is there a rule? So I opened the analysis tab, moved the column “Meta Description” next to the URL column, and sorted it. Here is what I got:

Sorted Meta Description

We can see that some pages have a unique description. The homepage has one (see index 1; below that line you can see another URL with the same description, obviously a duplicate content problem), and all pages on the subdomain bestof.metafilter.com have a description (but the same one!).

Missing or duplicate descriptions are an SEO problem, and in this case it can easily be solved. You just have to grab an excerpt from the content and put it into the meta description. This piece of work can be done with a few lines of code in your CMS (see the sketch below). You should write an individual description for the most important pages (as was done for the homepage here), but for the others it is important that the description is unique. Many CMS or shop systems have plugins that do exactly that and that allow overwriting the auto-generated description. This is something that can be solved within half a day. Then run Forecheck again and you will see: problem solved. (Hopefully!)
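What those “few lines of code in your CMS” could look like, as a sketch: a fallback description generated from the page’s own text whenever no hand-written description exists. The 155-character limit and the word-boundary truncation are common conventions, not requirements, and the function names are mine.

```python
# Minimal sketch: auto-generate a meta description from the page text as a fallback.
import re

def fallback_description(body_text, manual_description="", max_len=155):
    """Use the hand-written description if present, otherwise an excerpt of the body text."""
    if manual_description.strip():
        return manual_description.strip()     # a hand-written description always wins
    text = re.sub(r"\s+", " ", body_text).strip()
    if len(text) <= max_len:
        return text
    return text[:max_len].rsplit(" ", 1)[0] + " ..."  # cut at a word boundary
```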

Missing H1

Yes, all SEOs know that every page should have an H1. But here it is missing on many pages, even on the homepage! Of course, Forecheck only reports missing H1 tags on internal pages. There is no rule at MetaFilter.com as to which page gets an H1, and many pages that do have an H1 have multiple H1s (see below). Solving this problem is more complex than solving missing descriptions: the description is hidden from the visitor, but the H1 is not. So you need a plan, a systematic and logical solution for H1 elements and even for H2, H3, etc.
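Checking for missing and multiple H1s is straightforward to automate. A minimal sketch using only Python’s standard library HTML parser; collecting the returned texts across pages also reveals duplicate H1s, which we will look at next.

```python
# Minimal sketch: collect the H1 texts of a page, so missing (count 0) and
# multiple (count > 1) H1s can be reported per page.
from html.parser import HTMLParser

class H1Collector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.h1_texts = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.h1_texts.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1_texts[-1] += data

def h1_report(html):
    """Return the number of H1s and their texts for one page's HTML source."""
    parser = H1Collector()
    parser.feed(html)
    texts = [t.strip() for t in parser.h1_texts]
    return {"count": len(texts), "texts": texts}
```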

h2-h4-reports

By the way: in the Benchmark screenshot you can see that Forecheck has more H2 to H4 reports on the right side. They are not part of the main issues, but you should always take a look at those reports too. To get more information about them, click on the blue question mark icon.


I will skip further investigation here, but this is also an important issue to be solved. A solution for the whole website will need at least one day to create the best strategy, and possibly another half-day for implementation and testing.

Duplicate H1

duplicate-h1

When we look at the Benchmark report above, we can see that duplicate H1 elements are high in number. With fewer than 40,000 analyzed pages, these high numbers may look wrong. The duplicate H1 report counts all cases of duplicate H1 text, and as many pages have more than one H1 element (see the numbers at “Multiple H1” one line below in the Benchmark Report), a single page can produce more than one duplicate H1 issue.

Duplicate H1

I picked just one issue, the H1 text “Karma Police”. The example shows that this H1 text is not only part of several pages; some pages contain it twice. You can also use the Duplicate H2 to Duplicate H4 reports to identify duplicate problems. In general, duplicate Hx elements happen on many websites, and in blogs they are quite common. Also common is the fact that many of these problems are “solved” by the robots.txt or the Meta-Tag robots, as in this case.

The screenshot shows that most of the pages (the list is much longer than in the screenshot) are blocked by a noindex meta tag. We will take a look at the robots information later. In general: the more unique the H1 elements are, the better – and the same goes for H2, H3, H4 and so on. The Duplicate Hx reports show any duplicate text in an H1 to H4 element. Warning! These reports are often very long!

Completely unique Hx elements within a whole website are difficult to generate; they are only possible with a stringent and consistent plan. But reducing the duplicate Hx elements will already help. By the way: you can change the settings in Forecheck so that Forecheck does not ignore the robots information. With this setting, Forecheck will still check whether URLs that are blocked by robots information (robots.txt or Meta-Tag robots) exist, but it will not download and analyze those pages (the same way the search engine crawlers handle it).
robots-settings

In this case, those pages will not be part of reports like the Duplicate H1 report. There is also another option you can use: within the reports there is a drop-down menu with several filter options, where you can hide all URLs that are blocked by robots.txt or the Meta-Tag robots. But attention! Always take a look at what is blocked and which user-agent is used. We will get to that later.

duplicate-report-options

Multiple H1

Multiple H1 means that a page has more than one H1. This is also a problem, because normally a page has one headline. Think of a newspaper with three main headlines on the first page – it does not make sense. The H1 should summarize the content of a page, which is why the H1 is often nearly the same as the title (with small differences – in most cases the title also includes the brand). Of course, the title is visible in the search engines and should be optimized accordingly, while the H1 is visible to the visitor and should simply summarize the content and contain important keywords.

In the report, the H1s of a page are separated by a double vertical line. You can see this in the image sample of pages with numerous H1 elements.

Multiple H1

Robots information

This is the last thing I want to look at in this analysis. It is not possible to look at all the issues, but this one is very important for MetaFilter. First you must know that Forecheck allows you to set the user-agent as you like.

user-agents

There are three sections you can choose from within the settings: Browser, Crawler or Mobile. The default is “Forecheck”, which is not a known crawler (not yet :-) nor a browser. So if there are entries in the robots.txt file, in most cases they are meant for other user-agents, not for Forecheck.

Below the three sections is an edit field where the full user-agent string can be seen or even edited. There you can enter an individual user-agent string if you want.


When I ran my analysis I kept the user-agent “Forecheck”. The first thing that caught my eye is the high number of blocked URLs.

blocked-urls

“Blocked by robots.txt” means that there is an entry in the robots.txt file for the current user-agent that blocks URLs (not only pages). “Pages with noindex” means that those pages carry a noindex, which keeps them out of the index (it does not stop them from being crawled).

Nearly 25,000 pages are blocked, out of nearly 40,000 analyzed pages. This of course has to be inspected. Note: in this case there are no errors. Errors would be blocked CSS and JS files, since Matt Cutts has repeatedly said not to block CSS and JS files.

First you must know that the MetaFilter domain has several subdomains, and every subdomain has its own robots.txt file. At the time of writing, Forecheck does not support the analysis of more than one robots.txt file, which is needed when you set Forecheck to analyze all subdomains of a domain. This is already in the pipeline. So to get exact information about the robots setup, we would have to analyze every subdomain by itself. For now, I will only analyze the robots.txt file at metafilter.com/robots.txt.

robots.txt

The first thing you can see is that they keep a history of the file with comments at the top. This is of course a very good idea. It seems that after 2011 there were no more changes – but I can see that the robots.txt file was last modified on May 28, 2014! So we must assume that the history has not been maintained correctly.

The second block is for the Googlebot (line “User-Agent: Googlebot”). It disallows four folders. I used the internal search of Forecheck to search for these folder strings in all URLs and found several thousand blocked pages within /favorited/. The links sit beneath every user name of every teaser. The question is: is this useful information? What does it tell you to know how many favorited topics a user has? Does anybody click on that?

Blocking pages has one big issue: a link to a blocked page is an outgoing link, even when you mark it with the rel=”nofollow” attribute. The PageRank algorithm reduces the link popularity a page passes on for every outgoing link; if the targets are blocked, that link popularity or link juice is destroyed. Just look at this page as an example.

That page has nearly 200 links to the /favorited/ folder, all blocked. Deleting those links on all pages like that would increase their link popularity. And I think the favorited links are not really used – you can of course track that and see whether these links are clicked. In my view it mainly matters how many users favorited an article, not which users did so. Deleting the links and just stating the number would not delete the information.

Does it make sense to block those pages? Well, those pages just show a list of users. They contain no real information, no value, which is why I would simply delete the links as well.

But now I would like you to look at the bottom of the robots.txt file, at the line “User-Agent: *”. This block applies to all other user-agents (including Forecheck). Since there is a dedicated block for the Googlebot, the Googlebot will only follow that block and skip the rest.
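You can verify this behavior with Python’s standard library robots.txt parser, which approximates the same user-agent matching. A small sketch; the test URL is just an example.

```python
# Minimal sketch: compare how the same URL is treated for different user-agents.
from urllib.robotparser import RobotFileParser

parser = RobotFileParser("http://www.metafilter.com/robots.txt")
parser.read()

test_url = "http://www.metafilter.com/user/17751"
for agent in ("Googlebot", "bingbot", "Forecheck"):
    allowed = parser.can_fetch(agent, test_url)   # each agent falls back to "*" if it has no own block
    print(agent, "may crawl" if allowed else "may not crawl", test_url)
```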

Please compare those two blocks, the one for the Googlebot, the one for the rest. You can see that the last block also blocks some “user” folders. That is why I have so many blocked URLs in my analysis. But I must ask these questions:

- Why are there so many different parts for different user-agents?
- Why can some user-agents (Crawlers) crawl the user folders, others not?

Now I will show you another strange issue:

blocked-user-urls

There are thousands of pages in the user folders. They are blocked by robots.txt for some crawlers (but not for the Googlebot!), and in addition they carry a noindex (Meta-Tag robots). So those pages all have a noindex, and on top of that they are blocked for some crawlers but not for all. There seems to be a mismatch in the robots information: the robots.txt allows the Googlebot to crawl those pages, but the noindex keeps them out of the index. I am not sure if this was the original intention.

Please understand that the “follow” instruction cannot work when a page is blocked by the robots.txt!

The “follow” instruction is meant to avoid any destruction of link juice. But in this case, at least the Bingbot crawler from Microsoft will not crawl the user pages, as there is no individual block in the robots.txt file for the Bing crawler!

There are no user pages in the Google index due to the noindex. But the user pages are linked to a lot. As I said before, blocking pages for the robots is generally a bad idea: it destroys link popularity (in this case that is avoided by the “follow” instruction for the Googlebot, but not for the Bingbot). Forecheck has a column “Link Juice” that calculates the internal link popularity. You can run an experiment and make one analysis that ignores all robots information and another one that does not. The link juice will be higher when Forecheck completely ignores the robots information.

To summarize: I am sure that the robots information is not in its best shape. I must also admit that I only analyzed the main domain, not the subdomains. But we can see that there is another issue here. It is possible that a drop of traffic from one day to the next could be the result of a change in the robots.txt file or of adding noindex information to the user pages. True, the user pages have no real value, but they are important for the link popularity. Also, the subdomains of MetaFilter have different property IDs, so where does the published traffic graph come from? Does it contain all subdomains?

In general: keep the robots.txt file simple. Avoid separate blocks for individual search engine bots to prevent misinterpretations and oversights.

Summary

Let me just stress that these results are based on an analysis of some ten thousand pages of MetaFilter.com. To get full and detailed reports, you should run analyses that cover more URLs. Forecheck can analyze up to about 1.5 million URLs, which takes many hours, but it is not possible to run such reports and analyses for a free blog post. And as you can see, tools like Forecheck are always just the beginning of a deep and detailed investigation.

First I must admit that I have no answer to the question of why MetaFilter lost so much traffic in one day. But I showed you some (not all!) of the things you have to look at when analyzing a website for SEO issues. You can see that it is complex, and many new questions arose from my analysis. But it is normal to end up with many questions when doing an SEO analysis. In this analysis, I only picked up some of the issues that looked important to me regarding MetaFilter.com.

The fact is, there are many SEO issues to be solved. Solving them should increase the rankings of MetaFilter once all of this potential has been realized. Maybe the answer to the harsh decrease is simple, or possibly it will never be found – this is daily work when dealing with SEO. And I am sure that no NSA technology would help here. Of course, with access to all information (Google Analytics, Webmaster Tools, etc.), an answer is more probable!

The last thing I want to show you is a visibility graph over time. I additionally overlaid the ad revenue graph that Matt Haughey posted, and I think these two graphs correlate very well. When you compare them with the traffic graph that Matt Haughey also posted, they do not correlate.

visibility-index-2

When I look at that graph, I cannot see an error caused by Google, although I would never declare that impossible. My guess is rather that MetaFilter has lost traffic over the years and that there are several SEO issues that should be solved. Yes, many people say that MetaFilter has turned gray in terms of how it presents information on the internet. But solving the SEO issues should help a lot. After that, MetaFilter also has to think about usability and design.

I would not blame Google first when rankings drop over time. Yes, Google has changed a lot of its algorithms over the years, but I would say that most of these changes make sense. The question I would ask is: what did MetaFilter do to deal with these algorithmic changes? I have worked with many companies over the years, in some cases for nearly 10 years or more with the same company. SEO is a continuing process, and the strategies change over the years (and I am not talking about black hat methods that have to be fixed later!). Maybe MetaFilter has not really accepted SEO as a serious business task that has to be done continuously.

I hope this analysis gives you an insight into how SEO experts work, how difficult but also fascinating this job can be, and what Forecheck can do to help in analyzing all of this. This case is very difficult, but possibly it can be solved. It is really great that Matt Haughey came forward with this information, because there are not many cases where people open up their business data and talk turkey. So thanks to Matt Haughey, and I would like to encourage him to give more details so we can all learn from this. Maybe more webmasters and CEOs will open their treasure chests of problems and discuss them publicly.

So, if you have a problem and are willing to share information, write to me. Maybe we will pick it up here and create an analysis that might help you!

About Thomas Kaiser

Thomas Kaiser, founder and CEO of Forecheck LLC and cyberpromote GmbH, launched his first company at 23. He developed the first MPEG-2 video coder for Windows at the Technical University of Munich. In 1997 he invented “RankIt!!”, the first SEO software program in Germany. He has also written several books and is a sought-after speaker at SEO conferences and events. He loves playing guitar, enjoys his 5 kids and has drunk SEO milk since birth. You can write him at thomas /at/ forecheck.com.