
I read this article on Econsultancy last month with some amusement.

Econsultancy is a widely respected, professional online marketing site that publishes content from some of the biggest names in online marketing, SEO, PPC and web development, and it’s certainly considered a trusted source in the field. Most of us would jump at the chance to get an article published by Econsultancy, never mind a link back to our site. The idea that people would want to get links removed from Econsultancy just sounds silly.

My initial assumption was that the SEOs involved in sending the link removal requests were using some method of automatically categorising backlinks to mark for removal. I figured Econsultancy had been erroneously placed in a bucket labelled “Sites with SEO in the page title” meant for spammy directories and nobody had bothered to sense-check their outreach list before sending out a templated email.

But then the blog post linked above also notes that their guest bloggers had been receiving suspicious link emails from Google which included signature links on the Econsultancy blog amongst the examples. This in itself seems odd, as the signature links on Econsultancy generally aren’t what you’d consider ‘spammy’ links – I haven’t noticed any with optimised anchor text, for example; they’re usually just brand names and links to social media profiles.

So I filed the affair away as a slip-up, or perhaps a few isolated cases where guest bloggers had got greedy with their signature links and used excessive, deliberately manipulative anchor text, and I went about my business. But then I was doing a backlink review for a client this week and noticed something interesting. They also have a link from the Econsultancy blog. Actually, they have around 20 links from the Econsultancy blog. All from the same article, but not from the same URL…

Econsultancy automatically detects the location of visitors to their site and redirects them to a subfolder for their region. So, for example, if I go to http://www.econsultancy.com/blog I get redirected to http://www.econsultancy.com/uk/blog. Depending on where you are in the world, you’ll be redirected to a different subfolder. This is handled with a 302 redirect, which signals a temporary move.
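
You can see this for yourself with a quick script. This is only a sketch using the Python requests library: it asks for the generic /blog URL without following redirects, so the status code and Location header are visible. The exact target subfolder will depend on where you’re connecting from, and the expected values in the comments simply reflect the behaviour described above.

```python
import requests

# Ask for the generic blog URL without following redirects, so the redirect
# status code and target are visible for ourselves.
response = requests.get("http://www.econsultancy.com/blog", allow_redirects=False)

print(response.status_code)              # expected: 302 (temporary redirect)
print(response.headers.get("Location"))  # e.g. a regional path such as /uk/blog
```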

I’m not entirely sure WHY they’re doing this, though, as the content always seems to be the same; it doesn’t switch languages for different regions. What’s more, it’s causing them problems. All of these URLs can be legitimately crawled and indexed by search engines. There’s no canonical markup to indicate which page is the original version, and there’s no rel=”alternate” hreflang=”x” markup to hint to search engines which URL is most appropriate to rank in each region.
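
If you want to check that yourself, a crude sketch along these lines will do. It assumes the requests library and just does a substring test on the returned HTML rather than proper parsing; the /uk/blog URL is the one mentioned above, and any other regional URL can be dropped in the same way.

```python
import requests

def check_page(url):
    """Crude check for canonical / hreflang markup in a page's HTML."""
    html = requests.get(url).text.lower()
    print(url)
    print("  rel=canonical present:", 'rel="canonical"' in html)
    print("  hreflang present:     ", "hreflang=" in html)

# The /uk/blog path is mentioned above; other regional versions you find in
# the index can be checked the same way.
check_page("http://econsultancy.com/uk/blog")
```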

The end result? Well, let’s take a look at the search results when you do a query for site:econsultancy.com inurl:three-reasons-why-publishers-hate-living-in-a-post-penguin-post-panda-world

[Screenshot: Google search results showing the duplicate Econsultancy URLs]

That’s 19 URLs indexed by Google for one blog post, with duplicate content on all of them. Most of them are the regional folder versions, along with URLs where the article has been linked to with additional tracking parameters. This means that whenever someone gets a link on Econsultancy – whether it’s in a blog post, author profile, blog comment, press release or infographic – it stands to be duplicated multiple times across different URLs. At a minimum, every regional URL version is likely to get indexed, and potentially other URLs with tracking code on the end as well.

They’ve implemented 301 redirects to move www. URLs to the non-www. versions, so they’re obviously concerned about canonicalisation. Implementing canonical tags and blocking parameters in Webmaster Tools are both easy things to do and would essentially resolve the duplication. rel=”alternate” hreflang=”x” markup is a bit trickier to implement properly, but it shouldn’t be a problem for Econsultancy and would allow them to get their preferred URLs to rank consistently in different regions.
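
For illustration, this is roughly what that markup could look like. It’s a minimal sketch only: the mapping of regional subfolders to hreflang codes and the placeholder article path are my assumptions rather than anything Econsultancy has published, and with hreflang in place each regional page would normally carry a self-referencing canonical alongside the alternate links.

```python
BASE = "http://econsultancy.com"
# Regional subfolders mapped to hreflang codes -- illustrative guesses only.
REGIONS = {"uk": "en-gb", "us": "en-us", "au": "en-au"}

def head_links(current_folder, path):
    """Build the <link> elements for one regional version of a page."""
    links = [f'<link rel="canonical" href="{BASE}/{current_folder}{path}" />']
    for folder, code in REGIONS.items():
        links.append(
            f'<link rel="alternate" hreflang="{code}" href="{BASE}/{folder}{path}" />'
        )
    return "\n".join(links)

print(head_links("uk", "/blog/some-article"))  # placeholder article path
```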

The redirects to the different subfolders are 302s which, assuming search bots are redirected the same way as users, should technically mean these URLs wouldn’t get indexed IF that’s the only way search engines found them. However, there are two problems with that.

Firstly, you can link directly to these subfolder URLs, so if Google crawls a link pointing straight at one of them it never passes through a 302 redirect at all. And because users are always redirected to a regional subfolder, that’s the link they’re going to share, so there are likely to be plenty of opportunities for Google to find and crawl the alternative versions.

Secondly, Google typically starts to ignore 302 redirects if they stick around long enough anyway. A 302 is meant to indicate content that has temporarily moved, but it’s regularly misused for content that has moved permanently, so if a 302 stays in place long enough Google will often start treating it as a permanent redirect. In any case, the evidence is clear: Google is regularly indexing the alternative URLs.
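
The first point is easy to sanity-check. This sketch requests a regional URL directly (the article path here is just a placeholder) without following redirects; if the behaviour described above holds, it comes back as a plain 200 rather than a 302, which is exactly what a crawler following a shared regional link would see.

```python
import requests

# Hit a regional URL directly rather than the generic /blog path. If the
# behaviour described above holds, this returns the page itself with no
# redirect -- which is all a crawler following a shared regional link sees.
direct = requests.get(
    "http://econsultancy.com/uk/blog/some-article",  # placeholder article path
    allow_redirects=False,
)
print(direct.status_code)  # expected: 200, not 302
```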

And let’s not even get into whether Google might see these redirects as an attempt at cloaking.

So this leads me to wonder: are links from Econsultancy being flagged, and link removal requests being sent by SEOs, simply because this duplication makes links from their domain look spammy? As an isolated, single link it’s not a big deal, but with every link on the site counting as 20+ links from different URLs, it wouldn’t take many guest posts or press releases for the number of identical links coming from the domain to spiral out of control.

If this is the case, it also leads me to wonder what other links outside of our control could be triggering similar red flags. As an open-source project, the DMOZ directory is heavily duplicated across hundreds, thousands, millions(!?) of domains. Same with Wikipedia. If trusted sites like these happen to link to us or a client, are we going to have to hunt down and disavow every domain that scrapes or syndicates their content as well? How good is Google at recognising links that are outside of our control, or innocent technical errors that cause duplicate links to be spun out across multiple URLs?