Google replays Microsoft's mistake

A few years ago Microsoft devised, tested and eventually abandoned its "Smart Tags" initiative. MS pulled the plug after massive protests from website authors who were (rightfully) outraged by MS's plans to alter their content by adding links to it. Four years later, Google wanders down the same path and introduces "AutoLinks"...

Screenshot of the Google Toolbar's autolink menu

AutoLinks are a new feature in version 3 of Google's popular IE toolbar (currently in beta). The feature recognizes several kinds of identifiers (ISBNs, VINs, US addresses and package tracking numbers) and provides links to related Google and "approved by Google" third-party services for these. It does, however, require some user interaction: nothing is triggered until the user opens the "Show Auto Links" menu of the toolbar (screenshot on the right). At the same time as the menu is displayed, the page itself is altered by turning the items the AutoLink feature recognizes into links. And here lies the lesson that Google should have learned from MS's attempt at this four years ago: you may not alter the content of other people's websites.

I approve this link

Links on pages I publish can in many ways be considered my way of endorsing a site or service; in terms of Google's own PageRank system, a link is even considered as giving "my vote" to a site (although that's not a concern here). That's why the "no-follow initiative" was positively received by the web community, and that's why this is a bad idea. There's no way I'm giving any third party carte blanche to add links to my content, effectively deciding which sites I'm endorsing (or at least giving my visitors that impression).

The menu in itself might have been acceptable. But when they decide to alter content they haven't produced, with the intention of monetizing it, they have stepped well over the line. Although Google claims it doesn't receive compensation from the services it links to, there's no question as to why they're doing this: as a multi-billion-dollar hyper-company, their motivation is none other than making money for their shareholders. In reality it doesn't matter whether they make money off it or not. (After all, the first commandment of the WGBA handbook clearly states: "Thou shalt not modify content that does not belong to thee"... ;-)

When Google tops it off by adding non-validating markup to the pages I strive to validate, I've had it! If you'd like to take a closer look at the markup before the intervention, after the intervention and after the cleanup, I've created a simple script that captures the different stages of this process. Obviously you'll need IE with the toolbar installed; if that isn't feasible, there's even a "cached copy" available.

One thing MS nearly got right when exploring this was that they gave web authors an option to prevent "Smart Tags" from appearing on a page (by adding a <meta name="MSSmartTagsPreventParsing" content="true" /> tag to the page header). Ideally this should have been the other way around (opt-in instead of opt-out), but at least it gave web authors a way to prevent the tagging. Google provides no such method, so I see no other way than to clean up the mess after the content has been altered...

The concept

The only method I've found to detect manipulation by the toolbar is to check for changes to the number of links on a page (the addition of AutoLinks). For this, a global variable is declared to store the initial number of links on the page; the actual value is retrieved as part of the window.onload handler. (It's important that this happens after any other scripts that may change that count.) Both steps also include a simple check for IE (window.execScript is only supported by IE), which avoids unnecessary resource usage for users with other browsers.
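The detection idea can be tried outside a browser with a small stand-in. This is only a rough sketch: createLinkWatcher, fakeLinkCount and cleanups are hypothetical names that do not appear in my script, and a plain variable takes the place of document.links.length.

```javascript
// Hypothetical helper for illustration; createLinkWatcher is not part of the original script.
function createLinkWatcher(getLinkCount, onChange) {
  // Remember how many links the page had when the watcher was created.
  var baseline = getLinkCount();

  // Returns true (and calls onChange) when the count differs from the baseline.
  return function check() {
    if (getLinkCount() !== baseline) {
      onChange();
      return true;
    }
    return false;
  };
}

// Stand-in for document.links.length so the sketch runs without a page.
var fakeLinkCount = 3;
var cleanups = 0;
var check = createLinkWatcher(
  function () { return fakeLinkCount; },
  function () { cleanups += 1; }
);

var before = check();   // count unchanged, nothing happens
fakeLinkCount = 5;      // simulate the toolbar adding two links
var after = check();    // count changed, onChange runs
console.log(before, after, cleanups);
```

On a real page the first argument would read document.links.length, and the second would do the actual cleanup.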

The first step of the function is an if statement that checks whether the current number of links differs from the reference count stored in the variable. If the total number of links has changed, we assume the toolbar is involved, and the links added by AutoLink are removed. Luckily Google has made this easy: all new links have an identifier starting with "_googt" followed by a (seemingly) random value. To remove the new links we simply run a regular expression replace that strips every link with this identifier (but leaves the linked text in place); finally we swap the current document contents for our newly cleaned content. If no changes are detected, a timeout is created that (on expiration) will run the function once more.

The code

// Global variables
   //Used by IE only functions...

   if (window.execScript){
      var intMalarkeyLinks;
      }
 
// Onload events

window.onload = function() {
   //IE only functions...

   if (window.execScript){
      intMalarkeyLinks = document.links.length;
      eRemoveMalarkeyGoo();
      }
   }
 
// eRemoveMalarkeyGoo - Removes "AutoLinks" from the Google Toolbar
// Copyright©2005 by John Magnus Juliussen (http://evirtus.net/)

function eRemoveMalarkeyGoo() {
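   // A changed link count is taken as a sign that the toolbar has added links
   if (intMalarkeyLinks != document.links.length) {
      // [^>]* keeps each match within a single opening tag
      var strRegexFind = /<a id=_googt[^>]*>([^<]*)<\/a>/gi;
      var strRegexReplace = "$1";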
      var objDocBody = document.body.innerHTML;
      var objNewDocBody = objDocBody.replace(strRegexFind,strRegexReplace);
      document.body.innerHTML = objNewDocBody;
      }
   else {
      setTimeout(eRemoveMalarkeyGoo, 1200);
      }
   }
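
To sanity-check the replace step without IE or the toolbar, the regular expression can be run against a plain string. This is a rough sketch only: stripAutoLinks is a hypothetical helper, and the sample markup is my assumption of what toolbar-altered innerHTML might look like (unquoted id, uppercase tags), not captured output. The pattern limits the attribute match with [^>]* so that one match cannot run past the end of its own opening tag.

```javascript
// Hypothetical helper for illustration; not part of the original script.
function stripAutoLinks(html) {
  // Match one AutoLink at a time and keep only the linked text ($1).
  var autoLink = /<a id=_googt[^>]*>([^<]*)<\/a>/gi;
  return html.replace(autoLink, "$1");
}

// Assumed sample: two toolbar-added links with one of the author's own links in between.
var altered =
  'ISBN <A id=_googt1 href="http://example.com/a">0596007124</A> and ' +
  '<a href="/mine">my own link</a>, then ' +
  '<A id=_googt2 href="http://example.com/b">1Z999</A>.';

var cleaned = stripAutoLinks(altered);
console.log(cleaned);
// prints: ISBN 0596007124 and <a href="/mine">my own link</a>, then 1Z999.
```

A greedy .* in the same position would match from the first AutoLink through the last one on the line, swallowing the author's own link in between.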

I've prepared a document that demonstrates the above. To make stealing it easy, I've included all the necessary code inline. This work is licensed under a Creative Commons License. As always, comments and corrections are very welcome!
