I was pretty surprised to see that my site had been indexed within its first week of existence, considering I didn't submit it via Google's 'Add URL' tool, add it to Google Webmaster Tools (I'm not using the new name yet) or even install Google Analytics.
I’m pretty certain it hasn’t gained any links over its short lifetime.
This raises the question: how is Google finding new domains? I have to admit, it's not a question I'd considered for a while, because none of my recent projects have involved starting from scratch.
I put the question to Twitter and quickly got a response from a colleague.
Essentially, there were a few theories:
Google Crawls Links from Gmail
This is quite an old theory that has generated a lot of chatter over the past five or six years.
Some people have gone as far as to say it even impacts rankings – which I think is very unlikely.
However, someone (a spammer, which we'll discuss in the next section!) did email me about my new domain via Gmail, so it is at least possible that Google found it this way in my case.
A pretty comprehensive test was conducted by Eric Enge back in March, in which four pages were uploaded to his site with no links pointing to them other than those in Gmail correspondence and similar. None were indexed, which seems to scupper the long-held belief of some.
My case differs, though, because it involves a new domain. It's unlikely this makes any difference (why would Google use this method to index new domains but not new URLs?), but it leads quite nicely to the next theory.
Google Monitors Domain Registrations
Perhaps Google borrows a tactic that seems to be common among spammers, judging by the number of emails, comments and referral spam visits this site received in its first week: monitoring new domain registrations.
Like the Gmail theory, though, this seems to take Google's mission of 'indexing the web' a bit too far, as the domain isn't even part of the web yet. Moreover, requesting the vast number of registered domains that go dormant every day seems like overkill and a waste of resources.
The Hosting/Domain Registration Company Automatically Submitted
This is a random possibility that popped into my head, but it seems very unlikely, as I've only ever come across automatic submission as an opt-in service in the hosting Control Panel.
Google Uses Chrome
Theories abound about Google using Chrome to gather clickstream data for its ranking algorithms, so why not also use it to discover new URLs?
Google Uses Search Queries
I actually searched for my domain a few times to see if it had been indexed. It doesn't seem too far-fetched that Google would feed search query data into its crawler. Doing so would go some way towards qualifying the leads it could get from monitoring domain registrations, since people are unlikely to search for a domain they aren't going to do anything with.
There is a Link Somewhere
It’s possible there is a link that no tool is showing yet – perhaps one of the referral spammers actually added a link to their site before deleting. It seems unlikely but plausible.
Of the theories above, I think Chrome and search query data are the most likely. The next step will be to test them, if I find some time! It would be fascinating if either were proven true, as the implications would go beyond indexing: I would view a strong positive result for either as a big step towards showing that they are likely used within ranking algorithms.
UPDATE 14/07/2015: The day after I published this post, I submitted a URL to Google's 'Add URL' tool. The URL could not possibly have been part of the web, but it had been live for a few days. Only five minutes later I ran a domain: filetype: search (it was a KML file) and found it had already been indexed. I wish I had searched for it beforehand, but surely five minutes is too quick for the 'Add URL' tool? I see this as lending weight to the Chrome theory, as the URL had been visited (I'm waiting for 100% confirmation that it was via Chrome) but not searched for.
UPDATE 15/07/2015: The URL had definitely been visited with Chrome but not searched for.