Bits of Books - Books by Title

The Google Story

by David Vise

Google is the only multi-billion-dollar company in the world that is also a spelling mistake. Back in the palaeolithic era (that's the palaeolithic era in the internet sense, i.e. autumn 1997) its co-founders, Larry Page and Sergey Brin, were graduate computer science students at Stanford. They were working on an insanely cool new search engine, wanted to incorporate it as a company, and needed to find a name. David Vise, in his breezy book The Google Story, tells how they came up with one. A fellow graduate student suggested to Page and Brin that they use the name given to what is sometimes, erroneously or metaphorically, called the largest number, 10^100 (a 1 followed by a hundred zeros): google. They looked up the name on the internet, found that it wasn't taken, and registered their brand-new brand, google.com. The next morning they found that the name hadn't been taken because it should have been spelled googol - and that googol.com had, of course, already been bagged. (It belonged, and still belongs, to a Silicon Valley software engineer and home-brewed beer enthusiast called Tim Beauchamp: 'The links on this page are a mishmash of eclectic destinations that may be of interest to you. Actually, they may only be of interest to Tim but what the heck. It is his site!') Lesser men might have considered that a bad omen, but Larry and Sergey are not bad-omen kind of guys. Just over eight years later, Google is the fastest-growing company in the history of the world, with a market capitalisation, at the time of writing, of $138 billion. Larry and Sergey, the Wallace and Gromit of the information age, are each worth more than $10 billion.

Companies are a bit like people in that they tend to bear the imprint of the milieu in which they were formed. Google, spelling mistake and all, is a product of the intensely academic environment in which both Page and Brin were raised. Page was born in Michigan, Brin in Russia, but apart from that their backgrounds were eerily alike: ethnically but not religiously Jewish, educated in Montessori schools, their fathers both university professors of science (computer science at Michigan and maths at Maryland, respectively), their mothers both also super-numerate (database consultancy and Nasa - it must be fun to say 'my mum works at Nasa'). Brin was 16 when he began taking classes at the University of Maryland, and 19 when he graduated. He went to Stanford to begin work on his PhD. Page, who had done his first degree at the University of Michigan, arrived a year later to have a look at the computer science PhD programme. On a Stanford orientation day in 1995, on a tour of San Francisco, Page began arguing with the tour guide, a second-year comp. sci. PhD student called Sergey Brin, whose opinionated obnoxiousness closely resembled his own. You have seen enough buddy movies to know what happened next.

The key idea which underlies Google came out of this academic milieu; it was an insight that could occur only to someone thoroughly marinated in academic ways of thinking. John Battelle, an internet-world insider and search-engine specialist, gives a fascinating account of it in his indispensable book The Search. Page was fooling around at Stanford, trying to come up with an idea for his PhD thesis. He had always been interested in Nikola Tesla, a scientist whose brilliant inventions, ranging from 'wireless communication and X-rays to solar cells and the modern power grid', were not matched by any comparable success in marketing them, or himself. Page liked the idea of making things that caught on; he had no interest in hiding his light under a bushel. He began to think about his own web page, and who was reading it, and whether anyone was not just reading it but linking to it, which would be a sign of more than casual interest. But while it was easy to find the outward links from a web page, it was not at all straightforward to find the reverse: who was linking to it. So Page wrote a program which solved the problem of finding out who was linking to any given web page. He called the program BackRub.
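The reverse-lookup problem BackRub tackled can be illustrated in a few lines: a crawl naturally records each page's outgoing links, and inverting that map answers the question of who links to a given page. This is a minimal sketch, assuming the crawl is already held in memory as a Python dict; the function name `find_backlinks` and the example URLs are hypothetical, not BackRub's actual code.

```python
from collections import defaultdict


def find_backlinks(outlinks: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert a page -> outgoing-links map into a page -> incoming-links map."""
    backlinks: dict[str, set[str]] = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            # Record that `source` points at `target`.
            backlinks[target].add(source)
    return dict(backlinks)


# Hypothetical crawl: each page lists only the pages it links *to*.
crawl = {
    "alice.example/home": ["larry.example/page"],
    "bob.example/blog": ["larry.example/page", "alice.example/home"],
    "larry.example/page": ["tesla.example/bio"],
}

print(sorted(find_backlinks(crawl)["larry.example/page"]))
# ['alice.example/home', 'bob.example/blog']
```

Pages that nobody links to (here `bob.example/blog`) simply do not appear as keys in the result.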

Once BackRub had been written, Page began to wonder if there was a way of using it to determine the utility of any particular site - and this is when he, or he and Brin, had a big idea. It was based on one of the most widely mocked areas in academia, that of bibliometrics: assessing the importance of any given article or piece of information by measuring how often other people in the field mention it. In bibliometrics, no attempt is made to see how sensible or useful or well-argued a piece of work is: all you do is count how often it is mentioned. This never-mind-the-quality-feel-the-width approach sounds like a ridiculous way of assessing the importance of intellectual work but it is, I am told, a surprisingly powerful tool. In any case, it is what gave Page and Brin the idea for a program which measured the importance of a web page by counting how often other web pages linked to it. Page gave the algorithm that did this calculation the name PageRank.
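The review describes PageRank as link counting; the published version of the idea goes a step further and weights each link by the rank of the page it comes from, which makes the calculation recursive. A standard way to compute that is power iteration. The sketch below is a textbook simplification, assuming a small in-memory link graph and a damping factor of 0.85 (the value suggested in Brin and Page's original paper); it is not Google's production code, and the example graph is made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Simplified PageRank by power iteration.

    `links` maps each page to the pages it links to. Each page's score is
    split equally among its outgoing links; pages with no outgoing links
    spread their score evenly over every page. Scores always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Baseline share every page receives from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its score across all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Toy web: a, b and d all link to c; c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # c
```

Page `a` ends up ranked well above `b` and `d` even though it has only one inbound link, because that link comes from the highly ranked `c`: that is the difference between this and plain link counting.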

Then the boys set out to build a search engine which used PageRank. (The patent for PageRank, incidentally, is owned by Stanford University. Google have exclusive use of it until 2011.) The idea was that a search engine which knew how important a page was would have a powerful advantage in assessing the quality of the information on that page. It would not only be able to look for specific words; it would also have a way of judging the quality of the data on the pages where those words occur, which would make it far better at delivering useful information.

As for how it works in practice, the first thing to realise is that Google does not search the internet. If it did, the internet would grind to a halt under the strain of all the searching taking place, because Google by itself (never mind the competition) handles upwards of 100 million searches every day. Instead it searches a copy of the internet stored on its own computers. It sends out a 'crawler' which downloads copies of web pages. A full circuit of all the web pages in the world takes roughly a month, which is why the information on Google is often a few days old; the most recent snapshot of the page copied back to the Googleplex is available as the 'Cached' link on any given Google result. (This delay is one of several reasons why, if you can't find anything on Google, it is worth trying an alternative search engine, such as Yahoo or Clusty.) Having copied the internet, Google then indexes it, recording every word on a web page, where it stands in relation to other words, whether it appears in a title or in a special typeface, how often it occurs on the page, and so on. It also gives a lot of importance to the PageRank of the page in question. There are more than a hundred of these criteria, and Google gives a numeric weight to every one of them, for every searchable term on every one of eight billion web pages. When a query arrives - which it does many times every second - Google searches the index for the relevant terms, measures the relevance of the results using all its various metrics including PageRank, crunches out a single number for each page, and lists the pages with the highest score at the top, usually within half a second or so.
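The index-then-score process described here can be illustrated with an inverted index: a map from each word to the pages (and positions) where it appears, combined at query time with a per-page importance score. This is a toy sketch under loud assumptions: whitespace tokenisation, only two signals (term frequency and a precomputed PageRank) standing in for the hundred-plus the review mentions, and weights `w_freq` and `w_rank` that are arbitrary, hypothetical values rather than anything Google uses.

```python
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each word to {page: [positions where the word occurs]}.

    Positions are stored because a real index records where words sit
    relative to each other; this tiny scorer only uses their count.
    """
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word][url].append(position)
    return index


def search(query: str, index: dict, pagerank: dict[str, float],
           w_freq: float = 1.0, w_rank: float = 10.0) -> list[str]:
    """Return pages containing every query word, highest combined score first."""
    words = query.lower().split()
    if not words:
        return []
    candidates = set(index.get(words[0], {}))
    for w in words[1:]:
        candidates &= set(index.get(w, {}))
    scores = {}
    for url in candidates:
        freq = sum(len(index[w][url]) for w in words)
        # Collapse the signals into a single number per page.
        scores[url] = w_freq * freq + w_rank * pagerank.get(url, 0.0)
    return sorted(scores, key=scores.get, reverse=True)


pages = {
    "a": "google search engine",
    "b": "search search engine tips",
    "c": "cooking tips",
}
ranks = {"a": 0.5, "b": 0.1, "c": 0.4}  # made-up PageRank values
index = build_index(pages)
print(search("search engine", index, ranks))  # ['a', 'b']
```

Page `b` mentions the query words more often, but `a` wins because of its higher PageRank, which is the kind of trade-off the weighting is there to settle.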

Even if you didn't know a thing about computers, you could tell this involved a truly scary amount of computational power. This is another area in which Google's origins show up as a strength. When the program was first conceived, Page thought he would be able to download an entire copy of the internet to his own PC. That turned out not to be the case: Page and Brin ended up having to scrounge, cadge, rustle up and 'borrow' every scrap of computational power they could find at Stanford to gather the necessary data. What they learned in the process became one of their great strengths. Google does not run on huge, expensive mainframe computers but on a very large number of bog-standard, over-the-counter PCs, the same sort used by ordinary mortals. The PCs are tweaked and cabled together in particular ways to provide Google's 'special sauce' - this is one of the revelations in David Vise's book - and run a customised, stripped-down version of Linux. When a PC breaks, they chuck it away and replace it. Nobody outside the company knows just how many of these PCs Google has. John Hennessy, the president of Stanford and a Google board member, says that it's 'the largest computer system in the world'; Vise puts the figure at more than 100,000 PCs. Without their experience in graduate student bodging, the founders of Google would never have learned how to put together a computer cluster that combined such replaceable simplicity with such computational muscle. The cluster's main problem these days is the heat generated by all those silicon chips.
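Why cheap, disposable PCs can be good enough comes down to redundancy: if every piece of the data lives on several machines, losing one machine costs nothing but a replacement. The sketch below is a generic illustration of that replication pattern, not Google's actual design (Vise's book does not describe it at this level); the class name, shard layout and machine names are all hypothetical.

```python
import random


class ShardedCluster:
    """Toy model of a cluster built from cheap, replaceable machines.

    The data is split into shards and each shard is copied onto several
    machines. A query needs just one live copy of every shard, so a dead
    machine is skipped and later swapped out rather than repaired.
    """

    def __init__(self, num_shards: int, replicas_per_shard: int) -> None:
        # machines[s] is the set of (hypothetical) machine names holding shard s.
        self.machines: dict[int, set[str]] = {
            s: {f"pc-{s}-{r}" for r in range(replicas_per_shard)}
            for s in range(num_shards)
        }
        self.dead: set[str] = set()

    def fail(self, machine: str) -> None:
        """Mark a machine as broken; it stops receiving queries."""
        self.dead.add(machine)

    def replace(self, machine: str) -> None:
        """Throw the broken machine away and add a fresh one for the same shard."""
        for group in self.machines.values():
            if machine in group:
                group.discard(machine)
                group.add(machine + "-new")
        self.dead.discard(machine)

    def route_query(self) -> dict[int, str]:
        """Choose one live machine per shard; fail only if a whole shard is down."""
        plan: dict[int, str] = {}
        for shard, group in self.machines.items():
            live = sorted(group - self.dead)
            if not live:
                raise RuntimeError(f"shard {shard} has no live machines")
            plan[shard] = random.choice(live)
        return plan


cluster = ShardedCluster(num_shards=3, replicas_per_shard=2)
cluster.fail("pc-1-0")           # one PC dies...
print(cluster.route_query()[1])  # ...so shard 1 is served by pc-1-1
cluster.replace("pc-1-0")        # ...and the dead PC is swapped for a new one
```

The design choice being illustrated is that reliability comes from the number of copies rather than from the quality of any single machine.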

The boys took the company public in 2004, leaving it as late as they could, this being one of the many ways in which Google diverged from the Silicon Valley norm during the long-lost boom. The general pattern during the internet gold rush was to launch a company as early as possible, and hope that investors bought the shares before the company ran out of cash. That was because most dot-coms had no money; their business model involved truly spectacular revenue projections, set some distance in the future. A standard pitch started by pointing out the size of some market - to take the example used in the cautionary documentary Startup.com, the market for paying parking tickets. Say $1 billion worth of parking tickets are paid every year. Say the company servicing the payments earns 30 per cent of the fee. Say you could set up an online service to pay these tickets, and then - and this was the enticingly pseudo-sensible part of the pitch - take into account that only, say, 20 per cent of the public will be willing to pay in this convenient new way. Lo, you have just created a business with annual revenue of $60 million, and extraordinary potential to expand when other local or national government payment services migrate online. Your company is now worth a couple of billion dollars. Or it will be soon. 'Grow big fast!' (That was one of the battle-cries of the internet age.) 'If you build it, they will come!' (That was another.) Set up an Initial Public Offering, quick! There's gold in them thar bills!
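Part of what made such pitches seductive is that the arithmetic really is this simple: $1 billion × 30 per cent × 20 per cent = $60 million a year. Spelled out as a function, with a hypothetical name and percentages passed as whole numbers so the sum stays exact:

```python
def projected_revenue(market: int, take_percent: int, adoption_percent: int) -> int:
    """Dot-com style projection: market size x the company's cut x share of users who switch."""
    return market * take_percent * adoption_percent // (100 * 100)


revenue = projected_revenue(1_000_000_000, 30, 20)
print(f"${revenue:,}")  # $60,000,000
```

Every input is an assumption ("say..."), which is exactly the weakness the pitch glossed over: halve the adoption rate and the business halves with it.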

Fresh new thinking along these lines caused one of the greatest destructions of capital ever seen. Google's route was superficially similar. They concentrated on making their search technology the best. Traffic to the site grew at great speed, all without a cent spent on marketing. The company had as yet no business model; as one of its directors said, 'we'll figure out how to monetise that.' This was exactly the thinking that cost so many people so much money. The difference was that Google managed to do it, and they did so by building a huge business in the most nickel-and-dime way imaginable, through small ads. Next time you do a search on Google, have a look at the 'Sponsored Links' on the right of the results. These are paid advertisements. Advertisers bid for specific words, or combinations of words: 75c for 'digital camera', to take an example from The Google Story, but $1.08 for 'digital cameras' (because people who search for the plural are more likely actually to buy one), or $30 for 'mesothelioma' (because the people who place the ads are personal injury lawyers looking for clients who want to sue whoever it was they think gave them this particular cancer). Many of the words cost only a few cents to bid for: 30c for 'pet food', for instance. If you click on one of the links, the advertiser pays Google the agreed amount.
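The charging rule described here, where the advertiser pays only when someone clicks, reduces to adding up the agreed price for each click. A minimal sketch using the example prices quoted above; it deliberately ignores how the real AdWords auction sets prices and decides which ads appear, and the names `BIDS_CENTS` and `bill_for_clicks` are hypothetical.

```python
# Per-click prices in cents, taken from the examples quoted in the text.
BIDS_CENTS = {
    "digital camera": 75,
    "digital cameras": 108,
    "mesothelioma": 3000,
    "pet food": 30,
}


def bill_for_clicks(clicked_keywords: list[str],
                    bids: dict[str, int] = BIDS_CENTS) -> int:
    """Total owed in cents: an ad that is shown but not clicked costs nothing."""
    return sum(bids[keyword] for keyword in clicked_keywords)


clicks = ["digital cameras", "digital cameras", "pet food"]
print(bill_for_clicks(clicks))  # 246
```

Working in whole cents avoids the rounding surprises that floating-point dollars can produce when many tiny charges are summed.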

Google's ads are so effective at generating income because they tap directly into the intentions of people looking for things. An ad in any normal medium is, to one degree or another, a form of broadcasting: it will appear in front of many people who have no interest in it, en route to finding the minority on whom it will exert some grip. Google's ads appear only in front of people who are already looking for the thing they are advertising; they are as narrowcast as advertising can possibly be. The general realisation of this was accompanied by the dawning knowledge that Google in effect has a direct line, if not quite to the unconscious dreaming mind of the world, at least to the part of it which voices its wishes. This was something no one foresaw about the internet, that its 'killer app' - the thing which made it indispensable to ordinary people - was the ability to find services and information. The received wisdom in the business was that search was a 'commodity', something it was simple to buy from the cheapest provider. In disproving that, Google showed that it was wired straight into the global id.

The underlying idea of search-plus-ads was not original: a company called Overture was already doing the same (and Google later settled a suit from Overture out of court). But nobody did it anywhere near as well as Google, and the success of AdWords (as the system is called) is the reason Google, instead of rushing to the stock market as quickly as possible, as everyone else did, took as long as they could to go public. They knew that as soon as their revenue figures were disclosed, everyone would go nuts, and their competitors would begin knocking themselves out to get into this amazing new business of search-plus-ads. They had a secret, and it was the opposite secret from every other internet start-up: their secret was that they were already making a ton of money. They have continued to do so. In the six months to 30 June 2005 Google took in $2.6 billion in revenue, almost entirely from its ads. It was sitting on more than $3 billion and had no borrowings, and it has since raised another $4 billion in cash. This sheer financial muscle is the reason Google is now such a power in the world.

The financial success of Google since its IPO means that Page and Brin can now do more or less what they like. The limits on their company are set not by what they can afford but by what they can conceive and bring off. The stated mission of Google is 'to organise the world's information and make it universally accessible and useful', an immodest project, to put it mildly, but one on which Google is at least in a position to make a decent start. But the remorseless focus implied by that 'mission statement' is a little misleading, since the company's philosophy is to give bright people a free rein to attack the problems that interest them, and 20 per cent of employees' time is devoted to pet projects of their own devising. This makes Google a great centre of 'if you build it, they will come,' and means that the company is constantly coming up with new schemes and wheezes, not all of which make a coherent whole, but which tend at the least to be interesting ideas. It also means that barely a day goes by without a news story touching on Google in some respect or other.

Since I began writing this piece Google has been in the headlines several times: for governments' complaints about the spy-friendly potential of the all too detailed satellite maps in Google Earth; for a new feature called Music Search, which does what it says on the tin; for announcing a plan to take a 5 per cent stake in AOL; for being vulnerable to 'black hat' tactics from Search Engine Optimisers, who specialise in boosting Google results; and for hugely expanding its nascent Google Video service. The media are obsessed with Google, not least because they are so worried by it. (The general consensus is that Google, having once been seen as a technology company, should instead be regarded as a media company. You may not think it matters, but money people like to see things through the prism of a 'business model'.) Other recent stories have concerned Google's offering the whole of San Francisco free wireless access to the internet, setting up a free Google Space at Heathrow airport to allow people to use its products, launching Google Talk as a potentially disruptive way of making free phone calls over the internet, pressing on with its ambitions for Google Book Search (formerly Google Library) to 'make the full text of all the world's books searchable by anyone', and launching Google Base to take over the world's classified advertising market. In the meantime, the company has launched a Toolbar, including a Desktop Search tool which searches for information on users' own PCs - something Microsoft, the world's biggest software company, has been trying and failing to do for a number of years.

What scares people about this is the feeling that Google has a masterplan, and that they are advancing towards world information and financial dominance. It isn't clear that that's right, though. My sense of it (and it's only a sense) is that Google advances more by letting its engineers invent things and solve problems, or perceived problems, one at a time, and that as long as the problem being solved broadly fits with the overall mission statement, they'll go ahead with it. Some of these stabs seem well thought out, others less so. At the same time the core focus on search stays. People who work in the field say that search is only 5 per cent 'solved', and that the huge amount of information located on the internet but (for a variety of reasons) unavailable to searches remains an enormously difficult problem to solve. It seems likely that this focus will give the company plenty to chew on for many years, even after the overheated share price cools off.

So: is Google a good thing? The geek in me wants to say yes. It certainly has made finding information incomparably easier. Some of the information is even true ... Actually, that's not fair, but a lot of what is on the net is false, and the Google-derived mistake is something you do now notice in the mainstream media. One example occurred on the death of Hunter S. Thompson. When he died, several newspapers shared with us, often in the opening sentence, President Nixon's opinion that 'Hunter S. Thompson represented the dark, venal and incurably violent side of the American character.' Except (as any Hunter S. Thompson fan will tell you) Nixon didn't say that about Thompson, Thompson said it about Nixon. But a site giving the line the wrong way around was the first thing to come up on Google on the day of Thompson's death.

Despite such glitches, Google is from the research point of view invaluable. I've used it on a more or less daily basis for the last five years, but it was only when I began working on this piece that I fully realised just how many features it has added, as part of an ambition to do 'something intelligent' with every query. Google Scholar, which searches academic papers, is very useful, and will become more so. The powerful calculator feature, which will do advanced maths as well as highly practical things like converting square feet into square metres, is useful. The character ~ lets you search for synonyms, and is useful. Google News, which was invented by an engineer, Krishna Bharat, using his 20 per cent time to come up with a broadly global news service in the wake of 9/11, is useful, and terrifies conventional news organisations. The translation service isn't useful yet, but I bet it will be one day. The command 'define' is a useful quick way of finding what a word means. The blog search is fairly handy and will get better. Google Earth isn't particularly useful, but it is brutally cool: you begin with a satellite view and gradually descend to earth, homing in with a level of detail which can give you a view of your own house (also, it turns out, of secret military installations). Gmail, with its super-swift searching and 2GB of free space, is amazing, if you don't mind the fact that your email is scanned and used to target ads (and stored indefinitely). Google Maps is useful, and, because Google publishes APIs (application programming interfaces) that let people adapt its programs in ways they find personally helpful, will grow more and more useful over time. One dark example: a map, built on the Google Maps API, of registered sex offenders in the USA, which lets people see whether any live near them, and exactly where. Nice.

On a lighter note, Froogle, the shopping search service, is sort of useful, and has a feature which chills the blood of conventional retailers: when you're out in the high street and see something you want to buy, you can text its name to 46645 and Froogle will text back the best price it can find online. Also cool is Google Zeitgeist, which tells you which search terms have most increased in frequency in the past year. For 2005 the top five items are Myspace, Ares, Baidu, Wikipedia and Orkut - all of which, I notice, wearing my trendspotting hat, involve some sort of sharing, searching, meeting or collaborating online. It must be said that the coolness of Zeitgeist is reduced by the fact that it no longer lists the most declining search terms. In 2002, the last year they gave this info, the five fastest-rising searches were for Spider-Man, Shakira, Winter Olympics, World Cup and Avril Lavigne; the five fastest-declining were for Nostradamus, Napster, Anthrax, World Trade Center and Osama bin Laden. Thus did we recover from the trauma of 9/11.

Technologically, Google is an amazing thing. As for whether it is a good thing, that depends on what happens next. The company is keen to stress that, because of the voting structure of its shareholdings, it remains in the control of its founders. It is keen to send little signals of its own geekiness: its official IPO filing, for instance, announced that it would sell $2,718,281,828 worth of shares - a number based on e, the base of the natural logarithm, a constant intimately familiar to maths nerds. On 18 August last year the company announced that it would sell 14,159,265 shares, with the intention of raising about $4 billion in cash, to do they would not say what - the point here (apart from the huge amount of money) being that the share count matches the first eight digits of pi after the decimal point, 3.14159265. And then there's the fact that Google makes itself available in dozens of languages, including pig Latin and Klingon. These unfunny semi-jokes are designed to show that Google is still rooted in the comp. sci. culture in which it was born, and retains the same focus on the pure excellence of its products.
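Both in-jokes can be checked directly against the constants. A quick sketch using Python's built-in `math.e` and `math.pi`; the variable names are just for illustration.

```python
import math

# $2,718,281,828: the first ten digits of e, read as a dollar amount.
ipo_dollars = int(math.e * 1_000_000_000)

# 14,159,265 shares: the eight digits of pi that follow "3.".
share_count = int(str(math.pi)[2:10])

print(f"${ipo_dollars:,}")  # $2,718,281,828
print(f"{share_count:,}")   # 14,159,265
```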

That does not mean that Google is always aware of the consequences of its actions in the wider world. A strength of the firm - its rootedness in grad student nerd culture - is also a weakness, in the form of a certain arrogance and unwillingness to pay attention to views emanating from lesser forms of life. The example of this currently preoccupying the publishing business is Google Book Search, the plan to scan all the world's books and make them searchable. This sounds ambitious, to put it mildly, but Google have the resources and the determination to do it, and they have been working at it for some time, beginning with the libraries of Michigan, Stanford and Oxford. They are digitising millions of books in these collections, and have already begun providing access to the out-of-copyright volumes. Google had also begun to digitise books still in copyright in America, until they were stopped by a lawsuit from the Association of American Publishers.

A fundamental clash of cultures is at work here. To Google, with its mission to 'organise the world's information and make it universally accessible and useful', it is obvious that books, which contain so much information - accurate information too, far more so than on the web - must be searchable online. The plan is not simply to give the books away: although the whole book will be scanned and stored, only specific fragments of text will be displayed. It will be the best shop window ever for obscure texts. Besides, isn't their company policy 'Don't be evil'? But to publishers, there is something outrageously hypocritical about the contrast between Google's ferocious protection of its own intellectual property rights and its contempt for everyone else's. What's to stop Google giving free online access to the books once they are scanned? It's probably against the law, sure, but a sufficiently ruthless company which perceived a sufficiently strong demand could find ways around that. Once the texts were scanned and stored, the only thing preventing every writer's work from being given away free would be a few pieces of computer code on Google's servers. At the moment Google say they have no intention of providing access to this content; but why should anybody believe them?

More generally, the biggest single area of worry about Google involves privacy. This has been a long-running subject of concern on the net, but thanks to an op-ed piece in the New York Times in November it has begun to attract some wider attention. The paper pointed out that the prosecution in a recent North Carolina strangulation case drew into evidence the fact that the defendant had made Google searches on the words 'neck' and 'snap'. This brought to wider notice the fact that Google logs all the searches made on it and stores this information indefinitely; and that Google installs a cookie on the computer of everyone who uses it, which helps log that user's searches, and which isn't due to expire until 2038. Because every computer connected to the internet is identified by an IP address, visits to websites can usually be traced back to the computer making them - a fact well known in geek circles but remarkably under-publicised outside them. (Last April a Chinese journalist called Shi Tao was given ten years in jail for 'leaking state secrets' after Yahoo! in Hong Kong handed over information linking his IP address and his email to the Chinese authorities.) Users of Google's Gmail service have already given the company their identity, a full record of all their searches, and copies of all their emails, stored indefinitely. According to the tech guru Robert Cringely, the future of Google lies in combining the company's knowledge of who you are with its Google Video service to produce microscopically targeted TV ads. 'Google imagines a world where only single people see match.com ads, and people who can't drive see ads from taxi companies where others see Toyota campaigns. Where fraternities see ads for strip clubs, beer, Cancun weekends and LSAT prep courses, and only seniors (and their adult children) see ads for Alzheimer's drugs.' In case that doesn't seem sufficiently dystopian, one should bear in mind that the information stored at Google is vulnerable to legal subpoena. It's not hard to imagine this information being sought by governments, litigants or divorcing spouses, and the list does not stop there. Google badly needs to develop tools which ensure privacy.
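Why a long-lived cookie matters can be shown in a few lines: if every logged search carries the same cookie ID, all of one person's queries can be pulled together even when their IP address changes from session to session. The log format, cookie IDs and queries below are invented for illustration (the IP addresses come from the ranges reserved for documentation); this is not Google's actual logging scheme.

```python
from collections import defaultdict

# Hypothetical server log entries: (cookie_id, ip_address, query).
# The cookie ID persists for years, even though the IP address changes.
search_log = [
    ("id-4f2a", "203.0.113.7", "flights to boston"),
    ("id-91c3", "198.51.100.2", "pet food"),
    ("id-4f2a", "203.0.113.52", "boston hotels"),
    ("id-4f2a", "192.0.2.10", "divorce lawyer boston"),
]


def profile_by_cookie(log: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Group every logged query under the cookie ID that sent it, in log order."""
    profiles: dict[str, list[str]] = defaultdict(list)
    for cookie_id, _ip, query in log:
        profiles[cookie_id].append(query)
    return dict(profiles)


print(profile_by_cookie(search_log)["id-4f2a"])
# ['flights to boston', 'boston hotels', 'divorce lawyer boston']
```

Three different IP addresses, one cookie: the searches still line up into a single profile.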

How potent Google is as a way of finding out information about people is a different subject, though nobody disputes that its potency can be alarming. A journalist at Cnet, a tech-news portal, did half an hour's Google research on Eric Schmidt, the chief executive of the company, and published the results, by way of showing just how effective Google was at this kind of thing. Schmidt, outraged, threw a major strop and Google announced it would not speak to anyone from Cnet for a year (so there!). But personal information is easily found, especially in America, where phone directories are reverse-searchable and social security numbers are readily obtained. So far, everyone who has invested in Google has made out like the proverbial bandit; but one day the share price will drop, and people who've bought shares will find that they've lost money. It is then that Google's leaders will come under pressure to find some uses for that unprecedented goldmine of personal data. As for privacy in relation to governments, the company's existing privacy policy says that 'we may share information' if 'we conclude that we are required by law or have a good faith belief that access, preservation or disclosure of such information is reasonably necessary to protect the rights, property or safety of Google, its users or the public.' You don't have to be Diogenes the Cynic to think that this gives Google the latitude to do pretty much whatever it wants. Let's not forget that in February 2004 Google, having brought its news service to China, immediately gave in to the Chinese government and omitted links to sites which the Chinese government did not want its citizens to see. This was the first big test of Google's loudly proclaimed 'Don't be evil' policy in a context where living up to it would have meant preferring principle to money, and it was one they failed.

Putting all this together, we reach the conclusion that, on the one hand, Google is cool. On the other hand, Google has the potential to destroy the publishing industry, the newspaper business, high street retailing and our privacy. Not that it will necessarily do any of these things, but for the first time, considered soberly, these things are technologically possible. The company is rich and determined and is not going away any time soon. They know what they are doing technologically; socially, though, they can't possibly know, and I don't think anyone else can either. These are the earliest days in a process of what may turn out to be radical change. The best historical analogy for where Google is today probably comes from the time when the railroads were being built. Everyone knew that trains and railways would change the world, but no one predicted the invention of suburbs. Google, and the increased flow of information on which it rides and from which it benefits, is the railway. I don't think we've yet seen the first suburbs.
