A first look at Google Scholar Beta
Today's launch of the new Google Scholar beta site led me to spend an hour testing the site for quality and quantity of information.
Google Scholar uses a special algorithm (unknown to us, and probably a state secret) to calculate the "scholarliness" or seriousness of a particular hit in the Google database. The idea is to create a subset of the enormous Google database in order to satisfy less consumer-focused searches.
First, some background. It's important to remember that over the last year or so, Google has been working feverishly to add a variety of invisible web content to its databases, including library catalogue records, invisible web databases like PubMed, and full-text books as part of its strange-bedfellow partnership with Amazon.com through A9.com. This previously unindexed content is likely the source of many of the 3 billion pages that Google added to its index last week. Several of us in the search assessment world have been griping lately that most of this content would never be found in a typical Google search. (See my article "Just because it's indexed doesn't mean you'll find it" for a little background.) This beta rollout of Google Scholar directly addresses that criticism.
At first glance, I see that although Google Scholar does surface much of this newly indexed content, many of the results lead only to citations or to licensed full-text databases that can't be accessed for free.
For example, a search for "human resources" benchmarking turned up approximately 4,700 results today (versus 383,000 for the same search in straight-up Google). However, many of those results are links to full-text articles or books from popular commercial publishers and aggregators like Wiley, Ingenta, and Blackwell, and relatively few of them are available for free.
I also tested a medical search in Google Scholar, using the keywords sumatriptan migraine. As I had hoped and expected, most of the initial results came from the PubMed content that Google has been indexing for the last year or so, and clicking a result brings up the PubMed record. However, this should not be taken as a recommendation to search PubMed through Google Scholar rather than directly through PubMed, which offers a more up-to-date database and better interface options than Google or Google Scholar.
At this stage of Google Scholar, there are MANY results that users simply won't be able to access without paying for the article themselves, or finding an enterprise partner (like their school or college library) that owns the journal. That will likely create a level of frustration among users. One easy way for Google Scholar to solve the problem would be to follow the model of Findarticles.com, which lets users check a box to omit all but free articles. (This would certainly lead to lower revenue and some pouty publishers, so don't bet it will happen.)
The one truly fabulous feature of Google Scholar (which really WILL help serious searchers and academics, rather than just confuse them) is the "cited by X" feature of the database. If Google has another document in its database that cites the article in the hit list, it will produce a link labelled "Cited by [#]", where # is the number of cached documents that cite the article. You can click on this link to bring up a page listing those citing documents. This isn't just cool; it may actually be useful to researchers interested in expanding their range of citation searching options, and it could also be used in determining relevancy.
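For the technically curious, here is a rough Python sketch of the idea behind a "Cited by" count. It is not Google's method, which hasn't been published. It just assumes we already know which documents each paper cites, then flips that list around. The function names and the three sample paper IDs are made up for illustration.

```python
from collections import defaultdict


def build_cited_by(references):
    """Invert a 'paper -> papers it cites' map into 'paper -> papers citing it'."""
    cited_by = defaultdict(set)
    for citing_doc, cited_docs in references.items():
        for cited_doc in cited_docs:
            cited_by[cited_doc].add(citing_doc)
    return cited_by


def cited_by_label(doc_id, cited_by):
    """Return the link text for a hit, or an empty string if nothing cites it."""
    count = len(cited_by.get(doc_id, ()))
    return f"Cited by {count}" if count else ""


# Hypothetical sample data: each paper maps to the papers in its reference list.
references = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c", "paper_a"],
    "paper_c": [],
}

index = build_cited_by(references)
print(cited_by_label("paper_c", index))
print(sorted(index["paper_c"]))
```

Clicking the link would then amount to listing the entries stored under that paper in the inverted map. The count can only be as complete as the set of documents that has been indexed.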
There's no advertising on Google Scholar -- at least for now. It's kind of a moot point anyhow. Although Danny Sullivan reported in today's Search Engine Watch that Google claims it is not earning any revenue from new subscriptions between searchers and publishers, my guess is that Google may have enticed publishers into making their content available on Google with the promise of some sort of revenue sharing or micropayments, and that as a result Google will likely see enough revenue to make this a viable business option.
Google Scholar presents a really interesting problem to libraries around the globe. Libraries have been working furiously to make their licensed content accessible through federated searching INSIDE their library web sites, and it's been pretty easy to make users understand that the only way to access licensed content is by entering the library's portal site.
Google Scholar has the potential to turn that model on its head. Students will begin to access links to journal and book content directly through Google Scholar, and depending on which IP address they are coming from, may or may not be able to access the full-text content directly on click. Talk about confusion. In a typical scenario, when accessing Google Scholar from a campus IP address that is recognized by, say, Ingenta.com, students might have access to a full-text journal link. But from OUTSIDE that IP range, the same link will produce a "please buy me" result. Yikes.
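To make the inside/outside distinction concrete, here is a minimal Python sketch of IP-based access control. It is an assumption about how a publisher site might decide what to show, not a description of how Ingenta or any other vendor actually does it. The campus range used here (192.0.2.0/24) is a reserved documentation range standing in for a hypothetical library's registered addresses.

```python
import ipaddress

# Hypothetical: address ranges a library has registered with the publisher.
LICENSED_RANGES = [ipaddress.ip_network("192.0.2.0/24")]


def link_outcome(client_ip, licensed_ranges=LICENSED_RANGES):
    """Decide what the same article link shows for a given visitor address."""
    address = ipaddress.ip_address(client_ip)
    if any(address in network for network in licensed_ranges):
        return "full text"
    return "purchase page"


print(link_outcome("192.0.2.57"))    # a student on the campus network
print(link_outcome("198.51.100.7"))  # the same student at home
```

The link itself never changes. Only the visitor's address does, which is why the same search result can behave so differently for the same person.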