I’ve been doing more searching than usual in Google Scholar in the course of editing a manuscript for the September/October 2013 issue of Online Searcher. I know that both librarians and students have mixed emotions about GS, and there have been many articles, blog posts, conference presentations, and tweets about it. Last November, at the Charleston Conference, I was delighted to hear its creator, Anurag Acharya, a Google Distinguished Engineer, describe new developments.
Here are just a few of the things that delighted or depressed me in my recent search experience. I searched for Marydee Ojala. OK, that’s me. And Google figured out that the M Ojala in the author field was that same Marydee. Not Markus, who’s doing a PhD at Helsinki University. Not Matti, who’s in the Department of Agricultural Sciences at the same institution. Not Mace, the librarian who organizes the unconference Cycling for Libraries. I have a mental image of a Google algorithm that, in extremely unscientific terms, sees Marydee Ojala, notes that I write about online searching, and eliminates social science research, agricultural research, and cycling to present me with completely relevant results. Well done, Google Scholar!
Then I got to the sort options. That’s when depression set in. My name retrieved 696 hits. Un-ticking the Citations box reduced the hit count to 308. When I sorted by date, that went down to 1. OK, it’s a book review I wrote in 2013, and the sort option does say that it covers only articles posted (not published, mind you) in the last year. So I hit the back arrow, and suddenly my results, now sorted by relevance, dropped to 545. When I un-ticked the Citations box again, the hit count fell to 196.
Several other author searches performed similarly, so it’s not just me.
Going back to that date sorting: it’s important to remember that it doesn’t sort by publication date. One search, on “digital libraries”, retrieved 3,540 hits when limited to “Since 2013.” Sorting by date winnowed that down to 606. Shouldn’t the number of articles in the last year be the same as the number since 2013? The first 10 hits in the sorted results list are, indeed, sorted by date posted, but their publication dates vary. Again, this is very unscientific, since I looked only at the first 10. Although most were published in the May-June 2013 time frame, they were not in publication-date order, and one was published in 2006.
I can think of a number of reasons why Google Scholar has trouble with its sorting. Publisher embargoes and the web scraping used to populate Scholar undoubtedly play a role. Still, it’s an interesting exercise to unravel how and why Google Scholar does what it does.