Use Google Scholar, Scopus and Web of Science for Comprehensive Citation Tracking
Design – Citation analysis, observational study.
Setting – Three citation tracking databases: Google Scholar, Scopus and Web of Science.
Subjects – Citations from eleven journals in each of the disciplines of oncology and condensed matter physics for the years 1993 and 2003.
Methods – Using a systematic sampling technique to ensure that journals with varying impact factors were included, the researchers selected eleven journals from each of the Journal Citation Reports 2004 categories “Oncology” and “Condensed Matter Physics”. All references from these 22 journals for the years 1993 and 2003 were retrieved by searching three databases: Web of Science, INSPEC, and PubMed. Only research articles were included. From these, a stratified random sample was drawn to represent the content of each journal proportionally (oncology 1993: 234 references, 2003: 259 references; condensed matter physics 1993: 358 references, 2003: 364 references). In November 2005, citation counts were obtained for all articles from Web of Science, Scopus and Google Scholar. Because of the small sample size and the skewed distribution of the data, non-parametric tests were used to determine whether significant differences existed between sets.
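The proportional stratified sampling described above can be illustrated with a minimal sketch. It is not the researchers' procedure: the function name, the largest-remainder rounding rule, and the journal sizes are all assumptions made for illustration.

```python
import random


def proportional_stratified_sample(strata, total, seed=0):
    """Draw `total` items so each stratum's share matches its share of the population.

    `strata` maps a stratum name (here, a hypothetical journal) to its list of articles.
    """
    rng = random.Random(seed)
    population = sum(len(items) for items in strata.values())
    # Exact (fractional) quota per stratum.
    exact = {name: total * len(items) / population for name, items in strata.items()}
    # Largest-remainder rounding (an assumption) so the quotas sum exactly to `total`.
    quotas = {name: int(share) for name, share in exact.items()}
    leftover = total - sum(quotas.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - quotas[n], reverse=True)
    for name in by_remainder[:leftover]:
        quotas[name] += 1
    # Simple random sample without replacement inside each stratum.
    return {name: rng.sample(strata[name], quotas[name]) for name in strata}


# Hypothetical journals with 600, 300 and 100 articles; none of this is study data.
journals = {
    "journal_a": list(range(600)),
    "journal_b": list(range(300)),
    "journal_c": list(range(100)),
}
sample = proportional_stratified_sample(journals, total=50)
sizes = {name: len(items) for name, items in sample.items()}
```

With these made-up sizes, the 50 sampled articles split 30, 15 and 5 across the three journals, mirroring their 60%, 30% and 10% shares of the population.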
Main results – For 1993, mean citation counts were highest in Web of Science for both oncology (mean = 45.3, SD = 77.4) and condensed matter physics (mean = 22.5, SD = 32.5). For 2003, mean citation counts were highest in Scopus for oncology (mean = 8.9, SD = 12.0), and in Web of Science for condensed matter physics (mean = 3.0, SD = 4.0). The set of citations from Scopus for condensed matter physics for 1993 contained insufficient data and was therefore excluded from analysis. A Friedman test for differences between all remaining groups indicated that a significant difference existed, and so pairwise post-hoc comparisons were performed. The Wilcoxon signed-rank tests demonstrated significant differences “in citation counts between all pairs (p < 0.001) except between Google Scholar and Scopus for CM physics 2003 (p = 0.119).”
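For readers unfamiliar with the Friedman test mentioned above, the following is a minimal sketch of the statistic for three related samples, such as three citation counts per article. It is not the authors' analysis. It assumes no tie correction and uses the large-sample chi-square approximation, which for three sources has 2 degrees of freedom and the closed-form p-value exp(-x/2). The function names and the counts are invented for illustration.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_three_sources(rows):
    """Friedman chi-square statistic and approximate p-value for rows of three counts."""
    n, k = len(rows), 3
    rank_sums = [0.0] * k
    for row in rows:
        # Ranks are assigned within each article (row), not across articles.
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    # Chi-square survival function with 2 degrees of freedom: exp(-x / 2).
    return stat, math.exp(-stat / 2)


# Invented counts per article: (source_a, source_b, source_c). Not study data.
counts = [(40, 31, 22), (12, 9, 5), (7, 6, 2), (55, 41, 30), (18, 15, 11), (9, 8, 3)]
stat, p_value = friedman_three_sources(counts)
```

In this toy data the first source always ranks highest, so the statistic is 12 and the approximate p-value is about 0.0025. A significant result would then be followed by pairwise tests, as the study did with Wilcoxon signed-rank tests.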
The study also examined the number of unique references from each database, as well as the proportion of overlap, for the 2003 citations. In oncology, there was a 31% overlap between databases, with Google Scholar including the most unique references (13%), followed by Scopus (12%) and Web of Science (7%). For condensed matter physics, the overlap was lower at 21%, and the largest number of unique references was found in Web of Science (21%), followed by Google Scholar (17%) and Scopus (9%). Citing references from Google Scholar were found to originate not only from journals but also from online archives, academic repositories, government and non-government white papers and reports, commercial organizations, and other sources.
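The overlap and uniqueness percentages can be thought of as set operations on the citing references each database returns. The sketch below assumes that "overlap" means references found in all three databases and that every percentage is taken over the union of citing references. The summary does not state either definition, so both are assumptions, as are the function name and the toy identifiers.

```python
def coverage_breakdown(citing_sets):
    """Percent of all citing items found in every source, and found only in each source.

    `citing_sets` maps a source name to a set of citing-reference identifiers.
    """
    union = set().union(*citing_sets.values())
    shared_by_all = set.intersection(*citing_sets.values())
    result = {"overlap_all": 100 * len(shared_by_all) / len(union)}
    for name, items in citing_sets.items():
        # Everything any other source found.
        others = set().union(*(s for n, s in citing_sets.items() if n != name))
        result["unique_" + name] = 100 * len(items - others) / len(union)
    return result


# Toy identifiers for citing references; these are not the study's data.
toy = {
    "google_scholar": {1, 2, 3, 4},
    "scopus": {3, 4, 5},
    "web_of_science": {4, 6, 7, 8, 9, 10},
}
breakdown = coverage_breakdown(toy)
```

Here only reference 4 appears in all three sets, so the overlap is 10% of the ten distinct references, while the unique shares are 20%, 10% and 50% respectively.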
Conclusion – The study does not confirm the authors’ hypothesis that differing scholarly coverage would result in different citation counts from the three databases. While there were significant differences in mean citation rates between all pairs of databases except for Google Scholar and Scopus in condensed matter physics for 2003, no one database performed better overall. Different databases performed better for different subjects and for different years. This was particularly true of Scopus, which includes references only from 1996 onward. The results suggest that the best citation database will depend on the years being searched as well as the subject area. For a complete picture of citation behaviour, the authors suggest that all three be used.
Evidence Based Library and Information Practice (EBLIP)