Measuring the Extent of the Synonym Problem in Full-Text Searching

Authors

Jeffrey Beall, University of Colorado Denver
Karen Kafadar

DOI: https://doi.org/10.18438/B8MC85

Keywords:

Full-text searching, synonyms, search precision, information retrieval

Abstract

Objective – This article measures the extent of the synonym problem in full-text searching. The synonym problem occurs when a search misses documents because the search was based on a synonym and not on a more familiar term.

Methods – We considered a sample of 90 single-word synonym pairs and searched for each word in the pair, both singly and jointly, in the Yahoo! database. We determined the number of web sites that were missed when only one term, but not the other, was included in the search field.

Results – Depending on how commonly the synonym is used, the percentage of missed web sites can vary from almost 0% to almost 100%. When the search uses a very uncommon synonym ("diaconate"), a very high percentage of web pages can be missed (95%), whereas a search using the more common term misses far fewer (only 9% are missed when searching web pages for the term "deacons"). When both terms in a word pair were nearly equal in usage ("cooks" and "chefs"), a search on one term but not the other missed almost half the relevant web pages.

Conclusion – Our results indicate that search engines would gain great value from incorporating automatic synonym searching, not only for user-specified terms but also for high-usage synonyms. Moreover, the results demonstrate the value of information retrieval systems that use controlled vocabularies and cross references to generate search results.
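
The kind of calculation described under Methods can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not give the exact formula, so the sketch assumes that the "joint" search returns the number of pages containing both terms, and that the relevant set is every page containing either term. The function name `percent_missed` and the hit counts are hypothetical.

```python
def percent_missed(n_term, n_other, n_both):
    """Percent of pages containing either word that a search on one word misses.

    n_term  -- hit count for the word actually searched
    n_other -- hit count for its synonym
    n_both  -- hit count for pages containing both words (assumed meaning
               of the "joint" search; the abstract does not specify this)
    """
    if n_both > min(n_term, n_other):
        raise ValueError("overlap cannot exceed either single-term count")
    # Inclusion-exclusion: pages containing at least one of the two words.
    union = n_term + n_other - n_both
    if union == 0:
        return 0.0
    # Pages that contain only the synonym are invisible to this search.
    missed = n_other - n_both
    return 100.0 * missed / union


# Made-up hit counts for illustration only (not data from the study).
rare, common, both = 50, 900, 10
print(f"searching the rare term misses   {percent_missed(rare, common, both):.1f}%")
print(f"searching the common term misses {percent_missed(common, rare, both):.1f}%")
```

Under these assumptions, two equally common terms with no overlap give a missed share of exactly 50% for either search, which is the pattern the Results describe for terms of nearly equal usage.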

Author Biographies

Jeffrey Beall, University of Colorado Denver

Metadata Librarian, Assistant Professor

Karen Kafadar

Rudy Professor of Statistics in the College of Arts and Sciences, Indiana University, Bloomington, Indiana, USA

Published

2008-12-13

How to Cite

Beall, J., & Kafadar, K. (2008). Measuring the Extent of the Synonym Problem in Full-Text Searching. Evidence Based Library and Information Practice, 3(4), 18–33. https://doi.org/10.18438/B8MC85

Issue

Vol. 3 No. 4 (2008)

Section

Research Articles

License

The Creative Commons-Attribution-Noncommercial-Share Alike License 4.0 International applies to all works published by Evidence Based Library and Information Practice. Authors will retain copyright of the work.
