5. Phrases and Proximity

Modern Information Retrieval
Chapter 10: User Interfaces and Visualization




In general, proximity information can be quite effective at improving the precision of searches. On the Web, the difference between a single-word query and a two-word exact phrase match can mean the difference between an unmanageable mess of retrieved documents and a short list of mainly relevant documents.

A large number of methods for specifying phrases have been developed. The syntax in LEXIS-NEXIS requires the proximity range to be specified with an infix operator. For example, `white w/3 house' means `white within 3 words of house, independent of order.' An exact phrase is specified by simply listing one word beside the other, separated by a space. A popular method used by Web search engines is to enclose the terms in quotation marks. Shneiderman et al. [shneiderman98] suggest providing a list of entry labels, as described above for specifying facets. The difference is that the terms on each line are treated as a phrase rather than as a disjunction. This is suggested as a way to guide users to more precise query specification.
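The two kinds of specification described above, a proximity range and an exact phrase, can be sketched in a few lines of Python. This is an illustration only: the function names (`tokenize`, `within`, `exact_phrase`) are hypothetical, the tokenizer is deliberately naive, and reading `w/3' as "word positions differ by at most 3" is an assumption, not a description of how LEXIS-NEXIS actually counts words.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (naive, for illustration)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def within(tokens, a, b, k):
    """Return True if some occurrence of `a` and some occurrence of `b`
    are at most `k` positions apart, in either order.

    Assumption: "within k words" means the token positions differ by at most k.
    """
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(0 < abs(i - j) <= k for i in pos_a for j in pos_b)


def exact_phrase(tokens, phrase_terms):
    """Return True if `phrase_terms` occurs as a contiguous, ordered run in `tokens`."""
    n = len(phrase_terms)
    return any(tokens[i:i + n] == phrase_terms
               for i in range(len(tokens) - n + 1))


doc = tokenize("The house was painted bright white last summer.")
print(within(doc, "white", "house", 3))         # proximity test, order ignored
print(exact_phrase(doc, ["white", "house"]))    # quoted-phrase test, order matters
```

Under these assumptions the proximity test accepts the sample sentence, because the two words are close together in reverse order, while the exact-phrase test rejects it.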

The disadvantage of the exact-phrase methods is that they require an exact match, when it is often the case (in English) that one or a few words come between the terms of interest. For example, in most cases the user probably wants `president' and `lincoln' to be adjacent, but still wants to catch cases of the sort `President Abraham Lincoln.' Another consideration is whether or not stemming is performed on the terms included in the phrase. The best solution may be to allow users to specify exact phrases but treat them as small proximity ranges, with perhaps an exponential fall-off in weight as the distance between the terms increases. This has been shown to be a successful strategy in non-interactive ranking algorithms [clarke96]. It has also been shown that combining quorum ranking of faceted queries with the restriction that the facets occur within a small proximity range can dramatically improve the precision of results [hearst96a, mitra98].
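The idea of treating a phrase as a small proximity range with an exponential fall-off can be sketched as follows. This is a minimal illustration under stated assumptions, not the ranking method of [clarke96]: the function name `proximity_weight`, the decay base of 0.5, the choice to score only the closest pair of occurrences, and the decision to ignore word order are all hypothetical.

```python
def proximity_weight(tokens, first, second, decay=0.5):
    """Return a weight in [0, 1] for the closest co-occurrence of two terms.

    Adjacent terms score 1.0, and each intervening word multiplies the
    weight by `decay` (an assumed parameter). If either term is missing,
    the weight is 0.0.
    """
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    best = 0.0
    for i in first_pos:
        for j in second_pos:
            if i == j:
                continue  # same token position (only possible if first == second)
            gap = abs(i - j) - 1  # number of words between the two terms
            best = max(best, decay ** gap)
    return best


adjacent = "president lincoln signed the bill".split()
one_between = "president abraham lincoln signed the bill".split()
print(proximity_weight(adjacent, "president", "lincoln"))     # 1.0
print(proximity_weight(one_between, "president", "lincoln"))  # 0.5
```

With this weighting, `President Abraham Lincoln' still matches the phrase query, but it contributes half the weight of an exact adjacent match, and the contribution shrinks quickly as more words intervene.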
