6. Natural Language and Free Text Queries

Modern Information Retrieval
Chapter 10: User Interfaces and Visualization




Statistical ranking algorithms have the advantage of allowing users to specify queries naturally, without having to think about Boolean or other operators. But they have the drawback of giving the user less feedback about, and less control over, the results. Usually the result of a statistical ranking is a list of documents with a score, probability, or percentage shown beside each title. Users are given little feedback about why a document received the ranking it did or what roles the query terms played. This can be especially problematic if the user particularly wants one of the query terms to be present.


One search strategy that can help with this particular problem is the specification of 'mandatory' terms within the natural language query. This in effect lets the user control which terms are considered important, rather than relying on the ranking algorithm to weight the query terms correctly. But knowing to include a mandatory specification requires the user to know about a particular command and how it works.
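A minimal sketch of how mandatory terms might interact with a ranked free-text query. The leading '+' marker, the raw term-frequency score, and the three toy documents are all assumptions made for illustration; the text does not name a particular syntax or ranking function.

```python
import re
from collections import Counter

def parse_query(query):
    """Split a free-text query into (all_terms, mandatory_terms).

    A leading '+' marks a term as mandatory (assumed syntax).
    """
    terms, mandatory = [], set()
    for token in query.lower().split():
        word = re.sub(r"[^a-z0-9]", "", token.lstrip("+"))
        if not word:
            continue
        terms.append(word)
        if token.startswith("+"):
            mandatory.add(word)
    return terms, mandatory

def rank(query, docs):
    """Return (score, title) pairs, best first.

    Documents missing any mandatory term are dropped before ranking.
    """
    terms, mandatory = parse_query(query)
    results = []
    for title, text in docs.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        if any(counts[m] == 0 for m in mandatory):
            continue  # a mandatory term is absent from this document
        score = sum(counts[t] for t in terms)  # toy score: raw term frequency
        if score > 0:
            results.append((score, title))
    return sorted(results, key=lambda r: (-r[0], r[1]))

# Hypothetical mini-collection.
docs = {
    "A": "hot springs and spas in Calistoga",
    "B": "spas spas spas in Napa",
    "C": "reviews of hotels in Calistoga",
}

print(rank("spas calistoga", docs))
print(rank("spas +calistoga", docs))
```

Without the marker, document B ranks first on the strength of a single repeated term; marking 'calistoga' as mandatory removes B from the list entirely, which is the kind of control the ranking alone does not give the user.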

The preceding discussion assumes that a natural language query entered by the user is treated as a bag of words, with stopwords removed, for the purposes of document matching. However, some systems attempt to parse natural language queries in order to extract concepts to match against concepts in the text collection [jacobs93b, mccune85, strzalkowski99].
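The bag-of-words treatment can be sketched in a few lines. The stopword list here is a small illustrative assumption; real systems use much longer lists.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "and", "can", "for", "i", "in", "is", "of",
             "that", "the", "was", "where", "who"}

def bag_of_words(query):
    """Lowercase the query, drop stopwords, and count the remaining terms.

    Word order and syntax are discarded; only term counts remain.
    """
    tokens = re.findall(r"[a-z0-9]+", query.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

print(bag_of_words("Who is the leader of Sudan?"))
```

Because order is discarded, 'Who is the leader of Sudan?' and 'Sudan leader' reduce to the same bag, which is why such systems cannot exploit the syntax of a question.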

Alternatively, the natural language syntax of a question can be used to attempt to answer the question. (Question answering in information access is different from that in database management systems, since the desired information is encoded within the text of documents rather than specified by the database schema.) The Murax system [kupiec93] determines from the syntax of a question whether the user is asking for a person, place, or date. It then attempts to find sentences within encyclopedia articles that contain noun phrases that appear in the question, since these sentences are likely to contain the answer to the question. For example, given the question 'Who was the Pulitzer Prize-winning novelist that ran for mayor of New York City?', the system extracts the noun phrases 'Pulitzer Prize', 'winning novelist', 'mayor', and 'New York City'. It then looks for proper nouns representing people's names (since this is a 'who' question) and finds, among others, the following sentences:

The Armies of the Night (1968), a personal narrative of the 1967 peace march on the Pentagon, won Mailer the Pulitzer Prize and the National Book Award.

In 1969 Mailer ran unsuccessfully as an independent candidate for mayor of New York City.

Thus the two sentences link together the relevant noun phrases and the system hypothesizes (correctly) from the title of the article in which the sentences appear that Norman Mailer is the answer.
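The matching step in this example can be approximated with a short sketch. It is not Murax's actual implementation: the noun phrases are entered by hand rather than produced by a parser, the mapping from 'where' and 'when' to answer types is an assumption (the text confirms only the 'who' case), and matching is plain substring search.

```python
# Assumed mapping from question word to expected answer type.
ANSWER_TYPES = {"who": "person", "where": "place", "when": "date"}

def expected_answer_type(question):
    """Guess the answer type from the first word of the question."""
    first_word = question.strip().split()[0].lower()
    return ANSWER_TYPES.get(first_word)

def matching_phrases(sentence, noun_phrases):
    """Return the question noun phrases that occur in the sentence."""
    lowered = sentence.lower()
    return [p for p in noun_phrases if p.lower() in lowered]

question = ("Who was the Pulitzer Prize-winning novelist that ran for "
            "mayor of New York City?")
# Noun phrases as listed in the text, supplied by hand.
noun_phrases = ["Pulitzer Prize", "winning novelist", "mayor", "New York City"]
sentences = [
    "The Armies of the Night (1968), a personal narrative of the 1967 "
    "peace march on the Pentagon, won Mailer the Pulitzer Prize and the "
    "National Book Award.",
    "In 1969 Mailer ran unsuccessfully as an independent candidate for "
    "mayor of New York City.",
]

# Collect the phrases that the candidate sentences cover between them.
covered = set()
for sentence in sentences:
    covered.update(matching_phrases(sentence, noun_phrases))

print(expected_answer_type(question))
print(sorted(covered))
```

Neither sentence alone covers the question, but together they account for three of the four noun phrases, which is the linking behavior described above.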


Another approach to automated question answering is the FAQ Finder system, which matches question-style queries against question-answer pairs on various topics [burke97]. The system uses a standard IR search to find the most likely FAQ (frequently asked questions) files for the question and then matches the terms in the question against the question portion of the question-answer pairs.
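The two-stage matching described here can be sketched as follows. Simple term overlap stands in for the 'standard IR search', and the FAQ files and their contents are hypothetical; none of this is taken from the FAQ Finder implementation itself.

```python
import re

def terms(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(a, b):
    """Count the distinct terms shared by two texts."""
    return len(terms(a) & terms(b))

def best_faq_file(question, faq_files):
    """Stage 1: pick the FAQ file whose full text best overlaps the question."""
    def file_score(name):
        text = " ".join(q + " " + a for q, a in faq_files[name])
        return overlap(question, text)
    return max(faq_files, key=file_score)

def best_pair(question, pairs):
    """Stage 2: match the question against the question portion only."""
    return max(pairs, key=lambda qa: overlap(question, qa[0]))

# Hypothetical FAQ files, each a list of (question, answer) pairs.
faq_files = {
    "coffee": [
        ("How should I store coffee beans?", "Keep them in an airtight container."),
        ("How much caffeine is in espresso?", "Less per cup than drip coffee."),
    ],
    "bicycles": [
        ("How do I fix a flat tire?", "Remove the wheel and patch the tube."),
        ("How often should I oil the chain?", "Every few hundred kilometers."),
    ],
}

question = "How do I store espresso beans?"
chosen_file = best_faq_file(question, faq_files)
matched_question, answer = best_pair(question, faq_files[chosen_file])
print(chosen_file, "|", matched_question, "|", answer)
```

Restricting the second stage to the question portion keeps a long answer from dominating the match simply because it contains more words.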

A less automated approach to question answering can be found in the Ask Jeeves system [askjeeves]. This system makes use of hand-picked Web sites and matches these to a predefined set of question types. A user's query is first matched against the question types. The user selects the most accurate rephrasing of their question, and this in turn is linked to suggested Web sites. For example, the question 'Who is the leader of Sudan?' is mapped into the question type 'Who is the head of state of X (Sudan)?' where the variable is replaced by a listbox of choices, with Sudan the selected choice in this case. This is linked to a Web page that lists current heads of state. The system also automatically substitutes the name 'Sudan' into a query against that Web page, thus bringing the answer directly to the user's attention. The question is also sent to standard Web search engines. However, such a system is only as good as its question templates. For example, the question 'Where can I find reviews of spas in Calistoga?' matches the question types 'Where can I find X (reviews) of activities for children aged Y (1)?' and 'Where can I find a concise encyclopedia article on X (hot springs)?'
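A rough sketch of template matching in this style, under several assumptions: templates are matched by plain word overlap, the best template is chosen automatically (in the system described, the user picks the rephrasing), and the choice lists and example.org URLs are hypothetical placeholders.

```python
import re

def words(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))

# Hypothetical question templates; 'X' is the slot shown as a listbox.
TEMPLATES = [
    {"pattern": "Who is the head of state of X?",
     "choices": ["Sudan", "France", "Japan"],
     "site": "https://example.org/heads-of-state"},
    {"pattern": "Where can I find a concise encyclopedia article on X?",
     "choices": ["hot springs", "volcanoes"],
     "site": "https://example.org/encyclopedia"},
]

def rephrase(query):
    """Map a query to its best-overlapping template and a slot choice.

    If no choice occurs in the query, the first choice is preselected.
    """
    q = words(query)
    best = max(TEMPLATES,
               key=lambda t: len(q & (words(t["pattern"]) - {"x"})))
    selected = next((c for c in best["choices"] if words(c) <= q),
                    best["choices"][0])
    return best["pattern"].replace("X", f"X ({selected})"), best["site"]

print(rephrase("Who is the leader of Sudan?"))
print(rephrase("Where can I find reviews of spas in Calistoga?"))
```

The second call reproduces the weakness noted above: with no suitable template available, the query about spa reviews is mapped onto the encyclopedia template because the two share only the generic words 'Where can I find'.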
