1.1.11. Searching

Searching and retrieving are the main way of selecting part of the database. Since one seldom manipulates the whole database at once, searching is the preferred way of approaching the data.

Presently, library OPACs and BMS seem to consider a window-structured, form-driven search interface the most appropriate tool (or rather, the single Google-like search box is gaining ground). No symbols and no explicit logic are required from the user. The user is offered several stacked boxes in which to enter terms, and combo boxes to select the boolean operators connecting those boxes: that's it. This seems simple and efficient, but it is also deceptive. It makes searching easier and finding harder.
We still think and speak with clauses and pauses. (A OR B) AND (C OR D) is not such a complicated query: I am looking for «(children OR adolescents) AND (death OR suicide)». With the mainstream dumb window-structured search interface, such a basic query becomes impossible to formulate, because parentheses are not provided for. The algorithm that governs the syntax and the priority among boolean operators is hidden, and the expression is commonly transformed either into «children OR (adolescents AND death) OR suicide» or into «((children OR adolescents) AND death) OR suicide». In the first case priority is given by type of operator (AND is evaluated before OR); in the second, operators are applied top down, in the order in which they were entered. Neither gives the appropriate response to the query above.
The alternative is not necessarily full SQL (Structured Query Language); it is enough to be able to use parentheses, as in the first statement above.
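
The difference between the three readings can be made concrete with a small sketch. The record set, the function names and the tiny parser below are all illustrative assumptions, not the behaviour of any particular BMS: `search` honours parentheses and otherwise evaluates AND before OR, while `search_top_down` applies the operators strictly in the order entered.

```python
import re

# Hypothetical mini database: record id -> set of indexed terms.
RECORDS = {
    1: {"children", "death"},
    2: {"adolescents", "suicide"},
    3: {"children", "school"},
    4: {"suicide", "adults"},
    5: {"adolescents", "death"},
}

def tokenize(query):
    """Split a query into parentheses, operators and terms."""
    return re.findall(r"\(|\)|\w+", query)

def matches(term):
    """Ids of the records that contain the term."""
    return {rid for rid, terms in RECORDS.items() if term.lower() in terms}

def search(query):
    """Honour parentheses; without them, AND is evaluated before OR."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_atom()
        while peek() == "AND":
            take()
            result = result & parse_atom()
        return result

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_or()
            take()  # consume the closing ")"
            return result
        return matches(tok)

    return parse_or()

def search_top_down(query):
    """No parentheses: apply each operator in the order it was entered."""
    tokens = tokenize(query)
    result = matches(tokens[0])
    for op, term in zip(tokens[1::2], tokens[2::2]):
        hits = matches(term)
        result = result & hits if op == "AND" else result | hits
    return result

intended = search("(children OR adolescents) AND (death OR suicide)")
by_operator = search("children OR adolescents AND death OR suicide")
top_down = search_top_down("children OR adolescents AND death OR suicide")
print(sorted(intended), sorted(by_operator), sorted(top_down))
```

On this sample data the intended query retrieves records 1, 2 and 5. Priority by operator retrieves all five records, and top-down evaluation retrieves 1, 2, 4 and 5: both add false hits such as record 4, which mentions suicide but neither children nor adolescents.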

Any-field searching (full-text indexing), truncation and phrase queries are essential.

Accented letters (e é è ê) should make no difference, just as case makes none (upper = lower). In practice only case-insensitivity is a de facto standard, whereas the handling of accents varies widely and is often disappointing.
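
As a rough illustration of what accent- and case-insensitive matching involves, the sketch below folds both the indexed text and the search term to a common form before comparing them. The function name `fold` is an assumption for this example; the approach (Unicode decomposition followed by removal of combining marks) is one common technique, not necessarily what any given package does.

```python
import unicodedata

def fold(text):
    """Return text without accents and without case distinctions."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more thorough lower() meant for caseless matching.
    return stripped.casefold()

print(fold("e é è ê"))
print(fold("Élève") == fold("ELEVE"))
```

Letters that are not composed of a base letter plus a mark (for example ø or æ) are left untouched by this method and would need an explicit mapping table.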

Searching within the results ('refine') and saving query expressions are somewhat less essential, though greatly appreciated.

Browsable term lists that point directly to the records are really useful.

When indexes are rudimentary, based on a 1:1 correspondence with the fields that generate them, you end up with one index for the authors field, another for translators, yet another for directors, and so on with editors etc., and the same for titles (article, host document, journal, series, translated title...): the outcome is disappointing from the point of view of field-based searching.
You know that "Umberto Eco" is an author, somebody "who writes", and when you search for him in a database you mainly want to retrieve the records for which he bears some intellectual responsibility, whether he acted as translator, editor or author; that is something you will investigate later. You certainly accept, and expect, the distinction between Eco as an author and Eco as the subject of an essay. But it is irritating, to say the least, to be forced to run three or four different searches, or a dumb full-text (any-field) search, to retrieve all the records in which Umberto Eco was involved as a writer.
Contrary to the traditional indexing approach of library catalogs, several BMS offer only this poor 1:1 field-based indexing. The alternative is field clustering, i.e. one index for all the 'names' fields (authors, contributors, editors, translators, directors etc.), preferably a flexible one, whereby you can decide which fields to link to a given index. This is already a reality in several packages: it is a matter of judgement more than of engineering.
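
Field clustering can be sketched as a configurable mapping from index names to the fields that feed them. Everything below is an assumption made for illustration: the field names, the `INDEX_MAP` layout, the helper functions and the sample records are not taken from any specific package.

```python
from collections import defaultdict

# Which fields feed which index. Editing this mapping is the
# "flexible" part: the user decides how fields are clustered.
INDEX_MAP = {
    "names": ["author", "editor", "translator", "director"],
    "titles": ["title", "journal", "series"],
    "subjects": ["subject"],
}

# Illustrative sample records.
RECORDS = [
    {"id": 1, "author": ["Umberto Eco"], "title": ["Il nome della rosa"]},
    {"id": 2, "author": ["Raymond Queneau"], "translator": ["Umberto Eco"],
     "title": ["Esercizi di stile"]},
    {"id": 3, "editor": ["Umberto Eco"], "title": ["Storia della bellezza"]},
    {"id": 4, "author": ["Peter Bondanella"], "subject": ["Umberto Eco"],
     "title": ["Umberto Eco and the Open Text"]},
]

def build_indexes(records, index_map):
    """Build one index per cluster: value -> ids of matching records."""
    indexes = {name: defaultdict(set) for name in index_map}
    for record in records:
        for index_name, fields in index_map.items():
            for field in fields:
                for value in record.get(field, []):
                    indexes[index_name][value.casefold()].add(record["id"])
    return indexes

def lookup(indexes, index_name, value):
    """Return the ids filed under value in the given index."""
    return set(indexes[index_name].get(value.casefold(), set()))

indexes = build_indexes(RECORDS, INDEX_MAP)
print(sorted(lookup(indexes, "names", "Umberto Eco")))
print(sorted(lookup(indexes, "subjects", "Umberto Eco")))
```

A single search in the 'names' index finds Eco as author, translator and editor (records 1, 2 and 3), while the essay about him stays apart in the 'subjects' index (record 4).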

Other features, such as soundex, fuzzy matching and relevance ranking operators, are refinements of secondary importance in this context.

Z39.50 searching of remote databases is important for retrieving and importing data.

If the BMS implements the OpenURL protocol, you will be able to send data from your database records to the relevant (often 'local') OpenURL link resolver in order to request specific services, such as the full text of an article, document delivery, a search of the local OPAC for a physical copy, and so on.
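
To give an idea of what is sent to the resolver, the sketch below builds an OpenURL 1.0 (Z39.88-2004) request in key/encoded-value form for a journal article. The resolver address, the record contents and the internal field names (`article_title`, `start_page` and so on) are placeholders invented for this example; only the `url_ver`, `rft_val_fmt` and `rft.*` keys come from the standard's journal format.

```python
from urllib.parse import urlencode

# Placeholder address: every institution runs its own link resolver.
RESOLVER_BASE = "https://resolver.example.org/openurl"

# Hypothetical record as it might be stored in the local database.
record = {
    "article_title": "An example article",
    "journal": "Journal of Examples",
    "issn": "1234-5678",
    "year": "2005",
    "volume": "12",
    "start_page": "34",
    "author_last_name": "Rossi",
}

def build_openurl(base, rec):
    """Map the record fields onto OpenURL journal-format keys."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": rec["article_title"],
        "rft.jtitle": rec["journal"],
        "rft.issn": rec["issn"],
        "rft.date": rec["year"],
        "rft.volume": rec["volume"],
        "rft.spage": rec["start_page"],
        "rft.aulast": rec["author_last_name"],
    }
    return base + "?" + urlencode(params)

openurl = build_openurl(RESOLVER_BASE, record)
print(openurl)
```

The resolver, not the BMS, decides which services to offer for the described article, which is why the same link can lead to full text at one institution and to a document delivery form at another.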

