Choosing a search option

Having chosen the search field (or using the default setting) you must choose a search option for the field in question. Different field types allow different search options:

Please note that these expressions should be read as continuations of what is written before them. The search criteria should be read as one whole statement. This means that you may be searching for

The expression "Text contains a character or a word which begins with "人"" is not ideal - a character obviously cannot begin with "人". It should be read as "Text contains a character - or (if a character is not searched for) a word which begins with "人"," but this is obviously too long. Since strings of characters can be searched for, it should really be "Text contains a string of characters - or (if a string of characters is not searched for) a word which begins with "人";" however, since the string may possibly consist of a single character, one would even have to expand on that! The difficulty stems from the different behaviours of words in non-ideographic scripts and characters in ideographic scripts which do not form words that a computer can recognise.

Text searches are searches for words in Roman letters or for Chinese characters.

"Words" are strings of Roman letters that are separated by spaces or punctuation marks, or that appear at the beginning or end of a field. In the "Definition" field, a search for a word which begins with "man" will retrieve records with "manifest", "manner", etc., as well as "man", whereas a search for a word which ends with "man" will find records with "gentleman", "woman", etc., as well as "man". A search for a word which equals "man" will (not surprisingly) result in records with the exact word "man" in the search field (but also, e.g., "middle-man", which is regarded as two words on account of the dash). The records which contain a word which equals "man" may of course contain other words. A search for a word which contains "man" will return the most records; in addition to all of those mentioned above, a word such as "womanish" will be found here.
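The four options can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not the database's own implementation; the function names `words` and `matches` are invented for the example, and case is ignored here for simplicity.

```python
import re

def words(text):
    # Anything that is not a Roman letter (spaces, punctuation, dashes)
    # is treated as a word boundary; empty pieces are discarded.
    return [w for w in re.split(r"[^A-Za-z]+", text) if w]

def matches(text, term, option):
    # A record matches if at least one of its words satisfies the option.
    tests = {
        "begins with": lambda w: w.startswith(term),
        "ends with": lambda w: w.endswith(term),
        "equals": lambda w: w == term,
        "contains": lambda w: term in w,
    }
    return any(tests[option](w) for w in words(text))

# "middle-man" splits into "middle" and "man", so "equals" finds it.
print(matches("a middle-man", "man", "equals"))   # True
print(matches("womanish", "man", "contains"))     # True
print(matches("womanish", "man", "begins with"))  # False
```

Under this model "contains" is the broadest option, since every word that begins with, ends with or equals the search term also contains it.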

Each field in the database is indexed in a certain way, and this affects the search options one can use.

Indexing affects searches for words written with Roman letters. In fields where Roman text occurs one will usually have searches that are not case sensitive, so searches for "MAN" and "man" (and so on) will result in the same records being found.
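One common way to obtain searches that are not case sensitive is to fold both the indexed words and the search term to the same case before comparing them. The sketch below assumes this approach; whether the database does it this way is not stated, and `index_words` and `find` are hypothetical names.

```python
def index_words(text):
    # Assumed: the index stores every word in case-folded form.
    return {w.casefold() for w in text.split()}

def find(records, term):
    # The search term is folded the same way, so "MAN" and "man"
    # look up the same index entry (an "equals" search).
    key = term.casefold()
    return [r for r in records if key in index_words(r)]

records = ["A man", "MAN overboard", "woman"]
print(find(records, "MAN") == find(records, "man"))  # True
```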

"Chinese characters" are more difficult to explain, but searches for Chinese characters are also carried out differently according to how the fields they occur in are indexed, though case is of course not relevant.

Chinese characters can (1) occur in fields that are indexed in a way that makes a string of characters equivalent to a single word, or they can (2) occur in fields in which each character is counted as one word. When the latter way of indexing is used, it makes no sense to talk of searches for words which begin (or end) with a certain character (or a certain string of characters); all searches are, in fact, searches where one finds records which contain the character (or string of characters) one searches for. The menus still offer a choice between the different search options, but in such fields the choice makes no difference. If, on the other hand, a field is indexed in a way that makes a string of Chinese characters equivalent to a word, one can sensibly search for character strings which begin or end in a certain way.
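The difference between the two kinds of indexing can be modelled by asking where a "word" is allowed to start and end. The following Python sketch is a simplified model under that assumption, not the database's actual index; the function names and the sample headwords "人民" and "工人" are chosen purely for illustration.

```python
def begins_with(field, term, per_character):
    # Type (2): every character starts a word, so any position will do.
    # Type (1): the whole string is one word, so only position 0 counts.
    starts = range(len(field)) if per_character else [0]
    return any(field.startswith(term, i) for i in starts)

def ends_with(field, term, per_character):
    # Same idea for word ends: every position, or only the last one.
    ends = range(1, len(field) + 1) if per_character else [len(field)]
    return any(field.endswith(term, 0, e) for e in ends)

# String-as-word indexing: the options behave differently.
print(begins_with("人民", "人", per_character=False))  # True
print(begins_with("工人", "人", per_character=False))  # False

# Character-as-word indexing: "begins with" acts like "contains".
print(begins_with("工人", "人", per_character=True))   # True
```

With `per_character=True` both functions reduce to a test of whether the term occurs anywhere in the field, which is why the choice of search option makes no difference in fields of the second type.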

If one has, e.g., a dictionary with Chinese headwords, one would normally index the field in a way that allows searches for character strings beginning and ending with a certain character (or character string); otherwise one would find the character (or character string) in question no matter where in the headword it occurred, even if one searched for a character string beginning with the character in question. On the other hand, if one has a field containing passages from a text, one would normally index this in a way that treats each character as a single word; otherwise one would not find "習" if one searched for it, even though one of the fields read "學而時習之": the characters "學而時" intervene and count as the beginning of the word/character string. Fields containing Roman text may also contain Chinese characters; in these fields a single Chinese character is generally treated as one word, i.e. searches for the character in question bring up records in which the character occurs in any position, not just the beginning, though Roman words can still be searched for with the "begins with" and "ends with" options. If the field in question read "The phrase 學而時習之" and one searched for "習", one would find it (if "學而時習之" were treated as one word, one would not have found it).
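The mixed-field case can be illustrated with two tokenising patterns. This is a sketch under stated assumptions: the regular expressions, the restriction of "Chinese character" to the range U+4E00 to U+9FFF, and the name `begins_with` are all choices made for the example, and only single-character search terms are handled.

```python
import re

# Roman words stay whole; each Chinese character is its own word.
PER_CHARACTER = re.compile(r"[A-Za-z]+|[\u4e00-\u9fff]")
# Alternative: a run of Chinese characters counts as one word.
PER_STRING = re.compile(r"[A-Za-z]+|[\u4e00-\u9fff]+")

def begins_with(field, term, pattern):
    # A record matches if any of its words begins with the term.
    return any(w.startswith(term) for w in pattern.findall(field))

field = "The phrase 學而時習之"
print(PER_CHARACTER.findall(field))
# ['The', 'phrase', '學', '而', '時', '習', '之']
print(begins_with(field, "習", PER_CHARACTER))  # True
print(begins_with(field, "習", PER_STRING))     # False
print(begins_with(field, "phr", PER_CHARACTER)) # True
```

The Roman words are unaffected by the choice of pattern, so "begins with" and "ends with" remain meaningful for them in either case.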

The distinction between the two kinds of indexing matters for searches for records beginning with a certain character/character string; such searches are the default search option in the database. One can always override this behaviour by searching for records which contain the character/character string in question: even if the field in question indexes whole character strings as words, a search for records with words which contain certain characters will find the same records that would have been found had the field been indexed in a way that treats each character as a single word. One can therefore always search with a consistent use of search options, but one should bear in mind that searches for records which contain a certain character/character string are considerably slower than searches for records which begin with a certain character/character string.
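The text does not say why "contains" searches are slower. One common reason, assumed here for illustration only, is that a sorted index allows a "begins with" search to jump straight to the first candidate, whereas a "contains" search has to inspect every entry. The function names and the sample entries below are invented for the sketch.

```python
from bisect import bisect_left

def prefix_search(sorted_words, term):
    # Binary search to the first entry not less than the term, then
    # read forward while entries still begin with it.
    i = bisect_left(sorted_words, term)
    hits = []
    while i < len(sorted_words) and sorted_words[i].startswith(term):
        hits.append(sorted_words[i])
        i += 1
    return hits

def contains_search(sorted_words, term):
    # No shortcut is available: every entry must be examined.
    return [w for w in sorted_words if term in w]

index = sorted(["學而時習之", "習慣", "人民", "工人"])
print(prefix_search(index, "習"))    # ['習慣']
print(contains_search(index, "習"))  # ['學而時習之', '習慣']
```

The "contains" search also finds "學而時習之", which the "begins with" search on a string-as-word index misses, at the cost of scanning the whole index.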

In the explanation of the contents of each field, it is indicated whether searches for Chinese characters belong to the first or the second type: "searches for Chinese characters treat each character as a single word" (meaning that search options are irrelevant) or "searches for Chinese characters treat each character string as a single word" (meaning that search options are relevant).

Jens Østergaard Petersen