Lexeme Representations - General Description

Lexeme Representations consist of a pronunciation of a word together with a set of graphs or characters conventionally used to write that word. TLS being an historical encyclopaedia, pronunciations have to be specified as the series of reconstructions of ancient pronunciations followed by the modern Pinyin reading. See Chinese Phonology.

At present, TLS contains 15 503 such Lexeme Representations.

In some cases a given word can be written with many different characters, just as there are many spellings of words in English (hundreds of ways of spelling "through" in pre-Shakespearean manuscripts, for example). TLS is slowly building up a limited set of variant ways of writing the same word. Meanwhile, current character variants may be conveniently found in such dictionaries as Jianming gu Hanyu cidian 簡明古漢語詞典 (Chengdu 1989, often reprinted) and Zhang Yongyan's Gu Hanyu zidian 古漢語字典 (Chengdu: Bashu Shushe, 1998), both of which specify the meanings of a word for which alternative characters may be used.

Nota bene:
What complicates the matter is the fact that TLS was limited to the BIG5 character set for the first decades of the project. This has forced the database to opt for "normalised" BIG5 versions of texts where modern technology would have allowed us to present an unnormalised text. A look at the splendid digital edition of the Sibu congkan 四部叢刊 collection now available will show the huge extent of character variation in early editions. As a result of this technological limitation, TLS is not at present a reliable source for early Chinese orthography. The CHANT database from the Chinese University of Hong Kong provides much better critical texts, although even this digital source has problems of its own, since it involves extensively emended texts.

Christoph Harbsmeier
