cjklib — Han character library

Cjklib provides language routines related to Han characters, that is, characters based on Chinese characters, called Hanzi, Kanji, Hanja and chu Han in Chinese, Japanese, Korean and Vietnamese respectively. These characters are used to write Chinese and Japanese, infrequently Korean, and formerly Vietnamese. Functionality is included for character pronunciations, radicals, glyph components, stroke decomposition and variant information.

This document covers version 0.3.2. See http://cjklib.org/ for the newest version and http://cjklib.org/current for the current development version. The project is hosted at http://code.google.com/p/cjklib. See http://characterdb.cjklib.org/ for a collaborative effort to gather language data for cjklib.

Examples

Get characters by pronunciation (here: “국” in Korean):
>>> from cjklib import characterlookup
>>> cjk = characterlookup.CharacterLookup('T')
>>> cjk.getCharactersForReading(u'국', 'Hangul')
[u'匊', u'國', u'局', u'掬', u'菊', u'跼', u'鞠', u'鞫', u'麯', u'麴']
Get stroke order of characters:
>>> cjk.getStrokeOrder(u'说')
[u'㇔', u'㇊', u'㇔', u'㇒', u'㇑', u'㇕', u'㇐', u'㇓', u'㇟']
Convert pronunciation data (here from Pinyin to IPA):
>>> from cjklib.reading import ReadingFactory
>>> f = ReadingFactory()
>>> f.convert(u'lǎoshī', 'Pinyin', 'MandarinIPA')
u'lau˨˩.ʂʅ˥˥'
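To give a feel for what a reading conversion involves, here is a minimal standalone sketch that turns tone-marked Pinyin syllables into numbered Pinyin using only the standard library. It does not use cjklib and is not how cjklib implements conversion. The helper name syllable_to_numbered and the choice of 5 for the neutral tone are assumptions made for illustration. Unlike the convert call above, it expects input that is already split into syllables.

```python
import unicodedata

# Combining diacritics that mark the four Mandarin tones in Pinyin.
TONE_MARKS = {
    "\u0304": 1,  # macron, first tone
    "\u0301": 2,  # acute, second tone
    "\u030c": 3,  # caron, third tone
    "\u0300": 4,  # grave, fourth tone
}


def syllable_to_numbered(syllable):
    """Convert one tone-marked Pinyin syllable to numbered form.

    Hypothetical helper for illustration only; not part of cjklib.
    """
    # Decompose so that tone marks become separate combining characters.
    decomposed = unicodedata.normalize("NFD", syllable)
    tone = 5  # assumption: a syllable without a tone mark is written as tone 5
    letters = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that u + diaeresis becomes a single 'ü' again.
    plain = unicodedata.normalize("NFC", "".join(letters))
    return plain + str(tone)


print([syllable_to_numbered(s) for s in ["lǎo", "shī"]])
# prints ['lao3', 'shi1']
```

A full conversion also has to segment running text into syllables and handle dialect-specific rules, which this sketch leaves out.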
Access a dictionary (here using Jim Breen’s EDICT):
>>> from cjklib.dictionary import EDICT
>>> d = EDICT()
>>> d.getForTranslation('Tokyo')
[EntryTuple(Headword=u'東京', Reading=u'とうきょう', Translation=u'/(n) Tokyo (current capital of Japan)/(P)/')]

Contact

For help or discussions on cjklib, join cjklib-devel@googlegroups.com.

Please report bugs to the project’s bug tracker.
