WadeGilesOperator – Wade-Giles¶
cjklib.reading.operator.WadeGilesOperator
is an implementation
of the Mandarin Chinese romanisation Wade-Giles. It was in common use before
being replaced by Pinyin.
Features:
- tones marked by either superscript or plain digits,
- flexibility with derived writing, e.g. szu instead of ssu,
- alternative representation of characters ŭ and ê,
- handling of omissions of umlaut ü with resulting ambiguity,
- alternative marking of neutral tone (qingsheng) with either no mark or digits zero or five,
- configurable apostrophe for marking aspiration,
- placement of hyphens between syllables and
- guessing of input form (reading dialect).
Specifics¶
Alterations¶
While the Wade-Giles romanisation system itself is a modification by H. A. Giles, some further alterations exist, requiring an adaptable solution to parse transliterated text.
Diacritics¶
While non-retroflex zero final syllables tzŭ, tz’ŭ and ssŭ carry a
breve on top of the u in the standard realization of Wade-Giles, it is
often left out while creating no ambiguity. In the same fashion finals
-ê, -ên and -êng, also syllable êrh, carry a circumflex over the
e which often is not written, and no ambiguity arises as no equivalent
forms with a plain e exist. These forms can be handled by setting options
'zeroFinal'
to 'u'
and 'diacriticE'
to 'e'
.
Different to that, leaving out the umlaut on the u for finals -ü,
-üan, -üeh and -ün does create forms where back-conversion for some
cases is not possible as an equivalent vowel u form exists. Unambiguous
forms consist of initial hs- and y- (exception yu) and/or finals
-üeh and -üo, the latter being dialect forms not in use today. So
while for example hsu can be unambiguously converted back to its correct
form hsü, it is not clear if ch’uan is the wanted form or if it stems
from ch’üan, its diacritics being mangled. This reporting is done by
checkPlainEntity()
.
The omission of the umlaut can be controlled by setting
'umlautU'
to 'u'
.
Others¶
For the non-retroflex zero final forms tzŭ, tz’ŭ and ssŭ the latter
is sometimes changed to szŭ. The operator can be configured by setting
the Boolean option 'useInitialSz'
.
The neutral tone by default is not marked. As sometimes the digits zero or
five are used, they can be set by option 'neutralToneMark'
.
The apostrophe marking aspiration can be set by 'wadeGilesApostrophe'
.
Tones are by default marked with superscript characters. This can be
controlled by option 'toneMarkType'
.
Recovering omitted apostrophes for aspiration is not possible as for all cases there exists ambiguity. No means are provided to warn for possible missing apostrophes. In case of uncertainty check for initials p-, t-, k-, ch-, ts and tz.
Examples¶
The WadeGilesDialectConverter
allows
conversion between said forms.
- Restore diacritics:
>>> from cjklib.reading import ReadingFactory >>> f = ReadingFactory() >>> f.convert(u"K’ung³-tzu³", 'WadeGiles', 'WadeGiles', ... sourceOptions={'zeroFinal': 'u'}) u'K\u2019ung\xb3-tz\u016d\xb3' >>> f.convert(u"k’ai¹-men²-chien⁴-shan¹", 'WadeGiles', 'WadeGiles', ... sourceOptions={'diacriticE': 'e'}) u'k\u2019ai\xb9-m\xean\xb2-chien\u2074-shan\xb9' >>> f.convert(u"hsueh²", 'WadeGiles', 'WadeGiles', ... sourceOptions={'umlautU': 'u'}) u'hs\xfceh\xb2'
- But:
>>> f.convert(u"hsu⁴-ch’u³", 'WadeGiles', 'WadeGiles', ... sourceOptions={'umlautU': 'u'}) Traceback (most recent call last): ... cjklib.exception.AmbiguousConversionError: conversion for entity 'ch’u³' is ambiguous: ch’u³, ch’ü³
- Guess non-standard form:
>>> from cjklib.reading import operator >>> operator.WadeGilesOperator.guessReadingDialect( ... u"k'ai1-men2-chien4-shan1") {'zeroFinal': u'\u016d', 'diacriticE': u'e', 'umlautU': u'\xfc', 'toneMarkType': 'numbers', 'useInitialSz': False, 'neutralToneMark': 'none', 'wadeGilesApostrophe': "'"}