PinyinDialectConverter — Hanyu Pinyin dialects¶
Specifics¶
Examples¶
The following examples show how to convert between different representations of Pinyin.
Create the Converter and convert from standard Pinyin to Pinyin with tones represented by numbers:
>>> from cjklib.reading import * >>> targetOp = operator.PinyinOperator(toneMarkType='numbers') >>> pinyinConv = converter.PinyinDialectConverter( ... targetOperators=[targetOp]) >>> pinyinConv.convert(u'hànzì', 'Pinyin', 'Pinyin') u'han4zi4'
Convert Pinyin written with numbers, the ü (u with umlaut) replaced by character v and omitted fifth tone to standard Pinyin:
>>> sourceOp = operator.PinyinOperator(toneMarkType='numbers', ... yVowel='v', missingToneMark='fifth') >>> pinyinConv = converter.PinyinDialectConverter( ... sourceOperators=[sourceOp]) >>> pinyinConv.convert('nv3hai2zi', 'Pinyin', 'Pinyin') u'nǚháizi'
Or more elegantly:
>>> f = ReadingFactory() >>> f.convert('nv3hai2zi', 'Pinyin', 'Pinyin', ... sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'v', ... 'missingToneMark': 'fifth'}) u'nǚháizi'
Decompose the reading of a dictionary entry from CEDICT into syllables and convert the ü-vowel and forms of Erhua sound:
>>> pinyinFrom = operator.PinyinOperator(toneMarkType='numbers', ... yVowel='u:', Erhua='oneSyllable') >>> syllables = pinyinFrom.decompose('sun1nu:r3') >>> print syllables ['sun1', 'nu:r3'] >>> pinyinTo = operator.PinyinOperator(toneMarkType='numbers', ... Erhua='twoSyllables') >>> pinyinConv = converter.PinyinDialectConverter( ... sourceOperators=[pinyinFrom], targetOperators=[pinyinTo]) >>> pinyinConv.convertEntities(syllables, 'Pinyin', 'Pinyin') [u'sun1', u'nü3', u'r5']
Or more elegantly with entities already decomposed:
>>> f.convertEntities(['sun1', 'nu:r3'], 'Pinyin', 'Pinyin', ... sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'u:', ... 'Erhua': 'oneSyllable'}, ... targetOptions={'toneMarkType': 'numbers', ... 'Erhua': 'twoSyllables'}) [u'sun1', u'nü3', u'r5']
Fix cosmetic errors in Pinyin input (note tone mark and apostrophe):
>>> f.convert(u"Wǒ peí nǐ qù Xīān.", 'Pinyin', 'Pinyin') u"Wǒ péi nǐ qù Xī'ān."
Fix more errors in Pinyin input (note diacritics):
>>> string = u"Wŏ peí nĭ qù Xīān." >>> dialect = operator.PinyinOperator.guessReadingDialect(string) >>> f.convert(string, 'Pinyin', 'Pinyin', sourceOptions=dialect) u"Wǒ péi nǐ qù Xī'ān."