The following examples show how to convert between different representations of Pinyin.
Create the Converter and convert from standard Pinyin to Pinyin with tones represented by numbers:
>>> from cjklib.reading import *
>>> targetOp = operator.PinyinOperator(toneMarkType='numbers')
>>> pinyinConv = converter.PinyinDialectConverter(
... targetOperators=[targetOp])
>>> pinyinConv.convert(u'hànzì', 'Pinyin', 'Pinyin')
u'han4zi4'
Convert Pinyin written with numbers, the ü (u with umlaut) replaced by character v and omitted fifth tone to standard Pinyin:
>>> sourceOp = operator.PinyinOperator(toneMarkType='numbers',
... yVowel='v', missingToneMark='fifth')
>>> pinyinConv = converter.PinyinDialectConverter(
... sourceOperators=[sourceOp])
>>> pinyinConv.convert('nv3hai2zi', 'Pinyin', 'Pinyin')
u'nǚháizi'
Or more elegantly:
>>> f = ReadingFactory()
>>> f.convert('nv3hai2zi', 'Pinyin', 'Pinyin',
... sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'v',
... 'missingToneMark': 'fifth'})
u'nǚháizi'
Decompose the reading of a dictionary entry from CEDICT into syllables and convert the ü-vowel and forms of Erhua sound:
>>> pinyinFrom = operator.PinyinOperator(toneMarkType='numbers',
... yVowel='u:', Erhua='oneSyllable')
>>> syllables = pinyinFrom.decompose('sun1nu:r3')
>>> print syllables
['sun1', 'nu:r3']
>>> pinyinTo = operator.PinyinOperator(toneMarkType='numbers',
... Erhua='twoSyllables')
>>> pinyinConv = converter.PinyinDialectConverter(
... sourceOperators=[pinyinFrom], targetOperators=[pinyinTo])
>>> pinyinConv.convertEntities(syllables, 'Pinyin', 'Pinyin')
[u'sun1', u'nü3', u'r5']
Or more elegantly with entities already decomposed:
>>> f.convertEntities(['sun1', 'nu:r3'], 'Pinyin', 'Pinyin',
... sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'u:',
... 'Erhua': 'oneSyllable'},
... targetOptions={'toneMarkType': 'numbers',
... 'Erhua': 'twoSyllables'})
[u'sun1', u'nü3', u'r5']
Fix cosmetic errors in Pinyin input (note tone mark and apostrophe):
>>> f.convert(u"Wǒ peí nǐ qù Xīān.", 'Pinyin', 'Pinyin')
u"Wǒ péi nǐ qù Xī'ān."
Fix more errors in Pinyin input (note diacritics):
>>> string = u"Wŏ peí nĭ qù Xīān."
>>> dialect = operator.PinyinOperator.guessReadingDialect(string)
>>> f.convert(string, 'Pinyin', 'Pinyin', sourceOptions=dialect)
u"Wǒ péi nǐ qù Xī'ān."