PinyinDialectConverter — Hanyu Pinyin dialects

Specifics

Examples

The following examples show how to convert between different representations of Pinyin.

  • Create the converter and convert from standard Pinyin (tone diacritics) to Pinyin with tones represented by numbers:

    >>> from cjklib.reading import *
    >>> targetOp = operator.PinyinOperator(toneMarkType='numbers')
    >>> pinyinConv = converter.PinyinDialectConverter(
    ...     targetOperators=[targetOp])
    >>> pinyinConv.convert(u'hànzì', 'Pinyin', 'Pinyin')
    u'han4zi4'
    
  • Convert Pinyin written with tone numbers, with ü (u with umlaut) replaced by the character v and the fifth tone left unmarked, to standard Pinyin:

    >>> sourceOp = operator.PinyinOperator(toneMarkType='numbers',
    ...    yVowel='v', missingToneMark='fifth')
    >>> pinyinConv = converter.PinyinDialectConverter(
    ...     sourceOperators=[sourceOp])
    >>> pinyinConv.convert('nv3hai2zi', 'Pinyin', 'Pinyin')
    u'nǚháizi'
    
  • Or more elegantly, using a ReadingFactory and passing the source dialect as options:

    >>> f = ReadingFactory()
    >>> f.convert('nv3hai2zi', 'Pinyin', 'Pinyin',
    ...     sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'v',
    ...     'missingToneMark': 'fifth'})
    u'nǚháizi'
    
  • Decompose the reading of a CEDICT dictionary entry into syllables, then convert the ü vowel and the form of the Erhua sound (from one syllable to two):

    >>> pinyinFrom = operator.PinyinOperator(toneMarkType='numbers',
    ...     yVowel='u:', Erhua='oneSyllable')
    >>> syllables = pinyinFrom.decompose('sun1nu:r3')
    >>> print syllables
    ['sun1', 'nu:r3']
    >>> pinyinTo = operator.PinyinOperator(toneMarkType='numbers',
    ...     Erhua='twoSyllables')
    >>> pinyinConv = converter.PinyinDialectConverter(
    ...     sourceOperators=[pinyinFrom], targetOperators=[pinyinTo])
    >>> pinyinConv.convertEntities(syllables, 'Pinyin', 'Pinyin')
    [u'sun1', u'nü3', u'r5']
    
  • Or more elegantly with entities already decomposed:

    >>> f.convertEntities(['sun1', 'nu:r3'], 'Pinyin', 'Pinyin',
    ...     sourceOptions={'toneMarkType': 'numbers', 'yVowel': 'u:',
    ...        'Erhua': 'oneSyllable'},
    ...     targetOptions={'toneMarkType': 'numbers',
    ...        'Erhua': 'twoSyllables'})
    [u'sun1', u'nü3', u'r5']
    
  • Fix cosmetic errors in Pinyin input (note the misplaced tone mark in "peí" and the missing apostrophe in "Xīān"):

    >>> f.convert(u"Wǒ peí nǐ qù Xīān.", 'Pinyin', 'Pinyin')
    u"Wǒ péi nǐ qù Xī'ān."
    
  • Fix more errors in Pinyin input (note the breve diacritics in "Wŏ" and "nĭ", used in place of carons):

    >>> string = u"Wŏ peí nĭ qù Xīān."
    >>> dialect = operator.PinyinOperator.guessReadingDialect(string)
    >>> f.convert(string, 'Pinyin', 'Pinyin', sourceOptions=dialect)
    u"Wǒ péi nǐ qù Xī'ān."
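
The two kinds of fixes in the last examples (tone mark placement and wrong diacritics) can be illustrated without cjklib. The following is a minimal sketch in plain Python 3, not the library's implementation; the names mark_syllable and breve_to_caron are hypothetical. It assumes the usual placement rule: a or e takes the mark, otherwise the o of ou, otherwise the last vowel.

```python
import unicodedata

# Combining diacritics for tones 1-4; tone 5 (neutral) carries no mark.
TONE_MARKS = {1: "\u0304", 2: "\u0301", 3: "\u030c", 4: "\u0300"}
VOWELS = "aeiouü"


def mark_syllable(syllable):
    """Convert one tone-numbered syllable, e.g. 'pei2', to diacritic form."""
    if syllable and syllable[-1] in "12345":
        base, tone = syllable[:-1], int(syllable[-1])
    else:
        # A missing tone number is treated as the fifth (neutral) tone.
        base, tone = syllable, 5
    if tone == 5:
        return base

    lower = base.lower()
    if "a" in lower:
        pos = lower.index("a")
    elif "e" in lower:
        pos = lower.index("e")
    elif "ou" in lower:
        pos = lower.index("o")
    else:
        vowel_positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
        if not vowel_positions:
            return base  # nothing to carry the mark
        pos = vowel_positions[-1]

    # Insert the combining mark after the chosen vowel, then compose (NFC).
    marked = base[:pos + 1] + TONE_MARKS[tone] + base[pos + 1:]
    return unicodedata.normalize("NFC", marked)


def breve_to_caron(text):
    """Replace breve diacritics (as in 'ŏ') with carons (as in 'ǒ')."""
    decomposed = unicodedata.normalize("NFD", text)
    return unicodedata.normalize("NFC", decomposed.replace("\u0306", "\u030c"))


print(mark_syllable("pei2"))   # péi (mark on e, not on i)
print(breve_to_caron("Wŏ"))    # Wǒ
```

Unlike the converter above, this sketch works on single, already separated syllables and does not insert apostrophes between syllables.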
    

Class