cjklib.reading.converter — Conversion between character readings¶
Architecture¶
The basic method is convert(), which converts an input string from one reading to another.
The method getDefaultOptions() returns the default conversion settings.
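As a rough illustration of how these two methods relate, the following sketch shows a converter whose defaults come from getDefaultOptions() and can be overridden per instance. ToyConverter and its uppercase option are hypothetical and not part of cjklib; only the two method names are taken from the text above.

```python
class ToyConverter(object):
    """Hypothetical converter used for illustration; not a cjklib class."""

    @classmethod
    def getDefaultOptions(cls):
        # Settings that apply when the caller does not override them.
        return {"uppercase": False}

    def __init__(self, **options):
        self.options = self.getDefaultOptions()
        unknown = set(options) - set(self.options)
        if unknown:
            raise ValueError("unknown options: %s" % ", ".join(sorted(unknown)))
        self.options.update(options)

    def convert(self, string):
        # Stand-in "conversion": the only effect is optional upper-casing.
        return string.upper() if self.options["uppercase"] else string


default_result = ToyConverter().convert("ni3")
custom_result = ToyConverter(uppercase=True).convert("ni3")
```

Rejecting unknown option names early is a design choice of this sketch; it keeps a misspelled option from being silently ignored.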
What gets converted¶
The conversion process uses the ReadingOperator for the source reading to
decompose the given string into single entities. The decomposition
contains reading entities and entities that don’t represent any
pronunciation. While the goal is to convert the included reading entities to the
target reading, some converters might also convert non-reading
entities. Examples are delimiters, such as apostrophes, that differ
between romanisations, or punctuation marks that have a defined
representation in the target system, e.g. Braille.
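The decomposition and selective conversion described above can be sketched without the library. Everything in this block is a simplified assumption: the regular expression, the two tiny mapping tables, and the function names are hypothetical and do not reflect how cjklib's operators work internally.

```python
import re

# Hypothetical toy tables: numbered-tone Pinyin to Wade-Giles for two
# syllables, plus one delimiter that differs between the two systems.
SYLLABLE_MAP = {"xi1": "hsi1", "an1": "an1"}
DELIMITER_MAP = {"'": "-"}


def decompose(string):
    """Split a string into syllable-like entities and the text between them."""
    return [e for e in re.split(r"([a-z]+[1-5])", string) if e]


def convert_entity(entity):
    """Convert known reading entities and known delimiters, keep the rest."""
    if entity in SYLLABLE_MAP:
        return SYLLABLE_MAP[entity]
    if entity in DELIMITER_MAP:
        return DELIMITER_MAP[entity]
    return entity


entities = decompose("xi1'an1!")
converted = "".join(convert_entity(e) for e in entities)
```

Here the apostrophe is a non-reading entity that is nevertheless converted, while the exclamation mark passes through unchanged.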
Errors¶
By default, conversion won’t stop on entities that closely resemble
reading entities but are not themselves valid. Those turn up unchanged in
the result and can cause a CompositionError
when the target operator decides that it is impossible to join a converted
entity with a non-converted one, as doing so would make it impossible to later
determine the entity boundaries.
Most of those errors will probably result from bad input
that fails on conversion. This can be solved by telling the source operator
to be strict on decomposition (where supported) so that the error is
reported beforehand. The following example tries to convert xiǎo tōu
(“thief”), misspelled as *xiǎo tō:
>>> from cjklib.reading import ReadingFactory
>>> f = ReadingFactory()
>>> print f.convert(u'xiao3to1', 'Pinyin', 'GR',
...     sourceOptions={'toneMarkType': 'numbers'})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
cjklib.exception.CompositionError: Unable to delimit non-reading entity 'to1'
>>> print f.convert(u'xiao3to1', 'Pinyin', 'GR',
...     sourceOptions={'toneMarkType': 'numbers',
...         'strictSegmentation': True})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  ...
cjklib.exception.DecompositionError: Segmentation of 'to1' not possible or invalid syllable
Not being strict results in a lazy conversion, which might fail in some
cases, as shown above. The string u'xiao3 to1' (with a space in between),
however, works with lazy conversion ('to1' is left unconverted), while the
strict version still reports the invalid *to1.
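The difference between lazy and strict decomposition can be illustrated with a small stand-alone sketch. The syllable table, the regular expression, and ToyDecompositionError are hypothetical stand-ins chosen for this example; they are not the cjklib implementation or its exception class.

```python
import re

# Hypothetical, deliberately tiny syllable table; a real operator knows
# the full syllable set of its reading.
VALID_SYLLABLES = {"xiao3", "tou1"}
SYLLABLE_LIKE = re.compile(r"([a-z]+[1-5])")


class ToyDecompositionError(Exception):
    """Stand-in for a decomposition error; not the cjklib exception."""


def decompose(string, strict=False):
    """Split into syllable-like entities and the text between them."""
    entities = [e for e in SYLLABLE_LIKE.split(string) if e]
    if strict:
        for entity in entities:
            # Strict mode rejects anything that looks like a syllable
            # but is missing from the table.
            if SYLLABLE_LIKE.match(entity) and entity not in VALID_SYLLABLES:
                raise ToyDecompositionError("invalid syllable %r" % entity)
    return entities


lazy_result = decompose("xiao3 to1")
try:
    decompose("xiao3 to1", strict=True)
    strict_failed = False
except ToyDecompositionError:
    strict_failed = True
```

In lazy mode the invalid 'to1' is simply handed on as an entity, so any problem surfaces later; in strict mode it is reported at decomposition time.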
Other errors that can arise:
- AmbiguousDecompositionError, if the source string cannot be decomposed unambiguously,
- ConversionError, e.g. if the target system doesn’t support a feature given in the source string, and
- AmbiguousConversionError, if a given entity can be mapped to more than one entity in the target reading.
Bridge¶
Conversions between two readings can be made using a third reading
if no direct conversion is defined. This reading is called a
bridge reading and is implemented in BridgeConverter. Using the routines
from the ReadingFactory will automatically employ bridges if needed.
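The bridge idea can be sketched as a lookup that falls back to a two-step conversion when no direct converter exists. This is a minimal illustration under stated assumptions: the single-syllable tables, the CONVERTERS registry, and the convert() helper are hypothetical and do not show how BridgeConverter selects its bridges.

```python
# Hypothetical toy tables covering a single syllable in each direction.
GR_TO_PINYIN = {"jong": "zhong1"}
PINYIN_TO_WADEGILES = {"zhong1": "chung1"}

# Direct converters keyed by (source reading, target reading).
CONVERTERS = {
    ("GR", "Pinyin"): GR_TO_PINYIN.__getitem__,
    ("Pinyin", "WadeGiles"): PINYIN_TO_WADEGILES.__getitem__,
}


def convert(string, source, target, converters=CONVERTERS):
    """Convert directly if possible, otherwise through one bridge reading."""
    if (source, target) in converters:
        return converters[(source, target)](string)
    for first_source, bridge in converters:
        # A usable bridge is reachable from the source and leads to the target.
        if first_source == source and (bridge, target) in converters:
            intermediate = converters[(source, bridge)](string)
            return converters[(bridge, target)](intermediate)
    raise ValueError("no conversion from %s to %s" % (source, target))


direct = convert("jong", "GR", "Pinyin")
bridged = convert("jong", "GR", "WadeGiles")
```

No GR to Wade-Giles converter is registered here, so the second call goes through Pinyin as the bridge reading.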
Examples¶
Convert a string from Jyutping to Cantonese Yale:
>>> from cjklib.reading import ReadingFactory
>>> f = ReadingFactory()
>>> f.convert('gwong2jau1waa2', 'Jyutping', 'CantoneseYale')
u'gwóngyāuwá'
A converter instance can also be created explicitly using the factory:
>>> jyc = f.createReadingConverter('GR', 'Pinyin')
>>> jyc.convert('Woo.men tingshuo yeou "Yinnduhshyue", "Aijyishyue"')
u'Wǒmen tīngshuō yǒu "Yìndùxué", "Āijíxué"'
Convert between different dialects of the same reading, here Wade-Giles:
>>> f.convert(u'kuo3-yü2', 'WadeGiles', 'WadeGiles',
...     sourceOptions={'toneMarkType': 'numbers'},
...     targetOptions={'toneMarkType': 'superscriptNumbers'})
u'kuo³-yü²'
See PinyinDialectConverter for more examples.
Reading conversions¶
- Mandarin Chinese
  - cjklib.reading.converter.PinyinDialectConverter — Hanyu Pinyin dialects
  - cjklib.reading.converter.WadeGilesDialectConverter — Wade-Giles dialects
  - cjklib.reading.converter.PinyinWadeGilesConverter — Hanyu Pinyin to Wade-Giles
  - cjklib.reading.converter.GRDialectConverter — Gwoyeu Romatzyh dialects
  - cjklib.reading.converter.GRPinyinConverter — Gwoyeu Romatzyh to Pinyin
  - cjklib.reading.converter.PinyinIPAConverter — Hanyu Pinyin to IPA
  - cjklib.reading.converter.PinyinBrailleConverter — Pinyin to Braille
- Cantonese
- Shanghainese