Challenges in Concocting an Orthographic Romanization of Chinese

(Note: I assume Traditional Chinese in this post. Many points still apply when dealing with Simplified Chinese, though not all.)

I spent some of my time over the last month or so trying to invent an orthographic romanization of Chinese. By this, I mean a romanization system in which the strings of Latin letters that represent Chinese words are based on the structure of the written Chinese characters (Hanzi), rather than approximating the [insert dialect of Chinese] pronunciation of those words. Why have such a system?

  1. It is easier to remember how it works, and it puts less strain on memory than remembering Chinese characters.
  2. In a sense, this stays truer to the frequent intention of Chinese characters to simultaneously categorize a term by its meaning and hint at its pronunciation. Many Chinese characters contain a semantic part, which categorizes a word (for instance, as having to do with water), and a phonetic part, which hints at the word’s pronunciation. In such a romanization system, both parts are encapsulated in the romanization of a word; one would not need to look at a word to see its semantic part, since it is spoken aloud.
  3. As much as the character system makes Mandarin, for instance, nice and succinct to speak, with one syllable per character, Chinese names have fewer really majestic and cool-sounding forms like Aurelius and Chulalongkorn than, say, Latin, Sanskrit, and Thai names do. One can only add so much phonetic spice to one syllable. With the proper phonetic touches, expanding how Chinese words are said could bring this sort of construction to Chinese.

Here are properties that would be desirable in such a system of orthographic romanization.

  1. There is one and only one way to say each character, and characters that are different are said different ways.
  2. Sounds associated with parts of words are easy to memorize, while also reflecting relevant semantic components in characters.
  3. The resultant words are not too long.
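Criteria 1 and 3 can be checked mechanically for any candidate scheme. Here is a minimal sketch, assuming the scheme is given as a table from characters to spellings. The function name `check_romanization`, the `max_length` threshold, and the toy spellings are all made up for illustration; criterion 2 (memorability) is not something code like this can measure.

```python
from collections import defaultdict

def check_romanization(table, max_length=12):
    """Check a character -> spelling table against criteria 1 and 3.

    Because `table` is a dict, each character already has exactly one
    spelling; what remains of criterion 1 is that no two characters share
    one. `max_length` is an arbitrary stand-in for "not too long".
    """
    by_spelling = defaultdict(list)
    for char, spelling in table.items():
        by_spelling[spelling].append(char)
    # Spellings claimed by more than one character violate criterion 1.
    collisions = {s: cs for s, cs in by_spelling.items() if len(cs) > 1}
    # Spellings over the threshold violate criterion 3.
    too_long = {c: s for c, s in table.items() if len(s) > max_length}
    return collisions, too_long

# Toy table with invented spellings (not a proposed scheme).
toy = {"土": "hsh", "士": "hsh", "口": "szh", "龍": "longlonglong"}
collisions, too_long = check_romanization(toy, max_length=8)
print(collisions)
print(too_long)
```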

The rest of this post is about why creating a system that meets these three criteria is a significant challenge.

Let’s start with a reasonable first idea: assigning a letter or set of letters to each type of stroke, and putting them together in stroke order. We now have five problems.

  1. There are Chinese characters with a really large number of strokes. Quite a few have upwards of 25 strokes, and certain rare ones have upwards of 40. Although criterion 3 above is not well-defined, it’s pretty clear that writing out something for every stroke would not meet it.
  2. Some different Chinese characters have the same parts, but in different orders or orientations (召 and 叧).
  3. Some different Chinese characters have the same types of strokes in the same order, but with one stroke of a different length relative to its surroundings.
  4. Some different Chinese characters are in fact exactly the same except for their dimensions (the Chinese-character parallel to the flags of Monaco and Indonesia).
  5. A fully semantics-respecting system needs to distinguish a component written on the left of a character from the same shape written on the right, since the two can reference fundamentally different ideas.
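Problems 3 and 4 are easy to see in a toy version of the stroke-order idea. The sketch below assumes the common five-way classification of strokes (horizontal, vertical, left-falling, right-falling or dot, turning) and assigns one letter to each. The letter choices, the hand-entered stroke sequences, and the example characters 土/士, 未/末, and 日/曰 are my own picks for illustration.

```python
# One letter per basic stroke type (hypothetical letter assignment).
STROKE_LETTERS = {
    "heng": "h",  # horizontal
    "shu": "s",   # vertical
    "pie": "p",   # left-falling
    "na": "n",    # right-falling or dot
    "zhe": "z",   # turning
}

# Stroke sequences entered by hand for a few characters.
STROKE_ORDER = {
    "土": ["heng", "shu", "heng"],
    "士": ["heng", "shu", "heng"],
    "未": ["heng", "heng", "shu", "pie", "na"],
    "末": ["heng", "heng", "shu", "pie", "na"],
    "日": ["shu", "zhe", "heng", "heng"],
    "曰": ["shu", "zhe", "heng", "heng"],
}

def romanize_by_strokes(char):
    """Spell a character as its stroke letters, in stroke order."""
    return "".join(STROKE_LETTERS[stroke] for stroke in STROKE_ORDER[char])

for a, b in [("土", "士"), ("未", "末"), ("日", "曰")]:
    print(a, romanize_by_strokes(a), b, romanize_by_strokes(b))
```

Each pair comes out with the same spelling, because stroke type and order alone carry no information about relative length or overall proportions.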

It may be that these challenges together make satisfying all three criteria impossible. In particular, points 3 and 4 require storing geometric information in addition to stroke order, so that different characters get different romanizations. Point 5 sets the desire to encode semantic information against the desire to map stroke structure, since it describes a situation where semantics and orthography in Chinese come into direct conflict. And one needs to do all of this while keeping the resulting terms from getting too long and from being too phonotactically ludicrous. This suddenly becomes quite a daunting task.

If one assigns short strings to commonly occurring parts of characters (like so-called “radicals”), then perhaps one could create a prefix-infix-suffix system that encodes the extra strokes. Because the same extra stroke sometimes does something different depending on where it’s placed, though, such a system would need to encode each stroke’s geometric situation properly. But given a clever encoding of geometry from template sets, one may get closer to orthographically romanizing Chinese.
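As a rough sketch of the part-plus-geometry idea, one could give each common component a short syllable and each spatial arrangement an infix. The syllables and infixes below are invented, the decompositions are entered by hand, and the arrangement symbols ⿰ (left-right) and ⿱ (top-bottom) are borrowed from Unicode's Ideographic Description Characters purely as convenient labels. This only covers the arrangement of whole parts; it does nothing yet for single extra strokes or for differences of length and proportion.

```python
# Hypothetical syllables for a few components.
COMPONENT_SYLLABLES = {"木": "mu", "口": "ko", "日": "ni"}

# Hypothetical infixes for two arrangements.
LAYOUT_INFIXES = {
    "⿰": "e",  # left-right
    "⿱": "a",  # top-bottom
}

# Hand-entered decompositions: (arrangement, first part, second part).
DECOMPOSITIONS = {
    "杏": ("⿱", "木", "口"),
    "呆": ("⿱", "口", "木"),
    "林": ("⿰", "木", "木"),
    "杲": ("⿱", "日", "木"),
    "杳": ("⿱", "木", "日"),
}

def romanize_by_parts(char):
    """Spell a character as first part + arrangement infix + second part."""
    if char in COMPONENT_SYLLABLES:
        return COMPONENT_SYLLABLES[char]
    layout, first, second = DECOMPOSITIONS[char]
    return romanize_by_parts(first) + LAYOUT_INFIXES[layout] + romanize_by_parts(second)

for char in DECOMPOSITIONS:
    print(char, romanize_by_parts(char))
```

With this toy table, characters built from the same parts in a different order, such as 杏 and 呆, get different spellings. Whether such spellings stay unambiguous and short once many more components and arrangements are added is exactly the open question.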

