S¯ã: A Syllabically Fixed-Length Romanization of Mandarin Chinese

The remarkable stringency of Mandarin Chinese phonotactics makes for a set of possible syllables restricted and systematic enough that one could reasonably endeavor to devise a not-too-complicated romanization system in which every possible syllable has the same number of characters. Here, I provide such a system, where that number of characters is 3.

Here’s the plan: we take advantage of how Mandarin phonotactics prohibit any consonant clustering whatsoever (unless one decides to interpret certain affricates as actually being clusters, but why would one want to do that?) and how only two consonants (both nasals) can serve as a syllable coda. Since the Mandarin consonant inventory isn’t particularly large either (in fact only comparable in size to the English consonant inventory after counting all allophones as separate sounds), we just dedicate one character to the possible onset consonant, and absorb the possible ending nasal into the characters dedicated to the syllable’s vowel(s), like how Polish uses ogoneks (ą) and Portuguese uses tildes (ã).

The hardest part of making a mapping is representing the wide palette of vowels Mandarin has to offer. Hence, the remaining two characters of a syllable that are not the onset consonant are dedicated to the vowel, including the absorption of a possible end nasal.

Here are the mappings for the first character of the syllable, the onset consonant. In each cell representing a phone, the unparenthesized portion is the sound in IPA, and the parenthesized portion is the character used in this transliteration.


(For comparison, this is the chart of mappings for Pinyin romanization.)



No character needs to be assigned to the velar nasal (ŋ), as it never occurs at the start of a syllable. Most character assignments are intuitive from the perspective of most Latin-script languages. The c/s/z system is modeled after the usage of these letters in Polish, Czech, and Slovak, which have similar affricate situations to Mandarin. The diacritic assigned to the retroflex series (the caron) is chosen to match the diacritic used for the slightly more anterior postalveolar sounds in Czech and Slovak. The circumflex is chosen to represent the alveolopalatal sounds because it is an upside-down caron, which reflects how Mandarin’s alveolopalatal series is in complementary distribution with its retroflex series.

For the rare syllables with zero onset, the character ‘ is used to fill the consonant slot. In Pinyin romanization, this character is used to disambiguate syllable separation, a role not required in this romanization system due to syllables being fixed-length.

Here are the mappings for the last two characters of the syllable, representing the syllable’s vowels.


(Once again, here are the mappings in Pinyin romanization for comparison.)



The lack of a vowel (and thus, a syllabic consonant) can be indicated via an apostrophe in both vowel positions (’’).

This chart, however, needs to be expanded to represent final nasal absorption. To do so, we put a tilde on the latter vowel to indicate a final /n/ (n), and an ogonek on the latter vowel to indicate a final /ŋ/ (ng). The exception is when the third character is ‘, in which case a standalone tilde (~) or a standalone ogonek (˛) replaces the final ‘.
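As a rough illustration of the nasal rule just described, here is a minimal Python sketch that reads the final nasal off a syllable’s third character. The function name `final_nasal` is made up for this example, and the sketch assumes that every character of a syllable is a single code point after NFC normalization (true of the precomposed letters used in this post, but not guaranteed for every vowel-plus-diacritic combination).

```python
import unicodedata

# Combining marks that appear when the third character is decomposed (NFD).
COMBINING_TILDE = "\u0303"   # tilde on the latter vowel: final /n/
COMBINING_OGONEK = "\u0328"  # ogonek on the latter vowel: final /ŋ/

# Standalone marks that replace the placeholder in the third position.
STANDALONE_NASALS = {"~": "n", "\u02db": "ng"}


def final_nasal(syllable: str) -> str:
    """Return 'n', 'ng', or '' (no final nasal) for one three-character syllable."""
    chars = unicodedata.normalize("NFC", syllable)
    if len(chars) != 3:
        # Assumption of this sketch: one code point per character slot.
        raise ValueError(f"expected 3 characters, got {len(chars)}: {syllable!r}")
    last = chars[2]
    if last in STANDALONE_NASALS:
        return STANDALONE_NASALS[last]
    marks = unicodedata.normalize("NFD", last)
    if COMBINING_TILDE in marks:
        return "n"
    if COMBINING_OGONEK in marks:
        return "ng"
    return ""
```

Decomposing only the third character keeps the check independent of whatever marks sit on the onset or on the first vowel position.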

We can use the system developed so far as is, but we can still extend it to include tone information. Typical romanization systems use diacritics to denote tones other than the 5th (short) tone, but this romanization system will use diacritics for tones other than the 4th (falling), under the reasoning that representing a tone with no alteration gives it a sense of being “normal”, and the 4th tone is the tone that sounds most normal (at least in isolation) from the perspective of most languages that use the Latin script. Thus, we will represent the 4th tone with no alteration, and the 1st, 2nd, 3rd, and 5th tones by adding a macron, acute, caron, or dot above, respectively, to the first vowel position if it is not ‘; otherwise, the macron, acute, caron, or dot above stands by itself (replacing the ‘).
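The tone rule can be sketched in the same spirit. The Python below is only an illustration: `tone_of` is a hypothetical helper, and it assumes that the base vowel letters themselves never carry a macron, acute, caron, or dot above, so that any such mark in the second position can be read as a tone mark. It also assumes one code point per character slot after NFC normalization.

```python
import unicodedata

# Standalone (spacing) tone marks, used when the first vowel position is empty.
STANDALONE_TONES = {
    "\u00af": 1,  # macron
    "\u00b4": 2,  # acute
    "\u02c7": 3,  # caron
    "\u02d9": 5,  # dot above
}

# Combining forms, found by decomposing an accented vowel (NFD).
COMBINING_TONES = {
    "\u0304": 1,  # combining macron
    "\u0301": 2,  # combining acute
    "\u030c": 3,  # combining caron
    "\u0307": 5,  # combining dot above
}


def tone_of(syllable: str) -> int:
    """Return the tone number (1-5) encoded in the second character slot."""
    chars = unicodedata.normalize("NFC", syllable)
    if len(chars) != 3:
        # Assumption of this sketch: one code point per character slot.
        raise ValueError(f"expected 3 characters, got {len(chars)}: {syllable!r}")
    middle = chars[1]
    if middle in STANDALONE_TONES:
        return STANDALONE_TONES[middle]
    for mark in unicodedata.normalize("NFD", middle):
        if mark in COMBINING_TONES:
            return COMBINING_TONES[mark]
    return 4  # no recognized tone mark: the unmarked 4th tone
```

Only the second character is inspected, so the caron on a retroflex onset such as š or ž is never mistaken for a 3rd-tone mark.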

Here are some examples. Remember, because this is a fixed-length romanization scheme, we don’t even need spaces for parsing: we can simply group the characters three at a time. Spaces can still be sprinkled here and there to be easier on the eye.

Name of the Republic of China (中華民國): ž¯ǫhúamí~gúö

Name of the People’s Republic of China (中华人民共和国): ž¯ǫhúar´ẽmí~g`ǫh´ëgúö

Beijing (北京): bˇeẑī˛

Shanghai (上海): š’ąhˇi

Chongqing (重庆): č´ǫĉì˛

Taipei (臺北): t´ibˇe

Confucius (孔子): kˇǫz˙’

Confucius’s personal name (孔丘): kˇǫĉīo

Sun Yat-Sen (孫逸仙): sūẽ’i’ŝīã

Mao Zedong (毛泽东): m´uz´ëd¯ǫ

Chiang Kai-Shek (蔣介石): ẑǐąẑieš´’

Zhou Enlai (周恩来): ž¯o’¯ẽl´i

Zhou Youguang (周有光): ž¯o’ǐogūą
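
The claim that spaces are unnecessary for parsing can be checked with a few lines of Python. This is a minimal sketch rather than a full parser: `split_syllables` is a name invented here, and it relies on the assumption that each character of the scheme is a single code point once the text is NFC-normalized, as is the case for the examples above.

```python
import unicodedata

SYLLABLE_LENGTH = 3  # every syllable in this scheme is three characters long


def split_syllables(text: str) -> list[str]:
    """Drop optional spaces and group the remaining characters three at a time."""
    normalized = unicodedata.normalize("NFC", text)
    compact = "".join(normalized.split())  # spaces are purely cosmetic
    if len(compact) % SYLLABLE_LENGTH != 0:
        raise ValueError("text length is not a multiple of the syllable length")
    return [
        compact[i:i + SYLLABLE_LENGTH]
        for i in range(0, len(compact), SYLLABLE_LENGTH)
    ]
```

Under that assumption, the same list of syllables comes back whether or not the input contains spaces, which is exactly what makes the spaces optional.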


