Wikipedia Languages

It’s often interesting to see which languages a given Wikipedia article is available in.




As I was in the shower today, I suddenly thought of a certain set of pictures that has spread through the internet a few times: roads with “SHCOOL” painted on them, held up as a demonstration of the failure of our school systems (or of the individual who painted it).

But here’s a thought: what if the person painting “SHCOOL” just wanted to spread the idea that school can be cool (as opposed to a chore, as most kids certainly view it, unless kids these days are that different)?

They’d of course be mistaken. School isn’t cool. Learning is cool. School deprives you of learning.

But we could start using the term “shcool” to refer to a learning institution or organization that helps its members and others understand the things in life that actually matter. The term would symbolically reflect the fact that slightly misspelling “school” as “shcool” is really not a major detriment to communication (maybe you’d have trouble running a computer search, say a `find` or a `grep`, for instances of “school” in a file, but if you were on a computer, why didn’t you run a spell checker?), nor does it mean the misspeller is incapable of understanding Things That Actually Matter. Maybe, between shcool and school, we can make school the mistake.

Respect to Sandwich

For a couple of years, I’ve had the idea of starting a comic strip called Vi, Max, and Nona, in which one of the characters, Nona, is a cartoonist who writes comics involving talking programming languages. It gradually became clear that there’s no way I have time in my life to make this idea fully become reality, but here’s a fleshing out of one of the ideas for Nona’s comic.


Word Ladder

In each line, add, change, or delete a letter from the string on the previous line.

______ (Kerberos username)
______ (Kerberos username)
______ (Kerberos username of a zephyr user)
______ (Kerberos username of a zephyr user)
______ (Kerberos username of a zephyr user)
______ (Kerberos username of a zephyr user)
______ (Kerberos username)
______ (Kerberos username of a zephyr user)
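The rule above (each line adds, changes, or deletes exactly one letter) amounts to saying that consecutive strings are at edit distance one. Here is a minimal Python sketch for checking a filled-in ladder; the function names and the sample words are hypothetical, and the samples are not answers to the puzzle.

```python
def one_edit_apart(a: str, b: str) -> bool:
    """True if b can be made from a by adding, changing, or deleting exactly one letter."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Same length: exactly one position must differ (a change).
        return sum(x != y for x, y in zip(a, b)) == 1
    # Lengths differ by one: make `a` the shorter string, then check
    # whether deleting a single letter from `b` yields `a`.
    if len(a) > len(b):
        a, b = b, a
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def is_valid_ladder(words: list[str]) -> bool:
    """Check that every line is one edit away from the line before it."""
    return all(one_edit_apart(prev, cur) for prev, cur in zip(words, words[1:]))


# Hypothetical example words, not the puzzle's answers.
print(is_valid_ladder(["cold", "cord", "card", "ward", "warm"]))  # True
print(is_valid_ladder(["cold", "warm"]))  # False
```

The deletion check covers both adding and deleting, since adding a letter going down the ladder is the same as deleting one going up.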


If you have enough context, you can deduce the questions.

  1. black
  2. raven
  3. Hangul
  4. Python
  5. 21G.611
  6. Building 6C
  7. “Jabberwocky”
  8. Edgar Allan Poe
  9. Henry David Thoreau
  10. Voltaire
  11. Richard Stallman
  12. George Washington
  13. Sweden
  14. New Hampshire
  15. CGP Grey
  16. Saturday Morning Breakfast Cereal
  17. Romanticism
  18. V
  19. Soren
  20. The Martian
  21. “Forbidden Friendship”
  22. Nightwish
  23. Under the Grey Banner
  24. “Sacrament of Wilderness”

Challenges in Concocting an Orthographic Romanization of Chinese

(Note: I assume Traditional Chinese in this post. Many points still apply when dealing with Simplified Chinese, though not all.)

I spent some of my time over the last month or so trying to invent an orthographic romanization of Chinese. By this, I mean a system that, rather than using strings of Latin letters to approximate the [insert dialect of Chinese] pronunciation of words, uses strings of letters based on the structure of the written Chinese characters (Hanzi). Why have such a system?

  1. Its rules are easier to remember, and it places less strain on memory than memorizing Chinese characters does.
  2. In a sense, this is more faithful to the frequent intention of Chinese characters to simultaneously categorize a term by its meaning and hint at its pronunciation; many Chinese characters contain a semantic part and a phonetic part, the first of which categorizes a word (for instance, as having to do with water), and the second of which hints at the word’s pronunciation. In such a romanization system, both relevant parts of a term are encapsulated in its romanization; one would not need to look at a word to see its semantic part, since it is spoken aloud.
  3. As much as the character system makes Mandarin, for instance, nice and succinct to speak, with one syllable per character, Chinese names of things just don’t have as many of the really majestic, cool-sounding names found in, say, Latin, Sanskrit, and Thai, like Aurelius and Chulalongkorn. One can only fit so much phonetic spice into one syllable. With the proper phonetic touches in expanding how Chinese words are spoken, one could bring this sort of construction to Chinese.
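To make point 2 concrete: in a phono-semantic compound such as 河 (hé, “river”), the water component 氵 carries the meaning and 可 (kě) hints at the sound; 江 (jiāng) pairs 氵 with 工 (gōng), and 湖 (hú) pairs it with 胡 (hú). Below is a toy Python sketch in which a character is spelled as its semantic component’s syllable followed by its phonetic component’s syllable. The component syllables are made-up placeholders, not a proposed system.

```python
# Hypothetical syllables for components; a real system would choose these carefully.
COMPONENT_SYLLABLES = {
    "氵": "lu",  # semantic: water
    "可": "ka",  # phonetic, as in 河
    "工": "go",  # phonetic, as in 江
    "胡": "hu",  # phonetic, as in 湖
}

# A few phono-semantic compounds, as (semantic part, phonetic part).
COMPOUNDS = {
    "河": ("氵", "可"),
    "江": ("氵", "工"),
    "湖": ("氵", "胡"),
}


def romanize(char: str) -> str:
    """Spell a compound as its semantic syllable followed by its phonetic syllable."""
    semantic, phonetic = COMPOUNDS[char]
    return COMPONENT_SYLLABLES[semantic] + COMPONENT_SYLLABLES[phonetic]


for ch in COMPOUNDS:
    print(ch, romanize(ch))
# 河 luka
# 江 lugo
# 湖 luhu
```

Every water-related word here shares the prefix “lu”, so its category is heard rather than seen.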

Here are properties that would be desirable in such a system of orthographic romanization.

  1. There is one and only one way to say each character, and different characters are said in different ways.
  2. Sounds associated with parts of words are easy to memorize, while also reflecting relevant semantic components in characters.
  3. The resultant words are not too long.

The rest of this post is about why creating a system that meets these three criteria is a significant challenge.

Let’s start with a reasonable first idea: assigning a letter or set of letters to each type of stroke, and putting them together in stroke order. We now have five problems.

  1. There are Chinese characters with a really large number of strokes. Quite a few characters have upwards of 25 strokes, and certain rare ones have upwards of 40. Although criterion 3 above is not well-defined, it’s pretty clear that writing out something for every stroke would not meet it.
  2. Some different Chinese characters have the same parts, but in different orders or orientations (召 and 叧).
  3. Some different Chinese characters have the same types of strokes in the same order, but with one stroke of a different length relative to the others.
  4. Some different Chinese characters are in fact exactly the same except for their dimensions (the Chinese-character parallel to the flags of Monaco and Indonesia).
  5. A fully semantics-respecting system needs to distinguish a component on the left of a character from the same-looking component on the right, since the two fundamentally reference different ideas.
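The stroke-by-stroke idea, and why problems 3 and 4 break it, fits in a few lines of Python. This sketch assumes a made-up letter group for each basic stroke type, and uses 土/士 (same strokes, different lengths) and 日/曰 (same strokes, different proportions) as illustrations of my own choosing, with their stroke sequences entered by hand.

```python
# Hypothetical letter groups for a few basic stroke types.
STROKE_LETTERS = {
    "heng": "a",      # 一 horizontal
    "shu": "i",       # 丨 vertical
    "hengzhe": "ur",  # 𠃍 horizontal turning down
}

# Stroke-order sequences entered by hand for this example.
STROKES = {
    "土": ["heng", "shu", "heng"],             # bottom horizontal is longer
    "士": ["heng", "shu", "heng"],             # top horizontal is longer
    "日": ["shu", "hengzhe", "heng", "heng"],  # tall box
    "曰": ["shu", "hengzhe", "heng", "heng"],  # wide box
}


def naive_romanize(char: str) -> str:
    """Concatenate one letter group per stroke, in stroke order."""
    return "".join(STROKE_LETTERS[s] for s in STROKES[char])


def collisions(chars) -> dict:
    """Group characters whose naive romanizations coincide."""
    groups = {}
    for ch in chars:
        groups.setdefault(naive_romanize(ch), []).append(ch)
    return {k: v for k, v in groups.items() if len(v) > 1}


print(collisions(STROKES))
# {'aia': ['土', '士'], 'iuraa': ['日', '曰']}
```

Both pairs collapse to the same string, so criterion 1 fails unless lengths and proportions are encoded too; and at one letter group per stroke, a 25-stroke character yields at least 25 letters, which is problem 1.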

It may be that all of these challenges together actually make satisfying all three criteria above impossible. In particular, points 3 and 4 require storing not only stroke-order information but also geometric information, in order to ensure that different characters have different romanizations. Point 5 pits the desire to encode semantic information against the desire to map stroke structure into the romanization, since it presents a situation where semantics and orthography in Chinese come into direct conflict. And to build such a system, one needs to do all of this on top of encoding stroke information, while keeping the resulting terms reasonably short and not too phonotactically ludicrous. This suddenly becomes quite a daunting task.

If one assigns short strings to commonly occurring parts of characters (like so-called “radicals”), then perhaps one could create a prefix-infix-suffix system that encodes any extra strokes as affixes. Because the same extra stroke sometimes does something different depending on where it’s placed, though, such a system will need to encode its geometric position properly. But given a clever encoding of geometry based on a set of templates, one may get closer to orthographically romanizing Chinese.
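One way to picture the component-plus-geometry idea is to borrow the layout operators of Unicode’s Ideographic Description Characters (⿰ for left-right, ⿱ for top-bottom) and give each operator and each component a syllable, with the layout syllable acting as an infix. The syllables below are hypothetical, and the sketch takes the pair 召 and 叧 to be 刀 over 口 versus 口 over 刀; 明 (日 beside 月) is added to show the left-right case.

```python
# Hypothetical syllables for layout operators (after Unicode's
# Ideographic Description Characters) and for components.
LAYOUT_SYLLABLES = {
    "⿰": "el",  # left-right arrangement
    "⿱": "om",  # top-bottom arrangement
}
PART_SYLLABLES = {
    "刀": "dao",
    "口": "ko",
    "日": "ri",
    "月": "yue",
}

# Decompositions as (layout, first part, second part).
DECOMPOSITIONS = {
    "召": ("⿱", "刀", "口"),  # 刀 on top of 口
    "叧": ("⿱", "口", "刀"),  # 口 on top of 刀
    "明": ("⿰", "日", "月"),  # 日 to the left of 月
}


def romanize_with_layout(char: str) -> str:
    """Spell a character as first part + layout infix + second part."""
    layout, first, second = DECOMPOSITIONS[char]
    return PART_SYLLABLES[first] + LAYOUT_SYLLABLES[layout] + PART_SYLLABLES[second]


print(romanize_with_layout("召"), romanize_with_layout("叧"))  # daoomko koomdao
```

Swapping the parts changes the spelling, which handles problem 2; it does nothing for problems 3 and 4, which need finer geometric detail than a layout operator records.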

Mantul: A Hangul for English

Hangul, the script of Korean, is arguably the best-thought-out script ever invented by humankind, for it takes one step beyond where other scripts have trodden: the letters’ shapes reflect their sounds.

All languages allow for similar clauses, sentences, and words to express similar thoughts (of the people, by the people, for the people: all relations to the people), and usually for similar word parts to express similar concepts (six is 6, sixteen is 16, sixty is 60: all numbers related to the sixth natural number* from the perspective of our base-ten system). This massively cuts the time needed to learn a language, as it reflects the structure of the relations among the ideas expressed: if we say similar things in similar ways, we more easily guess and remember how to communicate each of them effectively.

*To people who consider 0 a natural number: I zero-indexed this statement.
*To people who consider 0 not a natural number: I one-indexed this statement.

The makers of Hangul saw beyond the levels of structure that other script-makers and script-contributors saw: they saw the structures relating sounds (the features that allow sounds to fall into certain categories, and the various ways different pairs of sounds differ in the same way), and decided that their script would reflect relations down to the level of the phoneme. Letters differing only by aspiration are the same except for an added stroke (ㄱ gains a horizontal in the middle to become ㅋ). Sounds made in the throat have a circle, representing the throat. Sounds made with both lips (bilabial sounds) have a rectangle, representing the lips. Find all the letters here.
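The idea that a phonetic feature, rather than a whole sound, is what a letter’s shape tracks can be modeled as a table from letters to features, where aspirating a letter means finding the letter with the same place of articulation plus the aspiration feature. This small Python sketch covers only the plain and aspirated consonant pairs ㄱ/ㅋ, ㄷ/ㅌ, ㅂ/ㅍ, and ㅈ/ㅊ, and its feature labels are my own simplification (ㅈ and ㅊ, for instance, are affricates, labeled just “palatal” here).

```python
# Simplified features for Hangul's plain and aspirated consonants:
# letter -> (place of articulation, aspirated?)
FEATURES = {
    "ㄱ": ("velar", False), "ㅋ": ("velar", True),
    "ㄷ": ("alveolar", False), "ㅌ": ("alveolar", True),
    "ㅂ": ("bilabial", False), "ㅍ": ("bilabial", True),
    "ㅈ": ("palatal", False), "ㅊ": ("palatal", True),
}
# Reverse lookup: feature bundle -> letter.
LETTERS = {features: letter for letter, features in FEATURES.items()}


def aspirate(letter: str) -> str:
    """Return the letter with the same place of articulation, plus aspiration."""
    place, _ = FEATURES[letter]
    return LETTERS[(place, True)]


print([aspirate(c) for c in "ㄱㄷㅂㅈ"])  # ['ㅋ', 'ㅌ', 'ㅍ', 'ㅊ']
```

In the script itself, this lookup is visible on the page: the aspirated letter is drawn from the plain one by adding a stroke.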

Why not, then, have the world’s most assimilating language, English, adopt the ideas of Hangul and obtain phoneme-level relation-reflection in its script? In the rest of this post, I define Mantul (/ˈmæntʊl/), a Hangul-inspired script for the English phonological inventory.
