Here’s a game. The challenge: try to communicate a reference to a long string of text (maybe all the elements of a certain set, or even an entire book) with only one string, such that:
- The string consists only of letters that appear in the text: no spaces, punctuation, or their equivalents. (For books or speeches in English, for instance, that means the 26 letters of the English alphabet.)
- No letter is used more than once.
- The string is a subsequence of the text; that is, the letters appear in the text, in that order, possibly with more letters in between.
So, for a given input, someone seeks a subsequence consisting of unique letters that hopefully communicates an idea to others.
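The three rules can be checked mechanically. Here's a minimal sketch in Python; the function name `is_valid_reference` is made up for illustration, and it assumes that case doesn't matter and that "letter" means whatever Python's `str.isalpha` accepts.

```python
def is_valid_reference(candidate: str, text: str) -> bool:
    """Check the three rules of the game for `candidate` against `text`.

    Assumptions (not from the rules themselves): comparison is
    case-insensitive, and an empty candidate is rejected.
    """
    candidate = candidate.lower()

    # Rule 1: only letters, so no spaces, punctuation, or digits.
    if not candidate.isalpha():
        return False

    # Rule 2: no letter is used more than once.
    if len(set(candidate)) != len(candidate):
        return False

    # Rule 3: the candidate is a subsequence of the text. Searching a
    # single shared iterator means each letter must be found after the
    # previous one.
    remaining = iter(text.lower())
    return all(letter in remaining for letter in candidate)


if __name__ == "__main__":
    print(is_valid_reference("helowrd", "hello world"))  # True
    print(is_valid_reference("hello", "hello world"))    # False: 'l' repeats
    print(is_valid_reference("dlrow", "hello world"))    # False: wrong order
```

Checking a candidate is the easy part, of course. Picking one that actually communicates the idea is where the game is.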
As an example,
is probably a fairly good string to reference the lanthanides, with ‘cerim’, ‘pas’, ‘nody’, and ‘lut’ referring respectively to the first, second, third, and last lanthanides.
Here are some other strings you may be able to recognize.
And here’s a really far stretch:
When one is giving creative names to a set of related entities, one tends to expect a theme. A theme in the names gives a sense of cohesion to the various parts of a unit.
Street names make a particularly good example of theme-naming. Check out, for instance, the clear naming themes going on in these two places.
Sometimes, however, a situation really looks like there should be a theme, and one can’t find one. In these cases, it’s easy to believe there’s just no theme intended or that there’s some theme one’s missing, but occasionally it just seems unsettling that a set of entities does not have a theme.
Continue reading “Jarring Lack of Theme”
Suppose you heard the term “inaccessible island rail”. What do you think this term refers to? When I heard it, my mind conjured an image of a train line that connected really inaccessible islands.
And that sounds weird. Did someone undertake a project just to create such a rail line? It sounds incredibly costly. And it also sounds like it’d be something cool enough that I would’ve heard about it by now. Nevertheless, what else could this term refer to?
It turns out that a rail is a type of bird. Go figure. So it’s a type of bird that lives only on really remote islands. That makes much more sense than the train situation. Okay.
Except that that isn’t even specifically what this species of bird is. It is a species of rail that only lives on one island, literally named Inaccessible Island. It’s slightly southwest of Tristan da Cunha.
So actually, I slightly lied in the first paragraph by lowercasing “inaccessible” and “island”. But here’s the thing: you don’t hear capitalization in speech. The uppercase hints would not be available to you if someone were telling you the name of this bird species out loud (and even if you were reading it, you might have taken the capitalization as some other kind of emphasis rather than a hint that the island is literally named that). Also note that while the context of the surrounding sentence could have tipped you off that ‘rail’ had nothing to do with trains, context would very likely not have helped with the ‘Inaccessible Island’ issue.
I claim that ‘Inaccessible Island’ is a poor choice of name for this island. Names should be useful, distinguishing handles, and this name is not that. It was an attempt to reference the island’s inaccessibility, but it does so via a phrase that would naturally be used anyway to describe islands, thus vastly increasing the potential for confusion in every term that refers to it. Calling the bird a ‘rail’ is also unhelpful, but this part is not as problematic, for the reasons stated above.
This sort of naming failure, where a name tries to make a reference or hint at a metaphor, is pervasive in computer science. Looking back at how I learned many ideas in computer science, I find that this was a massive reason I often got stuck or confused. People who name tools or ideas relating to computers often try to give them names that refer to parallel entities or processes outside the world of computers, and in doing so often make the terms extremely ambiguous.
Continue reading “Failures in Referential Nomenclature”
(and thus very vulnerable to puns)
Justin (two words)
The remarkable stringency of Mandarin Chinese phonotactics makes for a set of possible syllables restricted and systematic enough that one could reasonably endeavor to devise a not-too-complicated romanization system where each possible syllable has the same number of characters. Here, I provide such a system where this number of characters is 3.
Here’s the plan: we take advantage of how Mandarin phonotactics prohibit any consonant clustering whatsoever (until one decides to interpret certain affricates as being actually clusters, but why would one want to do that?) and how only two consonants (both being nasals) are feasible for a syllabic coda. Since the Mandarin consonant inventory isn’t particularly large either (in fact only comparable in size to the English consonant inventory after counting all allophones as separate sounds), we just dedicate one character to a possible onset consonant, and absorb the possible ending nasal into characters dedicated to the syllable’s vowel(s), like how Polish uses ogoneks (ą) and Portuguese uses tildes (ã).
The hardest part of making a mapping is representing the wide palette of vowels Mandarin has to offer. Hence, the remaining two characters of a syllable that are not the onset consonant are dedicated to the vowel, including the absorption of a possible end nasal.
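One payoff of a fixed syllable length is that a run of text can be cut into syllables with no delimiters at all. Here's a rough sketch of that in Python. The function name `split_syllables` is hypothetical, and the sketch assumes every character of the romanization is a single Unicode code point after NFC normalization (which may not hold for every letter-and-diacritic combination). The inputs in the usage lines are placeholders, not real syllables of the system.

```python
import unicodedata

SYLLABLE_LENGTH = 3  # one onset character plus two vowel characters


def split_syllables(romanized: str) -> list[tuple[str, str]]:
    """Cut a delimiter-free string into (onset, vowel part) pairs.

    Assumption: after NFC normalization, each character of the
    romanization occupies exactly one code point.
    """
    normalized = unicodedata.normalize("NFC", romanized)
    if len(normalized) % SYLLABLE_LENGTH != 0:
        raise ValueError("length is not a multiple of the syllable length")

    return [
        (normalized[i], normalized[i + 1 : i + SYLLABLE_LENGTH])
        for i in range(0, len(normalized), SYLLABLE_LENGTH)
    ]


if __name__ == "__main__":
    # Placeholder input only; these are not actual syllables.
    print(split_syllables("abcdef"))  # [('a', 'bc'), ('d', 'ef')]
```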
Here are the mappings for the first character of the syllable, the onset consonant. In each cell representing a phone, the unparenthesized portion is the sound in IPA, and the parenthesized portion is the character used in this transliteration.
(For comparison, this is the chart of mappings for Pinyin romanization.)
No character needs to be assigned to the velar nasal (ŋ), as it never occurs at the start of syllables. Most character assignments are intuitive from the perspective of most Latin-script languages. The c/s/z system is modeled after the usage of these letters in Polish, Czech, and Slovak, which have similar affricate situations to Mandarin. The diacritic assigned to the retroflex sounds (the caron) is chosen to match the diacritic used for the slightly-more-anterior postalveolar sounds in Czech and Slovak. The circumflex is chosen as the diacritic for alveolopalatal sounds because it is an upside-down caron, which reflects how Mandarin’s alveolopalatal series is in complementary distribution with its retroflex series.
Continue reading “S¯ã: A Syllabically Fixed-Length Romanization of Mandarin Chinese”
It’s often curious which languages a certain Wikipedia article is available in.
As I was in the shower today, I suddenly thought of a certain set of pictures that spread through the internet a few times: roads with “SHCOOL” painted on them, pointed out as a demonstration of the failure of our school systems (or of the individual that painted it).
But here’s a thought: what if the person painting “SHCOOL” just wanted to spread the idea that school can be cool (as opposed to a chore, which certainly most kids view it as, unless kids these days are that different)?
They’d of course be mistaken. School isn’t cool. Learning is cool. School deprives you of learning.
But we could start using the term “shcool” to refer to a learning institution or organization that helps its members and others to understand the actually important things in life, the term symbolically reflecting the fact that slightly misspelling a word like “school” to “shcool” is really not a major detriment to communication (maybe you’d have issues running a computer search (say, a `find` or a `grep`) on a file for instances of the term “school”, but if you were on a computer, why didn’t you run a spell checker?), nor does it actually mean the misspeller is incapable of understanding Things That Actually Matter. Maybe of shcool and school, we can make school be the mistake.