
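自行车摆满着在图书馆前面 (Bicycles Crowding the Front of the Library)

这也是大学生的生活的一方面,不过这个图片显示得并没有期末考试时那么挤。那个时候到了,我就把另一个图片摆在这里。
English Translation
This is another side of college life, though this picture doesn’t show things as crowded as they get during final exams. When that time comes, I’ll put another picture up here.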


是中文啊! (It’s Chinese!)

春节快乐,鼠年大吉 (Happy Spring Festival, and good fortune in the Year of the Rat!)


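我 在 這 個 博客 上 批評 漢字 當作 文字 批評 得 不 少,可是 今天 我 想 承認 對 學生 來 說,漢字系統 也 有 它 的 好處。主要 的 是 這樣:中文 的 學生 平常 學 生詞,同時 學 有關 的 漢字。所以,如果 他們 看到 一 篇 文章,文章 有 很 多 不 認識 的 字,學生 能 馬上 注意到 那 篇 文章 看不懂。但是,如果 只 有 發音,學生 猜錯 意思 的 趨勢 很 大。

此外,除了 練字 給 學生 一 種 特別 學會 的 滿意 以外,因為 學 字 這麼 難,那麼 學 發音 和 意思 對 學生 來 說 記下去 比較 容易。
English Translation
On this blog I have criticized Chinese characters as a writing system quite a bit, but today I want to admit that, for students, the character system has its advantages too. The main one is this: students of Chinese usually learn new words together with the characters used to write them. So if they come across an article with a lot of characters they don’t recognize, they can tell right away that they won’t be able to understand it. If there were only the pronunciation, however, students would be much more likely to guess the meaning wrong.
Moreover, besides the special sense of accomplishment that practicing characters gives students, learning characters is so hard that, by comparison, the pronunciation and meaning become easier for students to remember.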


名字 (Name)

15Feb08

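我 剛才 在 路上 碰到了 我 以前 的 中文 老師,那 位 老師 就是 給 我 起 名字 的 人。其實 我 那 個 時候 騎著 自行車,沒 看到 老師,經過 她。可是,她 大聲 叫了 我 的 名字 的 時候,我 就 馬上 停下來了,轉了 一 轉 看到 她 了。她 為 嚇 我 道歉,告訴 我 這 只是 她 的 個 實驗,以 查明 是否 我 對 我 的 中文 名字 反應。好像 這 項 實驗 成功了。也 是 挺 有意思 的 吧,因為 對 我 來 說,我 的 中文 名字 現在 就是 算 我 的 個 真正 的 名字。好 厲害 哦!
English Translation
I just ran into my former Chinese teacher on the street; she is the one who gave me my Chinese name. Actually, I was riding my bike at the time, didn’t see her, and rode right past her. But when she called out my name, I stopped immediately, turned around and saw her. She apologized for startling me and told me it was just an experiment of hers, to find out whether I respond to my Chinese name. It seems the experiment was a success. It’s pretty interesting, too, because to me my Chinese name now counts as my real name. Pretty amazing!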


With Mac OS X version 10.5 “Leopard”, Apple added non-English dictionaries to its Dictionary application and also provided a framework that lets developers build their own dictionaries in the Mac OS X format. And I just stumbled upon CEDICT for Apple Dictionary 1.1, a port of CEDICT to Apple’s Dictionary format. Although the ABC Chinese-English Dictionary is my dictionary of choice for most purposes, being able to do Spotlight searches for Chinese words makes this solution very attractive. It also comes with recordings of every Mandarin syllable, should you need them.

In addition, if you remember, Mac OS X has built-in support for pop-up dictionary definitions in all Cocoa applications (which include most Apple applications except iTunes and, nowadays, most third-party software except current Microsoft and Adobe products). This means you can now look up Chinese words while browsing in Safari just as easily as English ones. For example:

CEDICT Dictionary Pop-up Panel in Mac OS X

 

Plain awesome! (Now, if only Chinesepera-kun were also a system-wide utility like this.)


chinesetools.eu

25Dec07

chinesetools.eu is a website chock-full of useful Chinese tools, but the one I found most interesting was their Chinese Seal Generator, which renders your name in characters in the familiar (yet usually hard-to-decipher) seal script, framed in a rounded rectangle.


This is a short follow-up to my previous entry, The Phoneticization of Chinese, which argued that it sucks that Chinese is not a deterministically phonetic script (see the section on ‘determinism’ in scripts at the end of this piece), argued that it should be one, and, finally, proposed a simple system that could serve as a phonetic script for Chinese. One of the big questions it raises is this: if Chinese writing becomes completely phonetic, will readers lose the ability to disambiguate easily? The answer is “Yes and No”; I tackle both sides below.

Yes, they will lose the ability to disambiguate

If you’re walking down the street today and you see the character 医, you know it’s not ‘one’ (一) or ‘clothes’ (衣) but something to do with doctors or medicine, perhaps a clinic or a hospital. If you see ‘Yī’ instead, you do not have that same power of disambiguation: unless there is some obvious context, you could be trying to decide between a clothes shop and a clinic. With toneless pinyin, which is very common on street signs in China and Taiwan, you are left with even more possibilities as to what ‘Yi’ could mean. Even with the 心言水木 system you lose that power of disambiguation, although you do at least know the tone, because it is embedded in the character.
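To make the street-sign point concrete, here is a small Python sketch that lists the characters a reader could not rule out from a toned spelling versus a toneless one. The ten-character sample lexicon is hand-picked for illustration (it is not a frequency list), and the names `SAMPLE`, `strip_tone` and `candidates` are hypothetical.

```python
import unicodedata

# Hand-picked sample: (character, toned Pinyin, gloss). Illustrative only;
# a real dictionary lists many more characters for each of these readings.
SAMPLE = [
    ("一", "yī", "one"),
    ("衣", "yī", "clothes"),
    ("医", "yī", "doctor, medicine"),
    ("依", "yī", "to depend on"),
    ("移", "yí", "to move"),
    ("疑", "yí", "to doubt"),
    ("以", "yǐ", "by means of"),
    ("椅", "yǐ", "chair"),
    ("意", "yì", "meaning, idea"),
    ("易", "yì", "easy"),
]

# Combining marks for the four Pinyin tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone(syllable: str) -> str:
    """Remove tone diacritics but keep the umlaut in ü."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def candidates(spelling: str) -> list[str]:
    """Characters in SAMPLE that `spelling` alone cannot rule out.

    A spelling with a tone mark must match exactly; a toneless spelling
    (as on most street signs) matches every tone.
    """
    spelling = unicodedata.normalize("NFC", spelling)
    if strip_tone(spelling) != spelling:  # the spelling carries a tone mark
        return [ch for ch, reading, _ in SAMPLE
                if unicodedata.normalize("NFC", reading) == spelling]
    return [ch for ch, reading, _ in SAMPLE if strip_tone(reading) == spelling]


if __name__ == "__main__":
    for sign in ("yī", "yi"):
        print(sign, candidates(sign))
```

With this sample, ‘yī’ leaves four candidates and toneless ‘yi’ leaves all ten. A 心言水木 spelling would behave like the toned case, since the tone is written into the character.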

The basic limitation of a phoneticization system for Chinese is that it relies on context, that is, on the fact that speakers can easily disambiguate between terms when they are placed (a) in context and (b) in polysyllabic words. For example, if somebody just threw the word ‘zhǎo’ at you, you might not be able to tell whether it meant ‘to look for’, ‘to ask for’ or ‘to give change’, but if the same person gave you ‘xúnzhǎo’ instead, you could say without hesitation that it meant ‘to seek/look for’. However, Classical Chinese was never meant to be read aloud and have its meanings resolved through context and polysyllabic words to the same degree as spoken Chinese, so a phonetic system for written Classical Chinese would be infeasible: the texts would become even more indecipherable than they are today. Another area where such a system would be infeasible is the transcription of Chinese names, which usually string together words of different meanings that cannot be told apart from context.

No, they will not lose the ability to disambiguate

For this part, I draw my comparisons from the two languages I am most familiar with, English and Hindi. When students start learning Chinese and start typing it on computers, they soon realize how many different characters share exactly the same pronunciation, and it is usually quite disheartening to find out how much more there is yet to learn. In reality, most of the characters at the ends of those lists are so rare that you may never encounter them in your entire life. But let’s say there are ten to twenty relatively frequently used meanings for each syllable in Chinese; is this not also the case in other languages such as English and Hindi?

First, an amusing example from Hindi. We have a word, kal or कल (pronounced ‘cull’ with an unaspirated ‘k’), that means both ‘tomorrow’ and ‘yesterday’; similarly, parson or परसों (pronounced ‘per-so-ng’) means both ‘the day after tomorrow’ and ‘the day before yesterday’. Yet never in my life have I been confused about which one was meant; in fact, I never even consciously noticed that they were the same word until a friend pointed it out to me some time after I arrived in the United States at the age of eighteen. Of course, if you handed me the word kal at random and out of context, I probably couldn’t tell you what it meant, but put it in a sentence (Hindi verbs mark tense) and its meaning is always unambiguous. This particular double meaning could never work in Chinese, which has no grammatical tense and relies on these very time words to distinguish past, present and future.

Now for some examples from English. Consider the word ‘fine’. According to dictionary.com, it has 18 meanings as an adjective, 4 as an adverb, 2 as an intransitive verb, 3 as a transitive verb and 1 as a noun. Similarly, ‘bank’ has 18 meanings, ‘draft’ has 38, and so on. Yet any competent speaker of English can tell which meaning applies with trivial ease, using context. For example, “You are fine” and “You are fined” have two distinct meanings that are incredibly hard to mix up. In fact, I claim that you could make English completely and deterministically phonetic, with only one standard way of writing every sound (converting ‘rhyme’ to ‘rime’, etc.), and still not lose an ounce of meaning in an English sentence. What is this claim based on? On the fact that it already happens every single day when people speak English. There is no distinction between ‘two’, ‘to’ and ‘too’ when I speak, yet my listener knows which one I mean. Likewise, Chinese people do not speak in Chinese characters; the closest written counterpart to their speech is a phonetic system like Pinyin or 心言水木. The very fact that dictation (tīngxiě) exists in Chinese should be proof enough that a completely phonetic system works!

Deterministic and Non-deterministic Phoneticism

Chinese and English, in my opinion, both use a “phonetic” system, but to very different degrees. If phonetic value is scored from 0 to 100, Chinese sits somewhere in the 10s or 20s, well below a passing grade, while English is somewhere in the high 80s. In other words, given a random Chinese character, you have a 10-20% chance of guessing its sound correctly (my estimates are generous; the actual chances may be even lower), whereas with English you have an 80-something percent chance of getting the sound exactly right. Neither is 100% phonetic, and neither is 100% detached from phonetics. If every single Chinese character gave no clue about its pronunciation, as 天 or 木 do, Chinese characters would be at the 0% mark. If English were spelled so that you could always work out the exact pronunciation of a word you had never seen before, English would be at the 100% mark.

With Pinyin and 心言水木, the plan is to take Chinese to that same 100% mark. If you hear a word and have an ear for Chinese, you have a 100% probability of writing the correct Pinyin for it, and if you know the correct Pinyin for a word, you have a 100% probability of saying it aloud accurately. Reaching 100% in both directions is what I mean by a deterministic script.
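The two-way condition can be written down as a check over a list of (written form, pronunciation) pairs. This is a minimal Python sketch with toy data chosen for illustration; the function name `is_deterministic` is hypothetical, a real test would need a full lexicon, and the pronunciation labels (tone numbers for Mandarin, rough IPA for English) are informal.

```python
from collections import defaultdict


def is_deterministic(pairs):
    """Check both directions of the 'deterministic script' idea.

    `pairs` is an iterable of (written form, pronunciation label). Returns
    (reading_ok, writing_ok): reading_ok is True when every written form has
    exactly one pronunciation; writing_ok is True when every pronunciation
    has exactly one written form. A deterministic script needs both.
    """
    sounds_for = defaultdict(set)
    spellings_for = defaultdict(set)
    for written, sound in pairs:
        sounds_for[written].add(sound)
        spellings_for[sound].add(written)
    reading_ok = all(len(s) == 1 for s in sounds_for.values())
    writing_ok = all(len(s) == 1 for s in spellings_for.values())
    return reading_ok, writing_ok


# Toy data; the pronunciation labels are informal, not full transcriptions.
ENGLISH = [("two", "tuː"), ("to", "tuː"), ("too", "tuː"),
           ("read", "riːd"), ("read", "rɛd")]
HANZI = [("行", "xing2"), ("行", "hang2"), ("问", "wen4"), ("汶", "wen4")]
TONED_PINYIN = [("wěn", "wen3"), ("wèn", "wen4"), ("wén", "wen2")]

if __name__ == "__main__":
    for name, data in [("English", ENGLISH), ("characters", HANZI),
                       ("toned Pinyin", TONED_PINYIN)]:
        print(name, is_deterministic(data))
```

English spelling and characters both fail in both directions on this data (‘read’ has two sounds, /tuː/ has three spellings; 行 has two readings, wèn has two characters), while toned Pinyin passes both checks.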


Chinese and Japanese are pretty much the only languages I know in which students first learn new words by writing them down in some sort of romanization and then later “graduate” to the proper script. In a Japanese class, you can at some point stop using romanization altogether, because new words whose Chinese characters you don’t know can be written out in Hiragana. Not so with Chinese. As a learner of Chinese, this greatly bothers me, because the only thing I can fall back on is Pinyin. Now, you might ask: what is so wrong with Pinyin? I would tell you “Nothing, but…”, and the “but” concerns the aesthetic, perhaps even superficial, side of human nature rather than the pragmatic one. As far as pragmatics go, yánjiūshēng and 研究生 (graduate student) are one and the same, and I agree wholeheartedly that, at the only level at which it really matters, getting the meaning across to the reader, both perform the same feat. What draws me to 研究生 over yánjiūshēng is only that the former looks more aesthetically pleasing and more “Chinese” to me.

Now, as a slight digression, I have to say that this isn’t just something I discovered after learning Chinese; the characters were actually one of the reasons I wanted to learn East Asian languages. It was precisely because I hadn’t the slightest idea how on earth they worked that I so wanted to learn about them. This is also why Vietnamese never even entered my sphere of interest when I was picking a language to study: it uses a romanization, I could see how it worked, and that took away all the curiosity. So Japanese, Chinese and Korean were the languages I was choosing from. In the end, I chose Chinese because, first of all, you can find Chinese people by the bucket load in the United States if you’re looking for people to talk to, and, secondly, the language is tonal, which was again something I hadn’t the slightest idea about.

Now back to the point: most polysyllabic Chinese words written out in Pinyin are unambiguous representations of the corresponding characters, and the bottom line is that both are the same word in Chinese whether you like it or not. In terms of meaning, you get zilch more by writing out the characters for “graduate student”. What you do get is something you can glance at and say, “Well, that’s Chinese or I’m a duck-billed platypus”. And that, I feel, is what the Chinese don’t want to lose by switching to romanization. If I may exaggerate a bit, Chinese culture revolves around characters; they simply love ’em! There are various arts derived from them, they feature in movies (Hero, anyone?), and schools even hold dictionary lookup competitions. As impractical a script as it is, let’s just admit that they like the way it looks. And you know what? I do too.

OK, so romanization would look too alien and non-Chinese for Chinese people to adopt as a primary, or even secondary, script. So what can they do? Well, there’s Zhùyīn Fúhào (otherwise known as Bopomofo, ㄅㄆㄇㄈ). Taiwan is the main user of this script; it is taught to schoolchildren there as an aid to learning Chinese characters and is often printed beside them as a reading aid (what is known as ruby text). Bopomofo has quite a few fine aspects. Firstly, just like Hiragana in Japanese, it is derived from simplified forms of Chinese characters with similar sounds, which lets a person who already knows Chinese characters learn it in a relatively short time. The second salient feature is that it divides each Chinese syllable into an initial consonant, an optional medial (ㄧ, ㄨ or ㄩ) and a final, so simple sounds like ‘ba’ are written with two symbols, ㄅ for the ‘b’ and ㄚ for the ‘a’, while more complicated sounds like ‘zhuang’ take three: ㄓ (‘zh’), ㄨ (‘u’) and ㄤ (‘ang’). That makes it far more atomic than Chinese characters, and more atomic than Hiragana or Katakana, which combine initial and vowel in one symbol, but not as atomic as, say, Pinyin, which spells a final such as ‘ang’ letter by letter where Bopomofo uses a single symbol. Still, the level of atomicity in Bopomofo seems reasonable. My big gripe about it is this: whither the tones?
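The initial/medial/final split that Bopomofo uses is mechanical enough to sketch in code. The Python below converts numbered Pinyin (tones 1-4, with 5 for the neutral tone) into Bopomofo with its tone marks. It is a deliberately partial sketch and the names are hypothetical: the tables only cover syllables with no initial or with one of the listed initials, and it does not handle j/q/x, syllables spelled with y/w, or ü finals, all of which need extra spelling rules.

```python
INITIALS = {
    "b": "ㄅ", "p": "ㄆ", "m": "ㄇ", "f": "ㄈ", "d": "ㄉ", "t": "ㄊ",
    "n": "ㄋ", "l": "ㄌ", "g": "ㄍ", "k": "ㄎ", "h": "ㄏ",
    "zh": "ㄓ", "ch": "ㄔ", "sh": "ㄕ", "r": "ㄖ", "z": "ㄗ", "c": "ㄘ", "s": "ㄙ",
}

# Pinyin final -> optional medial (ㄧ, ㄨ) + rime. A whole rime such as
# "ang" is a single Bopomofo symbol (ㄤ).
FINALS = {
    "a": "ㄚ", "o": "ㄛ", "e": "ㄜ", "ai": "ㄞ", "ei": "ㄟ", "ao": "ㄠ",
    "ou": "ㄡ", "an": "ㄢ", "en": "ㄣ", "ang": "ㄤ", "eng": "ㄥ", "er": "ㄦ",
    "i": "ㄧ", "ia": "ㄧㄚ", "ie": "ㄧㄝ", "iao": "ㄧㄠ", "iu": "ㄧㄡ",
    "ian": "ㄧㄢ", "in": "ㄧㄣ", "iang": "ㄧㄤ", "ing": "ㄧㄥ",
    "u": "ㄨ", "ua": "ㄨㄚ", "uo": "ㄨㄛ", "uai": "ㄨㄞ", "ui": "ㄨㄟ",
    "uan": "ㄨㄢ", "un": "ㄨㄣ", "uang": "ㄨㄤ", "ong": "ㄨㄥ",
}

# In zhi, chi, shi, ri, zi, ci, si the "i" is not written in Bopomofo.
BARE_I_INITIALS = {"zh", "ch", "sh", "r", "z", "c", "s"}

# Tone 1 is unmarked; tones 2-4 follow the syllable (ˊ ˇ ˋ);
# the neutral-tone dot (˙) precedes it.
TONE_MARKS = {1: "", 2: "\u02ca", 3: "\u02c7", 4: "\u02cb"}
NEUTRAL_MARK = "\u02d9"


def to_bopomofo(syllable: str) -> str:
    """Convert one numbered-Pinyin syllable, e.g. 'zhuang1' -> 'ㄓㄨㄤ'."""
    letters, tone = syllable[:-1].lower(), int(syllable[-1])
    # Try two-letter initials (zh, ch, sh) before one-letter ones.
    initial = next((p for p in (letters[:2], letters[:1]) if p in INITIALS), "")
    final = letters[len(initial):]
    if initial in BARE_I_INITIALS and final == "i":
        body = INITIALS[initial]
    else:
        # Raises KeyError for finals outside the supported subset.
        body = INITIALS.get(initial, "") + FINALS[final]
    if tone == 5:
        return NEUTRAL_MARK + body
    return body + TONE_MARKS[tone]


if __name__ == "__main__":
    print(" ".join(to_bopomofo(s) for s in ["zhu4", "fu2", "hao4"]))
```

Note how the tone ends up as a separate mark bolted onto the end of each syllable rather than as part of its shape.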

Sometimes I just fail to understand it. Chinese is a tonal language; we’ve all got that. So why does nearly every phoneticization system ever made for it overlook the tones as an integral component? Here’s a sample word in Bopomofo: “ㄓㄨˋ ㄧㄣ ㄈㄨˊ ㄏㄠˋ”. Wouldn’t it look so much nicer as “ㄓㄨㄧㄣ ㄈㄨㄏㄠ”? But that would be ignoring tones the way Arabic script usually leaves out short vowels, and, unfortunately, that can’t be done in Chinese, because wěn means ‘to kiss’ and wèn means ‘to ask’, and you do not want to get those mixed up with your Chinese teacher. Trust me. Even in Pinyin, tones have suffered much oversight. Here’s a good rule: if a writing system breaks down during instant messaging, it’s probably not a good one. And Pinyin loses its tones fast, which is why, when you read “Ta shi zuotian wen wo de”, you don’t know whether ‘wen’ means ‘ask’, ‘smell’ or ‘kiss’, a relatively vexing problem. The only system that bothered to make tones an integral part of the romanization was Gwoyeu Romatzyh (Guóyǔ Luómǎzì, 国语罗马字), and, guess what, we’re not using that one anymore for some reason.

The bottom line, then, is that no one really wants to use a romanization as a primary or secondary standard, because it doesn’t look Chinese enough, looks out of place alongside Chinese characters and treats tones as an afterthought. Meanwhile Bopomofo, the native solution, which might be acceptable to many Chinese people with some coaxing, is plagued by having, even worse than Pinyin, the ugliest way of “incorporating” tones ever devised. What’s to be done, then? First, there is a very important “realization” to be made about how Chinese characters work in the first place: the Chinese script is, in the end, a phonetic one. If it weren’t, transliteration of any kind would be impossible. Yet we have 麦当劳 (Màidāngláo; McDonald’s) and 西雅图 (Xīyǎtú; Seattle). It is a script that is, as I put it, trying to be completely phonetic. Type ‘ke’ into your Chinese Input Method Editor (IME) of choice, for example, and see what you get. Among the ones I get, three of the first ten contain the component 果 and another three contain 亥, which means that if you had to guess the sound of a character containing 果, you would guess ‘ke’ with some tone. Now, there are three issues here: first, not all characters containing 果 are pronounced ‘ke’; second, not all characters pronounced ‘ke’ contain 果; and third, there is no indication of tone. The important point, however, is that the Chinese script intends to be phonetic. Give it another couple of thousand years to evolve, and it might just become one.
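The 果 observation can be turned into a rough “hit rate”: of the characters containing a given component, how many actually have the reading you would guess? The Python sketch below uses a small sample picked by hand (the names `GUO_FAMILY` and `hit_rate` are hypothetical), so the number it produces illustrates the first and third issues rather than measuring anything about the script as a whole.

```python
import unicodedata
from fractions import Fraction

# Hand-picked characters containing the component 果, with their Mandarin
# readings. Not a frequency-weighted or exhaustive list.
GUO_FAMILY = {
    "果": "guǒ", "课": "kè", "棵": "kē", "颗": "kē", "稞": "kē",
    "髁": "kē", "裸": "luǒ", "踝": "huái", "夥": "huǒ",
}

# Combining marks for the four Pinyin tones: macron, acute, caron, grave.
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}


def strip_tone(syllable):
    """Drop tone diacritics from a Pinyin syllable."""
    decomposed = unicodedata.normalize("NFD", syllable)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)


def hit_rate(family, guess, ignore_tone=True):
    """Fraction of characters in `family` whose reading matches `guess`."""
    def norm(s):
        return strip_tone(s) if ignore_tone else unicodedata.normalize("NFC", s)

    hits = sum(1 for reading in family.values() if norm(reading) == norm(guess))
    return Fraction(hits, len(family))


if __name__ == "__main__":
    print(hit_rate(GUO_FAMILY, "ke"))                     # syllable only
    print(hit_rate(GUO_FAMILY, "kē", ignore_tone=False))  # syllable and tone
```

On this sample, guessing ‘ke’ is right for 5 of 9 characters, and requiring the exact tone ‘kē’ drops that to 4 of 9.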

Unfortunately, a couple of thousand years is longer than most people are willing to wait, and pretty much everyone who is willing to wait is incapable of actually doing so. So my suggestion is to expedite the process slightly: make Chinese phonetic, completely phonetic, here and now! The script has two requirements. First, it must look and feel “Chinese” so that it has a chance of actually being adopted; in this vein, it would help if it were based on existing Chinese characters, because people already know a bunch of those and switching would be easier. Second, it must convey the sound of every syllable, tone and all, completely and precisely. Do I have a solution in mind? As it happens, I do. I call it Xīn Yán Shuǐ Mù, or 心言水木.

心言水木 is quite simple. For each complete sound in Chinese (such as ‘zhuāng’ or ‘mǐn’), you set aside a character component that corresponds to the toneless version of the sound, say 壯 for ‘zhuang’ and 民 for ‘min’. Then you pick a “radical” for each of the four primary tones. I have picked 心, 言, 水 and 木 for the first, second, third and fourth tones respectively, because each of these characters itself carries that tone, which makes them easy to remember. The neutral tone is shown by adding no tone component at all and writing the right-hand component alone. So 敏感 in present-day Chinese becomes 泯汗 in the new system, where 干 is the component for ‘gan’. Similarly, 变化 becomes 楄杹. Do note that the emphasis in this script is not on having the fewest strokes per character. In my opinion, the simplification of characters was a futile effort: although it reduced the stroke count of many characters, it did not change the basic problem that the Chinese character system is an inconsistent mess, and any Chinese scholar now has to learn two (mercifully overlapping) character sets to be sure of reading any Chinese text anywhere in the world. So although 心言水木 characters might have more strokes, what you are assured of is being able to read everything you see and write anything you hear. A few exceptions could be made, say for the numbers 1-10, which would keep their characters, and for time words such as year, month and day. In all, a person would have to learn about 435 characters to be able to read and write even the rarest Chinese sounds and words.
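The encoding rule is simple enough to express directly. This Python sketch is my own illustration, not a published tool: the component table contains only the components named in this post (the full list is on the detailed description page), and the left-hand radical forms are an assumption, with 氵 and 木 matching the examples 泯, 汗 and 楄, and 忄 and 讠 as guesses for 心 and 言. Because most combinations have no Unicode code point, the output is an Ideographic Description Sequence using ⿰ (U+2FF0).

```python
# Tone radicals from the post: 心 (1st), 言 (2nd), 水 (3rd), 木 (4th).
# Left-hand forms are assumed: 氵 and 木 match 泯, 汗 and 楄;
# 忄 and 讠 are guesses for 心 and 言.
TONE_RADICALS = {1: "忄", 2: "讠", 3: "氵", 4: "木"}

# Partial, hypothetical component table limited to the post's examples.
COMPONENTS = {"zhuang": "壯", "min": "民", "gan": "干", "bian": "扁"}

LEFT_TO_RIGHT = "\u2ff0"  # ⿰ IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT


def encode_syllable(syllable):
    """Encode numbered Pinyin ('min3'; 5 = neutral tone) as a 心言水木 glyph.

    Returns an Ideographic Description Sequence such as '⿰氵民', since
    most tone-radical/component combinations have no precomposed character.
    """
    base, tone = syllable[:-1], int(syllable[-1])
    component = COMPONENTS[base]  # KeyError if not in the partial table
    if tone == 5:
        return component  # neutral tone: the phonetic component alone
    return LEFT_TO_RIGHT + TONE_RADICALS[tone] + component


def encode(text):
    """Encode a space-separated sequence of numbered-Pinyin syllables."""
    return [encode_syllable(s) for s in text.split()]


if __name__ == "__main__":
    print(encode("min3 gan3"))  # 敏感
    print(encode("bian4"))      # 变
```

So ‘min3 gan3’ yields ⿰氵民 and ⿰氵干, which happen to be the existing characters 泯 and 汗 used above; most other combinations would need to be displayed this way or with a custom font.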

Click here to see a more detailed description of 心言水木 as well as the list of all the sounds and their corresponding 心言水木 phonetic components.

UPDATE [2007-12-25]: If you read this article and thought, “Well, what about ambiguity with words that sound the same but mean different things?”, then you need to read the follow-up article, The Myth of Ambiguity.


zhdict.com

15Dec07

You might not have heard of zhdict.com yet, but it is the most kick-ass online Chinese dictionary I have seen to date. First of all, it’s the only one with a decent-looking interface. Secondly, if you’ve ever wondered why searching on zhongwen.com or even dict.cn was so damn frustrating sometimes, look no further: zhdict.com has the most obvious search tool. You type in English, characters or Pinyin, and it finds the word for you. There’s no need to separate the Pinyin syllables with spaces as you have to on dict.cn (until I read their help page and found out I had to do this, I had assumed you couldn’t search dict.cn by Pinyin at all), or to go to a specialized search page to search by Pinyin, as you have to on zhongwen.com.
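Searching unspaced Pinyin means the dictionary has to work out the syllable boundaries itself. I don’t know how zhdict.com actually does this; the Python below is a generic sketch (recursive splitting with memoization) over a tiny, hypothetical syllable sample, where a real implementation would load all roughly 400 Mandarin syllables.

```python
from functools import lru_cache

# Tiny sample of toneless Pinyin syllables, chosen for the examples below.
SAMPLE_SYLLABLES = frozenset(
    {"xi", "an", "xian", "xun", "zhao", "yan", "jiu", "sheng"})

MAX_SYLLABLE_LEN = 6  # the longest Pinyin syllables, e.g. "zhuang"


def segmentations(text, syllables=SAMPLE_SYLLABLES):
    """Return every way to split unspaced Pinyin into known syllables."""
    text = text.lower()

    @lru_cache(maxsize=None)
    def from_index(i):
        # All syllable sequences that exactly cover text[i:].
        if i == len(text):
            return ((),)
        results = []
        for j in range(i + 1, min(len(text), i + MAX_SYLLABLE_LEN) + 1):
            piece = text[i:j]
            if piece in syllables:
                results.extend((piece,) + rest for rest in from_index(j))
        return tuple(results)

    return [list(parts) for parts in from_index(0)]


if __name__ == "__main__":
    for query in ("xunzhao", "yanjiusheng", "xian"):
        print(query, segmentations(query))
```

Most queries split only one way, but ‘xian’ comes back as both xi + an and xian, which is why written Pinyin uses an apostrophe (xi’an) to mark that boundary.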

Also, one of the absolutely wonderful features of zhdict.com is that it can annotate text, letting you mouse over characters (and words!) to see what they mean. If you’ve used Chinesepera-kun, Wenlin or something similar, you’ll know what I’m talking about, and it really is amazing once you start using it. You can use it to read the sample material that comes up when you search for words, but you can also click Annotate on the links bar at the top, paste in your own text (say, from a news article) and read it with annotations. Just. Fucking. Brilliant.


空白 (Whitespace)

06Dec07

首先,我 想 关于 中文 写 的 时候 缺 空白 的 事情。 您 会 一定 知道 英文 和 绝大部分 的 别的 语言 用 空 表示 词 的 头 和 尾。可是,读 中文、日文 和 韩语 时,读者 必须 自己 搞懂。我 觉得 这 真 是 个 麻烦;事实上,读 拼音 时,读者 能 容易地 看到 词 边界,这 很 有趣。我 觉得 拼音 创造对了 因为 它 有 合理 的 制度 以 抄写 中文字。我 希望 将来 带着 这 种 改革 来 给 中文,使 中文 文件 好读 起来。

English Translation
First of all, I want to talk about the lack of white space when writing Chinese. You surely know that English and most other languages use white space to show where words begin and end. When reading Chinese, Japanese or Korean, however, readers must work this out for themselves. I find this a real nuisance; in fact, it is interesting that when reading Pinyin, readers can easily see the word boundaries. I think Pinyin got this right, because it has a sensible system for transcribing Chinese. I hope the future brings this same kind of reform to Chinese and makes Chinese texts easier to read.
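Software that needs word boundaries in unspaced Chinese has to guess them, much as human readers do. Forward maximum matching is one of the simplest textbook approaches: at each position, take the longest dictionary word that starts there. The Python sketch below uses a four-word sample dictionary of my own choosing (`SAMPLE_DICTIONARY` is hypothetical). The greedy method is known to fail on ambiguous strings, which is why real segmenters use large dictionaries and statistical models.

```python
def forward_max_match(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word starting there, falling back to a single character."""
    max_len = max(map(len, dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if j == i + 1 or text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
    return words


SAMPLE_DICTIONARY = {"拼音", "读者", "看到", "边界"}

if __name__ == "__main__":
    print(" ".join(forward_max_match("读拼音时读者能看到词边界", SAMPLE_DICTIONARY)))
```

The output, 读 拼音 时 读者 能 看到 词 边界, matches the word spacing used in the Chinese paragraph above.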