
BLU Discuss list archive



Japanese characters on OO.o presentation--> i18n



Nicholas Bodley wrote:
> Imho, read and heed! I didn't know that. I'm extremely unlikely to 
> send e-mail in Japanese, but it's one of those essentials (like 
> knowledge of BCC) one really has to keep in mind when sending e-mail.
>
> As I understand it, (and I might well be wrong! Corrections welcome!) 
> there are at least two basically-different ways to encode Japanese 
> text; iirc, one (Shift-JIS? Apologies if I'm wrong) is something like 
> the old {ltrs}/{figs} shift in 5-bit teleprinters -- one can be in the 
> wrong mode. The consequence is that if a "mode-change" character is 
> omitted, or wrongly sent when it should not be, (or munged...), all 
> subsequent text (at least up to a redefining of "mode") is scrambled 
> badly. If you think seeing English text in {figs} shift is bad, when 
> you have a practical set of something like 2,300 or so 
> basically-Chinese characters, and are receiving nonsense, as I 
> understand it, that's mojibake.
There are actually several different encodings.  Shift_JIS (Microsoft 
SJIS) is primarily used for web pages and other documents on Windows 
systems.  ISO-2022-JP is used for e-mail.  There's also EUC-JP (Extended 
Unix Code), which is used on Unix systems.  Universal encodings like UTF-8 
and UTF-16 are also used.  I am a big fan of UTF-8 because it supports 
multiple languages (East Asian, Arabic, Hebrew, Thai, English, etc.) 
and handles ASCII efficiently (as single bytes).
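
As a rough illustration of the encodings listed above, here is a minimal Python sketch. The sample string and variable names are my own assumptions, not anything from the thread; the codec names are the ones Python's standard library uses. It encodes the same text in each encoding, shows what decoding with the wrong codec does (mojibake), and shows that the mode-shift behaviour described in the quoted text is how ISO-2022-JP works: it switches character sets with escape sequences, so losing them leaves the kanji bytes to be read as plain ASCII.

```python
# Sample text is an assumption for illustration: three kanji plus ASCII.
text = "日本語 ABC"

# Python codec names for the encodings mentioned above.
codecs = ["shift_jis", "iso2022_jp", "euc_jp", "utf-8", "utf-16-le"]
encoded = {name: text.encode(name) for name in codecs}

# Print the byte length and raw bytes produced by each encoding.
for name, data in encoded.items():
    print(f"{name:<10} {len(data):2d} bytes  {data!r}")

# ASCII characters stay one byte each in UTF-8.
ascii_bytes = "ABC".encode("utf-8")
print(ascii_bytes)

# Mojibake: decode Shift_JIS bytes as if they were EUC-JP.
# errors="replace" substitutes U+FFFD for undecodable bytes
# instead of raising UnicodeDecodeError.
garbled = encoded["shift_jis"].decode("euc_jp", errors="replace")
print(garbled)

# ISO-2022-JP is stateful: ESC $ B switches to the kanji set and
# ESC ( B switches back to ASCII. Remove those escape sequences
# (as a broken mail gateway might) and the kanji bytes are left
# to be read as ordinary 7-bit ASCII characters.
ESC = b"\x1b"
stripped = (
    encoded["iso2022_jp"]
    .replace(ESC + b"$B", b"")
    .replace(ESC + b"(B", b"")
)
print(stripped.decode("ascii"))
```

Running this should show UTF-8 spending three bytes on each kanji but one on each ASCII letter, and the last line should print printable ASCII gibberish where the kanji were.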

A great resource on this subject is the book, "CJKV Information 
Processing" by Ken Lunde.
> [Katakana]
>
> One can read more Japanese than one might, at first, expect. Japan has 
> imported English words "wholesale", sometimes adapting them to their 
> own language (I'm typing on a Compaq "pasokon" -- pasonaru 
> konpyuutaa). Perhaps 35,000 words have been imported. These words are 
> rendered/written with a simple syllabary called katakana, which 
> (except for arbitrary-seeming, never-complicated character shapes) is 
> about as easy to learn* as an alphabet, and can be a *lot* of fun.
Both Katakana (foreign words) and Hiragana (native Japanese words) are 
phonetic, so they are easy to learn.  Kanji is also interesting, but to be 
literate you need to learn a few thousand characters, which is quite a task.





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.



