
BLU Discuss list archive



Tool for identifying languages



On Tue, Jan 17, 2006 at 10:23:17AM -0500, Christopher Schmidt wrote:
> On Tue, Jan 17, 2006 at 09:34:06AM -0500, Jeff Kinz wrote:
> > Does anyone know of a tool that can determine which language a
> > chunk of text is written in? (Assume a few hundred words)
> 
> http://languid.cantbedone.org/
> http://languid.cantbedone.org/Language-Guess.tgz

Wow.  Unbelievable.  Thank you Chris.
> 
> -- 
> Christopher Schmidt
> Web Developer


Why I'm "wowed":

This tool appears to use a form of statistical analysis based on how
often particular three-character strings (trigrams) appear in the text.
Whitespace counts as one of the characters.  Very nice, and thanks
again to Chris.

Here are a few random lines from the English "strings" file:
t t                     45
 be                     46
ld                      47
e a                     48
rs                      49
 wa                     50
ut                      51
ve                      52
ll                      53
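
For anyone curious how a trigram approach like this might work, here is a
minimal Python sketch.  It is not the tool's actual code.  It assumes the
number next to each string above is a frequency rank, and that entries
which look shorter than three characters include a leading or trailing
space.  The function names, the top_n cutoff, and the rank-difference
scoring are all my own assumptions for illustration.

```python
from collections import Counter


def trigram_profile(text, top_n=300):
    """Map each of the top_n most frequent trigrams to its rank (0 = most frequent)."""
    # Lowercase and collapse runs of whitespace so a space is just another character.
    normalized = " ".join(text.lower().split())
    counts = Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {gram: rank for rank, (gram, _count) in enumerate(ranked[:top_n])}


def rank_distance(sample, reference, top_n=300):
    """Sum of rank differences; trigrams missing from the reference cost top_n."""
    return sum(
        abs(rank - reference[gram]) if gram in reference else top_n
        for gram, rank in sample.items()
    )


def guess_language(text, references):
    """Return the key of the reference profile closest to the text."""
    sample = trigram_profile(text)
    return min(references, key=lambda lang: rank_distance(sample, references[lang]))


if __name__ == "__main__":
    # Hypothetical, far-too-small training texts; a real profile needs much more text.
    references = {
        "english": trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
        "german": trigram_profile("der schnelle braune fuchs springt ueber den faulen hund"),
    }
    print(guess_language("the dog and the fox", references))
```

With only a sentence of training text per language the guess is unreliable;
the idea is that with a few hundred words, as in the original question, the
most common trigrams become fairly distinctive for each language.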




-- 
Jeff Kinz, Emergent Research, Hudson, MA.
speech recognition software may have been used to create this e-mail

"The greatest dangers to liberty lurk in insidious encroachment by men
of zeal, well-meaning but without understanding." - Brandeis

To think contrary to one's era is heroism. But to speak against it is
madness. -- Eugene Ionesco




BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org