
BLU Discuss list archive



Good Word doc -> plain text conversion



On 09/19/2010 03:38 PM, jc-8FIgwK2HfyJMuWfdjsoA/w at public.gmane.org wrote:
> Anyone here have advice on programs (scriptable and usable
> on linux) that convert Word docs to plain text?
> 
> I've been googling, of course, but most of the things I'm
> finding start with "1. Load the file into Word". This is a
> good clue that the scheme probably can't be used in a
> script that's running on a linux system.  ;-)

If you want an automated solution, how about writing it in Java?

http://poi.apache.org/
The Apache POI Project's mission is to create and maintain Java APIs for
manipulating various file formats based upon the Office Open XML
standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2).
In short, you can read and write MS Excel files using Java. In addition,
you can read and write MS Word and MS PowerPoint files using Java.
Apache POI is your Java Excel solution (for Excel 97-2008). We have a
complete API for porting other OOXML and OLE2 formats and welcome others
to participate.

OLE2 files include most Microsoft Office files such as XLS, DOC, and PPT
as well as MFC serialization API based file formats. The project
provides APIs for the OLE2 Filesystem (POIFS) and OLE2 Document
Properties (HPSF).
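
To give a feel for what "scriptable" could look like in Java, here is a
minimal sketch. Note that it does NOT use POI: it relies only on the JDK
and on the fact that an OOXML .docx file is a ZIP archive whose main body
lives in the entry word/document.xml. The class name DocxToText and its
methods are made up for illustration. It handles only .docx; legacy binary
.doc (OLE2) files need a real parser such as POI's HWPF component.
Formatting, tables, headers, and footnotes are ignored, so treat it as a
starting point rather than a finished converter.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class (not part of POI): plain text from a .docx using only the JDK.
final class DocxToText {
    // XML namespace used by WordprocessingML elements such as w:p and w:t.
    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Finds word/document.xml inside the ZIP container and returns its text.
    static String extract(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("word/document.xml")) {
                    // readAllBytes (Java 9+) stops at the end of the current entry.
                    return textOf(zip.readAllBytes());
                }
            }
        }
        throw new IOException("no word/document.xml entry; is this a .docx file?");
    }

    // Concatenates the w:t text runs of each w:p paragraph, one paragraph per line.
    private static String textOf(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations so untrusted files cannot pull in external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder out = new StringBuilder();
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            NodeList runs = paragraph.getElementsByTagNameNS(W_NS, "t");
            for (int j = 0; j < runs.getLength(); j++) {
                out.append(runs.item(j).getTextContent());
            }
            out.append('\n');
        }
        return out.toString();
    }

    // Usage: java DocxToText.java input.docx > output.txt
    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: DocxToText <file.docx>");
            System.exit(2);
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.print(extract(in));
        }
    }
}
```

Since it reads one file and writes to stdout, it can be driven from a
shell loop or cron job without any GUI.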

Here are some other solutions:
http://www.linux.com/archive/feed/52385






BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org