Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month at the Massachusetts Institute of Technology, in Building E51.

BLU Discuss list archive



[Discuss] Converting "rich" (MIME) email to plain text



This is not for extracting just the "Content-Type: text/plain" section of
an email message, but rather for converting an HTML file to plain text.

I needed to do this on a very limited scale, so I just wrote a few lines of
PHP that suited my situation:

function textify ($file) {
  $contents = file_get_contents($file);
  $contents = strip_tags($contents);
  // decode special characters, including single and double quotes
  $contents = htmlspecialchars_decode($contents, ENT_QUOTES);
  // replace the &nbsp; entity with a space
  $contents = str_replace('&nbsp;', ' ', $contents);
  // remove {literal} Smarty blocks (no U modifier: it would make .*? greedy)
  $contents = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $contents);
  // replace successive blanks with a single blank
  $contents = preg_replace("/[\t ]+/", " ", $contents);
  // remove leading blanks
  $contents = preg_replace("/^[\t ]+/m", "", $contents);
  // remove empty lines
  $contents = preg_replace("/^ *$\n/mU", "", $contents);

  return $contents;
}
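
The same steps can also be applied to a string instead of a file, which makes them easier to test. What follows is only a rough sketch, not part of the original function: the name textify_string is made up for illustration, it swaps htmlspecialchars_decode() for html_entity_decode() so that all named entities (not just the five special characters) are decoded, and it assumes the input is UTF-8. It also strips the {literal} blocks before the tags, which is a choice made here rather than something the original does.

```php
<?php
// Hypothetical string-based variant of textify(); the name is illustrative.
function textify_string(string $html): string {
    // Drop {literal}...{/literal} Smarty blocks (lazy match, dot matches newlines).
    $text = preg_replace('#\{literal\}.*?\{/literal\}#s', '', $html);
    // Remove markup.
    $text = strip_tags($text);
    // Decode named and numeric entities; assumes UTF-8 input.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML401, 'UTF-8');
    // &nbsp; decodes to U+00A0; turn it into an ordinary space.
    $text = str_replace("\u{00A0}", ' ', $text);
    // Collapse runs of blanks, then trim a blank at each line start or end.
    $text = preg_replace('/[\t ]+/', ' ', $text);
    $text = preg_replace('/^ | $/m', '', $text);
    // Remove empty lines.
    $text = preg_replace('/^\n+/m', '', $text);
    return $text;
}

// Usage: convert a small HTML fragment and print the result.
echo textify_string("<p>Hello&nbsp;&amp;   <b>world</b></p>\n");
```

Note that html_entity_decode() turns &nbsp; into a real non-breaking space character rather than leaving the entity in place, which is why the sketch replaces U+00A0 instead of the literal string '&nbsp;'.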


There is a PHP class[1] for this that has been used by several full
programs, such as PHPMailer. Curiously, PHPMailer *removed* the class
because the class is licensed under the GPL while PHPMailer itself is
LGPL[2].

[1] https://github.com/mtibben/html2text
[2] https://github.com/PHPMailer/PHPMailer/commit/127d26ef3c43118d82c244c15016cf37d67504c6


Greg Rundlett
https://eQuality-Tech.com
https://freephile.org



BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org