
BLU Discuss list archive



endless compression



(Let me preface this with some background: I've written a few format-specific
compressors and have implemented a few standard ones.)

Let's talk about *what* compression is. Compression is representing a
theoretically larger range of data with a smaller one. Compression is
removing "knowledge" about the data from the data itself, leaving only the
"interesting" part that is needed to reconstitute the data from that
knowledge.

Think of it this way: if you have a text document and you KNOW you'll
never need more than 6 bits to represent any character, then you can
encode it using 6 bits per character instead of 8, saving 25% of the space.
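A minimal sketch of that 6-bit trick in Python (the 64-symbol alphabet here is
just an assumption; any fixed set of 64 or fewer symbols works):

    import string

    # Example 64-symbol alphabet: 52 letters + 10 digits + space + newline.
    ALPHABET = string.ascii_letters + string.digits + " \n"
    assert len(ALPHABET) == 64

    def pack6(text: str) -> bytes:
        bits, nbits, out = 0, 0, bytearray()
        for ch in text:
            bits = (bits << 6) | ALPHABET.index(ch)   # 6-bit code per character
            nbits += 6
            while nbits >= 8:                         # flush whole output bytes
                nbits -= 8
                out.append((bits >> nbits) & 0xFF)
        if nbits:                                     # pad the final partial byte
            out.append((bits << (8 - nbits)) & 0xFF)
        return bytes(out)

    print(len(pack6("hello world")))  # 9 bytes instead of 11, roughly 25% smaller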

Now, JPEG is a lossy compressor: it takes knowledge about bitmap images
and photography and makes assumptions about which aspects of the image are
important. Roughly speaking, it squashes the dynamic range of each pixel,
but samples areas of the image so that one encoded range covers multiple
pixels. It then uses the reduced per-pixel information plus the sampled
range for the group of pixels to reconstitute each pixel. The compression
comes from the reduced amount of data stored per pixel.
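Real JPEG does this with a DCT, quantization, and chroma subsampling, but the
block-plus-reduced-range idea described above can be sketched (very loosely)
like this; the block size and bit depth are arbitrary choices for illustration:

    import numpy as np

    def block_quantize(img: np.ndarray, block: int = 8, bits: int = 4):
        """Toy lossy scheme: store a per-block min/max, plus each pixel at
        reduced precision relative to that range. Not real JPEG, just an
        illustration of 'reduced pixel data + sampled range per group'."""
        h, w = img.shape
        levels = (1 << bits) - 1
        out = np.empty_like(img, dtype=np.float64)
        for y in range(0, h, block):
            for x in range(0, w, block):
                tile = img[y:y+block, x:x+block].astype(np.float64)
                lo, hi = tile.min(), tile.max()
                scale = (hi - lo) or 1.0
                q = np.round((tile - lo) / scale * levels)          # 'bits' bits/pixel
                out[y:y+block, x:x+block] = q / levels * scale + lo  # reconstruct
        return out.astype(img.dtype)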

The idea, then, is to exploit knowledge of a numerical series, or knowledge
about the data. The theoretical number range may be X, but in reality there
is only Y variation, and if Y < X, the data should be compressible.
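A quick way to make "Y < X" concrete in Python, counting how many distinct
byte values actually occur in a stream:

    import math

    def min_bits_per_symbol(data: bytes) -> int:
        """If only Y distinct byte values occur, each symbol needs at most
        ceil(log2(Y)) bits instead of the theoretical 8."""
        y = len(set(data))
        return math.ceil(math.log2(y)) if y > 1 else 1

    print(min_bits_per_symbol(b"abracadabra"))  # 5 distinct bytes -> 3 bits each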

When you look at a ZIP-compressed file, it does not compress well with
standard stream compressors. That is because it has few, if any, repeating
patterns. The fact that it has few repeating patterns is what makes the
stream interesting: it means the stream is less likely in the universe of
possible streams, and if it is less likely, it can be represented with
fewer bits.
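The first point is easy to check with Python's standard zlib module; a quick
illustration (the exact numbers will vary with the input):

    import zlib

    raw = b"the quick brown fox jumps over the lazy dog. " * 1000
    once = zlib.compress(raw, 9)
    twice = zlib.compress(once, 9)

    print(len(raw), len(once), len(twice))
    # Typically the second pass barely shrinks (or slightly grows) the data,
    # because the first pass already removed the repeating patterns.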

Standard compressors work on the notion that a repeated pattern can be
saved once and reused when it shows up again. LZW, for example, actually
spends more bits per code than a plain byte, but a whole string of repeated
bytes collapses into just a few codes.
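For illustration, a minimal LZW encoder sketch in Python (variable code widths
and bit-packing are omitted for brevity):

    def lzw_encode(data: bytes) -> list[int]:
        """Minimal LZW sketch: each output code is wider than a byte (it
        indexes a growing dictionary), but a repeated string costs one code."""
        table = {bytes([i]): i for i in range(256)}   # start with all single bytes
        next_code = 256
        w, out = b"", []
        for b in data:
            wc = w + bytes([b])
            if wc in table:
                w = wc                                 # keep extending the match
            else:
                out.append(table[w])                   # emit code for longest match
                table[wc] = next_code                  # learn the new string
                next_code += 1
                w = bytes([b])
        if w:
            out.append(table[w])
        return out

    codes = lzw_encode(b"abababababababab")
    print(len(codes))  # 8 codes for 16 bytes once 'ab', 'aba', ... start repeating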

A "re-compressor" is different, rather than "repeated" bytes, it would use
information about special sequences of data that have little or no
repitition and few recognizable patterns. This may be hugely expensive to
compute and offer only a compressability of a few percent, however, if it
repeatable, then it can iterate and iterate untill the overhead for the
compression is larger than the data.
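No such re-compressor is demonstrated here, but the stopping rule in that
paragraph can be stated as a sketch, assuming a hypothetical recompress()
pass that sometimes gains a few percent; the per-pass header size is also an
assumption:

    def iterate_compression(data: bytes, recompress, header_bytes: int = 8) -> bytes:
        """Sketch of the iteration described above. `recompress` is a
        hypothetical pass; stop once its per-pass bookkeeping (assumed to be
        header_bytes) exceeds whatever it saves."""
        while True:
            candidate = recompress(data)
            if len(candidate) + header_bytes >= len(data):
                return data            # further passes cost more than they save
            data = candidate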

I'm not saying it is practical, workable, or even really possible; I'm just
saying that, in theory, it may be.







On Aug 12, 2004, at 8:57 AM, markw at mohawksoft.com wrote:

> Jules has mentioned that he has a compressor that will compress
> compressed
> files. I looked at this a while back and it is a really cool idea.
>
> It is theoretically possible.
>
> Use conventional compression at first. Numerically speaking, a
> traditionally compressed file is interresting because it has virtually
> no
> repeating numbers. It is also unlikely to have any serious non-random
> seeming number sequences.
>
> As such, this file may be able to be described with fewer bites than it
> contains. That, my friends, is the nature of compression.
>

Sorry, your writing sounds a bit too much like Jules's. What is the
nature of compression? Your writing jumps cleanly from introduction to
conclusion without saying anything. How can a file with no repeating
numbers be described with fewer 'bites' (sic)?



