
BLU Discuss list archive



endless compression



Hi Mark,

On Sun, 2004-08-15 at 15:23, markw at mohawksoft.com wrote:
> Having read the link you submitted, I think there is a common
> misunderstanding between the people who think the repeatable compression
> is possible and those who don't.


This comment reminds me of the old joke, 

  "There are 3 kinds of people in this world, those who understand 
   math and those who don't."


> Something like this:
> A segment is a run within the stream that can be represented by a compressor.
> The stream is a data stream.
> 
> while(stream)
> {
>     best_compressor = NULL;
>     best_seed = NULL;
>     best_count = 0;
>     foreach(compressor)
>     {
>         foreach(seed)
>         {
>             bytes_represented = compare(stream, compressor, seed);
>             if(bytes_represented > best_count)
>             {
>                 best_compressor = compressor;
>                 best_seed = seed;
>                 best_count = bytes_represented;
>             }
>         }
>     }
>     if(best_count > segment_overhead)
>     {
>          put(best_compressor, best_seed, best_count);
>          stream += best_count;
>     }
>     else
>     {
>         put(NULL, stream, 1);
>         stream++;
>     }
> }
> 
> As you can see, with any interesting number of compressors and interesting
> range of seeds, this can be HUGELY expensive. It is an interesting theory.
> Numerically, it sounds sort of fishy, but if you can get lucky enough, it
> works.
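For concreteness, the search loop quoted above could be sketched in runnable Python roughly as follows. The `repeat` "compressor" and the byte-valued seed range are hypothetical stand-ins of mine, not part of Mark's proposal:

```python
def match_len(stream, pos, generate):
    """Count how many bytes of stream[pos:] the generator reproduces."""
    n = 0
    for byte in generate():
        if pos + n >= len(stream) or stream[pos + n] != byte:
            break
        n += 1
    return n

def encode(stream, compressors, seeds, segment_overhead=3):
    """Greedy exhaustive search over (compressor, seed) pairs, as in the
    quoted loop.  Emits (compressor, seed, count) tuples, or a literal
    byte when no match beats the per-segment overhead."""
    out, pos = [], 0
    while pos < len(stream):
        best = (None, None, 0)
        for comp in compressors:
            for seed in seeds:
                n = match_len(stream, pos, lambda: comp(seed))
                if n > best[2]:
                    best = (comp, seed, n)
        if best[2] > segment_overhead:
            out.append(best)
            pos += best[2]
        else:
            out.append((None, stream[pos], 1))
            pos += 1
    return out

def repeat(seed):
    """Trivial example 'compressor': an endless run of one byte."""
    while True:
        yield seed
```

Even with this one trivial compressor and 256 seeds, the inner search costs
O(|compressors| * |seeds|) generator comparisons per output segment, which is
the "HUGELY expensive" part.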


Since it's not relevant to this discussion, let's ignore the origin
(whether it's a PRNG [1] or some other source) of your n-bit streams and
concentrate on the important aspect, which is the encoding.  Let's
imagine that you have enough of these bit streams (in your case, PRNGs)
to represent every possible combination of n bits (which is 2^n
streams).  You then need a way of *remembering* which one of the 2^n
streams should be used when you decompress.  And to uniquely represent
each of the 2^n streams, you will need, at an absolute minimum, an
n-bit index.  So, in the general case of trying to compress every
possible input stream, what have you gained?

Absolutely nothing.

The above argument can be extended to variable-length streams and I
leave that as an exercise for you to work out.
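The counting behind this is small enough to check directly; here is a
quick Python sketch of the pigeonhole argument (my illustration, not
part of the thread):

```python
# There are 2^n distinct n-bit streams, but only 2^n - 1 bit strings of
# length strictly less than n (sum of 2^k for k = 0 .. n-1, where k = 0
# counts the single empty string).
n = 16
inputs = 2 ** n
shorter_outputs = sum(2 ** k for k in range(n))
# One fewer output than inputs: some n-bit stream cannot be shortened
# by ANY lossless (i.e., invertible) encoding.
```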

It's interesting to note that essentially all of the great (fundamental)
discoveries in math and science tend to be "limiting" in the sense that
they place some constraint on what can or can't be achieved.  For
instance:

 - speed of light (the propagation of information)
 - 2nd law of thermodynamics (the "arrow of time")
 - Cramér-Rao Lower Bound (statistics)
 - Planck constant (Heisenberg and the ability to resolve space-time)
 - Information Theory (Shannon and maximum transmission rates)

These are not the nonsense limits ("couldn't travel faster than 60 MPH")
that you mention.  In particular, the last (Shannon) is one that you
really ought to look into since it should be required reading for anyone
playing with compression techniques.
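As a pointer to what Shannon's result actually bounds, here is a minimal
sketch (mine, not from the thread) of the source-coding limit: no
lossless code can average fewer bits per symbol than the entropy of the
source.

```python
import math

def entropy(probs):
    """Shannon entropy of a memoryless source, in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A fair coin needs a full bit per flip; a 90/10-biased coin can, in
# principle, be coded in about 0.47 bits per flip -- but no less.
h_fair = entropy([0.5, 0.5])    # 1.0
h_biased = entropy([0.9, 0.1])  # ~0.469
```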

Ed

[1] PRNG = pseudo-random number generator


-- 
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464





BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org