
BLU Discuss list archive



endless compression



Having read the link you submitted, I think there is a common
misunderstanding between the people who think repeatable compression is
possible and those who don't.

Now, again, I'm not saying I believe it is possible; I'm saying that some
of the arguments sound plausible and that it may be possible.

The argument of the people who say it is impossible is this:

You can't compress data beyond its universe, i.e. if you have 2 billion
tokens and the stream contains nothing recognizable (repetitive tokens,
known sequences of tokens, etc.), you cannot compress it below 2 billion
tokens.

This is actually not dissimilar to the argument put forth by the
repeatable-compression people; the only question is what counts as
"recognizable".

Now, the theory is this:

Standard compressors take repeating patterns and basically reduce the
universe of a stream to a subset. We know a compressed stream is smaller
than the universe it could otherwise inhabit, because it is not normal for
a stream of random data NOT to contain any repeating patterns.
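
To make that concrete, here is a minimal sketch (assuming zlib's one-shot
compress() API; build with -lz) that compresses a repetitive buffer and a
pseudo-random buffer of the same size and prints the resulting sizes:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zlib.h>

/* Compress src with zlib's one-shot compress() and return the
 * compressed size, or 0 on failure. */
static uLong deflated_size(const unsigned char *src, uLong len)
{
    uLong bound = compressBound(len);
    unsigned char *dst = malloc(bound);
    uLong dlen = bound;
    if (dst == NULL || compress(dst, &dlen, src, len) != Z_OK)
        dlen = 0;
    free(dst);
    return dlen;
}

int main(void)
{
    enum { N = 65536 };
    static unsigned char repetitive[N], noise[N];
    int i;

    memset(repetitive, 'A', N);          /* nothing but repeats */
    srand(1);
    for (i = 0; i < N; i++)              /* few usable repeats  */
        noise[i] = (unsigned char)(rand() & 0xff);

    printf("repetitive: %d -> %lu bytes\n", N, deflated_size(repetitive, N));
    printf("random-ish: %d -> %lu bytes\n", N, deflated_size(noise, N));
    return 0;
}

On a typical zlib build the repetitive buffer shrinks to a tiny fraction
of its size, while the pseudo-random one comes out slightly larger than it
went in.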

The crux of the argument is the quantification of how much smaller the
universe is for the compressed stream.

It is inarguable that if, by pure random chance, the bytes in a compressed
stream are identical to the output bytes of a specific random number
generator with a specific seed, an astonishing amount of compression could
be had.

The premise, then, is to search all known (or new) pseudo-random sequence
generators, compressors, and seeds for ones that reproduce all of the
stream, or some limited runs within it.

Something like this, where a "segment" is a run within the stream that can
be represented by one (compressor, seed) pair, and "stream" is the data
stream:

while(stream)
{
    /* Find the (compressor, seed) pair that reproduces the longest
     * run at the head of the stream. */
    best_compressor = NULL;
    best_seed = NULL;
    best_count = 0;
    foreach(compressor)
    {
        foreach(seed)
        {
            bytes_represented = compare(stream, compressor, seed);
            if(bytes_represented > best_count)
            {
                best_compressor = compressor;
                best_seed = seed;
                best_count = bytes_represented;
            }
        }
    }
    if(best_count > segment_overhead)
    {
        /* Worth it: emit a (compressor, seed, length) segment and
         * skip past the run it represents. */
        put(best_compressor, best_seed, best_count);
        stream += best_count;
    }
    else
    {
        /* Not worth it: emit one literal byte and move on. */
        put(NULL, stream, 1);
        stream++;
    }
}
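
As a purely illustrative sketch of the inner foreach(seed) loop, here it
is in C, assuming a single toy generator (a 32-bit LCG standing in for "a
compressor") and a buffer deliberately generated from one of the seeds in
the search range so the loop has something to find:

#include <stdio.h>

/* Toy 32-bit LCG; one step produces one output byte. */
static unsigned char lcg_byte(unsigned int *state)
{
    *state = *state * 1103515245u + 12345u;
    return (unsigned char)(*state >> 16);
}

/* How many leading bytes of buf does this seed reproduce? */
static size_t match_len(const unsigned char *buf, size_t len, unsigned int seed)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (buf[i] != lcg_byte(&seed))
            break;
    return i;
}

int main(void)
{
    unsigned char buf[64];
    unsigned int seed, state, best_seed = 0;
    size_t i, n, best_count = 0;

    /* Fabricate a stream that really is generator output. */
    state = 54321;
    for (i = 0; i < sizeof buf; i++)
        buf[i] = lcg_byte(&state);

    /* Brute-force a small seed range, as in the pseudocode above. */
    for (seed = 0; seed < 100000; seed++) {
        n = match_len(buf, sizeof buf, seed);
        if (n > best_count) {
            best_count = n;
            best_seed = seed;
        }
    }

    printf("best seed %u reproduces %zu of %zu bytes\n",
           best_seed, best_count, sizeof buf);
    return 0;
}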

As you can see, with any interesting number of compressors and any
interesting range of seeds, this is HUGELY expensive. It is an interesting
theory. Numerically it sounds sort of fishy, but if you get lucky enough,
it works.

I think the people who say it can't work are kind of closed-minded, like
the scientists who said we couldn't travel faster than 60 MPH, break the
sound barrier, or fly. Plus, there is a lot of computational power
available today, as well as a lot of storage. If it takes 100 GB of data
dictionaries and code and vast amounts of CPU on each system to encode and
decode data, that's not automatically impractical, especially for
long-range communications, say to Mars.

To their detriment, the recompression people are also a bit unreasonable.
They ignore a lot of practical problems and generally do not seem to have
a good grasp of the actual issues.


As for me: you guys have called me a crank. That's fine, but I also
remember when 100 Mbit over twisted pair at any practical length was
considered impossible. I also remember scientists saying this or that
barrier would never be broken. Chip manufacturers do a lot of things that
seemed impossible 25 years ago.

To dismiss it as impossible is silly, because that is based on the notion
that everything we know right now is 100% correct, and we all know it
isn't. To believe it is possible is also silly, because no one has done it
and the numbers are shaky, based on what we currently know.

For me, it is interesting. Sort of like alchemy: turning lead into gold
seems less impossible today than at any other time in history.

Doesn't anyone else find this interesting? Or has it become impossible to
discuss unproven theories without personal attacks?


On Fri, 2004-08-13 at 10:07, Seth Gordon wrote:
> markw at mohawksoft.com wrote:
> >
> > Not the point, if you have a repeatable compressor, then compression ratio
> > is largely unimportant. If you can compress something 3% over and over
> > again, then you can make it really small.
>
> Yes, and if you have a perpetual motion machine, then Middle Eastern oil
> reserves are largely unimportant.
>
> Assume that you have a "repeatable compressor" that will losslessly
> shrink any string of bits by 3% every time it is run.  What happens when
> it receives a 100-bit input?  There are 2^100 possible inputs, but only
> 2^97 possible outputs.  Therefore, there must be *some* inputs for which
> this compressor cannot produce a 97-bit output--indeed, there must be
> some inputs for which this compressor cannot produce a 99-bit output.
> Therefore, this repeatable compressor cannot exist.  QED.


Seth is on-target.  The above is stated with much greater rigor at:

  http://www.faqs.org/faqs/compression-faq/part1/  [Section 9.1]
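
A minimal sketch of the same pigeonhole count, scaled down to 16-bit
strings so the numbers are easy to check by hand (illustrative only, not
from the FAQ):

#include <stdio.h>

int main(void)
{
    unsigned long long inputs = 1ULL << 16;   /* all 16-bit strings     */
    unsigned long long shorter = 0;
    int k;

    for (k = 1; k <= 15; k++)                 /* every length 1..15     */
        shorter += 1ULL << k;                 /* 2^1 + 2^2 + ... + 2^15 */

    printf("16-bit inputs:           %llu\n", inputs);   /* 65536 */
    printf("strings shorter than 16: %llu\n", shorter);  /* 65534 */
    /* shorter < inputs, so no lossless compressor can shrink every
     * 16-bit input, even by one bit; scale the exponents up and you
     * have Seth's 2^100 vs. 2^97 argument. */
    return 0;
}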

Apparently, cranks have been flirting with this topic for some time and
just can't seem to let it go.  I wish this group would let it go.

Please?

Ed

--
Edward H. Hill III, PhD
office:  MIT Dept. of EAPS;  Rm 54-1424;  77 Massachusetts Ave.
             Cambridge, MA 02139-4307
emails:  eh3 at mit.edu                ed at eh3.com
URLs:    http://web.mit.edu/eh3/    http://eh3.com/
phone:   617-253-0098
fax:     617-253-4464



