
C++ Code Snippet - Compressing STL Strings with zlib

Posted on 2007-03-28 18:23 by Timo Bingmann. Tags: c++ code-snippet frontpage

The zlib library can be found on virtually every computer. It is THE general-purpose lossless patent-free compression library.

This small C++ code snippet features a pair of functions which use this ubiquitous library to compress ordinary STL strings. There are many uses for this code snippet, like compressing string data stored in a database or binary data transferred over a network. Keep in mind that the compressed string data is binary and may contain zero bytes, so the string's c_str() representation must not be treated as a C string: anything that stops at the first zero byte would truncate the data. Use data() together with size() instead.
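
To illustrate the point about binary data, here is a minimal sketch showing that a std::string keeps embedded zero bytes while C string functions stop at the first one. The helper names make_binary_string and c_string_length are made up for this illustration and are not part of the snippet below.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: build a std::string from raw bytes. Passing the length
// explicitly keeps embedded zero bytes intact.
std::string make_binary_string(const char* bytes, std::size_t length)
{
    return std::string(bytes, length);
}

// Hypothetical helper: the length a C string function sees through c_str(),
// which is only the part before the first zero byte.
std::size_t c_string_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For the five bytes 'a', 'b', 0, 'c', 'd', size() reports 5 while c_string_length() reports 2, which is why compressed output should always be passed around as data() plus size().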

To compile the following small program use "g++ testzlib.cc -o testzlib -lz" where testzlib.cc is the code. The -lz flag links against zlib; without it the linker reports undefined symbols.

// Copyright 2007 Timo Bingmann <tb@panthema.net>
// Distributed under the Boost Software License, Version 1.0.
// (See http://www.boost.org/LICENSE_1_0.txt)

#include <string>
#include <stdexcept>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <cstring>                      // memset, strcmp

#include <zlib.h>

/** Compress a STL string using zlib with given compression level and return
  * the binary data. */
std::string compress_string(const std::string& str,
                            int compressionlevel = Z_BEST_COMPRESSION)
{
    z_stream zs;                        // z_stream is zlib's control structure
    memset(&zs, 0, sizeof(zs));

    if (deflateInit(&zs, compressionlevel) != Z_OK)
        throw(std::runtime_error("deflateInit failed while compressing."));

    zs.next_in = (Bytef*)str.data();
    zs.avail_in = str.size();           // set the z_stream's input

    int ret;
    char outbuffer[32768];
    std::string outstring;

    // retrieve the compressed bytes blockwise
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);

        ret = deflate(&zs, Z_FINISH);

        if (outstring.size() < zs.total_out) {
            // append the block to the output string
            outstring.append(outbuffer,
                             zs.total_out - outstring.size());
        }
    } while (ret == Z_OK);

    deflateEnd(&zs);

    if (ret != Z_STREAM_END) {          // an error occurred that was not EOF
        std::ostringstream oss;
        // zs.msg may be NULL; streaming a null char* is undefined behavior
        oss << "Exception during zlib compression: (" << ret << ") "
            << (zs.msg ? zs.msg : "no message");
        throw(std::runtime_error(oss.str()));
    }

    return outstring;
}

/** Decompress an STL string using zlib and return the original data. */
std::string decompress_string(const std::string& str)
{
    z_stream zs;                        // z_stream is zlib's control structure
    memset(&zs, 0, sizeof(zs));

    if (inflateInit(&zs) != Z_OK)
        throw(std::runtime_error("inflateInit failed while decompressing."));

    zs.next_in = (Bytef*)str.data();
    zs.avail_in = str.size();

    int ret;
    char outbuffer[32768];
    std::string outstring;

    // get the decompressed bytes blockwise using repeated calls to inflate
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);

        ret = inflate(&zs, Z_NO_FLUSH);

        if (outstring.size() < zs.total_out) {
            outstring.append(outbuffer,
                             zs.total_out - outstring.size());
        }

    } while (ret == Z_OK);

    inflateEnd(&zs);

    if (ret != Z_STREAM_END) {          // an error occurred that was not EOF
        std::ostringstream oss;
        // zs.msg may be NULL; streaming a null char* is undefined behavior
        oss << "Exception during zlib decompression: (" << ret << ") "
            << (zs.msg ? zs.msg : "no message");
        throw(std::runtime_error(oss.str()));
    }

    return outstring;
}

/** Small dumb tool (de)compressing cin to cout. It holds all input in memory,
  * so don't use it for huge files. */
int main(int argc, char* argv[])
{
    std::string allinput;

    while (std::cin.good())     // read all input from cin
    {
        char inbuffer[32768];
        std::cin.read(inbuffer, sizeof(inbuffer));
        allinput.append(inbuffer, std::cin.gcount());
    }

    if (argc >= 2 && strcmp(argv[1], "-d") == 0)
    {
        std::string cstr = decompress_string( allinput );

        std::cerr << "Inflated data: "
                  << allinput.size() << " -> " << cstr.size()
                  << " (" << std::setprecision(1) << std::fixed
                  << ( ((float)cstr.size() / (float)allinput.size() - 1.0) * 100.0 )
                  << "% increase).\n";

        std::cout << cstr;
    }
    else
    {
        std::string cstr = compress_string( allinput );

        std::cerr << "Deflated data: "
                  << allinput.size() << " -> " << cstr.size()
                  << " (" << std::setprecision(1) << std::fixed
                  << ( (1.0 - (float)cstr.size() / (float)allinput.size()) * 100.0)
                  << "% saved).\n";

        std::cout << cstr;
    }
}
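
The statistics lines in main() divide by allinput.size(), which is zero when the input is empty. A minimal sketch of a guard follows, assuming a hypothetical helper named saved_percent that is not part of the original snippet:

```cpp
#include <cstddef>

// Hypothetical helper: percentage of bytes saved by compression.
// Returns 0.0 for empty input instead of dividing by zero. The result is
// negative when the "compressed" data is larger than the original.
double saved_percent(std::size_t original_size, std::size_t compressed_size)
{
    if (original_size == 0)
        return 0.0;

    return (1.0 - static_cast<double>(compressed_size)
                      / static_cast<double>(original_size)) * 100.0;
}
```

The "% saved" expression in main() could then be replaced by saved_percent(allinput.size(), cstr.size()).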

Comment by Sam Dutton at 2011-06-08 14:00 UTC
Thanks for this.

I realise this post is four years old, but I'll comment anyway...

Excuse my ignorance of compression and encoding algorithms, but I'm looking for a way to compress a std::string, in a way that the 'compressed string' can be stored as a std::string, and later retrieved and decompressed to a std::string. Is this possible with your code?

I'm probably missing the point here, but the comment on compress_string() seems to conflict with the std::string return type:

/* Compress a STL string using zlib with given compression level and return
the binary data. */
Comment by Timo at 2011-06-14 08:09 UTC
Well, maybe you'd best check whether a std::string can contain "binary data".
Greetings, Timo
Comment by Patrick at 2011-06-17 06:46 UTC
Hi

I just tested this code in Mac OS X... I saved it as gzip.cpp and ran g++ -o gzip gzip.cpp, but I got the following error:

g++ -o gzip gzip.cpp
Undefined symbols:
"_inflateEnd", referenced from:
decompress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)in ccT57o1k.o
"_deflateInit_", referenced from:
compress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)in ccT57o1k.o
"_inflate", referenced from:
decompress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)in ccT57o1k.o
"_deflateEnd", referenced from:
compress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)in ccT57o1k.o
"_inflateInit_", referenced from:
decompress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)in ccT57o1k.o
"_deflate", referenced from:
compress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)in ccT57o1k.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

How can I fix this, do you have any idea?
Comment by Timo at 2011-06-17 08:27 UTC
Yes, just add -lz to the gcc line. Sorry about that. I've appended this tip to the blog post.
Timo
Comment by Stefan at 2012-06-05 16:14 UTC
Thanks for this, it looks nice.
I found I had to #include <cstring> for memset and strcmp.
Comment by Jason at 2013-12-18 19:26 UTC
Hi. May I include the compress_string() and decompress_string() functions in a project I am working on? How should I properly credit you?
Comment by Timo at 2013-12-21 09:58 UTC
I added a Boost license header. I hope that license is fine for you.
Timo

Comment by Steve at 2014-02-18 19:20 UTC
Thanks for this Timo. This saved me a bunch of time. I noticed a few issues.

The #include <cstring> was required (memset).

Changing inflateInit(&zs) to inflateInit2(&zs, 16+MAX_WBITS) allows us to decompress gzip files as well.
Comment by Ray Burgemeestre at 2014-06-01 17:46 UTC
Thank you for sharing these Timo!

I used them for implementing simple Content-Encoding: {deflate|gzip} compression support for a CGI program a while ago.

Decided to share how I created two additional compress_gzip and decompress_gzip functions.

The changes are simple:


// For the decompress
inflateInit2(&zs, MOD_GZIP_ZLIB_WINDOWSIZE + 16);

// For the compress
deflateInit2(&zs, compressionlevel, Z_DEFLATED,
             MOD_GZIP_ZLIB_WINDOWSIZE + 16,
             MOD_GZIP_ZLIB_CFACTOR,
             Z_DEFAULT_STRATEGY);


The complete functions with an example available here:

http://blog.cppse.nl/deflate-and-gzip-compress-and-decompress-functions

I see in Steve's comment there is also "MAX_WBITS" somewhere.. my constants are copied from mod_gzip:


#define MOD_GZIP_ZLIB_WINDOWSIZE 15
#define MOD_GZIP_ZLIB_CFACTOR 9
#define MOD_GZIP_ZLIB_BSIZE 8096


- Ray
You can use the zlibcomplete library in C++ (linked above) to easily compress or decompress std::string without worrying about dynamic allocation yourself.
Comment by Benjamin at 2016-03-30 01:43 UTC
Thanks for this sample, it was super useful.

I used it for doing gzip compression too. My only change was the way I initialized zlib with deflateInit2, but beside that I could use your code as is.


// deflateInit2 configure the file format: request gzip instead of deflate
const int windowBits = 15;
const int GZIP_ENCODING = 16;

deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
windowBits | GZIP_ENCODING, 8,
Z_DEFAULT_STRATEGY);
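
Several comments above switch between zlib and gzip framing by adding 16 to the window bits. As a rough sketch, the framing of an existing buffer can also be guessed from its first two bytes: gzip data starts with the magic bytes 0x1f 0x8b (RFC 1952), while a zlib header has compression method 8 in the low nibble of its first byte and forms a 16-bit value divisible by 31 (RFC 1950). The names StreamFormat and guess_stream_format are made up for this illustration; the check only peeks at the header and does not validate the stream.

```cpp
#include <string>

// Hypothetical type and function names, used for illustration only.
enum class StreamFormat { Gzip, Zlib, Unknown };

StreamFormat guess_stream_format(const std::string& data)
{
    if (data.size() < 2)
        return StreamFormat::Unknown;

    const unsigned b0 = static_cast<unsigned char>(data[0]);
    const unsigned b1 = static_cast<unsigned char>(data[1]);

    // gzip: fixed magic bytes
    if (b0 == 0x1f && b1 == 0x8b)
        return StreamFormat::Gzip;

    // zlib: method 8 (deflate), window size field at most 7, and
    // (CMF * 256 + FLG) must be a multiple of 31
    if ((b0 & 0x0f) == 8 && (b0 >> 4) <= 7 && ((b0 << 8) | b1) % 31 == 0)
        return StreamFormat::Zlib;

    return StreamFormat::Unknown;
}
```

A caller could use the result to choose between inflateInit(&zs) and inflateInit2(&zs, 16+MAX_WBITS). zlib's documentation also describes adding 32 to the window bits in inflateInit2 to let the library detect the header itself, which may be simpler when only decompression is needed.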

Copyright 2005-2017 Timo Bingmann - Impressum