
C++ Code Snippet - Compressing STL Strings with zlib

Posted on 2007-03-28 18:23 by Timo Bingmann with 11 Comments. Tags: c++ code-snippet frontpage

The zlib library can be found on virtually every computer. It is THE general-purpose lossless patent-free compression library.

This small C++ code snippet features a pair of functions that use this ubiquitous library to compress ordinary STL strings. There are many uses for this code snippet, such as compressing string data stored in a database or binary data transferred over a network. Keep in mind that the compressed string data is binary and may contain zero bytes, so the string's c_str() representation must be avoided; always use data() together with size().
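The c_str() caveat is easy to demonstrate without zlib at all. The sketch below uses a hand-made string with one embedded zero byte, standing in for compressed output; the three helper names are hypothetical and exist only for illustration.

```cpp
#include <cstring>
#include <string>

// Hypothetical helper: copy a string through its c_str() pointer.
// std::string(const char*) stops at the first zero byte, so binary data
// is silently truncated.
std::string copy_via_c_str(const std::string& s)
{
    return std::string(s.c_str());
}

// Hypothetical helper: copy a string through data() and size().
// Every byte is preserved, including embedded zero bytes.
std::string copy_via_data(const std::string& s)
{
    return std::string(s.data(), s.size());
}

// Hypothetical helper: the length that C string functions such as
// strlen() see, which also ends at the first zero byte.
std::size_t c_string_length(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For a five-byte value like std::string("ab\0cd", 5), size() reports 5 while the c_str() route only sees "ab".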

To compile the following small program, use "g++ testzlib.cc -o testzlib -lz", where testzlib.cc is the code.

// Copyright 2007 Timo Bingmann <tb@panthema.net>
// Distributed under the Boost Software License, Version 1.0.
// (See http://www.boost.org/LICENSE_1_0.txt)

#include <string>
#include <stdexcept>
#include <iostream>
#include <iomanip>
#include <sstream>
#include <cstring>          // memset, strcmp

#include <zlib.h>

/** Compress a STL string using zlib with given compression level and return
  * the binary data. */
std::string compress_string(const std::string& str,
                            int compressionlevel = Z_BEST_COMPRESSION)
{
    z_stream zs;                        // z_stream is zlib's control structure
    memset(&zs, 0, sizeof(zs));

    if (deflateInit(&zs, compressionlevel) != Z_OK)
        throw(std::runtime_error("deflateInit failed while compressing."));

    zs.next_in = (Bytef*)str.data();
    zs.avail_in = str.size();           // set the z_stream's input

    int ret;
    char outbuffer[32768];
    std::string outstring;

    // retrieve the compressed bytes blockwise
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);

        ret = deflate(&zs, Z_FINISH);

        if (outstring.size() < zs.total_out) {
            // append the block to the output string
            outstring.append(outbuffer,
                             zs.total_out - outstring.size());
        }
    } while (ret == Z_OK);

    deflateEnd(&zs);

    if (ret != Z_STREAM_END) {          // an error occurred that was not EOF
        std::ostringstream oss;
        oss << "Exception during zlib compression: (" << ret << ") "
            << (zs.msg ? zs.msg : "");   // msg may be NULL
        throw(std::runtime_error(oss.str()));
    }

    return outstring;
}

/** Decompress an STL string using zlib and return the original data. */
std::string decompress_string(const std::string& str)
{
    z_stream zs;                        // z_stream is zlib's control structure
    memset(&zs, 0, sizeof(zs));

    if (inflateInit(&zs) != Z_OK)
        throw(std::runtime_error("inflateInit failed while decompressing."));

    zs.next_in = (Bytef*)str.data();
    zs.avail_in = str.size();

    int ret;
    char outbuffer[32768];
    std::string outstring;

    // get the decompressed bytes blockwise using repeated calls to inflate
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);

        ret = inflate(&zs, Z_NO_FLUSH);

        if (outstring.size() < zs.total_out) {
            outstring.append(outbuffer,
                             zs.total_out - outstring.size());
        }

    } while (ret == Z_OK);

    inflateEnd(&zs);

    if (ret != Z_STREAM_END) {          // an error occurred that was not EOF
        std::ostringstream oss;
        oss << "Exception during zlib decompression: (" << ret << ") "
            << (zs.msg ? zs.msg : "");   // msg may be NULL, e.g. for Z_BUF_ERROR
        throw(std::runtime_error(oss.str()));
    }

    return outstring;
}

/** Small dumb tool (de)compressing cin to cout. It holds all input in memory,
  * so don't use it for huge files. */
int main(int argc, char* argv[])
{
    std::string allinput;

    while (std::cin.good())     // read all input from cin
    {
        char inbuffer[32768];
        std::cin.read(inbuffer, sizeof(inbuffer));
        allinput.append(inbuffer, std::cin.gcount());
    }

    if (argc >= 2 && strcmp(argv[1], "-d") == 0)
    {
        std::string cstr = decompress_string( allinput );

        std::cerr << "Inflated data: "
                  << allinput.size() << " -> " << cstr.size()
                  << " (" << std::setprecision(1) << std::fixed
                  << ( ((float)cstr.size() / (float)allinput.size() - 1.0) * 100.0 )
                  << "% increase).\n";

        std::cout << cstr;
    }
    else
    {
        std::string cstr = compress_string( allinput );

        std::cerr << "Deflated data: "
                  << allinput.size() << " -> " << cstr.size()
                  << " (" << std::setprecision(1) << std::fixed
                  << ( (1.0 - (float)cstr.size() / (float)allinput.size()) * 100.0)
                  << "% saved).\n";

        std::cout << cstr;
    }
}

Comment by Sam Dutton at 2011-06-08 14:00 UTC

Thanks for this.

I realise this post is four years old, but I'll comment anyway...

Excuse my ignorance of compression and encoding algorithms, but I'm looking for a way to compress a std::string, in a way that the 'compressed string' can be stored as a std::string, and later retrieved and decompressed to a std::string. Is this possible with your code?

I'm probably missing the point here, but the comment on compress_string() seems to conflict with the std::string return type:

/* Compress a STL string using zlib with given compression level and return
the binary data. */

Comment by Timo at 2011-06-14 08:09 UTC

Well, maybe you'd best check whether a std::string can contain "binary data".
Greetings, Timo

Comment by Patrick at 2011-06-17 06:46 UTC

Hi

I just tested this code on Mac OS X... I saved it as gzip.cpp and ran g++ -o gzip gzip.cpp, but I got the following error:

g++ -o gzip gzip.cpp
Undefined symbols:
  "_inflateEnd", referenced from:
      decompress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)in ccT57o1k.o
  "_deflateInit_", referenced from:
      compress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)in ccT57o1k.o
  "_inflate", referenced from:
      decompress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)in ccT57o1k.o
  "_deflateEnd", referenced from:
      compress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)in ccT57o1k.o
  "_inflateInit_", referenced from:
      decompress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)in ccT57o1k.o
  "_deflate", referenced from:
      compress_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)in ccT57o1k.o
ld: symbol(s) not found
collect2: ld returned 1 exit status

How can I fix this, do you have any idea?

Comment by Timo at 2011-06-17 08:27 UTC

Yes, just add -lz to the gcc line. Sorry about that. I've appended this tip to the blog post.
Timo

Comment by Stefan at 2012-06-05 16:14 UTC

Thanks for this, it looks nice.
I found I had to #include <cstring> for memset and strcmp.

Comment by Jason at 2013-12-18 19:26 UTC

Hi. May I include the compress_string() and decompress_string() functions in a project I am working on? How should I properly credit you?

Comment by Timo at 2013-12-21 09:58 UTC

I added a Boost license header. I hope that license is fine for you.
Timo

Comment by Steve at 2014-02-18 19:20 UTC

Thanks for this Timo. This saved me a bunch of time. I noticed a few issues.

The #include <cstring> was required (memset).

Changing inflateInit(&zs) to inflateInit2(&zs, 16+MAX_WBITS) allows us to decompress gzip files as well.

Comment by Ray Burgemeestre at 2014-06-01 17:46 UTC

Thank you for sharing these Timo!

I used them for implementing simple Content-Encoding: {deflate|gzip} compression support for a CGI program a while ago.

Decided to share how I created two additional compress_gzip and decompress_gzip functions.

The changes are simple:


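// For the decompress
if (inflateInit2(&zs, MOD_GZIP_ZLIB_WINDOWSIZE + 16) != Z_OK)
    throw(std::runtime_error("inflateInit failed while decompressing."));

// For the compress
if (deflateInit2(&zs, compressionlevel, Z_DEFLATED,
                 MOD_GZIP_ZLIB_WINDOWSIZE + 16,
                 MOD_GZIP_ZLIB_CFACTOR,
                 Z_DEFAULT_STRATEGY) != Z_OK)
    throw(std::runtime_error("deflateInit failed while compressing."));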


The complete functions with an example available here:

http://blog.cppse.nl/deflate-and-gzip-compress-and-decompress-functions

I see in Steve's comment there is also "MAX_WBITS" somewhere; my constants are copied from mod_gzip:


#define MOD_GZIP_ZLIB_WINDOWSIZE 15
#define MOD_GZIP_ZLIB_CFACTOR 9
#define MOD_GZIP_ZLIB_BSIZE 8096


- Ray

You can use the zlibcomplete library in C++ (linked above) to easily compress or decompress std::string without worrying about dynamic allocation yourself.

Comment by Benjamin at 2016-03-30 01:43 UTC

Thanks for this sample, it was super useful.

I used it for doing gzip compression too. My only change was the way I initialized zlib with deflateInit2; besides that, I could use your code as is.


// deflateInit2 configures the file format: request gzip instead of deflate
const int windowBits = 15;
const int GZIP_ENCODING = 16;

deflateInit2(&zs, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
             windowBits | GZIP_ENCODING, 8,
             Z_DEFAULT_STRATEGY);
