Weblog Articles Tagged with '#c++'

Photo of a Samsung NVMe SSD

NVMe "Disk" Bandwidth and Latency for Batched Block Requests

Posted on 2019-03-22 16:00 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #stxxl #thrill

Last week I had the pleasure of being invited to the Dagstuhl seminar 19111 on Theoretical Models of Storage Systems. I gave a talk on the history of STXXL and Thrill, but also wanted to include some current developments. What I found most interesting is how the gap between RAM and disk bandwidth is closing due to the (relatively) new Non-Volatile Memory Express (NVMe) storage devices.

Since I am involved in many projects using external memory, I decided to perform a simple set of fundamental experiments to compare rotational disks and newer solid-state devices (SSDs). The results were interesting enough to write this blog article about.

Among the tools of STXXL/FOXXLL there are two benchmarks which perform two distinct access patterns: Scan (benchmark_disks) and Random (benchmark_disks_random).

  • In Scan, a batch of k sequential blocks of size B is read or written in order.
    Batched Scanning Pattern
  • In Random, a batch of k randomly selected blocks of size B from a span of size N is read or written.
    Batched Random Access Pattern

The Scan experiment is probably the fastest access method, as it reads or writes the disk (actually: storage device) sequentially. The Random experiment is good for determining the access latency of the disk, as it first has to seek to the block and then transfer the data. Notice that the Random experiment performs batched block accesses, as one would in a query/answering system where the next set of random blocks depends on calculations performed with the preceding blocks (like in a B-Tree). This is a different experiment from the one performed by most "throughput" measurement tools, which issue a continuous stream of random block accesses.
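The two access patterns can be sketched in a few lines of C++. This only illustrates which block offsets each batch touches; it is not the code of benchmark_disks or benchmark_disks_random. The function names and the choice of std::mt19937_64 are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Byte offsets of one Scan batch: k consecutive blocks of size B,
// starting at block index first_block.
std::vector<std::uint64_t> scan_batch(std::uint64_t first_block,
                                      std::uint64_t k, std::uint64_t B) {
    std::vector<std::uint64_t> offsets;
    offsets.reserve(k);
    for (std::uint64_t i = 0; i < k; ++i)
        offsets.push_back((first_block + i) * B);
    return offsets;
}

// Byte offsets of one Random batch: k block-aligned offsets drawn
// uniformly from a span of N bytes.
std::vector<std::uint64_t> random_batch(std::uint64_t k, std::uint64_t B,
                                        std::uint64_t N,
                                        std::mt19937_64& rng) {
    assert(B > 0 && N >= B);  // the span must hold at least one block
    std::uniform_int_distribution<std::uint64_t> pick(0, N / B - 1);
    std::vector<std::uint64_t> offsets;
    offsets.reserve(k);
    for (std::uint64_t i = 0; i < k; ++i)
        offsets.push_back(pick(rng) * B);
    return offsets;
}
```

In the batched setting, all k requests of one batch would be issued and awaited before the next batch is generated, which is what separates this pattern from a continuous request stream.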

This article continues on the next page ...

First slide of the talk

Tutorial on Boost.Spirit at C++ User Group Karlsruhe

Posted on 2018-09-12 19:30 by Timo Bingmann at Permlink with 0 Comments. Tags: #talk #c++ #parsing

On September 12th, 2018, I gave another 90min talk with live-coding examples in German at the C++ User Group Karlsruhe in rooms of the Karlsruhe Institute of Technology (KIT).

This time I was asked to present a more advanced topic around C++ and libraries and I chose to present a tutorial on Boost.Spirit.

Boost.Spirit is a parser and generator template meta-programming framework and maybe one of the craziest and most advanced uses of C++. It enables one to write context-free grammars inline as C++ code, which are translated into recursive descent parsers and fully optimized by the compiler.

This powerful framework is, however, not easy to get started with. I hope my tutorial helps more people to get past the steep learning curve and use Boost.Spirit for securely parsing user input and other structured data.

The tutorial consisted of a set of introduction slides (slides-2018-09-12-Cpp-Meetup.pdf), followed by a live-coding session in German, which was recorded by the KIT (see below for the YouTube video).

Download slides-2018-09-12-Cpp-Meetup.pdf

The extensive code examples presented in the live coding session are available on this webpage or on github: https://github.com/bingmann/2018-cpp-spirit-parsing.

The examples can be seen as instructive templates and copy & paste sources for new development. The examples are:

  1. Learn to walk and parse simple integers and lists.
    Parse 5, [5, 42, 69, 256].
  2. Create a parser for a simple arithmetic grammar (and part two).
    Parse 5 + 6 * 9 + 42 and evaluate correctly.
  3. Parse CSV data directly into a C++ struct.
    Parse AAPL;Apple;252.50; into a struct.
  4. Create an abstract syntax tree (AST) from arithmetic (and part two).
    Parse y = 6 * 9 + 42 * x and evaluate with variables.
  5. Ogle some more crazy examples, e.g. how to parse:
    <h1>Example for <b>C++ HTML Parser<b></h1>
    This HTML <b>snippet</b> parser can also interpret
    *Markdown* style and enables additional tags
    to <% invoke("C++", 42) %> functions.
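To give an idea of what example 2 computes, here is a hand-written recursive descent evaluator for the same kind of expression. It deliberately does not use Boost.Spirit: the class name and the grammar in the comment are assumptions made for this sketch, and they only mirror the kind of parser that Spirit would generate from an inline grammar.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Grammar assumed for this sketch:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := integer | '(' expr ')'
class ArithmeticParser {
public:
    explicit ArithmeticParser(const std::string& input) : in_(input) {}

    // Parse the whole input and return its value.
    long parse() {
        long value = expr();
        skip_spaces();
        if (pos_ != in_.size())
            throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    void skip_spaces() {
        while (pos_ < in_.size() &&
               std::isspace(static_cast<unsigned char>(in_[pos_])))
            ++pos_;
    }
    // Consume character c if it is next (after whitespace).
    bool accept(char c) {
        skip_spaces();
        if (pos_ < in_.size() && in_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    long expr() {
        long value = term();
        while (accept('+')) value += term();
        return value;
    }
    long term() {
        long value = factor();
        while (accept('*')) value *= factor();
        return value;
    }
    long factor() {
        if (accept('(')) {
            long value = expr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        if (pos_ >= in_.size() ||
            !std::isdigit(static_cast<unsigned char>(in_[pos_])))
            throw std::runtime_error("expected integer");
        long value = 0;
        while (pos_ < in_.size() &&
               std::isdigit(static_cast<unsigned char>(in_[pos_])))
            value = value * 10 + (in_[pos_++] - '0');
        return value;
    }

    std::string in_;
    std::size_t pos_ = 0;
};
```

Because term() binds tighter than expr(), ArithmeticParser("5 + 6 * 9 + 42").parse() yields 101.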

Furthermore, a recording of the live-coding in German is available on YouTube:
https://www.youtube.com/watch?v=gYAheppw73U


TLX Logo

Note about the new tlx library of Advanced C++ Data Structures and Algorithms

Posted on 2018-05-28 18:20 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++

Last year on February 19th, I started a new github repository called tlx with the goal of de-duplicating code from three projects: Thrill, STXXL, and a private project. The idea came up after an STXXL code workshop in Frankfurt (such workshops are fashionably called hackathons nowadays).

Link to library: http://github.com/tlx/tlx and Doxygen Documentation

The first main common pieces of code were:

  1. the fast loser tree implementations from MCSTL by Johannes Singler necessary for efficient multiway merging,
  2. my die() macros for testing and run-time assertions,
  3. a common intrusive reference counter called counting_ptr, and
  4. simple but vital std::string manipulation functions missing from the STL.

The initial reason for tlx to come about was to consolidate all the bug fixes to the loser tree implementations that I had scattered across the three projects. Efficient multiway merging is such a fundamental task and there was no universally available C++ library that implements the tournament tree well.
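For readers unfamiliar with the task, multiway merging can be sketched as follows. This is not the loser tree from tlx: a plain binary min-heap stands in for the tournament tree here, and the function and type names are hypothetical. Both structures deliver the smallest head element among k sorted inputs in O(log k) per output item; the loser tree needs fewer comparisons per step.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Current head element of one sorted run.
struct Head {
    int value;
    std::size_t run;  // which run the value came from
    std::size_t pos;  // position inside that run
};

// Orders the priority queue so that the smallest value is on top.
struct HeadGreater {
    bool operator()(const Head& a, const Head& b) const {
        return a.value > b.value;
    }
};

// Merge k sorted runs into one sorted output sequence.
std::vector<int> multiway_merge(const std::vector<std::vector<int>>& runs) {
    std::priority_queue<Head, std::vector<Head>, HeadGreater> heap;

    std::size_t total = 0;
    for (std::size_t r = 0; r < runs.size(); ++r) {
        total += runs[r].size();
        if (!runs[r].empty()) heap.push(Head{runs[r][0], r, 0});
    }

    std::vector<int> out;
    out.reserve(total);
    while (!heap.empty()) {
        Head h = heap.top();
        heap.pop();
        out.push_back(h.value);
        // Refill the heap from the run that just delivered an item.
        std::size_t next = h.pos + 1;
        if (next < runs[h.run].size())
            heap.push(Head{runs[h.run][next], h.run, next});
    }
    return out;
}
```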

A long search for an appropriate vacant user account with three letters on github led to "tlx". This is definitely a good C++ namespace name, but to this day, it is unclear what the letters stand for. Template Libraries for CXX? The missing Library for CXX? Template Library and more eXtensions? Take your pick; someday someone will find a good official expansion.

Since its inception, tlx has grown a lot. Its goal is to consolidate algorithms and data structures from multiple projects. In a sense tlx maybe aims to be the Boost for advanced algorithms. The goals and constraints of tlx are:

  • To have a library of well implemented and tested advanced algorithms and things missing from the C++ STL.
  • Target high modularity with as few dependencies between modules as possible.
  • Zero external dependencies: no additional libraries are required.
  • Only have compile time configuration (no platform dependent checks).
  • Compile on all platforms with C++ -- smartphones, supercomputers, Windows, maybe even embedded microcontrollers.
  • Attempt to never break existing interfaces.
  • Warning-free and bug-free on all compilers.
  • Keep overhead down -- small overall size such that it can be included without bloating applications.
  • Collect code only under the Boost license, which is one of the most liberal licenses and can be used in any project.

Currently, tlx contains

  • The fast tournament (loser) trees from MCSTL by Johannes Singler, with many fixes.
  • A fast intrusive reference counter called CountingPtr, which has considerably less overhead than std::shared_ptr.
  • Efficient and fast multiway merging algorithms from Johannes Singler, which were previously included with gcc. The tlx version has many fixes and is available for clang and MSVC++.
  • Many string manipulation algorithms for std::string.
  • An improved version of my stx-btree implementation, which is basically always a better alternative to std::map (but not std::unordered_map).
  • A copy of siphash for string hashing.
  • Efficient sequential string sorting implementations such as radix sort and multikey quicksort (described at length in my PhD thesis).

And much more, which one can find on the front page of the Doxygen Documentation.
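The idea behind an intrusive reference counter can be sketched as below. This is not tlx's CountingPtr: the class names RefCounted, IntrusivePtr and Node are hypothetical, and the sketch is not thread-safe. The point is that the counter lives inside the object, so the smart pointer is a single raw pointer and no separate control block is allocated, unlike with std::shared_ptr.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Base class that stores the reference count inside the object itself.
class RefCounted {
public:
    void inc_ref() const noexcept { ++count_; }
    // Returns true when the last reference was dropped.
    bool dec_ref() const noexcept { return --count_ == 0; }
    std::size_t ref_count() const noexcept { return count_; }

private:
    mutable std::size_t count_ = 0;
};

// Smart pointer for types derived from RefCounted.
template <typename T>
class IntrusivePtr {
public:
    IntrusivePtr() noexcept = default;
    explicit IntrusivePtr(T* p) noexcept : ptr_(p) {
        if (ptr_) ptr_->inc_ref();
    }
    IntrusivePtr(const IntrusivePtr& o) noexcept : ptr_(o.ptr_) {
        if (ptr_) ptr_->inc_ref();
    }
    IntrusivePtr(IntrusivePtr&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
    // Copy-and-swap handles both copy and move assignment.
    IntrusivePtr& operator=(IntrusivePtr o) noexcept {
        std::swap(ptr_, o.ptr_);
        return *this;
    }
    ~IntrusivePtr() {
        if (ptr_ && ptr_->dec_ref()) delete ptr_;
    }

    T* get() const noexcept { return ptr_; }
    T* operator->() const noexcept { assert(ptr_); return ptr_; }
    T& operator*() const noexcept { assert(ptr_); return *ptr_; }

private:
    T* ptr_ = nullptr;
};

// Example payload type used only for illustration.
struct Node : RefCounted {
    int value = 0;
};
```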


First slide of the talk

Presentation "C++ Goodies" at C++ User Group Karlsruhe

Posted on 2017-03-08 00:00 by Timo Bingmann at Permlink with 0 Comments. Tags: #talk #c++

On March 8th, 2017, I gave a 90min talk in German at the C++ User Group Karlsruhe which consisted mostly of live-coding examples of how to use new C++11/14/17 features in a practical setting. The contents were:

  • Rvalue References and Move Semantics (with Excursions into Lambdas and std::function)
  • Virtual Final Override
  • Variadic Template Parameter (Un-)Packing
  • Random Bits of Thrill

The source code I wrote for the presentation and the slides are available in a github repository https://github.com/bingmann/2017-cpp-goodies.

Direct link to the slides PDF.

Furthermore, a recording of the talk in German is available on YouTube: https://www.youtube.com/watch?v=EvSZHXmXR1M


A figure from the technical report

Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++

Posted on 2016-08-20 09:54 by Timo Bingmann at Permlink with 0 Comments. Tags: #research #c++ #thrill

Our technical report on "Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++" is now available on arXiv as 1608.05634 or locally: 1608.05634v1.pdf with source 1608.05634v1.tar.gz (780 KiB).

This report is the first technical documentation about our new distributed computing prototype called Thrill. Thrill is written in modern C++14, and open source under the BSD-2 license. More information on Thrill is available from the project homepage.

Thrill's source is available from Github.

Download 1608.05634v1.pdf

Abstract

We present the design and a first performance evaluation of Thrill -- a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping).

We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.


First slide of the talk

Presentation "Massive Suffix Array Construction with Thrill" at DFG SPP 1736 Annual Colloquium

Posted on 2015-10-01 19:40 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #talk #thrill

Today, we gave an overview presentation of the vision behind Project Thrill, its current state, and how it will be used to implement suffix and LCP array construction, and many other distributed algorithms: 2015-10-01 Massive Suffix Array Construction with Thrill.pdf.

Download 2015-10-01 Massive Suffix Array Construction with Thrill.pdf

First slide of the talk showing priority queues at the airport

Presentation of Parallel Priority Queue at the Conference SEA'2015

Posted on 2015-06-30 17:11 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #talk #stxxl

We are very glad to have been given the opportunity to present our work on bulk-parallel priority queues for external memory at the 14th International Symposium on Experimental Algorithms (SEA 2015) in Paris. Our paper is in the proceedings and also available here: paper-SEA15-Bulk-Parallel-Priority-Queue.pdf

The talk was given by Thomas Keh, and the slides of the presentation are available online: 2015-06-29 A Bulk-Parallel Priority Queue in External Memory with STXXL.pdf. The implementation is available in the current master branch of STXXL at github.

Download 2015-06-29 A Bulk-Parallel Priority Queue in External Memory with STXXL.pdf

Abstract

We propose the design and an implementation of a bulk-parallel external memory priority queue to take advantage of both shared-memory parallelism and high external memory transfer speeds to parallel disks. To achieve higher performance by decoupling item insertions and extractions, we offer two parallelization interfaces: one using "bulk" sequences, the other by defining "limit" items. In the design, we discuss how to parallelize insertions using multiple heaps, and how to calculate a dynamic prediction sequence to prefetch blocks and apply parallel multiway merge for extraction. Our experimental results show that in the selected benchmarks the priority queue reaches 64% of the full parallel I/O bandwidth of SSDs and 49% of rotational disks, or the speed of sorting in external memory when bounded by computation.


First slide of the talk showing sparks forming a C++

Presentation of DALKIT (work in progress) in Berlin

Posted on 2015-03-27 21:00 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #talk #thrill

Today, I presented our work in progress on a distributed computation platform for Big Data algorithms at the LSDMA All-Hands-Meeting in Berlin. One of the currently proposed names is DALKIT. The talk covers the current state our student project is in, which consists mainly of the design of the framework's interface, architecture and future components.

The slides of the presentation, 2015-03-27 Project DALKIT.pdf, are available online. However, as usual, my slides are very difficult to understand without the audio track. For future "final" version presentations there will probably be more videos.

Download 2015-03-27 Project DALKIT.pdf

STXXL simple logo

Released STXXL 1.4.1

Posted on 2014-10-29 11:17 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #university #stxxl

STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

More history about STXXL can be found in the blog post on 1.4.0. Today, the second release of the new 1.4 branch was published:

What's new in 1.4.1?

  • Integrated support for kernel based asynchronous I/O on Linux (new file type "linuxaio"), which exploits Native Command Queuing (NCQ) if available.
  • Merged stxxl::unordered_map branch, which provides a hash map backed by external memory.
  • Replaced struct default_completion_handler with a NULL pointer, thus avoiding superfluous new/delete work for each I/O request.
  • Added stxxl::external_shared_ptr which is a proxy class to allow use of shared_ptr classes inside stxxl containers.
  • Fixed bugs and warnings on 32-bit systems (yes, they still exist).
  • Used atomic_counted_object in class file for request reference counting.
  • Added support for MinGW-w64 (64-bit) systems with working SJLJ thread implementations.

See the CHANGELOG for further minor changes.


First slide of the talk showing the inside of a hard disk

Recording of a Talk "STXXL 1.4.0 and Beyond"

Posted on 2014-06-22 20:00 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #stxxl #talk #frontpage

This is a recording of a talk that I gave last week at the 3rd LSDMA Topical Meeting in Berlin. The talk covers a basic introduction into the STXXL library's features and contains many short code examples that serve as a tutorial.

The slides of the presentation, 2014-06-22 STXXL 1.4.0 and Beyond.pdf, are available online, and the recorded talk can be seen in the YouTube video below.

Download 2014-06-22 STXXL 1.4.0 and Beyond.pdf

And here is the video of the recording: https://www.youtube.com/watch?v=UswxcAOJKBE

This article continues on the next page ...

Ternary search tree used in parallel super scalar string sample sort and LCP-aware tournament tree

Released parallel-string-sorting 0.6 including Parallel Super Scalar String Sample Sort and Parallel Multiway LCP-Mergesort

Posted on 2014-03-09 10:20 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #parallel-string-sorting #sorting

This short post announces the second public version of our parallel string sorting project. It is a test framework and algorithm collection containing most sequential and parallel string sorting implementations.

The collection includes parallel super scalar string sample sort (pS5), which we developed and showed to have the highest parallel speedups on modern single-socket multi-core shared memory systems. Additionally, the collection now contains parallel multiway LCP-mergesort, which can be used to speed up string sorting on NUMA multi-socket machines.

See the parallel-string-sorting project page for our technical report and more information about version 0.6.


STXXL simple logo

Released STXXL 1.4.0

Posted on 2013-12-12 16:50 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #university #stxxl

STXXL is an implementation of the C++ standard template library STL for external memory (out-of-core) computations, i.e., STXXL implements containers and algorithms that can process huge volumes of data that only fit on disks. While the compatibility to the STL supports ease of use and compatibility with existing applications, another design priority is high performance.

The project was originally created by Roman Dementiev and Peter Sanders at MPI Informatik in Saarbrücken. It moved to Karlsruhe with them in 2004. After Roman's PhD defense, there was a cooperation with the Algorithm Engineering group at the University of Frankfurt to create better parallel asynchronous sorting. Afterwards, stewardship moved to Frankfurt, where work on flash/SSD drives and various external memory graph algorithms was done.

After a longer stretch without further work, I have decided to take part in future development as a maintainer. This is partly due to my previous experience with it while implementing eSAIS, the external memory suffix and LCP array construction algorithm. And thus, today, the first release of the new 1.4 branch was published:

What's new in 1.4.0?

  • reorganized source hierarchy into include/ lib/ tests/ examples/ doc/ tools/
  • CMake build system for cross-platform compilation
  • greatly improved documentation with tutorials and examples
  • efficient external matrix operations
  • new containers stxxl::sequence and stxxl::sorter
  • improved .stxxl disk configuration files and additional options
  • combined stxxl_tool of disk benchmarks
  • simple examples and skew3 as real-world stream application
  • support for Visual Studio 2012 and 2013 without Boost
  • important bug fixes in stxxl::queue and stxxl::priority_queue

Example of intuition behind the inducing process

Presented Short Paper about eSAIS at MASSIVE'13 Workshop

Posted on 2013-09-05 18:00 by Timo Bingmann at Permlink with 0 Comments. Tags: #research #stringology #c++ #talk

Today, we presented a shorter version of our work on "Inducing Suffix and LCP Arrays in External Memory" at the MASSIVE Workshop 2013, held adjacently with ESA at ALGO 2013 in Sophia Antipolis, France.

The slides of our presentation, 2013-09-05 eSAIS @ MASSIVE'13.pdf, and the corresponding short paper, massive13esais.pdf, are available online via this webpage.

Download 2013-09-05 eSAIS @ MASSIVE'13.pdf
Download massive13esais.pdf

Please refer to the first eSAIS posting for details and source code.

Our thanks go to all the organizers for making such an inspiring workshop possible.


The Sound of Sorting demo program

Published "The Sound of Sorting" 0.6

Posted on 2013-05-22 23:50 by Timo Bingmann at Permlink with 2 Comments. Tags: #c++ #university #fun #sorting #sound of sorting

This post announces the publication of my demo program for integer string sorting algorithms, called "The Sound of Sorting". It both visualizes the internals of sorting algorithms, and generates sound effects from the values being compared!

The demo is implemented using the cross-platform toolkits wxWidgets and SDL, can be executed on Windows, Linux and Mac, and runs in real time.

There are also many videos of the sorting algorithms on my new YouTube channel.

See the Sound of Sorting project page for the demo program and source code, and more information about version 0.6.


Ternary search tree used in parallel super scalar string sample sort

Released parallel-string-sorting 0.5 including Parallel Super Scalar String Sample Sort

Posted on 2013-05-08 11:47 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #parallel-string-sorting #sorting

This short post announces the first public version of our parallel string sorting project. It is a test framework and algorithm collection containing many sequential and parallel string sorting implementations.

The collection includes parallel super scalar string sample sort (pS5), which we developed and showed to have the highest parallel speedups on modern multi-core shared memory systems.

See the parallel-string-sorting project page for our technical report and more information about version 0.5.


Thumbnail of a pie chart filling to 100%

Released disk-filltest 0.7 - Simple Tool to Detect Bad Disks by Filling with Random Data

Posted on 2013-03-27 21:32 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #utilities

This post announces the first version of disk-filltest, a very simple tool to test for bad blocks on a disk by filling it with random data. The function of disk-filltest is simple:

  • Write files random-######## to the current directory until the disk is full.
  • Read the files again and verify the pseudo-random sequence written.
  • Any write or read error will be reported, either by the operating system or by checking the pseudo-random sequence.
  • Optionally, delete the random files after a successful run.
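The write-then-verify idea can be sketched in memory as follows. This is not the code of disk-filltest: the function names are hypothetical, the buffer stands in for a file on disk, and std::mt19937_64 is an assumed generator. The key property is that the same seed reproduces the same sequence, so nothing but the seed has to be remembered for verification.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Fill a block with the pseudo-random sequence determined by seed.
void fill_block(std::vector<std::uint64_t>& block, std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    for (auto& word : block) word = rng();
}

// Regenerate the sequence and compare it with the stored block.
// Returns the index of the first mismatching word, or block.size()
// if the block is intact.
std::size_t verify_block(const std::vector<std::uint64_t>& block,
                         std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    for (std::size_t i = 0; i < block.size(); ++i)
        if (block[i] != rng()) return i;
    return block.size();
}
```

A single flipped bit anywhere in the block changes one word and is therefore reported at that word's index.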

See the disk-filltest project page for more information about version 0.7.


Memory profile plot as generated by example in malloc_count tarball

Released malloc_count 0.7 - Tools for Runtime Memory Usage Analysis and Profiling

Posted on 2013-03-16 22:17 by Timo Bingmann at Permlink with 1 Comments. Tags: #c++ #coding tricks

This post announces the first version of malloc_count, a very useful tool that I have been fine-tuning in the past months. The code library provides facilities to

  • measure the current and peak heap memory allocation, and
  • write a memory profile for plotting.
  • Furthermore, a separate stack_count function can measure stack usage.

The tool works by intercepting the standard malloc(), free(), etc. functions. Thus no changes are necessary to the inspected source code.
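As a rough C++-only analogue of this idea, and explicitly not malloc_count's own mechanism, one can replace the global operator new and operator delete and keep a running total. Everything in the heap_count namespace below is hypothetical, the counters are not thread-safe, and over-aligned allocations are not tracked.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

namespace heap_count {  // hypothetical names for this sketch
std::size_t current = 0;  // bytes currently allocated via operator new
std::size_t peak = 0;     // highest value of current seen so far
// Space reserved in front of each block to remember its size while
// keeping the returned pointer suitably aligned.
constexpr std::size_t header = alignof(std::max_align_t);
static_assert(header >= sizeof(std::size_t), "header too small");
}  // namespace heap_count

// Replacement for the global allocation function.
void* operator new(std::size_t size) {
    void* raw = std::malloc(size + heap_count::header);
    if (raw == nullptr) throw std::bad_alloc();
    *static_cast<std::size_t*>(raw) = size;  // remember the request size
    heap_count::current += size;
    if (heap_count::current > heap_count::peak)
        heap_count::peak = heap_count::current;
    return static_cast<char*>(raw) + heap_count::header;
}

// Replacement for the global deallocation function.
void operator delete(void* ptr) noexcept {
    if (ptr == nullptr) return;
    char* raw = static_cast<char*>(ptr) - heap_count::header;
    heap_count::current -= *reinterpret_cast<std::size_t*>(raw);
    std::free(raw);
}

// Sized variant, forwarded to the unsized one.
void operator delete(void* ptr, std::size_t) noexcept {
    operator delete(ptr);
}
```

As with the interception approach described above, the inspected code itself does not need to change; it only has to be linked together with the replacement functions.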

See the malloc_count project page for more information about version 0.7.


Instacode coloring of assembler code

Coding Tricks 101: How to Save the Assembler Code Generated by GCC

Posted on 2013-01-24 18:07 by Timo Bingmann at Permlink with 2 Comments. Tags: #c++ #coding tricks #frontpage

This is the first issue of a series of blog posts about some Linux coding tricks I have collected in the last few years.

Folklore says that compilers are among the most complex computer programs written today. They incorporate many optimization algorithms, inline functions and fold constant expressions; all without changing output, correctness or side effects of the code. If you think about it, the work gcc, llvm and other compilers do is really amazing and mostly works just great.

Sometimes, however, you want to know exactly what a compiler does with your C/C++ code. Most straightforward questions can be answered using a debugger. However, if you want to verify whether the compiler really applies the optimizations that your intuition expects, then a debugger is usually not useful, because optimized programs can look very different from the original. Some example questions are:

  • Is a local integer variable stored in a register and how long does it exist?
  • Does the compiler use special instructions for a simple copy loop?
  • Are special conditional instructions used for an if or switch statement?
  • Is a specific function inlined or called each time?

These questions can be answered definitively by investigating the compiler's output. On the Net, there are multiple "online compilers," which can visualize the assembler output of popular compilers for small pieces of code: see the "GCC Explorer" or "C/C++ to Assembly v2". However, for inspecting parts of a larger project, these tools are unusable, because the interesting pieces are embedded in much larger source files.

Luckily, gcc does not output binary machine code directly. Instead, it internally writes assembler code, which is then translated by as into binary machine code (actually, gcc creates more intermediate structures). This internal assembler code can be written to a file, with some annotation to make it easier to read.
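As a small test subject for the "simple copy loop" question, the following function can be compiled with standard GCC options that keep the assembler code. The file name copy_loop.cpp is hypothetical, and the options named in the comment are one common way to get the output; the full article may recommend a different set.

```cpp
// copy_loop.cpp (hypothetical file name)
//
// Write annotated assembler code to copy_loop.s instead of an object file:
//   g++ -O2 -S -fverbose-asm copy_loop.cpp -o copy_loop.s
// Or compile normally but keep the intermediate files, including the .s:
//   g++ -O2 -save-temps -c copy_loop.cpp
#include <cstddef>

// A simple copy loop. In the generated assembler one can check whether
// the compiler emitted a scalar loop, vector instructions, or a call
// to memcpy/memmove.
void copy_ints(const int* src, int* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}
```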

This article continues on the next page ...

Example of the Inducing Process

eSAIS - Inducing Suffix and LCP Arrays in External Memory

Posted on 2012-11-19 15:49 by Timo Bingmann at Permlink with 2 Comments. Tags: #research #stringology #stxxl #c++

This web page accompanies our conference paper "Inducing Suffix and LCP Arrays in External Memory", which we presented at the Workshop on Algorithm Engineering and Experiments (ALENEX 2013). A PDF of the publication is available from this site as alenex13esais.pdf or from the online proceedings. The paper was joint work with my colleagues Johannes Fischer and Vitaly Osipov.

Download alenex13esais.pdf

The slides of my presentation of the paper on January 7th, 2013 in New Orleans, LA, USA are available: alenex13esais-slides.pdf. They contain little text and an example of the eSAIS algorithm with a simplified PQ.

Download alenex13esais-slides.pdf

We have also submitted a full version of the eSAIS paper to a journal. Due to long publication cycles, we make a pre-print of the journal article available here: esais-preprint.pdf. The full paper contains more details on the inducing algorithm for the LCP array and additional experimental details.

Download esais-preprint.pdf

Our implementations of eSAIS, the eSAIS-LCP variants, DC3 and DC3-LCP algorithms as described in the paper are available below under the GNU General Public License v3 (GPL).

eSAIS and DC3 with LCP version 0.5.4 (current), updated 2013-12-13

Source code archive (includes STXXL 1.4.0): eSAIS-DC3-LCP-0.5.4.tar.bz2 (1.37 MiB) Browse online

Git repositories:
  Suffix and LCP construction algorithms:
    git clone https://github.com/bingmann/eSAIS
    cd eSAIS; git submodule init; git submodule update
  STXXL 1.4.0:
    git clone https://github.com/stxxl/stxxl

For more information about compiling and testing the implementation, please refer to the README included in the source.

This article continues on the next page ...

The QuadClip Algorithm

Finding Roots of Polynomials by Clipping - Report and Implementation from my Lab Course in Numerical Mathematics

Posted on 2012-03-20 22:29 by Timo Bingmann at Permlink with 0 Comments. Tags: #maths #university #c++

This semester I had the pleasure to take part in a lab exercise course supervised by Prof. Thomas Linß at the FernUniversity of Hagen. The objective was to comprehend, implement and evaluate a particular recent advancement in the field of numerical mathematics. My topic was finding the roots of a polynomial by clipping in Bézier representation using two new methods, one devised by Michael Bartoň and Bert Jüttler [1], the other extended from the first by Ligang Liu, Lei Zhang, Binbin Lin and Guojin Wang [2].

My implementation of this topic was done for the lab course in C++ and contains many in themselves interesting sub-algorithms, which are combined into the clipping algorithms for finding roots. These sub-algorithms may prove useful for other purposes, which is the main reason for publishing this website. Among these are:

  • Polynomial classes for monomial and Bézier representations: PolynomialStandard and PolynomialBezier.
  • Algorithms to convert from monomial to Bézier representation and vice versa: PolynomialStandard::toBezier() and PolynomialBezier::toStandard().
  • Evaluation algorithms for both representations: Horner's Schema and the Algorithm of de Casteljau.
  • Another version of de Casteljau's Algorithm to split a polynomial in Bézier representation into two parts.
  • Jarvis' March aka gift wrapping (run time O(hn)) to calculate the convex hull of the Bézier polygon: PolynomialBezier::getConvexHull().
  • Cardano's formulas to find all real roots of any cubic polynomial: PolynomialStandard::findRoots().
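The two evaluation algorithms from the list can be sketched as free functions. These are not the member functions of PolynomialStandard or PolynomialBezier from the package; the names and the coefficient layout (lowest degree first, Bézier coefficients over the interval [0,1]) are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Horner's schema: evaluate a[0] + a[1]*x + ... + a[n]*x^n with
// n multiplications and n additions.
double horner(const std::vector<double>& a, double x) {
    double result = 0.0;
    for (std::size_t i = a.size(); i-- > 0;)
        result = result * x + a[i];
    return result;
}

// Algorithm of de Casteljau: evaluate the polynomial with Bézier
// coefficients b[0..n] at parameter t in [0,1] by repeated linear
// interpolation of neighbouring coefficients.
double de_casteljau(std::vector<double> b, double t) {
    assert(!b.empty());
    for (std::size_t level = b.size() - 1; level > 0; --level)
        for (std::size_t i = 0; i < level; ++i)
            b[i] = (1.0 - t) * b[i] + t * b[i + 1];
    return b[0];
}
```

For example, 1 + 2x + 3x^2 has the Bézier coefficients (1, 2, 6) on [0,1], so horner({1, 2, 3}, 0.5) and de_casteljau({1, 2, 6}, 0.5) both evaluate to 2.75.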

For the lab course I wrote two documents, both in German: one is an abstract, Kurzfassung.pdf (1 page), which is translated into English below, and the other a short report, Ausarbeitung.pdf (6 pages). The report contains a short description of the algorithms together with execution and convergence speed measurements, which verify the original authors' experiments. For presenting the lab work I created these slides, Slides.pdf, which however are not self-explanatory due to my minimum-text presentation style.

This article continues on the next page ...

Small drawing of a B+ tree

Update Release of STX B+ Tree 0.8.6

Posted on 2011-05-18 12:44 by Timo Bingmann at Permlink with 1 Comments. Tags: #c++ #stx-btree

After four years I have decided to release an updated version 0.8.6 of the STX B+ Tree C++ Template Classes package. The updated release contains all patches that have accumulated in my inbox over the years. So yes, please send me patches for this project, it is not in vain! Below are the highlights of the changes in this release:

  • Implemented a missing function: erase(iterator iter) by recursively searching for the referenced leaf node inside the subtree containing equal keys.
  • Applied a patch which adds support for STL allocators as template parameters.
  • Corrected limits of a for loop when shifting pairs from left to right leaf nodes during deletion.

I also reran the speed test done back in 2007 on my new hardware and compared the results with the old data. Due to the larger L2 cache sizes in my new Intel Core i7, the B-tree speed-up first starts to show at about 100,000 integer items, rather than 16,000 items with my older Pentium 4. This might also have something to do with the new CPU using 64-bit pointers and thus requiring larger nodes for child references. Read the complete speed test here.

result plot from new speedtest

The updated source code package is available for download from this webpage.


Yet Another Release of digup 0.6.40 - A Digest Updating Tool

Posted on 2011-01-31 20:25 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #utilities

This is yet another release entry of digup. This time, however, it is a major release with lots of new improvements and some old fixes:

  • Added a new flag --modify-window=number to allow larger deltas in the modification time of files. This is a fixup for checking backups copied onto FAT filesystems, because these store the time stamp with a precision of only 2 seconds. Since this is a common operation, the shortcut -w/--windows will set the allowed modification window to 1.
  • Also added a new flag --exclude-marker=file, which makes the recursive file system walk skip all directories (and their sub-directories) containing a special marker file usually called .nobackup in other programs. This option is persistent and stored in the created digest file, so it must only be specified once and then in all subsequent runs the same directory marker is used to selectively skip over special directories.
  • Added a --restrict=pattern temporary exclusion flag. If specified, all paths containing the pattern as a substring are temporarily skipped. This allows one to quickly check a specific folder or file structure for modifications. Especially useful in combination with -c to perform a full digest check on only a subset of the files.
  • One more smaller fixup: using 64-bit file offsets on Windows, which allows files larger than 2GB to work correctly.
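The modification window described in the first item can be pictured as a tolerance on the time stamp comparison. The following is only a minimal sketch: the helper name mtime_unchanged() is hypothetical and not taken from digup, and the "differ by at most window seconds" semantics is an assumption modeled on rsync's option of the same name.

```cpp
#include <ctime>

// Hypothetical helper (not digup's actual code): treats two modification
// times as equal if they differ by at most 'window' seconds.
// Assumption: the window is inclusive, as in rsync's --modify-window.
bool mtime_unchanged(std::time_t recorded, std::time_t current, long window)
{
    std::time_t diff = (recorded > current) ? (recorded - current)
                                            : (current - recorded);
    return diff <= window;
}
```

With a window of 1, a file whose time stamp was rounded by one second on a FAT filesystem would still count as unmodified, while a difference of two seconds would not.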

For more information and the new version see the digup web page.


Bugfix Release: digup 0.6.30 - A Digest Updating Tool

Posted on 2010-10-03 16:12 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #utilities

Fixed another severe bug in the digup tool: on the amd64 architecture the tool crashed when writing the digest file. Thanks go to Daniel D. for reporting and fixing this bug.

The bug was caused by the variable argument list va_list being used twice in the fprintfcrc() function. Apparently, on the amd64 platform va_start() and va_end() must be called again before the list is used a second time, even when the list is only passed to vsprintf().
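The general rule behind this kind of bug can be sketched as follows: once a va_list has been handed to one of the v*printf() functions its state is indeterminate, so a second pass needs a fresh list, obtained either by another va_start()/va_end() pair or by va_copy(). The function format_twice() below is a hypothetical illustration, not digup's fprintfcrc().

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper (not digup's fprintfcrc()): formats the arguments
// twice, first to measure the length and then to produce the text.
std::string format_twice(const char* fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);

    // Copy the list before its first use. After vsnprintf() returns, the
    // state of 'ap' is indeterminate and it must not be used again.
    va_list ap2;
    va_copy(ap2, ap);

    int len = std::vsnprintf(nullptr, 0, fmt, ap);    // first pass: length only
    va_end(ap);

    std::string out;
    if (len > 0) {
        out.resize(static_cast<std::size_t>(len) + 1);   // room for '\0'
        std::vsnprintf(&out[0], out.size(), fmt, ap2);   // second pass
        out.resize(static_cast<std::size_t>(len));       // drop '\0'
    }
    va_end(ap2);
    return out;
}
```

Reusing the same list for both passes often appears to work on 32-bit x86, where a va_list is a plain pointer passed by value, which is probably why such bugs only surface on amd64.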

For more information and the new version see the digup web page.


Bugfix Release: digup 0.6.27 - A Digest Updating Tool

Posted on 2010-08-20 23:05 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #utilities

Fixed two bugs in the digup tool: added large file support when compiling the program and fixed a string allocation bug.

This new version enables large file support by using long long variables for size. Furthermore, a string allocation bug was fixed which occurred when using -t and -f command line parameters.

For more information and the new version see the digup web page.


Bugfix Release: stx-execpipe 0.7.1 - STX Execution Pipe C++ Library

Posted on 2010-07-30 17:13 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++

Fixed a small bug in the stx-execpipe library: added large file support when compiling the library.

This fix switches on the large file support functions. Without this fix a pipeline reading or writing files >2GB will not function properly. The fix is to #define _FILE_OFFSET_BITS 64 when compiling the library's code.
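As a rough illustration of what the define changes, the sketch below checks at compile time that off_t is 64 bits wide and uses the off_t-based fseeko()/ftello() calls. The helper file_size() is hypothetical and not part of stx-execpipe; the code assumes a POSIX system with glibc-style feature macros.

```cpp
// Must be defined before any system header is included.
#define _FILE_OFFSET_BITS 64

#include <sys/types.h>
#include <cstdio>

// With large file support, off_t is 64 bits wide even on 32-bit systems.
static_assert(sizeof(off_t) == 8, "large file support is not enabled");

// Hypothetical helper: returns the size of a file in bytes, or -1 on error.
// fseeko() and ftello() work with off_t instead of long.
off_t file_size(const char* path)
{
    FILE* f = std::fopen(path, "rb");
    if (!f) return -1;

    off_t size = -1;
    if (fseeko(f, 0, SEEK_END) == 0)
        size = ftello(f);

    std::fclose(f);
    return size;
}
```

In practice the define is usually passed on the compiler command line (-D_FILE_OFFSET_BITS=64) so that it cannot be placed after an include by accident.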

For more information and the source code see the stx-execpipe web page.


Execution pipe with exec() bubbles

Published stx-execpipe 0.7.0 - STX Execution Pipe C++ Library

Posted on 2010-07-18 23:31 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++

The STX C++ library series has been extended today with a new installment: the STX Execution Pipe library, in short STX ExecPipe. It is the solution to an issue that I encountered in writing a small backup application. This backup tool collects some files to back up and then calls tar and xz followed by ncftpput.

However, I could not find any useful C++ library that allows convenient chaining of external programs as used in everyday shell piping. This pipeline functionality is based on the system calls fork(), exec() and pipe(), which are not easy to use correctly. After writing some ad-hoc functions to call one or two external programs, I decided to tackle this basic problem once and for all. The result is the stx-execpipe library.

Using the library a C++ program can build a sequence of external programs with command line parameters. These programs are connected by the library just like a shell pipe: stdout of the preceding stage goes into stdin of the next one. The input and output of the whole pipeline can be redirected to a plain fd, a file or saved in a std::string.
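To give an idea of the plumbing involved, here is a minimal sketch of a two-stage pipe built directly on pipe(), fork() and execvp(), with the final output captured in a std::string. It is not the stx-execpipe implementation or API; the function name run_pipe2() is hypothetical and error handling is reduced to a minimum.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of stx-execpipe): runs "producer | consumer"
// and returns the consumer's stdout. Both argv arrays must end with NULL.
std::string run_pipe2(char* const producer[], char* const consumer[])
{
    int link[2];     // producer stdout -> consumer stdin
    int result[2];   // consumer stdout -> parent process
    if (pipe(link) != 0) return std::string();
    if (pipe(result) != 0) {
        close(link[0]); close(link[1]);
        return std::string();
    }

    pid_t p1 = fork();
    if (p1 == 0) {                       // first child: the producer
        dup2(link[1], STDOUT_FILENO);
        close(link[0]); close(link[1]);
        close(result[0]); close(result[1]);
        execvp(producer[0], producer);
        _exit(127);                      // only reached if exec failed
    }

    pid_t p2 = fork();
    if (p2 == 0) {                       // second child: the consumer
        dup2(link[0], STDIN_FILENO);
        dup2(result[1], STDOUT_FILENO);
        close(link[0]); close(link[1]);
        close(result[0]); close(result[1]);
        execvp(consumer[0], consumer);
        _exit(127);
    }

    // The parent must close every write end it holds, otherwise the
    // readers never see end-of-file.
    close(link[0]); close(link[1]); close(result[1]);

    std::string out;
    char buf[4096];
    ssize_t n;
    while ((n = read(result[0], buf, sizeof(buf))) > 0)
        out.append(buf, static_cast<std::size_t>(n));
    close(result[0]);

    int status;
    if (p1 > 0) waitpid(p1, &status, 0);
    if (p2 > 0) waitpid(p2, &status, 0);
    return out;
}
```

Even this small case needs four descriptors to be closed in three processes, which illustrates why longer pipelines are easy to get wrong by hand.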

One very interesting feature of the library is the ability to insert intermediate processing functions into a pipeline. The data can be intercepted and passed back to the parent process for manipulation or just inspection. This was necessary to calculate the SHA1 digest of a backup tarball while uploading it.

For more information and the source code see the stx-execpipe web page.

The following small code snippet exemplifies the flexibility of the stx-execpipe solution:

stx::ExecPipe ep;

// first stage calls tar
std::vector<std::string> tarargs;
tarargs.push_back("tar");
tarargs.push_back("--create");
tarargs.push_back("--verbose");
tarargs.push_back("--no-recursion");
tarargs.push_back("/path/to/some/files");
tarargs.push_back("/path/to/more/files");
ep.add_execp(&tarargs);

// second stage compresses the tarball
ep.add_execp("xz", "-9");

// third stage intercepts data for a SHA1 digest
Sha1Function sha1tar;
ep.add_function(&sha1tar);

// fourth stage sends the tarball via FTP
std::vector<std::string> ftpargs;
ftpargs.push_back("ncftpput");
ftpargs.push_back("-c");
ftpargs.push_back("ftp.upload-to-host.net");
ftpargs.push_back("/path/to/ftpfile.tar.gz");
ep.add_execp(&ftpargs);

if (ep.run().all_return_codes_zero()) {
    std::cout << "Backup upload complete." << std::endl;
}
else {
    // error processing...
}

Drawing of cbtreedb tree index structure

Published stx-cbtreedb 0.7.0 - STX Constant B-Tree Database Template Classes

Posted on 2010-04-14 13:34 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++

Published yet another C++ template library using a B-tree. This time the solution is a disk-based read-only key-value mapping using a "packed, sequential" B-tree as index structure.

All applications mapping a large number of constant, integral keys to string or data blobs can benefit from this library. The database structure is highly compact and contains self-verification checksums over both key and value areas.

stx-cbtreedb is a direct contender with cdb and tinycdb, which however are based on hash tables and do not retain key proximity. Compared to other full-fledged B-tree implementations like BerkeleyDB or TokyoCabinet, the stx-cbtreedb is very small, faster and the database files have much less overhead due to read-only optimizations.

For more information and the source code see the stx-cbtreedb web page.


Digup shovel and digest matching

Published digup 0.6.23 - A Digest Updating Tool

Posted on 2009-11-10 22:30 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #utilities

Published a small, but very useful console tool to update md5sum or shasum digest files. It will read existing md5sum.txt files and add new files to it without rereading the whole directory tree.

This makes digup very useful to update and verify incremental archives like chronological data storages or music collections, which are nowadays commonly stored and backed up on hard disks. Using a full file digest scan even slowly creeping bad blocks on old hard disks can be detected. By using a crontab entry, this check can be performed unattended and routinely.

For more information, the source code and binaries for various platforms see the digup web page.


Funny Drawing with 'C++' 'FLEX' and a Bison

Published Flex Bison C++ Example 0.1.4

Posted on 2009-09-05 10:40 by Timo Bingmann at Permlink with 0 Comments. Tags: #flex-bison-cpp-example #parsing #c++ #code-example

Released a minor updated source code version for Flex Bison C++ Example. The example source code is released into the public domain or, at your option, under the Do What The Fuck You Want To Public License (WTFPL).

This minor bugfix release fixes up two simple compilation issues with the newest bison version 2.4.1.

For more information and the download package see the Flex Bison C++ Example web page.


[150x150] CryptoTE Icon

Published CryptoTE 0.5.390

Posted on 2009-08-08 11:25 by Timo Bingmann at Permlink with 1 Comments. Tags: #cryptote #c++ #cryptography

After almost one year of personally testing the program, I decided to publicly release the first version of CryptoTE v0.5.390. One year in the making, CryptoTE is a very useful little text-editor with integrated cryptography. The name stands for CRYPTOgraphy Text Editor and it transparently encrypts text files, storing them in secure containers. The program incorporates the popular editing component Scintilla and makes heavy use of wxWidgets.

Screenshot of CryptoTE on Linux

For more information, the source code and binaries for various platforms see the CryptoTE web page.


Funny Drawing with 'C++' 'FLEX' and a Bison

Published Flex Bison C++ Example 0.1.3

Posted on 2008-10-23 11:25 by Timo Bingmann at Permlink with 0 Comments. Tags: #flex-bison-cpp-example #c++ #code-example #parsing

Released yet another updated source code package for Flex Bison C++ Example. The example source code is released into the public domain or, at your option, under the Do What The Fuck You Want To Public License (WTFPL).

This bugfix release solves a subtle, severe bug, which rendered the template code useless. Even the example exprtest program segfaulted with any expression.

Corrected a very subtle bug with the newly introduced virtual yywrap() function in the FlexLexer class. Depending on how the header was included, the class contained the virtual yywrap() function or not. These differing class declarations led to very strange NULL pointer exceptions, because the different compiled objects assumed different class memory layouts. Ultimately the exprtest program always segfaulted.

For more information and the download package see the Flex Bison C++ Example web page.


Small drawing of a B+ tree

Update Release of STX B+ Tree 0.8.3

Posted on 2008-09-07 18:31 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #stx-btree

Released another updated version 0.8.3 of the STX B+ Tree C++ Template Classes package. The updated release fixes up issues with the root node == NULL when the tree is initially empty.

Fixed crash when running verify() on an empty btree object. Now the root node is freed when the last item is removed. Also fixed crash when attempting to copy an empty btree or when trying to remove a non-existing item from an empty btree.

Also enhanced the speedtest to test the hash table container implementation from __gnu_cxx. Extended the tests by another set of runs measuring only the find/lookup functions. See the speed results web page for more information.

The updated source code package is available for download from this webpage.

Some compiled binaries of wxBTreeDemo for Win32 and Linux are available on the demo download page.

As before, the updated main B+ tree implementation can be found in doxygen stx/btree.h or with plain text comments stx/btree.h.


Instacode coloring of stacktrace

C++ Code Snippet - Print Stack Backtrace Programmatically with Demangled Function Names

Posted on 2008-09-01 22:30 by Timo Bingmann at Permlink with 35 Comments. Tags: #c++ #code-snippet #coding tricks #frontpage

Yesterday I was tasked with analyzing an inner function of a reasonably complex software package. The inner function was called thousands of times from many different parts of the program; a simple counter print-out showed that. However, I was interested in which execution paths reach this inner function and how often the different parts access the function.

My straight-forward idea was to dump a stack backtrace each time the inner function is called, similar to the one printed by a debugger. However I needed some code snippet to dump the stack backtrace programmatically, without using gdb to halt the program each time.

Stack backtraces can be saved with backtrace(3), resolved into symbolic names using backtrace_symbols(3) and printed using backtrace_symbols_fd(3). These functions are well documented and fairly easy to use.

However, I was debugging a C++ program which made heavy use of templates and classes. C++ symbol names (including namespace, class and parameters) are mangled by the compiler into plain text symbols: e.g. the function N::A<int>::B::func(int) becomes the symbol _ZN1N1AIiE1B4funcEi. This makes the standard backtrace output very unreadable for C++ programs.

To demangle these strings the GNU libstdc++ library (integrated into the GNU Compiler Collection) provides a function called __cxa_demangle(). Combined with backtrace(3) a pretty stack backtrace can be outputted. The demangling function only works for programs compiled with g++.
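The demangling step on its own can be sketched in a few lines. The wrapper name demangle() is hypothetical; the call to abi::__cxa_demangle() from <cxxabi.h> is the function mentioned above, and the sketch assumes a compiler that ships this header (g++ or a compatible one).

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical wrapper: returns the demangled form of a mangled symbol
// name, or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled)
{
    int status = 0;
    // With a NULL output buffer, __cxa_demangle() allocates the result
    // with malloc(), so the caller has to free() it.
    char* name = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || name == nullptr)
        return mangled;

    std::string result(name);
    std::free(name);
    return result;
}
```

Called with the symbol from the example above, demangle("_ZN1N1AIiE1B4funcEi") yields N::A<int>::B::func(int).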

The following header file contains a function print_stacktrace(), which uses backtrace(3), backtrace_symbols(3) and __cxa_demangle() to print a readable C++ stack backtrace.

This article continues on the next page ...

Small drawing of a B+ tree

Update Release of STX B+ Tree 0.8.2

Posted on 2008-08-13 16:48 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #stx-btree

Released an updated version 0.8.2 of the STX B+ Tree C++ Template Classes package. The updated release fixes up all issues with iterators and one harmless bad-memory access.

The reverse_iterator classes of the B+ tree were completely reworked. Now they are real implementations and do not use STL magic. Both reverse_iterator and const_reverse_iterator should work as expected now. Added two large test cases for iterators. Enabled public default-constructors on iterators.

Also fixed a memory access bug which happens in erase_one_descend(): it reads leaf->slotkey[leaf->slotuse - 1] even if leaf->slotuse == 0. This doesn't have any other bad effect, because the case only occurs when leaf == root and then the resulting btree_update_lastkey message is never really processed. However, it still is a bad-memory access.

The updated source code package including the wxBTreeDemo source is available for download from this webpage.

Some compiled binaries of wxBTreeDemo for Win32 and Linux are available on the demo download page.

As before, the updated main B+ tree implementation can be found in doxygen stx/btree.h or with plain text comments stx/btree.h.


Funny Drawing with 'C++' 'FLEX' and a Bison

Published Flex Bison C++ Example 0.1.2

Posted on 2008-08-03 14:26 by Timo Bingmann at Permlink with 0 Comments. Tags: #flex-bison-cpp-example #c++ #code-example #parsing

Released an updated source code package for Flex Bison C++ Example. The example source code is released into the public domain or, at your option, under the Do What The Fuck You Want To Public License (WTFPL).

This bugfix release solves two problems that were reported to me via e-mail:

The first problem was a set of compilation errors that occurred when no %union directive is used in the grammar: in this case the order of the included headers is changed around by bison, which breaks compilation. This was fixed by never including parser.h directly, but always using scanner.h.

The second issue was raised because new versions of flex were released after years of stagnation. The new flex version 2.5.35 adds a virtual function yywrap() to the yyFlexLexer class. This function is automatically defined in any lexer source file generated by flex. However, because I copied FlexLexer.h from an older flex distribution, the function definition throws a "no yywrap() member function" compiler error. Updating FlexLexer.h with a conditional declaration of yywrap() hopefully did the trick and now works on all versions. Usually this file should be taken from /usr/include and not from the package. However, that will break compilation if flex is not installed, and a self-sufficient compilation package was a primary goal of the example.

For more information and the download package see the Flex Bison C++ Example web page.


Small drawing of a B+ tree

Bugfix Release of STX B+ Tree 0.8.1

Posted on 2008-01-25 15:48 by Timo Bingmann at Permlink with 1 Comments. Tags: #c++ #stx-btree

Released a bugfix version 0.8.1 of the STX B+ Tree C++ Template Classes package. The bug fixed is a possibly illegal memory access during find() function.

I received a new test case via email in which valgrind detected an uninitialized memory access. By tracing it, I soon found that it happens during any find(key) call with a key that is larger than any item contained in the tree. During the find() function find_lower() is called on a leaf node and returns the slot number of the smallest key that is greater than or equal to the queried key. However, if the queried key is larger than all keys in a leaf node or in the whole tree, find_lower() returns a slot number past the last valid key slot. Comparing the key in this invalid slot with the queried key then yields an uninitialized memory error in valgrind.
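The situation can be reproduced with a strongly simplified leaf node. The sketch below is not the stx::btree code: the Leaf struct, its fixed slot count and the contains() function are hypothetical and only serve to show why the slot index has to be checked before the key in that slot is read.

```cpp
#include <algorithm>

// Simplified, hypothetical leaf node: only the first 'slotuse' entries of
// 'slotkey' hold valid keys. This is not the actual stx::btree node layout.
struct Leaf {
    static const int slotmax = 8;
    int slotkey[slotmax];
    int slotuse;
};

// Returns the first slot whose key is >= key, or leaf.slotuse if the
// queried key is larger than all keys in the leaf.
int find_lower(const Leaf& leaf, int key)
{
    return static_cast<int>(
        std::lower_bound(leaf.slotkey, leaf.slotkey + leaf.slotuse, key)
        - leaf.slotkey);
}

// Checks the slot index first, so the key comparison never touches a slot
// past the last valid one.
bool contains(const Leaf& leaf, int key)
{
    int slot = find_lower(leaf, key);
    return slot < leaf.slotuse && leaf.slotkey[slot] == key;
}
```

Without the slot < leaf.slotuse test, a lookup of a key larger than all stored keys would compare against leaf.slotkey[leaf.slotuse], which is exactly the kind of read valgrind complains about.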

The updated source code package including the wxBTreeDemo source is available for download from this webpage.

Some compiled binaries of wxBTreeDemo for Win32 and Linux are available on the demo download page.

As before, the updated main B+ tree implementation can be found in doxygen stx/btree.h or with plain text comments stx/btree.h.


Funny Drawing with 'C++' 'FLEX' and a Bison

Published Flex Bison C++ Example 0.1

Posted on 2007-08-20 11:53 by Timo Bingmann at Permlink with 2 Comments. Tags: #flex-bison-cpp-example #c++ #code-example #parsing

Released example source code package Flex Bison C++ Example. The example source code is released into the public domain or, at your option, under the Do What The Fuck You Want To Public License (WTFPL).

This example shows how to use both Flex and Bison in C++ mode. This way both lexer and parser code and data are encapsulated into classes. Thus the lexer and parser are fully re-entrant, because all state variables are contained in the class objects. Furthermore, multiple different lexer-parser pairs can easily be linked into one binary, because they have different class names and/or are located in a different namespace.

Why Use These Old Tools? Well, they are here to stay and they work well. But most important, the code generated by Flex and Bison requires no compile-time dependencies, because they generate fully autonomous source code. So far I have not found any modern parser generator which outputs independent code. It is even possible to compile the generated source on Windows with Visual C++ 2005.

For more information and the download package see the Flex Bison C++ Example web page.


Small drawing of a parse tree

Published STX Expression Parser Framework Version 0.7

Posted on 2007-07-17 17:10 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #parsing

Released the first version 0.7 of the STX Expression Parser C++ Framework package. The library is licensed under the GNU Lesser General Public License (LGPL) (2.1 or later).

The STX Expression Parser provides a C++ framework, which can process user-specified expression strings containing program-specific variables. It can be integrated into applications to allow user-customized data selection and filtering. The expression strings are intuitive SQL-like WHERE-clauses and can contain arbitrarily complex arithmetic. At the same time the expression processing time is guaranteed to be fast enough to safely iterate over larger data sets.

The expression parser can process arbitrarily complex arithmetic expressions like those seen below. To access application-defined data, functions and variables may be included in the expression. An expression can be used as a boolean filter by using comparison and logic operators.

For more information see the STX Expression Parser web page.

Most impressive are the interactive online CGI parser demo and the online CSV file filter.


C++ Code Snippet - In-Place and String-Copy Uppercase/Lowercase Conversion of STL Strings

Posted on 2007-06-02 13:22 by Timo Bingmann at Permlink with 2 Comments. Tags: #c++ #code-snippet

This post completes the small C++ function collection of simple STL string manipulations. The following code snippet shows simple locale-unaware uppercase and lowercase conversion functions using tolower and toupper. Nothing revolutionary; I'm just misusing this weblog as a code-paste dump for reusable code.

Sometimes it is better to have a case-insensitive string class. More about ci_string can be found at Guru of the Week (GotW) #29: Case-Insensitive Strings.

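#include <string>
#include <cctype>
#include <algorithm>    // for std::transform

// functionals for std::transform with correct signature
static inline char string_toupper_functional(char c)
{
    // cast to unsigned char: std::toupper is undefined for negative values
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

static inline char string_tolower_functional(char c)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}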

static inline void string_upper_inplace(std::string &str)
{
    std::transform(str.begin(), str.end(), str.begin(), string_toupper_functional);
}

static inline void string_lower_inplace(std::string &str)
{
    std::transform(str.begin(), str.end(), str.begin(), string_tolower_functional);
}

static inline std::string string_upper(const std::string &str)
{
    std::string strcopy(str.size(), 0);
    std::transform(str.begin(), str.end(), strcopy.begin(), string_toupper_functional);
    return strcopy;
}

static inline std::string string_lower(const std::string &str)
{
    std::string strcopy(str.size(), 0);
    std::transform(str.begin(), str.end(), strcopy.begin(), string_tolower_functional);
    return strcopy;
}
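The case-insensitive string class mentioned above can be sketched with a custom character traits class, along the lines of the technique discussed in GotW #29. The version below is a minimal, locale-unaware sketch and not a copy of the GotW code; the names ci_char_traits and up() are chosen for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Character traits that compare characters without regard to case
// (locale-unaware, like the conversion functions above).
struct ci_char_traits : public std::char_traits<char>
{
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }

    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }

    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return +1;
        }
        return 0;
    }

    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// A string type whose comparisons and searches ignore case.
typedef std::basic_string<char, ci_char_traits> ci_string;
```

With this typedef, ci_string("Hello") == ci_string("hELLO") holds, while the stored characters keep their original case. Note that ci_string is a distinct type from std::string, so conversions between the two have to go through c_str() or data().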
This article continues on the next page ...

C++ Code Snippet - In-Place and String-Copy Space Trimming of STL Strings

Posted on 2007-05-30 17:28 by Timo Bingmann at Permlink with 1 Comments. Tags: #c++ #code-snippet

Yesterday I once again stumbled upon whitespace trimming of STL strings: a check was required whether the given user input is empty, where "empty" also covers a user-given string containing only spaces. After one hour of unproductive searching for something as simple as a space trimming function, I decided to put the resulting code here for future reference.

The following code snippet contains two versions of the function: in-place trimming and string-copy trimming. I prefer the copy-trimming functions because they allow a more functional programming style. The functions only trim spaces, but can be modified by replacing each ' ' with something like " \n\r\t".

#include <string>

static inline void string_trim_left_inplace(std::string &str)
{
    str.erase(0, str.find_first_not_of(' '));
}

static inline void string_trim_right_inplace(std::string &str)
{
    str.erase(str.find_last_not_of(' ') + 1, std::string::npos);
}

static inline std::string string_trim_left(const std::string &str)
{
    std::string::size_type pos = str.find_first_not_of(' ');
    if (pos == std::string::npos) return std::string();

    return str.substr(pos, std::string::npos);
}

static inline std::string string_trim_right(const std::string &str)
{
    std::string::size_type pos = str.find_last_not_of(' ');
    if (pos == std::string::npos) return std::string();

    return str.substr(0, pos + 1);
}

static inline std::string string_trim(const std::string& str)
{
    std::string::size_type pos1 = str.find_first_not_of(' ');
    if (pos1 == std::string::npos) return std::string();

    std::string::size_type pos2 = str.find_last_not_of(' ');
    if (pos2 == std::string::npos) return std::string();

    // both positions are valid here, the npos cases returned above
    return str.substr(pos1, pos2 - pos1 + 1);
}

static inline void string_trim_inplace(std::string& str)
{
    std::string::size_type pos = str.find_last_not_of(' ');
    if(pos != std::string::npos) {
        str.erase(pos + 1);
        pos = str.find_first_not_of(' ');
        if(pos != std::string::npos) str.erase(0, pos);
    }
    else
        str.erase(str.begin(), str.end());
}
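The modification suggested in the text, trimming a whole set of whitespace characters instead of only spaces, could look like the following sketch. The function name string_trim_ws() and its second parameter are additions for illustration and not part of the original snippet.

```cpp
#include <string>

// Variant of string_trim() with a configurable set of characters to strip.
// The default set is the one suggested in the text: space, newline,
// carriage return and tab.
static inline std::string string_trim_ws(const std::string& str,
                                         const char* drop = " \n\r\t")
{
    std::string::size_type pos1 = str.find_first_not_of(drop);
    if (pos1 == std::string::npos) return std::string();

    // pos2 is valid here, because at least one kept character exists
    std::string::size_type pos2 = str.find_last_not_of(drop);
    return str.substr(pos1, pos2 - pos1 + 1);
}
```

The overloads of find_first_not_of() and find_last_not_of() that take a C string treat it as a set of characters, so no other change is needed.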
This article continues on the next page ...

Screenshot of the wxBTreeDemo v0.8

Updated STX B+ Tree to 0.8 which now includes wxBTreeDemo

Posted on 2007-05-13 19:48 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #stx-btree

Released an updated version 0.8 of the STX B+ Tree C++ Template Classes package. The update fixes a few segmentation faults with empty trees without root node.

This new release includes the demonstration program wxBTreeDemo. This program draws illustrations of the B+ trees constructed by the STX B+ Tree template classes. It allows the user to select different types of B+ tree instantiations: integer or string keys and different slot numbers. The user may insert and erase key/data pairs from the tree and run different search operations. The demo program uses the cross-platform wxWidgets toolkit and can be compiled on Linux, Windows and MacOSX.

The source code package including the wxBTreeDemo source is available for download from this webpage.

Some compiled binaries of wxBTreeDemo for Win32 and Linux are available on the demo download page.

As before, the only slightly changed main B+ tree implementation can be found in doxygen stx/btree.h or with plain text comments stx/btree.h.


lcov: A Good HTML Generator for gcov Results

Posted on 2007-05-08 11:08 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #linux

Writing test cases is a good way to prevent and detect problems or bugs in source code. They improve understanding of the difficult parts by requiring deeper thought about how to test those areas. By rerunning the same test sequences one can ensure that the code still produces the same results even after making significant changes. cppunit provides a C++ test framework which is somewhat bloated. However, reduced to a set of reusable template files, the framework gets quite handy.

To measure how much of the code is tested, gcov provides a way to determine which lines are executed during a test suite run. Note that this simple line-is-touched metric is only one aspect of how well a piece of code is tested. However gcov's results are printed in text mode and it cannot merge the results from multiple coverage files, so multi-file test suites cannot be measured as a whole.

Yesterday I finally found a good open-source Linux tool to get correct coverage results: lcov. It was designed to measure coverage in the Linux kernel, but works very well on user-space programs as well. lcov builds on gcov's data files and generates HTML report files. It even highlights the untested source code lines.

I uploaded the test coverage results of the STX B+ Tree test suite. It shows 89.2% coverage of the main, most difficult header file implementing the insert and erase algorithms.


Small drawing of a B+ tree

Published STX B+ Tree C++ Template Classes Version 0.7

Posted on 2007-04-27 15:02 by Timo Bingmann at Permlink with 0 Comments. Tags: #stx-btree #c++

Released the first version 0.7 of the STX B+ Tree C++ Template Classes package. The template classes are licensed under the LGPL.

The STX B+ Tree package is a set of C++ template classes implementing a B+ tree key/data container in main memory. The classes are designed as drop-in replacements of the STL containers set, map, multiset and multimap and follow their interfaces very closely. By packing multiple value pairs into each node of the tree the B+ tree reduces heap fragmentation and utilizes cache-line effects better than the standard red-black binary tree. The tree algorithms are based on the implementation in Cormen, Leiserson and Rivest's Introduction to Algorithms, Jan Jannink's paper and other algorithm resources. The classes contain extensive assertion and verification mechanisms to ensure the implementation's correctness by testing the tree invariants.

The main B+ tree implementation can be found in doxygen stx/btree.h or with plain text comments stx/btree.h.

The source code package is available for download from this webpage.

The classes are documented extensively using doxygen. The generated HTML documentation can be browsed online or downloaded.

Special interest was put into performing a speed comparison test between the standard red-black tree and the new B+ tree implementation. The speed test results are interesting and show the B+ tree to be significantly faster for trees containing more than 16,000 items.


C++ Code Snippet - Compressing STL Strings with zlib

Posted on 2007-03-28 18:23 by Timo Bingmann at Permlink with 11 Comments. Tags: #c++ #code-snippet #frontpage

The zlib library can be found on virtually every computer. It is THE general-purpose lossless patent-free compression library.

This small C++ code snippet features a pair of functions which use this ubiquitous library to compress ordinary STL strings. There are many uses for this code snippet, like compressing string data stored in a database or binary data transferred over a network. Keep in mind that the compressed string data is binary, so the string's c_str() representation must be avoided.

To compile the following small program use "g++ testzlib.cc -o testzlib -lz" where testzlib.cc is the code.

// Copyright 2007 Timo Bingmann <tb@panthema.net>
// Distributed under the Boost Software License, Version 1.0.
// (See http://www.boost.org/LICENSE_1_0.txt)

#include <string>
#include <cstring>      // for memset() and strcmp()
#include <stdexcept>
#include <iostream>
#include <iomanip>
#include <sstream>

#include <zlib.h>

/** Compress a STL string using zlib with given compression level and return
  * the binary data. */
std::string compress_string(const std::string& str,
                            int compressionlevel = Z_BEST_COMPRESSION)
{
    z_stream zs;                        // z_stream is zlib's control structure
    memset(&zs, 0, sizeof(zs));

    if (deflateInit(&zs, compressionlevel) != Z_OK)
        throw(std::runtime_error("deflateInit failed while compressing."));

    zs.next_in = (Bytef*)str.data();
    zs.avail_in = str.size();           // set the z_stream's input

    int ret;
    char outbuffer[32768];
    std::string outstring;

    // retrieve the compressed bytes blockwise
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);

        ret = deflate(&zs, Z_FINISH);

        if (outstring.size() < zs.total_out) {
            // append the block to the output string
            outstring.append(outbuffer,
                             zs.total_out - outstring.size());
        }
    } while (ret == Z_OK);

    deflateEnd(&zs);

    if (ret != Z_STREAM_END) {          // an error occurred that was not EOF
        std::ostringstream oss;
        oss << "Exception during zlib compression: (" << ret << ") " << zs.msg;
        throw(std::runtime_error(oss.str()));
    }

    return outstring;
}

/** Decompress an STL string using zlib and return the original data. */
std::string decompress_string(const std::string& str)
{
    z_stream zs;                        // z_stream is zlib's control structure
    memset(&zs, 0, sizeof(zs));

    if (inflateInit(&zs) != Z_OK)
        throw(std::runtime_error("inflateInit failed while decompressing."));

    zs.next_in = (Bytef*)str.data();
    zs.avail_in = str.size();

    int ret;
    char outbuffer[32768];
    std::string outstring;

    // get the decompressed bytes blockwise using repeated calls to inflate
    do {
        zs.next_out = reinterpret_cast<Bytef*>(outbuffer);
        zs.avail_out = sizeof(outbuffer);

        ret = inflate(&zs, 0);

        if (outstring.size() < zs.total_out) {
            outstring.append(outbuffer,
                             zs.total_out - outstring.size());
        }

    } while (ret == Z_OK);

    inflateEnd(&zs);

    if (ret != Z_STREAM_END) {          // an error occurred that was not EOF
        std::ostringstream oss;
        oss << "Exception during zlib decompression: (" << ret << ") "
            << zs.msg;
        throw(std::runtime_error(oss.str()));
    }

    return outstring;
}

/** Small dumb tool (de)compressing cin to cout. It holds all input in memory,
  * so don't use it for huge files. */
int main(int argc, char* argv[])
{
    std::string allinput;

    while (std::cin.good())     // read all input from cin
    {
        char inbuffer[32768];
        std::cin.read(inbuffer, sizeof(inbuffer));
        allinput.append(inbuffer, std::cin.gcount());
    }

    if (argc >= 2 && strcmp(argv[1], "-d") == 0)
    {
        std::string cstr = decompress_string( allinput );

        std::cerr << "Inflated data: "
                  << allinput.size() << " -> " << cstr.size()
                  << " (" << std::setprecision(1) << std::fixed
                  << ( ((float)cstr.size() / (float)allinput.size() - 1.0) * 100.0 )
                  << "% increase).\n";

        std::cout << cstr;
    }
    else
    {
        std::string cstr = compress_string( allinput );

        std::cerr << "Deflated data: "
                  << allinput.size() << " -> " << cstr.size()
                  << " (" << std::setprecision(1) << std::fixed
                  << ( (1.0 - (float)cstr.size() / (float)allinput.size()) * 100.0)
                  << "% saved).\n";

        std::cout << cstr;
    }
}

C++ Code Snippet - Using the Boost.Regex Library

Posted on 2007-03-14 14:43 by Timo Bingmann at Permlink with 0 Comments. Tags: #c++ #code-snippet

The Boost library is a collection of very useful C++ (template) libraries. However, its documentation is very complex, and using the library naively usually results in g++ scrolling through endless pages of template instantiation errors.

This code snippet shows by example how to use the Boost.Regex library. It compiles and executes regular expressions on strings. Some tests I ran showed that it is not as fast as pcre; however, Boost.Regex is easier and more elegant to use in C++ programs. The program must be linked with -lboost_regex.

#include <iostream>
#include <stdlib.h>
#include <boost/regex.hpp>
#include <boost/lexical_cast.hpp>

int main()
{
    // This regex is compiled at start-up and matches YYYY-MM-DD dates. If it
    // contains a syntax error, the program aborts at start-up with an
    // exception.
    static const boost::regex
        date_regex("(199[0-9]|200[0-9])-([1-9]|0[1-9]|1[012])-([1-9]|0[1-9]|[12][0-9]|3[01])");

    // First example: char* c-style input strings use boost::cmatch results.
    {
        const char *input_cstr = "2007-03-14";
        boost::cmatch char_matches;

        if (boost::regex_match(input_cstr, char_matches, date_regex))
        {
            // Convert the parsed number using boost's lexical_cast library
            int year = boost::lexical_cast<int>( char_matches[1] );
            // Or use the old way: get the std::string object, then its char*
            int month = atoi( char_matches[2].str().c_str() );

            std::cout << "First example:"
                      << " year " << year
                      << " month " << month
                      << " day " << char_matches[3] << "\n";
        } 
        else
        {
            std::cout << "First example should have matched the regex.\n";
        }
    }

    // Second example: STL strings use boost::smatch results.
    {
        std::string input_stlstr = "2007-03-34";
        boost::smatch str_matches;

        if (boost::regex_match(input_stlstr, str_matches, date_regex))
        {
            std::cout << "Second example shouldn't have matched the regex.\n";
        }
        else
        {
            std::cout << "Second example didn't match the regex. This was intended.\n";
        }
    }

    // Third example: Temporary regex object and no capture results needed.
    {
        if (boost::regex_match("2007", boost::regex("(199[0-9]|200[0-9])")))
        {
            std::cout << "Third example matched the temporary regex object.\n";
        }
        else
        {
            std::cout << "Third example should have matched the regex.\n";
        }
    }

    // Fourth example: regex_match matches the whole string while regex_search
    // matches substrings just like perl.
    {
        std::string input = "Today is 2007-03-14, how are you?";

        if (boost::regex_match(input, date_regex))
        {
            std::cout << "Fourth example (regex_match) shouldn't match.\n";
        }
        else
        {
            std::cout << "As expected, the fourth example (regex_match) didn't match.\n";
        }

        if (boost::regex_search(input, date_regex))
        {
            std::cout << "While the fourth example using regex_search did match.\n";
        }
        else
        {
            std::cout << "Fourth example using regex_search should have matched the regex.\n";
        }
    }
}
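
The difference between whole-string matching and substring search in the fourth example matters just as much when the captured groups are needed. The following is only a minimal sketch and not part of the original snippet: it uses the C++11 standard library's <regex> header instead of Boost.Regex so that it builds without linking an extra library, and the names Date and find_date are made up for illustration. For simplicity it assumes two-digit months and days.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type for this sketch: a parsed date plus a flag
// telling whether a date was found at all.
struct Date
{
    int year, month, day;
    bool valid;
};

// Search for the first YYYY-MM-DD date anywhere inside text.
// Assumption of this sketch: month and day are always written with two digits.
Date find_date(const std::string& text)
{
    static const std::regex
        date_regex("(199[0-9]|200[0-9])-(0[1-9]|1[012])-(0[1-9]|[12][0-9]|3[01])");

    std::smatch matches;

    // regex_search accepts a match at any position, unlike regex_match
    if (!std::regex_search(text, matches, date_regex))
        return Date{0, 0, 0, false};

    // matches[0] is the whole match, matches[1..3] are the capture groups
    return Date{std::stoi(matches[1].str()),
                std::stoi(matches[2].str()),
                std::stoi(matches[3].str()),
                true};
}
```

Calling find_date on the sentence from the fourth example fills in year, month and day, while a string without a date yields a result whose valid flag is false.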

C++ Code Snippet - Making a Custom Class ostream Outputable

Posted on 2007-03-01 14:47 by Timo Bingmann at Permlink with 1 Comments. Tags: #c++ #code-snippet

How do you get a custom class to work with std::cout << obj; ? I for one always forget the exact prototype of the required operator<<. Here is a minimal working example to copy code from:

#include <iostream>

struct myclass
{
    int a, b;

    myclass(int _a, int _b)
        : a(_a), b(_b)
    { }
};

// make myclass ostream outputtable
std::ostream& operator<< (std::ostream &stream, const myclass &obj)
{
    return stream << "(" << obj.a << "," << obj.b << ")";
}

int main()
{
    myclass obj(42, 46);

    std::cout << obj << std::endl;
}
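
If the members are private, the free operator<< needs a friend declaration. The sketch below is a variation on the example above under that assumption; the names mypoint and format_point are hypothetical. It also shows that the same operator works with any std::ostream, here a std::ostringstream.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical variant of myclass with private data members.
class mypoint
{
    int a, b;

public:
    mypoint(int _a, int _b)
        : a(_a), b(_b)
    { }

    // the friend declaration grants the free operator<< access to a and b
    friend std::ostream& operator<< (std::ostream &stream, const mypoint &obj);
};

// same prototype as before, only the class name differs
std::ostream& operator<< (std::ostream &stream, const mypoint &obj)
{
    return stream << "(" << obj.a << "," << obj.b << ")";
}

// The operator works with every std::ostream, e.g. to build a std::string.
std::string format_point(const mypoint &obj)
{
    std::ostringstream oss;
    oss << obj;
    return oss.str();
}
```

Returning the stream by reference is what allows chaining, as in std::cout << obj << std::endl.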

QtSqlView Screenshot 1

QtSqlView 0.8.0 Released

Posted on 2006-10-10 12:56 by Timo Bingmann at Permlink with 0 Comments. Tags: #qtsqlview #c++

Released the first version 0.8.0 of QtSqlView under the GPL.

QtSqlView is a simple and easy to use SQL database browser written in Qt 4.x using the excellent QtSql components. Using QtSql drivers it can natively connect to MySQL, PostgreSQL and SQLite databases. Furthermore other database systems may be accessed using their ODBC drivers. QtSqlView is released under the GNU General Public License: source code and win32 binary may be downloaded here.

This short program was initially written for a set of Windows users who need to access and edit a PostgreSQL database. All this is possible with M$ Access and ODBC, but configuring PostgreSQL's ODBC driver and the ODBC DSN is far too complicated for the average database editor. Thus problem-free access to open-source databases was the top priority for QtSqlView.

QtSqlView boasts the following features:

  • Problem-free connecting to MySQL, PostgreSQL and SQLite databases on Windows.
  • Add, delete and modify a list of database connections.
  • Browse, edit, save and revert SQL tables, system tables and views of registered connections.
  • Copy selected cells as tab-separated text to the clipboard.
  • View table schema including primary key.
  • Execute custom SQL queries on the database connection and view results.
  • SQL syntax highlighting in query editor.
  • Uses Qt4's greatly improved SQL components.

You may download the source code for Linux/OSX or a setup package for Windows.

More screenshots are available as well.


SDIOS06 Shell Screenshot

SDIOS06 - Source Code and Ready-To-Run Image

Posted on 2006-09-14 09:26 by Timo Bingmann at Permlink with 2 Comments. Tags: #sdios06 #university #c++

As promised, the source code of SDIOS06 has been released under the GPL.

SDIOS06 is a toy operating system developed during the practical course SDI (System Design and Implementation) at the Systems Architecture Group of the University of Karlsruhe. It was designed and written by Timo Bingmann, Matthias Braun, Torsten Geiger and Andreas Mähler. Two games were ported to a custom SDL implementation using the VMware "hard"-ware: SDLjump and SuperTux. For more information and screenshots see my blog entry 20060727-SDI-Demo.

The source code archive was published on the L4Ka.org page: http://www.l4ka.org/86.php

A local copy of the source archive (7.4 MB) is available as well. The README file contains a great deal of information about SDIOS06's design and modules. The complete source code can be browsed on the web.

To make a demonstration as easy as possible, a ready-to-run binary VMware image (3.8 MB) can be downloaded. The image contains SDIOS06 installed on a virtual VMware disk image. It can be run using the free VMware Player.


Mandelbrot Example

sdlfractal 0.1

Posted on 2006-08-09 21:33 by Timo Bingmann at Permlink with 0 Comments. Tags: #sdlfractal #c++

Sdlfractal is a port of a simple little fractal generator which I wrote some five years ago. It is not supposed to replace wonderful tools like xfractint so it is very simple and not very fast. The main goal of this project was to learn SDL and it turned out that fltk was also required. The main construction is an SDL surface extended by a canvas class which simulates a high-resolution coordinate system. On this coordinate system the fractals can be drawn.

To control fractal parameters the generator displays a second window using the fltk engine. From this dialog the currently displayed fractal and its parameters can be changed. This requires a dual-engine event loop in the program.

By dragging the mouse on the drawing canvas you can zoom into all fractals. If fractal drawing takes too long (and nothing is shown), then just click the canvas and the generator will stop.
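
The canvas idea and the zooming can be illustrated with a small sketch. It is not taken from the sdlfractal sources: the Viewport struct, its members and mandelbrot_iterations are hypothetical names, and the code only assumes that a pixel surface is mapped onto a rectangle of a logical coordinate system, which a mouse drag then narrows.

```cpp
#include <algorithm>

// Hypothetical viewport: maps pixel positions of a width x height surface
// onto the rectangle [x0,x1] x [y0,y1] of the logical coordinate system.
struct Viewport
{
    int width, height;          // surface size in pixels
    double x0, y0, x1, y1;      // logical rectangle currently shown

    // logical x coordinate of pixel column px
    double to_x(int px) const { return x0 + (x1 - x0) * px / width; }
    // logical y coordinate of pixel row py
    double to_y(int py) const { return y0 + (y1 - y0) * py / height; }

    // zoom into the pixel rectangle spanned by a mouse drag from a to b
    void zoom(int px_a, int py_a, int px_b, int py_b)
    {
        // order the corners so the drag direction does not matter
        int left = std::min(px_a, px_b), right = std::max(px_a, px_b);
        int top = std::min(py_a, py_b), bottom = std::max(py_a, py_b);
        // compute all four bounds before overwriting any of them
        double nx0 = to_x(left), nx1 = to_x(right);
        double ny0 = to_y(top), ny1 = to_y(bottom);
        x0 = nx0; x1 = nx1; y0 = ny0; y1 = ny1;
    }
};

// Escape-time iteration count of the Mandelbrot set at c = (cx, cy):
// iterate z = z*z + c until |z| > 2 or max_iter is reached.
int mandelbrot_iterations(double cx, double cy, int max_iter)
{
    double zx = 0.0, zy = 0.0;
    int i = 0;
    while (i < max_iter && zx * zx + zy * zy <= 4.0) {
        double t = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = t;
        ++i;
    }
    return i;
}
```

A drawing loop would call to_x and to_y for every pixel and color it by the returned iteration count; after zoom the same loop renders the selected section at full resolution.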

The fractal generator can also save high-resolution PNG files. The following images are some examples created by the fractal generator.

The source code to sdlfractal 0.1 can be downloaded in a tar.bz2 archive (121 KB). It is also browsable on the web.

Sdlfractal was designed to be portable to Win32 using SDL and fltk compiled with MinGW. A compiled version, which should run out of the box on most Windows systems, is available: download the zip archive (267 KB) containing the executable.

MD5sums of the source and binary archives:
538da60a5ef2d427fbb901d6080e631e sdlfractal-0.1.tar.bz2
5bab1ccb93e170f5c42d995a6761ca7a sdlfractal-0.1-win32.zip

Mandelbrot snow storm

This section of the Mandelbrot fractal is so beautiful that it is my current wallpaper. It can be downloaded at 800x600 (242 KB), 1024x768 (372 KB) or 1280x960 (552 KB).

This article continues on the next page ...

Study Thesis "Visualisierung sehr großer Graphen" (Visualization of Very Large Graphs) Completed

Posted on 2006-06-20 17:31 by Timo Bingmann at Permlink with 0 Comments. Tags: #compsci study thesis #graphviz #university #c++

Today I completed and submitted my study thesis "Visualisierung sehr großer Graphen" (Visualization of Very Large Graphs) at the Institute of Theoretical Informatics. The study thesis is a scientific work of 3-4 months and serves as preparation for the diploma thesis.

The study thesis can be downloaded for reading as a PDF (2.4 MB). The German and English summaries of the thesis follow this section. There is also a version for printing (PDF, 15.3 MB) with a high-resolution vector graphic in the appendix.

Zusammenfassung (translated from the German)

This study thesis investigates which methods can be used to visualize very large graphs, such as a road network of Europe, efficiently and comfortably. The result is a C++ framework for the data management of a graph with attributes and a Java applet that communicates with the data management server via CORBA. The framework can easily be integrated into existing graph applications in order to animate their algorithms.

An adjacency array is used as the basic graph structure and is extended by structures that store an arbitrary number of attribute values for each vertex and each edge. Two of the vertex attributes are used as drawing coordinates. The base graph and the storage of the attribute values are implemented as compactly as possible. Graph applications can create a list of temporary changes, which can be merged into the large global graph. To visualize how graph algorithms proceed, their function calls are encoded in a sequence of changes, which is transferred to the Java client as an animation.

To speed up section queries, the multi-dimensional index structure R-Tree is used. It enables query times that are linear in the number of returned edges and independent of the chosen section. Complex filter expressions built from comparison conditions with boolean and arithmetic operators can be used to restrict the edges displayed in a visualization section and thus to comfortably examine or highlight particular aspects of the application algorithms.

As a reference application, the framework is used by the route planning application developed at the Institute of Theoretical Informatics in Karlsruhe for visualization via a web applet.

Abstract

This study thesis investigates and implements methods used to efficiently visualize very large graphs like a street network of Europe. A C++ server framework is designed, which implements a data management library for graphs with attributes. A Java applet communicates with the data server via CORBA and draws sections of the graph. The graph data management library can easily be integrated into existing graph applications to visualize and animate calculations.

The data management library uses the adjacency array structure for representing the base graph. It is extended by similar data structures to hold an arbitrary number of attributes for each vertex and edge. The data structures are optimized towards highest storage efficiency. To modify the static global graph an application may construct a list of changes, which are efficiently merged into permanent data storage. To visualize the workings of an algorithm its function call sequence can be recorded into a list of graph changes. This change time line can be transferred to the Java client and displayed as an animation.

To accelerate access to arbitrary sections of the graph the spatial index structure R-Tree is used. It yields query times that grow linearly with the number of returned edges and are independent of the section's size or position.

Furthermore complex filter expressions including comparisons and arithmetic operators can be applied to limit the displayed edges. This enables the client to explore the graph's details comfortably and highlight interesting aspects of an algorithm.

The graph data management library is used in a route planning application, which is being developed in a research group of the University of Karlsruhe. It will be used to visualize the route using a web applet.
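
For readers unfamiliar with the adjacency array mentioned in the abstract, here is a minimal sketch of the general technique. It is not the thesis implementation: the AdjacencyArray struct and its member names are hypothetical, and attributes, change lists and the R-Tree are left out entirely.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical minimal adjacency array: all edges live in one contiguous
// array, grouped by source vertex, plus one offset per vertex.
struct AdjacencyArray
{
    // out-edges of vertex v are target[first_edge[v]] up to, but not
    // including, target[first_edge[v + 1]]
    std::vector<std::size_t> first_edge;
    // target vertex of each edge, grouped by source vertex
    std::vector<std::size_t> target;

    // Build from an edge list of (source, target) pairs by counting sort.
    AdjacencyArray(std::size_t num_vertices,
                   const std::vector<std::pair<std::size_t, std::size_t> >& edges)
        : first_edge(num_vertices + 1, 0), target(edges.size())
    {
        // count the out-degree of each vertex, shifted by one slot
        for (std::size_t i = 0; i < edges.size(); ++i)
            ++first_edge[edges[i].first + 1];

        // a prefix sum turns the counts into start offsets
        for (std::size_t v = 0; v < num_vertices; ++v)
            first_edge[v + 1] += first_edge[v];

        // place each edge at the next free slot of its source vertex
        std::vector<std::size_t> next(first_edge.begin(), first_edge.end() - 1);
        for (std::size_t i = 0; i < edges.size(); ++i)
            target[next[edges[i].first]++] = edges[i].second;
    }

    std::size_t out_degree(std::size_t v) const
    {
        return first_edge[v + 1] - first_edge[v];
    }
};
```

The structure needs only one offset per vertex and one entry per edge, which is why it suits a static graph that should be stored compactly; the price is that inserting a single edge requires shifting or rebuilding the arrays.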


Talk "Objekt-orientiertes Programmieren in C" (Object-Oriented Programming in C)

Posted on 2005-06-14 16:19 by Timo Bingmann at Permlink with 0 Comments. Tags: #university #talk #c++

In the summer semester of 2005 I took part in the practical course "Real-Life Programming" at the IPD chair of the University of Karlsruhe (UniKa). Here is the description, quoted verbatim from the homepage:

How does one program properly?

A lot of performance-critical software is still written in C. C gives the compiler a great deal of room for optimization; this practical course trains how to exploit that room and how to steer clear of the pitfalls that arise along the way.

In this context you can learn:

  • Working with the UNIX development tools
  • Writing portable code
  • Understanding the compilation process of C
  • Analyzing the assembler text generated by compilers
  • Programming so that the compiler can generate good code
  • How to increase the performance of programs
  • Handling large projects
  • Finding bugs in large and old software systems
  • Mastering debugging and profiling tools
  • Tricks of the C preprocessor
  • The "secrets" of C

In this setting two fellow students and I prepared and gave a talk on "Object-Oriented Programming in C" (without the ++). A further focus was the representation of C++ at the machine level, that is, how the C++ compiler lays out classes.

Slides: OOC-Folien.pdf 179 kB
Handout: OOC-Handout.pdf 122 kB
Example code: OOC-Beispiele.tar.gz 4 kB

Here is an excerpt from the table of contents:

  1. Object-oriented concepts
  2. OO in C
    • Encapsulation and information hiding
    • Inheritance and polymorphism
    • Genericity
  3. Representation of C++ at the machine level
    • vtable
    • Name mangling
    • Run-Time Type Information (RTTI)

These slides give a competent overview of the topic area.
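
To give a flavour of the topic, here is a minimal sketch of polymorphism with a hand-written function pointer table. It is not taken from the slides or the example archive: all names (Shape, ShapeVTable, Rect, Square and the functions) are hypothetical, and the code sticks to C-style constructs but is written so that it compiles as C++.

```cpp
// "Base class": its only member points to a table of function pointers,
// which plays the role that the vtable plays in C++.
struct Shape;

struct ShapeVTable
{
    double (*area)(const Shape* self);
};

struct Shape
{
    const ShapeVTable* vtable;
};

// "Derived classes" embed the base as their first member, so a pointer to
// the base member can be converted back to a pointer to the whole object.
struct Rect
{
    Shape base;
    double w, h;
};

struct Square
{
    Shape base;
    double side;
};

static double rect_area(const Shape* self)
{
    const Rect* r = reinterpret_cast<const Rect*>(self);
    return r->w * r->h;
}

static double square_area(const Shape* self)
{
    const Square* s = reinterpret_cast<const Square*>(self);
    return s->side * s->side;
}

// one table per "class", shared by all of its objects
static const ShapeVTable rect_vtable = { rect_area };
static const ShapeVTable square_vtable = { square_area };

// "constructors" install the matching table
Rect make_rect(double w, double h)
{
    Rect r = { { &rect_vtable }, w, h };
    return r;
}

Square make_square(double side)
{
    Square s = { { &square_vtable }, side };
    return s;
}

// "virtual call": dispatch through the table stored in the object
double shape_area(const Shape* shape)
{
    return shape->vtable->area(shape);
}
```

Code that only knows the Shape type can call shape_area(&r.base) for a Rect r or shape_area(&s.base) for a Square s and reaches the right function in both cases, which is essentially what a C++ compiler arranges for virtual member functions.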