panthema / tags / thrill
First slide of the talk

Presentation "Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++" at IEEE Big Data 2016

Posted on 2016-12-06 16:00 by Timo Bingmann at Permlink with 0 Comments. Tags: talk thrill

Today, I gave a presentation of our paper "Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++" at the IEEE International Conference on Big Data 2016 in Washington D.C., USA. An extended technical report of our paper is also available on this website or on arXiv.

The slides of the presentation at the IEEE conference are available here:
slides-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP-TalkAsGiven.pdf slides-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP-TalkAsGiven.pdf.

Below a longer version of the slides is available for download:
slides-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP.pdf slides-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP.pdf.
These slides contain additional figures which are useful to understand the DIA operations in Thrill, along with many extra design slides omitted from shorter talks.

Download slides-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP.pdf

First slide of the talk

Presentation "STXXL and Thrill (Parallel Batch Processing)" at STXXL Workshop in DFG SPP 1736

Posted on 2016-09-21 20:00 by Timo Bingmann at Permlink with 0 Comments. Tags: talk thrill stxxl

Today, I gave a technical presentation comparing STXXL and Project Thrill at the STXXL Workshop organized within the DFG SPP 1736. The main topic of the workshop was to determine the future development course of STXXL, and the biggest question in this regard was how to bring more multi-core parallelization into STXXL. Thrill or an adaptation of its ideas may be the solution to this challenge: 2016-09-21 STXXL and Thrill Slides.pdf 2016-09-21 STXXL and Thrill Slides.pdf.

Download 2016-09-21 STXXL and Thrill Slides.pdf

A figure from the technical report

Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++

Posted on 2016-08-20 09:54 by Timo Bingmann at Permlink with 0 Comments. Tags: research c++ thrill

Our technical report on "Thrill: High-Performance Algorithmic Distributed Batch Data Processing with C++" is now available on arXiv as 1608.05634 or locally: 1608.05634v1.pdf 1608.05634v1.pdf with source 1608.05634v1.tar.gz 1608.05634v1.tar.gz (780 KiB).

This report is the first technical documentation about our new distributed computing prototype called Thrill. Thrill is written in modern C++14, and open source under the BSD-2 license. More information on Thrill is available from the project homepage.

Thrill's source is available from Github.

Download 1608.05634v1.pdf

Abstract

We present the design and a first performance evaluation of Thrill -- a prototype of a general purpose big data processing framework with a convenient data-flow style programming interface. Thrill is somewhat similar to Apache Spark and Apache Flink with at least two main differences. First, Thrill is based on C++ which enables performance advantages due to direct native code compilation, a more cache-friendly memory layout, and explicit memory management. In particular, Thrill uses template meta-programming to compile chains of subsequent local operations into a single binary routine without intermediate buffering and with minimal indirections. Second, Thrill uses arrays rather than multisets as its primary data structure which enables additional operations like sorting, prefix sums, window scans, or combining corresponding fields of several arrays (zipping).

We compare Thrill with Apache Spark and Apache Flink using five kernels from the HiBench suite. Thrill is consistently faster and often several times faster than the other frameworks. At the same time, the source codes have a similar level of simplicity and abstraction.


First slide of the talk

Presentation "Massive Suffix Array Construction with Thrill" at DFG SPP 1736 Annual Colloquium

Posted on 2015-10-01 19:40 by Timo Bingmann at Permlink with 0 Comments. Tags: c++ talk thrill

Today, we gave an overview presentation of the vision behind Project Thrill, its current state, and how it will be used to implement suffix and LCP array construction, and many other distributed algorithms: 2015-10-01 Massive Suffix Array Construction with Thrill.pdf 2015-10-01 Massive Suffix Array Construction with Thrill.pdf.

Download 2015-10-01 Massive Suffix Array Construction with Thrill.pdf

First slide of the talk showing sparks forming a C++

Presentation of DALKIT (work in progress) in Berlin

Posted on 2015-03-27 21:00 by Timo Bingmann at Permlink with 0 Comments. Tags: c++ talk thrill

Today, I presented our work in progress on a distributed computation platform for Big Data algorithms at the LSDMA All-Hands-Meeting in Berlin. One of the currently proposed names is DALKIT. The talk covers the current state our student project is in, which consists mainly of the design of the framework's interface, architecture and future components.

The slides of the presentation 2015-03-27 Project DALKIT.pdf 2015-03-27 Project DALKIT.pdf are available online. However, as usual, my slides are very difficult to understand without the audio track. For future "final" version presentations there will probably be more videos.

Download 2015-03-27 Project DALKIT.pdf

RSS 2.0 Weblog Feed Atom 1.0 Weblog Feed Valid XHTML 1.1 Valid CSS (2.1)
Copyright 2005-2016 Timo Bingmann - Impressum