panthema / 2020

Started working for eBay Inc in San José, CA, USA.

Posted on 2020-11-02 10:00 by Timo Bingmann at Permlink with 0 Comments. Tags: #ebay

This is a much-delayed blog entry announcing that I have joined eBay's Search Engine team in San José, California.

eBay needs no introduction. It is one of the biggest e-commerce marketplaces in the world, connecting buyers and sellers around the globe. Depending on the metric and time frame, it is the fifth largest marketplace in the world by gross merchandise volume (GMV), behind Alibaba (Taobao and Tmall), Amazon, and JD.com. According to other statistics, it is the second most visited global e-commerce and shopping website, again after Amazon.com (see Statista statistic). At the same time, its market cap is much smaller than that of those competitors, because it holds no inventory and only facilitates the transactions. It is an exciting company with lots of potential.

The search engine that powers eBay handles 80 billion queries on 3 billion documents every day. The index is spread across tens of thousands of geographically distributed servers. My job will be to improve the custom C++ search engine that enables customers to locate items to buy. An academic publication titled "The Architecture of eBay Search", presented at SIGIR eCom 2017, contains more publicly available information about the search engine.

I am excited to have joined eBay's technical team in San José, California, USA.

Moving from Germany to California during the Covid-19 pandemic was a challenge. But more about that in a future blog post.



Thrill YouTube Tutorial: High-Performance Algorithmic Distributed Computing with C++

Posted on 2020-06-01 11:13 by Timo Bingmann at Permlink with 0 Comments. Tags: #talk #university #thrill

This post announces the completion of my new tutorial presentation and YouTube video: "Thrill Tutorial: High-Performance Algorithmic Distributed Computing with C++".

YouTube video link: https://youtu.be/UxW5YyETLXo (2h 41min)

Slide Deck: slides-20200601-thrill-tutorial.pdf (21.3 MiB) (114 slides)

In this tutorial we present our new distributed Big Data processing framework called Thrill (https://project-thrill.org). It is a C++ framework consisting of a set of basic scalable algorithmic primitives like mapping, reducing, sorting, merging, joining, and additional MPI-like collectives. These primitives can be combined into larger, more complex algorithms, such as WordCount, PageRank, and suffix sorting. Such composite algorithms can then be run on very large inputs using a distributed computing cluster with external memory.
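As a rough illustration of how such primitives compose, the following minimal sketch expresses WordCount as a mapping step followed by a reduce-by-key step. It uses only the C++ standard library on a single machine and is not Thrill's actual API; the names `WordCountPair`, `MapLinesToPairs`, `ReduceByWord`, and `WordCount` are hypothetical and chosen for this example only.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper type for this sketch: a word and its count.
using WordCountPair = std::pair<std::string, std::size_t>;

// Mapping step: split every line at whitespace and emit one
// (word, 1) pair per word found.
std::vector<WordCountPair> MapLinesToPairs(
    const std::vector<std::string>& lines) {
    std::vector<WordCountPair> pairs;
    for (const std::string& line : lines) {
        std::istringstream stream(line);
        std::string word;
        while (stream >> word)
            pairs.emplace_back(word, 1);
    }
    return pairs;
}

// Reduce-by-key step: sum the counts of all pairs that share a word.
// A distributed framework would partition the keys across machines;
// here a single std::map stands in for that.
std::vector<WordCountPair> ReduceByWord(
    const std::vector<WordCountPair>& pairs) {
    std::map<std::string, std::size_t> counts;
    for (const auto& [word, count] : pairs)
        counts[word] += count;
    return std::vector<WordCountPair>(counts.begin(), counts.end());
}

// The composite algorithm: map, then reduce.
std::vector<WordCountPair> WordCount(
    const std::vector<std::string>& lines) {
    return ReduceByWord(MapLinesToPairs(lines));
}
```

For the two input lines `a b a` and `b a c`, `WordCount` returns the pairs (a, 3), (b, 2), and (c, 1), ordered by word because `std::map` keeps its keys sorted. The sketch requires C++17 for the structured binding.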

After introducing the audience to Thrill, we guide participants through the initial steps of downloading and compiling the software package. The tutorial then gives an overview of the challenges of programming real distributed machines, and of the models and frameworks available for doing so. With these foundations, Thrill's DIA (Distributed Immutable Array) programming model is introduced with an extensive listing of DIA operations and how to actually use them. The participants are then given a set of small example tasks to gain hands-on experience with DIAs.

After the hands-on session, the tutorial continues with more details on how to run Thrill programs on clusters and how to generate execution profiles. Then, deeper details of Thrill's internal software layers are discussed to advance the participants' mental model of how Thrill executes DIA operations. The final hands-on tutorial is designed as a concerted group effort to implement K-means clustering for 2D points.
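For readers unfamiliar with the algorithm, K-means clustering for 2D points can be sketched as follows. This is a minimal single-machine version of Lloyd's iteration in plain C++, not the Thrill implementation developed in the tutorial; the names `Point`, `NearestCenter`, `UpdateCenters`, and `KMeans` are hypothetical, and the fixed iteration count is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2D point type for this sketch.
struct Point {
    double x, y;
};

double SquaredDistance(const Point& a, const Point& b) {
    const double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Classification step: index of the center closest to p.
std::size_t NearestCenter(const Point& p,
                          const std::vector<Point>& centers) {
    assert(!centers.empty());
    std::size_t best = 0;
    double best_dist = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < centers.size(); ++i) {
        const double dist = SquaredDistance(p, centers[i]);
        if (dist < best_dist) {
            best_dist = dist;
            best = i;
        }
    }
    return best;
}

// Update step: move every center to the mean of the points assigned
// to it. Summing per center is a reduce-by-key over the center index.
std::vector<Point> UpdateCenters(const std::vector<Point>& points,
                                 const std::vector<Point>& centers) {
    struct Sum {
        double x = 0, y = 0;
        std::size_t count = 0;
    };
    std::vector<Sum> sums(centers.size());
    for (const Point& p : points) {
        Sum& s = sums[NearestCenter(p, centers)];
        s.x += p.x;
        s.y += p.y;
        ++s.count;
    }
    std::vector<Point> next = centers;  // centers without points stay put
    for (std::size_t i = 0; i < centers.size(); ++i) {
        if (sums[i].count > 0)
            next[i] = Point{sums[i].x / sums[i].count,
                            sums[i].y / sums[i].count};
    }
    return next;
}

// Repeat the two steps for a fixed number of iterations.
std::vector<Point> KMeans(const std::vector<Point>& points,
                          std::vector<Point> centers,
                          std::size_t iterations) {
    for (std::size_t i = 0; i < iterations; ++i)
        centers = UpdateCenters(points, centers);
    return centers;
}
```

The classification step is a per-point mapping and the update step is a reduction keyed by center index, which is why the algorithm fits a framework built from map and reduce primitives.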

The video on YouTube (https://youtu.be/UxW5YyETLXo) contains both presentation and live-coding sessions. It has a high information density and covers many topics.


This article continues on the next page ...


IPDPS Paper "Communication-Efficient String Sorting" and Talk Recording

Posted on 2020-05-18 15:30 by Timo Bingmann at Permlink with 0 Comments. Tags: #talk #university

Due to the coronavirus pandemic, this year's IPDPS conference is being held virtually, and we sadly missed the chance to visit New Orleans. Instead, I recorded a YouTube video of our conference talk, because the slides are mostly illustrations that require further explanation.

The paper "Communication-Efficient String Sorting", which I coauthored with Peter Sanders and Matthias Schimek, will still be published in the IEEE proceedings. A preprint of the full paper is available on arXiv:2001.08516 and also from this webpage:

2001.08516v1-Communication-Efficient-String-Sorting.pdf

There are two versions of the slides: the longer presentation version slides-20200518-distributed-string-sorting-ipdps.pdf, which I used for the YouTube recording, and a short ten-page teaser version slides-20200518-distributed-string-sorting-ipdps-short.pdf for the virtual conference.

Download slides-20200518-distributed-string-sorting-ipdps.pdf

The source code and further documentation of our communication-efficient distributed string sorting algorithms can be found in my GitHub repository https://github.com/bingmann/distributed-string-sorting.

Matthias Schimek's master's thesis, on which this paper and presentation are based, can also be downloaded from this website: 2019_Schimek_Distributed_String_Sorting_Algorithms.pdf.

Below you can watch the video recording of my presentation, or head over to YouTube: https://youtu.be/uWro8fsfs5I.

This article continues on the next page ...


List of Recordings of Lectures and Exercises on YouTube

Posted on 2020-01-28 01:05 by Timo Bingmann at Permlink with 0 Comments. Tags: #university

This post lists YouTube videos of lectures and exercises that I have given at the KIT. The recordings are all in German and were produced semi-automatically by the Center for Media-Learning (Zentrum für Mediales Lernen) at the KIT.

For an easier overview, I have listed entry time points for the topics in each lecture. These entry points cover only those parts of the lectures or exercises that I personally presented. The lecture series were mainly presented by Prof. Peter Sanders, and the exercises were given jointly with colleagues.

I wrote this article mainly because I can never seem to find the lectures on YouTube when I am looking for them. I can now also point people to this post when I want to reference a video explanation of a topic I previously presented.

This article continues on the next page ...