panthema / 2014 / sqlplot-tools / directives

List of LaTeX SqlPlotTools Directives

SqlPlotTools directives in LaTeX are written as LaTeX comments, starting with percent signs '%'. Multi-line directives are prefixed with two percent signs '% %' and indented at the same level.

Auxiliary Directives

SQL [any-statement]

Executes any arbitrary SQL statement on the database. This can be used to create indices, delete imported rows, etc.

Examples:

% SQL DELETE FROM stats WHERE key = 'bad data'
% SQL CREATE INDEX stats_key ON stats(key)

IMPORT-DATA [options] [files...]

Imports RESULT lines into the current database from given files. By default the imported tables are temporary tables (if available), which means they are discarded after the program exits.

Options: 
  -1       Take field types from first line and process stream.
  -a       Process all line, regardless of RESULT marker.
  -C       Enumerate unnamed fields with col# instead of using key names.
  -E       Allow empty tables or globs without matching files.
  -d       Eliminate duplicate RESULT lines.
  -T       Import into TEMPORARY table (for in-file processing).
  -P       Import into non-TEMPORARY table (reverts the default -T).
  -v       Increase verbosity.

Example:

% IMPORT-DATA stats plotdata/experiment1/*.txt

RANGE BEGIN [key] and RANGE END [key]

To process only parts of a large file, RANGEs can specified in the LaTeX file. The sp-process tool can then be called with parameter -R [key] to process only directives within RANGE BEGIN [key] and RANGE END [key] lines. Ranges can be opened and closed multiple times in any order.

CONNECT [database-connection]

Switch connection to database. Common database connection strings are sqlite:file.db, postgresql:dbname=hello host=myhost and mysql:database=foo. By default SqlPlotTools will try PostgreSQL, then MySQL, then use SQLite, depending on which drivers are compiled into the binary.

Data Directives

PLOT [select-statement]

The PLOT directive executes one SQL select command. The result must contain at least two columns called x and y which are transformed into (x,y) coordinates. The coordinates are outputted as a single \addplot line for pgfplots. The program tries hard to read the following lines and keep your customizations of the \addplot command. It replaces only the data within the coordinates clause of following \addplot lines.

Example:

%% PLOT SELECT LOG(2, testsize) AS x, speed AS y FROM test
%% WHERE funcname='std::stable_sort' ORDER BY x
\addplot[red] coordinates { (24,0.144) };

Since the PLOT directive only outputs an \addplot line, it is usually wrapped in tikz and pgfplot environments. And as such, one can use multiple PLOT lines to generate more than one series per plot.

\begin{tikzpicture}
  \begin{axis}[
    title={Simple C++ Sorting Test},
    xlabel={Item Count [$\log_2(n)$]},
    ylabel={Run Time per Item [Nanoseconds / Item]},
    ]

    %% PLOT SELECT LOG(2, testsize) AS x, speed AS y FROM test
    %% WHERE funcname='std::stable_sort' ORDER BY x
    \add plot[red] coordinates { (24,0.144) };

  \end{axis}
\end{tikzpicture}

MULTIPLOT(group) [select-statement]

Multi-plot is used to generate multiple plot lines in one command. The multiple lines are distinguished by the values in the columns containing in the group clause (e.g. group by hostname and experiment type). Since the group columns are usually contained in the SELECT statement, all occurrences of the string "MULTIPLOT" are replaced by the group clause for convenience.

The select statement is executed on database, and the result must contain at least two columns called x and y, and additionally all columns in the group clauses. The result must be ordered such that all rows in a common group are in sequences, and within the group the x items are ascending. As with PLOT, the program tries to replace only the data within the coordinates clause of following \addplot lines, such that user changes outside are kept.

Additionally, the group clause is used to automatically generate legend entries following each plot line. For nicer production legends, these can easily be overridden using a \legend after the MULTIPLOT part.

Alternatively, the legend title can be defined using SQL by appending "|title" to the group specification and returning a column title in the SQL statement. LaTeX commands in the legend title column are escaped. If this is undesired, e.g. to use math symbols, use "|ptitle" and return a column ptitle instead.

%% MULTIPLOT(funcname)
%% SELECT LOG(testsize) / LOG(2) AS x, AVG(bandwidth) AS y, MULTIPLOT
%% FROM test WHERE host='earth' GROUP BY MULTIPLOT,x ORDER BY MULTIPLOT,x
\addplot coordinates { (10.1699,2.12566e+10) ... (34,3.50009e+09) };
\addlegendentry{funcname=ScanRead64PtrUnrollLoop};
\addplot coordinates { (10.1699,2.00049e+10) ... (34,2.1669e+09) };
\addlegendentry{funcname=ScanWrite64PtrUnrollLoop};

or with explicit legend titles:

%% MULTIPLOT(funcname|title)
%% SELECT LOG(testsize) / LOG(2) AS x, AVG(bandwidth) AS y, funcname AS title, MULTIPLOT
%% FROM test WHERE host='earth' GROUP BY MULTIPLOT,x ORDER BY MULTIPLOT,x
\addplot coordinates { (10.1699,2.12566e+10) ... (34,3.50009e+09) };
\addlegendentry{ScanRead64PtrUnrollLoop};
\addplot coordinates { (10.1699,2.00049e+10) ... (34,2.1669e+09) };
\addlegendentry{ScanWrite64PtrUnrollLoop};

TEXTTABLE [select-statement]

The directive executes the SELECT statement on the database and inserts the result into LaTeX as text table. The table is similar to the output of the SQL command line tools mysql/psql and can be used to test statements.

%% TEXTTABLE
%% SELECT COUNT(*), SUM(time) / 3600 / 24 AS time, SUM(iovolume) / POW(2,40) AS vol
%% FROM "20121026-ic1-fullrun32" WHERE sacaname='skew3' OR sacaname='eSAIS'
+-------+------------------+------------------+
| count |             time |              vol |
+-------+------------------+------------------+
|    90 | 3.22392927893518 | 16.8072862625122 |
+-------+------------------+------------------+
% END TEXTTABLE SELECT COUNT(*), SUM(time) / 3600 / 24 AS time, SUM(iovolume...)

TABULAR [REFORMAT()] [select-statement]

The directive executes the SELECT statement on the database and inserts the result into LaTeX as the contents of a tabular. The tabular environment itself is NOT generated!

Additionally, the columns and rows of the tabular matrix can be reformatted by rules in the optional REFORMAT() clause. Reformatting is experimental.

\begin{tabular}{l|rrr}
$n$ & \texttt{std::sort} & \texttt{std::stable\_sort} & STL heap sort \\ \hline
%% TABULAR REFORMAT(col 1-3=(precision=1) row 0-100=(min=bold))
%% SELECT '$2^{' || FLOOR(LOG(2, size)) || '}$' AS x,
%% (SELECT MEDIAN(time / repeats / size * 1e9) FROM stats s1 WHERE s1.algo='std::sort' AND s1.size = s.size GROUP BY s1.size),
%% (SELECT MEDIAN(time / repeats / size * 1e9) FROM stats s1 WHERE s1.algo='std::stable_sort' AND s1.size = s.size GROUP BY s1.size),
%% (SELECT MEDIAN(time / repeats / size * 1e9) FROM stats s1 WHERE s1.algo='std::heap_sort' AND s1.size = s.size GROUP BY s1.size)
%% FROM stats s
%% GROUP BY s.size ORDER BY s.size
$2^{10}$ & \bf 38.0 &      38.1 &  71.7 \\
$2^{11}$ &     42.9 &  \bf 41.8 &  77.6 \\
$2^{12}$ &     47.8 &  \bf 45.2 &  83.8 \\
$2^{13}$ &     51.2 &  \bf 48.3 &  89.6 \\
...
% END TABULAR SELECT '$2^{' || FLOOR(LOG(2, size)) || '}$' AS x, (SELECT MEDI...
\end{tabular}

DEFMACRO [select-statement]

The directive executes the SELECT statement on the database and inserts each column of the result into LaTeX as \def\column{value}. The result may only contain one row. The macro result can then be used in the LaTeX text, e.g. to summarize experimental results.

%% DEFMACRO
%% SELECT
%%   ROUND((CE0 - CE1) / CE0 * 100) AS RatioZeroOverOne,
%%   ROUND((CE1 - CE2) / CE1 * 100) AS RatioOneOverTwo,
%%   ROUND((CE2 - CE3) / CE2 * 100) AS RatioTwoOverThree,
%%   ROUND(((CI2 - CE2) / CI2 * 100)::numeric, 1) AS RatioTwoIoverE
%% FROM
%% (SELECT
%%   (SELECT EXP(AVG(LN(time))) FROM seqtime WHERE algo = 'bingmann/msd_CE0') AS CE0,
%%   (SELECT EXP(AVG(LN(time))) FROM seqtime WHERE algo = 'bingmann/msd_CE1') AS CE1,
%%   (SELECT EXP(AVG(LN(time))) FROM seqtime WHERE algo = 'bingmann/msd_CE2') AS CE2,
%%   (SELECT EXP(AVG(LN(time))) FROM seqtime WHERE algo = 'bingmann/msd_CE3_sb') AS CE3,
%%   (SELECT EXP(AVG(LN(time))) FROM seqtime WHERE algo = 'bingmann/msd_CI2') AS CI2
%% ) x
\def\ratiozerooverone{33}
\def\ratiooneovertwo{40}
\def\ratiotwooverthree{14}
\def\ratiotwoiovere{7.5}

% RANGE END seqalgo

List of Gnuplot SqlPlotTools Directives

SqlPlotTools directives in Gnuplot are written as Gnuplot comments, starting with hash signs '#'. Multi-line directives are prefixed with two hash signs '##' and indented at the same level.

All "auxiliary" directives, SQL, IMPORT-DATA, RANGE BEGIN/END, and CONNECT work identically to LaTeX directives. See above in the LaTeX section for examples.

SQL [any-statement]

Executes any arbitrary SQL statement on the database. This can be used to create indices, delete imported rows, etc.

IMPORT-DATA [options] [files...]

Imports RESULT lines into the current database from given files.

RANGE BEGIN [key] and RANGE END [key]

To process only parts of a large file, RANGEs can specified in the LaTeX file and selected using the -R command line parameter.

CONNECT [database-connection]

Switch connection to database.

Data Directives

PLOT [select-statement]

The PLOT directive executes one SQL select command. The result must contain at least two columns called x and y which are transformed into (x,y) coordinates. The coordinates are written to an auxiliary data file for Gnuplot to parse.

Example:

## PLOT
## SELECT LOG(2, size) AS x, MEDIAN(time / repeats / size * 1e9) AS y
## FROM stats WHERE algo='std::stable_sort' GROUP BY x ORDER BY x
plot \
    'speed-data.txt' index 0 with linespoints

MULTIPLOT(group) [select-statement]

Multi-plot is used to generate multiple plot lines in one command. The directive works analogously to the LaTeX version, but outputs an entire Gnuplot plot sequence and writes the coordinates to an auxiliary data file.

The appendixes "|title" and "|ptitle" are currently not available (patch welcome).

## MULTIPLOT(algo)
## SELECT LOG(2, size) AS x, MEDIAN(time / repeats / size * 1e9) AS y, MULTIPLOT
## FROM stats GROUP BY MULTIPLOT,x ORDER BY MULTIPLOT,x
plot \
    'speed-data.txt' index 1 title "algo=std::heap_sort" with linespoints, \
    'speed-data.txt' index 2 title "algo=std::sort" with linespoints, \
    'speed-data.txt' index 3 title "algo=std::stable_sort" with linespoints

MACRO [select-statement]

The directive executes the SELECT statement on the database and inserts each column of the result into the Gnuplot file as macros.

## MACRO SELECT MAX(size) AS max_size, MAX(time) AS max_time FROM stats
max_size = 1073741824
max_time = 625.761

Exits

First written 2015-02-15 by Timo Bingmann, updated 2018-11-27.