Stxxl 1.4.0
- Unpack the tarball: tar zfxv stxxl-x.y.z.tgz
- Change to the stxxl directory: cd stxxl-x.y.z
- Run make config_gnu to create a make.settings.local file. Note: this will produce some warnings and abort with an error, which is intended.
- Edit the make.settings.local file according to your system configuration:
  - Set the STXXL_ROOT variable to the STXXL root directory (directory_where_you_unpacked_the_tar_ball/stxxl-x.y.z).
  - If you want STXXL to use Boost libraries (you should have the Boost libraries already installed):
    - Set the USE_BOOST variable to yes.
    - Set the BOOST_ROOT variable to the Boost root path, unless Boost is installed in system default search paths.
  - If you want STXXL to use the libstdc++ parallel mode:
    - Use the targets library_g++_pmode and tests_g++_pmode instead of the ones listed below.
  - If you want STXXL to use the MCSTL library (you should have the MCSTL library already installed):
    - Set the MCSTL_ROOT variable to the MCSTL root path.
    - Use the targets library_g++_mcstl and tests_g++_mcstl instead of the ones listed below.
  - Set the OPT variable to -O3 or another g++ optimization level you like (default: -O3).
  - Set the DEBUG variable to -g or another g++ debugging option if you want to produce a debug version of the Stxxl library or Stxxl examples (default: not set).
  - Further settings are defined in make.settings.gnu; they are usually overridden by settings in make.settings.local. Using CPPFLAGS and LDFLAGS, for example, you can add arbitrary compiler and linker options.
- Run: make library_g++
- Run: make tests_g++
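As a rough illustration of the optional settings above, the following sketch appends a few overrides to a settings file. All paths are placeholders, and the plain "VAR = value" syntax is an assumption based on the file being a Makefile fragment; compare with the template generated by make config_gnu before copying anything.

```shell
# Sketch only: append example overrides to a settings file.
# SETTINGS_FILE defaults to a scratch file so nothing real is touched;
# point it at stxxl-x.y.z/make.settings.local to apply the values.
SETTINGS_FILE=${SETTINGS_FILE:-$(mktemp)}

cat >> "$SETTINGS_FILE" <<'EOF'
# placeholder path: where the tarball was unpacked
STXXL_ROOT = /home/user/stxxl-x.y.z
# use Boost; BOOST_ROOT is only needed outside the default search paths
USE_BOOST = yes
BOOST_ROOT = /opt/boost
# optimization level and debug symbols
OPT = -O3
DEBUG = -g
EOF

# Show the result for review.
cat "$SETTINGS_FILE"
```

Setting SETTINGS_FILE before running the snippet redirects the output to the real file; the default keeps the example side-effect free.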
After compiling the library, some Makefile variables are written to stxxl.mk (pmstxxl.mk if you have built with parallel mode, mcstxxl.mk if you have built with MCSTL) in your STXXL_ROOT directory. This file should be included from your application's Makefile.
The following variables can be used:
- STXXL_CXX: the compiler used to build the STXXL library; it's recommended to use the same one to build your applications
- STXXL_CPPFLAGS: add these flags to the compile commands
- STXXL_LDLIBS: add these libraries to the link commands
An example Makefile for an application using STXXL:
STXXL_ROOT      ?= .../stxxl
STXXL_CONFIG    ?= stxxl.mk
include $(STXXL_ROOT)/$(STXXL_CONFIG)

# use the variables from stxxl.mk
CXX              = $(STXXL_CXX)
CPPFLAGS        += $(STXXL_CPPFLAGS)
LDLIBS          += $(STXXL_LDLIBS)

# add your own optimization, warning, debug, ... flags
# (these are *not* set in stxxl.mk)
CPPFLAGS        += -O3 -Wall -g -DFOO=BAR

# build your application
# (my_example.o is generated from my_example.cpp automatically)
my_example.bin: my_example.o
	$(CXX) $(CXXFLAGS) $(CPPFLAGS) $(LDFLAGS) my_example.o -o $@ $(LDLIBS)
To enable (shared-memory) parallel execution of internal computation (in fact, sorting, merging, and random shuffling), you have several options depending on the compiler version used:
- Enable the parallel mode for STXXL only by defining STXXL_PARALLEL_MODE_EXPLICIT and enabling OpenMP (-DSTXXL_PARALLEL_MODE_EXPLICIT -fopenmp) when compiling and linking your program. Compiling the library binary with this flag enabled is not really necessary, since the most time-consuming operations are called by the generic routines and thus contained in the header files.
- Enable the parallel mode for your whole program by defining _GLIBCXX_PARALLEL and enabling OpenMP (-D_GLIBCXX_PARALLEL -fopenmp). This implies that STL algorithms in your program will also be executed in parallel, which may have undesired side effects. These options are used automatically when you have built STXXL using the *_pmode target and your Makefile includes pmstxxl.mk.
- Use MCSTL. The required options are used automatically when you have built STXXL using the *_mcstl target and your Makefile includes mcstxxl.mk.
We recommend trying the first option first.
The number of threads to be used can be set by the environment variable OMP_NUM_THREADS or by calling omp_set_num_threads. Detailed tuning can be achieved as described here.
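A minimal sketch of the first option combined with a fixed thread count. It assumes an application Makefile like the example earlier in this document (target my_example.bin, source my_example.cpp); the thread count of 4 is arbitrary. Passing the flags through CXXFLAGS is one possible choice, since that variable appears both in make's implicit compile rule and in the example link rule.

```shell
# Flags for the first option: parallel mode for STXXL only, plus OpenMP.
PARALLEL_FLAGS="-DSTXXL_PARALLEL_MODE_EXPLICIT -fopenmp"

# Thread count for the OpenMP runtime (4 is an arbitrary example value).
OMP_NUM_THREADS=4
export OMP_NUM_THREADS

if [ -f Makefile ] && [ -f my_example.cpp ]; then
    # Build with the parallel flags, then run with the exported thread count.
    make my_example.bin CXXFLAGS="$PARALLEL_FLAGS" && ./my_example.bin
else
    # Nothing to build here: just show the command that would be used.
    echo "would run: make my_example.bin CXXFLAGS=\"$PARALLEL_FLAGS\""
fi
```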
Before you try to run one of the STXXL examples (or your own STXXL program), you must configure the disk space that will be used as external memory for the library.
To get the best performance with STXXL, you should assign separate disks to it, and these disks should be used by the library only. Since STXXL is designed to exploit disk parallelism, the performance of your external memory application will increase if you use more than one disk. How many disks your application can benefit from depends on how I/O-bound it is. With modern disk bandwidths of about 50-75 MiB/s, most applications are I/O-bound with one disk, so adding a second disk will roughly halve the running time. Adding more disks might also increase performance significantly.
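The scaling argument can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a fully I/O-bound sequential pass and that the bandwidths of all disks add up perfectly, which is an idealization; the function name scan_seconds and the 100 GiB data size are made up for illustration.

```shell
# Estimated seconds for one sequential pass over the data, assuming the
# bandwidth of all disks adds up perfectly (an idealization).
# usage: scan_seconds DATA_MIB MIB_PER_SECOND_PER_DISK DISKS
scan_seconds() {
    echo $(( $1 / ($2 * $3) ))
}

# 100 GiB = 102400 MiB at 50 MiB/s per disk
for disks in 1 2 4; do
    echo "$disks disk(s): $(scan_seconds 102400 50 "$disks") s"
done
```

Under these assumptions the pass takes 2048 s on one disk, 1024 s on two, and 512 s on four.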
The library benefits from direct transfers from user memory to disk, which avoid superfluous copies. We recommend using the XFS file system, which gives good read and write performance for large files. Note that file creation on XFS is a bit slower, so disk files should be precreated for optimal performance.
If the file system's only use is to store one large STXXL disk file, we also recommend adding the following options to the mkfs.xfs command to gain maximum performance:
-d agcount=1 -l size=512b
The following file systems have been reported not to support direct I/O: tmpfs, glusterfs. Since direct I/O is enabled by default, you may recompile STXXL with STXXL_DIRECT_IO_OFF defined to access files on these file systems.
You must define the disk configuration for an STXXL program in a file named '.stxxl', which must reside in the directory from which you execute the program. You can change the default name of the configuration file by setting the environment variable STXXLCFG.
Each line of the configuration file describes a disk. A disk description uses the following format:
disk=full_disk_filename,capacity,access_method
Description of the parameters:
- full_disk_filename: full disk filename. In order to access disks, STXXL uses file access methods. Each disk is represented as a file. If you have a disk that is mounted in Unix to the path /mnt/disk0/, then the correct value for the full_disk_filename would be /mnt/disk0/some_file_name.
- capacity: maximum capacity of the disk in megabytes (0 means autogrow, file will be deleted afterwards)
- access_method: STXXL has a number of different file access implementations for POSIX systems, choose one of them:
  - syscall: use read and write system calls which perform disk transfers directly on user memory pages without superfluous copying (currently the fastest method)
  - mmap: use mmap and munmap system calls
  - boostfd: access the file using a Boost file descriptor
  - fileperblock_syscall, fileperblock_mmap, fileperblock_boostfd: same as above, but take a single file per block, using full_disk_filename as file name prefix. Usually provide worse performance than the standard variants, but release freed blocks to the file system immediately.
  - simdisk: simulates timings of the IBM IC35L080AVVA07 disk, full_disk_filename must point to a file on a RAM disk partition with sufficient space
  - memory: keeps all data in RAM, for quicker testing
  - wbtl: library-based write-combining (good for writing small blocks onto SSDs), based on syscall
See also the example configuration file 'config_example' included in the tarball.
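For illustration, a configuration for two disks might be written as follows. The mount points, file names, and the capacity of 70000 are placeholders, and CONFIG_DIR is a hypothetical variable standing in for the directory the program is started from; the sketch defaults to a scratch directory so that no existing file is overwritten.

```shell
# Write a two-disk configuration file named .stxxl.
# CONFIG_DIR is a scratch directory here; in practice it would be the
# directory from which the STXXL program is executed.
CONFIG_DIR=${CONFIG_DIR:-$(mktemp -d)}

# One line per disk: disk=full_disk_filename,capacity,access_method
cat > "$CONFIG_DIR/.stxxl" <<'EOF'
disk=/mnt/disk0/stxxl,70000,syscall
disk=/mnt/disk1/stxxl,70000,syscall
EOF

# Show the result for review.
cat "$CONFIG_DIR/.stxxl"
```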
STXXL produces two kinds of log files, a message and an error log. By setting the environment variables STXXLLOGFILE and STXXLERRLOGFILE, you can configure the location of these files. The default values are stxxl.log and stxxl.errlog, respectively.
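A small sketch of redirecting both logs away from the working directory. LOG_DIR is a hypothetical helper variable, not something STXXL reads; only STXXLLOGFILE and STXXLERRLOGFILE come from the text above.

```shell
# Collect both logs in one directory (a scratch directory by default).
LOG_DIR=${LOG_DIR:-$(mktemp -d)}

STXXLLOGFILE="$LOG_DIR/stxxl.log"
STXXLERRLOGFILE="$LOG_DIR/stxxl.errlog"
export STXXLLOGFILE STXXLERRLOGFILE

# Programs started from this shell inherit the two variables.
echo "message log: $STXXLLOGFILE"
echo "error log:   $STXXLERRLOGFILE"
```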
To get maximum performance, precreate the disk files described in the configuration file before running STXXL applications.
The precreation utility is included in the set of STXXL utilities (utils/createdisks.bin). Run this utility for each disk you have defined in the disk configuration file:
utils/createdisks.bin capacity full_disk_filename...
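One way to avoid typing these commands by hand is to derive them from the configuration file. The sketch below only prints the commands for review rather than running them. The helper name createdisks_commands is made up, and the sketch assumes that createdisks.bin expects the capacity in the same unit as the configuration file, which the text above does not state.

```shell
# Print one createdisks command per disk line of a configuration file.
# Assumption: the capacity unit matches the one used in the config file.
# Lines with capacity 0 (autogrow) are skipped, since there is no fixed
# size to precreate.
# usage: createdisks_commands CONFIG_FILE
createdisks_commands() {
    while IFS=, read -r disk capacity method; do
        case $disk in
            disk=*) ;;          # a disk description, handled below
            *) continue ;;      # anything else is ignored
        esac
        if [ "$capacity" != "0" ]; then
            echo "utils/createdisks.bin $capacity ${disk#disk=}"
        fi
    done < "$1"
}

# Demo on a scratch configuration with placeholder paths.
demo_cfg=$(mktemp)
cat > "$demo_cfg" <<'EOF'
disk=/mnt/disk0/stxxl,70000,syscall
disk=/mnt/disk1/stxxl,0,syscall
EOF
createdisks_commands "$demo_cfg"
```

After checking the printed commands, they can be executed from the STXXL root directory, for example by piping the output to sh.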