Posted on 2012-11-19 15:49 by Timo. Tags: research stringology c++
This web page accompanies our conference paper "Inducing Suffix and LCP Arrays in External Memory", which we presented at the Workshop on Algorithm Engineering and Experiments (ALENEX 2013). A PDF of the publication is available from this site as alenex13esais.pdf or from the online proceedings. The paper was joint work with my colleagues Johannes Fischer and Vitaly Osipov.
The slides of my presentation of the paper on January 7th, 2013 in New Orleans, LA, USA are also available: alenex13esais-slides.pdf. They contain little text and an example of the eSAIS algorithm with a simplified priority queue (PQ).
Our implementations of eSAIS, the eSAIS-LCP variants, DC3, and DC3-LCP, as described in the paper, are available below under the GNU General Public License v3 (GPL).
| eSAIS and DC3 with LCP version 0.5.2 (current), updated 2013-03-30 | | |
| Source code archive (includes STXXL) | eSAIS-DC3-LCP-0.5.2.tar.bz2 (975 KiB), MD5: 18abfd0836810d7755b7fcabf09ce5dd | Browse online |
| Git repositories | Suffix and LCP construction algorithms: git clone http://algohub.iti.kit.edu/algo2/eSAIS/; cd eSAIS; git submodule init; git submodule update | |
| | STXXL with custom patches: git clone http://algohub.iti.kit.edu/algo2/stxxl/ | |
| Customized STXXL HTML documentation | | |
The algorithm implementations require a special version of the STXXL library, which is also listed above. For more information about compiling and testing the implementation, please refer to the README included in the source.
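Before unpacking a downloaded archive, it can be checked against the MD5 sum listed next to it. The following is a minimal sketch, assuming GNU coreutils' md5sum is installed; the helper name verify_md5 is hypothetical and not part of the eSAIS sources.

```shell
# verify_md5 FILE EXPECTED_MD5
# Hypothetical helper: succeeds (exit status 0) only if the MD5 sum of
# FILE equals EXPECTED_MD5. Assumes GNU coreutils' md5sum is available.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<hash>  <file>"; keep only the hash field
    actual=$(md5sum -- "$file" | cut -d ' ' -f 1)
    [ "$actual" = "$expected" ]
}
```

For example, for the current release: verify_md5 eSAIS-DC3-LCP-0.5.2.tar.bz2 18abfd0836810d7755b7fcabf09ce5dd && echo "checksum OK"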
We have also collected the real-world input samples used to test our algorithms. Note that these files are very large and hosted externally:
| Wikipedia XML Dump from June 2012 | download (11.1 GiB) | Creative Commons License |
| Gutenberg Concatenation from September 2012 | download (5.1 GiB) | Project Gutenberg License |
| UCSC Human Genome Assembly "hg19" | download (610 MiB) | Free by UCSC |
| First 8 Gi of Decimal Digits of Pi | download (3.8 GiB) | generated by y-cruncher |
Note that most of these files are compressed with xz and must be recompressed with gzip so that the esactest test program can decompress them automatically on the fly.
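The recompression can be done as a stream, so the multi-gigabyte uncompressed data never has to be written to disk. This is a minimal sketch, assuming the xz and gzip command-line tools are installed and that a .gz file name suffix is acceptable to esactest (check the README for the details); the helper name recompress_xz_to_gz is hypothetical.

```shell
# recompress_xz_to_gz FILE.xz
# Hypothetical helper: writes FILE.gz next to FILE.xz and keeps the
# original. xz decompresses to stdout and gzip compresses that stream,
# so no uncompressed temporary file is created.
recompress_xz_to_gz() {
    src=$1
    dst=${src%.xz}.gz           # replace the .xz suffix with .gz
    xz -dc -- "$src" | gzip -c > "$dst"
}
```

Called as recompress_xz_to_gz input.xz (a placeholder file name), this produces input.gz in the same directory.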
| eSAIS and DC3 with LCP version 0.5.0, released 2012-11-26 | | |
| Source code archive (includes STXXL) | eSAIS-DC3-LCP-0.5.0.tar.bz2 (980 KiB), MD5: ade0b73c4348a30d514a7ee05f22d36b | Browse online |