Keynote speakers
Martin Farach-Colton
NYU, USA
How Big is a Pointer?
The question seems to have a trivial answer: there are at least log n bits in a pointer to a memory of size n. In this talk, I’ll show that this is not always true. There are many situations where pointers can be compressed. After presenting tight bounds on the size of tiny pointers, I’ll show that compressed pointers can be used to solve a variety of open data structural problems.
Zsuzsanna Lipták
University of Verona, Italy
BWT everywhere
The Burrows-Wheeler Transform (BWT) is a reversible string transform at the heart of some of the most frequently used bioinformatics tools, such as the aligners bwa and bowtie. Introduced in 1994 by Burrows and Wheeler, the BWT takes advantage of the input string's repetitiveness for compression, while also allowing fast pattern matching in compressed space. BWT-based text indexes such as the FM-index and the more recent r-index are thus very effective, as much of current textual data is highly repetitive. Applications of the BWT are, however, not restricted to the world of compressed text indexing. In this talk, I will explore several other contexts in which the BWT and the eBWT (its extension to multisets), have been put to use. These include BWT-based distance measures on strings and their applications in bioinformatics. I will sketch how to generate random de Bruijn sequences based on their BWT---resulting in the first practical algorithm that is capable of generating every k-order de Bruijn sequence with positive probability. We will also see how the BWT can be used to explain the differences between coexisting methods of text index construction for string collections, which have so far been believed to be equivalent.
Tetsuo Shibuya
University of Tokyo, Japan
Preserving Privacy in Biomedical Data with “More Efficient” Differentially Private Algorithms
Much biomedical data contains highly sensitive information. Especially, individual genome data, known as the ultimate identifier, is extremely sensitive. When we publish even the simplest statistics of biomedical databases, we need to be very careful to avoid leaking any individual sensitive information. Differential privacy is one of the most important measures to evaluate how safe the published data are from the viewpoint of privacy. We will discuss how to design efficient differentially private techniques for publishing biomedical data, especially the genome association study (GWAS) data. First, we will discuss differentially private methods for various GWAS statistics. We will also cover how to publish ranking information obtained by these statistics. We will also discuss how to publish the entire database to enable further complicated study without leaking much privacies. Another topic will be on how to efficiently handle more complicated databases such as graph databases.