CPM 2024 - 35th Annual Symposium on Combinatorial Pattern Matching

Conference

Papers and proceedings

Local information

Venue

Travel

Other

Poster

Keynote speakers

Martin Farach-Colton

NYU, USA

How Big is a Pointer?

The question seems to have a trivial answer: there are at least log n bits in a pointer to a memory of size n. In this talk, I’ll show that this is not always true. There are many situations where pointers can be compressed. After presenting tight bounds on the size of tiny pointers, I’ll show that compressed pointers can be used to solve a variety of open data structural problems.

Zsuzsanna Lipták

University of Verona, Italy

BWT everywhere

The Burrows-Wheeler Transform (BWT) is a reversible string transform at the heart of some of the most frequently used bioinformatics tools, such as the aligners bwa and bowtie. Introduced in 1994 by Burrows and Wheeler, the BWT takes advantage of the input string's repetitiveness for compression, while also allowing fast pattern matching in compressed space. BWT-based text indexes such as the FM-index and the more recent r-index are thus very effective, as much of current textual data is highly repetitive.
Applications of the BWT are, however, not restricted to the world of compressed text indexing. In this talk, I will explore several other contexts in which the BWT and the eBWT (its extension to multisets), have been put to use. These include BWT-based distance measures on strings and their applications in bioinformatics. I will sketch how to generate random de Bruijn sequences based on their BWT---resulting in the first practical algorithm that is capable of generating every k-order de Bruijn sequence with positive probability. We will also see how the BWT can be used to explain the differences between coexisting methods of text index construction for string collections, which have so far been believed to be equivalent.

Tetsuo Shibuya

University of Tokyo, Japan

Preserving Privacy in Biomedical Data with “More Efficient” Differentially Private Algorithms

Much biomedical data contains highly sensitive information. Especially, individual genome data, known as the ultimate identifier, is extremely sensitive. When we publish even the simplest statistics of biomedical databases, we need to be very careful to avoid leaking any individual sensitive information. Differential privacy is one of the most important measures to evaluate how safe the published data are from the viewpoint of privacy. We will discuss how to design efficient differentially private techniques for publishing biomedical data, especially the genome association study (GWAS) data. First, we will discuss differentially private methods for various GWAS statistics. We will also cover how to publish ranking information obtained by these statistics. We will also discuss how to publish the entire database to enable further complicated study without leaking much privacies. Another topic will be on how to efficiently handle more complicated databases such as graph databases.