CPM 2014

25th Annual Symposium on Combinatorial Pattern Matching

Moscow, Russia, June 16-18, 2014



Local information



Keynote speakers

Alberto Apostolico
(Georgia Tech, USA, and IASI-CNR, Italy)

Sequence Comparison in the Time of the Deluge
It is almost fifty years since the appearance of the famous paper entitled "Binary codes capable of correcting deletions, insertions, and reversals" (1966 English translation, Soviet Physics Doklady) by which Vladimir Levenshtein introduced his eponymous measure of string distance. The notion was since to be re-discovered, variously dithered and more or less efficiently computed in distant domains of application. In particular, it provided the platform for half  a century of analysis, alignment, search, taxonomy and phylogeny of molecular sequences. As the size and multiplicity of the sequences produced expand to an unprecedented scale, many elegant techniques of the past no longer work.  In molecular taxonomy and phylogeny, for instance, the alignment of whole genomes proves both computationally unbearable and hardly significant. In metagenomics, the elucidation of microbiome compositions under conditions of noisy and largely unidentified reference sequences faces steep barriers during assembly and assignment. In recent studies, classical notions are increasingly being complemented or even supplanted by global similarity measures that refer, implicitly or explicitly, to the composition of sequences in terms of their constituent patterns.  Such measures hinge more or less uniformly on an underlying notion of relative compressibility, whereby two sequences are similar if either one can be described using mostly pieces from the other. They can free from chores of alignment and assembly. Their computation poses interesting and variously affordable algorithmic problems. This talk will review some such measures, their computation, and their applications. 

Maxime Crochemore
(King's College London, UK)

Repeats in Strings
Large amounts of text are generated every day in the cyberspacevia Web sites, emails, social networks, and other communicationnetworks. These text streams need to be analysed to detect criticalevents or the monitor business for example.An important characteristics to take into account in this settingis the existence of repetitions in texts. Their study constitutesa fundamental area of combinatorics on words due to major applicationsto string algorithms, data compression, music analysis, and biologicalsequences analysis, etc.The talk surveys algorithmic methods used to locate repetitivesegments in strings. It discusses the notion of runs that encompassesvarious types of periodicities considered by different authors, as wellas the notion of maximal-exponent factors that captures the mostsignificant repeats occurring in a string.The design and analysis of repeat finders rely on combinatorialproperties of words and raise a series of open problems incombinatorics on words.

Udi Manber
(Google, USA)

How to Think Big
"If you have to ask, you'll never know"
Louis Armstrong (answering a question
about the meaning of "swing")

In business, making more impact used to mean making more money.  But that changed.  In 2010 Twitter's revenues were 40x less than cat litter, but its impact was enormous. Same with Amazon, Google, and Facebook at their beginnings. The key to all of them was that they were doing completely new things. No one thought there was a need to share 140 characters before Twitter, but it turned out to be extremely useful. Very few people thought at the time that shopping on the web, highly relevant search, or communication between friends were important business needs.   

When the telephone was introduced to telegraph companies they discounted it. They had a good reason. More than a century later AT&T discounted voice over IP for similar reasons. Yet the 100+ years old AT&T was sold for less than the 5 years old WhatsApp. Things change rather quickly nowadays.

There are some common insights from these stories that can be applied to academic research and I will try to highlight them in this talk.  

Gene Myers  
(Max Planck Institute, Germany)

What's Behind BLAST
While BLAST is one of the most widely used search engines for molecular biology, and while there is a general understanding of how it works, few know the story of how it came about and the theoretical algorithmic result from which it was derived.  I will tell the story and explain the theoretical algorithm - an O(DNpow(D/P)log N) expected-time algorithm for finding all matches to a query of length P with not more than D differences in a database of length N » P. Surprisingly, this result, published in early 1994, has to my knowledge not been improved upon to this day.