UVa 11512

From Algorithmist
Jump to navigation Jump to search

11512 - GATTACA[edit]


Given a string (length <= 1000). Find the longest least lexicographical substring of the string that occurs at least twice.


Any algorithms that have complexity O(length^2) or lower can get accepted. One way is to use trie to enumerate all N suffixes of the string and keep the number of visitation for each node in the trie. To find the answer, a tree traversal will do. Find the deepest node in the trie that have at least 2 visitation count.

Another way is to use Suffix Array and Longest Common Prefix to find an adjacent strings in the Suffix Array that have the longest Longest Common Prefix. The Suffix Array and Longest Common Prefix can be constructed in O(N). Thus the overall algorithm complexity is O(N). The number of occurrence can be found while scanning through the Longest Common Prefix list.


  • The trie traversal should be done in lexicographical order: A,C,G,T


Trie O(length^2) implementations is as follows

#include <stdio.h>
#include <string.h>

#define REP(i,n) for (int i=0,_n=n; i<_n; i++)

#define MAXN 1000000

char s[MAXN], res[MAXN], *acgt = "ACGT";
int M[300], n, id, resLen, resCnt;
int to[MAXN][4], cnt[MAXN];

void clear(int i){
  cnt[i] = 0;
  REP(j,4) to[i][j] = -1;

void add(int i, char *x){
  if (!*x) return;
  int &ni = to[i][M[*x]];
  if (ni==-1) clear(ni = id++);
  add(ni, x+1);

void rec(int i, int depth=0){
  s[depth] = 0;
  if (depth > resLen && cnt[i] > 1){
    resLen = depth;
    resCnt = cnt[i];
  REP(j,4) if (to[i][j]!=-1){
    s[depth] = acgt[j];
    rec(to[i][j], depth+1);

int main(){
  M['A'] = 0;
  M['C'] = 1;
  M['G'] = 2;
  M['T'] = 3;
  while (n--){
    clear(0); id=1;
    REP(i,strlen(s)) add(0,s+i);
    resLen = resCnt = 0;
    if (resLen > 0){
      printf("%s %d\n",res,resCnt);
    } else{
      puts("No repetitions found!");