logging - Algorithm to find record in a log structured file system -


i have log of records. each record has id , timestamp. new records appended log monotonically increasing id, though there can deletion of records in between.

the problem - if given timestamp t1, provide efficient algorithm determine log record has timestamp = ceil(t1).

points note. log can big millions of records. there can missing records because of record deletion.

example: if log record = (id, timestamp), log can shown this:

(1, 10), (2, 11), (5, 15), (8,18), (9, 19), (10, 20)

find id of record min timestamp greater or equal 17.

answer 8.

find id of record min timestamp greater or equal 11.

answer 2.

find id of record min timestamp greater or equal 22.

answer nil

find id of record min timestamp greater or equal 5.

answer 1

i have come simple data-structures solve problem.

/* index:    0  1   2   3   4  5  6   7   8  9  10   11   12  */    int ids[]=     {1,  2,            6,  7,        10,  11,  12};  int map[]=  {0, 1,  1,  0,  0, 0, 1,  1,  0, 0, 1,   1,   1};  int time[]= {0, 10, 20, 0,  0, 0, 60, 70, 0, 0, 100, 110, 120};  int start= 1, end = 12;  // known us.

ids[] list of ids. given list can millions, cannot index list in array. in above example, ids 3, 4, 5, 8, 9 missing. list in increasing order.

map[] bitmap can tell if given id present or not. cheap operation.

time[] timestamp array each of ids present. remember assume getting timestamp expensive operation.

also there no co-relation between id's , timestamp value. instead of 10, 20, etc 1133, 2987..and on. in increasing order.

you need fill function:

int  find_ceil_id(int timestamp)   {    .....    ....    return(id)  }

if function being kept in memory between searches, take advantage of previous searches narrow down future searches. if getting timestamps expensive, significant improvement. if in memory in array have in post, moot point. assuming timestamps unique, start solution:

int findfirstnonzero(int startidx) {     int myidx=startidx;     while (map[myidx] == 0)     {         myidx++;     }     return(myidx); }  int findlastnonzero(int startidx) {     int myidx=startidx;     while (map[myidx] == 0)     {         myidx--;     }     return(myidx); }  int find_ceil_id(int timestamp)  {     int low=findfirstnonzero(0);     int high=findlastnonzero(map.count -1);     int checkindex = findlastnonzero((low + high)/2);     int checktime;      while (low < findlastnonzero(high - 1))     {         checktime = time[checkindex];         if (checktime >= timestamp) {             high = checkindex;         } else {             low = checkindex;         }         checkindex = findlastnonzero((low+high) / 2);         if (checkindex == low) {             checkindex = findfirstnonzero(low+1);         }     }     return (high); } 

from of comments, seems timestamps can repeat. so? if so, require minor change above... never mind, made change , made code simpler.

basic binary search , find right element in log2(n) tries, n being size of array of id's. on million entries in id array, need examine 20 of them @ right one. on billion entries, need examine 30.

not compiling , testing code, that's you. should work.


Comments

Popular posts from this blog

python - TypeError: start must be a integer -

c# - DevExpress RepositoryItemComboBox BackColor property ignored -

django - Creating multiple model instances in DRF3 -