Pull out information from the last line of an if/else statement within a for loop (Python)


I don't think this is possible, but I figured I'd ask just in case. I'm trying to write a memory-efficient Python program for parsing files that are typically 100+ GB in size. I'm trying to use a for loop to read in each line, split it on various characters multiple times, and write the result, all within the same loop.

The trick is that the file has lines that start with "#" which are not important, except for the last line starting with "#", which is the header of the file. I want to be able to pull information from that last line because it contains the sample names.

    for line in seqfile:
        line = line.rstrip()
        if line.startswith("#"):
            continue  # (unless this is the last line starting with "#")
            samplenames = lastline[8:-1]
            newheader.write(new header with sample names)
        else:
            columns = line.split("\t")
            # more splitting
            # write

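If that is not possible, the only other alternative I can think of is to store the lines starting with # (which can still be 5 GB in size), then go back and write the header at the beginning of the output file, which I believe can't be done directly. If there is a way to do this memory-efficiently, that would be nice.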

Any help is appreciated.

Thank you

If you want the index of the last line starting with #, read through the file once using takewhile, consuming lines until you hit the first line not starting with #, then seek back and use itertools.islice to get the line:

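    from itertools import takewhile, islice

    with open(file) as f:
        # count the leading "#" lines; the last one is the header (0-based index)
        start = sum(1 for _ in takewhile(lambda x: x.startswith("#"), f)) - 1
        f.seek(0)  # go back to the top of the file
        data = next(islice(f, start, start + 1))
        print(data)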

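The first argument to takewhile is a predicate: as long as the predicate is true, takewhile yields elements from the iterable passed as the second argument. Because a file object is its own iterator, once we consume the takewhile object with sum, the file pointer has already moved past the header line you want (takewhile also reads, and discards, the first line that fails the predicate), so it is just a matter of seeking back and getting the line with islice. You could also seek back a shorter distance instead of to the start, take a few lines with islice, and filter until you reach the last line starting with #.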
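
A small demonstration of the point about takewhile consuming one extra element, using an in-memory list iterator as a stand-in for the file (the sample lines are made up for illustration):

```python
from itertools import takewhile

# stand-in for a file object: any iterator of lines behaves the same way here
lines = iter(["###\n", "##\n", "# header\n", "blah\n", "more\n"])

headers = list(takewhile(lambda s: s.startswith("#"), lines))
after = next(lines)

print(headers)      # ['###\n', '##\n', '# header\n']
print(repr(after))  # 'more\n'  ("blah\n" was read and discarded by takewhile)
```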

File:

    ###
    ##
    # header
    blah
    blah
    blah

Output:

    # header
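
If the header lines are all at the top, as in this example, a variation that avoids seeking entirely is to feed takewhile into a deque with maxlen=1, which keeps only the most recent "#" line. This is a sketch under that assumption; last_header is a hypothetical name, and io.StringIO stands in for the real file:

```python
import io
from collections import deque
from itertools import takewhile

def last_header(f):
    # deque(maxlen=1) discards older items, so memory use stays constant
    last = deque(takewhile(lambda s: s.startswith("#"), f), maxlen=1)
    return last[0] if last else None

sample = io.StringIO("###\n##\n# header\nblah\nblah\nblah\n")
result = last_header(sample)
print(repr(result))  # '# header\n'
```

Note that the file is left positioned after the first data line, which takewhile has already consumed.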

If the line could be anywhere in the file, the only memory-efficient way I can think of is to read the file once, updating an index variable every time you see a line starting with #, then pass that index to islice as in the code above, or use linecache.getline:

    import linecache

    with open(file) as f:
        index = None
        for ind, line in enumerate(f, 1):
            if line.startswith("#"):
                index = ind  # remember the (1-based) number of the latest "#" line
        data = linecache.getline(file, index)
        print(data)

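We use a starting index of 1 for enumerate because getline counts lines starting at 1. Be aware that linecache reads and caches the entire file on first access, so for a 100+ GB file the islice approach is the memory-efficient one.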
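
The index found on the first pass can instead be handed to islice on a second pass, which only ever holds one line in memory. A minimal sketch, assuming a seekable file object; last_hash_line is a hypothetical name and io.StringIO stands in for a real open file:

```python
import io
from itertools import islice

def last_hash_line(f):
    """Two passes over a seekable file: find the index, then fetch that line."""
    index = None
    for ind, line in enumerate(f):  # 0-based, which is what islice expects
        if line.startswith("#"):
            index = ind
    if index is None:
        return None
    f.seek(0)  # second pass from the top
    return next(islice(f, index, index + 1))

sample = io.StringIO("#a\nx\n#b\ny\n")
result = last_hash_line(sample)
print(repr(result))  # '#b\n'
```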

Or, if you only want that particular line and don't care about its position or the other lines, update a variable data to hold each line starting with #:

    with open(file) as f:
        data = None
        for line in f:
            if line.startswith("#"):
                data = line
        print(data)  # last occurrence of a line starting with "#"
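
If, as in the example file, all the "#" lines come before the data, the same "keep overwriting a variable" idea can be folded into the question's single processing loop, because the header is known the moment the first non-"#" line appears. A minimal sketch, assuming tab-separated data; convert is a hypothetical name, and writing the header unchanged and re-joining columns with tabs are placeholders for the real header construction and splitting:

```python
import io

def convert(infile, outfile):
    """Single pass; assumes every '#' line precedes all data lines."""
    header = None        # most recent line starting with "#"
    header_done = False
    for line in infile:
        line = line.rstrip("\n")
        if line.startswith("#"):
            header = line  # keep overwriting; only the last one survives
            continue
        if not header_done:
            # first data line reached, so `header` is the real header line
            if header is not None:
                outfile.write(header + "\n")  # placeholder: build the new header here
            header_done = True
        columns = line.split("\t")  # placeholder: further splitting goes here
        outfile.write("\t".join(columns) + "\n")
    if not header_done and header is not None:
        outfile.write(header + "\n")  # headers but no data lines

src = io.StringIO("##meta\n#CHROM\tPOS\tS1\n1\t100\tA\n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```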

Or use file.tell, keeping track of the previous pointer location, then seek back and call next on the file object to get the line or lines you want:

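    with open(file) as f:
        curr_tell, prev_tell = None, 0  # prev_tell: offset where the current line starts
        # iter(f.readline, "") instead of "for line in f": text-mode files
        # do not allow f.tell() during for-loop iteration
        for line in iter(f.readline, ""):
            if line.startswith("#"):
                curr_tell = prev_tell
            prev_tell = f.tell()
        f.seek(curr_tell)
        data = next(f)
        print(data)  # "# header" for the example file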
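
In Python 3, calling tell() on a text-mode file inside a plain for loop raises OSError, which is why the readline trick is needed there. Files opened in binary mode ("rb") have no such restriction, so a plain for loop works. A sketch under that assumption; last_hash_offset is a hypothetical name and io.BytesIO stands in for a file opened with open(path, "rb"):

```python
import io

def last_hash_offset(f):
    """Return the byte offset where the last line starting with b'#' begins."""
    offset = None
    pos = f.tell()  # start of the line about to be read
    for line in f:  # tell() stays valid during iteration in binary mode
        if line.startswith(b"#"):
            offset = pos
        pos = f.tell()  # start of the next line
    return offset

data = io.BytesIO(b"###\n##\n# header\nblah\nblah\n")
off = last_hash_offset(data)
data.seek(off)
header = data.readline().decode()
print(off, repr(header))  # 7 '# header\n'
```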

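There is also the consume recipe from the itertools documentation, which you can use to advance the file iterator to the line just before the header (its 1-based index minus 1) and then call next on the file object: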

    import collections
    from itertools import islice

    def consume(iterator, n):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
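
A usage sketch combining the recipe with the takewhile count from the first answer; the recipe is repeated so the block runs on its own, and io.StringIO stands in for the real file:

```python
import collections
import io
from itertools import islice, takewhile

def consume(iterator, n):
    # itertools "consume" recipe
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

f = io.StringIO("###\n##\n# header\nblah\nblah\n")
# number of leading "#" lines minus 1 = number of lines before the header
skip = sum(1 for _ in takewhile(lambda s: s.startswith("#"), f)) - 1
f.seek(0)
consume(f, skip)  # skip the lines before the header
header = next(f)
print(repr(header))  # '# header\n'
```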
