Pull out information from the last line matched by an if/else statement within a for loop in Python
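I don't think this is possible, but I figured I would ask just in case. I am trying to write a memory-efficient Python program for parsing files that are typically 100+ GB in size. I am trying to use a for loop to read in a line, split it on various characters multiple times, and write it out, all within the same loop.
The trick is that the file has a lot of lines that start with "#". These are not important, except for the last line that starts with "#", which is the header of the file. I want to be able to pull information from that last line because it contains the sample names.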
    for line in seqfile:
        line = line.rstrip()
        if line.startswith("#"):
            continue  # (unless it is the last line that starts with "#")
            # samplenames = lastline[8:-1]
            # newheader.write(new header with sample names)
        else:
            columns = line.split("\t")
            # more splitting
            # write
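If this is not possible, the only other alternative I can think of is to store the lines starting with "#" (which can still be 5 GB in size), then go back and write them to the beginning of the file. I believe that can't be done directly, but if there is a way to do it memory-efficiently, that would be nice.
Any help would be greatly appreciated.
Thank you.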
If you want the index of the last line starting with #, read the file once using takewhile, consuming lines until you hit the first line not starting with #, then seek back and use itertools.islice to get the line:
    from itertools import takewhile, islice

    with open(file) as f:
        # count the leading "#" lines; the last one has 0-based index count - 1
        start = sum(1 for _ in takewhile(lambda x: x[0] == "#", f)) - 1
        f.seek(0)
        data = next(islice(f, start, start + 1))
        print(data)
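The first argument to takewhile is a predicate. While the predicate is true, takewhile takes elements from the iterable passed in as the second argument. Because a file object returns its own iterator, once we consume the takewhile object using sum, the file iterator has advanced just past the header line you want (takewhile also reads the first line that fails the predicate), so it is just a matter of seeking and getting the line with islice. You can also seek back less far if you only want to return to the previous line, then take a few lines with islice and filter them until you reach the last line starting with #.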
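As a small illustration of how takewhile behaves on a file-like iterator, here is a sketch that uses an in-memory io.StringIO in place of a real file (the sample contents are made up). It shows that takewhile also consumes the first line that fails the predicate, which is one reason to seek back before picking out the header.

```python
import io
from itertools import takewhile

# Made-up sample data standing in for a real file.
f = io.StringIO("###\n##\n# header\nblah1\nblah2\n")

# Count the leading lines that start with "#".
n_comment_lines = sum(1 for _ in takewhile(lambda x: x.startswith("#"), f))

# takewhile had to read "blah1" to see the predicate fail, so that line
# is already consumed; the next line from the iterator is "blah2".
next_line = next(f)
```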
File:
    ###
    ##
    # header
    blah
    blah
    blah
Output:
    # header
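If the line could be anywhere in the file, the only memory-efficient way I can think of is to read the file once, updating an index variable each time you hit a line starting with #. You can then pass that index to islice as in the answer above, or use linecache.getline. Be aware that linecache reads and caches all of the file's lines in memory, so islice is the safer choice for very large files: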
    import linecache

    with open(file) as f:
        index = None
        for ind, line in enumerate(f, 1):
            if line[0] == "#":
                index = ind  # 1-based number of the latest "#" line
    data = linecache.getline(file, index)
    print(data)
We use a starting index of 1 with enumerate because getline counts lines starting from 1.
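The islice variant of this two-pass idea could look like the following minimal sketch. The helper name last_hash_line is hypothetical. It counts from 0 because islice is zero-based, and it reads the file twice but never holds more than one line in memory.

```python
from itertools import islice

def last_hash_line(path):
    """Return the last line starting with '#', or None if there is none."""
    index = None
    with open(path) as f:
        # First pass: remember the 0-based index of the last "#" line.
        for ind, line in enumerate(f):
            if line.startswith("#"):
                index = ind
        if index is None:
            return None
        # Second pass: rewind and skip straight to that line.
        f.seek(0)
        return next(islice(f, index, index + 1))
```

Calling last_hash_line on a path whose file contains no "#" lines returns None rather than raising.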
Or, if you only want that particular line and don't care about its position or the other lines, just update a variable data that holds each line starting with # as you go:
    with open(file) as f:
        data = None
        for line in f:
            if line[0] == "#":
                data = line
    print(data)  # last occurrence of a line starting with `#`
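Applied to the loop in the question, and assuming all of the "#" lines sit at the top of the file, the same idea gives a single-pass sketch: keep the most recent "#" line and write it out the moment the first data line shows up. The function name convert, the sample data, and the comma-joined output are illustrative assumptions, not part of the original post.

```python
import io

def convert(seqfile, out):
    """Copy data rows to out, preceded by the last '#' line (the header)."""
    header = None
    header_written = False
    for line in seqfile:
        line = line.rstrip("\n")
        if line.startswith("#"):
            header = line  # overwritten until the last "#" line is reached
            continue
        if not header_written and header is not None:
            out.write(header + "\n")  # the previous "#" line was the header
            header_written = True
        columns = line.split("\t")
        out.write(",".join(columns) + "\n")  # placeholder for real processing

# Made-up input standing in for a large file.
src = io.StringIO("##meta\n#CHROM\tS1\tS2\n1\ta\tb\n2\tc\td\n")
dst = io.StringIO()
convert(src, dst)
result = dst.getvalue()
```

Only one "#" line is held at a time, so memory use stays flat however many comment lines precede the header.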
Or use file.tell, keeping track of the previous pointer location, then seek to it and call next on the file object to get the line or lines you want:
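    with open(file) as f:
        # prev_tell starts at 0 so a "#" on the very first line also works
        curr_tell, prev_tell = None, 0
        # readline instead of "for line in f": tell() is disabled while
        # iterating over a text file directly
        for line in iter(f.readline, ""):
            if line[0] == "#":
                curr_tell = prev_tell  # offset where this "#" line begins
            prev_tell = f.tell()
        f.seek(curr_tell)
        data = next(f)
        print(data)  # header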
There is also the consume recipe from the itertools documentation, which you could use to advance the file iterator by your header line index - 1 lines and then simply call next on the file object:
    import collections
    from itertools import islice

    def consume(iterator, n):
        "Advance the iterator n-steps ahead. If n is None, consume entirely."
        # Use functions that consume iterators at C speed.
        if n is None:
            # feed the entire iterator into a zero-length deque
            collections.deque(iterator, maxlen=0)
        else:
            # advance to the empty slice starting at position n
            next(islice(iterator, n, n), None)
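A usage sketch for the recipe, assuming the position of the header line is already known (here it is hard-coded for made-up data held in an io.StringIO). With a 1-based line number as in the enumerate example, you skip index - 1 lines, and next then returns the header.

```python
import collections
import io
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    if n is None:
        collections.deque(iterator, maxlen=0)
    else:
        next(islice(iterator, n, n), None)

f = io.StringIO("###\n##\n# header\nblah\n")  # made-up sample data
header_line_number = 3                # 1-based, assumed already known
consume(f, header_line_number - 1)    # skip the two lines before the header
data = next(f)                        # the header line itself
```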