loops - Split a string every nth character into new files, iterating over the starting character - python
I asked about a more general approach to this problem in a previous post, but I'm getting stuck trying to parse the results out into individual files. I want to iterate over a long string, starting at position 1 (0 in Python), and print out the 100 characters from there. Then I want to move over 1 character, start at position 2 (1 in Python), and repeat the process until I reach the last 100 characters. I want to parse each 100-character chunk into a new file. Here is what I'm working with:
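The iteration described above can be sketched with plain slicing: a string of length L has L - n + 1 windows of length n, starting at indices 0 through L - n. This is a minimal sketch, assuming the sequence is already a single string with no newlines; `rolling_windows` is a hypothetical helper name, not something from the question.

```python
def rolling_windows(seq, n):
    """Return every length-n substring of seq, moving the start one character at a time."""
    if n <= 0 or n > len(seq):
        return []
    # Start positions run from 0 to len(seq) - n inclusive, hence the + 1.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


windows = rolling_windows("ACGTACGT", 5)
print(len(windows))  # 4
print(windows)       # ['ACGTA', 'CGTAC', 'GTACG', 'TACGT']
```

By the same arithmetic, a 7524-character string with n = 100 would give 7425 windows, which is the loop bound the question is after.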
    seq = 7524 # number from raw_input
    read_num = 100
    for raw_reads in range(100):
        def nlength_parts(seq,read_num):
            return map(''.join,zip(*[seq[i:] for i in range(read_num)]))
        f = open('read' + str(raw_reads), 'w')
        f.write("read" '\n')
        f.write(nlength_parts(seq,read_num))
        f.close
The error I'm getting is:

    f.write(nlength_parts(seq,read_num))
    TypeError: expected a character buffer object

I'm having issues with this, so any help is appreciated!
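The TypeError comes from the type of what is being written: in Python 2, `map` returns a list (in Python 3 it returns a lazy `map` iterator), while `file.write` accepts only a single string. The `zip(*[seq[i:] ...])` expression itself already produces the rolling windows, so one fix is to write the pieces one at a time. A sketch in Python 3 syntax; the sample sequence and the output file name `reads.txt` are made up for illustration.

```python
def nlength_parts(seq, read_num):
    # zip(seq[0:], seq[1:], ..., seq[read_num-1:]) lines up read_num shifted copies
    # of seq; each tuple it yields is one window, and zip stops at the shortest copy.
    return list(map(''.join, zip(*[seq[i:] for i in range(read_num)])))


parts = nlength_parts("ACGTACGT", 5)
print(type(parts), parts)  # a list of strings, not one string

# f.write(parts) would raise TypeError; write each string on its own line instead.
with open("reads.txt", "w") as f:
    for part in parts:
        f.write(part + "\n")
```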
After getting some help, I have made changes, but it's still not working properly:
    seq = 7524 # number from raw_input
    read_num = 100
    def nlength_parts(seq,read_num):
        return map(''.join,zip(*[seq[i:] for i in range(read_num)]))
    for raw_reads in range(100): # should be gene length - 100
        f = open('read' + str(raw_reads), 'w')
        f.write("read" + str(raw_reads))
        f.write(nlength_parts)
        f.close
I may have left out some important variables and definitions to keep the post short, which has caused confusion. I have pasted the entire code below.
    #! /usr/bin/env python
    import sys,os
    import random
    import string

    raw = raw_input("text file: ")
    with open(raw) as f:
        joined = "".join(line.strip() for line in f)
    f = open(raw + '.txt', 'w')
    f.write(joined)
    f.closed
    seq = str(joined)
    read_num = 100
    def nlength_parts(seq,read_num):
        return map(''.join,zip(*[seq[i:] for i in range(read_num)]))
    for raw_reads in range(100): # ideally I want range len(seq)-100
        f = open('read' + str(raw_reads), 'w')
        f.write("read" + str(raw_reads))
        f.write('\n')
        f.write(str(nlength_parts))
        f.close
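In the full script, several things go wrong inside the loop: `nlength_parts` is never called (so `f.write(nlength_parts)` passes the function object, and `str(nlength_parts)` writes text like `<function nlength_parts at 0x...>`), `f.close` without parentheses never closes the file, and `f.closed` only reads an attribute. A minimal corrected sketch follows. It assumes Python 3, takes the sequence as an argument instead of calling `raw_input`, and the helper name `write_reads` and its `out_dir` parameter are hypothetical; the file names keep the `read<N>` pattern from the question.

```python
import os
import tempfile


def write_reads(seq, read_num=100, out_dir="."):
    """Write each length-read_num window of seq to its own file and return the paths.

    File readN holds a header line 'readN' followed by the window starting at index N.
    """
    paths = []
    # One window per start index, from 0 through len(seq) - read_num.
    for start in range(len(seq) - read_num + 1):
        window = seq[start:start + read_num]
        path = os.path.join(out_dir, "read" + str(start))
        # The with block closes the file (the original 'f.close' lacked parentheses).
        with open(path, "w") as f:
            f.write("read" + str(start) + "\n")
            f.write(window + "\n")
        paths.append(path)
    return paths


# Example: a short made-up sequence, written into a fresh temporary directory.
out_dir = tempfile.mkdtemp()
written = write_reads("ACGTACGTAC", read_num=4, out_dir=out_dir)
print(len(written), "files written to", out_dir)
```

In the question's script this would replace the `for raw_reads in range(100)` loop, called as `write_reads(seq, read_num)` once `seq` has been built from the input file; the loop bound then follows the sequence length automatically.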
EDIT in response to the clarifying comment:
Essentially, you want a rolling window over the string. Take long_string = "012345678901234567890123456789...", with a total length of 100, and a window of 10:
    In [18]: long_string
    Out[18]: '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789'

    In [19]: window = 10

    In [20]: for i in range(len(long_string) - window + 1):
       ....:     chunk = long_string[i:i+window]
       ....:     print(chunk)
       ....:     with open('chunk_' + str(i+1) + '.txt','w') as f:
       ....:         f.write(chunk)
       ....:
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
    1234567890
    2345678901
    3456789012
    4567890123
    5678901234
    6789012345
    7890123456
    8901234567
    9012345678
    0123456789
Finally:

    In [21]: ls
    chunk_10.txt  chunk_20.txt  chunk_30.txt  chunk_40.txt  chunk_50.txt  chunk_60.txt  chunk_70.txt  chunk_80.txt  chunk_90.txt
    chunk_11.txt  chunk_21.txt  chunk_31.txt  chunk_41.txt  chunk_51.txt  chunk_61.txt  chunk_71.txt  chunk_81.txt  chunk_91.txt
    chunk_12.txt  chunk_22.txt  chunk_32.txt  chunk_42.txt  chunk_52.txt  chunk_62.txt  chunk_72.txt  chunk_82.txt  chunk_9.txt
    chunk_13.txt  chunk_23.txt  chunk_33.txt  chunk_43.txt  chunk_53.txt  chunk_63.txt  chunk_73.txt  chunk_83.txt
    chunk_14.txt  chunk_24.txt  chunk_34.txt  chunk_44.txt  chunk_54.txt  chunk_64.txt  chunk_74.txt  chunk_84.txt
    chunk_15.txt  chunk_25.txt  chunk_35.txt  chunk_45.txt  chunk_55.txt  chunk_65.txt  chunk_75.txt  chunk_85.txt
    chunk_16.txt  chunk_26.txt  chunk_36.txt  chunk_46.txt  chunk_56.txt  chunk_66.txt  chunk_76.txt  chunk_86.txt
    chunk_17.txt  chunk_27.txt  chunk_37.txt  chunk_47.txt  chunk_57.txt  chunk_67.txt  chunk_77.txt  chunk_87.txt
    chunk_18.txt  chunk_28.txt  chunk_38.txt  chunk_48.txt  chunk_58.txt  chunk_68.txt  chunk_78.txt  chunk_88.txt
    chunk_19.txt  chunk_29.txt  chunk_39.txt  chunk_49.txt  chunk_59.txt  chunk_69.txt  chunk_79.txt  chunk_89.txt
    chunk_1.txt   chunk_2.txt   chunk_3.txt   chunk_4.txt   chunk_5.txt   chunk_6.txt   chunk_7.txt   chunk_8.txt
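Because the file names are not zero-padded, `ls` sorts them as text (`chunk_1.txt`, `chunk_10.txt`, ..., `chunk_9.txt`) rather than in window order. If that matters, one option is to pad the number to a fixed width. This is a small variation on the loop above, not part of the original answer; the width is computed from the number of windows, and the files go to a temporary directory only so the example is self-contained.

```python
import os
import tempfile

long_string = "0123456789" * 10  # the same 100-character test string as above
window = 10
n_windows = len(long_string) - window + 1  # 91
width = len(str(n_windows))                # digits needed for the largest file number

out_dir = tempfile.mkdtemp()
for i in range(n_windows):
    chunk = long_string[i:i + window]
    # zfill pads with leading zeros, e.g. '1' -> '01' when width is 2.
    name = "chunk_" + str(i + 1).zfill(width) + ".txt"
    with open(os.path.join(out_dir, name), "w") as f:
        f.write(chunk)

names = sorted(os.listdir(out_dir))
print(names[:3], names[-1])  # ['chunk_01.txt', 'chunk_02.txt', 'chunk_03.txt'] chunk_91.txt
```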
Original response:

I would treat the string as a file. That lets you avoid slicing headaches, and it's pretty straightforward because the file API lets you "read" in chunks easily.
    In [1]: import io

    In [2]: long_string = 'a'*100 + 'b'*100 + 'c'*100 + 'e'*88

    In [3]: print(long_string)
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccceeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee

    In [4]: string_io = io.StringIO(long_string)

    In [5]: chunk = string_io.read(100)

    In [6]: chunk_no = 1

    In [7]: while chunk:
       ....:     print(chunk)
       ....:     with open('chunk_' + str(chunk_no) + '.txt','w') as f:
       ....:         f.write(chunk)
       ....:     chunk = string_io.read(100)
       ....:     chunk_no += 1
       ....:
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee
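The `while chunk:` loop can also be written with the two-argument form of the built-in `iter`, which keeps calling a function until it returns a sentinel value (here the empty string `''` that `read` returns once the buffer is exhausted). Note that this yields consecutive, non-overlapping chunks where the last one may be shorter, unlike the one-character-step rolling window in the edit above. A sketch; `fixed_chunks` is a hypothetical helper name.

```python
import io


def fixed_chunks(text, size):
    """Return an iterator over consecutive non-overlapping pieces of text, each at most size characters."""
    buf = io.StringIO(text)
    # iter(callable, sentinel) calls buf.read(size) until it returns '' (end of buffer).
    return iter(lambda: buf.read(size), '')


long_string = 'a' * 100 + 'b' * 100 + 'c' * 100 + 'e' * 88
chunks = list(fixed_chunks(long_string, 100))
print([len(c) for c in chunks])  # [100, 100, 100, 88]
```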
Note: I'm using the IPython terminal, so I can use terminal commands like ls and cat inside the interpreter session!
    In [8]: ls chunk*
    chunk_1.txt  chunk_2.txt  chunk_3.txt  chunk_4.txt

    In [9]: cat chunk_1.txt
    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

    In [10]: cat chunk_2.txt
    bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

    In [11]: cat chunk_3.txt
    cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc

    In [12]: cat chunk_4.txt
    eeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeeee