I have a file consisting of stop words (each on a new line) and another file (a corpus, actually) consisting of a lot of sentences, each on a new line. I have to delete the stop words from the corpus and return each line of it without the stop words. The code I wrote returns only one sentence. (The language is Persian.) How can I fix it so that it returns all of the sentences?
    with open("stopwords.txt", encoding="utf-8") as f1, open("train.txt", encoding="utf-8") as f2:
        for i in f1:
            for line in f2:
                if i in line:
                    line = line.replace(i, "")
    with open("nostopwordstrain.txt", "w", encoding="utf-8") as f3:
        f3.write(line)
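The problem is that the last two lines of your code are not inside the loop. You iterate through the whole of f2 line by line but do nothing with each line; then, after the loop has finished, you write only the last line to f3. There is a second problem: f2 is a file iterator, so the inner loop consumes it completely for the first stop word and has nothing left to read for every later one. Instead, loop over the corpus lines on the outside and the stop words on the inside, and write each line as soon as it has been cleaned. Try: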
    with open("stopwords.txt", encoding="utf-8") as stopfile:
        # make a convenient list; strip the trailing newline from each stop word
        stopwords = [w.strip() for w in stopfile if w.strip()]
    print(stopwords)  # check the words

    with open("train.txt", encoding="utf-8") as trainfile, \
         open("nostopwordstrain.txt", "w", encoding="utf-8") as newfile:
        for line in trainfile:  # go through each line
            for word in stopwords:  # go through and replace each word
                line = line.replace(word, "")
            newfile.write(line)  # write every cleaned line, inside the loop
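One caveat with `str.replace`: it removes a stop word wherever it occurs as a substring, including inside longer words, and it leaves extra spaces behind. If you only want to remove whole words, a minimal sketch is to split each line on whitespace and keep the tokens that are not in a set of stop words. The function names `remove_stopwords` and `clean_corpus` are hypothetical, and the sketch assumes words are separated by whitespace; punctuation attached to a word (e.g. `است.`) is not split off, so such tokens will not match a stop word.

```python
from typing import Set


def remove_stopwords(line: str, stopwords: Set[str]) -> str:
    """Drop whole tokens that are stop words; other words are left intact."""
    # str.split() with no argument splits on any run of whitespace
    # and also discards the trailing newline
    kept = [token for token in line.split() if token not in stopwords]
    return " ".join(kept)


def clean_corpus(stopwords_path: str, corpus_path: str, out_path: str) -> None:
    """Write a copy of the corpus with stop words removed, one line per input line."""
    with open(stopwords_path, encoding="utf-8") as f:
        # one stop word per line; skip blank lines
        stopwords = {w.strip() for w in f if w.strip()}
    with open(corpus_path, encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            # re-add the newline that split() removed, so line count is preserved
            dst.write(remove_stopwords(line, stopwords) + "\n")
```

For example, `remove_stopwords("این یک کتاب است", {"این", "است"})` returns `"یک کتاب"`. Storing the stop words in a set makes each membership check an average constant-time lookup instead of a scan over the whole list.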