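Original file
Inspecting the file shows that we only need to remove the timestamps, the index numbers, and the newline characters '\n'.
This assumes that no dialogue line begins with one of the digits 0123456789.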
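new = []
# utf-8-sig drops the byte order mark that many .srt files start with
with open("Anne.S01E01.Your Will Shall Decide Your Destiny.WEBRip.x264-RARBG.mp4.srt", encoding="utf-8-sig") as f:
    for ele in f:
        # blank lines start with '\n'; index and timestamp lines start with a digit
        if ele[0] not in '\n0123456789':
            new.append(ele)
# remove the trailing newline and surrounding whitespace from each kept line
new = [ele.strip() for ele in new]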
The result is shown in the figure.
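The first-character check drops any dialogue line that happens to start with a digit (for example "12 years old."). A minimal sketch of an alternative, assuming the usual SRT layout of index line, timestamp line, then text, with cues separated by blank lines: keep only what follows the timestamp line in each cue. The names `TIMESTAMP`, `extract_dialogue`, and `sample` are illustrative and not part of the original script.

```python
import re

# Matches an SRT timestamp line such as "00:00:01,000 --> 00:00:04,000".
TIMESTAMP = re.compile(
    r"\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)


def extract_dialogue(srt_text):
    """Return the dialogue lines of an SRT string, without indices or timestamps."""
    lines = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # Everything after the timestamp row of a cue is dialogue.
        for i, row in enumerate(rows):
            if TIMESTAMP.match(row):
                lines.extend(rows[i + 1:])
                break
    return lines


# Made-up sample data, only to exercise the function.
sample = (
    "1\n00:00:01,000 --> 00:00:03,000\n12 years old.\n\n"
    "2\n00:00:04,000 --> 00:00:06,000\nHello, Anne.\nWelcome.\n"
)
print(extract_dialogue(sample))
```

To use it on the real file, read the whole file with `f.read()` instead of iterating over lines and pass the resulting string to `extract_dialogue`.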
To save the result as a CSV file:
import pandas as pd
# one dialogue line per row, in a single column named "text"
df = pd.DataFrame(data=new, columns=["text"])
df.to_csv("./text.csv", encoding="utf-8")
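If pandas is not available, the standard-library `csv` module can write the same single-column file. The following is a rough sketch rather than a drop-in replacement: `save_lines`, `demo`, and `out_path` are hypothetical names, the demo data stands in for the `new` list, and unlike `DataFrame.to_csv` with its default settings, this version writes no index column.

```python
import csv
import os
import tempfile


def save_lines(lines, path):
    """Write one dialogue line per row under a single "text" header."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        writer.writerows([line] for line in lines)


# Hypothetical sample data; in the script above this would be the `new` list.
demo = ["Hello, Anne.", "Welcome."]
out_path = os.path.join(tempfile.gettempdir(), "text_demo.csv")
save_lines(demo, out_path)

# Read the file back to check that commas inside a line survive the round trip.
with open(out_path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Lines that contain commas are quoted automatically by `csv.writer`, so they stay in one cell when the file is read back.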