all_words2 = []
for sentence in all_words:
    all_words2.append("".join(sentence))
print(all_words2)
2. Form 2: join the strings in each list into a single string (suitable as input for one-hot and TF-IDF vectorizers)
# Convert the list of lists into a list of strings, the format CountVectorizer expects
all_data_str = []
for words in all_data:
    # Join the words of one sample with single spaces
    all_data_str.append(' '.join(words))
print(all_data_str[:2])
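To make the target format concrete, here is a minimal sketch on made-up data. The names `sample_data` and `bag_of_words` are hypothetical and not from the original post, and the plain-Python counter is only a stand-in for a vectorizer, not how CountVectorizer is implemented. It shows that each sample ends up as one whitespace-separated string that can be tokenized again by splitting:

```python
from collections import Counter

# Hypothetical sample data: a list of token lists, shaped like all_data
sample_data = [["i", "like", "apples"], ["apples", "are", "sweet"]]

# Join each token list into one space-separated string
sample_str = [" ".join(words) for words in sample_data]

def bag_of_words(docs):
    """Count whitespace-separated tokens per document (illustrative stand-in)."""
    # Vocabulary: every distinct token, in sorted order
    vocab = sorted({tok for doc in docs for tok in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        # One count per vocabulary entry; missing tokens count as 0
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows

vocab, rows = bag_of_words(sample_str)
print(sample_str)
print(vocab)
print(rows)
```

Note that scikit-learn's CountVectorizer, with its default token pattern, keeps only tokens of two or more characters, so single-character words are dropped unless `token_pattern` is changed.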
3. A more complex form, which sometimes appears after a pd.DataFrame is saved and then read back
Step 1 of the conversion:
B = A[0:5]
all_words = []
for i in range(len(B)):
    # Strip the surrounding brackets, drop quotes and commas, then split on newlines
    sentence = B[i].strip("[]").replace("'", "").replace(",", "").split("\n")
    cur_words = []
    for word in sentence:
        cur_words.append(word)
    all_words.append(cur_words)
print(all_words)
Step 2 of the conversion:
all_words2 = []
for sentence in all_words:
    all_words2.append("".join(sentence))
print(all_words2)
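If the column was written out as the Python repr of a list (for example "['a', 'b']"), one alternative to stripping characters by hand is to parse each cell with `ast.literal_eval` from the standard library. This is a different technique from the two steps above, shown only as a sketch: `raw_cells` is made-up sample data, and the code assumes every cell is a valid list literal.

```python
import ast

# Hypothetical cells as they might look after a save/read round trip
raw_cells = ["['I', 'like', 'apples']", "['apples', 'are', 'sweet']"]

# Parse each string back into a real Python list
parsed = [ast.literal_eval(cell) for cell in raw_cells]

# Join each recovered list into one space-separated string
joined = [" ".join(words) for words in parsed]

print(parsed)
print(joined)
```

`ast.literal_eval` only evaluates literals, so it is safer than `eval`, but it raises an exception on cells that are not well-formed literals; the manual approach above may still be needed for those.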