Utility: Preprocessing text with spaCy
Mon Sep 05 2022 09:41:21 GMT+0000 (Coordinated Universal Time)
Saved by
@DataSynapse82
#python
#spacy
#nlp
#preprocessing
### utility function for pre-processing the text
import spacy

# load the English language model and create the nlp object from it
nlp = spacy.load("en_core_web_sm")

def preprocess(text):
    # remove stop words and punctuation, then lemmatize the remaining tokens
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        if token.is_stop or token.is_punct:
            continue
        filtered_tokens.append(token.lemma_)
    return " ".join(filtered_tokens)

# df is assumed to be an existing pandas DataFrame with a 'Text' column
df['preprocessed_txt'] = df['Text'].apply(preprocess)
Preprocessing text with the spaCy package (stop-word removal, punctuation removal, lemmatization) for NLP tasks on a pandas DataFrame column.
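On a large DataFrame, running the pipeline once per row can be slow. A possible variant, sketched here as an assumption rather than as part of the original snippet, streams the texts through spaCy's `nlp.pipe`, which processes them in batches. The helper name `preprocess_batch` and the `batch_size` default are hypothetical choices made for illustration.

```python
from typing import Iterable, List


def preprocess_batch(nlp, texts: Iterable[str], batch_size: int = 256) -> List[str]:
    """Drop stop words and punctuation, lemmatize the rest, one string per input text.

    `nlp` is expected to be a loaded spaCy pipeline (e.g. the one from
    spacy.load("en_core_web_sm")); `batch_size` is an arbitrary default.
    """
    cleaned = []
    # nlp.pipe yields one Doc per input text, in the same order as the input
    for doc in nlp.pipe(texts, batch_size=batch_size):
        lemmas = [tok.lemma_ for tok in doc if not (tok.is_stop or tok.is_punct)]
        cleaned.append(" ".join(lemmas))
    return cleaned


# Usage, assuming nlp and df already exist as in the snippet:
# df['preprocessed_txt'] = preprocess_batch(nlp, df['Text'])
```

The filtering logic is the same as in `preprocess`; only the way texts are fed to the pipeline changes, so the results should match the row-by-row version.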