Utility: Preprocessing text with spaCy

Mon Sep 05 2022 09:41:21 GMT+0000 (Coordinated Universal Time)

Saved by @DataSynapse82 #python #spacy #nlp #preprocessing

# Utility function for pre-processing text
import spacy

# Load the small English pipeline and create an nlp object from it
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

def preprocess(text):
    # Drop stop words, punctuation and whitespace tokens; lemmatize and lowercase the rest
    doc = nlp(text)
    filtered_tokens = []
    for token in doc:
        if token.is_stop or token.is_punct or token.is_space:
            continue
        filtered_tokens.append(token.lemma_.lower())

    return " ".join(filtered_tokens)

# Assumes df is an existing pandas DataFrame with a 'Text' column of strings
df['preprocessed_txt'] = df['Text'].apply(preprocess)

Preprocesses text with the spaCy package (stop-word removal, lemmatization, lowercasing, punctuation cleaning) for NLP tasks on a pandas DataFrame column.
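On a large DataFrame, calling nlp() once per row can be slow. Below is a minimal sketch of a batched variant, assuming spaCy v3: it streams the texts through nlp.pipe, applies the same kind of filtering, and falls back to the lowercased token text when the pipeline has no lemmatizer (for example spacy.blank("en")). The name preprocess_batch and the batch_size default are illustrative assumptions, not part of the original snippet.

```python
import spacy
from spacy.language import Language
from typing import Iterable, List


# Hypothetical batched helper (not part of the original snippet); assumes spaCy v3.
def preprocess_batch(texts: Iterable[str], nlp: Language, batch_size: int = 1000) -> List[str]:
    results = []
    # nlp.pipe processes the texts in batches instead of one nlp() call per row
    for doc in nlp.pipe(texts, batch_size=batch_size):
        tokens = [
            # Use the lemma, or the token text if no lemmatizer set one, lowercased
            (token.lemma_ or token.text).lower()
            for token in doc
            # Drop stop words, punctuation and whitespace tokens
            if not (token.is_stop or token.is_punct or token.is_space)
        ]
        results.append(" ".join(tokens))
    return results
```

Because nlp.pipe accepts any iterable of strings, a pandas Series can be passed directly, e.g. df['preprocessed_txt'] = preprocess_batch(df['Text'], nlp). Loading the model with spacy.load("en_core_web_sm", disable=["parser", "ner"]) should speed this up further, since lemmatization in that pipeline relies on the tagger and attribute ruler rather than the parser or named-entity recognizer.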