2. Import NewsGroups Dataset


Thu Jun 03 2021 17:07:09 GMT+0000 (Coordinated Universal Time)

Saved by @randomize_first

import pandas as pd

# Import Dataset
df = pd.read_json('https://raw.githubusercontent.com/selva86/datasets/master/newsgroups.json')

# Keep only the rows belonging to the four selected newsgroups
df = df.loc[df.target_names.isin(['soc.religion.christian', 'rec.sport.hockey', 'talk.politics.mideast', 'rec.motorcycles']), :]
print(df.shape)  #> (2361, 3)
df.head()

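Let’s import the newsgroups dataset as a pandas DataFrame and keep only the rows whose target_names value is one of four categories: soc.religion.christian, rec.sport.hockey, talk.politics.mideast and rec.motorcycles. The filtered frame has 2361 rows and 3 columns.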
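
To try the filtering step without downloading anything, here is a minimal sketch on a tiny hand-made DataFrame. The rows are invented for illustration, and the `content` and `target` column names and values are assumptions; only `target_names` and the four category names come from the snippet above.

```python
import pandas as pd

# Invented stand-in rows (not taken from the real dataset).
# The column names "content" and "target" are assumptions.
sample = pd.DataFrame({
    "content": ["puck drops at 7", "engine oil change",
                "graphics card drivers", "church history"],
    "target": [0, 1, 2, 3],  # placeholder ids, not the real label codes
    "target_names": ["rec.sport.hockey", "rec.motorcycles",
                     "comp.graphics", "soc.religion.christian"],
})

# The four categories retained in the snippet above
keep = ["soc.religion.christian", "rec.sport.hockey",
        "talk.politics.mideast", "rec.motorcycles"]

# isin() builds a boolean mask; .loc keeps the rows where it is True
filtered = sample.loc[sample["target_names"].isin(keep), :]

# Count the rows that remain per category
counts = filtered["target_names"].value_counts()
print(filtered.shape)
print(counts)
```

Running `value_counts()` on the real `df` after filtering is a quick way to check how the 2361 rows are split across the four groups.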

https://www.machinelearningplus.com/nlp/topic-modeling-visualization-how-to-present-results-lda-models/