Quantifying cardinality in datasets


Mon Sep 05 2022 09:53:01 GMT+0000 (UTC)

Saved by @DataSynapse82 #python #pandas #dataset #eda #cardinality

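import matplotlib.pyplot as plt  # `data` is assumed to be an existing pandas DataFrame

# Bar chart of the number of distinct values in each column
data.nunique().plot.bar(figsize=(12,6))
plt.ylabel('Number of unique categories')
plt.xlabel('Variables')
plt.title('Cardinality')
plt.show()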

## Version with 5% threshold

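# `label_freq` is assumed to hold the fraction of rows per category,
# e.g. data['class'].value_counts(normalize=True)
ax = label_freq.sort_values(ascending=False).plot.bar()
ax.axhline(y=0.05, color='red')  # 5% rarity threshold
ax.set_ylabel('Fraction of cars within each category')
ax.set_xlabel('Variable: class')
ax.set_title('Identifying Rare Categories')
plt.show()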

Nice snippet to find cardinality in categorical features.
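
The same two checks can also be read off as numbers instead of plots. Below is a minimal sketch, assuming a toy DataFrame: the `class` and `doors` columns and their values are made up for illustration, and `RARE_THRESHOLD` is a hypothetical name for the 5% cut-off used in the plot.

```python
import pandas as pd

# Hypothetical toy data; column names and values are invented for illustration.
data = pd.DataFrame({
    "class": ["unacc"] * 35 + ["acc"] * 11 + ["good"] * 2 + ["vgood"] * 2,
    "doors": ["2", "4"] * 25,
})

# Cardinality: number of distinct values in each column.
cardinality = data.nunique()

# Fraction of rows that fall into each category of one variable.
label_freq = data["class"].value_counts(normalize=True)

# Categories whose share is below the threshold are treated as rare here.
RARE_THRESHOLD = 0.05
rare_labels = sorted(label_freq[label_freq < RARE_THRESHOLD].index)

print(cardinality.to_dict())
print(rare_labels)
```

Using `value_counts(normalize=True)` keeps the values as fractions, so they can be compared directly against a cut-off of 0.05.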