DataFrame_Categorical_Imputer


Thu Aug 24 2023 15:45:50 GMT+0000 (Coordinated Universal Time)

Saved by @sumikk #partialdependencyplot #info.column_information(df) #info.agg_tabulation(df) #info.num_count_summary(df) #info.statistical_summary(df)

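import numpy as np
import pandas as pd


class DataFrame_Categorical_Imputer():

    def __init__(self):
        print("Imputation object created")

    def fit(self, data):
        """
        This method will fit (learn) the mode value used to
        impute all missing categorical variables.
        """
        # Object (string) columns: value_counts() sorts by frequency and
        # drops NaN, so index[0] is the most frequent observed value.
        # Other dtypes: mode() returns a Series (several values can tie),
        # so .iloc[0] takes the first one and keeps each fill value a scalar.
        self.fill = pd.Series(
            [data[column].value_counts().index[0]
             if data[column].dtype == np.dtype('O')
             else data[column].mode().iloc[0]
             for column in data],
            index=data.columns)

        return self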
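The class above only learns the fill values in `self.fill`. It does not define a step that applies them. Below is a minimal sketch of how that step could look. It assumes a `transform` method that passes the learned values to pandas' `DataFrame.fillna`. The class name `CategoricalModeImputer`, its `transform` and `fit_transform` methods, and the sample data are hypothetical and are not part of the original snippet.

```python
import numpy as np
import pandas as pd


class CategoricalModeImputer:
    """Hypothetical variant of DataFrame_Categorical_Imputer with a transform step."""

    def fit(self, data: pd.DataFrame) -> "CategoricalModeImputer":
        # Same rule as fit() above: most frequent value for object columns,
        # first mode for every other column (both ignore NaN by default).
        self.fill = pd.Series(
            [data[col].value_counts().index[0]
             if data[col].dtype == np.dtype("O")
             else data[col].mode().iloc[0]
             for col in data],
            index=data.columns,
        )
        return self

    def transform(self, data: pd.DataFrame) -> pd.DataFrame:
        # fillna with a Series fills each column with the value stored under
        # the matching column label and returns a new DataFrame.
        return data.fillna(self.fill)

    def fit_transform(self, data: pd.DataFrame) -> pd.DataFrame:
        return self.fit(data).transform(data)


# Hypothetical sample data with one missing value per column.
df = pd.DataFrame({
    "color": ["red", "blue", "red", np.nan, "red"],
    "size": ["S", np.nan, "M", "M", "L"],
})
imputed = CategoricalModeImputer().fit_transform(df)
print(imputed)
```

In this sample, the missing `color` becomes `"red"` and the missing `size` becomes `"M"`. Because the fill values are keyed by column name, `transform` can also be applied to another DataFrame with the same columns, such as a test split, using the values learned on the training data.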

This method applies to categorical variables, which take values from a finite set. Each missing entry is replaced with the column's most frequent value, and this works for both nominal and ordinal categories. Unfortunately, the method ignores correlations between features, so it can introduce bias into the data. If the category frequencies are imbalanced, you are likely to introduce bias. So check that the categories of each independent variable are reasonably balanced before imputing with the most frequent value.
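One way to act on that last check is to look at two numbers for each column before imputing: how dominant its most frequent value already is, and how many entries are missing. The sketch below is an assumption and is not part of the original snippet. The helper name `mode_imbalance_report`, the `max_mode_share` parameter and its 0.5 cutoff are hypothetical choices made for illustration.

```python
import pandas as pd


def mode_imbalance_report(data: pd.DataFrame, max_mode_share: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: summarize how skewed each column is toward its mode."""
    rows = []
    for col in data.columns:
        # Relative frequencies of the observed (non-missing) categories,
        # sorted from most to least frequent.
        shares = data[col].value_counts(normalize=True)
        rows.append({
            "column": col,
            "mode": shares.index[0],
            # Fraction of observed values equal to the mode.
            "mode_share": shares.iloc[0],
            # Fraction of rows that mode imputation would fill with the mode.
            "missing_share": data[col].isna().mean(),
        })
    report = pd.DataFrame(rows).set_index("column")
    # Flag columns whose mode exceeds the (hypothetical) cutoff.
    report["imbalanced"] = report["mode_share"] > max_mode_share
    return report


# Hypothetical sample data.
df = pd.DataFrame({
    "color": ["red", "red", "red", "red", "blue", None],
    "size": ["S", "M", "L", "M", None, None],
})
report = mode_imbalance_report(df)
print(report)
```

Here `color` is flagged because `"red"` covers 80% of its observed values. `size` is not flagged because its mode `"M"` covers exactly half. Where to set the cutoff is a judgment call for the dataset at hand.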