progress bar for read_excel
Tue Dec 14 2021 02:26:32 GMT+0000 (Coordinated Universal Time)
Saved by [deleted user] #python
https://stackoverflow.com/questions/65557310/pysimplegui-add-progress-bar-for-pandas-pd-read-excel
Comments
@ffedor - Mon Sep 26 2022 10:17:17 GMT+0000 (Coordinated Universal Time)
1. Reading by chunks is slower than a single read_excel call, i.e. pointless if speed is the goal.
2. myexcel = pd.concat doesn't work properly, since the column names in the chunks are all different (only the 1st chunk gets the proper header). I couldn't fix it by specifying usecols (got an Exception), but fixed it using the 'names' parameter of read_excel...
import pandas as pd
from openpyxl import load_workbook
from openpyxl.utils.exceptions import InvalidFileException
from tqdm import tqdm


def read_excel_pgbar(excel_path, sheet_name, chunksize: int, names, dtype: object) -> pd.DataFrame:
    try:
        # Get the row count of the sheet without loading it into pandas
        wb = load_workbook(excel_path, read_only=True)
        if isinstance(sheet_name, int):
            sheet = wb.worksheets[sheet_name]
        else:
            sheet = wb[sheet_name]
        rows = sheet.max_row
        chunks = rows // chunksize + 1
        print()
        # Read the file chunk by chunk; the progress bar advances once per chunk
        chunk_list = []
        for i in tqdm(range(chunks), desc='# Chunks read: '):
            # With the default header=0, the first unskipped row is taken as the
            # header and its labels are replaced by `names`, so every chunk ends
            # up with the same column names
            tmp = pd.read_excel(excel_path, sheet_name=sheet_name, nrows=chunksize,
                                skiprows=list(range(i * chunksize)), dtype=dtype, names=names)
            chunk_list.append(tmp)
        myexcel = pd.concat(chunk_list, axis=0, ignore_index=True)
        print('Finish reading excel file')
        return myexcel
    except InvalidFileException:
        raise FileNotFoundError
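
A side note on the chunk arithmetic, as a rough sketch rather than a fix anyone has confirmed: `rows // chunksize + 1` can schedule one trailing chunk that contains no data (for example, 101 sheet rows with chunksize 25 gives 5 chunks where 4 would cover the 100 data rows). The helper below, `plan_chunks`, is a hypothetical name that is not part of the original snippet. It assumes a single header row and that `total_rows` counts every sheet row, as `sheet.max_row` does. It only computes how many leading rows each chunk would skip and how many data rows it would read; it does not call pandas.

```python
import math


def plan_chunks(total_rows: int, chunksize: int, header_rows: int = 1):
    """Return (skip, nrows) pairs that cover all data rows exactly once.

    skip  -- number of leading sheet rows to skip for that chunk
             (matches range(i * chunksize) in the snippet above)
    nrows -- number of data rows that chunk actually contains
    """
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    # Rows that hold data, i.e. everything below the header
    data_rows = max(total_rows - header_rows, 0)
    # Ceiling division avoids scheduling an empty final chunk
    n_chunks = math.ceil(data_rows / chunksize)
    plan = []
    for i in range(n_chunks):
        start = i * chunksize
        plan.append((start, min(chunksize, data_rows - start)))
    return plan


if __name__ == "__main__":
    # 1 header row + 100 data rows, read 25 rows at a time
    for skip, nrows in plan_chunks(101, 25):
        print(f"skip {skip} rows, read {nrows} rows")
```

If this plan were used, `len(plan_chunks(rows, chunksize))` would replace `chunks` as the length of the tqdm loop, so the progress bar total would match the number of chunks that actually hold data.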