Set TabularDataset Schema

Fri Apr 08 2022 22:56:10 GMT+0000 (Coordinated Universal Time)

Saved by @wessim

from azureml.core import Dataset
from azureml.data.dataset_factory import DataType

# Create a TabularDataset from a delimited file behind a public web URL
# and convert the "Survived" column to boolean instead of its inferred type
web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(
    path=web_path,
    set_column_types={'Survived': DataType.to_bool()},
)

# preview the first 3 rows of titanic_ds
titanic_ds.take(3).to_pandas_dataframe()

By default, when you create a TabularDataset, column data types are inferred automatically. If the inferred types don't match your expectations, you can override them by passing set_column_types, a dictionary that maps column names to DataType values, as the code above does for the "Survived" column. The infer_column_types parameter, which controls whether types are inferred at all, is only applicable to datasets created from delimited files.
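To make the "infer first, then override" idea concrete, here is a minimal stdlib-only sketch. It is NOT Azure ML's inference logic: the functions infer_type, load_with_schema, to_bool and the type names "long", "float", "bool", "string" are hypothetical, chosen only to mirror how an explicit set_column_types entry wins over an inferred type. In this sketch, turning inference off simply leaves every column as a string.

```python
import csv
import io


def to_bool(value):
    # Hypothetical converter: common truthy spellings become True, anything else False.
    return value.strip().lower() in {"1", "true", "yes"}


# Hypothetical type names mapped to converters (not azureml's DataType).
CONVERTERS = {"long": int, "float": float, "bool": to_bool, "string": str}


def infer_type(values):
    """Guess a column type from its string values: long, then float, else string."""
    for name, convert in (("long", int), ("float", float)):
        try:
            for v in values:
                convert(v)
            return name
        except ValueError:
            continue  # at least one value failed; try the next, looser type
    return "string"


def load_with_schema(text, set_column_types=None, infer_column_types=True):
    """Parse delimited text, infer each column's type, then apply explicit overrides."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    schema = {}
    for col in columns:
        values = [row[col] for row in rows]
        schema[col] = infer_type(values) if infer_column_types else "string"
    schema.update(set_column_types or {})  # explicit types win over inferred ones
    typed = [{col: CONVERTERS[schema[col]](row[col]) for col in columns} for row in rows]
    return schema, typed


# Usage: "Survived" holds 0/1, so it is inferred as "long" until overridden.
sample = "PassengerId,Survived,Fare\n1,0,7.25\n2,1,71.28\n"
schema, rows = load_with_schema(sample, set_column_types={"Survived": "bool"})
print(schema)   # {'PassengerId': 'long', 'Survived': 'bool', 'Fare': 'float'}
print(rows[1])  # {'PassengerId': 2, 'Survived': True, 'Fare': 71.28}
```

The override is applied after inference with a plain dict update, so only the columns you name change and every other column keeps its inferred type.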

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets