Set TabularDataset Schema


Fri Apr 08 2022 22:56:10 GMT+0000 (Coordinated Universal Time)

from azureml.core import Dataset
from azureml.data.dataset_factory import DataType

# create a TabularDataset from a delimited file behind a public web url and convert column "Survived" to boolean
web_path = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'
titanic_ds = Dataset.Tabular.from_delimited_files(path=web_path, set_column_types={'Survived': DataType.to_bool()})

# preview the first 3 rows of titanic_ds
titanic_ds.take(3).to_pandas_dataframe()


By default, when you create a TabularDataset, column data types are inferred automatically. If the inferred types don't match your expectations, you can override them per column with the set_column_types parameter, as the code above does for the "Survived" column. The infer_column_types parameter applies only to datasets created from delimited files.
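
The same idea extends to several columns at once. The sketch below is only an illustration: the function name load_titanic is hypothetical, and the column names "Pclass", "Age" and "Fare" are assumed to exist in Titanic.csv (only "Survived" appears in the snippet above). The azureml imports sit inside the function so that the definition itself loads even where azureml-core is not installed; calling it requires that package and network access.

```python
WEB_PATH = 'https://dprepdata.blob.core.windows.net/demo/Titanic.csv'


def load_titanic(path=WEB_PATH):
    """Create a TabularDataset with explicit types for a few columns.

    Hypothetical helper; the column names other than 'Survived' are
    assumptions about the CSV's header.
    """
    # Imported here so defining the function does not require azureml-core.
    from azureml.core import Dataset
    from azureml.data.dataset_factory import DataType

    column_types = {
        'Survived': DataType.to_bool(),   # from the original snippet
        'Pclass': DataType.to_long(),     # assumed integer column
        'Age': DataType.to_float(),       # assumed numeric column
        'Fare': DataType.to_float(),      # assumed numeric column
    }

    # Columns not listed in column_types keep their inferred types.
    return Dataset.Tabular.from_delimited_files(
        path=path,
        infer_column_types=True,
        set_column_types=column_types,
    )
```

Calling load_titanic().take(3).to_pandas_dataframe() and inspecting the resulting DataFrame's dtypes would be one way to check that the overrides took effect.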

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets