Create a FilteDataset from web url

PHOTO EMBED

Fri Apr 08 2022 22:49:56 GMT+0000 (Coordinated Universal Time)

Saved by @wessim

from azureml.core import Workspace, Datastore, Dataset

# create a FileDataset pointing to files in 'animals' folder and its subfolders recursively
datastore_paths = [(datastore, 'animals')]
animal_ds = Dataset.File.from_files(path=datastore_paths)

# create a FileDataset from image and label files behind public web urls
web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
             'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz']
mnist_ds = Dataset.File.from_files(path=web_paths)
content_copyCOPY

A FileDataset references single or multiple files in your datastores or public URLs. If your data is already cleansed, and ready to use in training experiments, you can download or mount the files to your compute as a FileDataset object. We recommend FileDatasets for your machine learning workflows, since the source files can be in any format, which enables a wider range of machine learning scenarios, including deep learning. Create a FileDataset with the Python SDK or the Azure Machine Learning studio . If your storage is behind a virtual network or firewall, set the parameter validate=False in your from_files() method. This bypasses the initial validation step, and ensures that you can create your dataset from these secure files.

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-register-datasets