Amazon Exam MLS-C01 Topic 3 Question 104 Discussion

Actual exam question for Amazon's MLS-C01 exam

Question #: 104
Topic #: 3

A finance company needs to forecast the price of a commodity. The company has compiled a dataset of historical daily prices. A data scientist must train various forecasting models on 80% of the dataset and must validate the efficacy of those models on the remaining 20% of the dataset.

How should the data scientist split the dataset into a training dataset and a validation dataset to compare model performance?

A. Pick a date so that 80% of the data points precede the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.

B. Pick a date so that 80% of the data points occur after the date. Assign that group of data points as the training dataset. Assign all the remaining data points to the validation dataset.

C. Starting from the earliest date in the dataset, pick eight data points for the training dataset and two data points for the validation dataset. Repeat this stratified sampling until no data points remain.

D. Sample data points randomly without replacement so that 80% of the data points are in the training dataset. Assign all the remaining data points to the validation dataset.

Suggested Answer: A

Comprehensive Explanation: The best way to split the dataset is to pick a date so that 80% of the data points precede it, assign those points to the training dataset, and assign the remaining points to the validation dataset. This preserves the temporal order of the data, so every model is trained on the past and validated on the most recent prices, which mirrors how a forecast is used in practice. Option B reverses that order and trains on data that comes after the validation period. Options C and D interleave training and validation points in time, so the models see prices from after the dates they are validated on. That leaks future information into training and makes the validation scores overly optimistic.
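The date-based split described in option A can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name chronological_split, the (date, price) record layout, and the sample prices are all hypothetical and do not come from the exam question.

```python
from datetime import date, timedelta


def chronological_split(records, train_fraction=0.8):
    """Split (date, price) records so the earliest part is the training set.

    Hypothetical helper: sorts by date, then cuts at the index that puts
    roughly train_fraction of the points before the cutoff.
    """
    ordered = sorted(records, key=lambda record: record[0])
    cutoff = int(len(ordered) * train_fraction)
    return ordered[:cutoff], ordered[cutoff:]


# Made-up sample data: ten consecutive daily prices.
start = date(2024, 1, 1)
prices = [(start + timedelta(days=i), 100.0 + i) for i in range(10)]

train, validation = chronological_split(prices)

# Every training date precedes every validation date.
print(len(train), len(validation))
print(train[-1][0], validation[0][0])
```

Because the cut is made on sorted dates rather than on a random sample, no validation point is older than any training point, which is the property the explanation relies on.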

References:

Time Series Forecasting - Amazon SageMaker

Time Series Splitting - scikit-learn

Time Series Forecasting - Towards Data Science

by Albina at Sep 15, 2024, 01:55 PM

Contribute your Thoughts:

Johnna

2 months ago

Wait, are we sure the answer isn't B? Because if it's not, I'm going to be kicking myself for the rest of the day. Option B all the way!

upvoted 0 times

...

3 months ago

Option B makes the most sense. We want the training data to come first in time, so the model can learn from the past and then be validated on the future data.

upvoted 0 times