
Amazon Exam MLS-C01 Topic 4 Question 98 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 98
Topic #: 4

A machine learning engineer is building a bird classification model. The engineer randomly separates a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well when it is evaluated on the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?

Suggested Answer: A

Stratified sampling preserves the class distribution of the original dataset when the data is split or subsampled: each class makes up the same proportion of the training and validation datasets as it does of the original dataset. With a purely random split of an imbalanced dataset, a rare class can end up underrepresented in, or missing from, one of the splits, so the validation dataset is not representative of the data the model was trained on. Splitting with stratification ensures that the validation dataset is representative of the original dataset and not skewed towards any class, which makes the validation accuracy a more reliable estimate of how well the model generalizes. Stratification is a way of splitting or sampling, not a way of resizing the dataset; it can be combined with oversampling or undersampling of the training split if the classes also need to be rebalanced.
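As a rough illustration of the idea, here is a minimal sketch of a stratified train/validation split using only the Python standard library. The function name `stratified_split`, the 20% validation fraction, and the bird class counts are all assumptions made up for this example; they do not come from the exam question.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, val_fraction=0.2, seed=0):
    """Split example indices into train/validation lists, class by class.

    Hypothetical helper for illustration: each class contributes roughly
    `val_fraction` of its examples to the validation split.
    """
    rng = random.Random(seed)

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)
        # Keep at least one example on each side when the class has 2+ examples.
        if len(indices) > 1:
            n_val = min(max(n_val, 1), len(indices) - 1)
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return train_idx, val_idx


# Made-up imbalanced dataset: 900 sparrows, 90 robins, 10 owls.
labels = ["sparrow"] * 900 + ["robin"] * 90 + ["owl"] * 10
train_idx, val_idx = stratified_split(labels, val_fraction=0.2, seed=42)

train_counts = Counter(labels[i] for i in train_idx)
val_counts = Counter(labels[i] for i in val_idx)
print(train_counts)  # 720 sparrows, 72 robins, 8 owls
print(val_counts)    # 180 sparrows, 18 robins, 2 owls
```

Both splits keep the original 90 : 9 : 1 class ratio, so even the rarest class is present in the validation data. Libraries such as scikit-learn offer the same behaviour through a `stratify` argument on their split helpers; the hand-rolled version here is only meant to show what that option does.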

The other options are not effective ways to improve the validation accuracy of the model. Acquiring additional data about the majority classes in the original dataset would only increase the imbalance and bias the model further towards the majority classes. Using a smaller, randomly sampled version of the training dataset does not guarantee that the class distribution is preserved and may discard some of the already scarce minority-class examples. Performing systematic sampling on the original dataset (taking every k-th record) does not guarantee that the class distribution is preserved either, and it may introduce sampling bias if the ordering of the original dataset follows a pattern, for example if records are ordered or grouped by class.
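One way the systematic-sampling concern can show up is when the order of the records is periodic and the sampling interval lines up with that period. The sketch below is a deliberately extreme, made-up case (one owl followed by nine sparrows, repeated); the helper name `systematic_sample` and the data are assumptions for illustration only.

```python
from collections import Counter


def systematic_sample(items, step, start=0):
    """Return every `step`-th item, beginning at index `start`."""
    return items[start::step]


# Made-up ordering: one owl followed by nine sparrows, repeated 100 times.
labels = (["owl"] + ["sparrow"] * 9) * 100

overall = Counter(labels)
sample_a = Counter(systematic_sample(labels, step=10, start=0))
sample_b = Counter(systematic_sample(labels, step=10, start=1))

print(overall)   # 100 owls and 900 sparrows (10% owls)
print(sample_a)  # 100 owls, no sparrows
print(sample_b)  # 100 sparrows, no owls
```

Depending only on the starting offset, the same sampling interval yields a sample that is entirely minority class or entirely majority class, even though owls make up 10% of the data. Real datasets are rarely this regular, but the example shows why systematic sampling gives no guarantee about class proportions.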


Contribute your Thoughts:

Von
4 months ago
I'd go with option A. Balancing the dataset is key, and stratified sampling is the way to do it. Unless the engineer is a total ostrich, they should know that.
upvoted 0 times
...
Clay
4 months ago
Ah, the classic imbalanced dataset problem. Kinda like trying to teach a parrot to fly a plane. Stratified sampling is the solution, no doubt.
upvoted 0 times
...
Margarett
4 months ago
Systematic sampling? Really? That's like using a fishing net to catch a butterfly. Stratified sampling is the bird's eye view we need here.
upvoted 0 times
Laurel
2 months ago
D: Using a smaller, randomly sampled version of the training dataset might not address the imbalance issue.
upvoted 0 times
...
Hyun
2 months ago
C: Acquiring additional data about the majority classes could also help improve the model's accuracy.
upvoted 0 times
...
Kendra
3 months ago
B: Yeah, that sounds like a better approach to balance out the classes.
upvoted 0 times
...
Billy
3 months ago
A: I think we should perform stratified sampling on the original dataset.
upvoted 0 times
...
...
King
4 months ago
Acquiring more data for the majority classes? Nah, that's too much work. Just use a smaller sample of the training set, easy peasy.
upvoted 0 times
Ocie
4 months ago
C) Use a smaller, randomly sampled version of the training dataset.
upvoted 0 times
...
Jin
4 months ago
A) Perform stratified sampling on the original dataset.
upvoted 0 times
...
...
Eloisa
4 months ago
Using a smaller, randomly sampled version of the training dataset might also be a good approach.
upvoted 0 times
...
Pura
4 months ago
I believe acquiring additional data about the majority classes could also help.
upvoted 0 times
...
Rebecka
4 months ago
Stratified sampling is definitely the way to go here. Gotta make sure the training and validation sets have the same class distribution.
upvoted 0 times
Solange
4 months ago
B) Acquire additional data about the majority classes in the original dataset.
upvoted 0 times
...
Josue
4 months ago
A) Perform stratified sampling on the original dataset.
upvoted 0 times
...
...
Laura
5 months ago
I agree with Lezlie, balancing the dataset is crucial for better generalization.
upvoted 0 times
...
Lezlie
5 months ago
I think the engineer should perform stratified sampling.
upvoted 0 times
...
