Google Exam Professional-Machine-Learning-Engineer Topic 3 Question 88 Discussion

Actual exam question for Google's Professional Machine Learning Engineer exam
Question #: 88
Topic #: 3

You developed a BigQuery ML linear regressor model by using a training dataset stored in a BigQuery table. New data is added to the table every minute. You are using Cloud Scheduler and Vertex AI Pipelines to automate hourly model training, and you use the model for direct inference. The feature preprocessing logic includes quantile bucketization and MinMax scaling on data received in the last hour. You want to minimize storage and computational overhead. What should you do?

Suggested Answer: D

The best way to minimize storage and computational overhead is to use the TRANSFORM clause in the CREATE MODEL statement to calculate the required statistics. The TRANSFORM clause lets you specify feature preprocessing logic that applies to both training and prediction. The preprocessing runs in the same query as the model creation, so no intermediate tables need to be created or stored. The TRANSFORM clause also supports the two preprocessing steps required here: quantile bucketization (ML.QUANTILE_BUCKETIZE) and MinMax scaling (ML.MIN_MAX_SCALER).

Option A is incorrect because a separate component in the Vertex AI Pipelines DAG that calculates the statistics runs apart from model creation, which may increase computational overhead. The component also has to pass the statistics to subsequent components, which may increase storage overhead.

Option B is incorrect because preprocessing and staging the data in BigQuery before feeding it to the model requires creating and maintaining additional tables for the preprocessed data, which may increase both storage and computational overhead. You would also have to keep the preprocessing logic consistent between training and inference yourself.

Option C is incorrect because SQL queries that calculate the statistics and store them in separate BigQuery tables require creating and maintaining those additional tables, which may increase both storage and computational overhead. You would also have to refresh the statistics regularly to reflect the new data.

Reference:

BigQuery ML documentation

Using the TRANSFORM clause

Feature preprocessing with BigQuery ML
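
The approach in the suggested answer can be sketched roughly as follows. This is a minimal illustration, not the exam's reference solution: the helper `build_create_model_sql`, the model and table names, and the columns `feature_a`, `feature_b`, `label`, and `event_ts` are all hypothetical, and the bucket count is an arbitrary choice. The Python function only assembles the SQL text; a pipeline step would then submit that text to BigQuery.

```python
def build_create_model_sql(model, source_table, bucket_col, scale_col,
                           label_col, ts_col, num_buckets=10):
    """Assemble a CREATE OR REPLACE MODEL statement with a TRANSFORM clause.

    All identifiers passed in are placeholders chosen for illustration.
    """
    return f"""
CREATE OR REPLACE MODEL `{model}`
TRANSFORM(
  -- Quantile boundaries are computed over the training rows.
  ML.QUANTILE_BUCKETIZE({bucket_col}, {num_buckets}) OVER() AS {bucket_col}_bucket,
  -- Min and max are computed over the training rows.
  ML.MIN_MAX_SCALER({scale_col}) OVER() AS {scale_col}_scaled,
  {label_col}
)
OPTIONS(
  model_type = 'linear_reg',
  input_label_cols = ['{label_col}']
) AS
SELECT {bucket_col}, {scale_col}, {label_col}
FROM `{source_table}`
-- Train only on rows received in the last hour.
WHERE {ts_col} >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
""".strip()


if __name__ == "__main__":
    # Hypothetical names; replace with real dataset, table, and columns.
    print(build_create_model_sql(
        model="my_dataset.hourly_regressor",
        source_table="my_dataset.training_data",
        bucket_col="feature_a",
        scale_col="feature_b",
        label_col="label",
        ts_col="event_ts",
    ))
```

Because the preprocessing lives inside the model definition, a later ML.PREDICT call can be given raw `feature_a` and `feature_b` values, and no staging or statistics tables are needed between the hourly runs.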


Contribute your Thoughts:

Mitzie
7 days ago
Haha, I bet the exam people are getting a kick out of these questions. Reminds me of that time I had to optimize a machine learning model for a hamster racing league. Good times!
upvoted 0 times
...
Sheron
8 days ago
Hey, is there a way we can automate the entire process, including the data preprocessing, model training, and inference? That would be a real time-saver!
upvoted 0 times
...
Ilene
12 days ago
I'm not sure, but I think option C could also work if we store the statistics in separate tables for reference.
upvoted 0 times
...
Lonny
12 days ago
I think option A is the way to go. Calculating the statistics in the Vertex AI Pipelines DAG and passing them on seems like the most efficient approach. Less data movement and storage required.
upvoted 0 times
...
Izetta
13 days ago
Hmm, I'm not sure about using separate BigQuery tables for the required statistics. That seems a bit complicated and might increase overhead. I'd go with D - the TRANSFORM clause in the CREATE MODEL statement.
upvoted 0 times
Jeannetta
3 days ago
I think D is a good option. It simplifies the process.
upvoted 0 times
...
...
Merlyn
14 days ago
I agree with Jani. Option A seems like the most efficient way to minimize storage and computational overhead.
upvoted 0 times
...
Tomas
20 days ago
Option B looks great! Preprocess and stage the data in BigQuery before feeding it to the model. That way, we can minimize storage and computational overhead during training and inference.
upvoted 0 times
Lorriane
3 days ago
I agree. It's important to optimize the process for efficiency.
upvoted 0 times
...
Jodi
7 days ago
That's a good point. It can help minimize storage and computational overhead.
upvoted 0 times
...
Lizette
12 days ago
Option B looks great! Preprocess and stage the data in BigQuery before feeding it to the model.
upvoted 0 times
...
...
Jani
24 days ago
I think option A is the best choice because it allows us to calculate the required statistics efficiently.
upvoted 0 times
...
