
Google Exam Professional Data Engineer Topic 4 Question 79 Discussion

Actual exam question for Google's Professional Data Engineer exam
Question #: 79
Topic #: 4
[All Professional Data Engineer Questions]

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable the processing of continuous streaming data in near real time from multiple vendors. The data may contain invalid values. What should you do?
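For context on the scenario itself, the BigQuery ML and Vertex AI part might look roughly like the sketch below: train the model with a CREATE MODEL statement, register it with the Vertex AI Model Registry, and deploy it to an online endpoint. This is a minimal sketch, not part of the question's answer; the project, dataset, model, and column names are hypothetical, and using the model_registry option plus the SDK deploy call is an assumption about how this is typically wired up.

# Minimal sketch, assuming cleaned vendor data already lands in a BigQuery
# table; all resource names below are hypothetical.
from google.cloud import bigquery
from google.cloud import aiplatform

bq = bigquery.Client(project="my-project")

# Train a BigQuery ML model and register it with the Vertex AI Model Registry.
create_model_sql = """
CREATE OR REPLACE MODEL `my-project.vendor_data.purchase_model`
OPTIONS (
  model_type = 'logistic_reg',
  model_registry = 'vertex_ai'
) AS
SELECT label, feature_1, feature_2
FROM `my-project.vendor_data.clean_events`;
"""
bq.query(create_model_sql).result()  # waits for training to complete

# Deploy the registered model to a Vertex AI endpoint for online prediction.
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/purchase_model"  # registered model resource name
)
endpoint = model.deploy(machine_type="n1-standard-2")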

Suggested Answer: D

Dataflow provides a scalable and flexible way to process and clean the incoming data in real time before it is loaded into BigQuery.
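As a rough illustration of option D, a streaming Dataflow (Apache Beam) pipeline can read the vendor messages from Pub/Sub, drop records with invalid values, and stream the clean rows into BigQuery. This is a minimal sketch only; the topic, table, schema, and validation rules below are hypothetical assumptions.

# Sketch of a Pub/Sub -> Dataflow -> BigQuery streaming pipeline, assuming
# JSON messages with hypothetical "vendor_id" and numeric "value" fields.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def parse_and_clean(message: bytes):
    """Parse a Pub/Sub message and drop records with invalid values."""
    try:
        record = json.loads(message.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return  # skip malformed payloads
    value = record.get("value")
    if record.get("vendor_id") and isinstance(value, (int, float)):
        yield {"vendor_id": record["vendor_id"], "value": float(value)}

# In practice you would also pass --runner=DataflowRunner, project, and region.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/vendor-data")
        | "ParseAndClean" >> beam.FlatMap(parse_and_clean)
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:vendor_data.clean_events",
            schema="vendor_id:STRING,value:FLOAT64",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )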


Contribute your Thoughts:

Van
5 months ago
C) But processing data through a Cloud Function offers more control and flexibility in data processing, don't you think?
upvoted 0 times
...
Miles
5 months ago
A) True, using an 'ingestion' dataset for training data could help in handling invalid values.
upvoted 0 times
...
Dino
5 months ago
D) Using Dataflow to process and sanitize data before streaming it to BigQuery seems like a reliable option.
upvoted 0 times
...
Steffanie
5 months ago
C) I think creating a Pub/Sub topic and using Cloud Function to process data might be more efficient.
upvoted 0 times
...
Veda
5 months ago
B) But wouldn't it be better to use BigQuery streaming inserts directly into the dataset used by the deployed ML model?
upvoted 0 times
...
Gwenn
6 months ago
A) Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the 'ingestion' dataset as the training data.
upvoted 0 times
...
Ricki
6 months ago
What about using Cloud Functions to process the data instead?
upvoted 0 times
...
Truman
7 months ago
I agree with Mari, processing and sanitizing the data before streaming it to BigQuery seems like a better approach.
upvoted 0 times
...
Mari
7 months ago
I disagree, I believe we should create a Pub/Sub topic and use Dataflow to process and sanitize the data.
upvoted 0 times
...
Lenny
7 months ago
I think we should use BigQuery streaming inserts to land the data.
upvoted 0 times
...
Helene
8 months ago
You know, I was initially considering option C, but I think Dataflow might be a better choice here. It's designed for high-throughput, real-time data processing, which sounds like exactly what we need for this use case.
upvoted 0 times
...
Stefania
8 months ago
Hmm, I'm leaning towards option D. Using Pub/Sub to ingest the data and then leveraging Dataflow to process and sanitize it before streaming to BigQuery seems like a robust and scalable solution. Plus, Dataflow can handle the data transformation and cleaning, which is crucial given the potential for invalid values.
upvoted 0 times
Millie
7 months ago
I agree. It's important to ensure the data is clean before it goes into the ML model.
upvoted 0 times
...
Dominque
8 months ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
Dyan
8 months ago
Yeah, Dataflow is great for handling data processing tasks.
upvoted 0 times
...
Niesha
8 months ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
Ozell
8 months ago
That sounds like a solid plan. Dataflow can handle the data cleaning and transformation efficiently.
upvoted 0 times
...
Andree
8 months ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
...
Martha
8 months ago
Haha, for real! Trying to manually process all that vendor data would be a nightmare. Dataflow is definitely the way to go here. Plus, it integrates nicely with Pub/Sub and BigQuery, so the entire pipeline will be neatly tied together.
upvoted 0 times
...
Doyle
8 months ago
This question seems to be testing our understanding of real-time data processing and model deployment on Vertex AI. The key here is to identify the most efficient and scalable solution that can handle continuous streaming data from multiple vendors, while also addressing the issue of invalid values.
upvoted 0 times
...
