
Google Exam Professional-Data-Engineer Topic 4 Question 79 Discussion

Actual exam question for Google's Google Cloud Certified Professional Data Engineer exam
Question #: 79
Topic #: 4
[All Google Cloud Certified Professional Data Engineer Questions]

You want to create a machine learning model using BigQuery ML and create an endpoint for hosting the model using Vertex AI. This will enable near real-time processing of continuous streaming data from multiple vendors. The data may contain invalid values. What should you do?

Suggested Answer: D

Dataflow provides a scalable and flexible way to process and clean the incoming data in real time before loading it into BigQuery.
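For illustration, here is a minimal Apache Beam (Dataflow) streaming sketch of that approach. It is only a sketch: the field names ("vendor_id", "value"), the validation rule, and the project, topic, and table identifiers are placeholders, not part of the question.

    # Streaming pipeline: Pub/Sub -> sanitize -> BigQuery (placeholder names throughout).
    import json

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions


    def sanitize(message: bytes):
        """Parse a Pub/Sub message and drop records with invalid values."""
        try:
            record = json.loads(message.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError):
            return  # skip malformed payloads entirely
        # Hypothetical validation rule: require a vendor_id and a numeric value.
        if record.get("vendor_id") and isinstance(record.get("value"), (int, float)):
            yield {"vendor_id": record["vendor_id"], "value": float(record["value"])}


    options = PipelineOptions(streaming=True)

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/vendor-data")   # placeholder topic
            | "Sanitize" >> beam.FlatMap(sanitize)
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                table="my-project:ml_dataset.vendor_events",      # placeholder table
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
        )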

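The modeling side of the question (BigQuery ML plus a Vertex AI endpoint) could then look roughly like the sketch below. Again hedged: the model type, label column, table and model names, and region are assumptions, and the BigQuery ML model is registered with Vertex AI via the model_registry option so it can be deployed to an endpoint.

    # Train a BigQuery ML model on the sanitized table and expose it on Vertex AI.
    from google.cloud import aiplatform, bigquery

    bq = bigquery.Client(project="my-project")  # placeholder project

    # CREATE MODEL registers the trained model in the Vertex AI Model Registry.
    bq.query(
        """
        CREATE OR REPLACE MODEL `ml_dataset.vendor_model`
        OPTIONS (
          model_type = 'logistic_reg',        -- assumed model type
          input_label_cols = ['label'],       -- assumed label column
          model_registry = 'vertex_ai'
        ) AS
        SELECT * FROM `ml_dataset.vendor_events`
        """
    ).result()

    # Deploy the registered model to an online-prediction endpoint.
    aiplatform.init(project="my-project", location="us-central1")
    model = aiplatform.Model(model_name="vendor_model")       # registry name assumed
    endpoint = model.deploy(machine_type="n1-standard-4")
    print(endpoint.predict(instances=[{"vendor_id": "v1", "value": 3.2}]))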

Contribute your Thoughts:

Van
3 months ago
C) But processing data through a Cloud Function offers more control and flexibility in data processing, don't you think?
upvoted 0 times
...
Miles
3 months ago
A) True, using an 'ingestion' dataset for training data could help in handling invalid values.
upvoted 0 times
...
Dino
3 months ago
D) Using Dataflow to process and sanitize data before streaming it to BigQuery seems like a reliable option.
upvoted 0 times
...
Steffanie
3 months ago
C) I think creating a Pub/Sub topic and using a Cloud Function to process data might be more efficient.
upvoted 0 times
...
Veda
3 months ago
B) But wouldn't it be better to use BigQuery streaming inserts directly into the ML model deployed dataset?
upvoted 0 times
...
Gwenn
4 months ago
A) Create a new BigQuery dataset and use streaming inserts to land the data from multiple vendors. Configure your BigQuery ML model to use the 'ingestion' dataset as the training data.
upvoted 0 times
...
Ricki
4 months ago
What about using Cloud Functions to process the data instead?
upvoted 0 times
...
Truman
5 months ago
I agree with Mari; processing and sanitizing the data before streaming it to BigQuery seems like a better approach.
upvoted 0 times
...
Mari
5 months ago
I disagree, I believe we should create a Pub/Sub topic and use Dataflow to process and sanitize the data.
upvoted 0 times
...
Lenny
5 months ago
I think we should use BigQuery streaming inserts to land the data.
upvoted 0 times
...
Helene
6 months ago
You know, I was initially considering option C, but I think Dataflow might be a better choice here. It's designed for high-throughput, real-time data processing, which sounds like exactly what we need for this use case.
upvoted 0 times
...
Stefania
6 months ago
Hmm, I'm leaning towards option D. Using Pub/Sub to ingest the data and then leveraging Dataflow to process and sanitize it before streaming to BigQuery seems like a robust and scalable solution. Plus, Dataflow can handle the data transformation and cleaning, which is crucial given the potential for invalid values.
upvoted 0 times
Millie
5 months ago
I agree. It's important to ensure the data is clean before it goes into the ML model.
upvoted 0 times
...
Dominque
5 months ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
Dyan
6 months ago
Yeah, Dataflow is great for handling data processing tasks.
upvoted 0 times
...
Niesha
6 months ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
Ozell
6 months ago
That sounds like a solid plan. Dataflow can handle the data cleaning and transformation efficiently.
upvoted 0 times
...
Andree
6 months ago
D) Create a Pub/Sub topic and send all vendor data to it. Use Dataflow to process and sanitize the Pub/Sub data and stream it to BigQuery.
upvoted 0 times
...
...
Martha
6 months ago
Haha, for real! Trying to manually process all that vendor data would be a nightmare. Dataflow is definitely the way to go here. Plus, it integrates nicely with Pub/Sub and BigQuery, so the entire pipeline will be neatly tied together.
upvoted 0 times
...
Doyle
6 months ago
This question seems to be testing our understanding of real-time data processing and model deployment on Vertex AI. The key here is to identify the most efficient and scalable solution that can handle continuous streaming data from multiple vendors, while also addressing the issue of invalid values.
upvoted 0 times
...
