Google Exam Professional Data Engineer Topic 5 Question 75 Discussion

Actual exam question for Google's Professional Data Engineer exam

Question #: 75
Topic #: 5

[All Professional Data Engineer Questions]

An aerospace company uses a proprietary data format to store its night dat

a. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to efficiency import the data into BigQuery where consuming as few resources as possible. What should you do?

AUse a standard Dataflow pipeline to store the raw data in BigQuery and then transform the format later when the data is used.

BWrite a shell script that triggers a Cloud Function that performs periodic ETL batch jobs on the new data source

CUse Apache Hive to write a Dataproc job that streams the data into BigQuery in CSV format

DUse an Apache Beam custom connector to write a Dataflow pipeline that streams the data into BigQuery in Avro format

Show Suggested Answer

Suggested Answer: D

by Karl at Dec 21, 2023, 08:28 PM

Limited Time Offer

25%

Off

Get Premium Professional Data Engineer Questions as Interactive Web-Based Practice Test or PDF

Contribute your Thoughts:

Submit Cancel

Paz

8 months ago

Agreed, option D is the way to go. The only thing I'm a bit concerned about is the custom connector. I hope it's well-documented and easy to work with. Otherwise, we might spend more time than we'd like trying to get that set up. But overall, I think it's the most efficient solution.

upvoted 0 times

...

Lettie

8 months ago

I'm not sure about the other options. Using a standard Dataflow pipeline to store the raw data and then transform it later seems like it would waste a lot of resources. And a Dataproc job with Hive? That feels like overkill for this use case. I think the Beam/Dataflow approach is the way to go.

upvoted 0 times

...

Ludivina

8 months ago

I agree, option D does sound like the best approach. The Avro format will be more efficient than CSV, and the Apache Beam custom connector should give us the flexibility we need to handle the proprietary format. Plus, streaming the data directly into BigQuery will be more efficient than storing the raw data first and then transforming it.

upvoted 0 times

Malika

8 months ago

Streaming the data directly into BigQuery will be more efficient than storing the raw data first and then transforming it.

upvoted 0 times

...

Troy

8 months ago

I agree, the Apache Beam custom connector should give us the flexibility we need to handle the proprietary format.

upvoted 0 times

...

Desiree

8 months ago

Option D does sound like the best approach. The Avro format will be more efficient than CSV.

upvoted 0 times

...

Jeanice

8 months ago

Hmm, this is a tricky one. The proprietary data format is definitely a challenge, and we need to find an efficient way to get it into BigQuery. I'm leaning towards option D - using an Apache Beam custom connector to write a Dataflow pipeline that streams the data in Avro format. That way, we can preserve the structure of the data and avoid the overhead of converting to CSV.

upvoted 0 times

...