Google Exam Professional Data Engineer Topic 4 Question 74 Discussion

Actual exam question for Google's Professional Data Engineer exam

Question #: 74
Topic #: 4


You are collecting IoT sensor data from millions of devices across the world and storing the data in BigQuery. Your access pattern is based on recent data filtered by location_id and device_version with the following query:

You want to optimize your queries for cost and performance. How should you structure your data?

A. Partition table data by create_date, location_id and device_version

B. Partition table data by create_date, cluster table data by location_id and device_version

C. Cluster table data by create_date, location_id and device_version

D. Cluster table data by create_date, partition by location and device_version


Suggested Answer: C

by Louisa at Dec 13, 2023, 07:16 PM


Terrilyn

1 year ago

That's a good point, Candida. I was also considering option B, but I'm a little concerned about the potential for data skew if some locations or device versions are much more heavily used than others.

upvoted 0 times

Tayna

1 year ago

C: Good point, we should weigh the benefits of both before making a decision.

upvoted 0 times

...

Rebbecca

1 year ago

B: True, but we should consider the potential for data skew with clustering.

upvoted 0 times

...

Rory

1 year ago

A: It could, but partitioning can also help with organizing the data efficiently.

upvoted 0 times

...

Cordie

1 year ago

D: I think clustering would further improve query performance.

upvoted 0 times

...

Amie

1 year ago

C: But what about clustering the table data by create_date, location_id and device_version?

upvoted 0 times

...

Shakira

1 year ago

B: I agree, that would help optimize the queries for cost and performance.

upvoted 0 times

...

Dalene

1 year ago

A: You should partition table data by create_date, location_id and device_version.

upvoted 0 times

...

Candida

1 year ago

Hmm, let me think this through. I'm leaning towards option B because partitioning by create_date and clustering by location_id and device_version seems like it could give us the best of both worlds in terms of querying efficiency.

upvoted 0 times
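
For reference, the layout described in option B could be written roughly as the BigQuery DDL below, assembled here as a Python string so it can be printed or passed to a client. This is only a sketch: the table name iot_dataset.sensor_readings, the reading column, and all column types are assumptions for illustration, since the question does not show the schema. It also assumes create_date is a DATE column. A BigQuery table takes a single partitioning column and up to four clustering columns, which is why the date goes in PARTITION BY and the two filter columns go in CLUSTER BY.

```python
def build_option_b_ddl(table: str = "iot_dataset.sensor_readings") -> str:
    """Return DDL for a date-partitioned table clustered by the two filter columns.

    The table name and schema are hypothetical; only the PARTITION BY and
    CLUSTER BY choices come from option B in the question.
    """
    return (
        f"CREATE TABLE {table} (\n"
        "  create_date DATE,\n"          # assumed DATE; a TIMESTAMP would need DATE(create_date)
        "  location_id STRING,\n"        # assumed type
        "  device_version STRING,\n"     # assumed type
        "  reading FLOAT64\n"            # hypothetical payload column
        ")\n"
        "PARTITION BY create_date\n"     # one partitioning column per table
        "CLUSTER BY location_id, device_version"
    )


ddl = build_option_b_ddl()
print(ddl)
```

With a layout like this, a filter on create_date lets BigQuery skip whole partitions, and filters on location_id and device_version can skip blocks inside the partitions that remain.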

...

Hyman

1 year ago

Haha, this is starting to sound like a real-life engineering meeting. I'm glad we're all putting in the effort to think this through carefully.

upvoted 0 times

...

Cassie

1 year ago

Ah, good catch, Michael. That's a really important consideration. Maybe option D could be a better choice, with clustering by create_date and partitioning by location and device_version?

upvoted 0 times

...