Google Exam Professional Data Engineer Topic 2 Question 93 Discussion

Actual exam question for Google's Professional Data Engineer exam

Question #: 93
Topic #: 2

Your company's customer_order table in BigQuery stores the order history for 10 million customers, with a table size of 10 PB. You need to create a dashboard for the support team to view the order history. The dashboard has two filters, countryname and username. Both are string data types in the BigQuery table. When a filter is applied, the dashboard fetches the order history from the table and displays the query results. However, the dashboard is slow to show the results when applying the filters to the following query:

How should you redesign the BigQuery table to support faster access?

A. Cluster the table by country field, and partition by username field.

B. Partition the table by country and username fields.

C. Cluster the table by country and username fields.

D. Partition the table by _PARTITIONTIME.

Suggested Answer: C

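To improve the performance of querying a large BigQuery table with filters on countryname and username, clustering the table by these fields is the most effective approach. BigQuery partitions a table only on a time-unit column, ingestion time, or an integer-range column, so the string fields countryname and username cannot serve as partition keys, which rules out options A and B. Partitioning by _PARTITIONTIME (option D) does not help because neither filter is based on ingestion time. Here's why option C is the best choice: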

Clustering in BigQuery:

Clustering organizes data based on the values in specified columns. This can significantly improve query performance by reducing the amount of data scanned during query execution.

Clustering by countryname and username means that data is physically sorted and stored together based on these fields, allowing BigQuery to quickly locate and read only the relevant data for queries using these filters.

Filter Efficiency:

With the table clustered by countryname and username, queries that filter on these columns can benefit from efficient data retrieval, reducing the amount of data processed and speeding up query execution.

This directly addresses the performance issue of the dashboard queries that apply filters on these fields.
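As a rough illustration of the kind of query that benefits, here is a minimal sketch. The original dashboard query is not shown in the question, so this one is hypothetical: the named parameters @countryname and @username are assumptions standing in for the two dashboard filters, and the table name is the one used in the implementation steps.

```sql
-- Hypothetical dashboard query; the original query is not shown in the question.
-- @countryname and @username are assumed named query parameters
-- supplied by the dashboard's two filters.
SELECT *  -- a real dashboard would list only the columns it displays
FROM project.dataset.new_table
WHERE countryname = @countryname  -- first clustering column
  AND username = @username;       -- second clustering column
```

Filtering on both columns, or on countryname alone, follows the clustering order. A filter on username alone will generally prune less data, because countryname is the leading clustering column.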

Steps to Implement:

Redesign the Table:

Create a new table with clustering on countryname and username:

    CREATE TABLE project.dataset.new_table
    CLUSTER BY countryname, username AS
    SELECT * FROM project.dataset.customer_order;

Migrate Data:

The CREATE TABLE ... AS SELECT statement above already copies the existing rows from the original table into the new clustered table, so no separate load job is needed.

Update Queries:

Modify the dashboard queries to reference the new clustered table.
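After the steps above, one way to confirm that the new table is clustered as intended is to check its column metadata. The sketch below assumes the table was created as project.dataset.new_table, as in the statement above, and uses BigQuery's INFORMATION_SCHEMA.COLUMNS view.

```sql
-- Lists the clustering columns of the new table in clustering order.
-- Assumes the table name new_table from the CREATE TABLE statement above.
SELECT column_name, clustering_ordinal_position
FROM project.dataset.INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'new_table'
  AND clustering_ordinal_position IS NOT NULL
ORDER BY clustering_ordinal_position;
```

If clustering was applied as written, the result should list countryname first and username second.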

by Carline at Sep 14, 2024, 10:55 PM

Contribute your Thoughts:

Jeffrey

2 months ago

I bet the support team is getting tired of waiting for those dashboard results. They should probably invest in a coffee maker.

upvoted 0 times

Paola

2 months ago

I'm just glad I don't have to worry about 10 PB of data. That's a lot of orders!

upvoted 0 times

Luke

1 month ago

C) Cluster the table by country and username fields

upvoted 0 times

Sonia

1 month ago

I'm just glad I don't have to worry about 10 PB of data. That's a lot of orders!

upvoted 0 times

Brynn

1 month ago

A) Cluster the table by country field, and partition by username field.

upvoted 0 times

Luisa

2 months ago

Option C is interesting, but I'm not sure it would be faster than partitioning. Partitioning just seems more straightforward for this use case.

upvoted 0 times

Lauran

1 month ago

I agree, partitioning seems like the most straightforward solution for faster access.

upvoted 0 times

Marylyn

1 month ago

Partitioning by _PARTITIONTIME might not be as effective for this specific use case.

upvoted 0 times

Ciara

1 month ago

But clustering the table by country and username fields could also improve performance.

upvoted 0 times

Lettie

2 months ago

I think partitioning the table by country and username fields would be the best option.

upvoted 0 times

Javier

2 months ago

But clustering by country and partitioning by username could help reduce the data scanned when applying filters.

upvoted 0 times

Louann

2 months ago

Partitioning by _PARTITIONTIME could work, but it won't be very helpful for these specific filters. I'd go with option B.

upvoted 0 times

Francesco

1 month ago

Definitely, partitioning by country and username fields seems like the best choice here.

upvoted 0 times

Sheldon

1 month ago

That sounds like a good idea. It should help speed up the dashboard.

upvoted 0 times

Emile

2 months ago

Option B) Partition the table by country and username fields.

upvoted 0 times

Callie

3 months ago

I disagree, I believe partitioning the table by country and username fields would be more efficient.

upvoted 0 times

Kristine

3 months ago

Clustering the table by country and username sounds like a good option too. It might be more efficient than partitioning, especially if the filter values are not evenly distributed.

upvoted 0 times

Launa

3 months ago

I think partitioning the table by both country and username fields is the way to go. That should provide fast access to the data when applying the filters.

upvoted 0 times

Avery

1 month ago

I think partitioning the table by both country and username fields is the way to go. That should provide fast access to the data when applying the filters.

upvoted 0 times
