Oh, come on! It's obviously data locality. The customer is only using a small subset of the data, so the cluster is probably wasting time trying to access data that's not even being used. Classic case of poor data locality.
Hmm, I'm not so sure about that. Doesn't data availability seem like the more likely culprit? If the cluster doesn't have access to the full dataset, that could definitely slow things down.
Kristian, 5 days ago
Mitsue, 6 days ago
Sharee, 10 days ago
Salina, 12 days ago
Merilyn, 15 days ago
Clay, 17 days ago