Cloudera Exam CCA175 Topic 6 Question 54 Discussion

Actual exam question for Cloudera's CCA175 exam

Question #: 54
Topic #: 6

Problem Scenario 69 : Write a Spark application in Python that reads a file "Content.txt" (on HDFS) with the content shown below, filters out every word that is less than 2 characters long, and ignores all empty lines.

Once done, store the filtered data in a directory called "problem84" (on HDFS).

Content.txt

Hello this is ABCTECH.com

This is ABYTECH.com

Apache Spark Training

This is Spark Learning Session

Spark is faster than MapReduce

A. Solution :
Step 1 : Create an application with the following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create configuration object and set App name
conf = SparkConf().setAppName('CCA 175 Problem 84')
sc = SparkContext(conf=conf)
# Load data from HDFS
contentRDD = sc.textFile('Content.txt')
# Keep only non-empty lines
nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)
# Split each line on spaces
words = nonempty_lines.flatMap(lambda x: x.split(' '))
# Keep only words longer than 2 characters
finalRDD = words.filter(lambda x: len(x) > 2)
for word in finalRDD.collect():
    print(word)
# Save final data
finalRDD.saveAsTextFile('problem84')
Step 2 : Submit this application
spark-submit --master yarn problem84.py

B. Solution :
Step 1 : Create an application with the following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create configuration object and set App name
conf = SparkConf().setAppName('CCA 175 Problem 84')
sc = SparkContext(conf=conf)
# Load data from HDFS
print(word)
# Save final data
finalRDD.saveAsTextFile('problem84')
Step 2 : Submit this application
spark-submit --master yarn problem84.py

Suggested Answer: A
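
The three transformations in option A (drop empty lines, split on spaces, keep words longer than 2 characters) can be checked locally without a cluster. The following is only a plain-Python sketch of that logic, not the Spark job itself. The names `SAMPLE_LINES`, `filter_words`, and `longer_than` are hypothetical, and the empty strings in the sample are an assumption about where blank lines occur in Content.txt.

```python
# Plain-Python stand-in for the RDD pipeline in option A (no Spark required).
# SAMPLE_LINES mirrors Content.txt; the empty strings are assumed blank lines.
SAMPLE_LINES = [
    "Hello this is ABCTECH.com",
    "",
    "This is ABYTECH.com",
    "",
    "Apache Spark Training",
    "",
    "This is Spark Learning Session",
    "",
    "Spark is faster than MapReduce",
]


def filter_words(lines, longer_than=2):
    """Drop empty lines, split on single spaces, keep words longer than `longer_than`."""
    # Equivalent of contentRDD.filter(lambda x: len(x) > 0)
    nonempty_lines = [line for line in lines if len(line) > 0]
    # Equivalent of flatMap(lambda x: x.split(' '))
    words = [word for line in nonempty_lines for word in line.split(" ")]
    # Equivalent of words.filter(lambda x: len(x) > 2)
    return [word for word in words if len(word) > longer_than]


if __name__ == "__main__":
    for word in filter_words(SAMPLE_LINES):
        print(word)
```

With the default threshold, two-character words such as "is" are dropped as well, which matches the `len(x) > 2` test in option A. If the task is read strictly as "less than 2 characters", calling `filter_words(SAMPLE_LINES, longer_than=1)` would keep them.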

by Xuan at Feb 08, 2024, 06:56 PM
