Certified Associate Developer for Apache Spark
Question 26
Which of the following operations will fail to trigger evaluation?
- A: DataFrame.collect()
- B: DataFrame.count()
- C: DataFrame.first()
- D: DataFrame.join()
- E: DataFrame.take()
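For reference, a minimal PySpark sketch (the DataFrames are toy examples): transformations such as join() only build up a plan, while actions such as count(), first(), take(), and collect() trigger evaluation.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrames for illustration only.
storesDF = spark.range(5).withColumnRenamed("id", "storeId")
salesDF = spark.range(5).withColumnRenamed("id", "storeId")

# join() is a transformation: it returns immediately without running a job.
joinedDF = storesDF.join(salesDF, "storeId")

# Each of these is an action and triggers evaluation of the plan.
joinedDF.count()
joinedDF.first()
joinedDF.take(2)
joinedDF.collect()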
Question 27
The code block shown below should read a JSON file at the file path filePath into a DataFrame using the specified schema schema. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
__1__.__2__.__3__(__4__).format("json").__5__(__6__)
- A: 1. spark 2. read() 3. schema 4. schema 5. json 6. filePath
- B: 1. spark 2. read() 3. json 4. filePath 5. format 6. schema
- C: 1. spark 2. read() 3. schema 4. schema 5. load 6. filePath
- D: 1. spark 2. read 3. schema 4. schema 5. load 6. filePath
- E: 1. spark 2. read 3. format 4. "json" 5. load 6. filePath
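For reference, a minimal sketch of the pattern in question (the schema fields and file path are hypothetical): read is a property accessed without parentheses, schema() attaches the expected schema, format() names the source, and load() takes the path.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema and path, for illustration only.
schema = StructType([
    StructField("storeId", StringType()),
    StructField("customerSatisfaction", DoubleType()),
])
filePath = "/data/stores.json"

# read is a property (no parentheses); load() takes the file path.
storesDF = spark.read.schema(schema).format("json").load(filePath)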
Question 28
Which of the following code blocks returns a new DataFrame with a new column customerSatisfactionAbs that is the absolute value of column customerSatisfaction in DataFrame storesDF? Note that column customerSatisfactionAbs is not in the original DataFrame storesDF.
- A: storesDF.withColumn("customerSatisfactionAbs", abs(col("customerSatisfaction")))
- B: storesDF.withColumnRenamed("customerSatisfactionAbs", abs(col("customerSatisfaction")))
- C: storesDF.withColumn(col("customerSatisfactionAbs", abs(col("customerSatisfaction")))
- D: storesDF.withColumn("customerSatisfactionAbs", abs(col(customerSatisfaction)))
- E: storesDF.withColumn("customerSatisfactionAbs", abs("customerSatisfaction"))
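For reference, a runnable sketch (the toy data is illustrative): withColumn() takes the new column's name as a string and a Column expression, and col() turns a column name into a Column.

from pyspark.sql import SparkSession
from pyspark.sql.functions import abs, col

spark = SparkSession.builder.getOrCreate()

# Toy data for illustration only.
storesDF = spark.createDataFrame([(1, -0.5), (2, 0.8)], ["storeId", "customerSatisfaction"])

# New column name as a string, value as a Column expression.
resultDF = storesDF.withColumn("customerSatisfactionAbs", abs(col("customerSatisfaction")))
resultDF.show()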
Question 29
Which of the following statements about the Spark driver is true?
- A: Spark driver is horizontally scaled to increase overall processing throughput.
- B: Spark driver is the most coarse level of the Spark execution hierarchy.
- C: Spark driver is fault tolerant — if it fails, it will recover the entire Spark application.
- D: Spark driver is responsible for scheduling the execution of data by various worker nodes in cluster mode.
- E: Spark driver is only compatible with its included cluster manager.
Question 30
The code block shown below should write DataFrame storesDF to file path filePath as parquet and partition by values in column division. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
storesDF.__1__.__2__(__3__).__4__(__5__)
- A: 1. write 2. partitionBy 3. "division" 4. path 5. filePath, mode = parquet
- B: 1. write 2. partitionBy 3. "division" 4. parquet 5. filePath
- C: 1. write 2. partitionBy 3. col("division") 4. parquet 5. filePath
- D: 1. write() 2. partitionBy 3. col("division") 4. parquet 5. filePath
- E: 1. write 2. repartition 3. "division" 4. path 5. filePath, mode = "parquet"
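For reference, a minimal sketch (the DataFrame and output path are hypothetical): write is a property returning a DataFrameWriter, partitionBy() takes column names as strings, and parquet() takes the output path.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame and a hypothetical output path.
storesDF = spark.createDataFrame([(1, "east"), (2, "west")], ["storeId", "division"])
filePath = "/tmp/stores_parquet"

# write is a property (no parentheses); partitionBy takes column names as strings.
storesDF.write.partitionBy("division").parquet(filePath)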
Question 31
Which of the following types of processes induces a stage boundary?
- A: Shuffle
- B: Caching
- C: Executor failure
- D: Job delegation
- E: Application failure
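For reference, a small sketch showing how a shuffle surfaces in the physical plan (toy data): wide transformations such as groupBy() require a shuffle, which appears as an Exchange operator in the plan and marks a stage boundary.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

df = spark.range(100).withColumn("bucket", col("id") % 4)

# groupBy forces a shuffle; explain() shows an Exchange in the plan,
# which is where one stage ends and the next begins.
df.groupBy("bucket").count().explain()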
Question 32
Which of the following cluster configurations will induce the least network traffic during a shuffle operation?
Note: the answer choices refer to cluster configuration scenarios shown in a diagram accompanying this question (not reproduced here). Each configuration has roughly the same compute power, using 100 GB of RAM and 200 cores.
- A: This cannot be determined without knowing the number of partitions.
- B: Scenario 5
- C: Scenario 1
- D: Scenario 4
- E: Scenario 6
Question 33
Which of the following describes a partition?
- A: A partition is the amount of data that fits in a single executor.
- B: A partition is an automatically-sized segment of data that is used to create efficient logical plans.
- C: A partition is the amount of data that fits on a single worker node.
- D: A partition is a portion of a Spark application that is made up of similar jobs.
- E: A partition is a collection of rows of data that fit on a single machine in a cluster.
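For reference, a quick sketch (toy data): a partition is a chunk of a DataFrame's rows that lives on a single machine and is processed by a single task, and the partition count can be inspected and changed.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.range(1000)

# Number of partitions the DataFrame's rows are split into.
print(df.rdd.getNumPartitions())

# repartition() redistributes the rows across a new number of partitions.
print(df.repartition(8).rdd.getNumPartitions())  # 8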
Question 34
Which of the following identifies multiple narrow operations that are executed in sequence?
- A: Slot
- B: Job
- C: Stage
- D: Task
- E: Executor
Question 35
Which of the following cluster configurations is most likely to experience an out-of-memory error in response to data skew in a single partition?
Note: the answer choices refer to cluster configuration scenarios shown in a diagram accompanying this question (not reproduced here). Each configuration has roughly the same compute power, using 100 GB of RAM and 200 cores.
- A: Scenario #4
- B: Scenario #5
- C: Scenario #6
- D: More information is needed to determine an answer.
- E: Scenario #1
Question 36
Spark's execution/deployment mode determines where the driver and executors are physically located when a Spark application is run. Which of the following Spark execution/deployment modes does not exist? If they all exist, please indicate so with Response E.
- A: Client mode
- B: Cluster mode
- C: Standard mode
- D: Local mode
- E: All of these execution/deployment modes exist
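For reference, a sketch of local mode in code (the app name is illustrative); client versus cluster mode is typically selected at submission time rather than in code, e.g. via spark-submit's --deploy-mode flag.

from pyspark.sql import SparkSession

# Local mode: driver and executors run in a single JVM on one machine.
spark = (SparkSession.builder
         .master("local[*]")  # use all local cores
         .appName("deploy-mode-demo")
         .getOrCreate())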
Question 37
Which of the following will cause a Spark job to fail?
- A: Never pulling any amount of data onto the driver node.
- B: Trying to cache data larger than an executor's memory.
- C: Data needing to spill from memory to disk.
- D: A failed worker node.
- E: A failed driver node.
Question 38
Which of the following best describes the similarities and differences between the MEMORY_ONLY storage level and the MEMORY_AND_DISK storage level?
- A: The MEMORY_ONLY storage level will store as much data as possible in memory and will store any data that does not fit in memory on disk, reading it as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory and will recompute any data that does not fit in memory as it's called.
- B: The MEMORY_ONLY storage level will store as much data as possible in memory on two cluster nodes and will recompute any data that does not fit in memory as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory on two cluster nodes and will store any data that does not fit in memory on disk, reading it as it's called.
- C: The MEMORY_ONLY storage level will store as much data as possible in memory on two cluster nodes and will store any data that does not fit in memory on disk, reading it as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory on two cluster nodes and will recompute any data that does not fit in memory as it's called.
- D: The MEMORY_ONLY storage level will store as much data as possible in memory and will recompute any data that does not fit in memory as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory and will store any data that does not fit in memory on disk, reading it as it's called.
- E: The MEMORY_ONLY storage level will store as much data as possible in memory and will recompute any data that does not fit in memory as it's called. The MEMORY_AND_DISK storage level will store half of the data in memory and half of the data on disk. This provides quick preview and better logical plan design.
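For reference, a minimal sketch of both storage levels (toy data): with MEMORY_ONLY, partitions that do not fit in memory are recomputed from lineage when accessed; with MEMORY_AND_DISK, they spill to disk and are read back.

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.range(10 ** 6)

# MEMORY_ONLY: anything that does not fit in memory is recomputed on access.
df.persist(StorageLevel.MEMORY_ONLY)
df.count()
df.unpersist()

# MEMORY_AND_DISK: anything that does not fit in memory spills to disk.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()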