Certified Associate Developer for Apache Spark

By Databricks
Aug 2025

Question 26

Which of the following operations will fail to trigger evaluation?

  • A: DataFrame.collect()
  • B: DataFrame.count()
  • C: DataFrame.first()
  • D: DataFrame.join()
  • E: DataFrame.take()
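
For context, Spark evaluates transformations lazily and only runs them when an action is invoked. A minimal PySpark sketch, not part of the original question (the DataFrames and session setup are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-eval-sketch").getOrCreate()

left = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
right = spark.createDataFrame([(1, 10), (2, 20)], ["id", "value"])

# join() is a transformation: it only extends the logical plan; no job runs here.
joined = left.join(right, "id")

# collect(), count(), first(), and take() are actions: each triggers evaluation.
rows = joined.collect()
total = joined.count()
first_row = joined.first()
some_rows = joined.take(1)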

Question 27

The code block shown below should read a JSON file at the file path filePath into a DataFrame with the specified schema schema. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

__1__.__2__.__3__(__4__).format("json").__5__(__6__)

  • A: 1. spark 2. read() 3. schema 4. schema 5. json 6. filePath
  • B: 1. spark 2. read() 3. json 4. filePath 5. format 6. schema
  • C: 1. spark 2. read() 3. schema 4. schema 5. load 6. filePath
  • D: 1. spark 2. read 3. schema 4. schema 5. load 6. filePath
  • E: 1. spark 2. read 3. format 4. "json" 5. load 6. filePath
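
For reference, the idiomatic DataFrameReader chain this question targets, as a minimal sketch (the schema definition and file path are illustrative assumptions, not from the original):

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("json-reader-sketch").getOrCreate()

schema = StructType([StructField("name", StringType(), True)])  # illustrative schema
filePath = "/tmp/example.json"  # hypothetical path

# spark.read is a property (a DataFrameReader), not a method, so no parentheses;
# schema() sets the expected schema, format() the source type, load() the path.
df = spark.read.schema(schema).format("json").load(filePath)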

Question 28

Which of the following code blocks returns a new DataFrame with a new column customerSatisfactionAbs that is the absolute value of column customerSatisfaction in DataFrame storesDF? Note that column customerSatisfactionAbs is not in the original DataFrame storesDF.

  • A: storesDF.withColumn("customerSatisfactionAbs", abs(col("customerSatisfaction")))
  • B: storesDF.withColumnRenamed("customerSatisfactionAbs", abs(col("customerSatisfaction")))
  • C: storesDF.withColumn(col("customerSatisfactionAbs", abs(col("customerSatisfaction")))
  • D: storesDF.withColumn("customerSatisfactionAbs", abs(col(customerSatisfaction)))
  • E: storesDF.withColumn("customerSatisfactionAbs", abs("customerSatisfaction"))
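
A minimal runnable sketch of the pattern under test (the DataFrame contents are illustrative, not from the original):

from pyspark.sql import SparkSession
from pyspark.sql.functions import abs, col  # note: this abs shadows Python's builtin

spark = SparkSession.builder.appName("abs-sketch").getOrCreate()

storesDF = spark.createDataFrame([(1, -0.5), (2, 0.9)], ["storeId", "customerSatisfaction"])

# withColumn() takes the new column name as a string plus a Column expression.
result = storesDF.withColumn("customerSatisfactionAbs", abs(col("customerSatisfaction")))
result.show()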

Question 29

Which of the following statements about the Spark driver is true?

  • A: Spark driver is horizontally scaled to increase overall processing throughput.
  • B: Spark driver is the most coarse level of the Spark execution hierarchy.
  • C: Spark driver is fault tolerant — if it fails, it will recover the entire Spark application.
  • D: Spark driver is responsible for scheduling the execution of data by various worker nodes in cluster mode.
  • E: Spark driver is only compatible with its included cluster manager.

Question 30

The code block shown below should write DataFrame storesDF to file path filePath as parquet and partition by values in column division. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.

Code block:

storesDF.__1__.__2__(__3__).__4__(__5__)

  • A: 1. write 2. partitionBy 3. "division" 4. path 5. filePath, mode = parquet
  • B: 1. write 2. partitionBy 3. "division" 4. parquet 5. filePath
  • C: 1. write 2. partitionBy 3. col("division") 4. parquet 5. filePath
  • D: 1. write() 2. partitionBy 3. col("division") 4. parquet 5. filePath
  • E: 1. write 2. repartition 3. "division" 4. path 5. filePath, mode = "parquet"
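
The completed writer chain, as a minimal sketch (the DataFrame and output path are illustrative assumptions, not from the original):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-write-sketch").getOrCreate()

storesDF = spark.createDataFrame([(1, "east"), (2, "west")], ["storeId", "division"])
filePath = "/tmp/stores_parquet"  # hypothetical output path

# write is a property (a DataFrameWriter); partitionBy() takes column names as
# strings, and parquet() sets the format and writes to the given path in one call.
storesDF.write.partitionBy("division").parquet(filePath)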

Question 31

Which of the following processes induces a stage boundary?

  • A: Shuffle
  • B: Caching
  • C: Executor failure
  • D: Job delegation
  • E: Application failure
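
For context, a shuffle redistributes rows across executors, and Spark starts a new stage at that point. A minimal sketch contrasting a narrow and a wide transformation (the DataFrame is illustrative):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stage-boundary-sketch").getOrCreate()

df = spark.createDataFrame([("east", 1), ("west", 2), ("east", 3)], ["division", "sales"])

# filter() is narrow: each input partition maps to one output partition (same stage).
# groupBy().agg() is wide: rows are shuffled by key, which induces a stage boundary.
result = df.filter(F.col("sales") > 0).groupBy("division").agg(F.sum("sales").alias("total"))
result.show()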

Question 32

Which of the following cluster configurations will induce the least network traffic during a shuffle operation?

[Image: the candidate cluster configurations (Scenarios 1-6)]

Note: each configuration has roughly the same compute power, using 100 GB of RAM and 200 cores.

  • A: This cannot be determined without knowing the number of partitions.
  • B: Scenario 5
  • C: Scenario 1
  • D: Scenario 4
  • E: Scenario 6

Question 33

Which of the following describes a partition?

  • A: A partition is the amount of data that fits in a single executor.
  • B: A partition is an automatically-sized segment of data that is used to create efficient logical plans.
  • C: A partition is the amount of data that fits on a single worker node.
  • D: A partition is a portion of a Spark application that is made up of similar jobs.
  • E: A partition is a collection of rows of data that fit on a single machine in a cluster.
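
A quick way to observe partitioning in practice, as a minimal sketch (the DataFrame is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partition-sketch").getOrCreate()

df = spark.range(1_000_000)

# Each partition is a collection of rows processed as a unit on a single machine.
print(df.rdd.getNumPartitions())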

Question 34

Which of the following identifies multiple narrow operations that are executed in sequence?

  • A: Slot
  • B: Job
  • C: Stage
  • D: Task
  • E: Executor

Question 35

Which of the following cluster configurations is most likely to experience an out-of-memory error in response to data skew in a single partition?

[Image: the candidate cluster configurations (Scenarios 1-6)]

Note: each configuration has roughly the same compute power using 100 GB of RAM and 200 cores.

  • A: Scenario #4
  • B: Scenario #5
  • C: Scenario #6
  • D: More information is needed to determine an answer.
  • E: Scenario #1

Question 36

Spark's execution/deployment mode determines where the driver and executors are physically located when a Spark application is run. Which of the following Spark execution/deployment modes does not exist? If they all exist, please indicate so with Response E.

  • A: Client mode
  • B: Cluster mode
  • C: Standard mode
  • D: Local mode
  • E: All of these execution/deployment modes exist

Question 37

Which of the following will cause a Spark job to fail?

  • A: Never pulling any amount of data onto the driver node.
  • B: Trying to cache data larger than an executor's memory.
  • C: Data needing to spill from memory to disk.
  • D: A failed worker node.
  • E: A failed driver node.

Question 38

Which of the following best describes the similarities and differences between the MEMORY_ONLY storage level and the MEMORY_AND_DISK storage level?

  • A: The MEMORY_ONLY storage level will store as much data as possible in memory and will store any data that does not fit in memory on disk and read it as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory and will recompute any data that does not fit in memory as it's called.
  • B: The MEMORY_ONLY storage level will store as much data as possible in memory on two cluster nodes and will recompute any data that does not fit in memory as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory on two cluster nodes and will store any data that does not fit in memory on disk and read it as it's called.
  • C: The MEMORY_ONLY storage level will store as much data as possible in memory on two cluster nodes and will store any data that does not fit in memory on disk and read it as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory on two cluster nodes and will recompute any data that does not fit in memory as it's called.
  • D: The MEMORY_ONLY storage level will store as much data as possible in memory and will recompute any data that does not fit in memory as it's called. The MEMORY_AND_DISK storage level will store as much data as possible in memory and will store any data that does not fit in memory on disk and read it as it's called.
  • E: The MEMORY_ONLY storage level will store as much data as possible in memory and will recompute any data that does not fit in memory as it's called. The MEMORY_AND_DISK storage level will store half of the data in memory and store half of the memory on disk. This provides quick preview and better logical plan design.
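
A minimal sketch of pinning each storage level explicitly (the DataFrame is illustrative, not from the original):

from pyspark import StorageLevel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-level-sketch").getOrCreate()

df = spark.range(10_000_000)

# MEMORY_ONLY: partitions that do not fit in memory are recomputed when accessed.
df.persist(StorageLevel.MEMORY_ONLY)
df.count()  # materialize the cache
df.unpersist()

# MEMORY_AND_DISK: partitions that do not fit in memory spill to disk and are read back.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()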
