Which of the following DataFrame operations is classified as a wide transformation?
The code block shown below contains an error. The code block is intended to return the exact number of distinct values in column division in DataFrame storesDF. Identify the error.
Code block:
storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct"))
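For reference, the following is a minimal sketch contrasting an exact and an approximate distinct count in the Scala DataFrame API. The sample rows, the `storeId` column, and the object name are hypothetical, since the actual contents of storesDF are not shown here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{approx_count_distinct, col, countDistinct}

object DistinctCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("distinct-count-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for storesDF.
    val storesDF = Seq((1, "north"), (2, "south"), (3, "north"))
      .toDF("storeId", "division")

    // countDistinct computes the exact number of distinct values.
    storesDF.agg(countDistinct(col("division")).alias("divisionDistinct")).show()

    // approx_count_distinct returns an estimate (HyperLogLog++ based),
    // trading exactness for lower cost on large data.
    storesDF.agg(approx_count_distinct(col("division")).alias("divisionDistinct")).show()

    spark.stop()
  }
}
```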
Which of the following code blocks returns the number of rows in DataFrame storesDF for each distinct combination of values in column division and column storeCategory?
The code block shown below contains an error. The code block is intended to return a collection of summary statistics for column sqft in DataFrame storesDF. Identify the error.
Code block:
storesDF.describes(col("sqft"))
The code block shown below should extract the integer value for column sqft from the first row of DataFrame storesDF. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
1.2.3Int
The code block shown below should print the schema of DataFrame storesDF. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
1.2
The code block shown below contains an error. The code block is intended to create and register a SQL UDF named “ASSESS_PERFORMANCE” using the Scala function assessPerformance() and apply it to column customerSatisfaction in the table stores. Identify the error.
Code block:
spark.udf.register("ASSESS_PERFORMANCE", assessPerformance)
spark.sql("SELECT customerSatisfaction, assessPerformance(customerSatisfaction) AS result FROM stores")
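As an illustration of how a registered SQL UDF is typically invoked, here is a minimal sketch. The body of `assessPerformance`, the sample rows, and the object name are assumptions made for the example; only the function name, the registered name, and the table name come from the question.

```scala
import org.apache.spark.sql.SparkSession

object SqlUdfSketch {
  // Hypothetical stand-in for the assessPerformance() function in the question.
  def assessPerformance(customerSatisfaction: Int): String =
    if (customerSatisfaction >= 30) "good" else "needs attention"

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-udf-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows registered as the temporary view "stores".
    Seq(10, 35, 50).toDF("customerSatisfaction").createOrReplaceTempView("stores")

    // The first argument is the name under which SQL can call the function.
    spark.udf.register("ASSESS_PERFORMANCE", assessPerformance _)

    // SQL refers to the registered name, not to the Scala identifier.
    spark.sql(
      "SELECT customerSatisfaction, ASSESS_PERFORMANCE(customerSatisfaction) AS result FROM stores"
    ).show()

    spark.stop()
  }
}
```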
The code block shown below contains an error. The code block is intended to create the Scala UDF assessPerformanceUDF() and apply it to the integer column customerSatisfaction in DataFrame storesDF. Identify the error.
Code block:
The code block shown below should create a single-column DataFrame from Scala list years which is made up of integers. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
1.2(3).4
The code block shown below should cache DataFrame storesDF only in Spark's memory. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
1.2(3).count()
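For context, a minimal sketch of requesting memory-only storage explicitly is shown below. The sample DataFrame and object name are hypothetical. Note that `DataFrame.cache()` uses the default storage level `MEMORY_AND_DISK`, so a memory-only request goes through `persist` with an explicit `StorageLevel`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object MemoryOnlyCacheSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("memory-only-cache-sketch")
      .getOrCreate()

    // Hypothetical stand-in for storesDF.
    val storesDF = spark.range(5).toDF("storeId")

    // persist() only marks the DataFrame for caching; the count() action
    // triggers a job that actually materializes it in memory.
    storesDF.persist(StorageLevel.MEMORY_ONLY).count()

    // Inspect the storage level now associated with the DataFrame.
    println(storesDF.storageLevel)

    spark.stop()
  }
}
```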
Which of the following code blocks returns a DataFrame containing a column month, an integer representation of the day of the year from column openDate from DataFrame storesDF?
Note that column openDate is of type integer and represents a date in the UNIX epoch format – the number of seconds since midnight on January 1st, 1970.
A sample of storesDF is displayed below:
Which of the following describes the difference between cluster and client execution modes?
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on column storeId. Identify the error.
Code block:
StoresDF.join(employeesDF, Seq("storeId")
Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify two key columns column1 and column2?
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a position-wise union between DataFrame storesDF and DataFrame acquiredStoresDF.
Which of the following code blocks writes DataFrame storesDF to file path filePath as parquet overwriting any existing files in that location?
Which of the following code blocks reads a CSV at the file path filePath into a DataFrame with the specified schema schema?
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 AND the value in column customerSatisfaction is greater than or equal to 30?
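As a point of reference, a compound row filter of this kind can be sketched as follows. The sample rows and object name are hypothetical; the column names and thresholds are taken from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object CompoundFilterSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compound-filter-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for storesDF.
    val storesDF = Seq((20000, 35), (30000, 40), (25000, 10))
      .toDF("sqft", "customerSatisfaction")

    // Column comparisons are combined with && (logical AND on Column objects);
    // filter() and where() are interchangeable here.
    val result = storesDF.filter(
      col("sqft") <= 25000 && col("customerSatisfaction") >= 30
    )

    result.show()
    spark.stop()
  }
}
```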
Which of the following sets of DataFrame methods will both return a new DataFrame only containing rows that meet a specified logical condition?
The code block shown below should return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction. Choose the response that correctly fills in the numbered blanks within the code block to complete this task.
Code block:
1.2(3)
Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?
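One observable difference between the two methods can be sketched with partition counts. The dataset below is a hypothetical one created with four initial partitions, and the object name is an assumption for the example.

```scala
import org.apache.spark.sql.SparkSession

object RepartitionVsCoalesceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("repartition-vs-coalesce-sketch")
      .getOrCreate()

    // Hypothetical dataset: values 0 to 99 spread over 4 partitions.
    val df = spark.range(0, 100, 1, 4)

    // repartition(n) performs a full shuffle and can raise or lower the count.
    println(df.repartition(8).rdd.getNumPartitions) // 8

    // coalesce(n) merges existing partitions without a full shuffle.
    println(df.coalesce(2).rdd.getNumPartitions) // 2

    // coalesce(n) cannot increase the number of partitions.
    println(df.coalesce(8).rdd.getNumPartitions) // 4

    spark.stop()
  }
}
```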
Which of the following cluster configurations is most likely to experience delays due to garbage collection of a large DataFrame?
Which of the following statements about Spark’s stability is incorrect?
Which of the following DataFrame operations is classified as a transformation?
Which of the following describes the Spark driver?
Code block:
storesDF.withColumn("openTimestamp", col("openDate").cast(1))
.withColumn(2, 3(4))
Note: each configuration has roughly the same compute power using 100GB of RAM and 200 cores.