A machine learning engineer is migrating a machine learning pipeline to use Databricks Machine Learning. They have programmatically identified the best run from an MLflow Experiment and stored its URI in the model_uri variable and its Run ID in the run_id variable. They have also determined that the model was logged with the name "model". Now, the machine learning engineer wants to register that model in the MLflow Model Registry with the name "best_model".
Which of the following lines of code can they use to register the model to the MLflow Model Registry?
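As a rough illustration of the scenario, a run-relative model URI follows the pattern runs:/<run_id>/<artifact_path>, and mlflow.register_model takes such a URI together with the registry name. The run ID below is a made-up placeholder, and the MLflow call is left as a comment so the snippet runs without MLflow installed.

```python
# Hypothetical run ID; in the scenario it comes from the best run of the experiment.
run_id = "0123456789abcdef"

# The model was logged under the artifact path "model".
model_uri = f"runs:/{run_id}/model"

# With MLflow available, the registration call would look like:
#   import mlflow
#   mlflow.register_model(model_uri=model_uri, name="best_model")
print(model_uri)
```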
A machine learning engineer is manually refreshing a model in an existing machine learning pipeline. The pipeline uses the MLflow Model Registry model "project". The machine learning engineer would like to add a new version of the model to "project".
Which of the following MLflow operations can the machine learning engineer use to accomplish this task?
A. mlflow.register_model
B. MlflowClient.update_registered_model
C. mlflow.add_model_version
D. MlflowClient.get_model_version
E. The machine learning engineer needs to create an entirely new MLflow Model Registry model
Which of the following describes the purpose of the context parameter in the predict method of Python models for MLflow?
A. The context parameter allows the user to specify which version of the registered MLflow Model should be used based on the given application's current scenario
B. The context parameter allows the user to document the performance of a model after it has been deployed
C. The context parameter allows the user to include relevant details of the business case to allow downstream users to understand the purpose of the model
D. The context parameter allows the user to provide the model with completely custom if-else logic for the given application's current scenario
E. The context parameter allows the user to provide the model access to objects like preprocessing models or custom configuration files
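For readers unfamiliar with the method shape, here is a minimal sketch of a custom model whose predict method receives a context argument. In a real pipeline the class would subclass mlflow.pyfunc.PythonModel and MLflow would supply the context; here a SimpleNamespace with an artifacts dictionary stands in for it so the snippet runs without MLflow. The class name, the config file, and the threshold value are all hypothetical.

```python
import json
import os
import tempfile
from types import SimpleNamespace


class ThresholdModel:
    """Stand-in for a class that would subclass mlflow.pyfunc.PythonModel."""

    def load_context(self, context):
        # context.artifacts maps artifact names to local file paths.
        with open(context.artifacts["config"]) as f:
            self.threshold = json.load(f)["threshold"]

    def predict(self, context, model_input):
        # Uses the configuration that was loaded through the context.
        return [int(x >= self.threshold) for x in model_input]


def demo_context_predictions():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w") as f:
            json.dump({"threshold": 0.5}, f)
        # Hypothetical stand-in for the context object MLflow would pass in.
        context = SimpleNamespace(artifacts={"config": path})
        model = ThresholdModel()
        model.load_context(context)
        return model.predict(context, [0.2, 0.5, 0.9])


print(demo_context_predictions())
```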
A machine learning engineer has developed a model and registered it using the FeatureStoreClient fs. The model has model URI model_uri. The engineer now needs to perform batch inference on customer-level Spark DataFrame spark_df, but it is missing a few of the static features that were used when training the model. The customer_id column is the primary key of spark_df and the training set used when training and logging the model.
Which of the following code blocks can be used to compute predictions for spark_df when the missing feature values can be found in the Feature Store by searching for features by customer_id?
A machine learning engineer wants to move their model version model_version for the MLflow Model Registry model model from the Staging stage to the Production stage using MLflow Client client.
Which of the following code blocks can they use to accomplish the task?
A
B
C
D
E
Which of the following describes the concept of MLflow Model flavors?
A. A convention that deployment tools can use to wrap preprocessing logic into a Model
B. A convention that MLflow Model Registry can use to version models
C. A convention that MLflow Experiments can use to organize their Runs by project
D. A convention that deployment tools can use to understand the model
E. A convention that MLflow Model Registry can use to organize its Models by project
In a continuous integration, continuous deployment (CI/CD) process for machine learning pipelines, which of the following events commonly triggers the execution of automated testing?
A. The launch of a new cost-efficient SQL endpoint
B. CI/CD pipelines are not needed for machine learning pipelines
C. The arrival of a new feature table in the Feature Store
D. The launch of a new cost-efficient job cluster
E. The arrival of a new model version in the MLflow Model Registry
A machine learning engineer has developed a random forest model using scikit-learn, logged the model using MLflow as random_forest_model, and stored its run ID in the run_id Python variable. They now want to deploy that model by performing batch inference on a Spark DataFrame spark_df.
Which of the following code blocks can they use to create a function called predict that they can use to complete the task?
A
B. It is not possible to deploy a scikit-learn model on a Spark DataFrame.
C
D
E
A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that the querying is running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.
Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?
A. Z-Ordering
B. Bin-packing
C. Write as a Parquet file
D. Data skipping
E. Tuning the file size
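To make the idea of colocating records across several columns concrete, the sketch below sorts rows by a Z-order (Morton) key built by interleaving the bits of two integer columns. It only illustrates the space-filling-curve idea and is not Delta Lake's implementation; on Databricks the corresponding command is OPTIMIZE ... ZORDER BY (...). The function names and sample rows are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x land on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y land on odd positions
    return z


def z_order(rows):
    """Sort (x, y) rows so that rows close in both columns end up near each other."""
    return sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))


rows = [(3, 3), (0, 1), (1, 0), (2, 2), (0, 0), (1, 1)]
ordered = z_order(rows)
print(ordered)
```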
A machine learning engineer needs to deliver predictions of a machine learning model in real-time. However, the feature values needed for computing the predictions are available one week before the query time.
Which of the following is a benefit of using a batch serving deployment in this scenario rather than a real-time serving deployment where predictions are computed at query time?
A. Batch serving has built-in capabilities in Databricks Machine Learning
B. There is no advantage to using batch serving deployments over real-time serving deployments
C. Computing predictions in real-time provides more up-to-date results
D. Testing is not possible in real-time serving deployments
E. Querying stored predictions can be faster than computing predictions in real-time
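The scenario can be pictured with a small sketch: when features are known ahead of query time, predictions can be computed once in a batch job and stored, so that serving a request becomes a key lookup. The score function, customer IDs, and feature values below are hypothetical stand-ins, and a plain dictionary takes the place of the stored prediction table.

```python
def score(features):
    """Hypothetical stand-in for an expensive model call."""
    return 2 * features["visits"] + 5 * features["purchases"]


# Batch job: features are available in advance, so predictions are computed once and stored.
feature_rows = {
    "cust_1": {"visits": 10, "purchases": 2},
    "cust_2": {"visits": 3, "purchases": 0},
}
stored_predictions = {cid: score(f) for cid, f in feature_rows.items()}


def serve(customer_id):
    # Query time: a key lookup, with no model computation on the request path.
    return stored_predictions[customer_id]


print(serve("cust_1"), serve("cust_2"))
```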
Which of the following tools can assist in real-time deployments by packaging software with its own application, tools, and libraries?
A. Cloud-based compute
B. None of these tools
C. REST APIs
D. Containers
E. Autoscaling clusters
A machine learning engineer has registered a sklearn model in the MLflow Model Registry using the sklearn model flavor with URI model_uri.
Which of the following operations can be used to load the model as an sklearn object for batch deployment?
A. mlflow.spark.load_model(model_uri)
B. mlflow.pyfunc.read_model(model_uri)
C. mlflow.sklearn.read_model(model_uri)
D. mlflow.pyfunc.load_model(model_uri)
E. mlflow.sklearn.load_model(model_uri)
A data scientist set up a machine learning pipeline to automatically log a data visualization with each run. They now want to view the visualizations in Databricks.
Which of the following locations in Databricks will show these data visualizations?
A. The MLflow Model Registry Model page
B. The Artifacts section of the MLflow Experiment page
C. Logged data visualizations cannot be viewed in Databricks
D. The Artifacts section of the MLflow Run page
E. The Figures section of the MLflow Run page
A machine learning engineer has deployed a model recommender using MLflow Model Serving. They now want to query the version of that model that is in the Production stage of the MLflow Model Registry.
Which of the following model URIs can be used to query the described model version?
A data scientist has developed a scikit-learn model sklearn_model and they want to log the model using MLflow.
They write the following incomplete code block:
Which of the following lines of code can be used to fill in the blank so the code block can successfully complete the task?
Which of the following statements describes streaming with Spark as a model deployment strategy?
A. The inference of batch processed records as soon as a trigger is hit
B. The inference of all types of records in real-time
C. The inference of batch processed records as soon as a Spark job is run
D. The inference of incrementally processed records as soon as a trigger is hit
E. The inference of incrementally processed records as soon as a Spark job is run
Which of the following is a benefit of logging a model signature with an MLflow model?
A. The model will have a unique identifier in the MLflow experiment
B. The schema of input data can be validated when serving models
C. The model can be deployed using real-time serving tools
D. The model will be secured by the user that developed it
E. The schema of input data will be converted to match the signature
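As background on what checking inputs against a signature can look like, here is a dependency-free sketch that validates a record against a declared input schema and rejects mismatches. It is a simplified stand-in and not MLflow's own enforcement logic; the schema, column names, and validate_input helper are hypothetical.

```python
# Hypothetical input schema: column name -> expected Python type.
INPUT_SCHEMA = {"age": int, "income": float}


def validate_input(record, schema=INPUT_SCHEMA):
    """Return the record unchanged if it matches the schema, otherwise raise."""
    missing = [column for column in schema if column not in record]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for column, expected in schema.items():
        if not isinstance(record[column], expected):
            raise TypeError(
                f"column {column!r} expected {expected.__name__}, "
                f"got {type(record[column]).__name__}"
            )
    return record


print(validate_input({"age": 41, "income": 52000.0}))
```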
A machine learning engineer wants to deploy a model for real-time serving using MLflow Model Serving. For the model, the machine learning engineer currently has one model version in each of the stages in the MLflow Model Registry. The engineer wants to know which model versions can be queried once Model Serving is enabled for the model.
Which of the following lists all of the MLflow Model Registry stages whose model versions are automatically deployed with Model Serving?
A. Staging, Production, Archived
B. Production
C. None, Staging, Production, Archived
D. Staging, Production
E. None, Staging, Production
A machine learning engineering team wants to build a continuous pipeline for data preparation of a machine learning application. The team would like the data to be fully processed and made ready for inference in a series of equal-sized batches.
Which of the following tools can be used to provide this type of continuous processing?
A. Spark UDFs
B. Structured Streaming
C. MLflow
D. Delta Lake
E. AutoML
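The phrase "a series of equal-sized batches" can be pictured with a toy sketch that consumes an incoming sequence of records in fixed-size micro-batches and prepares each batch as it arrives. This is plain Python meant only to show the processing pattern, not Spark code; the micro_batches and prepare helpers are hypothetical.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Yield consecutive batches of up to `batch_size` records from any iterable."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def prepare(batch):
    """Hypothetical data-preparation step applied to each batch."""
    return [value * 2 for value in batch]


prepared = [prepare(batch) for batch in micro_batches(range(6), batch_size=2)]
print(prepared)
```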
A data scientist has written a function to track the runs of their random forest model. The data scientist is changing the number of trees in the forest across each run.
Which of the following MLflow operations is designed to log single values like the number of trees in a random forest?
A. mlflow.log_artifact
B. mlflow.log_model
C. mlflow.log_metric
D. mlflow.log_param
E. There is no way to store values like this.
Which of the following deployment paradigms can centrally compute predictions for a single record with exceedingly fast results?
A. Streaming
B. Batch
C. Edge/on-device
D. None of these strategies will accomplish the task.
E. Real-time
A data scientist has created a Python function compute_features that returns a Spark DataFrame with the following schema:
The resulting DataFrame is assigned to the features_df variable. The data scientist wants to create a Feature Store table using features_df.
Which of the following code blocks can they use to create and populate the Feature Store table using the Feature Store Client fs?
A machine learning engineer and data scientist are working together to convert a batch deployment to an always-on streaming deployment. The machine learning engineer has expressed that rigorous data tests must be put in place as a part of their conversion to account for potential changes in data formats.
Which of the following describes why these types of data type tests and checks are particularly important for streaming deployments?
A. Because the streaming deployment is always on, all types of data must be handled without producing an error
B. All of these statements
C. Because the streaming deployment is always on, there is no practitioner to debug poor model performance
D. Because the streaming deployment is always on, there is a need to confirm that the deployment can autoscale
E. None of these statements
A data scientist has developed a scikit-learn random forest model model, but they have not yet logged model with MLflow. They want to obtain the input schema and the output schema of the model so they can document what type of data is expected as input.
Which of the following MLflow operations can be used to perform this task?
A. mlflow.models.schema.infer_schema
B. mlflow.models.signature.infer_signature
C. mlflow.models.Model.get_input_schema
D. mlflow.models.Model.signature
E. There is no way to obtain the input schema and the output schema of an unlogged model.
Which of the following describes label drift?
A. Label drift is when there is a change in the distribution of the predicted target given by the model
B. None of these describe label drift
C. Label drift is when there is a change in the distribution of an input variable
D. Label drift is when there is a change in the relationship between input variables and target variables
E. Label drift is when there is a change in the distribution of a target variable
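One simple way to quantify a shift in a target variable's distribution is to compare label frequencies between two periods. The sketch below uses total variation distance for that comparison, which is just one possible measure chosen for illustration; the label values and sample sizes are made up.

```python
from collections import Counter


def label_distribution(labels):
    """Map each label to its relative frequency."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p, q):
    """Half the sum of absolute differences between two discrete distributions."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Hypothetical target values observed at training time and more recently.
train_labels = ["churn"] * 2 + ["stay"] * 8
recent_labels = ["churn"] * 5 + ["stay"] * 5

drift = total_variation_distance(
    label_distribution(train_labels), label_distribution(recent_labels)
)
print(drift)
```

A value of 0 would mean the two periods have identical label frequencies, and larger values mean a larger shift.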