A data scientist is performing a linear regression and wants to construct a model that explains the most variation in the data. Which of the following should the data scientist maximize when evaluating the regression performance metrics?
A. Accuracy
B. R²
C. p value
D. AUC
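As a study aid, the following minimal sketch shows how the coefficient of determination (R²) is commonly computed as the share of variation in the observations that a regression's predictions account for. The function name `r_squared` and the sample values are illustrative assumptions, not part of the question.

```python
def r_squared(y_true, y_pred):
    """Return 1 - SS_res / SS_tot for paired observations and predictions."""
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: variation left unexplained by the model.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    # Total sum of squares: variation of the observations around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]          # hypothetical observations
perfect_fit = [1.0, 2.0, 3.0, 4.0]       # predictions that match exactly
mean_only = [2.5, 2.5, 2.5, 2.5]         # predicting the mean every time

print(r_squared(observed, perfect_fit))
print(r_squared(observed, mean_only))
```

A model that always predicts the mean explains none of the variation, which is why it serves as the baseline in this formula.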
A data scientist is building an inferential model with a single predictor variable. A scatter plot of the independent variable against the real-number dependent variable shows a strong relationship between them. The predictor variable is normally distributed with very few outliers. Which of the following algorithms is the best fit for this model, given the data scientist wants the model to be easily interpreted?
A. A logistic regression
B. An exponential regression
C. A linear regression
D. A probit regression
A data scientist wants to evaluate the performance of various nonlinear models. Which of the following is best suited for this task?
A. AIC
B. Chi-squared test
C. MCC
D. ANOVA
Which of the following is the layer that is responsible for the depth in deep learning?
A. Convolution
B. Dropout
C. Pooling
D. Hidden
Question 6: Operations and Processes
Question 7: Operations and Processes
Question 8: Operations and Processes
Question 9: Machine Learning
Question 10: Modeling, Analysis, and Outcomes
Question 11: Machine Learning
Question 12: Operations and Processes
Question 13: Operations and Processes
Question 14: Machine Learning
Question 15: Operations and Processes
Question 16: Mathematics and Statistics
Question 17: Modeling, Analysis, and Outcomes
Question 18: Operations and Processes
Question 19: Machine Learning
Question 20: Mathematics and Statistics
Question 21: Machine Learning
Question 22: Mathematics and Statistics
Question 23: Modeling, Analysis, and Outcomes
Question 24: Modeling, Analysis, and Outcomes
Question 25: Machine Learning
Which of the following modeling tools is appropriate for solving a scheduling problem?
A. One-armed bandit
B. Constrained optimization
C. Decision tree
D. Gradient descent
Which of the following environmental changes is most likely to resolve a memory constraint error when running a complex model using distributed computing?
A. Converting an on-premises deployment to a containerized deployment
B. Migrating to a cloud deployment
C. Moving model processing to an edge deployment
D. Adding nodes to a cluster deployment
A data analyst wants to save a newly analyzed data set to a local storage option. The data set must meet the following requirements:
Be minimal in size
Have the ability to be ingested quickly
Have the associated schema, including data types, stored with it
Which of the following file types is the best to use?
A. JSON
B. Parquet
C. XML
D. CSV
Which of the following is a key difference between KNN and k-means machine-learning techniques?
A. KNN operates exclusively on continuous data, while k-means can work with both continuous and categorical data.
B. KNN performs better with longitudinal data sets, while k-means performs better with survey data sets.
C. KNN is used for finding centroids, while k-means is used for finding nearest neighbors.
D. KNN is used for classification, while k-means is used for clustering.
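As a study aid, here is a minimal one-dimensional sketch of the two techniques side by side, assuming absolute difference as the distance. KNN needs labeled training data and votes among the nearest neighbors, whereas k-means receives no labels and only moves centroids. The function names, the labels, and the sample points are all hypothetical.

```python
from collections import Counter


def knn_predict(train, query, k=3):
    """Supervised: vote among the labels of the k nearest training points."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def kmeans_1d(points, centroids, iterations=10):
    """Unsupervised: alternate nearest-centroid assignment and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


# Hypothetical data: values paired with labels for KNN, bare values for k-means.
labeled = [(1.0, "low"), (1.5, "low"), (2.0, "low"),
           (8.0, "high"), (9.0, "high"), (9.5, "high")]
values = [value for value, _ in labeled]

print(knn_predict(labeled, 8.5))          # a label from the training data
print(kmeans_1d(values, [0.0, 10.0]))     # two centroid positions
```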
A data scientist needs to:
Build a predictive model that gives the likelihood that a car will get a flat tire.
Provide a data set of cars that had flat tires and cars that did not.
All the cars in the data set had sensors taking weekly measurements of tire pressure similar to the sensors that will be installed in the cars consumers drive. Which of the following is the most immediate data concern?
A. Granularity misalignment
B. Multivariate outliers
C. Insufficient domain expertise
D. Lagged observations
The term "greedy algorithms" refers to machine-learning algorithms that:
A. update priors as more data is seen.
B. examine every node of a tree before making a decision.
C. apply a theoretical model to the distribution of the data.
D. make the locally optimal decision.
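As a study aid, the sketch below illustrates the general idea of a greedy procedure with a coin-change example rather than a machine-learning model; decision-tree induction, which picks the best split at each node without revisiting earlier choices, is a common machine-learning instance of the same idea. The function name `greedy_change` and the coin sets are illustrative assumptions.

```python
def greedy_change(amount, coins=(25, 10, 5, 1)):
    """Repeatedly take the largest coin that still fits, never backtracking."""
    picked = []
    for coin in sorted(coins, reverse=True):
        while amount >= coin:
            amount -= coin
            picked.append(coin)
    return picked


print(greedy_change(63))             # uses the default coin set
print(greedy_change(6, (4, 3, 1)))   # a coin set where greedy is not optimal
```

With the hypothetical coin set (4, 3, 1), the greedy choice for 6 uses three coins even though two coins of 3 would suffice, which shows that a step-by-step best choice does not guarantee the best overall result.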
A data scientist is deploying a model that needs to be accessed by multiple departments with minimal development effort by the departments. Which of the following APIs would be best for the data scientist to use?
A. SOAP
B. RPC
C. JSON
D. REST
Which of the following compute delivery models allows packaging of only critical dependencies while developing a reusable asset?
A. Thin clients
B. Containers
C. Virtual machines
D. Edge devices
A data analyst is analyzing data and would like to build conceptual associations. Which of the following is the best way to accomplish this task?
A. n-grams
B. NER
C. TF-IDF
D. POS
Which of the following belong in a presentation to the senior management team and/or C-suite executives? (Choose two.)
A. Full literature reviews
B. Code snippets
C. Final recommendations
D. High-level results
E. Detailed explanations of statistical tests
F. Security keys and login information
During EDA, a data scientist wants to look for patterns, such as linearity, in the data. Which of the following plots should the data scientist use?
A. Violin
B. Box-and-whisker
C. Scatter
D. Q-Q
Which of the following distribution methods or models can most effectively represent the actual arrival times of a bus that runs on an hourly schedule?
A. Binomial
B. Exponential
C. Normal
D. Poisson
A data scientist has constructed a model that meets the minimum performance requirements specified in the proposal for a prediction project. The data scientist thinks the model's accuracy should be improved, but the proposed deadline is approaching. Which of the following actions should the data scientist take first?
A. Continue collecting data.
B. Request additional funding.
C. Consult the key project stakeholder.
D. Test additional model specifications.
Which of the following best describes the minimization of the residual term in a ridge linear regression?
A. |e|
B. e
C. e²
D. 0
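As a study aid, the following sketch evaluates the usual textbook form of the ridge objective: the sum of squared residuals plus a penalty proportional to the sum of squared coefficients. The function name `ridge_objective`, the parameter name `lam` for the penalty strength, and the sample numbers are assumptions made for illustration.

```python
def ridge_objective(residuals, coefficients, lam):
    """Sum of squared residuals plus lam times the sum of squared coefficients."""
    squared_error = sum(e ** 2 for e in residuals)
    penalty = lam * sum(b ** 2 for b in coefficients)
    return squared_error + penalty


# Hypothetical residuals and a single slope coefficient.
print(ridge_objective([1.0, -2.0], [3.0], lam=0.5))
print(ridge_objective([1.0, -2.0], [3.0], lam=0.0))  # no penalty: plain least squares
```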
A statistician notices gaps in data associated with age-related illnesses and wants to further aggregate these observations. Which of the following is the best technique to achieve this goal?
A. Label encoding
B. Linearization
C. Binning
D. Imputing
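As a study aid, here is a minimal sketch of grouping individual ages into coarser ranges and counting the observations in each range. The bin edges, the function name `bin_age`, and the sample ages are hypothetical and would need to be chosen to suit the actual data.

```python
from collections import Counter


def bin_age(age, edges=(0, 18, 40, 65)):
    """Return a label such as '18-39' or '65+' for the range containing age."""
    if age < edges[0]:
        raise ValueError("age is below the lowest bin edge")
    for lower, upper in zip(edges, edges[1:]):
        if lower <= age < upper:
            return f"{lower}-{upper - 1}"
    # Anything at or above the last edge falls into an open-ended bin.
    return f"{edges[-1]}+"


ages = [5, 17, 30, 42, 70, 81]  # hypothetical observations
counts = Counter(bin_age(a) for a in ages)
print(counts)
```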
A data scientist needs to analyze a company's chemical businesses and is using the master database of the conglomerate company. Nothing in the data differentiates the data observations for the different businesses. Which of the following is the most efficient way to identify the chemical businesses' observations?
A. Ingest the data from all of the hard drives and perform exploratory data analysis to identify which business is responsible for chemical operations.
B. Perform analysis on all of the data and create a summary report on the results relevant to chemical operations.
C. Consult with the business team to identify which sites are responsible for chemical operations and ingest only the relevant data for analysis.
D. Ingest data from the hard drive containing the most data and present sample results on the chemical operations.
Which of the following distance metrics for KNN is best described as a straight line?
A. Radial
B. Euclidean
C. Cosine
D. Manhattan
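As a study aid, the sketch below computes three of the named metrics for a pair of points so their definitions can be compared directly. The function names and sample points are illustrative assumptions, and cosine distance is taken here as one minus cosine similarity, which is one common convention.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences (axis-aligned path length)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


p, q = (0.0, 0.0), (3.0, 4.0)  # hypothetical points
print(euclidean(p, q))
print(manhattan(p, q))
print(cosine_distance((1.0, 0.0), (0.0, 1.0)))
```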
A data scientist is building a forecasting model for the price of copper. The only input in this model is the daily price of copper for the last ten years. Which of the following forecasting techniques is the most appropriate for the data scientist to use?
A. Autoregressive
B. Moving average
C. Dynamic time warping
D. Relative strength
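As a study aid, the following sketch fits a first-order autoregressive model, in which each value is modeled as a multiple of the previous value, using a no-intercept least-squares estimate. This is a deliberately simplified form; the function names and the toy series are assumptions, and a real price series would normally call for an intercept, more lags, and stationarity checks.

```python
def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} (no intercept)."""
    prev, curr = series[:-1], series[1:]
    return sum(p * c for p, c in zip(prev, curr)) / sum(p * p for p in prev)


def forecast_next(series):
    """One-step-ahead forecast from the fitted coefficient and the last value."""
    return fit_ar1(series) * series[-1]


prices = [8.0, 4.0, 2.0, 1.0]  # toy series in which each value is half the last
print(fit_ar1(prices))
print(forecast_next(prices))
```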
An analyst wants to show how the component pieces of a company's business units contribute to the company's overall revenue. Which of the following should the analyst use to best demonstrate this breakdown?
A. Box-and-whisker chart
B. Sankey diagram
C. Scatter plot matrix
D. Residual chart
Which of the following does k represent in the k-means model?