After which phase of the data analytics lifecycle should you determine if the model needs any recalibration?
A. Model planning
B. Data preparation
C. Discovery
D. Operationalize
This exam has 59 community-verified practice questions.
Topics covered:
Big Data, Analytics, and the Data Scientist Role
Data Analytics Lifecycle
Initial Analysis of the Data
Advanced Analytics - Theory, Application, and Interpretation of Results for Eight Methods
Advanced Analytics for Big Data - Technology and Tools
Operationalizing an Analytics Project and Data Visualization Techniques
What is the primary role of a business intelligence analyst on an analytics project?
A. Extracts business data from source systems
B. Ensures business milestones are met
C. Provides business-domain expertise
D. Defines business goals for the analytics project
When building a K-means clustering model, you notice that the clusters do not segment on the variables you expected. What should you do?
A. Decrease the value of K
B. Multiply each variable by its standard deviation
C. Add the WSS to each variable
D. Check that the data was properly scaled
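K-means assigns points using Euclidean distance, so a variable measured on a much larger scale (for example, income in dollars versus age in years) can dominate the cluster assignments. A minimal sketch of z-score standardization before clustering, using only Python's standard library; the column names and values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(columns):
    """Rescale each column to mean 0 and (population) standard deviation 1."""
    scaled = {}
    for name, values in columns.items():
        mu = mean(values)
        sigma = pstdev(values)
        # A constant column carries no clustering signal; map it to zeros.
        scaled[name] = [(v - mu) / sigma if sigma > 0 else 0.0 for v in values]
    return scaled

# Hypothetical customer data: unscaled, income would swamp age in any distance.
data = {
    "age": [25, 32, 47, 51, 62],
    "income": [40000, 52000, 88000, 91000, 120000],
}
scaled = standardize(data)
for name, values in scaled.items():
    print(name, [round(v, 2) for v in values])
```

After scaling, both variables contribute on comparable terms to the distances K-means computes.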
You build a decision tree to classify five different types of customers based on their browsing history, using a sample of 500 customers. The resulting decision tree has 17 layers, and one of the leaf nodes contains only three customers.
What do you conclude?
A. The decision tree needs to be rebuilt without the three customers
B. The decision tree needs to be rebuilt to see if the results change
C. The sample size is too small, so the classes may not be accurate
D. Due to the large number of layers, there may be an overfitting problem
What are three built-in data types in the R programming language?
A. Boolean, integer, and character
B. Boolean, table, and character
C. Boolean, table, and integer
D. List, array, and integer
Which SQL set operator returns rows that exist in the first SELECT statement answer set but not in the second SELECT statement?
A. EXCEPT
B. UNION
C. UNION ALL
D. INTERSECT
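The set operators can be compared side by side with Python's built-in `sqlite3` module, since SQLite supports `EXCEPT` and `INTERSECT`. A minimal sketch; the table names and customer IDs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Two hypothetical customer lists from consecutive years.
cur.execute("CREATE TABLE customers_2023 (id INTEGER)")
cur.execute("CREATE TABLE customers_2024 (id INTEGER)")
cur.executemany("INSERT INTO customers_2023 VALUES (?)", [(1,), (2,), (3,), (4,)])
cur.executemany("INSERT INTO customers_2024 VALUES (?)", [(3,), (4,), (5,)])

# EXCEPT: rows in the first SELECT that do not appear in the second.
lapsed = cur.execute(
    "SELECT id FROM customers_2023 EXCEPT SELECT id FROM customers_2024 ORDER BY id"
).fetchall()
# INTERSECT: rows that appear in both SELECTs.
both = cur.execute(
    "SELECT id FROM customers_2023 INTERSECT SELECT id FROM customers_2024 ORDER BY id"
).fetchall()
print(lapsed)  # [(1,), (2,)]
print(both)    # [(3,), (4,)]
```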
What is a benefit of Spark in-memory data processing as opposed to using MapReduce?
A. Avoids writing intermediate data to disk, which speeds up processing
B. Supports processing unstructured data, which MapReduce does not allow
C. Removes the need to use disks at all, which reduces cost
D. Allows parallel processing, which MapReduce does not support
Which component of a final presentation provides a succinct overview of the business situation that was the impetus to initiate the project?
A. Model description
B. Approach
C. Project goals
D. Recommendations
When should you consider using multinomial logistic regression over binary logistic regression?
A. Dependent variable is continuous or dichotomous
B. Dependent variable is continuous or categorical
C. Dependent variable has more than two categories
D. Dependent variable is continuous only
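Binary logistic regression models a single log-odds for a two-category outcome. Multinomial logistic regression instead fits one linear score per category (one category is typically the reference, with its score fixed at 0) and turns the scores into probabilities with the softmax function. A minimal sketch with one predictor; the category names and coefficients are hypothetical, not fitted values.

```python
import math

def multinomial_probabilities(x, coefficients):
    """Class probabilities for one numeric feature x under a multinomial logit model.

    coefficients maps each class label to a hypothetical (intercept, slope) pair;
    the reference class uses (0.0, 0.0), so its linear score is always 0.
    """
    scores = {label: b0 + b1 * x for label, (b0, b1) in coefficients.items()}
    # Softmax: subtract the largest score first for numerical stability.
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical three-category outcome (e.g., membership tier) with one predictor.
coefficients = {
    "bronze": (0.0, 0.0),   # reference category
    "silver": (-1.0, 0.5),
    "gold": (-3.0, 1.0),
}
probs = multinomial_probabilities(4.0, coefficients)
print({label: round(p, 3) for label, p in probs.items()})
```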
What are good reasons to develop a naïve Bayes classifier model?
A. Handles correlated variables and handles missing variables
B. Handles categorical variables and handles numeric variables
C. Well calibrated and easy to implement
D. Handles very high dimensional data and resistant to over-fitting
Refer to the exhibit, which shows pairwise counts for items purchased together.
Consider the following association rule: Milk -> Eggs
What is the value of the lift?
A. 1.18
B. 0.264
C. 120
D. 70.81
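Lift compares how often two items are bought together with how often they would co-occur if they were independent: lift(X → Y) = support(X ∪ Y) / (support(X) × support(Y)). The exhibit is not reproduced here, so the sketch below uses hypothetical counts only to show the arithmetic; it assumes the exhibit supplies the Milk, Eggs, and Milk-and-Eggs counts plus the total number of transactions.

```python
def lift(count_xy, count_x, count_y, n_transactions):
    """Lift of the rule X -> Y computed from raw transaction counts."""
    support_xy = count_xy / n_transactions
    support_x = count_x / n_transactions
    support_y = count_y / n_transactions
    return support_xy / (support_x * support_y)

# HYPOTHETICAL counts, not the exhibit's values.
value = lift(count_xy=300, count_x=500, count_y=400, n_transactions=1000)
print(round(value, 2))  # 1.5
```

A lift above 1 suggests the items are bought together more often than independence would predict; a lift near 1 suggests no association.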
In addition to quantitative and technical skills, what is a key aspect of the profile of a data scientist?
A. Project management and administrative skills
B. Proficient in Microsoft Project and Excel
C. Skeptical and critical thinking
D. Accounting and regulatory skills
What is part of the model output for a linear regression?
A. The assignment of each input datum to a cluster
B. Coefficients indicating relative impact of the input variables on the outcome
C. The set of all rules X -> Y with minimum support and confidence
D. Probability score for each possible class label
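For a single input variable, ordinary least squares produces an intercept and a slope coefficient, and the slope says how much the predicted outcome changes per unit change in the input. A minimal stdlib-only sketch on hypothetical data chosen to lie exactly on y = 1 + 2x.

```python
from statistics import mean

def fit_simple_ols(x, y):
    """Return (intercept, slope) for the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)  # 1.0 2.0
```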
On which type of data should you run K-means clustering?
A. Ordinal
B. Numeric
C. Text
D. Nominal
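Each K-means iteration computes distances from points to centroids and then recomputes every centroid as the mean of its members; both steps need numeric values. A minimal sketch of Lloyd's algorithm on hypothetical one-dimensional data, with the initial centroids supplied by hand for reproducibility.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm for 1-D numeric points; returns (centroids, clusters)."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the nearest centroid (squared distance).
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        centroids = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centroids, clusters = kmeans_1d(points, centroids=[1.0, 9.0])
print(centroids)  # [1.5, 8.5]
```

Ordinal or nominal codes could be fed to the same loop, but the distances and means between arbitrary category codes are generally not meaningful.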
Refer to the exhibit.
To predict whether or not a customer will renew their annual property insurance policy, an insurance company built and operationalized a naïve Bayes classification model. In the model, there are two class labels, renewal and non-renewal, that are assigned to each customer based on their attributes.
A subset of the key attributes, their values, and corresponding conditional probabilities are provided in the exhibit.
A customer has the following attributes:
Age is greater than 65 years
Owns their own home
Renewal month is August
If 20% of customers do not renew their policy each year, what is the score for a renewal in the naïve Bayes model for the customer described above?
A. 0.0022
B. 0.0027
C. 0.0270
D. 0.0216
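In naïve Bayes, the unnormalized score for a class is its prior probability multiplied by the conditional probability of each observed attribute given that class, assuming the attributes are conditionally independent. The renewal prior follows from the question (1 − 0.20 = 0.80), but the exhibit's conditional probabilities are not reproduced here, so the values below are hypothetical placeholders that only illustrate the calculation.

```python
from math import prod

def naive_bayes_score(prior, conditionals):
    """Unnormalized score: P(class) times the product of P(attribute | class)."""
    return prior * prod(conditionals)

prior_renewal = 1 - 0.20  # 20% of customers do not renew each year

# HYPOTHETICAL P(attribute | renewal) values; substitute the exhibit's figures.
p_given_renewal = {
    "age_over_65": 0.25,
    "owns_home": 0.60,
    "renewal_month_august": 0.10,
}
score = naive_bayes_score(prior_renewal, p_given_renewal.values())
print(round(score, 4))
```

The same function applied to the non-renewal prior (0.20) and its conditional probabilities gives the competing score; the class with the larger score is assigned.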
In a user-defined aggregate function, what is FFUNC?
A. Optional final calculation function
B. Window function
C. State transition function
D. Segment-level calculation function
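In PostgreSQL-style `CREATE AGGREGATE` definitions, a state transition function (SFUNC) folds each input row into an accumulated state, and an optional final function (the FFUNC in the question, spelled `FINALFUNC` in PostgreSQL's syntax) converts that state into the returned value. Python's `sqlite3.Connection.create_aggregate` follows an analogous pattern, shown here as an illustration rather than the database syntax itself: `step()` plays the transition role and `finalize()` the final-function role. The aggregate name `my_avg` is hypothetical.

```python
import sqlite3

class Average:
    """User-defined aggregate whose state is (running total, row count)."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def step(self, value):
        # State transition: fold one input row into the running state.
        if value is not None:
            self.total += value
            self.count += 1

    def finalize(self):
        # Final calculation: turn the accumulated state into the result.
        return self.total / self.count if self.count else None

conn = sqlite3.connect(":memory:")
conn.create_aggregate("my_avg", 1, Average)
conn.execute("CREATE TABLE t (x REAL)")
conn.executemany("INSERT INTO t VALUES (?)", [(2.0,), (4.0,), (9.0,)])
result = conn.execute("SELECT my_avg(x) FROM t").fetchone()[0]
print(result)  # 5.0
```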
In which programming language is Hadoop written?
A. C++
B. Scala
C. Java
D. Python
In the data preparation phase of the data analytics lifecycle, what does the term “data conditioning” refer to?
A. Building training and testing datasets
B. Identifying relationships and correlations among variables
C. Deploying the model and monitoring its performance
D. Cleaning the data, normalizing datasets, and performing transformations
What does “MAD” in MADlib stand for?
A. Magnetic Association Design
B. Magnetic Agile Deep
C. Multiple Agile Development
D. Multiple Access Design
MapReduce is designed to process data in which way?
A. A few large files split into blocks processed in parallel across multiple machines
B. Many small files processed serially on one machine
C. A few large files split into blocks processed serially on one machine
D. Many small files processed in parallel across multiple machines
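The map, shuffle, and reduce stages can be simulated in a single Python process to show how one input split into blocks is processed independently and then combined. This is a sketch only: a thread pool stands in for the separate machines, and the three text blocks stand in for HDFS blocks of one large file.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_block(block):
    # Map: emit (word, 1) for every word in one input block.
    return [(word.lower(), 1) for word in block.split()]

def shuffle(mapped_outputs):
    # Shuffle: group intermediate values by key across all mappers.
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_counts(groups):
    # Reduce: combine the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

# One hypothetical "large file" split into blocks, mapped in parallel.
blocks = ["big data big", "data tools", "big tools"]
with ThreadPoolExecutor() as pool:
    mapped = list(pool.map(map_block, blocks))
counts = reduce_counts(shuffle(mapped))
print(sorted(counts.items()))  # [('big', 3), ('data', 2), ('tools', 2)]
```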
What is a recommended use case for regular expressions?
A. Linear regression
B. Decision trees
C. Logistic regression
D. In-database text analysis
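Regular expressions match textual patterns, which is why they are useful for filtering and extracting terms from free text. Many databases expose them inside SQL (PostgreSQL, for example, provides the `~` operator and `regexp_matches`); the sketch below shows the same kind of pattern logic with Python's `re` module. The review texts are hypothetical.

```python
import re

reviews = [
    "Battery life is great, 10/10!",
    "The battery died after 2 days.",
    "Screen is great but battery-life is poor",
]
# Case-insensitive match for "battery life" written with a space, a hyphen, or nothing.
pattern = re.compile(r"\bbattery[\s-]?life\b", re.IGNORECASE)
matches = [bool(pattern.search(r)) for r in reviews]
print(matches)  # [True, False, True]
```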
In ANOVA, what is the null hypothesis for k population means?
A. All population means are equal to each other
B. At least two population means are equal
C. At least two population means are not equal
D. At most k-1 population means are equal
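One-way ANOVA tests the null hypothesis that all k population means are equal by comparing variation between groups with variation within groups: F = (SS_between / (k − 1)) / (SS_within / (n − k)). A large F is evidence against the null hypothesis. A minimal stdlib sketch on hypothetical groups; it computes only the F statistic, since a p-value needs the F distribution (available, for example, through `scipy.stats.f_oneway`).

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of numeric samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    means = [mean(g) for g in groups]
    # Between-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]  # hypothetical samples
f_stat = one_way_anova_f(groups)
print(round(f_stat, 6))  # 19.0
```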
What metrics are used to help calculate relevance in text analysis?
A. TF and R square
B. IDF and information gain
C. Information gain and confidence interval
D. TF and IDF
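TF-IDF weights a term by how often it appears in a document (term frequency) and down-weights terms that appear in many documents (inverse document frequency). Several variants exist (smoothing, log base, raw versus relative counts); this sketch assumes one common form, relative term frequency times the natural log of N divided by document frequency, on hypothetical documents with whitespace tokenization.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

docs = ["data science data", "data tools", "science fair"]
scores = tf_idf(docs)
for i, doc_scores in enumerate(scores):
    print(i, {t: round(s, 3) for t, s in doc_scores.items()})
```

A term that occurs in every document gets an IDF of log(1) = 0, so it contributes nothing to relevance.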
Executives want to determine whether a change in a shopping rewards program has been effective in getting customers to increase their spending. Which approach could be used to determine if a significant shift in spending has occurred?