Introduction to Data Analytics

We Also Provide SYNOPSIS AND PROJECT.

Contact www.kimsharma.co.in for best and lowest cost solution or

Email: amitymbaassignment@gmail.com

Call/what’s app: +91 8290772200

What’s app:8800352777

Assignment solution help, assignment answers help, Assignment Help, Synopsis and Project, Study Material, Exam Notes

1st Module Assessment

Question 1. Data Analysis is a process of?

a.

inspecting data

b.

cleaning data

c.

transforming data

d.

All of the above

Question 2. Which of the following are Benefits of Big Data Processing?

a.

Businesses can utilize outside intelligence while taking decisions

b.

Improved customer service

c.

Better operational efficiency

d.

All of the above

Question 3. Which of the following is the correct order of working?

a.

questions->evaluation ->algorithms

b.

evaluation->input data ->algorithms

c.

questions->input data ->algorithms

d.

all of the mentioned

Question 4. Data of ___________ byte size is called Big Data

a.

Tera

b.

Giga

c.

Peta

d.

Meta

Question 5. Which of the following arguments is used to set importance values?

a.

scale

b.

set

c.

value

d.

all of the mentioned

Question 6. Which of the following is not a major data analysis approach?

a.

Data Mining

b.

Predictive Intelligence

c.

Business Intelligence

d.

Text Analytics

Question 7. Which of the following trade-off occurs during prediction?

a.

Speed vs Accuracy

b.

Simplicity vs Accuracy

c.

Scalability vs Accuracy

d.

None of the mentioned

Question 8. As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________

a.

Improved data storage and information retrieval

b.

Improved security, workload management, and SQL support

c.

Improved data warehousing functionality

d.

Improved extract, transform and load features for data integration

Question 9. Which of the following is the valid component of the predictor?

a.

data

b.

question

c.

algorithm

d.

all of the mentioned

Question 10. Transaction data of the bank is?

a.

structured data

b.

unstructured data

c.

Both A and B

d.

None of the above

Question 11. What is a unit of data that flows through a Flume agent?

a.

Record

b.

Event

c.

Row

d.

Log

Question 12. Which of the following are correct skills for a Data Scientist?

a.

Probability & Statistics

b.

Machine Learning / Deep Learning

c.

Data Wrangling

d.

All of the above

Question 13. What are the different features of Big Data Analytics?

a.

Open-Source

b.

Scalability

c.

Data Recovery

d.

All the above

Question 14. The goal of business intelligence is to allow easy interpretation of large volumes of data to identify new opportunities

a.

FALSE

b.

TRUE

c.

Can not say

d.

Can be true or false

Which of the following is not a major data analysis approach? Select one:

a. Business Intelligence

b. Data Mining

c. Predictive Intelligence

d. Text Analytics

Question 15. Which of the following is not an application of data science?

a.

Recommendation Systems

b.

Image & Speech Recognition

c.

Online Price Comparison

d.

Privacy Checker

Question 16. How many main statistical methodologies are used in data analysis?

a.

2

b.

4

c.

3

d.

5

Question 17. In descriptive statistics, data from the entire population or a sample is summarized with?

a.

integer descriptors

b.

floating descriptors

c.

numerical descriptors

d.

decimal descriptors

2nd Module Assessment

Question 1. Which of the following is a good way of performing experiments in data science?

a.

Measure variability

b.

Generalize to the problem

c.

Have Replication

d.

All of the mentioned

Question 2. Data visualization is also an element of the broader _____________.

a.

deliver presentation architecture

b.

data presentation architecture

c.

dataset presentation architecture

d.

data process architecture

Question 3. The analysis performed to uncover interesting statistical correlations between associated-attribute-value pairs is called?

a.

Mining of Association

b.

Mining of Clusters

c.

Mining of Correlations

d.

None of the above

Question 4. The issue of “efficiency and scalability of data mining algorithms” comes under?

a.

Mining Methodology and User Interaction Issues

b.

Performance Issues

c.

Diverse Data Types Issues

d.

None of the above

Question 5. How do you handle missing or corrupted data in a dataset?

a.

Assign a unique category to missing values

b.

Drop missing rows or columns

c.

Replace missing values with mean/median/mode

d.

All of the above
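
The three strategies named in the options above can be sketched in a few lines. This is a minimal illustration using only the standard library; the function names (`drop_missing`, `impute`, `flag_missing`) and the use of `None` as the missing-value marker are assumptions made for this example, not part of any particular library.

```python
from statistics import mean, median, mode

def drop_missing(rows):
    """Keep only the rows (dicts) that contain no missing (None) value."""
    return [row for row in rows if None not in row.values()]

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

def flag_missing(values, label="Unknown"):
    """Give missing categorical values their own category."""
    return [label if v is None else v for v in values]

ages = [20, None, 30, 40, None]
print(impute(ages, "mean"))              # [20, 30, 30, 40, 30]
print(flag_missing(["red", None]))       # ['red', 'Unknown']
```

Which strategy is appropriate depends on how much data is missing and why; dropping rows is simplest but discards information, while imputation keeps the row at the cost of adding an estimated value.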

Question 6. Which of the following approaches should be used if you can’t fix the variable?

a.

randomize it

b.

non stratify it

c.

generalize it

d.

none of the mentioned

Question 7. Text Analytics is also referred to as Text Mining.

a.

TRUE

b.

FALSE

c.

Can be true or false

d.

Can not say

Question 8. Data Analysis is a process of?

a.

cleaning data

b.

transforming data

c.

inspecting data

d.

All of the above

Question 9. Data Visualization in mining cannot be done using

a.

Photos

b.

Graphs

c.

Charts

d.

Information Graphics

Question 10. Point out the correct statement.

a.

findLinearColumns will also return a vector of column positions that can be removed to eliminate the linear dependencies

b.

findLinearCombos will return a list that enumerates dependencies

c.

the function findLinearRows can be used to generate a complete set of row variables from one factor

d.

none of the mentioned

Question 11. Which of the following is a foundational exploratory visualization package for the R language in the pandas ecosystem?

a.

yhat

b.

Seaborn

c.

Vincent

d.

None of the mentioned

Question 12. Which of the following analytical capabilities are provided by information management company?

a.

Stream Computing

b.

Content Management

c.

Information Integration

d.

All of the mentioned

Question 13. In normalization of relations, the property which is critical and must be achieved is classified as:

a.

non additive join property

b.

additive join property

c.

independency reservation property

d.

dependency preservation property

Question 14. The process of analysing relation schemas to achieve minimal redundancy and insertion or update anomalies is classified as

a.

normalization of data

b.

denomination of data

c.

isolation of data

d.

de-normalization of data

Which of the following is the top most important thing in data science?

a. answer

b. question 

c. data 

d. none of the mentioned

Question 15. A goal of data mining includes which of the following?

a.

To explain some observed event or condition

b.

To confirm that data exists

c.

To analyze data for expected relationships

d.

To create a new data warehouse

Question 16. An operational system is which of the following?

a.

A system that is used to run the business in real time and is based on historical data.

b.

A system that is used to run the business in real time and is based on current data.

c.

A system that is used to support decision making and is based on current data.

d.

A system that is used to support decision making and is based on historical data.

Question 17. A multifield transformation does which of the following?

a.

Converts data from one field into multiple fields

b.

Converts data from multiple fields into one field

c.

Converts data from multiple fields into multiple fields

d.

All of the above

3rd Module Assessment

Question 1. In descriptive statistics, data from the entire population or a sample is summarized with?

a.

numerical descriptors

b.

integer descriptors

c.

decimal descriptors

d.

floating descriptors

Question 2. A__________ begins by hypothesizing a sentence (the symbol S) and successively predicting lower level constituents until individual preterminal symbols are written.

a.

bottom-up parser

b.

top parser

c.

top-down parser

d.

bottom parser

Question 3. A feature F1 can take the values A, B, C, D, E, and F and represents the grades of students from a college. Which of the following statements is true in this case?

a.

Feature F1 is an example of nominal variable

b.

Feature F1 is an example of ordinal variable

c.

It doesn’t belong to any of the above categories

d.

Both of these

Question 4. In model-based learning methods, an iterative process takes place on the ML models that are built based on various model parameters, called?

a.

hyper parameters

b.

optimized parameters

c.

mini-batches

d.

super parameters

Question 5. Different learning methods do not include?

a.

Analogy

b.

Memorization

c.

Deduction

d.

Introduction

Question 6. varImp is a wrapper around the evimp function in the _______ package

a.

plot

b.

numpy

c.

earth

d.

none of the mentioned

Question 7. Which of the following hyperparameter(s), when increased, may cause a random forest to overfit the data?

 1. Number of Trees

 2. Depth of Tree

 3. Learning Rate

a.

1,2 and 3

b.

2 and 3

c.

1 and 2

d.

Only 2

Question 8. Which of the following is true about regression analysis?

a.

describing associations within the data

b.

answering yes/no questions about the data

c.

estimating numerical characteristics of the data

d.

modeling relationships within the data

Question 9. Fit a straight line y=a+bx into the given data: (x,y):(5,12)(10,13)(15,14)(20,15)(25,16).

a.

y=1.1+0.2x

b.

y=11

c.

y=11+0.2x

d.

y=0.2x
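
A least-squares fit of this kind can be checked by hand or with a short script. The sketch below uses the standard closed-form estimates, slope b = Sxy / Sxx and intercept a = ȳ − b·x̄; the function name `fit_line` is chosen for this example only.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: co-variation of x and y; Sxx: variation of x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

xs = [5, 10, 15, 20, 25]
ys = [12, 13, 14, 15, 16]
a, b = fit_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")   # y = 11.0 + 0.2x
```

For this data the means are x̄ = 15 and ȳ = 14, with Sxy = 50 and Sxx = 250, so b = 0.2 and a = 14 − 0.2 × 15 = 11.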

Question 10. What would be the probability of an event ‘G’ if H denotes its complement, according to the axioms of probability?

a.

P (G) = 1 / P (H)

b.

P (G) = 1 – P (H)

c.

P (G) = 1 + P (H)

d.

P (G) = P (H)

Question 11. The equation of the regression line is y = 5x + 3. Predict y when x = 8.

a.

53

b.

43

c.

23

d.

None

Question 12. Mutually Exclusive events ___________

a.

Contain all common sample points

b.

Contain all sample points

c.

Do not contain any common sample point

d.

Do not contain any sample point

Question 13. Can we perform linear regression with a neural network?

a.

Yes, we can

b.

Partially we can

c.

No, we can not

d.

None

Now the situation is the same as written in the previous question (underfitting). Which of the following regularization algorithms would you prefer? Select one:

a. L2 

b. L1 

c. All 

d. None of these

Question 14. The expected value of a discrete random variable ‘x’ is given by ___________

a.

P(x)

b.

∑ x P(x)

c.

∑ P(x)

d.

1
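
As a quick illustration of the weighted-sum definition of expected value, the sketch below computes E[X] for a fair six-sided die. The die example and the function name `expected_value` are assumptions added for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def expected_value(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())

# Fair die: each face 1..6 has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))   # 7/2
```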

Question 15. Which of the following function tracks the changes in model statistics?

a.

findTrack

b.

varImpTrack

c.

varImp

d.

none of the mentioned

Question 16. Randomly assigning treatment to experimental units allows:

a.

Population inference

b.

Causal inference

c.

Both types of inference

d.

Neither type of inference

Question 17. A national random sample of 20 ACT scores from 2010 is listed below. Calculate the sample mean and standard deviation. 29, 26, 13, 23, 23, 25, 17, 22, 17, 19, 12, 26, 30, 30, 18, 14, 12, 26, 17, 18

a.

20.85, 5.94

b.

20.50, 5.79

c.

20.50, 5.94

d.

20.85, 5.79
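
The two standard-deviation values that appear in the options correspond to two different divisors. The sketch below, using Python's `statistics` module, computes the mean together with the sample standard deviation (divisor n − 1) and the population standard deviation (divisor n) for the listed scores.

```python
from statistics import mean, stdev, pstdev

scores = [29, 26, 13, 23, 23, 25, 17, 22, 17, 19,
          12, 26, 30, 30, 18, 14, 12, 26, 17, 18]

sample_mean = mean(scores)        # sum is 417 over 20 scores
sample_sd = stdev(scores)         # divides by n - 1 (sample)
population_sd = pstdev(scores)    # divides by n (population)

print(f"mean = {sample_mean:.2f}")             # mean = 20.85
print(f"sample sd = {sample_sd:.2f}")          # sample sd = 5.94
print(f"population sd = {population_sd:.2f}")  # population sd = 5.79
```

Because the question describes the scores as a sample, the n − 1 version is the one that applies.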

4th Module Assessment

Question 1. Which of the following is a disadvantage of decision trees?

a.

Factor analysis

b.

Decision trees are prone to be overfit

c.

Decision trees are robust to outliers

d.

None of the above

Question 2. What strategies can help reduce overfitting in decision trees?

(i) Enforce a maximum depth for the tree

(ii) Enforce a minimum number of samples in leaf nodes

(iii) Make sure each leaf node is one pure class

(iv) Pruning

a.

All

b.

(i), (iii), (iv)

c.

(i), (ii) and (iii)

d.

None

Question 3. Factors which affect the performance of a learner system do not include

a.

Type of feedback

b.

Representation scheme used

c.

Training scenario

d.

Good data structures

Question 4. Which of the following is a reasonable way to select the number of principal components “k”?

a.

Use the elbow method

b.

Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer)

c.

Choose k to be the smallest value so that at least 99% of the variance is retained

d.

Choose k to be the largest value so that 99% of the variance is retained
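
A retained-variance rule for choosing k can be sketched as follows. The function `smallest_k` and the example eigenvalues are hypothetical; the sketch assumes the variance explained by each principal component (the eigenvalues of the covariance matrix) has already been computed.

```python
def smallest_k(eigenvalues, threshold=0.99):
    """Smallest number of components whose cumulative variance ratio reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)   # largest variance first
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(ordered)

# Hypothetical variances per component
variances = [6.0, 3.0, 0.95, 0.04, 0.01]
print(smallest_k(variances))   # 3
```

Here the first three components hold about 99.5% of the total variance, so the remaining two can be dropped with little loss.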

Question 5. Which of the following options is/are true for K-fold cross-validation?

1. An increase in K will result in a higher time required to cross-validate the result.

2. Higher values of K will result in higher confidence in the cross-validation result as compared to a lower value of K.

3. If K=N, then it is called leave-one-out cross-validation, where N is the number of observations.

a.

1 and 2

b.

1 and 3

c.

1,2 and 3

d.

2 and 3
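
To make the mechanics of K-fold cross-validation concrete, the sketch below generates the train/test index splits. The generator name `k_fold_indices` is an assumption for this example, and shuffling is left out for brevity.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k folds over n observations."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    # Spread any remainder over the first n % k folds
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test
        start = stop

for train, test in k_fold_indices(6, 3):
    print(train, test)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Each fold requires one model fit, so the cost grows with K; calling `k_fold_indices(n, n)` gives test sets of size one, which is leave-one-out cross-validation.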

Question 6. What kind of learning algorithm is used for “Facial identities or facial expressions”?

a.

Prediction

b.

Generating Patterns

c.

Recognizing Anomalies

d.

Recognition Patterns

Question 7. When performing regression or classification, which of the following is the correct way to preprocess the data?

a.

PCA -> normalize PCA output -> training

b.

Normalize the data -> PCA -> normalize PCA output -> training

c.

Normalize the data -> PCA -> training

d.

None of the above

Question 8. Which of the following is/are one of the important step(s) to pre-process the text in NLP based projects?

1. Stemming

2. Stop word removal

3. Object Standardization

a.

1 and 2

b.

1 and 3

c.

1,2 and 3

d.

All of them

Question 9. Which of the following clustering algorithms suffers from the problem of convergence at local optima?

a.

K- Means clustering

b.

Hierarchical clustering

c.

Diverse clustering

d.

All of the above

Question 10. What are the advantages of neural networks over conventional computers?

(i) They have the ability to learn by example

(ii) They are more fault tolerant

(iii) They are more suited for real time operation due to their high “computational”

a.

Only (i)

b.

(i) and (iii)

c.

All

d.

(i) and (ii)

For clustering, we do not require-

a. Unlabeled data 

b. Numerical data 

c. Labeled data 

d. Categorical data

Question 11. What is a support vector?

a.

The average distance between all the data points

b.

The distance between two boundary data points

c.

The distance between any two data points

d.

The minimum distance between any two data points

Question 12. On which data type can we not perform cluster analysis?

a.

Multimedia data

b.

Text data

c.

Time series data

d.

None

Question 13. Which of the following is a lazy learning algorithm?

a.

KNN

b.

Decision tree

c.

SVM

d.

All of the above

Question 14. Which version of the clustering algorithm is most sensitive to outliers?

a.

K-means clustering algorithm

b.

K-medians clustering algorithm

c.

K-modes clustering algorithm

d.

None

Question 15. Which of the following is the best algorithm for text classification?

a.

Naive Bayes

b.

Random forest

c.

KNN

d.

Decision tree

Question 16. Support Vector Machine is-

a.

a lazy learning classifier

b.

a discriminative classifier

c.

a probabilistic classifier

d.

None

Question 17. What is the most widely used distance metric in KNN?

a.

Perpendicular distance

b.

Manhattan distance

c.

Euclidean distance

d.

All of the above
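
The distance metrics named above differ only in how coordinate differences are combined. The sketch below defines Euclidean and Manhattan distance and uses them in a bare-bones k-nearest-neighbours vote; the function names and the toy data are assumptions for illustration.

```python
import math
from collections import Counter

def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_predict(train, query, k=3, distance=euclidean):
    """Majority label among the k training points closest to query."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(euclidean((0, 0), (3, 4)))          # 5.0
print(manhattan((0, 0), (3, 4)))          # 7
print(knn_predict(train, (0.5, 0.5)))     # a
```

Note that `knn_predict` has no separate training step: all work happens at query time, which is why KNN is described as a lazy learner.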

5th Module Assessment

Question 1. Suppose the following state-reward transitions are observed when a fixed policy is applied to an MDP with two states, A and B. What will the value function be when TD(0) is applied to the data?

A,2,B,3,A,4,B,3,A,3

B,4,B,3,A,9

A3

Select one:

a.

V(A) = 55/7, V(B) = 64/7

b.

V(A) = 60/7, V(B) = 66/7

c.

V(A) = 37/7, V(B) = 26/7

d.

V(A) = 57/7, V(B) = 55/7
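
For readers unfamiliar with the method, the sketch below shows a single pass of the incremental TD(0) update V(s) ← V(s) + α[r + γV(s′) − V(s)]. It is an illustration only and does not attempt to reproduce the answer options above. The step size `alpha`, the discount `gamma`, the zero initial values, the convention that each reward is received on leaving the listed state, and the short example episode are all assumptions.

```python
def td0(episodes, alpha=0.1, gamma=1.0):
    """One pass of tabular TD(0) over episodes given as [(state, reward), ...]."""
    values = {}
    for episode in episodes:
        for i, (state, reward) in enumerate(episode):
            # Bootstrap from the current estimate of the next state (0 at episode end)
            if i + 1 < len(episode):
                next_value = values.get(episode[i + 1][0], 0.0)
            else:
                next_value = 0.0
            current = values.get(state, 0.0)
            values[state] = current + alpha * (reward + gamma * next_value - current)
    return values

# Hypothetical two-step episode: leave A with reward 2, then leave B with reward 3
print(td0([[("A", 2), ("B", 3)]], alpha=0.5))   # {'A': 1.0, 'B': 1.5}
```

The update uses the current estimate of the next state rather than the full observed return, which is what distinguishes TD(0) from a Monte-Carlo update.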

Question 2. The multi-armed bandit problem is a generalized use case for-

a.

Unsupervised learning

b.

Supervised learning

c.

Reinforcement learning

d.

All of the above
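
The exploration-versus-exploitation trade-off behind the multi-armed bandit problem can be illustrated with an epsilon-greedy agent. This is a minimal sketch: the function name, the Bernoulli reward model, and the parameter values are assumptions, and epsilon-greedy is only one of several possible strategies.

```python
import random

def epsilon_greedy(success_probs, steps=1000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit; returns (pull counts, estimated mean reward per arm)."""
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                       # explore
        else:
            arm = max(range(n_arms), key=estimates.__getitem__)  # exploit
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental average keeps a running mean without storing rewards
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

counts, estimates = epsilon_greedy([0.2, 0.8], steps=2000)
print(counts, [round(e, 2) for e in estimates])
```

The agent receives only a reward signal for the arm it pulled, never a label for the correct arm, which is the feedback structure the question is pointing at.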

Question 3. A definition of a concept is ________ if it recognizes all the instances of that concept

a.

Consistent

b.

Complete

c.

Constant

d.

None of these

Question 4. Iteration is also called ________

a.

Self-correcting process

b.

Accurate process

c.

Approximate process

d.

Rounding off process

Question 5. Which of the following is true about reinforcement learning?

a.

The target of an agent is to maximize the rewards

b.

The agent gets rewards or penalty according to the action

c.

It’s an online learning

d.

All of the above

Question 6. Case-based learning is

a.

A class of learning algorithm that tries to find an optimum classification of a set of examples using the probabilistic theory.

b.

An approach to the design of learning algorithms that is inspired by the fact that when people encounter new situations, they often explain them by reference to familiar experiences, adapting the explanations to fit the new situation

c.

Any mechanism employed by a learning system to constrain the search space of a hypothesis

d.

None of these

Question 7. What is the parameter of analysis in reinforcement learning?

a.

Number of requests during wake cycle

b.

Number of processes to achieve final outcome

c.

Degree of failure

d.

Degree of success

Question 8. Reinforcement learning is

a.

Unsupervised learning

b.

Award based learning

c.

Supervised learning

d.

None

Question 9. What is the difference between TD(0) and Monte-Carlo value function update equations?

a.

MC uses an unbiased sample of the return while TD(0) uses a biased sample

b.

MC uses a biased sample of the return while TD(0) uses sample

c.

For a single update step MC uses reward while TD(0) uses a sample of the reward

d.

For a single update step MC uses sample of the reward while TD(0) uses reward

Question 10. Factors which affect the performance of a learner system do not include?

a.

Training scenario

b.

Type of feedback

c.

Good data structures

d.

Representation scheme used

Question 11. A model of language consists of categories, which do not include ________.

a.

structural units

b.

System Unit

c.

data units

d.

empirical units

Question 12. Real-time decisions, game AI, learning tasks, skill acquisition, and robot navigation are applications of which of the following?

a.

Unsupervised Learning: Clustering

b.

Unsupervised Learning: Regression

c.

Supervised Learning: Classification

d.

Reinforcement Learning

Question 13. Concept learning infers a ________-valued function from training examples of its input and output.

a.

Hexadecimal

b.

Decimal

c.

Boolean

d.

All of the above

Question 14. ________ produces the relation that has attributes of R1 and R2

a.

Difference

b.

Cartesian product

c.

Intersection

d.

Product
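
As an illustration of how a relational Cartesian product combines the attributes of two relations, the sketch below pairs every tuple of one relation with every tuple of another. The relations, their attribute names, and the function `cartesian_product` are hypothetical, and the sketch assumes the two relations have no attribute names in common.

```python
from itertools import product

def cartesian_product(r1, r2):
    """Pair every row of r1 with every row of r2, merging their attributes."""
    return [{**a, **b} for a, b in product(r1, r2)]

students = [{"sid": 1}, {"sid": 2}]
courses = [{"course": "DB"}, {"course": "ML"}, {"course": "AI"}]

rows = cartesian_product(students, courses)
print(len(rows))   # 6
print(rows[0])     # {'sid': 1, 'course': 'DB'}
```

The result has one row per pair, so its size is the product of the two input sizes, and each row carries the attributes of both inputs.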

SET concept is used in: 

a. Relational Model 

b. Hierarchical Model 

c. Network Model 

d. None of these

Question 15. Which one of the following statements is

(a.) Planning can be used when MDP parameters are unknown but samples can be taken

(b.) Learning can be used when MDP parameters are unknown but samples can be taken

(c.) Planning can be used when a model is present even if sampling is not allowed

(d.) Learning can be used when a model is present even if sampling is not allowed

Select one:

a.

(a.)

b.

(a.) and (b.)

c.

(b.) and (c.)

d.

(d.) and (c.)

Question 16. Assertion: Having a simulator/model is an advantage when using rollout-based methods

Reason: Multiple trajectories can be sampled from the model from any given state

(a.) Assertion and Reason are both true and Reason is a correct explanation of Assertion

(b.) Assertion and Reason are true and Reason is not a correct explanation of Assertion

(c.) Assertion is true but Reason is false

(d.) Assertion and Reason are false

Select one:

a.

(a.)

b.

(a.) and (b.)

c.

(b.) and (c.)

d.

(d.) and (c.)

Question 17. Assertion: Q-learning can use asynchronous samples from different policies to update Q values.

Reason: Q-learning is an online learning algorithm

(a.) Assertion and Reason are true and Reason is not a correct explanation of Assertion

(b.) Assertion and Reason are both true and Reason is a correct explanation of Assertion

(c.) Assertion is false but Reason is true

(d.) Assertion is true but Reason is false

a.

(a.)

b.

(a.) and (b.)

c.

(b.) and (c.)

d.

(d.) and (c.)

Assignment 2

Case Study: 

The oil and gas industry uses predictive analytics in many different ways to ensure efficient, safe, and clean extraction, processing, and delivery of its product. While shale oil and gas are abundant in the US, they are difficult to find and extract safely. Horizontal drilling and fracking are expensive and may cause environmental damage. They are also relatively inefficient. As a result, some of the biggest oil and gas corporations are using prescriptive analytics to help deal with and minimize these problems.

Question: How does Big Data help the oil and gas industry?

a.

Analytics of gas and oil field data

b.

Simulation of weather data

c.

Drilling

d.

Extraction of oil

Question 2. The mapping or classification of a class with some predefined group or class is known as

a.

Data Characterization

b.

Data Discrimination

c.

Data Set

d.

Data Sub Structure

Question 3. In model-based learning methods, including those used in oil and gas, an iterative process takes place on the ML models that are built based on various model parameters, called?

a.

mini-batches

b.

super parameters

c.

optimized parameters

d.

hyper parameters

Question 4. Data analysis in oil and gas is a process of?

a.

inspecting data

b.

cleaning data

c.

transforming data

d.

All of the above

Question 5. Which of the following is not a major data analysis approach in oil and gas?

a.

Business Intelligence

b.

Data Mining

c.

Predictive Intelligence

d.

Text Analytics
