1) What is Machine learning?
Machine learning is a branch of computer science that deals with programming systems so that they automatically learn and improve with experience. For example, robots are programmed to perform tasks based on the data they gather from sensors; the behaviour is learned from data rather than written out explicitly.
2) Mention the difference between Data Mining and Machine learning?
Machine learning relates to the study, design and development of algorithms that give computers the capability to learn without being explicitly programmed. Data mining, on the other hand, is the process of extracting knowledge or unknown interesting patterns from (often unstructured) data; machine learning algorithms are commonly used during this process.
3) What is ‘Overfitting’ in Machine learning?
In machine learning, 'overfitting' occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is normally observed when a model is excessively complex, for example when it has too many parameters relative to the number of training examples. A model that has been overfit exhibits poor predictive performance on new data.
4) Why overfitting happens?
The possibility of overfitting exists because the criterion used for training the model is not the same as the criterion used to judge its efficacy: a model is fitted by optimizing its performance on the training data, but it is judged by its ability to generalize to unseen data.
5) How can you avoid overfitting ?
Overfitting can often be avoided by using a lot of data; it tends to happen when you try to learn from a small dataset. If you are forced to build a model from a small dataset, you can use a technique known as cross-validation. In this method the dataset is split into two sections, a training set and a testing set: the training set is used to fit the model, while the testing set is used only to evaluate it.
In this technique, a model is given a dataset of known data on which training is run (the training set) and a dataset of unknown data against which the model is tested (the test set). The idea of cross-validation is to define a dataset to 'test' the model during the training phase.
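The split-and-evaluate idea above can be sketched as a minimal k-fold cross-validation loop. Everything below (the function names, the toy data, and the deliberately trivial majority-class learner) is illustrative, not a reference implementation:

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    # shuffle the indices once, then deal them into k roughly equal folds
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, train_fn, k=5):
    # average test accuracy over k train/test splits
    folds = k_fold_indices(len(xs), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = [j for f in range(k) if f != i for j in folds[f]]
        model = train_fn([xs[j] for j in train], [ys[j] for j in train])
        correct = sum(model(xs[j]) == ys[j] for j in test)
        scores.append(correct / len(test))
    return sum(scores) / k

def majority_learner(xs, ys):
    # trivial learner: always predict the most common training label
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label

xs = list(range(10))
ys = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(cross_validate(xs, ys, majority_learner))  # 0.7 on this data
```

Each data point is used for testing exactly once, so the averaged score is a less optimistic estimate than training accuracy.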
6) What is inductive machine learning?
Inductive machine learning is the process of learning by example, where a system tries to induce a general rule from a set of observed instances.
7) What are the five popular algorithms of Machine Learning?
- Decision Trees
- Neural Networks (back propagation)
- Probabilistic networks
- Nearest Neighbor
- Support vector machines
8) What are the different Algorithm techniques in Machine Learning?
The different types of techniques in Machine Learning are
a) Supervised Learning
b) Unsupervised Learning
c) Semi-supervised Learning
d) Reinforcement Learning
e) Transduction
f) Learning to Learn
9) What are the three stages to build the hypotheses or model in machine learning?
- Model building
- Model testing
- Applying the model
10) What is the standard approach to supervised learning?
The standard approach to supervised learning is to split the set of examples into a training set and a test set.
11) What is ‘Training set’ and ‘Test set’?
In machine learning and related areas of information science, a 'training set' is the set of data used to discover potentially predictive relationships: it is the set of examples given to the learner. The 'test set' is used to assess the accuracy of the hypotheses generated by the learner; it is a set of examples held back from the learner. The training set and the test set must be kept distinct.
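A minimal holdout split might look like the sketch below; the 80/20 ratio and all names are illustrative:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    # shuffle a copy so the original order is untouched, then slice
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

examples = [(x, x % 2) for x in range(100)]  # (feature, label) pairs
train, test = train_test_split(examples)
print(len(train), len(test))  # 80 20
```

Shuffling before slicing matters: without it, an ordered dataset would put systematically different examples in the two sets.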
12) List down various approaches for machine learning?
The different approaches in Machine Learning are
- Concept Vs Classification Learning
- Symbolic Vs Statistical Learning
- Inductive Vs Analytical Learning
13) What is not Machine Learning?
- Artificial Intelligence in general (not all AI involves learning)
- Rule based inference
14) Explain what is the function of ‘Unsupervised Learning’?
- Find clusters of the data
- Find low-dimensional representations of the data
- Find interesting directions in data
- Interesting coordinates and correlations
- Find novel observations/ database cleaning
15) Explain what is the function of ‘Supervised Learning’?
- Classifications
- Speech recognition
- Regression
- Predict time series
- Annotate strings
16) What is algorithm independent machine learning?
Machine learning whose mathematical foundations are independent of any particular classifier or learning algorithm is referred to as algorithm-independent machine learning.
17) What is the difference between artificial intelligence and machine learning?
Designing and developing algorithms that learn behaviour from empirical data is known as machine learning. Artificial intelligence, in addition to machine learning, also covers other aspects such as knowledge representation, natural language processing, planning and robotics.
18) What is classifier in machine learning?
A classifier in machine learning is a system that inputs a vector of discrete or continuous feature values and outputs a single discrete value: the class.
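In code, a classifier is simply a function from a feature vector to a class label. The hand-written BMI rule below is a hypothetical stand-in for a learned model:

```python
def classify(features):
    # map a feature vector (height_cm, weight_kg) to a discrete class label
    height_cm, weight_kg = features
    bmi = weight_kg / (height_cm / 100) ** 2
    return "normal" if bmi < 25 else "overweight"

print(classify((180, 70)))  # BMI ~21.6 -> normal
```

A learned classifier has the same interface; only the decision rule inside is induced from data instead of written by hand.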
19) What are the advantages of Naive Bayes?
A Naïve Bayes classifier converges more quickly than discriminative models like logistic regression, so less training data is needed. Its main limitation, however, is that it cannot learn interactions between features, because it assumes the features are conditionally independent given the class.
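A compact from-scratch sketch of a Gaussian Naïve Bayes classifier shows why it needs so little data: it only estimates a prior plus a per-class mean and variance for each feature. The toy dataset and all names are invented for illustration:

```python
import math

def fit_gaussian_nb(X, y):
    # per class: prior, per-feature mean and variance (tiny smoothing added)
    stats = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(zip(*rows), means)]
        stats[c] = (len(rows) / len(X), means, variances)
    return stats

def predict(stats, x):
    # pick the class with the highest log-posterior under feature independence
    def log_post(c):
        prior, means, variances = stats[c]
        ll = math.log(prior)
        for v, m, var in zip(x, means, variances):
            ll += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return ll
    return max(stats, key=log_post)

X = [(1.0, 2.0), (1.2, 1.9), (8.0, 8.5), (7.9, 8.1)]
y = [0, 0, 1, 1]
stats = fit_gaussian_nb(X, y)
print(predict(stats, (1.1, 2.0)), predict(stats, (8.0, 8.2)))  # 0 1
```

Because the likelihood factorizes across features, no joint statistics are estimated, which is exactly why feature interactions cannot be learned.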
20) In what areas Pattern Recognition is used?
Pattern Recognition can be used in
- Computer Vision
- Speech Recognition
- Data Mining
- Statistics
- Information Retrieval
- Bio-Informatics
21) What is Genetic Programming?
Genetic programming is one of the two techniques used in machine learning (the other being inductive learning). The model is based on testing candidate programs and selecting the best choice among a set of results, iteratively evolving better candidates.
22) What is Inductive Logic Programming in Machine Learning?
Inductive Logic Programming (ILP) is a subfield of machine learning that uses logic programming to represent background knowledge and examples.
23) What is Model Selection in Machine Learning?
The process of selecting among different mathematical models that describe the same data set is known as model selection. Model selection is applied in statistics, machine learning and data mining.
24) What are the two methods used for the calibration in Supervised Learning?
The two methods used for predicting good probabilities in Supervised Learning are
- Platt Calibration
- Isotonic Regression
These methods are designed for binary classification; extending them to multiclass problems is not trivial.
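Platt calibration fits a sigmoid that maps raw classifier scores to probabilities. A rough gradient-descent sketch (the toy scores and all names are invented) looks like this:

```python
import math

def platt_calibrate(scores, labels, lr=0.1, epochs=2000):
    # fit p(y=1 | s) = 1 / (1 + exp(-(a*s + b))) by gradient descent on log loss
    a, b = 1.0, 0.0
    for _ in range(epochs):
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            a += lr * (y - p) * s
            b += lr * (y - p)
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

scores = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # raw scores from some classifier
labels = [0, 0, 0, 1, 1, 1]
calibrated = platt_calibrate(scores, labels)
```

In practice the sigmoid is fitted on held-out data rather than on the classifier's own training set, precisely to avoid an overly confident mapping.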
25) Which method is frequently used to prevent overfitting?
'Isotonic Regression' is used only when there is sufficient data, because with too little data it tends to overfit itself; more generally, cross-validation and regularization are the methods most frequently used to prevent overfitting.
26) What is the difference between heuristic for rule learning and heuristics for decision trees?
The difference is that heuristics for decision trees evaluate the average quality of a number of disjoint sets, while rule learners only evaluate the quality of the set of instances covered by the candidate rule.
27) What is Perceptron in Machine Learning?
In machine learning, the Perceptron is an algorithm for the supervised classification of an input into one of two possible classes: it learns a linear decision boundary and outputs a binary result.
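The classic perceptron learning rule is only a few lines. The sketch below learns the OR function, which is linearly separable; all names and constants are illustrative:

```python
def train_perceptron(X, y, lr=0.1, epochs=20):
    # learn weights w and bias b so that step(w.x + b) matches the 0/1 labels
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred  # -1, 0 or +1
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 1]  # the OR function
predict = train_perceptron(X, y)
print([predict(x) for x in X])  # [0, 1, 1, 1]
```

On non-separable data such as XOR this rule never converges, which is the perceptron's well-known limitation.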
28) Explain the two components of Bayesian logic program?
A Bayesian logic program consists of two components. The first component is a logical one: a set of Bayesian clauses that captures the qualitative structure of the domain. The second component is a quantitative one: it encodes the probabilistic information about the domain.
29) What are Bayesian Networks (BN) ?
A Bayesian Network is a graphical model that represents the probabilistic relationships among a set of variables.
30) Why instance based learning algorithm sometimes referred as Lazy learning algorithm?
Instance-based learning algorithms are also referred to as lazy learning algorithms because they delay the induction or generalization process until classification is performed.
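A one-nearest-neighbour sketch makes the 'lazy' part concrete: training only stores the data, and all computation is deferred until a query arrives (toy data and invented names):

```python
def make_1nn(X, y):
    # "training" just stores the examples
    def predict(q):
        # generalization happens here, at query time
        nearest = min(range(len(X)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], q)))
        return y[nearest]
    return predict

clf = make_1nn([(0, 0), (0, 1), (5, 5), (6, 5)], ["a", "a", "b", "b"])
print(clf((1, 0)), clf((5, 6)))  # a b
```

The trade-off is cheap training but expensive prediction, the opposite of eager learners such as decision trees.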
31) What are the two methods by which SVM (Support Vector Machine) can handle multiclass classification?
- Combining binary classifiers
- Modifying binary classifiers to incorporate multiclass learning directly
32) What is ensemble learning?
To solve a particular computational problem, multiple models such as classifiers or experts are strategically generated and combined. This process is known as ensemble learning.
33) Why ensemble learning is used?
Ensemble learning is used to improve a model's classification, prediction, function approximation, and similar capabilities.
34) When to use ensemble learning?
Ensemble learning works best when you can build component classifiers that are individually accurate and make errors independently of each other.
35) What are the two paradigms of ensemble methods?
The two paradigms of ensemble methods are
- Sequential ensemble methods
- Parallel ensemble methods
36) What is the general principle of an ensemble method and what is bagging and boosting in ensemble method?
The general principle of an ensemble method is to combine the predictions of several models built with a given learning algorithm in order to improve robustness over a single model. Bagging is an ensemble method for improving unstable estimation or classification schemes; it reduces error mainly through the variance term. Boosting builds models sequentially to reduce the bias of the combined model, and can reduce variance as well.
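Bagging can be sketched in a few lines: train each model on a bootstrap resample of the data and combine predictions by majority vote. The weak learner below (a midpoint threshold on 1-D data) and all names are invented for illustration:

```python
import random
from collections import Counter

def train_stump(X, y):
    # weak learner for 1-D data: threshold halfway between the two class means
    if len(set(y)) < 2:
        return lambda x, c=y[0]: c  # degenerate bootstrap sample: constant predictor
    m0 = sum(x for x, t in zip(X, y) if t == 0) / y.count(0)
    m1 = sum(x for x, t in zip(X, y) if t == 1) / y.count(1)
    thr = (m0 + m1) / 2
    return lambda x: 1 if x > thr else 0

def bagging(X, y, train_fn, n_models=25, seed=0):
    # train each model on a bootstrap resample, predict by majority vote
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # with replacement
        models.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

X = [1, 2, 3, 4, 9, 10, 11, 12]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = bagging(X, y, train_stump)
print(ensemble(2), ensemble(11))  # 0 1
```

Because each stump sees a slightly different resample, their thresholds differ, and the vote averages away that variability.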
37) What is bias-variance decomposition of classification error in ensemble method?
The
expected error of a learning algorithm can be decomposed into bias and
variance. A bias term measures how closely the average classifier
produced by the learning algorithm matches the target function. The
variance term measures how much the learning algorithm’s prediction
fluctuates for different training sets.
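The decomposition can be checked numerically. The sketch below uses a simple regression-style setting (estimating a constant from noisy samples; all parameters are invented): the unbiased sample mean has higher variance, while a shrunk estimator trades bias for variance:

```python
import random

def bias_variance(estimator, theta=5.0, n=10, trials=2000, seed=0):
    # empirically estimate bias^2 and variance of an estimator of theta
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        sample = [theta + rng.gauss(0, 1) for _ in range(n)]  # one training set
        estimates.append(estimator(sample))
    mean_est = sum(estimates) / trials
    bias_sq = (mean_est - theta) ** 2
    variance = sum((e - mean_est) ** 2 for e in estimates) / trials
    return bias_sq, variance

b1, v1 = bias_variance(lambda s: sum(s) / len(s))        # unbiased sample mean
b2, v2 = bias_variance(lambda s: 0.9 * sum(s) / len(s))  # shrunk: biased, lower variance
print(b1 < b2, v2 < v1)  # True True
```

The expected squared error of each estimator is approximately its bias² plus its variance, which is the decomposition stated above.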
38) What is an Incremental Learning algorithm in ensemble?
Incremental learning is the ability of an algorithm to learn from new data that becomes available after a classifier has already been generated from an existing dataset.
39) What is PCA, KPCA and ICA used for?
PCA (Principal Component Analysis), KPCA (Kernel-based Principal Component Analysis) and ICA (Independent Component Analysis) are important feature extraction techniques used for dimensionality reduction.
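PCA itself can be sketched as an eigendecomposition of the data's covariance matrix. The synthetic data below (3-D points lying close to a line; all parameters are invented) reduces cleanly to one dimension:

```python
import numpy as np

def pca(X, n_components):
    # project X onto the top principal components of its covariance matrix
    Xc = X - X.mean(axis=0)                 # centre each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]       # largest eigenvalues first
    return Xc @ eigvecs[:, order[:n_components]]

rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([t, 2 * t, -t]) + 0.01 * rng.normal(size=(200, 3))
reduced = pca(X, 1)
print(reduced.shape)  # (200, 1)
```

Because the three coordinates are almost exact multiples of one latent variable, the first principal component captures nearly all of the variance.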
40) What is dimension reduction in Machine Learning?
In
Machine Learning and statistics, dimension reduction is the process of
reducing the number of random variables under considerations and can be
divided into feature selection and feature extraction
41) What are support vector machines?
Support vector machines are supervised learning algorithms used for classification and regression analysis.
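A linear SVM can be sketched as sub-gradient descent on the hinge loss with an L2 penalty (labels are -1/+1; the toy data and all constants are invented):

```python
def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=100):
    # sub-gradient descent on hinge loss + L2 penalty; labels must be -1 or +1
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # inside the margin: move towards this example
                w = [wi - lr * (lam * wi - t * xi) for wi, xi in zip(w, x)]
                b += lr * t
            else:           # outside the margin: only the regularizer acts
                w = [wi - lr * lam * wi for wi in w]
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

X = [(-2, -1), (-1, -2), (1, 2), (2, 1)]
y = [-1, -1, 1, 1]
clf = train_linear_svm(X, y)
print([clf(x) for x in X])  # [-1, -1, 1, 1]
```

The hinge loss only penalizes points inside the margin, so the final boundary is determined by the borderline examples: the support vectors.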
42) What are the components of relational evaluation techniques?
The important components of relational evaluation techniques are
- Data Acquisition
- Ground Truth Acquisition
- Cross Validation Technique
- Query Type
- Scoring Metric
- Significance Test
43) What are the different methods for Sequential Supervised Learning?
The different methods to solve Sequential Supervised Learning problems are
- Sliding-window methods
- Recurrent sliding windows
- Hidden Markov models
- Maximum entropy Markov models
- Conditional random fields
- Graph transformer networks
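The sliding-window method reduces sequence labeling to ordinary supervised learning by turning each position into one (window, label) example. A toy sketch (the vowel-labeling task and all names are hypothetical):

```python
def sliding_window_examples(sequence, labels, width=3, pad="_"):
    # one (window, label) training example per position in the sequence
    half = width // 2
    padded = [pad] * half + list(sequence) + [pad] * half
    return [(tuple(padded[i:i + width]), labels[i]) for i in range(len(sequence))]

# hypothetical toy task: label each character as vowel (1) or consonant (0)
examples = sliding_window_examples("cat", [0, 1, 0])
print(examples)
# [(('_', 'c', 'a'), 0), (('c', 'a', 't'), 1), (('a', 't', '_'), 0)]
```

Any ordinary classifier can then be trained on these pairs; the recurrent sliding-window variant additionally feeds earlier predictions into later windows.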
44) What are the areas in robotics and information processing where sequential prediction problem arises?
The areas in robotics and information processing where sequential prediction problem arises are
- Imitation Learning
- Structured prediction
- Model based reinforcement learning
45) What is batch statistical learning?
In batch statistical learning, the learner is given the entire set of observed data at once. Statistical learning techniques then allow learning a function or predictor from that data so as to make predictions about unseen or future data. These techniques provide guarantees on the performance of the learned predictor on future unseen data, based on a statistical assumption about the data-generating process.
46) What is PAC Learning?
PAC
(Probably Approximately Correct) learning is a learning framework that
has been introduced to analyze learning algorithms and their statistical
efficiency.
47) What are the different categories into which the sequence learning process can be divided?
- Sequence prediction
- Sequence generation
- Sequence recognition
- Sequential decision
48) What is sequence learning?
Sequence learning is learning from, and making predictions about, data whose observations are ordered in a sequence, so that the order itself carries information.
49) What are two techniques of Machine Learning ?
The two techniques of Machine Learning are
- Genetic Programming
- Inductive Learning
50) Give a popular application of machine learning that you see on day to day basis?
The recommendation engines implemented by major e-commerce websites use machine learning to suggest products a customer is likely to buy.