LEE-FELIR (F)

1. What is the defining characteristic of the training data used in supervised learning?

A) The data is labeled, meaning each example is paired with a target output.
B) The data is unlabeled, and the model must find patterns on its own.
C) The data is generated randomly by the algorithm.

2. The primary goal of a supervised learning model is to:

A) Memorize the entire training dataset perfectly.
B) Discover hidden patterns without any guidance.
C) Generalize from the training data to make accurate predictions on new, unseen data.
D) Reduce the dimensionality of the input data for visualization.

3. In the analogy of a child learning from flashcards, the animal's name on the card represents what component of supervised learning?

A) The label or target output.
B) The input features.
C) The loss function.
D) The model's parameters.

4. Which of the following tasks is a classic example of a classification problem?

A) Predicting the selling price of a house based on its features.
B) Forecasting the temperature for tomorrow.
C) Estimating the annual revenue of a company.
D) Diagnosing a tumor as malignant or benign based on medical images.

5. A model that predicts the continuous value of a stock price for the next day is solving a:

A) Clustering problem.
B) Classification problem.
C) Regression problem.
D) Dimensionality reduction problem.

6. What is the core objective of unsupervised learning?

A) To classify emails into spam and non-spam folders.
B) To predict a target variable based on labeled examples.
C) To discover the inherent structure, patterns, or relationships within unlabeled data.
D) To achieve perfect accuracy on a held-out test set.

7. In the analogy of a child grouping toys without instructions, the act of putting all the cars together is most similar to which unsupervised learning technique?

A) Clustering.
B) Classification.
C) Reinforcement Learning.
D) Regression.

8. Grouping customers based solely on their purchasing behavior, without pre-defined categories, is an application of:

A) Linear Regression, a type of supervised learning.
B) Clustering, a type of unsupervised learning.
C) A support vector machine for classification.
D) Logistic Regression, a type of supervised learning.

9. The main goal of dimensionality reduction techniques like PCA is to:

A) Increase the number of features to improve model accuracy.
B) Reduce the number of features while preserving the most important information in the data.
C) Predict a continuous output variable.
D) Assign categorical labels to each data point.

10. Market basket analysis, which finds rules like "if chips then soda," is a classic example of:

A) Classification in supervised learning.
B) Deep learning with neural networks.
C) Regression in supervised learning.
D) Association rule learning in unsupervised learning.

11. Semi-supervised learning is particularly useful in real-world scenarios because:

A) It requires no labeled data at all.
B) Labeling data is often expensive and time-consuming, so it leverages a small labeled set with a large unlabeled set.
C) It is simpler to implement than unsupervised learning.
D) It is always more accurate than fully supervised learning.

12. The fundamental question that a regression model aims to answer is:

A) "Is this pattern anomalous?"
B) "Which category?"
C) "What is the underlying group?"
D) "How much?" or "How many?"

13. The fundamental question that a classification model aims to answer is:

A) "What is the correlation between these variables?"
B) "Which category?" or "What class?"
C) "How much?" or "How many?"
D) "How can I reduce the number of features?"

14. Which algorithm is most directly designed for predicting a continuous target variable?

A) Linear Regression
B) k-Nearest Neighbors for classification
C) Decision Tree for classification
D) Logistic Regression

15. A model that uses patient data to assign a label of "High," "Medium," or "Low" risk for a disease is performing:

A) Clustering
B) Regression
C) Multi-class classification
D) Dimensionality reduction

16. In a Decision Tree used for classification, what do the leaf nodes represent?

A) The probability of moving to the next node
B) The final class labels or decisions
C) The average value of a continuous target
D) The input features for a new data point

17. In a Regression Tree, what is typically represented at the leaf nodes?

A) The name of the feature used for splitting
B) A random number
C) A categorical class label
D) A continuous value, often the mean of the target values of the training instances that reach the leaf

18. A key strength of Decision Trees is their:

A) Superior performance on all types of data compared to other algorithms
B) Immunity to overfitting on noisy datasets
C) Interpretability; the model's decision-making process is easy to understand and visualize
D) Guarantee to find the global optimum for any dataset

19. The "kernel trick" used in Support Vector Machines (SVMs) allows them to:

A) Initialize the weights of a neural network
B) Find a linear separating hyperplane in a high-dimensional feature space, even when the data is not linearly separable in the original space
C) Perform linear regression more efficiently
D) Grow a tree structure by making sequential decisions

20. The "support vectors" in an SVM are the:

A) Axes of the original feature space
B) Data points that are closest to the decision boundary and most critical for defining the optimal hyperplane
C) Weights of a neural network layer
D) All data points in the training set

21. When comparing Decision Trees and SVMs, a primary advantage of SVMs is:

A) Their inherent resistance to any form of overfitting
B) Their superior interpretability and simplicity
C) Their lower computational cost for very large datasets
D) Their effectiveness in high-dimensional spaces and their ability to model complex, non-linear decision boundaries

22. The process in supervised learning where a model's parameters are adjusted to minimize the difference between its predictions and the true labels is called:

A) Dimensionality reduction
B) Clustering
C) Training or model fitting
D) Data preprocessing

23. A key challenge in unsupervised learning is evaluating model performance because:

A) The models are always less accurate than supervised models
B) The data is always too small
C) There are no ground truth labels to compare the results against
D) The algorithms are not well-defined

24. The task of reducing a 50-dimensional dataset to a 2-dimensional plot for visualization is best accomplished by:

A) An Association rule learning algorithm
B) A Regression algorithm like Linear Regression
C) Dimensionality Reduction techniques like Principal Component Analysis (PCA)
D) A Classification algorithm like Logistic Regression

25. If an e-commerce company wants to automatically group its products into categories without any pre-existing labels, it should use:

A) Classification, a supervised learning method
B) A neural network for image recognition
C) Regression, a supervised learning method
D) Clustering, an unsupervised learning method

26. The core building block of a neural network is a(n):

A) Support vector
B) Artificial neuron or perceptron, which receives inputs, applies a transformation, and produces an output
C) Principal component
D) Decision node in a tree

27. In a neural network, the function inside a neuron that determines its output based on the weighted sum of its inputs is called the:

A) Loss function
B) Activation function
C) Optimization algorithm
D) Kernel function

28. Which of the following is a non-linear activation function crucial for allowing neural networks to learn complex patterns?

A) The identity function (f(x) = x)
B) A constant function
C) Rectified Linear Unit (ReLU)
D) The mean squared error function

29. The process of "training" a neural network involves:

A) Randomly assigning weights and never changing them
B) Manually setting the weights based on expert knowledge
C) Clustering the input data
D) Iteratively adjusting the weights and biases to minimize a loss function

30. Backpropagation is the algorithm used in neural networks to:

A) Perform clustering on the output layer
B) Initialize the weights before training
C) Efficiently calculate the gradient of the loss function with respect to all the weights in the network, enabling the use of gradient descent
D) Visualize the network's architecture

31. Deep Learning is a subfield of machine learning that primarily uses:

A) K-means clustering exclusively
B) Decision trees with a single split
C) Neural networks with many layers (hence "deep")
D) Simple linear regression models

32. A key advantage of deep neural networks over shallower models is their ability to:

A) Be perfectly interpretable, like a decision tree
B) Automatically learn hierarchical feature representations from data
C) Operate without any need for data preprocessing
D) Always train faster and with less data

33. Convolutional Neural Networks (CNNs) are particularly well-suited for tasks involving:

A) Image data, due to their architecture which exploits spatial locality
B) Text data and natural language processing
C) Tabular data with many categorical features
D) Unsupervised clustering of audio signals

34. The "convolution" operation in a CNN is designed to:

A) Detect local features (like edges or textures) in the input by applying a set of learnable filters
B) Perform the final classification
C) Flatten the input into a single vector
D) Initialize the weights of the network

35. Recurrent Neural Networks (RNNs) are designed to handle:

A) Only image data
B) Independent and identically distributed (IID) data points
C) Static, non-temporal data
D) Sequential data, like time series or text, due to their internal "memory" of previous inputs

36. The "vanishing gradient" problem in deep networks refers to:

A) The model overfitting to the training data
B) The gradients becoming exceedingly small as they are backpropagated through many layers, which can halt learning in early layers
C) The loss function reaching a perfect value of zero
D) The gradients becoming too large and causing numerical instability
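
Illustration for question 36 (a minimal sketch, not a full network): backpropagation multiplies one local derivative per layer, and the sigmoid derivative never exceeds 0.25. The function name gradient_scale, the unit weight, and the zero pre-activation are assumptions chosen for the example; the sigmoid activation is also an assumption, since the question does not name one.

```python
import math


def sigmoid(z):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid; its maximum value is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_scale(num_layers, pre_activation=0.0, weight=1.0):
    """Factor by which a gradient is scaled after passing back through
    num_layers sigmoid layers (hypothetical one-unit-per-layer chain)."""
    scale = 1.0
    for _ in range(num_layers):
        scale *= weight * sigmoid_derivative(pre_activation)
    return scale


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in this best case for the sigmoid, the factor shrinks geometrically with depth, so the earliest layers receive almost no learning signal.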

37. The "training set" is used to:

A) Provide an unbiased evaluation of a final model's performance
B) Tune the model's hyperparameters
C) Fit the model's parameters (e.g., the weights in a neural network)
D) Deploy the model in a production environment

38. The "validation set" is primarily used for:

A) Data preprocessing and cleaning
B) The initial training of the model's weights
C) The final, unbiased assessment of the model's generalization error
D) Tuning hyperparameters and making decisions about the model architecture during development

39. The "test set" should be:

A) Used repeatedly to tune the model's hyperparameters
B) Used only once, for a final evaluation of the model's performance on unseen data after model development is complete
C) Ignored in the machine learning pipeline

40. Overfitting occurs when a model:

A) Is evaluated using the training set instead of a test set
B) Is too simple to capture the trends in the data
C) Learns the training data too well, including its noise and outliers, and performs poorly on new, unseen data
D) Fails to learn the underlying pattern in the training data

41. A common technique to reduce overfitting in neural networks is:

A) Increasing the model's capacity by adding more layers
B) Using a smaller training dataset
C) Training for more epochs without any checks
D) Dropout, which randomly ignores a subset of neurons during training

42. The "bias" of a model refers to:

A) The activation function used in the output layer
B) The error from sensitivity to small fluctuations in the training set, leading to overfitting
C) The error from erroneous assumptions in the learning algorithm, leading to underfitting
D) The weights connecting the input layer to the hidden layer

43. The "variance" of a model refers to:

A) The error from sensitivity to small fluctuations in the training set, leading to overfitting
B) The speed at which the model trains
C) The intercept term in a linear regression model
D) The error from erroneous assumptions in the learning algorithm, leading to underfitting

44. The "bias-variance tradeoff" implies that:

A) Decreasing bias will typically increase variance, and vice versa. The goal is to find a balance
B) Bias and variance can be minimized to zero simultaneously
C) Only variance is important for model performance
D) Only bias is important for model performance

45. A learning curve that shows high training accuracy but low validation accuracy is a classic sign of:

A) Overfitting
B) A well-generalized model
C) Underfitting
D) Perfect model performance

46. In a neural network, the "loss function" (or cost function) measures:

A) The number of layers in the network
B) The speed of the backpropagation algorithm
C) The accuracy on the test set
D) How far the model's predictions are from the true targets on the training data; it is the quantity we want to minimize during training

47. Gradient Descent is an optimization algorithm that:

A) Iteratively adjusts parameters in the direction that reduces the loss function
B) Randomly searches the parameter space for a good solution
C) Guarantees finding the global minimum for any loss function
D) Is only used for unsupervised learning

48. The "learning rate" in gradient descent controls:

A) The size of the step taken during each parameter update. A rate that is too high can cause divergence, while one that is too low can make training slow
B) The activation function for the output layer
C) The number of layers in a neural network
D) The amount of training data used in each epoch
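
Illustration for questions 47 and 48 (a minimal sketch under stated assumptions): gradient descent on the toy loss f(x) = x**2, whose gradient is 2*x. The function name gradient_descent, the starting point, and the three learning rates are hypothetical choices made for the example.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize the toy loss f(x) = x**2 from a fixed starting point."""
    x = start
    for _ in range(steps):
        grad = 2.0 * x                 # derivative of x**2
        x = x - learning_rate * grad   # step against the gradient
    return x


too_small = gradient_descent(0.001)   # barely moves away from the start
moderate = gradient_descent(0.1)      # approaches the minimum at 0
too_large = gradient_descent(1.1)     # overshoots more on every step

print(too_small, moderate, too_large)
```

Each update multiplies x by (1 - 2 * learning_rate), so on this loss any rate above 1.0 makes the iterate grow in magnitude instead of shrinking.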

49. "Epoch" in neural network training refers to:

A) The processing of a single training example
B) A type of regularization technique
C) The final evaluation on the test set
D) One complete pass of the entire training dataset through the learning algorithm

50. "Batch Size" in neural network training refers to:

A) The total number of examples in the training set
B) The number of layers in the network
C) The number of validation examples
D) The number of training examples used in one forward/backward pass before the model's parameters are updated
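
Illustration for questions 49 and 50 (a minimal sketch): how epochs and batch size together determine the number of parameter updates. The helper iterate_minibatches and the toy dataset of 10 examples are assumptions made for the example; the update itself is replaced by a counter.

```python
import math


def iterate_minibatches(data, batch_size):
    """Yield consecutive slices of data, each holding at most batch_size items."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]


data = list(range(10))   # toy training set of 10 examples
batch_size = 4
epochs = 3

updates = 0
for epoch in range(epochs):                 # one epoch = one full pass over data
    for batch in iterate_minibatches(data, batch_size):
        updates += 1                        # stand-in for one forward/backward pass

updates_per_epoch = math.ceil(len(data) / batch_size)
print(updates_per_epoch, updates)
```

With 10 examples and a batch size of 4, the last batch of each epoch holds only 2 examples, which is why the per-epoch count rounds up.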

51. "Epoch" in neural network training refers to: