Coele1
  • 1. What is the defining characteristic of the training data used in supervised learning?
A) The data is always image-based.
B) The data is generated randomly by the algorithm.
C) The data is unlabeled, and the model must find patterns on its own.
D) The data is labeled, meaning each example is paired with a target output.
  • 2. The primary goal of a supervised learning model is to:
A) Reduce the dimensionality of the input data for visualization.
B) Generalize from the training data to make accurate predictions on new, unseen data.
C) Memorize the entire training dataset perfectly.
D) Discover hidden patterns without any guidance.
  • 3. In the analogy of a child learning from flashcards, the animal's name on the card represents what component of supervised learning?
A) The label or target output.
B) The model's parameters.
C) The loss function.
D) The input features.
  • 4. Which of the following tasks is a classic example of a classification problem?
A) Estimating the annual revenue of a company.
B) Predicting the selling price of a house based on its features.
C) Forecasting the temperature for tomorrow.
D) Diagnosing a tumor as malignant or benign based on medical images.
  • 5. A model that predicts the continuous value of a stock price for the next day is solving a:
A) Clustering problem.
B) Regression problem.
C) Dimensionality reduction problem.
D) Classification problem.
  • 6. What is the core objective of unsupervised learning?
A) To predict a target variable based on labeled examples.
B) To classify emails into spam and non-spam folders.
C) To achieve perfect accuracy on a held-out test set.
D) To discover the inherent structure, patterns, or relationships within unlabeled data.
  • 7. In the analogy of a child grouping toys without instructions, the act of putting all the cars together is most similar to which unsupervised learning technique?
A) Classification.
B) Regression.
C) Reinforcement Learning.
D) Clustering.
  • 8. Grouping customers based solely on their purchasing behavior, without pre-defined categories, is an application of:
A) A support vector machine for classification.
B) Clustering, a type of unsupervised learning.
C) Logistic Regression, a type of supervised learning.
D) Linear Regression, a type of supervised learning.
  • 9. The main goal of dimensionality reduction techniques like PCA is to:
A) Assign categorical labels to each data point.
B) Increase the number of features to improve model accuracy.
C) Predict a continuous output variable.
D) Reduce the number of features while preserving the most important information in the data.
  • 10. Market basket analysis, which finds rules like "if chips then soda," is a classic example of:
A) Classification in supervised learning.
B) Deep learning with neural networks.
C) Regression in supervised learning.
D) Association rule learning in unsupervised learning.
  • 11. Semi-supervised learning is particularly useful in real-world scenarios because:
A) It requires no labeled data at all.
B) Labeling data is often expensive and time-consuming, so it leverages a small labeled set with a large unlabeled set.
C) It is simpler to implement than unsupervised learning.
D) It is always more accurate than fully supervised learning.
  • 12. The fundamental question that a regression model aims to answer is:
A) "What is the underlying group?"
B) "Is this pattern anomalous?"
C) "How much?" or "How many?"
D) "Which category?"
  • 13. The fundamental question that a classification model aims to answer is:
A) "Which category?" or "What class?"
B) "What is the correlation between these variables?"
C) "How much?" or "How many?"
D) "How can I reduce the number of features?"
  • 14. Which algorithm is most directly designed for predicting a continuous target variable?
A) Logistic Regression.
B) k-Nearest Neighbors for classification.
C) Decision Tree for classification.
D) Linear Regression.
  • 15. A model that uses patient data to assign a label of "High," "Medium," or "Low" risk for a disease is performing:
A) Clustering.
B) Multi-class classification.
C) Dimensionality reduction.
D) Regression.
  • 16. In a Decision Tree used for classification, what do the leaf nodes represent?
A) The average value of a continuous target.
B) The probability of moving to the next node.
C) The final class labels or decisions.
D) The input features for a new data point.
  • 17. In a Regression Tree, what is typically represented at the leaf nodes?
A) A random number.
B) A categorical class label.
C) A continuous value, often the mean of the target values of the training instances that reach the leaf.
D) The name of the feature used for splitting.
  • 18. A key strength of Decision Trees is their:
A) Immunity to overfitting on noisy datasets.
B) Superior performance on all types of data compared to other algorithms.
C) Interpretability; the model's decision-making process is easy to understand and visualize.
D) Guarantee to find the global optimum for any dataset.
  • 19. The "kernel trick" used in Support Vector Machines (SVMs) allows them to:
A) Grow a tree structure by making sequential decisions.
B) Perform linear regression more efficiently.
C) Initialize the weights of a neural network.
D) Find a linear separating hyperplane in a high-dimensional feature space, even when the data is not linearly separable in the original space.
  • 20. The "support vectors" in an SVM are:
A) The weights of a neural network layer.
B) The axes of the original feature space.
C) Data points that are closest to the decision boundary and most critical for defining the optimal hyperplane.
D) All data points in the training set.
  • 21. When comparing Decision Trees and SVMs, a primary advantage of SVMs is:
A) Their superior interpretability and simplicity.
B) Their lower computational cost for very large datasets.
C) Their inherent resistance to any form of overfitting.
D) Their effectiveness in high-dimensional spaces and their ability to model complex, non-linear decision boundaries.
  • 22. The process in supervised learning where a model's parameters are adjusted to minimize the difference between its predictions and the true labels is called:
A) Dimensionality reduction.
B) Clustering.
C) Training or model fitting.
D) Data preprocessing.
  • 23. A key challenge in unsupervised learning is evaluating model performance because:
A) There are no ground truth labels to compare the results against.
B) The algorithms are not well-defined.
C) The models are always less accurate than supervised models.
D) The data is always too small.
  • 24. The task of reducing a 50-dimensional dataset to a 2-dimensional plot for visualization is best accomplished by:
A) A Classification algorithm like Logistic Regression.
B) An Association rule learning algorithm.
C) A Regression algorithm like Linear Regression.
D) Dimensionality Reduction techniques like Principal Component Analysis (PCA).
  • 25. If an e-commerce company wants to automatically group its products into categories without any pre-existing labels, it should use:
A) Regression, a supervised learning method.
B) Clustering, an unsupervised learning method.
C) A neural network for image recognition.
D) Classification, a supervised learning method.
  • 26. The core building block of a neural network is a(n):
A) Principal component.
B) Support vector.
C) Decision node in a tree.
D) Artificial neuron or perceptron, which receives inputs, applies a transformation, and produces an output.
  • 27. In a neural network, the function inside a neuron that determines its output based on the weighted sum of its inputs is called the:
A) Activation function.
B) Loss function.
C) Optimization algorithm.
D) Kernel function.
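Illustration for questions 26 and 27: a minimal sketch of a single artificial neuron, assuming a sigmoid activation. The input, weight, and bias values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values: 0.5*1.0 + (-0.25)*2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5
output = neuron(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)
```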
  • 28. Which of the following is a non-linear activation function crucial for allowing neural networks to learn complex patterns?
A) A constant function.
B) Rectified Linear Unit (ReLU).
C) The mean squared error function.
D) The identity function (f(x) = x).
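Illustration for question 28: one way to see that ReLU is non-linear is to check additivity, a property every linear function has. This is a small sketch with arbitrarily chosen test values, not a proof.

```python
def relu(x):
    """Rectified Linear Unit: pass positive values through, clamp negatives to 0."""
    return max(0.0, x)

def identity(x):
    return x

a, b = 2.0, -3.0  # arbitrary illustrative values

# A linear function satisfies f(a + b) == f(a) + f(b).
identity_is_additive = identity(a + b) == identity(a) + identity(b)
relu_is_additive = relu(a + b) == relu(a) + relu(b)

print(identity_is_additive)  # True: -1.0 == 2.0 + (-3.0)
print(relu_is_additive)      # False: relu(-1.0) = 0.0, but relu(2.0) + relu(-3.0) = 2.0
```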
  • 29. The process of "training" a neural network involves:
A) Manually setting the weights based on expert knowledge.
B) Randomly assigning weights and never changing them.
C) Clustering the input data.
D) Iteratively adjusting the weights and biases to minimize a loss function.
  • 30. Backpropagation is the algorithm used in neural networks to:
A) Visualize the network's architecture.
B) Initialize the weights before training.
C) Efficiently calculate the gradient of the loss function with respect to all the weights in the network, enabling the use of gradient descent.
D) Perform clustering on the output layer.
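Illustration for question 30: backpropagation applies the chain rule layer by layer. The sketch below does this for a single sigmoid neuron with a squared-error loss (both are assumptions made for illustration) and compares the result with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, target):
    """Squared error of a one-input sigmoid neuron."""
    return (sigmoid(w * x + b) - target) ** 2

def gradients(w, b, x, target):
    """Chain rule: dL/dw = dL/dy * dy/dz * dz/dw, and likewise for b."""
    y = sigmoid(w * x + b)
    dL_dy = 2.0 * (y - target)
    dy_dz = y * (1.0 - y)          # derivative of the sigmoid
    return dL_dy * dy_dz * x, dL_dy * dy_dz

# Hypothetical parameter and data values.
w, b, x, target = 0.3, -0.1, 1.5, 1.0
grad_w, grad_b = gradients(w, b, x, target)

# Central finite difference as an independent check on grad_w.
eps = 1e-6
numeric_grad_w = (loss(w + eps, b, x, target) - loss(w - eps, b, x, target)) / (2 * eps)
print(grad_w, numeric_grad_w)
```

In a multi-layer network the same product of local derivatives is accumulated from the output layer back to the input layer, which is what makes the computation efficient.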
  • 31. Deep Learning is a subfield of machine learning that primarily uses:
A) Decision trees with a single split.
B) Neural networks with many layers (hence "deep").
C) Simple linear regression models.
D) K-means clustering exclusively.
  • 32. A key advantage of deep neural networks over shallower models is their ability to:
A) Always train faster and with less data.
B) Operate without any need for data preprocessing.
C) Automatically learn hierarchical feature representations from data.
D) Be perfectly interpretable, like a decision tree.
  • 33. Convolutional Neural Networks (CNNs) are particularly well-suited for tasks involving:
A) Tabular data with many categorical features.
B) Unsupervised clustering of audio signals.
C) Text data and natural language processing.
D) Image data, due to their architecture which exploits spatial locality.
  • 34. The "convolution" operation in a CNN is designed to:
A) Initialize the weights of the network.
B) Detect local features (like edges or textures) in the input by applying a set of learnable filters.
C) Flatten the input into a single vector.
D) Perform the final classification.
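Illustration for question 34: a one-dimensional sketch of the sliding-filter idea, assuming no padding and a stride of 1. Here the filter is fixed by hand to respond to a step edge; in a CNN the filter values are learned. Like most deep learning libraries, the sketch does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def convolve1d(signal, kernel):
    """Slide the kernel over the signal and take a dot product at each position."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

signal = [0, 0, 0, 1, 1, 1]   # a step from 0 to 1
edge_filter = [-1, 1]         # hand-picked filter that responds to rising edges
response = convolve1d(signal, edge_filter)
print(response)  # [0, 0, 1, 0, 0]: non-zero only where the step occurs
```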
  • 35. Recurrent Neural Networks (RNNs) are designed to handle:
A) Static, non-temporal data.
B) Only image data.
C) Sequential data, like time series or text, due to their internal "memory" of previous inputs.
D) Independent and identically distributed (IID) data points.
  • 36. The "vanishing gradient" problem in deep networks refers to:
A) The gradients becoming too large and causing numerical instability.
B) The loss function reaching a perfect value of zero.
C) The gradients becoming exceedingly small as they are backpropagated through many layers, which can halt learning in early layers.
D) The model overfitting to the training data.
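Illustration for question 36: the derivative of the sigmoid is at most 0.25, so a chain of sigmoid layers can multiply the backpropagated gradient by at most 0.25 per layer. The sketch below shows only this activation factor and deliberately ignores the weights, which also scale the gradient in a real network.

```python
def max_gradient_scale(num_layers, max_derivative=0.25):
    """Upper bound on the activation-derivative factor after num_layers sigmoid layers."""
    return max_derivative ** num_layers

scales = {n: max_gradient_scale(n) for n in (1, 5, 20)}
for n, scale in scales.items():
    print(f"{n:2d} layers: factor <= {scale:.3e}")
```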
  • 37. The "training set" is used to:
A) Fit the model's parameters (e.g., the weights in a neural network).
B) Deploy the model in a production environment.
C) Provide an unbiased evaluation of a final model's performance.
D) Tune the model's hyperparameters.
  • 38. The "validation set" is primarily used for:
A) The final, unbiased assessment of the model's generalization error.
B) Tuning hyperparameters and making decisions about the model architecture during development.
C) The initial training of the model's weights.
D) Data preprocessing and cleaning.
  • 39. The "test set" should be:
A) Ignored in the machine learning pipeline.
B) Used as part of the training data to improve accuracy.
C) Used repeatedly to tune the model's hyperparameters.
D) Used only once, for a final evaluation of the model's performance on unseen data after model development is complete.
  • 40. Overfitting occurs when a model:
A) Is too simple to capture the trends in the data.
B) Is evaluated using the training set instead of a test set.
C) Learns the training data too well, including its noise and outliers, and performs poorly on new, unseen data.
D) Fails to learn the underlying pattern in the training data.
  • 41. A common technique to reduce overfitting in neural networks is:
A) Using a smaller training dataset.
B) Increasing the model's capacity by adding more layers.
C) Training for more epochs without any checks.
D) Dropout, which randomly ignores a subset of neurons during training.
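Illustration for question 41: a minimal sketch of dropout on a list of activations. It assumes the common "inverted dropout" variant, which rescales the surviving activations during training so their expected value is unchanged. The seed and drop probability are arbitrary.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob; rescale the survivors."""
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10
dropped = dropout(activations, drop_prob=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept and rescaled)
```

At inference time dropout is switched off and the activations are used unchanged.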
  • 42. The "bias" of a model refers to:
A) The error from sensitivity to small fluctuations in the training set, leading to overfitting.
B) The weights connecting the input layer to the hidden layer.
C) The activation function used in the output layer.
D) The error from erroneous assumptions in the learning algorithm, leading to underfitting.
  • 43. The "variance" of a model refers to:
A) The error from sensitivity to small fluctuations in the training set, leading to overfitting.
B) The intercept term in a linear regression model.
C) The speed at which the model trains.
D) The error from erroneous assumptions in the learning algorithm, leading to underfitting.
  • 44. The "bias-variance tradeoff" implies that:
A) Only variance is important for model performance.
B) Bias and variance can be minimized to zero simultaneously.
C) Only bias is important for model performance.
D) Decreasing bias will typically increase variance, and vice versa. The goal is to find a balance.
  • 45. A learning curve that shows high training accuracy but low validation accuracy is a classic sign of:
A) A well-generalized model.
B) Overfitting.
C) Underfitting.
D) Perfect model performance.
  • 46. In a neural network, the "loss function" (or cost function) measures:
A) The speed of the backpropagation algorithm.
B) The error between the model's predictions and the true targets on the training data; it's the quantity we want to minimize during training.
C) The number of layers in the network.
D) The accuracy on the test set.
  • 47. Gradient Descent is an optimization algorithm that:
A) Is only used for unsupervised learning.
B) Iteratively adjusts parameters in the direction that reduces the loss function.
C) Randomly searches the parameter space for a good solution.
D) Guarantees finding the global minimum for any loss function.
  • 48. The "learning rate" in gradient descent controls:
A) The amount of training data used in each epoch.
B) The number of layers in a neural network.
C) The size of the step taken during each parameter update. A rate that is too high can cause divergence, while one that is too low can make training slow.
D) The activation function for the output layer.
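Illustration for questions 47 and 48: a sketch of gradient descent on the toy loss f(w) = w², whose gradient is 2w. The three learning rates are hypothetical values picked to show slow progress, reasonable progress, and divergence.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w**2 by repeatedly stepping against the gradient 2*w."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w
    return w

too_low = gradient_descent(0.001)    # barely moves away from the start value 1.0
reasonable = gradient_descent(0.1)   # approaches the minimum at w = 0
too_high = gradient_descent(1.5)     # each step overshoots; |w| doubles every update

print(too_low, reasonable, too_high)
```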
  • 49. "Epoch" in neural network training refers to:
A) A type of regularization technique.
B) One complete pass of the entire training dataset through the learning algorithm.
C) The processing of a single training example.
D) The final evaluation on the test set.
  • 50. "Batch Size" in neural network training refers to:
A) The number of layers in the network.
B) The total number of examples in the training set.
C) The number of validation examples.
D) The number of training examples used in one forward/backward pass before the model's parameters are updated.
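Illustration for questions 49 and 50: a sketch of how batch size determines the number of parameter updates in one epoch. The ten-item dataset is a stand-in for real training examples, and no actual training is performed.

```python
def minibatches(data, batch_size):
    """Yield consecutive slices of the data; the last batch may be smaller."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]

data = list(range(10))  # hypothetical training set of 10 examples

# One epoch = one full pass over the data; each batch triggers one update.
updates_per_epoch = {
    batch_size: sum(1 for _ in minibatches(data, batch_size))
    for batch_size in (1, 4, 10)
}
print(updates_per_epoch)  # {1: 10, 4: 3, 10: 1}
```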
  • 51. In its strict form, "Stochastic Gradient Descent" (SGD) uses a batch size of:
A) Exactly 50% of the training set.
B) A random number between 1 and 100.
C) The entire training set.
D) 1, meaning the parameters are updated after each individual training example.