A) The data is generated randomly by the algorithm. B) The data is unlabeled, and the model must find patterns on its own. C) The data is labeled, meaning each example is paired with a target output. D) The data is collected in real time from user interactions.
A) Reduce the dimensionality of the input data for visualization. B) Memorize the entire training dataset perfectly. C) Generalize from the training data to make accurate predictions on new, unseen data. D) Discover hidden patterns without any guidance.
A) The input features. B) The model's parameters. C) The label or target output. D) The loss function.
A) Forecasting the temperature for tomorrow. B) Diagnosing a tumor as malignant or benign based on medical images. C) Estimating the annual revenue of a company. D) Predicting the selling price of a house based on its features.
A) Regression problem. B) Dimensionality reduction problem. C) Classification problem. D) Clustering problem.
A) To discover the inherent structure, patterns, or relationships within unlabeled data. B) To achieve perfect accuracy on a held-out test set. C) To classify emails into spam and non-spam folders. D) To predict a target variable based on labeled examples.
A) Clustering. B) Classification. C) Reinforcement Learning. D) Regression.
A) A support vector machine for classification. B) Linear Regression, a type of supervised learning. C) Logistic Regression, a type of supervised learning. D) Clustering, a type of unsupervised learning.
A) Reduce the number of features while preserving the most important information in the data. B) Assign categorical labels to each data point. C) Predict a continuous output variable. D) Increase the number of features to improve model accuracy.
A) Classification in supervised learning. B) Deep learning with neural networks. C) Regression in supervised learning. D) Association rule learning in unsupervised learning.
A) It requires no labeled data at all. B) Labeling data is often expensive and time-consuming, so it leverages a small labeled set with a large unlabeled set. C) It is always more accurate than fully supervised learning. D) It is simpler to implement than unsupervised learning.
A) "Is this pattern anomalous?" B) "What is the underlying group?" C) "Which category?" D) "How much?" or "How many?"
A) "Which category?" or "What class?" B) "How can I reduce the number of features?" C) "How much?" or "How many?" D) "What is the correlation between these variables?"
A) Decision Tree for classification B) k-Nearest Neighbors for classification C) Linear Regression D) Logistic Regression
A) Multi-class classification B) Dimensionality reduction C) Regression D) Clustering
A) The probability of moving to the next node B) The final class labels or decisions C) The input features for a new data point D) The average value of a continuous target
A) A random number B) The name of the feature used for splitting C) A categorical class label D) A continuous value, often the mean of the target values of the training instances that reach the leaf
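The correct option above describes how a regression tree's leaf makes its prediction: it returns the mean of the training targets that reached that leaf. A minimal sketch of that rule (the function name is illustrative, not from any library):

```python
def leaf_prediction(targets_at_leaf):
    """A regression-tree leaf predicts the mean of the training
    targets that ended up in that leaf."""
    return sum(targets_at_leaf) / len(targets_at_leaf)

# If training instances with targets 2.0, 4.0, 6.0 reach a leaf,
# that leaf predicts their mean, 4.0.
```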
A) Immunity to overfitting on noisy datasets B) Superior performance on all types of data compared to other algorithms C) Interpretability; the model's decision-making process is easy to understand and visualize D) Guarantee to find the global optimum for any dataset
A) Grow a tree structure by making sequential decisions B) Initialize the weights of a neural network C) Find a linear separating hyperplane in a high-dimensional feature space, even when the data is not linearly separable in the original space D) Perform linear regression more efficiently
A) Data points that are closest to the decision boundary and most critical for defining the optimal hyperplane B) The weights of a neural network layer C) All data points in the training set D) The axes of the original feature space
A) Their superior interpretability and simplicity B) Their lower computational cost for very large datasets C) Their effectiveness in high-dimensional spaces and their ability to model complex, non-linear decision boundaries D) Their inherent resistance to any form of overfitting
A) Data preprocessing B) Dimensionality reduction C) Training or model fitting D) Clustering
A) The algorithms are not well-defined B) There are no ground truth labels to compare the results against C) The models are always less accurate than supervised models D) The data is always too small
A) A Regression algorithm like Linear Regression B) An Association rule learning algorithm C) A Classification algorithm like Logistic Regression D) Dimensionality Reduction techniques like Principal Component Analysis (PCA)
A) Regression, a supervised learning method B) A neural network for image recognition C) Clustering, an unsupervised learning method D) Classification, a supervised learning method
A) Principal component B) Artificial neuron or perceptron, which receives inputs, applies a transformation, and produces an output C) Support vector D) Decision node in a tree
A) Activation function B) Optimization algorithm C) Loss function D) Kernel function
A) The identity function (f(x) = x) B) A constant function C) Rectified Linear Unit (ReLU) D) The mean squared error function
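The ReLU named in option C is simple enough to state in one line; a sketch for a scalar input:

```python
def relu(x):
    """Rectified Linear Unit: f(x) = max(0, x).
    Passes positive values through unchanged, zeroes out negatives."""
    return max(0.0, x)
```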
A) Randomly assigning weights and never changing them B) Iteratively adjusting the weights and biases to minimize a loss function C) Manually setting the weights based on expert knowledge D) Clustering the input data
A) Visualize the network's architecture B) Initialize the weights before training C) Perform clustering on the output layer D) Efficiently calculate the gradient of the loss function with respect to all the weights in the network, enabling the use of gradient descent
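The gradient computation described in option D reduces, for a single weight, to the chain rule. A toy sketch on the one-parameter model pred = w·x with squared loss (names are illustrative):

```python
def loss_and_grad(w, x, y):
    """Forward and backward pass for f(w) = (w*x - y)^2.
    The backward step applies the chain rule, the core idea
    backpropagation scales up to whole networks."""
    pred = w * x            # forward pass
    err = pred - y
    loss = err ** 2
    grad = 2 * err * x      # backward pass: dL/dw = 2*(w*x - y)*x
    return loss, grad
```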
A) Simple linear regression models B) Neural networks with many layers (hence "deep") C) Decision trees with a single split D) K-means clustering exclusively
A) Automatically learn hierarchical feature representations from data B) Always train faster and with less data C) Operate without any need for data preprocessing D) Be perfectly interpretable, like a decision tree
A) Unsupervised clustering of audio signals B) Tabular data with many categorical features C) Image data, due to their architecture which exploits spatial locality D) Text data and natural language processing
A) Perform the final classification B) Detect local features (like edges or textures) in the input by applying a set of learnable filters C) Flatten the input into a single vector D) Initialize the weights of the network
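The filter-sliding operation in option B can be sketched in one dimension (CNN libraries actually compute cross-correlation, as here; the function name is illustrative):

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: slide a (learnable) filter over the
    input and take dot products, detecting local patterns such as edges."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]
```

With the difference filter [1, -1], the output peaks where adjacent input values change, which is exactly the "edge detection" intuition for early convolutional layers.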
A) Sequential data, like time series or text, due to their internal "memory" of previous inputs B) Static, non-temporal data C) Only image data D) Independent and identically distributed (IID) data points
A) The loss function reaching a perfect value of zero B) The gradients becoming exceedingly small as they are backpropagated through many layers, which can halt learning in early layers C) The gradients becoming too large and causing numerical instability D) The model overfitting to the training data
A) Fit the model's parameters (e.g., the weights in a neural network) B) Tune the model's hyperparameters C) Provide an unbiased evaluation of a final model's performance D) Deploy the model in a production environment
A) The final, unbiased assessment of the model's generalization error B) The initial training of the model's weights C) Tuning hyperparameters and making decisions about the model architecture during development D) Data preprocessing and cleaning
A) Ignored in the machine learning pipeline B) Used repeatedly to tune the model's hyperparameters C) Merged with the training set to fit the model's parameters D) Used only once, for a final evaluation of the model's performance on unseen data after model development is complete
A) Learns the training data too well, including its noise and outliers, and performs poorly on new, unseen data B) Is evaluated using the training set instead of a test set C) Fails to learn the underlying pattern in the training data D) Is too simple to capture the trends in the data
A) Dropout, which randomly ignores a subset of neurons during training B) Increasing the model's capacity by adding more layers C) Using a smaller training dataset D) Training for more epochs without any checks
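The dropout technique in option A is easy to sketch for a single layer's activations. This is the common "inverted" variant (scaling survivors at training time so inference needs no change); the function signature is illustrative:

```python
import random

def dropout(activations, p=0.5, training=True, seed=None):
    """Inverted dropout: during training, zero each activation with
    probability p and scale survivors by 1/(1-p); at inference time
    the layer is the identity."""
    if not training:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]
```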
A) The error from sensitivity to small fluctuations in the training set, leading to overfitting B) The weights connecting the input layer to the hidden layer C) The activation function used in the output layer D) The error from erroneous assumptions in the learning algorithm, leading to underfitting
A) The speed at which the model trains B) The error from erroneous assumptions in the learning algorithm, leading to underfitting C) The error from sensitivity to small fluctuations in the training set, leading to overfitting D) The intercept term in a linear regression model
A) Only bias is important for model performance B) Only variance is important for model performance C) Decreasing bias will typically increase variance, and vice versa. The goal is to find a balance D) Bias and variance can be minimized to zero simultaneously
A) A well-generalized model B) Underfitting C) Overfitting D) Perfect model performance
A) The number of layers in the network B) The speed of the backpropagation algorithm C) How well the model is performing on the training data; it's the quantity we want to minimize during training D) The accuracy on the test set
A) Guarantees finding the global minimum for any loss function B) Randomly searches the parameter space for a good solution C) Is only used for unsupervised learning D) Iteratively adjusts parameters in the direction that reduces the loss function
A) The number of layers in a neural network B) The size of the step taken during each parameter update. A rate that is too high can cause divergence, while one that is too low can make training slow C) The activation function for the output layer D) The amount of training data used in each epoch
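The learning-rate behavior described in option B can be demonstrated on the toy loss f(w) = (w - 3)², whose minimum is at w = 3 (a sketch, not a production optimizer):

```python
def gradient_descent(lr=0.1, steps=100):
    """Minimize f(w) = (w - 3)^2 by stepping against the gradient.
    The learning rate lr controls the step size each iteration."""
    w = 0.0
    for _ in range(steps):
        grad = 2 * (w - 3)   # df/dw
        w -= lr * grad
    return w
```

With lr = 0.1 the iterates converge to 3; with lr = 1.1 each step overshoots the minimum by more than it corrects, so the error grows and training diverges.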
A) One complete pass of the entire training dataset through the learning algorithm B) A type of regularization technique C) The final evaluation on the test set D) The processing of a single training example
A) The total number of examples in the training set B) The number of training examples used in one forward/backward pass before the model's parameters are updated C) The number of layers in the network D) The number of validation examples
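The batching described in option B amounts to slicing the training set into chunks, with one parameter update per chunk; a minimal sketch (helper name illustrative):

```python
def minibatches(data, batch_size):
    """Yield successive mini-batches of the training data.
    The model's parameters are updated once per batch; the final
    batch may be smaller if the data doesn't divide evenly."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]
```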