A) The data is always image-based. B) The data is unlabeled, and the model must find patterns on its own. C) The data is generated randomly by the algorithm. D) The data is labeled, meaning each example is paired with a target output.
A) Discover hidden patterns without any guidance. B) Generalize from the training data to make accurate predictions on new, unseen data. C) Memorize the entire training dataset perfectly. D) Reduce the dimensionality of the input data for visualization.
A) The loss function. B) The label or target output. C) The model's parameters. D) The input features.
A) Diagnosing a tumor as malignant or benign based on medical images. B) Forecasting the temperature for tomorrow. C) Predicting the selling price of a house based on its features. D) Estimating the annual revenue of a company.
A) Dimensionality reduction problem. B) Classification problem. C) Regression problem. D) Clustering problem.
A) To predict a target variable based on labeled examples. B) To classify emails into spam and non-spam folders. C) To discover the inherent structure, patterns, or relationships within unlabeled data. D) To achieve perfect accuracy on a held-out test set.
A) Regression. B) Reinforcement Learning. C) Clustering. D) Classification.
A) A support vector machine for classification. B) Logistic Regression, a type of supervised learning. C) Clustering, a type of unsupervised learning. D) Linear Regression, a type of supervised learning.
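To make option D concrete, here is a minimal sketch of supervised linear regression (assuming NumPy and scikit-learn are available; the dataset and coefficients below are synthetic, illustrative choices):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic labeled data: each input x is paired with a target y (supervised learning).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))              # one feature
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1, 100)    # noisy linear target

model = LinearRegression().fit(X, y)               # learn slope and intercept from labels
print(model.coef_, model.intercept_)               # should be close to 3.0 and 2.0
print(model.predict([[5.0]]))                      # predict a continuous value for a new input
```

Because every input is paired with a known target, the model can check its predictions against the labels while fitting.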
A) Increase the number of features to improve model accuracy. B) Reduce the number of features while preserving the most important information in the data. C) Predict a continuous output variable. D) Assign categorical labels to each data point.
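Option B is the core goal of dimensionality reduction. A minimal sketch with PCA (scikit-learn is an assumed tool choice; the 4-feature data is synthetic, with one nearly redundant feature):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                     # 200 points with 4 features
X[:, 3] = X[:, 0] + 0.1 * rng.normal(size=200)    # feature 4 is almost a copy of feature 1

pca = PCA(n_components=2)                         # keep only the 2 most informative directions
X_reduced = pca.fit_transform(X)                  # 200 x 2: fewer features, most variance kept
print(pca.explained_variance_ratio_)              # fraction of variance preserved per component
```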
A) Deep learning with neural networks. B) Regression in supervised learning. C) Classification in supervised learning. D) Association rule learning in unsupervised learning.
A) Labeling data is often expensive and time-consuming, so it leverages a small labeled set with a large unlabeled set. B) It requires no labeled data at all. C) It is always more accurate than fully supervised learning. D) It is simpler to implement than unsupervised learning.
A) "Which category?" B) "What is the underlying group?" C) "How much?" or "How many?" D) "Is this pattern anomalous?"
A) "How much?" or "How many?" B) "How can I reduce the number of features?" C) "What is the correlation between these variables?" D) "Which category?" or "What class?"
A) Logistic Regression. B) Decision Tree for classification. C) k-Nearest Neighbors for classification. D) Linear Regression.
A) Regression. B) Multi-class classification. C) Clustering. D) Dimensionality reduction.
A) The input features for a new data point. B) The final class labels or decisions. C) The probability of moving to the next node. D) The average value of a continuous target.
A) A random number. B) A continuous value, often the mean of the target values of the training instances that reach the leaf. C) The name of the feature used for splitting. D) A categorical class label.
A) Superior performance on all types of data compared to other algorithms. B) Guarantee to find the global optimum for any dataset. C) Interpretability; the model's decision-making process is easy to understand and visualize. D) Immunity to overfitting on noisy datasets.
A) Initialize the weights of a neural network. B) Find a linear separating hyperplane in a high-dimensional feature space, even when the data is not linearly separable in the original space. C) Grow a tree structure by making sequential decisions. D) Perform linear regression more efficiently.
A) Data points that are closest to the decision boundary and most critical for defining the optimal hyperplane. B) The weights of a neural network layer. C) The axes of the original feature space. D) All data points in the training set.
A) Their effectiveness in high-dimensional spaces and their ability to model complex, non-linear decision boundaries. B) Their lower computational cost for very large datasets. C) Their superior interpretability and simplicity. D) Their inherent resistance to any form of overfitting.
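The three option sets above cover the kernel trick, support vectors, and SVM strengths. A hedged sketch tying them together (assuming scikit-learn; the concentric-circles data is a standard synthetic example of a non-linearly-separable problem):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel="rbf").fit(X, y)   # the RBF kernel separates the classes implicitly
print(clf.support_vectors_.shape)   # the boundary is defined by these points alone
print(clf.score(X, y))              # training accuracy
```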
A) Training or model fitting. B) Dimensionality reduction. C) Clustering. D) Data preprocessing.
A) The data is always too small. B) The models are always less accurate than supervised models. C) The algorithms are not well-defined. D) There are no ground truth labels to compare the results against.
A) A Classification algorithm like Logistic Regression. B) An Association rule learning algorithm. C) A Regression algorithm like Linear Regression. D) Dimensionality Reduction techniques like Principal Component Analysis (PCA).
A) Clustering, an unsupervised learning method. B) Classification, a supervised learning method. C) A neural network for image recognition. D) Regression, a supervised learning method.
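Option A describes clustering. As a minimal sketch (assuming scikit-learn; the three synthetic blobs stand in for, say, customer segments), k-means discovers groups without being given any labels:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data drawn from three blobs; no ground-truth labels are given to the model.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.labels_[:10])      # cluster assignments discovered from structure alone
print(km.cluster_centers_)  # one centroid per discovered group
```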
A) Decision node in a tree. B) Support vector. C) Artificial neuron or perceptron, which receives inputs, applies a transformation, and produces an output. D) Principal component.
A) Kernel function. B) Loss function. C) Activation function. D) Optimization algorithm.
A) Rectified Linear Unit (ReLU). B) A constant function. C) The mean squared error function. D) The identity function (f(x) = x).
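Option A names the most widely used activation function in modern networks. A one-function NumPy sketch shows exactly what ReLU does:

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: passes positive values through, zeroes out negatives."""
    return np.maximum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
print(relu(x))  # [0. 0. 0. 1. 3.]
```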
A) Randomly assigning weights and never changing them. B) Manually setting the weights based on expert knowledge. C) Clustering the input data. D) Iteratively adjusting the weights and biases to minimize a loss function.
A) Initialize the weights before training. B) Visualize the network's architecture. C) Efficiently calculate the gradient of the loss function with respect to all the weights in the network, enabling the use of gradient descent. D) Perform clustering on the output layer.
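Option C is backpropagation's job. A hand-worked sketch on a single linear neuron (the numbers, the squared-error loss, and the learning rate are illustrative assumptions) shows the chain rule producing exactly the gradients that gradient descent then uses:

```python
# One linear neuron with squared-error loss: L = (w*x + b - y)^2.
x, y = 2.0, 7.0
w, b = 1.0, 0.0

pred = w * x + b             # forward pass: pred = 2.0
# Backward pass (chain rule): dL/dpred = 2*(pred - y), then route to each parameter.
dL_dpred = 2.0 * (pred - y)  # -10.0
dL_dw = dL_dpred * x         # gradient w.r.t. the weight: -20.0
dL_db = dL_dpred * 1.0       # gradient w.r.t. the bias:   -10.0

lr = 0.05
w -= lr * dL_dw              # gradient descent consumes these gradients
b -= lr * dL_db
print(w, b)                  # 2.0 0.5 -- the loss shrinks on the next forward pass
```

In a deep network, backpropagation repeats this routing layer by layer, reusing intermediate results so all gradients come out of one backward sweep.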
A) Simple linear regression models. B) Decision trees with a single split. C) K-means clustering exclusively. D) Neural networks with many layers (hence "deep").
A) Automatically learn hierarchical feature representations from data. B) Be perfectly interpretable, like a decision tree. C) Operate without any need for data preprocessing. D) Always train faster and with less data.
A) Text data and natural language processing. B) Image data, due to an architecture that exploits spatial locality. C) Unsupervised clustering of audio signals. D) Tabular data with many categorical features.
A) Flatten the input into a single vector. B) Detect local features (like edges or textures) in the input by applying a set of learnable filters. C) Perform the final classification. D) Initialize the weights of the network.
A) Only image data. B) Independent and identically distributed (IID) data points. C) Static, non-temporal data. D) Sequential data, like time series or text, due to their internal "memory" of previous inputs.
A) The gradients becoming too large and causing numerical instability. B) The model overfitting to the training data. C) The loss function reaching a perfect value of zero. D) The gradients becoming exceedingly small as they are backpropagated through many layers, which can halt learning in early layers.
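Option D is the vanishing gradient problem. A small numeric illustration (the depth, input, and weight value are illustrative assumptions) shows why stacking many sigmoid layers starves early layers of gradient:

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)  # at most 0.25, often far smaller

grad = 1.0
for layer in range(20):              # backpropagate through 20 sigmoid layers
    grad *= sigmoid_grad(2.0) * 0.8  # local derivative times a typical weight
print(grad)                          # ~3e-22: early layers receive almost no signal
```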
A) Provide an unbiased evaluation of a final model's performance. B) Deploy the model in a production environment. C) Fit the model's parameters (e.g., the weights in a neural network). D) Tune the model's hyperparameters.
A) The initial training of the model's weights. B) The final, unbiased assessment of the model's generalization error. C) Data preprocessing and cleaning. D) Tuning hyperparameters and making decisions about the model architecture during development.
A) Used repeatedly to tune the model's hyperparameters. B) Ignored in the machine learning pipeline. C) Used only once, for a final evaluation of the model's performance on unseen data after model development is complete. D) Used as part of the training data to improve accuracy.
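The last three option sets describe the distinct roles of the training, validation, and test sets. A hedged sketch of the usual two-stage split (using scikit-learn's train_test_split; the 60/20/20 proportions are an illustrative choice):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(50, 2), np.arange(50)

# First carve off the test set; it is touched only once, at the very end.
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Then split the remainder into training (fit parameters) and validation (tune hyperparameters).
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 30 10 10 -- a 60/20/20 split
```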
A) Learns the training data too well, including its noise and outliers, and performs poorly on new, unseen data. B) Is evaluated using the training set instead of a test set. C) Fails to learn the underlying pattern in the training data. D) Is too simple to capture the trends in the data.
A) Using a smaller training dataset. B) Training for more epochs without any checks. C) Increasing the model's capacity by adding more layers. D) Dropout, which randomly ignores a subset of neurons during training.
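Option D is dropout. A minimal NumPy sketch of "inverted" dropout applied to one layer's activations (the keep probability and batch shape are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 8))  # one mini-batch of hidden-layer outputs
p_keep = 0.8                           # keep 80% of neurons, drop 20% at random

# Training: randomly zero a subset of neurons and rescale the survivors,
# so the expected activation magnitude stays the same ("inverted" dropout).
mask = rng.random(activations.shape) < p_keep
dropped = activations * mask / p_keep
print(dropped)

# Inference: dropout is disabled and the full layer is used as-is.
```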
A) The weights connecting the input layer to the hidden layer. B) The error from sensitivity to small fluctuations in the training set, leading to overfitting. C) The error from erroneous assumptions in the learning algorithm, leading to underfitting. D) The activation function used in the output layer.
A) The speed at which the model trains. B) The error from sensitivity to small fluctuations in the training set, leading to overfitting. C) The error from erroneous assumptions in the learning algorithm, leading to underfitting. D) The intercept term in a linear regression model.
A) Bias and variance can be minimized to zero simultaneously. B) Decreasing bias will typically increase variance, and vice versa. The goal is to find a balance. C) Only variance is important for model performance. D) Only bias is important for model performance.
A) Overfitting. B) Perfect model performance. C) A well-generalized model. D) Underfitting.
A) The number of layers in the network. B) The accuracy on the test set. C) How well the model is performing on the training data; it's the quantity we want to minimize during training. D) The speed of the backpropagation algorithm.
A) Guarantees finding the global minimum for any loss function. B) Is only used for unsupervised learning. C) Randomly searches the parameter space for a good solution. D) Iteratively adjusts parameters in the direction that reduces the loss function.
A) The activation function for the output layer. B) The amount of training data used in each epoch. C) The size of the step taken during each parameter update. A rate that is too high can cause divergence, while one that is too low can make training slow. D) The number of layers in a neural network.
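The options across the last three sets connect the loss, gradient descent, and the learning rate. A minimal sketch minimizing a one-dimensional quadratic loss (the function, starting point, and step count are illustrative):

```python
# Minimize L(w) = (w - 3)^2 with gradient descent; dL/dw = 2*(w - 3).
w = 0.0
lr = 0.1                    # the learning rate scales each update step
for step in range(50):
    grad = 2.0 * (w - 3.0)  # direction of steepest increase of the loss
    w -= lr * grad          # step the opposite way to reduce the loss
print(w)                    # ~3.0; too large an lr diverges, too small crawls
```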
A) A type of regularization technique. B) The final evaluation on the test set. C) One complete pass of the entire training dataset through the learning algorithm. D) The processing of a single training example.
A) The total number of examples in the training set. B) The number of layers in the network. C) The number of validation examples. D) The number of training examples used in one forward/backward pass before the model's parameters are updated.
A) 1, meaning the parameters are updated after each individual training example. B) The entire training set. C) Exactly 50% of the training set. D) A random number between 1 and 100.
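The final three option sets distinguish epochs, batch size, and batch-size-1 stochastic gradient descent. A skeleton training loop (the tiny dataset, model, and learning rate are illustrative assumptions) makes the bookkeeping explicit:

```python
import numpy as np

X = np.arange(12, dtype=float).reshape(12, 1)  # 12 training examples
y = 2.0 * X[:, 0]
batch_size = 4                                 # parameters update every 4 examples
n_epochs = 3                                   # each epoch is one full pass over X

w = 0.0
for epoch in range(n_epochs):
    order = np.random.permutation(len(X))       # shuffle once per epoch
    for start in range(0, len(X), batch_size):  # 12/4 = 3 updates per epoch
        batch = order[start:start + batch_size]
        grad = np.mean(2.0 * (w * X[batch, 0] - y[batch]) * X[batch, 0])
        w -= 0.01 * grad                        # one parameter update per mini-batch
print(w)  # approaches 2.0; batch_size=1 would be classic stochastic gradient descent
```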