A) Offers a balance between the efficiency of batch GD and the robustness of SGD. B) Does not require a loss function. C) Is guaranteed to converge faster than any other method. D) Is only applicable to linear models.
A) Training a model from scratch on every new problem. B) Taking a model pre-trained on a large dataset (e.g., ImageNet) and fine-tuning it for a new, specific task with a smaller dataset. C) Using only unsupervised learning techniques. D) Forgetting everything a model has learned.
A) Supervised classification of images. B) Unsupervised learning tasks like dimensionality reduction and data denoising. C) Predicting continuous values in a regression task. D) Reinforcement learning.
A) A convolutional layer followed by an RNN layer. B) An encoder that compresses the input and a decoder that reconstructs the input from the compression. C) A single output neuron with a linear activation. D) Only a single layer of perceptrons.
A) The proportion of actual positives that were identified correctly. B) The proportion of total predictions that were correct. C) The proportion of positive identifications that were actually correct. D) The harmonic mean of precision and recall.
A) You are evaluating a regression model. B) The cost of false positives is high (e.g., in spam detection, where you don't want to flag legitimate emails as spam). C) The cost of false negatives is high (e.g., in disease screening, where you don't want to miss a sick patient). D) You need a single metric that combines precision and recall.
A) You need a single metric that combines precision and recall. B) The cost of false negatives is high (e.g., in disease screening, where you don't want to miss a sick patient). C) The cost of false positives is high (e.g., in spam detection). D) You are evaluating a clustering model.
A) A metric used exclusively for regression. B) The difference between precision and recall. C) The arithmetic mean of precision and recall. D) The harmonic mean of precision and recall, providing a single score that balances both concerns.
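The precision, recall, and F1 definitions that appear in the options above can be checked with a small sketch. The function name `precision_recall_f1` and the example counts are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator would be zero.
    """
    # Precision: share of positive predictions that were actually correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were identified.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1)
```

With these counts precision is 0.8 and recall is 0.5. The harmonic mean lands below the arithmetic mean of 0.65, which is why F1 penalizes a large gap between the two.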
A) The average of the squares of the errors between predicted and actual values. B) The accuracy of a classification model. C) The total number of misclassified instances. D) The variance of the input features.
A) The architecture of a neural network. B) The loss of a regression model over time. C) The clustering quality of a K-means algorithm. D) The performance of a binary classification model at various classification thresholds.
A) 0.0. B) 0.5. C) 1.0. D) -1.0.
A) Replace the need for a separate test set. B) Visualize high-dimensional data. C) Increase the size of the training dataset. D) Obtain a more robust estimate of model performance by training and evaluating the model K times on different splits of the data.
A) The output of a linear function. B) The majority vote among its K closest neighbors in the feature space. C) A random selection from the training set. D) A single, pre-defined rule.
A) Is the number of features in the dataset. B) Is always set to 1 for the best performance. C) Controls the model's flexibility. A small K can lead to overfitting, while a large K can lead to underfitting. D) Is the learning rate for the algorithm.
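Assuming the two option sets above concern K-nearest neighbors, the majority-vote rule and the effect of K can be sketched as follows. The helper `knn_predict` and the toy points are hypothetical, not taken from any particular library.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (point, label) pairs, where point is a tuple of floats.
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    # Count labels among the neighbors and return the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: one "b" point sits inside the region dominated by "a".
train = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((1.0, 0.0), "b"),
    ((5.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
]
query = (0.9, 0.0)
print(knn_predict(train, query, k=1))
print(knn_predict(train, query, k=3))
```

With K = 1 the prediction follows the single nearest point, so one unusual example decides the outcome. With K = 3 the two nearby "a" points outvote it, which illustrates how a small K makes the model more sensitive to individual training points.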
A) Predicting a target variable using linear combinations of features. B) Classifying data using a decision boundary. C) Clustering data into K groups. D) Finding new, uncorrelated dimensions (principal components) that capture the maximum variance in the data.
A) Captures the least possible variance in the data. B) Is randomly oriented. C) Captures the greatest possible variance in the data. D) Is perpendicular to all other components.
A) The data is perfectly classified into known labels. B) The data is projected onto a single dimension. C) The within-cluster variance is minimized. D) The between-cluster variance is minimized.
A) Evaluate the accuracy of a classification model. B) Determine the learning rate for gradient descent. C) Initialize the cluster centroids. D) Help choose the optimal number of clusters K by looking for a "bend" in the plot of within-cluster variance.
A) Always have the lowest possible accuracy. B) Are very simple and cannot handle complex data. C) Make a strong (naive) assumption that all features are conditionally independent given the class label. D) Do not use probability in their predictions.
A) Classification algorithm that models the probability of a binary outcome using a logistic function. B) Regression algorithm for predicting continuous values. C) Dimensionality reduction technique. D) Clustering algorithm for grouping unlabeled data.
A) Probability that the input belongs to a particular class. B) Exact value of the target variable. C) Number of features in the input. D) Distance to the decision boundary.
A) Support Vector Machines. B) Decision Trees to reduce overfitting and improve generalization. C) Linear Regression models. D) K-NN models.
A) Reduce variance by training individual trees on random subsets of the data and averaging their results. B) Reduce bias by making trees more complex. C) Increase the speed of a single decision tree. D) Perform feature extraction like PCA.
A) Build models sequentially, where each new model corrects the errors of the previous ones. B) Build all models independently and average them. C) Are exclusively used for unsupervised learning. D) Do not require any parameter tuning.
A) The process of deleting all features from a dataset. B) The automatic learning of features by a deep neural network. C) The evaluation of a model's final performance. D) The process of using domain knowledge to create new input features that make machine learning algorithms work better.
A) Reduce the dimensionality of image data. B) Normalize continuous numerical features. C) Cluster similar data points together. D) Convert categorical variables into a binary (0/1) format that can be provided to ML algorithms.
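Assuming option D above describes one-hot encoding, the conversion to a binary (0/1) format might look like this minimal sketch. The function `one_hot` and the color values are hypothetical examples.

```python
def one_hot(values):
    """Convert a list of categorical values into 0/1 indicator rows.

    Returns (categories, rows), where categories is the sorted list of
    distinct values and each row has a single 1 in the matching column.
    """
    categories = sorted(set(values))
    # Map each category to its column position.
    column = {category: i for i, category in enumerate(categories)}
    rows = [
        [1 if column[value] == j else 0 for j in range(len(categories))]
        for value in values
    ]
    return categories, rows


categories, rows = one_hot(["red", "green", "red"])
print(categories)
print(rows)
```

Each input value becomes a row with exactly one 1, so no artificial ordering between categories is implied.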
A) Are based on tree-based models like Decision Trees and Random Forests. B) Are based on distance calculations or gradient descent, such as SVM and Neural Networks. C) Are used for clustering only. D) Are used for association rule learning.
A) Dimensionality reduction always improves model performance. B) All datasets should have as many features as possible. C) There are never enough features to train a good model. D) As the number of features grows, the data becomes increasingly sparse, making it harder to find meaningful patterns.
A) Increase the variance of a model. B) Speed up the training time of a model. C) Make models more complex to fit the training data better. D) Prevent overfitting by adding a penalty term to the loss function that discourages complex models.
A) All features having non-zero weights. B) Sparse models where the weights of less important features are driven to zero, effectively performing feature selection. C) A decrease in model interpretability. D) Increased model complexity.
A) Configuration settings for the learning algorithm that are not learned from the data and must be set prior to training (e.g., learning rate, K in K-NN). B) The parameters that the model learns during training (e.g., weights in a neural network). C) The input features of the model. D) The output predictions of the model.
A) Cleaning the raw data. B) Deploying the final model. C) Searching for the best combination of hyperparameters that results in the best model performance. D) Training the model's internal weights.
A) Exhaustively searching over a specified set of hyperparameter values. B) Ignoring hyperparameters altogether. C) Randomly sampling hyperparameter combinations from a distribution. D) Using a separate neural network to predict the best hyperparameters.
A) Halting the training process when performance on a validation set starts to degrade, indicating the onset of overfitting. B) Stopping the training after a fixed, very short number of epochs. C) Using a very small learning rate. D) Starting the training process later than scheduled.
A) A single layer of neurons. B) Recurrent layers for processing sequences. C) Convolutional layers for processing images. D) Fully connected layers, where each neuron in one layer is connected to every neuron in the next layer.
A) Binary classification problems. B) Multi-class classification problems, as it outputs a probability distribution over the possible classes. C) Unsupervised learning problems. D) Regression problems.
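Assuming the options above refer to the softmax activation (the question stem is not shown here), the following sketch shows how raw scores become a probability distribution over classes. The function name and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Subtract the maximum for numerical stability; the result is unchanged.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
print(probs)
print(sum(probs))
```

The outputs are all positive, sum to 1, and preserve the ordering of the input scores, which is what makes them usable as class probabilities in a multi-class setting.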
A) Is guaranteed to find the global minimum for any function. B) Is only used for unsupervised learning. C) Combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp. D) Does not require any hyperparameters.
A) Improve the stability and speed of neural network training by normalizing the inputs to each layer. B) Replace the need for an activation function. C) Normalize the entire dataset before feeding it into the network. D) Increase the batch size during training.
A) Regression model's accuracy. B) Classification model on a set of test data for which the true values are known. C) Clustering algorithm's group assignments. D) Dimensionality reduction technique's effectiveness.
A) The model incorrectly predicted the negative class. B) The model incorrectly predicted the positive class. C) The model correctly predicted the negative class. D) The model correctly predicted the positive class.
A) The learning rate is set too high. B) The features are not scaled properly. C) One class in the training data has significantly more examples than another, which can bias the model. D) The model is too complex for the data.
A) Generates synthetic examples for the minority class to balance the dataset. B) Combines all classes into one. C) Ignores the minority class completely. D) Deletes examples from the majority class at random.
A) It is only used for clustering unlabeled data. B) It requires a fully labeled dataset for training. C) It learns by interacting with an environment and receiving rewards or penalties for actions, without a labeled dataset. D) It is a simpler and less powerful approach.
A) A policy that tells an agent what action to take under what circumstances by learning a value function. B) A clustering of possible actions. C) A decision tree for classification. D) The principal components of a state space.
A) Generating new, original text without any input. B) Grouping similar news articles without labels. C) Sentiment analysis, where text is classified as positive, negative, or neutral. D) Reducing the dimensionality of word vectors.
A) Are a type of clustering algorithm. B) Are used only for image classification. C) Represent words as dense vectors in a continuous space, capturing semantic meaning. D) Represent words as simple one-hot encoded vectors.
A) A Generator and a Discriminator, which are trained in opposition to each other. B) Two identical Convolutional Neural Networks. C) A single, large Regression network. D) An Encoder and a Decoder for compression.
A) Discriminating between real and fake data. B) Creating new, synthetic data that is indistinguishable from real data. C) Reducing the dimensionality of the input. D) Classifying input images into categories.
A) Binary classifier that tries to correctly label data as real (from the dataset) or fake (from the generator). B) Clustering algorithm grouping similar images. C) Dimensionality reduction technique. D) Regression model predicting a continuous value.