COELE1-2
Contributed by: Billo
  • 1. "Mini-batch Gradient Descent" is often preferred because it:
A) Offers a balance between the efficiency of batch GD and the robustness of SGD.
B) Does not require a loss function.
C) Is guaranteed to converge faster than any other method.
D) Is only applicable to linear models.
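Illustration for question 1: a minimal sketch of mini-batch gradient descent on a one-feature linear model. The data, batch size, learning rate, epoch count, and seed are all assumptions chosen for the example, not values from the quiz.

```python
import random

def minibatch_gd(xs, ys, batch_size=2, lr=0.05, epochs=500, seed=0):
    """Fit y = w*x + b by mini-batch gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)  # new random batches every epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this batch only.
            grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noiseless data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = minibatch_gd(xs, ys)
print(w, b)
```

Setting batch_size to len(xs) would give batch gradient descent, and setting it to 1 would give stochastic gradient descent; the mini-batch setting sits between the two.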
  • 2. "Transfer Learning" in deep learning involves:
A) Using only unsupervised learning techniques.
B) Training a model from scratch on every new problem.
C) Taking a model pre-trained on a large dataset (e.g., ImageNet) and fine-tuning it for a new, specific task with a smaller dataset.
D) Forgetting everything a model has learned.
  • 3. An "Autoencoder" is a type of neural network primarily used for:
A) Unsupervised learning tasks like dimensionality reduction and data denoising.
B) Reinforcement learning.
C) Predicting continuous values in a regression task.
D) Supervised classification of images.
  • 4. The architecture of a typical autoencoder consists of:
A) An encoder that compresses the input and a decoder that reconstructs the input from the compression.
B) A convolutional layer followed by an RNN layer.
C) A single output neuron with a linear activation.
D) Only a single layer of perceptrons.
  • 5. In the context of model evaluation for classification, "Accuracy" is defined as:
A) The proportion of total predictions that were correct.
B) The proportion of actual positives that were identified correctly.
C) The proportion of positive identifications that were actually correct.
D) The harmonic mean of precision and recall.
  • 6. "Precision" is an important metric when:
A) The cost of false positives is high (e.g., in spam detection, where you don't want to flag legitimate emails as spam).
B) You need a single metric that combines precision and recall.
C) You are evaluating a regression model.
D) The cost of false negatives is high (e.g., in disease screening, where you don't want to miss a sick patient).
  • 7. "Recall" is an important metric when:
A) You need a single metric that combines precision and recall.
B) The cost of false positives is high (e.g., in spam detection).
C) You are evaluating a clustering model.
D) The cost of false negatives is high (e.g., in disease screening, where you don't want to miss a sick patient).
  • 8. The "F1 Score" is:
A) The arithmetic mean of precision and recall.
B) The harmonic mean of precision and recall, providing a single score that balances both concerns.
C) A metric used exclusively for regression.
D) The difference between precision and recall.
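Illustration for questions 5 to 8: a minimal sketch that computes accuracy, precision, recall, and F1 from confusion-matrix counts. The counts are hypothetical, and the function assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # correct predictions / all predictions
    precision = tp / (tp + fp)        # correct positives / predicted positives
    recall = tp / (tp + fn)           # correct positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Hypothetical counts, chosen only for illustration.
accuracy, precision, recall, f1 = classification_metrics(tp=40, fp=10, fn=20, tn=30)
print(accuracy, precision, recall, f1)
```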
  • 9. For a regression model, the "Mean Squared Error" (MSE) measures:
A) The accuracy of a classification model.
B) The total number of misclassified instances.
C) The average of the squares of the errors between predicted and actual values.
D) The variance of the input features.
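Illustration for question 9: a minimal sketch of the mean squared error, using made-up target and prediction values.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical values: the errors are 1, 0 and -2, so the squares are 1, 0 and 4.
mse = mean_squared_error([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(mse)
```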
  • 10. The "ROC Curve" is a tool used to evaluate:
A) The performance of a binary classification model at various classification thresholds.
B) The clustering quality of a K-means algorithm.
C) The loss of a regression model over time.
D) The architecture of a neural network.
  • 11. "Area Under the ROC Curve" (AUC) provides an aggregate measure of performance across all possible classification thresholds. A perfect model has an AUC of:
A) -1.0.
B) 0.5.
C) 0.0.
D) 1.0.
  • 12. "K-fold Cross-Validation" is a technique used to:
A) Obtain a more robust estimate of model performance by training and evaluating the model K times on different splits of the data.
B) Visualize high-dimensional data.
C) Replace the need for a separate test set.
D) Increase the size of the training dataset.
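Illustration for question 12: a minimal sketch of how K-fold cross-validation could split sample indices. The helper name k_fold_indices is hypothetical, and the sketch does not shuffle, which real use often would.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) once for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in the test split exactly once, so the K evaluation scores together cover the whole dataset.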
  • 13. In the K-Nearest Neighbors (K-NN) algorithm for classification, the class of a new data point is determined by:
A) A single, pre-defined rule.
B) A random selection from the training set.
C) The output of a linear function.
D) The majority vote among its K closest neighbors in the feature space.
  • 14. The parameter 'K' in the K-NN algorithm:
A) Is always set to 1 for the best performance.
B) Is the learning rate for the algorithm.
C) Controls the model's flexibility. A small K can lead to overfitting, while a large K can lead to underfitting.
D) Is the number of features in the dataset.
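Illustration for questions 13 and 14: a minimal sketch of K-NN classification with Euclidean distance. The points, labels, and the choice of K = 3 are assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Predict the label of `query` by majority vote among its k nearest points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-class toy data.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
pred_near_origin = knn_predict(points, labels, (1, 1), k=3)
pred_far = knn_predict(points, labels, (5, 4), k=3)
print(pred_near_origin, pred_far)
```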
  • 15. "Principal Component Analysis" (PCA) works by:
A) Finding new, uncorrelated dimensions (principal components) that capture the maximum variance in the data.
B) Predicting a target variable using linear combinations of features.
C) Classifying data using a decision boundary.
D) Clustering data into K groups.
  • 16. The first principal component in PCA is the direction in the feature space that:
A) Is randomly oriented.
B) Captures the greatest possible variance in the data.
C) Is perpendicular to all other components.
D) Captures the least possible variance in the data.
  • 17. "K-Means Clustering" aims to partition data into K clusters such that:
A) The data is perfectly classified into known labels.
B) The within-cluster variance is minimized.
C) The data is projected onto a single dimension.
D) The between-cluster variance is minimized.
  • 18. The "Elbow Method" is a heuristic used in K-Means to:
A) Help choose the optimal number of clusters K by looking for a "bend" in the plot of within-cluster variance.
B) Evaluate the accuracy of a classification model.
C) Initialize the cluster centroids.
D) Determine the learning rate for gradient descent.
  • 19. "Naive Bayes" classifiers are called "naive" because they:
A) Always have the lowest possible accuracy.
B) Make a strong (naive) assumption that all features are conditionally independent given the class label.
C) Do not use probability in their predictions.
D) Are very simple and cannot handle complex data.
  • 20. "Logistic Regression" is fundamentally a:
A) Regression algorithm for predicting continuous values.
B) Classification algorithm that models the probability of a binary outcome using a logistic function.
C) Clustering algorithm for grouping unlabeled data.
D) Dimensionality reduction technique.
  • 21. The output of a logistic regression model is a value between 0 and 1, which represents the:
A) Probability that the input belongs to a particular class.
B) Exact value of the target variable.
C) Distance to the decision boundary.
D) Number of features in the input.
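Illustration for questions 20 and 21: a minimal sketch of how a logistic regression model turns a linear score into a value between 0 and 1. The weights and bias are hypothetical, not fitted to any data.

```python
import math

def predict_proba(weights, bias, features):
    """Apply the logistic (sigmoid) function to a linear combination of features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters, chosen only for illustration.
p_neutral = predict_proba([0.0, 0.0], 0.0, [1.0, 2.0])    # z = 0
p_positive = predict_proba([1.5, -0.5], 0.2, [2.0, 1.0])  # z = 2.7
print(p_neutral, p_positive)
```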
  • 22. A "Random Forest" is an ensemble method that combines multiple:
A) Decision Trees to reduce overfitting and improve generalization.
B) K-NN models.
C) Support Vector Machines.
D) Linear Regression models.
  • 23. The "bagging" technique in a Random Forest helps to:
A) Reduce variance by training individual trees on random subsets of the data and averaging their results.
B) Reduce bias by making trees more complex.
C) Perform feature extraction like PCA.
D) Increase the speed of a single decision tree.
  • 24. "Gradient Boosting" machines (e.g., XGBoost) are ensemble methods that:
A) Are exclusively used for unsupervised learning.
B) Do not require any parameter tuning.
C) Build all models independently and average them.
D) Build models sequentially, where each new model corrects the errors of the previous ones.
  • 25. The term "feature engineering" refers to:
A) The process of using domain knowledge to create new input features that make machine learning algorithms work better.
B) The automatic learning of features by a deep neural network.
C) The evaluation of a model's final performance.
D) The process of deleting all features from a dataset.
  • 26. "One-hot encoding" is a preprocessing technique used to:
A) Normalize continuous numerical features.
B) Cluster similar data points together.
C) Convert categorical variables into a binary (0/1) format that can be provided to ML algorithms.
D) Reduce the dimensionality of image data.
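Illustration for question 26: a minimal sketch of one-hot encoding a categorical column. The helper name and the colour values are hypothetical; categories are ordered alphabetically here as an arbitrary convention.

```python
def one_hot_encode(values):
    """Return (categories, rows), with one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one position is set per row
        rows.append(row)
    return categories, rows

categories, rows = one_hot_encode(["red", "green", "blue", "green"])
print(categories)
print(rows)
```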
  • 27. "Feature scaling" (e.g., normalization or standardization) is often crucial for algorithms that:
A) Are used for clustering only.
B) Are based on distance calculations or gradient descent, such as SVM and Neural Networks.
C) Are used for association rule learning.
D) Are based on tree-based models like Decision Trees and Random Forests.
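Illustration for question 27: a minimal sketch of standardization (z-scoring) for a single feature, using the population standard deviation. The sample values are made up, and the sketch assumes the feature is not constant.

```python
import statistics

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Hypothetical feature with mean 5 and population standard deviation 2.
scaled = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(scaled)
```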
  • 28. The "curse of dimensionality" refers to the problem that:
A) Dimensionality reduction always improves model performance.
B) There are never enough features to train a good model.
C) All datasets should have as many features as possible.
D) As the number of features grows, the data becomes increasingly sparse, making it harder to find meaningful patterns.
  • 29. "Regularization" is a technique used to:
A) Speed up the training time of a model.
B) Increase the variance of a model.
C) Make models more complex to fit the training data better.
D) Prevent overfitting by adding a penalty term to the loss function that discourages complex models.
  • 30. L1 Regularization (Lasso) can often lead to:
A) Sparse models where the weights of less important features are driven to zero, effectively performing feature selection.
B) A decrease in model interpretability.
C) All features having non-zero weights.
D) Increased model complexity.
  • 31. "Hyperparameters" are:
A) The parameters that the model learns during training (e.g., weights in a neural network).
B) The output predictions of the model.
C) Configuration settings for the learning algorithm that are not learned from the data and must be set prior to training (e.g., learning rate, K in K-NN).
D) The input features of the model.
  • 32. The process of "Hyperparameter Tuning" involves:
A) Cleaning the raw data.
B) Deploying the final model.
C) Training the model's internal weights.
D) Searching for the best combination of hyperparameters that results in the best model performance.
  • 33. "Grid Search" is a common method for hyperparameter tuning that involves:
A) Exhaustively searching over a specified set of hyperparameter values.
B) Randomly sampling hyperparameter combinations from a distribution.
C) Using a separate neural network to predict the best hyperparameters.
D) Ignoring hyperparameters altogether.
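Illustration for questions 32 and 33: a minimal sketch of grid search over a small hyperparameter grid. The grid values and toy_score are hypothetical stand-ins; in practice the score would come from training and validating a model.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every combination in param_grid and keep the highest-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_score(params):
    # Hypothetical score that peaks at learning_rate = 0.1 and k = 5.
    return -(params["learning_rate"] - 0.1) ** 2 - (params["k"] - 5) ** 2

grid = {"learning_rate": [0.01, 0.1, 1.0], "k": [3, 5, 7]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

With three values per hyperparameter the search evaluates 3 x 3 = 9 combinations, which is why the cost of grid search grows quickly as more hyperparameters are added.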
  • 34. "Early Stopping" is a form of regularization that works by:
A) Halting the training process when performance on a validation set starts to degrade, indicating the onset of overfitting.
B) Using a very small learning rate.
C) Stopping the training after a fixed, very short number of epochs.
D) Starting the training process later than scheduled.
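Illustration for question 34: a minimal sketch of early stopping over a recorded list of validation losses. The "patience" rule (stop after a set number of epochs without improvement) is one common convention assumed here, and the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, best_loss), stopping after `patience` epochs without improvement."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch, best_loss

# Hypothetical validation losses: improvement stalls after epoch 2.
best_epoch, best_loss = early_stopping_epoch([0.9, 0.7, 0.6, 0.65, 0.7, 0.5])
print(best_epoch, best_loss)
```

Training halts at epoch 4, so the later loss of 0.5 is never seen; this is the trade-off a small patience value makes.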
  • 35. A "Vanilla" neural network, also known as a Multilayer Perceptron (MLP), is typically composed of:
A) Recurrent layers for processing sequences.
B) Fully connected layers, where each neuron in one layer is connected to every neuron in the next layer.
C) Convolutional layers for processing images.
D) A single layer of neurons.
  • 36. The "softmax" activation function is commonly used in the output layer of a neural network for:
A) Multi-class classification problems, as it outputs a probability distribution over the possible classes.
B) Regression problems.
C) Unsupervised learning problems.
D) Binary classification problems.
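Illustration for question 36: a minimal sketch of the softmax function, with the usual max-subtraction for numerical stability. The input scores are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
uniform = softmax([0.0, 0.0])
print(probs, uniform)
```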
  • 37. The "Adam" optimizer is an adaptive learning rate algorithm that is often preferred because it:
A) Does not require any hyperparameters.
B) Is only used for unsupervised learning.
C) Is guaranteed to find the global minimum for any function.
D) Combines the advantages of two other extensions of stochastic gradient descent, AdaGrad and RMSProp.
  • 38. "Batch Normalization" is a technique used to:
A) Replace the need for an activation function.
B) Increase the batch size during training.
C) Normalize the entire dataset before feeding it into the network.
D) Improve the stability and speed of neural network training by normalizing the inputs to each layer.
  • 39. The "confusion matrix" is a table that is used to describe the performance of a:
A) Dimensionality reduction technique's effectiveness.
B) Clustering algorithm's group assignments.
C) Regression model's accuracy.
D) Classification model on a set of test data for which the true values are known.
  • 40. In a confusion matrix, the "true positives" are the cases where:
A) The model correctly predicted the positive class.
B) The model incorrectly predicted the negative class.
C) The model correctly predicted the negative class.
D) The model incorrectly predicted the positive class.
  • 41. The problem of "imbalanced classes" occurs when:
A) The model is too complex for the data.
B) The features are not scaled properly.
C) One class in the training data has significantly more examples than another, which can bias the model.
D) The learning rate is set too high.
  • 42. A technique to address imbalanced classes is "SMOTE," which:
A) Deletes examples from the majority class at random.
B) Generates synthetic examples for the minority class to balance the dataset.
C) Ignores the minority class completely.
D) Combines all classes into one.
  • 43. "Reinforcement Learning" differs from supervised and unsupervised learning in that:
A) It learns by interacting with an environment and receiving rewards or penalties for actions, without a labeled dataset.
B) It is only used for clustering unlabeled data.
C) It is a simpler and less powerful approach.
D) It requires a fully labeled dataset for training.
  • 44. "Q-Learning" is a popular algorithm in reinforcement learning that learns:
A) A clustering of possible actions.
B) A policy that tells an agent what action to take under what circumstances by learning a value function.
C) The principal components of a state space.
D) A decision tree for classification.
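Illustration for question 44: a minimal sketch of a single tabular Q-learning update. The states, actions, reward, learning rate (alpha), and discount factor (gamma) are all hypothetical.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q-value."""
    best_next = max(q_table[next_state].values())  # value of the greedy next action
    td_target = reward + gamma * best_next
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

# Hypothetical two-state table.
q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1")
print(new_value)
```

The target is 1.0 + 0.9 * 2.0 = 2.8, and the old value of 0.0 moves halfway toward it because alpha is 0.5.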
  • 45. "Natural Language Processing" (NLP) often uses supervised learning for tasks like:
A) Sentiment analysis, where text is classified as positive, negative, or neutral.
B) Grouping similar news articles without labels.
C) Reducing the dimensionality of word vectors.
D) Generating new, original text without any input.
  • 46. "Word Embeddings" (like Word2Vec) are techniques that:
A) Are used only for image classification.
B) Represent words as dense vectors in a continuous space, capturing semantic meaning.
C) Represent words as simple one-hot encoded vectors.
D) Are a type of clustering algorithm.
  • 47. A "Generative Adversarial Network" (GAN) consists of two networks:
A) Two identical Convolutional Neural Networks.
B) A Generator and a Discriminator, which are trained in opposition to each other.
C) An Encoder and a Decoder for compression.
D) A single, large Regression network.
  • 48. The "Generator" in a GAN is responsible for:
A) Discriminating between real and fake data.
B) Classifying input images into categories.
C) Creating new, synthetic data that is indistinguishable from real data.
D) Reducing the dimensionality of the input.
  • 49. The "Discriminator" in a GAN is essentially a:
A) Dimensionality reduction technique.
B) Clustering algorithm grouping similar images.
C) Regression model predicting a continuous value.
D) Binary classifier that tries to correctly label data as real (from the dataset) or fake (from the generator).