Complete Subject Question Bank
Generative AI primarily aims to:
Generative AI creates new data samples (text, images, audio, etc.) by learning the underlying statistical distribution of training data, rather than just classifying or predicting existing patterns.
Which of these is NOT typically produced by generative models?
The correct answer is: Labels for classification tasks.
Learning a data distribution p(x) allows a model to:
The correct answer is: Sample new plausible data points.
Which statement best contrasts discriminative and generative models?
Discriminative models learn p(y|x) — the conditional probability of a label given input. Generative models learn p(x) or the joint p(x,y), enabling them to sample new data.
Which is a common application of generative AI?
The correct answer is: Data augmentation.
Generative AI that helps artists by suggesting concepts is an example of:
The correct answer is: Creative tool / collaborator.
A model that learns to produce plausible human faces has learned approximations of:
The correct answer is: p(x).
Which capability is NOT typical of generative models?
The correct answer is: Guaranteed unbiased outputs.
Which of the following is a risk specifically mentioned for generative AI?
The correct answer is: Deepfakes and misinformation.
Text generation, image generation and music generation are examples of:
The correct answer is: Generative tasks.
Why is learning a distribution more powerful than memorizing examples?
Learning a data distribution allows a model to generate novel but plausible samples, unlike mere memorization which can only reproduce training examples.
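As a rough illustration of the difference, the sketch below fits a one-dimensional Gaussian to a handful of made-up measurements and then draws new values from it. The data, the function names, and the choice of a Gaussian are all assumptions made for this example; real generative models learn far richer distributions.

```python
import random
import statistics

def fit_gaussian(samples):
    """Estimate the mean and standard deviation of a 1-D dataset."""
    return statistics.mean(samples), statistics.stdev(samples)

def sample_new(mean, std, n, rng=random):
    """Draw n new values from the fitted Gaussian (not copies of the data)."""
    return [rng.gauss(mean, std) for _ in range(n)]

# Hypothetical training data, invented for this example.
training = [4.8, 5.1, 5.0, 4.9, 5.2]
mean, std = fit_gaussian(training)
new_points = sample_new(mean, std, 3, rng=random.Random(0))
print(mean, std, new_points)
```

A memorizing model could only return one of the five training values; the fitted distribution can produce values that were never observed but still look plausible.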
Which of these is a direct benefit of synthetic data?
Synthetic data generated by generative models supplements real datasets, especially where real data is scarce, sensitive, or expensive to collect.
A generative model that outputs new molecules would be used in:
The correct answer is: Drug discovery.
Which term best describes creating content that resembles training data but is not identical?
The correct answer is: Generalization / generation.
Generative AI differs from classification because it focuses on:
The correct answer is: Creating samples.
Gaussian Mixture Models (GMMs) are examples of:
The correct answer is: Classical probabilistic models.
Hidden Markov Models (HMMs) are especially useful for:
The correct answer is: Sequence modeling like speech.
Which breakthrough enabled deep generative models to scale in the 2010s?
The correct answer is: Larger datasets and GPUs.
The VAE paper was published by:
The correct answer is: Kingma & Welling.
GANs introduced the idea of:
GANs (Generative Adversarial Networks), introduced by Goodfellow et al. (2014), set up an adversarial game between a generator (creates fake samples) and a discriminator (distinguishes real from fake).
“Attention Is All You Need” introduced:
The correct answer is: The Transformer architecture.
Which year is commonly associated with the original GAN paper?
The correct answer is: 2014.
Transformers replaced recurrence with:
The correct answer is: Self-attention.
Which early model is probabilistic and explicitly models density?
The correct answer is: HMM.
VAEs are celebrated for:
VAEs (Variational Autoencoders) are valued for stable training and smooth, continuous latent spaces that support interpolation and controlled generation.
Which model family is known as “implicit density”?
The correct answer is: GANs.
The rise of LLMs was enabled by:
Large Language Models (LLMs) are massive transformer-based models trained on vast text corpora. Scaling both model size and data enabled breakthroughs in natural language generation.
Which contribution is attributed to Goodfellow et al.?
The correct answer is: GAN.
CycleGAN is notable because it can:
CycleGAN performs unpaired image-to-image translation using cycle-consistency loss, allowing domain transfer (e.g., photos to paintings) without paired training examples.
Which development made sampling from complex distributions more practical?
The correct answer is: Normalizing Flows and invertible transforms.
Machine Learning systems typically start with:
The correct answer is: Data collection.
A perceptron computes:
A perceptron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function to produce an output.
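A minimal sketch of that computation, assuming a step activation and hand-picked (hypothetical) weights that make the unit behave like a logical AND:

```python
def perceptron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Step activation: output 1 when the weighted sum is non-negative.
    return 1 if weighted_sum >= 0 else 0

# Hypothetical weights and bias chosen by hand for a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], and_weights, and_bias)
           for a in (0, 1) for b in (0, 1)]
print(outputs)  # inputs (0,0), (0,1), (1,0), (1,1)
```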
Which activation is most used to mitigate vanishing gradients?
The correct answer is: ReLU.
Backpropagation uses which calculus tool to compute gradients?
The correct answer is: Chain rule.
Gradient descent updates weights to:
Gradient descent iteratively updates model weights by moving in the direction that reduces the loss, using the gradient (partial derivatives) of the loss with respect to each weight.
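The update rule can be sketched on a toy one-parameter problem. The loss (w - 3)^2, the learning rate, and the step count below are assumptions chosen only to make the behaviour easy to follow:

```python
def gradient_descent(grad_fn, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the loss."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w

# Toy loss(w) = (w - 3)^2, whose gradient is 2 * (w - 3); minimum at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_final)
```

Each step shrinks the distance to the minimum by a constant factor here, so the weight settles very close to 3.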
Deep networks learn hierarchical features—early layers learn:
The correct answer is: Low-level features like edges.
Overfitting happens when the model:
Overfitting occurs when a model memorizes training data rather than learning generalizable patterns, resulting in poor performance on unseen data.
Which is NOT an optimizer for neural networks?
The correct answer is: KNN.
Dropout is used to:
Dropout is a regularization technique that randomly deactivates neurons during training, preventing co-adaptation and reducing overfitting.
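One common way to implement this is "inverted" dropout, which rescales the surviving activations during training so that nothing needs to change at inference time. The sketch below assumes that variant; the function name and signature are made up for this example.

```python
import random

def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout: zero each activation with probability drop_prob
    during training and rescale the survivors by 1 / keep_prob."""
    if not training or drop_prob == 0.0:
        return list(activations)  # inference: pass values through unchanged
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

print(dropout([1.0, 1.0, 1.0, 1.0], drop_prob=0.5, rng=random.Random(0)))
print(dropout([1.0, 1.0, 1.0, 1.0], drop_prob=0.5, training=False))
```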
Cross-entropy loss is most often used for:
The correct answer is: Classification tasks.
A bias term in a neuron is analogous to:
The correct answer is: Intercept in linear models.
Batch normalization primarily helps by:
Batch normalization normalizes layer inputs during training, reducing internal covariate shift, which stabilizes and accelerates training.
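A simplified sketch of the training-time computation for a single feature across one batch. It omits the running statistics used at inference; gamma and beta stand in for the learned scale and shift, and eps is a small assumed constant for numerical stability.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then apply scale and shift."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased batch variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print(normalized)
```

After normalization the batch has mean close to 0 and variance close to 1, regardless of the scale of the incoming values.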
Which layer type is most common in image models?
The correct answer is: Convolutional.
Transfer learning helps when:
Transfer learning leverages pretrained model features from large datasets, enabling high performance even when task-specific labeled data is limited.
An epoch means:
An epoch is one complete pass through the entire training dataset. Multiple epochs allow the model to iteratively refine its weights.
Explicit density models provide:
The correct answer is: A formula for p(x).
Normalizing Flows are an example of:
Normalizing Flows are tractable explicit generative models that use a series of invertible transformations to map a simple distribution (e.g., Gaussian) to a complex data distribution.
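The idea can be sketched with the simplest possible flow: a single affine transform x = scale * z + shift applied to a standard normal z. This one-layer, one-dimensional setup and the function names are assumptions for illustration; practical flows stack many learned invertible layers.

```python
import math
import random

def standard_normal_log_pdf(z):
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

def affine_flow_sample(scale, shift, rng=random):
    """Sample: draw z from the base distribution, then transform it."""
    return scale * rng.gauss(0.0, 1.0) + shift

def affine_flow_log_prob(x, scale, shift):
    """Exact log p(x) via the change-of-variables formula."""
    z = (x - shift) / scale                  # invert the transform
    log_det_inverse = -math.log(abs(scale))  # log |dz/dx|
    return standard_normal_log_pdf(z) + log_det_inverse

print(affine_flow_log_prob(1.0, scale=2.0, shift=1.0))
```

Because the transform is invertible and its Jacobian is known, the same model supports both sampling and exact likelihood evaluation.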
Which model family does a VAE belong to?
The correct answer is: Approximate explicit models.
Implicit models are characterized by:
Implicit generative models (like GANs) generate samples directly without defining an explicit probability density function, making likelihood evaluation difficult.
Which is a tractable explicit model?
The correct answer is: Basic Gaussian, some Normalizing Flows.
Which approach approximates likelihoods using ELBO?
The correct answer is: VAEs. ELBO (Evidence Lower Bound) is the VAE training objective; maximizing it maximizes a lower bound on the data log-likelihood while regularizing the latent space.
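For reference, the bound is usually written as follows. The subscripts (θ for decoder parameters, φ for encoder parameters) are notation assumed here, not taken from the question bank.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction; the second penalizes encoder distributions that drift away from the prior p(z).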
Sampling from an implicit model requires:
Sampling from an implicit model (like a GAN) requires only a forward pass: draw a noise vector from a simple prior and map it through the generator. No explicit probability density function is defined, which is why likelihood evaluation is difficult.
Which model gives exact likelihoods (when tractable)?
The correct answer is: Normalizing Flows.
Which is an advantage of explicit density models?
The correct answer is: They can evaluate likelihoods for samples.
An example of implicit modeling is:
GANs are the standard example of implicit modeling: they generate samples directly without defining an explicit probability density function, making likelihood evaluation difficult.
Which family is well-suited to likelihood-based anomaly detection?
The correct answer is: Explicit density models.
ELBO stands for:
ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.
Which is a limitation of implicit models?
The main limitation is likelihood evaluation: implicit generative models (like GANs) generate samples directly without defining an explicit probability density function, so the likelihood of a given sample cannot be computed directly.
Tractable models are useful because they allow:
The correct answer is: Direct density evaluation and likelihood comparisons.
VAEs, GANs and Flows are examples of:
The correct answer is: Different approaches to generative modeling.
A standard autoencoder differs from a VAE because a VAE:
The correct answer is: Encodes inputs as distributions (μ,σ).
The reparameterization trick allows:
The reparameterization trick moves randomness (ε ~ N(0,I)) outside the network path (z = μ + σ⊙ε), enabling gradients to flow through the sampling operation during backpropagation.
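A minimal sketch of the sampling step, assuming the encoder outputs a mean and a log-variance per latent dimension. Plain Python lists stand in for tensors here, so no gradients are actually computed.

```python
import math
import random

def reparameterize(mu, log_var, rng=random):
    """Compute z = mu + sigma * eps with eps ~ N(0, 1) per dimension.
    The randomness lives entirely in eps, so z is a deterministic,
    differentiable function of mu and log_var."""
    z = []
    for m, lv in zip(mu, log_var):
        sigma = math.exp(0.5 * lv)  # log-variance -> standard deviation
        eps = rng.gauss(0.0, 1.0)   # noise drawn outside the network path
        z.append(m + sigma * eps)
    return z

print(reparameterize([0.0, 2.0], [0.0, -1.0], rng=random.Random(0)))
```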
VAE loss includes reconstruction loss plus:
The correct answer is: KL divergence between encoder distribution and prior.
Sampling z = μ + σ ⊙ ε moves randomness to:
The correct answer is: Outside network path to allow backprop.
A common prior used in VAEs is:
The correct answer is: Standard normal N(0, I).
VAEs typically produce images that are:
The correct answer is: More blurry than GANs.
KL term in VAE encourages:
The KL divergence term in VAE loss regularizes the encoder by encouraging the latent code distribution to stay close to the prior (typically a standard Gaussian).
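When the encoder outputs a diagonal Gaussian and the prior is a standard normal, this term has a well-known closed form. The sketch below assumes that setup, with the encoder output given as means and log-variances.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))

print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # encoder matches prior
print(kl_to_standard_normal([1.0], [0.0]))            # mean shifted by 1
```

The penalty is zero only when the encoder distribution equals the prior and grows as the mean or variance moves away from it.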
Advantages of VAEs include:
VAEs (Variational Autoencoders) are valued for stable training and smooth, continuous latent spaces that support interpolation and controlled generation.
Which is a limitation of VAEs?
The correct answer is: Assume a simple latent distribution like Gaussian.
In VAEs, the decoder maps from:
The correct answer is: Latent z to data x̂.
ELBO maximization is equivalent to:
ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.
Choosing a too-large KL weight will typically:
The correct answer is: Reduce reconstruction quality to enforce structure.
VAEs are useful for:
The correct answer is: Latent interpolation and data generation.
A well-structured latent space allows:
The correct answer is: Smooth interpolation between samples.
Which is true about VAE encoder output?
The correct answer is: Vectors of means and log-variances.
GANs train by:
The correct answer is: An adversarial game between generator and discriminator.
Mode collapse means the generator:
Mode collapse occurs when a GAN generator produces only a limited variety of outputs, ignoring parts of the real data distribution. It is a training instability common in GANs.
If discriminator becomes too strong early, the generator may suffer from:
The correct answer is: Vanishing gradients.
DCGAN stands for a GAN variant optimized for:
DCGAN (Deep Convolutional GAN) applies convolutional architectures to GANs, replacing fully connected layers with strided and fractionally strided convolutions for stable image generation.
StyleGAN introduced:
StyleGAN introduced a style-based generator: a mapping network transforms the latent code into an intermediate style vector that modulates each synthesis layer, giving control over image style at multiple scales (coarse to fine).
CycleGAN is primarily used for:
CycleGAN performs unpaired image-to-image translation using cycle-consistency loss, allowing domain transfer (e.g., photos to paintings) without paired training examples.
The generator maps noise z to:
The correct answer is: A synthetic data sample G(z).
Adversarial loss tries to make discriminator output for generated samples:
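The correct answer is: Close to 1 (real). This is the generator's objective; the discriminator is trained to push the same output toward 0 (fake).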
A typical fix for mode collapse is:
Mode collapse occurs when a GAN generator produces only a limited variety of outputs, ignoring parts of the real data distribution. Typical fixes include minibatch discrimination, unrolled GAN training, and alternative objectives such as the Wasserstein loss.
GANs are categorized as:
The correct answer is: Implicit density models.
Which is a common component of GAN training to stabilize it?
The correct answer is: Batch normalization and careful learning rates.
Which GAN variant gives control over style at multiple scales?
The correct answer is: StyleGAN.
The discriminator's role is to:
The correct answer is: Classify inputs as real or fake.
GAN training objective is best described as:
The correct answer is: Minimax optimization.
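For reference, the minimax objective from the original GAN formulation can be written as below, where D is the discriminator, G the generator, and p_z the noise prior:

```latex
\min_G \max_D \; V(D, G) =
\mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
+ \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes V by scoring real samples near 1 and generated samples near 0; the generator minimizes it by making D(G(z)) large.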
A challenge when training GANs is:
The correct answer is: Sensitivity to hyperparameters and oscillations.
RNNs maintain memory via:
The correct answer is: A hidden state passed across time steps.
Vanishing gradient makes it hard to learn:
The correct answer is: Long-range dependencies in sequences.
LSTM introduces which mechanism to control information?
The correct answer is: Gating (forget, input, output gates).
GRU differs from LSTM by:
The correct answer is: Being simpler with fewer gates.
Sequence generation can be performed by training models to predict:
The correct answer is: Next token given previous tokens.
Teacher forcing is a training technique where:
The correct answer is: Model is given ground-truth previous tokens during training.
Which is a limitation of RNNs compared to Transformers?
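RNNs process tokens sequentially, so computation cannot be parallelized across time steps and long-range dependencies are hard to capture. Transformers process all tokens in parallel using self-attention, making them faster to train and better at long-range dependencies.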
RNN backpropagation through time requires:
The correct answer is: Unrolling the network across time steps.
Applications of sequence models include:
The correct answer is: Time-series forecasting and language modeling.
Beam search is used in generation to:
The correct answer is: Find top-k likely sequences approximately.
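A minimal sketch of the idea, using a tiny hand-written probability table in place of a real model. The table, the `toy_model` function, and the token names are hypothetical, and end-of-sequence handling is left out to keep the example short.

```python
import math

def beam_search(next_token_probs, start, beam_width, max_len):
    """Keep the beam_width highest-scoring partial sequences at each step."""
    beams = [([start], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                candidates.append((tokens + [token], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical next-token table: the greedy first choice "a" leads to a
# worse overall sequence than starting with "b".
TABLE = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.5, "y": 0.5},
    "b": {"x": 0.9, "y": 0.1},
}

def toy_model(tokens):
    return TABLE[tokens[-1]]

best_tokens, best_score = beam_search(toy_model, "<s>", beam_width=2, max_len=2)[0]
greedy_tokens, _ = beam_search(toy_model, "<s>", beam_width=1, max_len=2)[0]
print(best_tokens, math.exp(best_score), greedy_tokens)
```

With a beam width of 1 the search reduces to greedy decoding and commits to "a"; with a width of 2 it keeps "b" alive long enough to find the more probable sequence.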
Scheduled sampling mixes:
The correct answer is: Model predictions and ground-truth tokens during training.
An RNN cell output depends on:
The correct answer is: Current input and previous hidden state.
Which cell is computationally lighter?
The correct answer is: GRU.
Sequence-to-sequence (seq2seq) models typically have:
The correct answer is: An encoder and a decoder.
Teacher forcing can lead to:
The correct answer is: Exposure bias at inference time.
Self-attention allows tokens to:
The correct answer is: Attend to all tokens in the sequence.
Positional encoding provides:
The correct answer is: A sense of token order to the model.
Multi-head attention helps by:
The correct answer is: Learning multiple types of relationships in parallel.
Transformers are more parallelizable than RNNs because:
Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.
Decoder-only models like GPT are trained to:
The correct answer is: Predict next-token in an autoregressive fashion.
BERT is primarily used for:
The correct answer is: Understanding tasks like classification and QA.
Transformer encoder blocks include:
The correct answer is: Self-attention + feed-forward layers.
Masked self-attention prevents a token from attending to:
The correct answer is: Future tokens during autoregressive generation.
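A small sketch of how the mask acts on attention weights: position i only normalizes over scores for positions 0..i, and later positions receive weight zero. The raw scores here are made-up numbers, and the function name is hypothetical.

```python
import math

def causal_attention_weights(scores):
    """scores[i][j] is the raw score of query position i against key
    position j. Returns softmax weights with all future positions zeroed."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # keys at positions 0..i only
        m = max(visible)
        exps = [math.exp(s - m) for s in visible]
        total = sum(exps)
        masked = [0.0] * (len(row) - i - 1)  # future tokens get zero weight
        weights.append([e / total for e in exps] + masked)
    return weights

for row in causal_attention_weights([[0.0] * 3 for _ in range(3)]):
    print(row)
```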
Scaling transformers (more params + data) led to:
The correct answer is: Emergence of strong few/zero-shot capabilities.
A positional encoding can be:
The correct answer is: Learned or fixed sinusoidal.
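The fixed sinusoidal variant can be sketched as follows, assuming the formulation from the original Transformer paper (sine on even dimensions, cosine on odd dimensions, base 10000). The function name is made up for this example.

```python
import math

def sinusoidal_positional_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of fixed encodings."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Each (even, odd) pair of dimensions shares one frequency.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

pe = sinusoidal_positional_encoding(num_positions=3, d_model=4)
for row in pe:
    print(row)
```

A learned positional encoding would replace this fixed table with a trainable embedding of the same shape.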
Which model is decoder-only?
The correct answer is: GPT.
Attention scores are computed from queries, keys and values using:
Transformer attention computes similarity scores via scaled dot-products of queries (Q) and keys (K), normalizes with softmax, then produces a weighted sum of values (V).
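A minimal single-head sketch of that computation, using plain Python lists in place of matrices. The inputs are made-up numbers, and the learned Q/K/V projection layers are omitted.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Two identical keys receive equal weight, so the output is the mean value.
print(attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 6.0]]))
```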
Transformer attention is typically multi-head to:
The correct answer is: Capture different relations using different projection subspaces.
Encoder-decoder transformers are commonly used for:
Encoder-decoder transformers encode source sequences and condition the decoder for tasks like machine translation and conditional text generation.
Which is an advantage of Transformers over RNNs?
Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.
Generative AI in healthcare can help by:
The correct answer is: Generating synthetic medical images for augmentation.
In drug discovery generative models can:
The correct answer is: Design novel molecular structures.
A major ethical risk of generative AI is:
The correct answer is: Misinformation and deepfakes.
Which practice helps reduce model bias?
Model bias can be reduced through careful data curation, diverse training data, bias audits, and fairness-aware evaluation metrics.
Copyright concerns arise because models may:
Generative models trained on large corpora may reproduce or remix copyrighted content from training data, raising intellectual property and fair-use legal questions.
Responsible deployment includes:
The correct answer is: Transparency, monitoring and human oversight.
Which industry widely uses generative AI for creative media?
The correct answer is: Advertising, entertainment and design.
Data augmentation via generative models mainly helps to:
The correct answer is: Increase effective data diversity.
Regulation and policy are needed because:
The correct answer is: Harm can be widespread and hard to control.
A practical mitigation for deepfakes is:
Deepfakes are AI-generated synthetic media (video/audio) that can be used to spread misinformation. Detection models and provenance tracking are key mitigation strategies.
Multi-modal generative models combine:
Multi-modal generative models process and generate across multiple data types — text, images, audio — enabling richer cross-modal understanding and generation.
Job displacement risk suggests:
The correct answer is: Workforce transitions and reskilling policies are important.
Which direction is important for future generative AI?
The correct answer is: More controllable, reliable and multimodal systems.
Intellectual property questions involve:
The correct answer is: Who owns AI-generated content and training data usage.
When deploying a generative model for production, you should:
The correct answer is: Implement monitoring, rate-limits, and human-in-the-loop.
Based on our question bank analysis, master these concepts to score high in Generative AI.