Generative AI

Complete Subject Question Bank

Part 1 — Introduction to Generative AI#1

Generative AI primarily aims to:

A
Classify inputs
B
Predict stock prices only
C
Create new data samples similar to training data
D
Only compress data

Generative AI creates new data samples (text, images, audio, etc.) by learning the underlying statistical distribution of training data, rather than just classifying or predicting existing patterns.

Part 1 — Introduction to Generative AI#2

Which of these is NOT typically produced by generative models?

A
Images
B
Labels for classification tasks
C
Music
D
Text

The correct answer is: Labels for classification tasks.

Part 1 — Introduction to Generative AI#3

Learning a data distribution p(x) allows a model to:

A
Compute p(y|x)
B
Sample new plausible data points
C
Only memorize training data
D
Always achieve perfect reconstruction

The correct answer is: Sample new plausible data points.

Part 1 — Introduction to Generative AI#4

Which statement best contrasts discriminative and generative models?

A
Discriminative models learn p(x), generative learn p(y|x)
B
Discriminative models learn p(y|x), generative learn p(x) or p(x,y)
C
They are identical
D
Generative models cannot be used for classification

Discriminative models learn p(y|x) — the conditional probability of a label given input. Generative models learn p(x) or the joint p(x,y), enabling them to sample new data.
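A minimal sketch of the contrast, assuming 1D inputs, two labels and Gaussian class-conditional densities. The toy data and all function names are illustrative, not from any library. The generative model fits p(y) and p(x|y), which gives the joint p(x,y). From that joint it recovers p(y|x) through Bayes' rule, so it can still classify, and it can also sample new x, which a model of p(y|x) alone cannot do.

```python
import math
import random

def fit_gaussian(xs):
    """Maximum-likelihood mean and variance of a list of floats."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Toy 1D training data with two labels (illustrative values only).
data = {0: [1.0, 1.2, 0.8, 1.1], 1: [3.0, 2.9, 3.2, 3.1]}
n_total = sum(len(xs) for xs in data.values())

# Generative model: learn p(y) and p(x|y), hence the joint p(x, y) = p(x|y) p(y).
priors = {y: len(xs) / n_total for y, xs in data.items()}
params = {y: fit_gaussian(xs) for y, xs in data.items()}

def posterior(x):
    """p(y|x) recovered from the joint via Bayes' rule."""
    joint = {y: priors[y] * gaussian_pdf(x, *params[y]) for y in data}
    evidence = sum(joint.values())
    return {y: p / evidence for y, p in joint.items()}

def sample(y, rng=random):
    """Draw a new x from the learned class-conditional p(x|y)."""
    mu, var = params[y]
    return rng.gauss(mu, math.sqrt(var))
```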

Part 1 — Introduction to Generative AI#5

Which is a common application of generative AI?

A
Data augmentation
B
Direct OS kernel development
C
Manufacturing hardware
D
Network routing protocols

The correct answer is: Data augmentation.

Part 1 — Introduction to Generative AI#6

Generative AI that helps artists by suggesting concepts is an example of:

A
Autonomous replacement
B
Creative tool / collaborator
C
Discriminative learning
D
Feature extraction only

The correct answer is: Creative tool / collaborator.

Part 1 — Introduction to Generative AI#7

A model that learns to produce plausible human faces has learned approximations of:

A
p(y|x)
B
p(x)
C
loss landscapes only
D
an SVM decision boundary

The correct answer is: p(x).

Part 1 — Introduction to Generative AI#8

Which capability is NOT typical of generative models?

A
Simulation for training
B
Creating synthetic data
C
Guaranteed unbiased outputs
D
Art assistance

The correct answer is: Guaranteed unbiased outputs.

Part 1 — Introduction to Generative AI#9

Which of the following is a risk specifically mentioned for generative AI?

A
Deepfakes and misinformation
B
Faster compilers
C
Lower memory usage
D
Stable training always

The correct answer is: Deepfakes and misinformation.

Part 1 — Introduction to Generative AI#10

Text generation, image generation and music generation are examples of:

A
Discriminative tasks
B
Supervised regression
C
Generative tasks
D
Clustering tasks

The correct answer is: Generative tasks.

Part 1 — Introduction to Generative AI#11

Why is learning a distribution more powerful than memorizing examples?

A
It guarantees exact copies
B
It allows sampling novel but plausible items
C
It reduces compute to zero
D
It avoids any bias automatically

Learning a data distribution allows a model to generate novel but plausible samples, unlike mere memorization which can only reproduce training examples.

Part 1 — Introduction to Generative AI#12

Which of these is a direct benefit of synthetic data?

A
Reduces need for any validation
B
Helps train models where real data is scarce
C
Removes need for GPUs
D
Ensures perfect model fairness

Synthetic data generated by generative models supplements real datasets, especially where real data is scarce, sensitive, or expensive to collect.

Part 1 — Introduction to Generative AI#13

A generative model that outputs new molecules would be used in:

A
Drug discovery
B
Network security
C
Compiler optimizations
D
Operating system design

The correct answer is: Drug discovery.

Part 1 — Introduction to Generative AI#14

Which term best describes creating content that resembles training data but is not identical?

A
Overfitting
B
Generalization / generation
C
Discrimination
D
Regularization

The correct answer is: Generalization / generation.

Part 1 — Introduction to Generative AI#15

Generative AI differs from classification because it focuses on:

A
Label boundaries
B
Creating samples
C
Only supervised labels
D
Feature scaling

The correct answer is: Creating samples.

Part 2 — History & Foundations#16

Gaussian Mixture Models (GMMs) are examples of:

A
Implicit models
B
Classical probabilistic models
C
Transformer-based models
D
Adversarial networks

The correct answer is: Classical probabilistic models.
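GMMs are explicit density models: p(x) is a weighted sum of Gaussian densities, and sampling is two-stage (pick a component, then sample from it). The sketch below is a 1D illustration using only the Python standard library. The component weights, means and standard deviations are made-up values, not fitted to any data.

```python
import math
import random

# Hypothetical 2-component 1D mixture: (weight, mean, std) per component.
COMPONENTS = [(0.3, -2.0, 0.5), (0.7, 1.0, 1.0)]

def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def gmm_density(x):
    """Explicit density: p(x) = sum_k pi_k * N(x | mu_k, sigma_k^2)."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in COMPONENTS)

def gmm_sample(rng):
    """Ancestral sampling: pick a component by weight, then sample from it."""
    _, m, s = rng.choices(COMPONENTS, weights=[c[0] for c in COMPONENTS])[0]
    return rng.gauss(m, s)
```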

Part 2 — History & Foundations#17

Hidden Markov Models (HMMs) are especially useful for:

A
Image synthesis
B
Sequence modeling like speech
C
Style transfer for images
D
Transformer pretraining

The correct answer is: Sequence modeling like speech.

Part 2 — History & Foundations#18

Which breakthrough enabled deep generative models to scale in the 2010s?

A
Larger datasets and GPUs
B
Smaller datasets
C
Removal of backpropagation
D
Replacing neural nets with SVMs

The correct answer is: Larger datasets and GPUs.

Part 2 — History & Foundations#19

The VAE paper was published by:

A
Goodfellow et al.
B
Kingma & Welling
C
Vaswani et al.
D
Hinton alone

The correct answer is: Kingma & Welling.

Part 2 — History & Foundations#20

GANs introduced the idea of:

A
Autoencoding
B
Adversarial training of a generator against a discriminator
C
Self-attention
D
Reinforcement learning

GANs (Generative Adversarial Networks), introduced by Goodfellow et al. (2014), set up an adversarial game between a generator (creates fake samples) and a discriminator (distinguishes real from fake).

Part 2 — History & Foundations#21

“Attention Is All You Need” introduced:

A
The GAN architecture
B
The Transformer architecture
C
LSTMs
D
Variational inference

The correct answer is: The Transformer architecture.

Part 2 — History & Foundations#22

Which year is commonly associated with the original GAN paper?

A
2014
B
2005
C
2018
D
1999

The correct answer is: 2014.

Part 2 — History & Foundations#23

Transformers replaced recurrence with:

A
Convolutions
B
Self-attention
C
Markov chains
D
Decision trees

The correct answer is: Self-attention.

Part 2 — History & Foundations#24

Which early model is probabilistic and explicitly models density?

A
HMM
B
GAN
C
DCGAN
D
StyleGAN

The correct answer is: HMM.

Part 2 — History & Foundations#25

VAEs are celebrated for:

A
Perfect photorealism
B
Stable training and continuous latent spaces
C
No need for optimization
D
Replacing discriminative models

VAEs (Variational Autoencoders) are valued for stable training and smooth, continuous latent spaces that support interpolation and controlled generation.

Part 2 — History & Foundations#26

Which model family is known as “implicit density”?

A
VAEs
B
Normalizing Flows
C
GANs
D
GMMs

The correct answer is: GANs.

Part 2 — History & Foundations#27

The rise of LLMs was enabled by:

A
Transformer scaling and massive text corpora
B
Only better activation functions
C
Elimination of GPUs
D
Decline of datasets

Large Language Models (LLMs) are massive transformer-based models trained on vast text corpora. Scaling both model size and data enabled breakthroughs in natural language generation.

Part 2 — History & Foundations#28

Which contribution is attributed to Goodfellow et al.?

A
VAE
B
GAN
C
Transformer
D
LSTM

The correct answer is: GAN.

Part 2 — History & Foundations#29

CycleGAN is notable because it can:

A
Translate between image domains without paired data
B
Generate audio from text
C
Train without a discriminator
D
Solve PDEs analytically

CycleGAN performs unpaired image-to-image translation using cycle-consistency loss, allowing domain transfer (e.g., photos to paintings) without paired training examples.

Part 2 — History & Foundations#30

Which development made sampling from complex distributions more practical?

A
Normalizing Flows and invertible transforms
B
SVMs
C
Decision trees
D
Naive Bayes

The correct answer is: Normalizing Flows and invertible transforms.

Part 3 — ML & Neural Network Fundamentals#31

Machine Learning systems typically start with:

A
Model deployment before data
B
Data collection
C
Hyperparameter search only
D
No data

The correct answer is: Data collection.

Part 3 — ML & Neural Network Fundamentals#32

A perceptron computes:

A
A nonlinear combination without weights
B
A weighted sum plus activation
C
Only biases
D
Only gradients

A perceptron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function to produce an output.
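A minimal sketch of a single perceptron with a step activation. The weights and bias are hand-picked (not learned) so that it computes logical AND. The bias shifts the decision threshold, playing the role of an intercept.

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, passed through a step activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# Illustrative weights that make the perceptron compute logical AND.
AND_WEIGHTS, AND_BIAS = [1.0, 1.0], -1.5
```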

Part 3 — ML & Neural Network Fundamentals#33

Which activation function is most commonly used to mitigate vanishing gradients?

A
Sigmoid
B
Tanh
C
ReLU
D
Linear

The correct answer is: ReLU.

Part 3 — ML & Neural Network Fundamentals#34

Backpropagation uses which calculus tool to compute gradients?

A
Integral calculus
B
Chain rule
C
Taylor series
D
Fourier transform

The correct answer is: Chain rule.

Part 3 — ML & Neural Network Fundamentals#35

Gradient descent updates weights to:

A
Maximize loss
B
Minimize loss
C
Randomize weights
D
Always set weights to zero

Gradient descent iteratively updates model weights by moving in the direction that reduces the loss, using the gradient (partial derivatives) of the loss with respect to each weight.
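The update rule can be sketched on a one-parameter toy loss, assuming L(w) = (w - 3)^2 with its gradient supplied by hand. In a real network the gradient would come from backpropagation instead. The learning rate and step count are illustrative choices.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - lr * dL/dw."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

# Toy loss L(w) = (w - 3)^2, whose gradient is 2 * (w - 3); the minimum is at w = 3.
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```

Each step here shrinks the distance to the minimum by a factor of 0.8. A learning rate above 1.0 would make this particular loss diverge, which is why the learning rate needs tuning.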

Part 3 — ML & Neural Network Fundamentals#36

Deep networks learn hierarchical features—early layers learn:

A
High-level concepts only
B
Low-level features like edges
C
Model hyperparameters
D
Loss functions

The correct answer is: Low-level features like edges.

Part 3 — ML & Neural Network Fundamentals#37

Overfitting happens when the model:

A
Generalizes well
B
Memorizes training data and performs poorly on new data
C
Never learns
D
Has too few parameters

Overfitting occurs when a model memorizes training data rather than learning generalizable patterns, resulting in poor performance on unseen data.

Part 3 — ML & Neural Network Fundamentals#38

Which is NOT an optimizer for neural networks?

A
SGD
B
Adam
C
RMSProp
D
KNN

The correct answer is: KNN.

Part 3 — ML & Neural Network Fundamentals#39

Dropout is used to:

A
Improve inference speed
B
Regularize and reduce overfitting
C
Increase training dataset size
D
Convert supervised to unsupervised learning

Dropout is a regularization technique that randomly deactivates neurons during training, preventing co-adaptation and reducing overfitting.
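A minimal sketch of dropout on a plain list of activations. It uses the common "inverted dropout" variant, which rescales surviving units during training so that nothing needs to change at inference. Frameworks differ in details, so treat this as an illustration only.

```python
import random

def dropout(activations, p, training, rng):
    """Inverted dropout: zero each unit with probability p during training,
    scaling survivors by 1/(1-p) so the expected activation is unchanged.
    At inference time the input passes through untouched."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```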

Part 3 — ML & Neural Network Fundamentals#40

Cross-entropy loss is most often used for:

A
Regression
B
Classification tasks
C
Clustering
D
Feature selection

The correct answer is: Classification tasks.

Part 3 — ML & Neural Network Fundamentals#41

A bias term in a neuron is analogous to:

A
Slope of a line only
B
Intercept in linear models
C
Activation function
D
Learning rate

The correct answer is: Intercept in linear models.

Part 3 — ML & Neural Network Fundamentals#42

Batch normalization primarily helps by:

A
Removing the need for activation functions
B
Stabilizing and speeding up training
C
Making models deterministic
D
Creating new data

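Batch normalization normalizes each layer's inputs using the mean and variance of the current mini-batch, then applies a learned scale and shift. It was originally motivated as a way to reduce internal covariate shift; in practice it stabilizes and accelerates training.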
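A minimal sketch of the training-time computation for a single feature across a mini-batch. Here gamma and beta stand in for the learned scale and shift, and eps guards against division by zero. At inference, implementations typically use running averages of the statistics instead; that part is not shown.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then apply learned scale and shift."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```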

Part 3 — ML & Neural Network Fundamentals#43

Which layer type is most common in image models?

A
Dense only
B
Convolutional
C
RNN
D
k-NN layer

The correct answer is: Convolutional.

Part 3 — ML & Neural Network Fundamentals#44

Transfer learning helps when:

A
You have infinite labeled data
B
Data is limited and pretrained features help
C
You only use SVMs
D
You remove all hidden layers

Transfer learning leverages pretrained model features from large datasets, enabling high performance even when task-specific labeled data is limited.

Part 3 — ML & Neural Network Fundamentals#45

An epoch means:

A
One parameter update
B
One full pass over the training dataset
C
A single batch
D
Final test evaluation

An epoch is one complete pass through the entire training dataset. Multiple epochs allow the model to iteratively refine its weights.

Part 4 — Generative Model Taxonomy#46

Explicit density models provide:

A
A formula for p(x)
B
Only samples without density
C
No sampling mechanism
D
Only discriminative outputs

The correct answer is: A formula for p(x).

Part 4 — Generative Model Taxonomy#47

Normalizing Flows are an example of:

A
Implicit models
B
Tractable explicit models using invertible transforms
C
GAN variants
D
RNNs

Normalizing Flows are tractable explicit generative models that use a series of invertible transformations to map a simple distribution (e.g., Gaussian) to a complex data distribution.
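The exact likelihood comes from the change-of-variables formula. The sketch below uses the simplest possible flow, a single fixed affine map x = a·z + b with a standard-normal base distribution. The constants A and B are arbitrary. Real flows stack many learned invertible layers (for example coupling layers), but they compute log p(x) the same way: invert, evaluate the base density, and add the log-determinant of the inverse's Jacobian.

```python
import math

# A single hypothetical affine flow x = A * z + B with base z ~ N(0, 1).
A, B = 2.0, 1.0

def standard_normal_logpdf(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def flow_sample(z):
    """Generate: push base noise forward through the invertible transform."""
    return A * z + B

def flow_log_likelihood(x):
    """Exact log p(x) by change of variables:
    log p(x) = log p_z(f^{-1}(x)) + log |d f^{-1}(x) / dx|."""
    z = (x - B) / A              # invert the transform
    log_det = -math.log(abs(A))  # derivative of the inverse is 1/A
    return standard_normal_logpdf(z) + log_det
```

For this flow, p(x) is exactly the density of N(1, 4), so the result can be checked against the closed-form Gaussian log-density.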

Part 4 — Generative Model Taxonomy#48

Which model family does a VAE belong to?

A
Implicit density models
B
Approximate explicit models
C
Reinforcement learning
D
Rule-based systems

The correct answer is: Approximate explicit models.

Part 4 — Generative Model Taxonomy#49

Implicit models are characterized by:

A
Providing closed-form p(x)
B
Direct sampling without explicit probability
C
Always having tractable likelihoods
D
Using Gaussian mixtures only

Implicit generative models (like GANs) generate samples directly without defining an explicit probability density function, making likelihood evaluation difficult.

Part 4 — Generative Model Taxonomy#50

Which is a tractable explicit model?

A
Basic Gaussian, some Normalizing Flows
B
GANs
C
VAEs
D
CycleGANs

The correct answer is: Basic Gaussian, some Normalizing Flows.

Part 4 — Generative Model Taxonomy#51

Which approach approximates likelihoods using ELBO?

A
GANs
B
VAEs
C
HMMs
D
SVMs

ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.

Part 4 — Generative Model Taxonomy#52

Sampling from an implicit model requires:

A
Computing p(x) directly
B
Passing noise through a generator network
C
Solving an inverse problem analytically
D
Closed-form integrals

Sampling from an implicit generative model such as a GAN means drawing random noise z (e.g., from a Gaussian) and passing it through the generator network to obtain G(z); no explicit density p(x) is ever computed.

Part 4 — Generative Model Taxonomy#53

Which model gives exact likelihoods (when tractable)?

A
Normalizing Flows
B
GANs
C
Standard VAEs
D
Implicit GANs

The correct answer is: Normalizing Flows.

Part 4 — Generative Model Taxonomy#54

Which is an advantage of explicit density models?

A
Always easier to train
B
They can evaluate likelihoods for samples
C
Fewer parameters always
D
No assumptions about data

The correct answer is: They can evaluate likelihoods for samples.

Part 4 — Generative Model Taxonomy#55

An example of implicit modeling is:

A
Variational approximation
B
Adversarial training
C
Tractable inversion
D
Bayesian posterior calculations

Adversarial training, as used in GANs, is the standard example of implicit modeling: the generator learns to produce samples that fool a discriminator without ever defining an explicit probability density function.

Part 4 — Generative Model Taxonomy#56

Which family is well-suited to likelihood-based anomaly detection?

A
GANs (implicit)
B
Explicit density models
C
Only discriminative models
D
Rule engines

The correct answer is: Explicit density models.

Part 4 — Generative Model Taxonomy#57

ELBO stands for:

A
Evidence Lower Bound
B
Exact Latent Bayesian Objective
C
Enhanced Learning Bound
D
Eigenvalue Lower Bound

ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.

Part 4 — Generative Model Taxonomy#58

Which is a limitation of implicit models?

A
No sampling
B
No way to compute likelihood easily
C
Always slow at generation
D
Always better reconstruction

Implicit generative models (like GANs) generate samples directly without defining an explicit probability density function, making likelihood evaluation difficult.

Part 4 — Generative Model Taxonomy#59

Tractable models are useful because they allow:

A
Direct density evaluation and likelihood comparisons
B
No need for training
C
Guaranteed perfect samples
D
No hyperparameters

The correct answer is: Direct density evaluation and likelihood comparisons.

Part 4 — Generative Model Taxonomy#60

VAEs, GANs and Flows are examples of:

A
Only discriminative models
B
Different approaches to generative modeling
C
Hardware components
D
File formats

The correct answer is: Different approaches to generative modeling.

Part 5 — Variational Autoencoders (VAEs)#61

A standard autoencoder differs from a VAE because a VAE:

A
Is deterministic
B
Encodes inputs as distributions (μ,σ)
C
Has no decoder
D
Never reconstructs inputs

The correct answer is: Encodes inputs as distributions (μ,σ).

Part 5 — Variational Autoencoders (VAEs)#62

The reparameterization trick allows:

A
Sampling without blocking gradients
B
Exact analytical integrals always
C
Avoiding any randomness
D
Using RNNs instead

The reparameterization trick moves randomness (ε ~ N(0,I)) outside the network path (z = μ + σ⊙ε), enabling gradients to flow through the sampling operation during backpropagation.
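A minimal sketch of the sampling step, assuming the encoder outputs vectors of means and log-variances (the usual parameterization). All randomness is confined to eps, so z is a deterministic, differentiable function of mu and log_var. In a real framework, autodiff would then carry gradients back to the encoder; this stdlib version only shows the arithmetic.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps with eps ~ N(0, 1) and sigma = exp(0.5 * log_var).
    Randomness lives only in eps, so z is a differentiable function of mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
```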

Part 5 — Variational Autoencoders (VAEs)#63

VAE loss includes reconstruction loss plus:

A
Cross-entropy only
B
KL divergence between encoder distribution and prior
C
Adversarial loss
D
No other term

The correct answer is: KL divergence between encoder distribution and prior.

Part 5 — Variational Autoencoders (VAEs)#64

Sampling z = μ + σ ⊙ ε moves randomness to:

A
Inside weights update
B
Outside network path to allow backprop
C
Always deterministic path
D
The decoder only

The correct answer is: Outside network path to allow backprop.

Part 5 — Variational Autoencoders (VAEs)#65

A common prior used in VAEs is:

A
Uniform on [0,1]
B
Standard normal N(0,1)
C
Dirichlet with many components
D
No prior

The correct answer is: Standard normal N(0,1).

Part 5 — Variational Autoencoders (VAEs)#66

VAEs typically produce images that are:

A
Sharper than GANs
B
More blurry than GANs
C
Identical to training images
D
Only black and white

The correct answer is: More blurry than GANs.

Part 5 — Variational Autoencoders (VAEs)#67

The KL term in the VAE loss encourages:

A
Latent codes to match prior
B
Latent codes to diverge
C
Higher overfitting
D
Removing decoder

The KL divergence term in VAE loss regularizes the encoder by encouraging the latent code distribution to stay close to the prior (typically a standard Gaussian).
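For a diagonal Gaussian encoder and a standard-normal prior, the KL term has a closed form. The sketch assumes the encoder outputs mu and log-variance vectors. It takes the reconstruction error as an already-computed number; when that number is the negative log-likelihood and kl_weight is 1, the sum is the negative ELBO. The kl_weight parameter is a hypothetical name for the KL weight, the knob varied in beta-VAE. Raising it pulls codes toward the prior at the cost of reconstruction quality.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def vae_loss(reconstruction_error, mu, log_var, kl_weight=1.0):
    """Reconstruction term plus weighted KL regularizer (negative ELBO when
    reconstruction_error is the negative log-likelihood and kl_weight == 1)."""
    return reconstruction_error + kl_weight * kl_to_standard_normal(mu, log_var)
```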

Part 5 — Variational Autoencoders (VAEs)#68

Advantages of VAEs include:

A
Stability in training and meaningful latent space
B
Perfect photorealism
C
No hyperparameters
D
No need for decoder

VAEs (Variational Autoencoders) are valued for stable training and smooth, continuous latent spaces that support interpolation and controlled generation.

Part 5 — Variational Autoencoders (VAEs)#69

Which is a limitation of VAEs?

A
Always produce mode collapse
B
Assume a simple latent distribution like Gaussian
C
Cannot be combined with transformers
D
No sampling possible

The correct answer is: Assume a simple latent distribution like Gaussian.

Part 5 — Variational Autoencoders (VAEs)#70

In VAEs, the decoder maps from:

A
Input x to μ
B
Latent z to data x̂
C
Gradient to loss
D
Weights to biases

The correct answer is: Latent z to data x̂.

Part 5 — Variational Autoencoders (VAEs)#71

ELBO maximization is equivalent to:

A
Minimizing reconstruction only
B
Maximizing a lower bound on data likelihood
C
Training discriminator
D
Solving linear equations only

ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.

Part 5 — Variational Autoencoders (VAEs)#72

Choosing a too-large KL weight will typically:

A
Reduce reconstruction quality to enforce structure
B
Improve sharpness always
C
Remove regularization
D
Break reparameterization trick

The correct answer is: Reduce reconstruction quality to enforce structure.

Part 5 — Variational Autoencoders (VAEs)#73

VAEs are useful for:

A
Latent interpolation and data generation
B
Only classification tasks
C
Solving differential equations analytically
D
Database indexing

The correct answer is: Latent interpolation and data generation.

Part 5 — Variational Autoencoders (VAEs)#74

A well-structured latent space allows:

A
Arbitrary discontinuous jumps
B
Smooth interpolation between samples
C
Guaranteed photorealism
D
Avoiding all biases

The correct answer is: Smooth interpolation between samples.

Part 5 — Variational Autoencoders (VAEs)#75

Which is true about VAE encoder output?

A
A single scalar per input
B
Vectors of means and log-variances
C
No parameters
D
Only deterministic codes

The correct answer is: Vectors of means and log-variances.

Part 6 — Generative Adversarial Networks (GANs)#76

GANs train by:

A
Supervised regression only
B
An adversarial game between generator and discriminator
C
Maximizing ELBO
D
Using RNNs only

The correct answer is: An adversarial game between generator and discriminator.

Part 6 — Generative Adversarial Networks (GANs)#77

Mode collapse means the generator:

A
Outputs diverse samples
B
Produces limited types of outputs
C
Always wins the game
D
Does not use noise

Mode collapse occurs when a GAN generator produces only a limited variety of outputs, ignoring parts of the real data distribution. It is a training instability common in GANs.

Part 6 — Generative Adversarial Networks (GANs)#78

If the discriminator becomes too strong early in training, the generator may suffer from:

A
Vanishing gradients
B
Perfect convergence
C
Faster sampling
D
Guaranteed diversity

The correct answer is: Vanishing gradients.

Part 6 — Generative Adversarial Networks (GANs)#79

DCGAN is a GAN variant designed for:

A
Text generation
B
Image generation using convolutional architectures
C
Tabular data only
D
Audio synthesis only

DCGAN (Deep Convolutional GAN) applies convolutional architectures to GANs, replacing fully connected layers with strided and fractionally strided convolutions for stable image generation.

Part 6 — Generative Adversarial Networks (GANs)#80

StyleGAN introduced:

A
A style-based generator controlling details at different levels
B
Elimination of discriminator
C
RNN encoders
D
Exact likelihood computation

StyleGAN (Karras et al., 2019) introduced a style-based generator: a mapping network transforms the noise vector z into an intermediate latent w, which modulates each synthesis layer, so coarse layers control attributes such as pose while fine layers control details such as texture.

Part 6 — Generative Adversarial Networks (GANs)#81

CycleGAN is primarily used for:

A
Unpaired image-to-image translation
B
Conditional text generation
C
Improving VAEs
D
Speech recognition

CycleGAN performs unpaired image-to-image translation using cycle-consistency loss, allowing domain transfer (e.g., photos to paintings) without paired training examples.

Part 6 — Generative Adversarial Networks (GANs)#82

The generator maps noise z to:

A
A probability density formula
B
A synthetic data sample G(z)
C
A true data point from the dataset
D
A loss value

The correct answer is: A synthetic data sample G(z).

Part 6 — Generative Adversarial Networks (GANs)#83

The generator's adversarial loss tries to make the discriminator's output for generated samples:

A
Close to 0 (fake)
B
Close to 1 (real)
C
Exactly -1
D
Undefined

The correct answer is: Close to 1 (real).
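Both players' objectives can be written as binary cross-entropy on the discriminator's probabilities. This is a minimal sketch on scalar outputs. The generator loss shown is the commonly used non-saturating form, which minimizes -log D(G(z)) and therefore rewards pushing D(G(z)) toward 1. Function names are illustrative.

```python
import math

def bce(prob, target):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    eps = 1e-12  # guard against log(0)
    return -(target * math.log(prob + eps) + (1 - target) * math.log(1 - prob + eps))

def discriminator_loss(d_real, d_fake):
    """D wants D(x) -> 1 on real data and D(G(z)) -> 0 on generated data."""
    return bce(d_real, 1) + bce(d_fake, 0)

def generator_loss(d_fake):
    """Non-saturating generator loss: G wants D(G(z)) -> 1, i.e. minimize -log D(G(z))."""
    return bce(d_fake, 1)
```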

Part 6 — Generative Adversarial Networks (GANs)#84

A typical fix for mode collapse is:

A
Careful architecture choices, regularization, or alternative losses
B
Deleting the discriminator entirely
C
Using smaller batches only
D
No training at all

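Mode collapse occurs when a GAN generator produces only a limited variety of outputs, ignoring parts of the real data distribution. Common mitigations include careful architecture choices, regularization (e.g., minibatch discrimination or spectral normalization) and alternative losses such as the Wasserstein loss.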

Part 6 — Generative Adversarial Networks (GANs)#85

GANs are categorized as:

A
Explicit density models
B
Implicit density models
C
Supervised classifiers
D
Rule-based systems

The correct answer is: Implicit density models.

Part 6 — Generative Adversarial Networks (GANs)#86

Which is a common component of GAN training to stabilize it?

A
Batch normalization and careful learning rates
B
Removing noise
C
Using sigmoid only
D
No discriminator updates

The correct answer is: Batch normalization and careful learning rates.

Part 6 — Generative Adversarial Networks (GANs)#87

Which GAN variant gives control over style at multiple scales?

A
VAE
B
StyleGAN
C
HMM
D
Normalizing Flow

The correct answer is: StyleGAN.

Part 6 — Generative Adversarial Networks (GANs)#88

The discriminator's role is to:

A
Generate images
B
Classify inputs as real or fake
C
Compute ELBO
D
Encode latent vectors

The correct answer is: Classify inputs as real or fake.

Part 6 — Generative Adversarial Networks (GANs)#89

The GAN training objective is best described as:

A
Supervised regression
B
Minimax optimization
C
Single-player optimization
D
Clustering objective

The correct answer is: Minimax optimization.

Part 6 — Generative Adversarial Networks (GANs)#90

A challenge when training GANs is:

A
Never any convergence issues
B
Sensitivity to hyperparameters and oscillations
C
No need for GPUs
D
Automatic perfect quality

The correct answer is: Sensitivity to hyperparameters and oscillations.

Part 7 — Sequence Models (RNN/LSTM/GRU)#91

RNNs maintain memory via:

A
A hidden state passed across time steps
B
External databases
C
Batch normalization only
D
ELBO terms

The correct answer is: A hidden state passed across time steps.
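A minimal sketch of that memory, assuming a scalar Elman-style RNN with hand-picked weights. Each output depends on the current input and the previous hidden state. After the first step the inputs are zero, yet the hidden state stays nonzero because it still carries information from the first input.

```python
import math

def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Scalar Elman-style RNN: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
    The hidden state h carries information from earlier steps forward."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states
```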

Part 7 — Sequence Models (RNN/LSTM/GRU)#92

Vanishing gradients make it hard to learn:

A
Short-term dependencies only
B
Long-range dependencies in sequences
C
Image edges
D
Latent codes

The correct answer is: Long-range dependencies in sequences.

Part 7 — Sequence Models (RNN/LSTM/GRU)#93

LSTM introduces which mechanism to control information?

A
Attention only
B
Gating (forget, input, output gates)
C
Convolutions
D
GAN adversaries

The correct answer is: Gating (forget, input, output gates).
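A minimal scalar sketch of one LSTM step. The parameter dictionary `p` is a hypothetical layout that maps each gate name to a (w_x, w_h, b) triple. Sigmoid gates scale information between 0 and 1. The forget gate decides how much of the old cell state survives, the input gate decides how much of the tanh candidate is written, and the output gate decides how much of the cell state is exposed as h.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(x, h_prev, c_prev, p):
    """One scalar LSTM step; p maps gate names to (w_x, w_h, b) triples (hypothetical layout)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b
    f = sigmoid(pre("forget"))   # how much of the old cell state to keep
    i = sigmoid(pre("input"))    # how much new candidate information to write
    o = sigmoid(pre("output"))   # how much of the cell state to expose
    g = math.tanh(pre("cell"))   # candidate values
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c
```

With the forget gate saturated open and the input gate saturated shut, the cell state passes through unchanged. This is how an LSTM can carry information across many steps.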

Part 7 — Sequence Models (RNN/LSTM/GRU)#94

GRU differs from LSTM by:

A
Having more gates
B
Being simpler with fewer gates
C
Using convolutional cells
D
Being non-recurrent

The correct answer is: Being simpler with fewer gates.

Part 7 — Sequence Models (RNN/LSTM/GRU)#95

Sequence generation can be performed by training models to predict:

A
Previous token given next
B
Next token given previous tokens
C
Labels only
D
Loss functions

The correct answer is: Next token given previous tokens.

Part 7 — Sequence Models (RNN/LSTM/GRU)#96

Teacher forcing is a training technique where:

A
Model is given ground-truth previous tokens during training
B
Model is never updated
C
Only use reinforcement learning
D
Always use GANs

The correct answer is: Model is given ground-truth previous tokens during training.
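A minimal sketch of how teacher-forced training pairs are built, assuming a hypothetical `<s>` start token. Each decoder input is the ground-truth previous token and each target is the next token, so the model never consumes its own predictions during training. At inference it must feed back its own outputs instead, and that mismatch is what leads to exposure bias.

```python
def teacher_forcing_pairs(tokens, bos="<s>"):
    """Decoder inputs are the ground-truth sequence shifted right by one (prefixed with bos);
    targets are the next tokens."""
    inputs = [bos] + tokens[:-1]
    return list(zip(inputs, tokens))
```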

Part 7 — Sequence Models (RNN/LSTM/GRU)#97

Which is a limitation of RNNs compared to Transformers?

A
Parallelization and long-range learning
B
Ability to handle small sequences
C
Usefulness on text
D
Being neural networks

Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.

Part 7 — Sequence Models (RNN/LSTM/GRU)#98

RNN backpropagation through time requires:

A
Only single-step gradients
B
Unrolling the network across time steps
C
No memory
D
Closed-form solutions

The correct answer is: Unrolling the network across time steps.

Part 7 — Sequence Models (RNN/LSTM/GRU)#99

Applications of sequence models include:

A
Time-series forecasting and language modeling
B
Only image classification
C
Static clustering
D
Hash table design

The correct answer is: Time-series forecasting and language modeling.

Part 7 — Sequence Models (RNN/LSTM/GRU)#100

Beam search is used in generation to:

A
Find top-k likely sequences approximately
B
Compute exact integrals
C
Train discriminators
D
Normalize inputs

The correct answer is: Find top-k likely sequences approximately.
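A minimal sketch of beam search over a toy next-token model. Here `toy_model` is a hand-written probability table, not a trained network. Scores are summed log-probabilities. With beam width 1 (greedy decoding) the search commits to the locally best first token "a" and ends with probability 0.24. Width 2 keeps "b" alive and finds the better sequence with probability 0.36. The result is still approximate, because beams outside the top-k are discarded.

```python
import math

def beam_search(next_probs, beam_width, max_len, eos="</s>"):
    """Keep the beam_width highest-scoring partial sequences at each step.
    next_probs(prefix) returns a dict {token: probability}; scores are summed log-probs."""
    beams = [((), 0.0)]  # (tokens, log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == eos:
                candidates.append((tokens, score))  # finished sequences carry over
                continue
            for tok, prob in next_probs(tokens).items():
                candidates.append((tokens + (tok,), score + math.log(prob)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

def toy_model(prefix):
    """Hypothetical next-token distributions keyed by prefix."""
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.3, "y": 0.3, "z": 0.4},
        ("b",): {"x": 0.9, "y": 0.1},
    }
    return table.get(prefix, {"</s>": 1.0})
```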

Part 7 — Sequence Models (RNN/LSTM/GRU)#101

Scheduled sampling mixes:

A
Only reinforcement signals
B
Model predictions and ground-truth tokens during training
C
ELBO and adversarial losses
D
Two discriminators

The correct answer is: Model predictions and ground-truth tokens during training.

Part 7 — Sequence Models (RNN/LSTM/GRU)#102

An RNN cell's output depends on:

A
Only current input
B
Current input and previous hidden state
C
Only previous input
D
No inputs

The correct answer is: Current input and previous hidden state.

Part 7 — Sequence Models (RNN/LSTM/GRU)#103

Which cell is computationally lighter?

A
LSTM
B
GRU
C
Transformer encoder
D
VAE encoder

The correct answer is: GRU.

Part 7 — Sequence Models (RNN/LSTM/GRU)#104

Sequence-to-sequence (seq2seq) models typically have:

A
Only a decoder
B
An encoder and a decoder
C
No neural components
D
Two discriminators

The correct answer is: An encoder and a decoder.

Part 7 — Sequence Models (RNN/LSTM/GRU)#105

Teacher forcing can lead to:

A
Exposure bias at inference time
B
Perfect robustness
C
No need for evaluation
D
Faster inference always

The correct answer is: Exposure bias at inference time.

Part 8 — Transformers & Attention#106

Self-attention allows tokens to:

A
Only attend to their immediate neighbor
B
Attend to all tokens in the sequence
C
Ignore other tokens
D
Compute ELBO directly

The correct answer is: Attend to all tokens in the sequence.

Part 8 — Transformers & Attention#107

Positional encoding provides:

A
Model weights
B
A sense of token order to the model
C
A new activation function
D
A training optimizer

The correct answer is: A sense of token order to the model.

Part 8 — Transformers & Attention#108

Multi-head attention helps by:

A
Computing the same attention repeatedly
B
Learning multiple types of relationships in parallel
C
Removing the need for encoders
D
Guaranteeing perfect generalization

The correct answer is: Learning multiple types of relationships in parallel.

Part 8 — Transformers & Attention#109

Transformers are more parallelizable than RNNs because:

A
They remove recurrence and process all positions of a sequence simultaneously
B
They use smaller models
C
They use GPUs only
D
They use SVMs internally

Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.

Part 8 — Transformers & Attention#110

Decoder-only models like GPT are trained to:

A
Predict masked tokens bidirectionally
B
Predict next-token in an autoregressive fashion
C
Compute exact likelihoods always
D
Implement HMMs

The correct answer is: Predict next-token in an autoregressive fashion.

Part 8 — Transformers & Attention#111

BERT is primarily used for:

A
Text generation
B
Understanding tasks like classification and QA
C
Image generation
D
Training discriminators

The correct answer is: Understanding tasks like classification and QA.

Part 8 — Transformers & Attention#112

Transformer encoder blocks include:

A
Self-attention + feed-forward layers
B
Only LSTM cells
C
Only convolutional layers
D
No nonlinearities

The correct answer is: Self-attention + feed-forward layers.

Part 8 — Transformers & Attention#113

Masked self-attention prevents a token from attending to:

A
All tokens
B
Future tokens during autoregressive generation
C
Past tokens only
D
Its own embedding

The correct answer is: Future tokens during autoregressive generation.

Part 8 — Transformers & Attention#114

Scaling transformers (more params + data) led to:

A
Emergence of strong few/zero-shot capabilities
B
Smaller vocabulary always
C
Removal of attention
D
Instant training

The correct answer is: Emergence of strong few/zero-shot capabilities.

Part 8 — Transformers & Attention#115

A positional encoding can be:

A
Learned or fixed sinusoidal
B
Only learned with random numbers
C
A type of optimizer
D
Irrelevant for order

The correct answer is: Learned or fixed sinusoidal.
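For the fixed variant, the original Transformer paper uses PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). A direct plain-Python sketch of that formula:

```python
import math


def sinusoidal_positional_encoding(num_positions, d_model):
    pe = []
    for pos in range(num_positions):
        row = []
        for k in range(d_model):
            i = k // 2                                  # sin/cos pairs share a frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


print(sinusoidal_positional_encoding(2, 4)[0])  # [0.0, 1.0, 0.0, 1.0]
```

A learned encoding would instead be a trainable table with one vector per position, updated by gradient descent like any other embedding.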

Part 8 — Transformers & Attention#116

Which model is decoder-only?

A
BERT
B
GPT
C
Both BERT and GPT
D
Neither

The correct answer is: GPT.

Part 8 — Transformers & Attention#117

Transformer attention combines queries, keys and values using:

A
Element-wise product followed by softmax
B
Dot-products and softmax (scaled)
C
Only LeakyReLU
D
Only KL divergence

Transformer attention computes similarity scores via scaled dot-products of queries (Q) and keys (K), normalizes with softmax, then produces a weighted sum of values (V).
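In formula form this is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A plain-Python sketch, assuming Q, K and V are already-projected lists of row vectors:

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """q: n_q x d_k, k: n_k x d_k, v: n_k x d_v (lists of lists)."""
    d_k = len(k[0])
    out = []
    for qi in q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


# Identical keys give equal weights, so the output is the mean of the values.
print(scaled_dot_product_attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[2.0, 0.0], [4.0, 2.0]]))
```

Dividing by sqrt(d_k) keeps dot-products from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.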

Part 8 — Transformers & Attention#118

Transformer attention is typically multi-head to:

A
Reduce model parameters
B
Capture different relations using different projection subspaces
C
Remove positional info
D
Enforce Gaussian priors

The correct answer is: Capture different relations using different projection subspaces.
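A sketch of how heads use separate subspaces: each head has its own projection matrices (hypothetical here), attends independently, and the concatenated head outputs are mixed by an output projection `w_o`. The helpers are redefined so the block runs on its own.

```python
import math


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v)


def multi_head_attention(x, heads, w_o):
    """heads: list of (w_q, w_k, w_v) projection matrices, one tuple per head."""
    # Each head projects x into its own subspace and attends there independently.
    per_head = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv)) for wq, wk, wv in heads]
    # Concatenate head outputs position by position, then mix them with w_o.
    concat = [sum((h[i] for h in per_head), []) for i in range(len(x))]
    return matmul(concat, w_o)
```

Because each head sees a different projection, one head can, for example, weight positional neighbours while another weights semantically similar tokens; the output projection lets the layer combine both.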

Part 8 — Transformers & Attention#119

Encoder-decoder transformers are commonly used for:

A
Machine translation and conditional generation
B
Unsupervised clustering only
C
Training GANs
D
Image denoising with VAEs only

Encoder-decoder transformers encode source sequences and condition the decoder for tasks like machine translation and conditional text generation.
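In the decoder, this conditioning happens through cross-attention: queries come from the decoder's states, while keys and values come from the encoder output. A minimal sketch that omits the learned projections (an assumption for brevity, which also means decoder and encoder vectors must have the same dimension here):

```python
import math


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    d_k = len(k[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_k) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


def cross_attention(decoder_states, encoder_states):
    # Queries from the decoder; keys and values from the encoder output.
    return scaled_dot_product_attention(decoder_states, encoder_states, encoder_states)
```

Each decoder position thus receives a weighted mix of encoded source positions, so the output has one row per target position regardless of the source length.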

Part 8 — Transformers & Attention#120

Which is an advantage of Transformers over RNNs?

A
Better at long-range dependencies and parallel training
B
Require sequential computation only
C
Always fewer params
D
No need for GPUs

Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.

Part 9 — Applications, Ethics & Future#121

Generative AI in healthcare can help by:

A
Generating synthetic medical images for augmentation
B
Replacing doctors entirely
C
Creating irreproducible science
D
Avoiding regulatory review

The correct answer is: Generating synthetic medical images for augmentation.

Part 9 — Applications, Ethics & Future#122

In drug discovery generative models can:

A
Design novel molecular structures
B
Guarantee success in human trials
C
Replace lab experiments entirely
D
Provide exact dosages without testing

The correct answer is: Design novel molecular structures.

Part 9 — Applications, Ethics & Future#123

A major ethical risk of generative AI is:

A
Improved dataset size only
B
Misinformation and deepfakes
C
Reduced compute costs
D
Guaranteed unbiased models

The correct answer is: Misinformation and deepfakes.

Part 9 — Applications, Ethics & Future#124

Which practice helps reduce model bias?

A
Careful data curation and auditing
B
Ignoring training data
C
Only using smaller models
D
Never using validation data

Model bias can be reduced through careful data curation, diverse training data, bias audits, and fairness-aware evaluation metrics.
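One simple metric a bias audit might include is the demographic parity difference: the gap between the highest and lowest positive-prediction rates across groups. A sketch, assuming binary predictions and one group label per example; a real audit would combine several metrics.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    counts = defaultdict(lambda: [0, 0])  # group -> [positive predictions, total]
    for pred, group in zip(predictions, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_difference(predictions, groups):
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


preds = [1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
print(demographic_parity_difference(preds, groups))  # about 0.333: group a gets positives twice as often
```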

Part 9 — Applications, Ethics & Future#125

Copyright concerns arise because models may:

A
Train only on public domain data
B
Reproduce or remix copyrighted works from training data
C
Always remove copyrighted content
D
Automatically clear rights

Generative models trained on large corpora may reproduce or remix copyrighted content from training data, raising intellectual property and fair-use legal questions.

Part 9 — Applications, Ethics & Future#126

Responsible deployment includes:

A
Transparency, monitoring and human oversight
B
No testing
C
Unlimited release without guardrails
D
Replacing human oversight entirely

The correct answer is: Transparency, monitoring and human oversight.

Part 9 — Applications, Ethics & Future#127

Which industry widely uses generative AI for creative media?

A
Agriculture only
B
Advertising, entertainment and design
C
Only aerospace
D
Only compiler development

The correct answer is: Advertising, entertainment and design.

Part 9 — Applications, Ethics & Future#128

Data augmentation via generative models mainly helps to:

A
Reduce model capacity
B
Increase effective data diversity
C
Eliminate test sets
D
Remove need for evaluation

The correct answer is: Increase effective data diversity.

Part 9 — Applications, Ethics & Future#129

Regulation and policy are needed because:

A
Models are trivial to interpret
B
Harm can be widespread and hard to control
C
There is no public interest
D
They reduce compute automatically

The correct answer is: Harm can be widespread and hard to control.

Part 9 — Applications, Ethics & Future#130

A practical mitigation for deepfakes is:

A
Detection models and provenance tracking
B
Never publishing any images
C
Removing GPUs
D
Always relying solely on manual checks

Deepfakes are AI-generated synthetic media (video/audio) that can be used to spread misinformation. Detection models and provenance tracking are key mitigation strategies.
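Provenance tracking can be sketched as recording a cryptographic fingerprint of generated content together with a signed note of what produced it, so later copies can be checked. This illustrative scheme uses Python's `hashlib` and `hmac`; `SECRET_KEY` and the record format are hypothetical, and production systems typically rely on standardized signed manifests (such as C2PA) rather than an ad hoc scheme like this.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical signing key


def make_provenance_record(content: bytes, generator: str) -> dict:
    digest = hashlib.sha256(content).hexdigest()       # fingerprint of the exact bytes
    payload = f"{digest}|{generator}".encode()
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"sha256": digest, "generator": generator, "signature": signature}


def verify_provenance(content: bytes, record: dict) -> bool:
    digest = hashlib.sha256(content).hexdigest()
    if digest != record["sha256"]:
        return False                                   # content changed since recording
    payload = f"{digest}|{record['generator']}".encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["signature"])  # record not tampered with
```

An exact hash only matches byte-identical files, so re-encoded or cropped copies fail verification; this is one reason detection models are used alongside provenance records.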

Part 9 — Applications, Ethics & Future#131

Multi-modal generative models combine:

A
Only text and rules
B
Text, images, audio and other modalities
C
Hardware sensors only
D
Only GANs

Multi-modal generative models process and generate across multiple data types — text, images, audio — enabling richer cross-modal understanding and generation.

Part 9 — Applications, Ethics & Future#132

Job displacement risk suggests:

A
Workforce transitions and reskilling policies are important
B
No changes are needed
C
Immediate mass layoffs
D
All jobs vanish overnight

The correct answer is: Workforce transitions and reskilling policies are important.

Part 9 — Applications, Ethics & Future#133

Which direction is important for future generative AI?

A
More controllable, reliable and multimodal systems
B
Less interpretability
C
Larger unregulated releases with no oversight
D
No alignment efforts

The correct answer is: More controllable, reliable and multimodal systems.

Part 9 — Applications, Ethics & Future#134

Intellectual property questions involve:

A
Who owns AI-generated content and training data usage
B
Only model architectures
C
Only optimizer choices
D
No legal concerns

The correct answer is: Who owns AI-generated content and training data usage.

Part 9 — Applications, Ethics & Future#135

When deploying a generative model for production, you should:

A
Skip monitoring
B
Implement monitoring, rate-limits, and human-in-the-loop
C
Train once and forget
D
Never evaluate outputs

The correct answer is: Implement monitoring, rate-limits, and human-in-the-loop.
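Rate limiting is often implemented with a token bucket. A minimal sketch (the class name and parameters are illustrative, not from a specific library); the `clock` argument is injectable so the behaviour can be tested deterministically.

```python
import time


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity, rate, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # over the limit: reject or queue the request


bucket = TokenBucket(capacity=5, rate=1.0)
print(bucket.allow())  # True while the burst allowance lasts
```

In a deployment, a limiter like this would typically be kept per user or per API key, alongside logging for monitoring and a review queue for human-in-the-loop checks.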

Key Topics to Study

Based on our question bank analysis, master these concepts to score highly in Generative AI.

Generative, GANs, VAEs, Transformers, Attention, RNN, LSTM, Training