Generative AI

Complete Subject Question Bank

Part 1 — Introduction to Generative AI#1

Generative AI primarily aims to:

Classify inputs

Predict stock prices only

Create new data samples similar to training data

Only compress data

Generative AI creates new data samples (text, images, audio, etc.) by learning the underlying statistical distribution of training data, rather than just classifying or predicting existing patterns.
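
As a rough illustration of "learning a distribution, then sampling from it", the sketch below fits a one-dimensional Gaussian to a few numbers and draws new values from it. The data values, variable names and the choice of a Gaussian are assumptions made for this example only; real generative models learn far richer distributions.

```python
import random
import statistics

# Hypothetical training data: a handful of 1-D measurements.
training_data = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0]

# "Learning the distribution" here just means estimating a mean and a spread.
mu = statistics.mean(training_data)
sigma = statistics.stdev(training_data)

# Sampling from the fitted Gaussian yields new values that resemble the data
# without being copies of any training point.
rng = random.Random(0)
new_samples = [rng.gauss(mu, sigma) for _ in range(5)]
print(new_samples)
```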

Part 1 — Introduction to Generative AI#2

Which of these is NOT typically produced by generative models?

Images

Labels for classification tasks

Music

Text

The correct answer is: Labels for classification tasks.

Part 1 — Introduction to Generative AI#3

Learning a data distribution p(x) allows a model to:

Compute p(y|x)

Sample new plausible data points

Only memorize training data

Always achieve perfect reconstruction

The correct answer is: Sample new plausible data points.

Part 1 — Introduction to Generative AI#4

Which statement best contrasts discriminative and generative models?

Discriminative models learn p(x), generative learn p(y|x)

Discriminative models learn p(y|x), generative learn p(x) or p(x,y)

They are identical

Generative models cannot be used for classification

Discriminative models learn p(y|x) — the conditional probability of a label given input. Generative models learn p(x) or the joint p(x,y), enabling them to sample new data.
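
The relationship between the two quantities can be sketched with a tiny discrete example. The joint table below is invented for illustration; it shows that a model holding the joint p(x,y) can recover both p(x) and p(y|x), whereas a discriminative model only ever represents the latter.

```python
# Hypothetical joint distribution p(x, y) over a message length x and a label y.
joint = {
    ("short", "spam"): 0.30,
    ("short", "ham"): 0.10,
    ("long", "spam"): 0.15,
    ("long", "ham"): 0.45,
}

def marginal_x(x):
    """p(x): sum the joint over all labels."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_y_given_x(y, x):
    """p(y|x) = p(x, y) / p(x), the quantity a discriminative model learns directly."""
    return joint[(x, y)] / marginal_x(x)

print(marginal_x("short"), conditional_y_given_x("spam", "short"))
```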

Part 1 — Introduction to Generative AI#5

Which is a common application of generative AI?

Data augmentation

Direct OS kernel development

Manufacturing hardware

Network routing protocols

The correct answer is: Data augmentation.

Part 1 — Introduction to Generative AI#6

Generative AI that helps artists by suggesting concepts is an example of:

Autonomous replacement

Creative tool / collaborator

Discriminative learning

Feature extraction only

The correct answer is: Creative tool / collaborator.

Part 1 — Introduction to Generative AI#7

A model that learns to produce plausible human faces has learned approximations of:

p(y|x)

p(x)

loss landscapes only

an SVM decision boundary

The correct answer is: p(x).

Part 1 — Introduction to Generative AI#8

Which capability is NOT typical of generative models?

Simulation for training

Creating synthetic data

Guaranteed unbiased outputs

Art assistance

The correct answer is: Guaranteed unbiased outputs.

Part 1 — Introduction to Generative AI#9

Which of the following is a risk specifically mentioned for generative AI?

Deepfakes and misinformation

Faster compilers

Lower memory usage

Stable training always

The correct answer is: Deepfakes and misinformation.

Part 1 — Introduction to Generative AI#10

Text generation, image generation and music generation are examples of:

Discriminative tasks

Supervised regression

Generative tasks

Clustering tasks

The correct answer is: Generative tasks.

Part 1 — Introduction to Generative AI#11

Why is learning a distribution more powerful than memorizing examples?

It guarantees exact copies

It allows sampling novel but plausible items

It reduces compute to zero

It avoids any bias automatically

Learning a data distribution allows a model to generate novel but plausible samples, unlike mere memorization which can only reproduce training examples.

Part 1 — Introduction to Generative AI#12

Which of these is a direct benefit of synthetic data?

Reduces need for any validation

Helps train models where real data is scarce

Removes need for GPUs

Ensures perfect model fairness

Synthetic data generated by generative models supplements real datasets, especially where real data is scarce, sensitive, or expensive to collect.

Part 1 — Introduction to Generative AI#13

A generative model that outputs new molecules would be used in:

Drug discovery

Network security

Compiler optimizations

Operating system design

The correct answer is: Drug discovery.

Part 1 — Introduction to Generative AI#14

Which term best describes creating content that resembles training data but is not identical?

Overfitting

Generalization / generation

Discrimination

Regularization

The correct answer is: Generalization / generation.

Part 1 — Introduction to Generative AI#15

Generative AI differs from classification because it focuses on:

Label boundaries

Creating samples

Only supervised labels

Feature scaling

The correct answer is: Creating samples.

Part 2 — History & Foundations#16

Gaussian Mixture Models (GMMs) are examples of:

Implicit models

Classical probabilistic models

Transformer-based models

Adversarial networks

The correct answer is: Classical probabilistic models.

Part 2 — History & Foundations#17

Hidden Markov Models (HMMs) are especially useful for:

Image synthesis

Sequence modeling like speech

Style transfer for images

Transformer pretraining

The correct answer is: Sequence modeling like speech.

Part 2 — History & Foundations#18

Which breakthrough enabled deep generative models to scale in the 2010s?

Larger datasets and GPUs

Smaller datasets

Removal of backpropagation

Replacing neural nets with SVMs

The correct answer is: Larger datasets and GPUs.

Part 2 — History & Foundations#19

The VAE paper was published by:

Goodfellow et al.

Kingma & Welling

Vaswani et al.

Hinton alone

The correct answer is: Kingma & Welling.

Part 2 — History & Foundations#20

GANs introduced the idea of:

Autoencoding

A generator vs a discriminator adversarial training

Self-attention

Reinforcement learning

GANs (Generative Adversarial Networks), introduced by Goodfellow et al. (2014), set up an adversarial game between a generator (creates fake samples) and a discriminator (distinguishes real from fake).
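
The two sides of the game can be sketched as a pair of loss functions. This is a minimal sketch, assuming the discriminator outputs a probability in (0, 1) and using the non-saturating generator loss, which is one common choice rather than the only one; the function names are hypothetical.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: low when D(x) is near 1 on real data and D(G(z)) is near 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator rates fakes as real."""
    return -math.log(d_fake)

# A discriminator that separates real from fake well has a low loss...
print(discriminator_loss(d_real=0.9, d_fake=0.1))
# ...while the generator's loss drops as D(G(z)) moves toward 1.
print(generator_loss(0.1), generator_loss(0.9))
```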

Part 2 — History & Foundations#21

“Attention Is All You Need” introduced:

The GAN architecture

The Transformer architecture

LSTMs

Variational inference

The correct answer is: The Transformer architecture.

Part 2 — History & Foundations#22

Which year is commonly associated with the original GAN paper?

2014

2005

2018

1999

The correct answer is: 2014.

Part 2 — History & Foundations#23

Transformers replaced recurrence with:

Convolutions

Self-attention

Markov chains

Decision trees

The correct answer is: Self-attention.

Part 2 — History & Foundations#24

Which early model is probabilistic and explicitly models density?

HMM

GAN

DCGAN

StyleGAN

The correct answer is: HMM.

Part 2 — History & Foundations#25

VAEs are celebrated for:

Perfect photorealism

Stable training and continuous latent spaces

No need for optimization

Replacing discriminative models

VAEs (Variational Autoencoders) are valued for stable training and smooth, continuous latent spaces that support interpolation and controlled generation.

Part 2 — History & Foundations#26

Which model family is known as “implicit density”?

VAEs

Normalizing Flows

GANs

GMMs

The correct answer is: GANs.

Part 2 — History & Foundations#27

The rise of LLMs was enabled by:

Transformer scaling and massive text corpora

Only better activation functions

Elimination of GPUs

Decline of datasets

Large Language Models (LLMs) are massive transformer-based models trained on vast text corpora. Scaling both model size and data enabled breakthroughs in natural language generation.

Part 2 — History & Foundations#28

Which contribution is attributed to Goodfellow et al.?

VAE

GAN

Transformer

LSTM

The correct answer is: GAN.

Part 2 — History & Foundations#29

CycleGAN is notable because it can:

Translate between image domains without paired data

Generate audio from text

Train without a discriminator

Solve PDEs analytically

CycleGAN performs unpaired image-to-image translation using cycle-consistency loss, allowing domain transfer (e.g., photos to paintings) without paired training examples.

Part 2 — History & Foundations#30

Which development made sampling from complex distributions more practical?

Normalizing Flows and invertible transforms

SVMs

Decision trees

Naive Bayes

The correct answer is: Normalizing Flows and invertible transforms.

Part 3 — ML & Neural Network Fundamentals#31

Machine Learning systems typically start with:

Model deployment before data

Data collection

Hyperparameter search only

No data

The correct answer is: Data collection.

Part 3 — ML & Neural Network Fundamentals#32

A perceptron computes:

A nonlinear combination without weights

A weighted sum plus activation

Only biases

Only gradients

A perceptron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function to produce an output.
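
A minimal sketch of this computation, assuming a step activation and hand-picked weights (both chosen for illustration, not learned):

```python
def perceptron(inputs, weights, bias):
    """Weighted sum plus bias, followed by a step activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# Hypothetical weights that make the unit behave like a logical AND.
and_weights, and_bias = [1.0, 1.0], -1.5
outputs = [perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)]
print(outputs)
```

With these weights the unit fires only when both inputs are 1.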

Part 3 — ML & Neural Network Fundamentals#33

Which activation is most commonly used to mitigate vanishing gradients?

Sigmoid

Tanh

ReLU

Linear

The correct answer is: ReLU.
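
One way to see why is to compare local derivatives. The sketch below is a simplification that ignores weights and assumes every pre-activation equals 1.0; it multiplies the activation's derivative across ten layers, as the chain rule would.

```python
import math

def sigmoid_grad(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # never exceeds 0.25

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

# Product of local derivatives along a 10-layer chain (weights ignored for simplicity).
depth = 10
sigmoid_chain = sigmoid_grad(1.0) ** depth
relu_chain = relu_grad(1.0) ** depth
print(sigmoid_chain, relu_chain)
```

The sigmoid product shrinks toward zero with depth, while the ReLU product stays at 1 for positive inputs.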

Part 3 — ML & Neural Network Fundamentals#34

Backpropagation uses which calculus tool to compute gradients?

Integral calculus

Chain rule

Taylor series

Fourier transform

The correct answer is: Chain rule.

Part 3 — ML & Neural Network Fundamentals#35

Gradient descent updates weights to:

Maximize loss

Minimize loss

Randomize weights

Always set weights to zero

Gradient descent iteratively updates model weights by moving in the direction that reduces the loss, using the gradient (partial derivatives) of the loss with respect to each weight.
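
The update rule can be sketched on a one-parameter problem. The quadratic loss, the learning rate and the step count below are assumptions chosen so the behaviour is easy to follow.

```python
def loss(w):
    return (w - 3.0) ** 2

def grad(w):
    # Derivative of the loss with respect to w.
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1  # assumed value
for _ in range(100):
    # Step against the gradient, i.e. in the direction that reduces the loss.
    w -= learning_rate * grad(w)

print(w, loss(w))
```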

Part 3 — ML & Neural Network Fundamentals#36

Deep networks learn hierarchical features—early layers learn:

High-level concepts only

Low-level features like edges

Model hyperparameters

Loss functions

The correct answer is: Low-level features like edges.

Part 3 — ML & Neural Network Fundamentals#37

Overfitting happens when the model:

Generalizes well

Memorizes training data and performs poorly on new data

Never learns

Has too few parameters

Overfitting occurs when a model memorizes training data rather than learning generalizable patterns, resulting in poor performance on unseen data.

Part 3 — ML & Neural Network Fundamentals#38

Which is NOT an optimizer for neural networks?

SGD

Adam

RMSProp

KNN

The correct answer is: KNN.

Part 3 — ML & Neural Network Fundamentals#39

Dropout is used to:

Improve inference speed

Regularize and reduce overfitting

Increase training dataset size

Convert supervised to unsupervised learning

Dropout is a regularization technique that randomly deactivates neurons during training, preventing co-adaptation and reducing overfitting.
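
A minimal sketch of the mechanism, assuming the "inverted dropout" variant in which surviving activations are rescaled during training so that nothing changes at inference time. The function name and signature are hypothetical.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout: zero units with probability drop_prob and rescale the survivors."""
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(hidden, 0.5, rng)                  # some units zeroed, others doubled
eval_out = dropout(hidden, 0.5, rng, training=False)   # unchanged at inference
print(train_out, eval_out)
```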

Part 3 — ML & Neural Network Fundamentals#40

Cross-entropy loss is most often used for:

Regression

Classification tasks

Clustering

Feature selection

The correct answer is: Classification tasks.
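
For a single example, the loss is the negative log-probability the model assigns to the correct class. The sketch below assumes the model outputs raw scores (logits) that are turned into probabilities with a softmax; the helper names are illustrative.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability of the correct class."""
    return -math.log(softmax(logits)[target_index])

# The loss is small when the correct class gets the highest score, large otherwise.
print(cross_entropy([2.0, 0.0, 0.0], 0), cross_entropy([2.0, 0.0, 0.0], 1))
```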

Part 3 — ML & Neural Network Fundamentals#41

A bias term in a neuron is analogous to:

Slope of a line only

Intercept in linear models

Activation function

Learning rate

The correct answer is: Intercept in linear models.

Part 3 — ML & Neural Network Fundamentals#42

Batch normalization primarily helps by:

Removing the need for activation functions

Stabilizing and speeding up training

Making models deterministic

Creating new data

Batch normalization normalizes layer inputs over each mini-batch during training, which stabilizes and accelerates training; the original paper attributed this effect to reduced internal covariate shift.
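
The normalization step itself can be sketched for one feature across a mini-batch. This is a training-time sketch only; the running statistics used at inference are omitted, and the default `gamma`, `beta` and `eps` values are assumptions.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then apply a scale and shift."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

normalized = batch_norm([10.0, 12.0, 14.0, 16.0])
print(normalized)
```

The outputs have approximately zero mean and unit variance regardless of the scale of the inputs.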

Part 3 — ML & Neural Network Fundamentals#43

Which layer type is most common in image models?

Dense only

Convolutional

RNN

k-NN layer

The correct answer is: Convolutional.

Part 3 — ML & Neural Network Fundamentals#44

Transfer learning helps when:

You have infinite labeled data

Data is limited and pretrained features help

You only use SVMs

You remove all hidden layers

Transfer learning leverages pretrained model features from large datasets, enabling high performance even when task-specific labeled data is limited.

Part 3 — ML & Neural Network Fundamentals#45

An epoch means:

One parameter update

One full pass over the training dataset

A single batch

Final test evaluation

An epoch is one complete pass through the entire training dataset. Multiple epochs allow the model to iteratively refine its weights.

Part 4 — Generative Model Taxonomy#46

Explicit density models provide:

A formula for p(x)

Only samples without density

No sampling mechanism

Only discriminative outputs

The correct answer is: A formula for p(x).

Part 4 — Generative Model Taxonomy#47

Normalizing Flows are an example of:

Implicit models

Tractable explicit models using invertible transforms

GAN variants

RNNs

Normalizing Flows are tractable explicit generative models that use a series of invertible transformations to map a simple distribution (e.g., Gaussian) to a complex data distribution.
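
The simplest possible flow is a single affine map, which is enough to show how invertibility gives an exact likelihood through the change-of-variables formula. The class below is a hypothetical one-dimensional sketch, not a practical flow architecture.

```python
import math

def standard_normal_logpdf(z):
    return -0.5 * (z * z + math.log(2.0 * math.pi))

class AffineFlow:
    """x = scale * z + shift, an invertible map whenever scale != 0."""

    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_prob(self, x):
        # Change of variables: log p(x) = log p_z(f^{-1}(x)) - log|scale|
        return standard_normal_logpdf(self.inverse(x)) - math.log(abs(self.scale))

flow = AffineFlow(scale=2.0, shift=1.0)
print(flow.forward(0.3), flow.log_prob(1.0))
```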

Part 4 — Generative Model Taxonomy#48

Which model family does a VAE belong to?

Implicit density models

Approximate explicit models

Reinforcement learning

Rule-based systems

The correct answer is: Approximate explicit models.

Part 4 — Generative Model Taxonomy#49

Implicit models are characterized by:

Providing closed-form p(x)

Direct sampling without explicit probability

Always having tractable likelihoods

Using Gaussian mixtures only

Implicit generative models (like GANs) generate samples directly without defining an explicit probability density function, making likelihood evaluation difficult.

Part 4 — Generative Model Taxonomy#50

Which is a tractable explicit model?

Basic Gaussian, some Normalizing Flows

GANs

VAEs

CycleGANs

The correct answer is: Basic Gaussian, some Normalizing Flows.

Part 4 — Generative Model Taxonomy#51

Which approach approximates likelihoods using ELBO?

GANs

VAEs

HMMs

SVMs

ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.

Part 4 — Generative Model Taxonomy#52

Sampling from an implicit model requires:

Computing p(x) directly

Passing noise through a generator network

Solving an inverse problem analytically

Closed-form integrals

Sampling from an implicit model such as a GAN means drawing noise z from a simple prior and passing it through the generator network; no explicit density p(x) is ever evaluated.

Part 4 — Generative Model Taxonomy#53

Which model gives exact likelihoods (when tractable)?

Normalizing Flows

GANs

Standard VAEs

Implicit GANs

The correct answer is: Normalizing Flows.

Part 4 — Generative Model Taxonomy#54

Which is an advantage of explicit density models?

Always easier to train

They can evaluate likelihoods for samples

Fewer parameters always

No assumptions about data

The correct answer is: They can evaluate likelihoods for samples.

Part 4 — Generative Model Taxonomy#55

An example of implicit modeling is:

Variational approximation

Adversarial training

Tractable inversion

Bayesian posterior calculations

Adversarial training, as used in GANs, is an example of implicit modeling: the generator learns to produce samples directly without defining an explicit probability density function.

Part 4 — Generative Model Taxonomy#56

Which family is well-suited to likelihood-based anomaly detection?

GANs (implicit)

Explicit density models

Only discriminative models

Rule engines

The correct answer is: Explicit density models.

Part 4 — Generative Model Taxonomy#57

ELBO stands for:

Evidence Lower Bound

Exact Latent Bayesian Objective

Enhanced Learning Bound

Eigenvalue Lower Bound

ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.

Part 4 — Generative Model Taxonomy#58

Which is a limitation of implicit models?

No sampling

No way to compute likelihood easily

Always slow at generation

Always better reconstruction

Implicit generative models (like GANs) generate samples directly without defining an explicit probability density function, making likelihood evaluation difficult.

Part 4 — Generative Model Taxonomy#59

Tractable models are useful because they allow:

Direct density evaluation and likelihood comparisons

No need for training

Guaranteed perfect samples

No hyperparameters

The correct answer is: Direct density evaluation and likelihood comparisons.

Part 4 — Generative Model Taxonomy#60

VAEs, GANs and Flows are examples of:

Only discriminative models

Different approaches to generative modeling

Hardware components

File formats

The correct answer is: Different approaches to generative modeling.

Part 5 — Variational Autoencoders (VAEs)#61

A standard autoencoder differs from a VAE because a VAE:

Is deterministic

Encodes inputs as distributions (μ,σ)

Has no decoder

Never reconstructs inputs

The correct answer is: Encodes inputs as distributions (μ,σ).

Part 5 — Variational Autoencoders (VAEs)#62

The reparameterization trick allows:

Sampling without blocking gradients

Exact analytical integrals always

Avoiding any randomness

Using RNNs instead

The reparameterization trick moves randomness (ε ~ N(0,I)) outside the network path (z = μ + σ⊙ε), enabling gradients to flow through the sampling operation during backpropagation.
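
The forward computation can be sketched as follows. Plain Python has no automatic differentiation, so this only shows where the randomness enters; the function name is hypothetical, and the encoder is assumed to output log-variances.

```python
import math
import random

def reparameterize(mu, log_var, rng):
    """z = mu + sigma * eps, with eps ~ N(0, 1) drawn independently of mu and log_var."""
    z = []
    for m, lv in zip(mu, log_var):
        eps = rng.gauss(0.0, 1.0)      # the only source of randomness
        sigma = math.exp(0.5 * lv)     # deterministic function of the encoder output
        z.append(m + sigma * eps)
    return z

rng = random.Random(0)
z = reparameterize(mu=[0.0, 1.0], log_var=[0.0, -2.0], rng=rng)
print(z)
```

Because z is a deterministic, differentiable function of μ and σ once ε is fixed, a framework with autodiff can propagate gradients to the encoder.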

Part 5 — Variational Autoencoders (VAEs)#63

VAE loss includes reconstruction loss plus:

Cross-entropy only

KL divergence between encoder distribution and prior

Adversarial loss

No other term

The correct answer is: KL divergence between encoder distribution and prior.

Part 5 — Variational Autoencoders (VAEs)#64

Sampling z = μ + σ ⊙ ε moves randomness to:

Inside weights update

Outside network path to allow backprop

Always deterministic path

The decoder only

The correct answer is: Outside network path to allow backprop.

Part 5 — Variational Autoencoders (VAEs)#65

A common prior used in VAEs is:

Uniform on [0,1]

Standard normal N(0,1)

Dirichlet with many components

No prior

The correct answer is: Standard normal N(0,1).

Part 5 — Variational Autoencoders (VAEs)#66

VAEs typically produce images that are:

Sharper than GANs

More blurry than GANs

Identical to training images

Only black and white

The correct answer is: More blurry than GANs.

Part 5 — Variational Autoencoders (VAEs)#67

KL term in VAE encourages:

Latent codes to match prior

Latent codes to diverge

Higher overfitting

Removing decoder

The KL divergence term in VAE loss regularizes the encoder by encouraging the latent code distribution to stay close to the prior (typically a standard Gaussian).
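
For a diagonal Gaussian encoder and a standard normal prior, this term has a closed form. The sketch below assumes the encoder outputs means and log-variances; the function name is illustrative.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

# Zero when the encoder distribution already equals the prior, positive otherwise.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))
print(kl_to_standard_normal([1.0, -1.0], [0.5, -0.5]))
```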

Part 5 — Variational Autoencoders (VAEs)#68

Advantages of VAEs include:

Stability in training and meaningful latent space

Perfect photorealism

No hyperparameters

No need for decoder

VAEs (Variational Autoencoders) are valued for stable training and smooth, continuous latent spaces that support interpolation and controlled generation.

Part 5 — Variational Autoencoders (VAEs)#69

Which is a limitation of VAEs?

Always produce mode collapse

Assume a simple latent distribution like Gaussian

Cannot be combined with transformers

No sampling possible

The correct answer is: Assume a simple latent distribution like Gaussian.

Part 5 — Variational Autoencoders (VAEs)#70

In VAEs, the decoder maps from:

Input x to μ

Latent z to data x̂

Gradient to loss

Weights to biases

The correct answer is: Latent z to data x̂.

Part 5 — Variational Autoencoders (VAEs)#71

ELBO maximization is equivalent to:

Minimizing reconstruction only

Maximizing a lower bound on data likelihood

Training discriminator

Solving linear equations only

ELBO (Evidence Lower BOund) is the VAE training objective. Maximizing ELBO is equivalent to maximizing a lower bound on the data log-likelihood while regularizing the latent space.

Part 5 — Variational Autoencoders (VAEs)#72

Choosing a too-large KL weight will typically:

Reduce reconstruction quality to enforce structure

Improve sharpness always

Remove regularization

Break reparameterization trick

The correct answer is: Reduce reconstruction quality to enforce structure.

Part 5 — Variational Autoencoders (VAEs)#73

VAEs are useful for:

Latent interpolation and data generation

Only classification tasks

Solving differential equations analytically

Database indexing

The correct answer is: Latent interpolation and data generation.

Part 5 — Variational Autoencoders (VAEs)#74

A well-structured latent space allows:

Arbitrary discontinuous jumps

Smooth interpolation between samples

Guaranteed photorealism

Avoiding all biases

The correct answer is: Smooth interpolation between samples.

Part 5 — Variational Autoencoders (VAEs)#75

Which is true about VAE encoder output?

A single scalar per input

Vectors of means and log-variances

No parameters

Only deterministic codes

The correct answer is: Vectors of means and log-variances.

Part 6 — Generative Adversarial Networks (GANs)#76

GANs train by:

Supervised regression only

An adversarial game between generator and discriminator

Maximizing ELBO

Using RNNs only

The correct answer is: An adversarial game between generator and discriminator.

Part 6 — Generative Adversarial Networks (GANs)#77

Mode collapse means the generator:

Outputs diverse samples

Produces limited types of outputs

Always wins the game

Does not use noise

Mode collapse occurs when a GAN generator produces only a limited variety of outputs, ignoring parts of the real data distribution. It is a training instability common in GANs.

Part 6 — Generative Adversarial Networks (GANs)#78

If the discriminator becomes too strong early in training, the generator may suffer from:

Vanishing gradients

Perfect convergence

Faster sampling

Guaranteed diversity

The correct answer is: Vanishing gradients.

Part 6 — Generative Adversarial Networks (GANs)#79

DCGAN is a GAN variant designed for:

Text generation

Image generation using convolutional architectures

Tabular data only

Audio synthesis only

DCGAN (Deep Convolutional GAN) applies convolutional architectures to GANs, replacing fully connected layers with strided and fractionally strided convolutions for stable image generation.

Part 6 — Generative Adversarial Networks (GANs)#80

StyleGAN introduced:

A style-based generator controlling details at different levels

Elimination of discriminator

RNN encoders

Exact likelihood computation

The correct answer is: A style-based generator controlling details at different levels.

Part 6 — Generative Adversarial Networks (GANs)#81

CycleGAN is primarily used for:

Unpaired image-to-image translation

Conditional text generation

Improving VAEs

Speech recognition

CycleGAN performs unpaired image-to-image translation using cycle-consistency loss, allowing domain transfer (e.g., photos to paintings) without paired training examples.

Part 6 — Generative Adversarial Networks (GANs)#82

The generator maps noise z to:

A probability density formula

A synthetic data sample G(z)

A true data point from the dataset

A loss value

The correct answer is: A synthetic data sample G(z).

Part 6 — Generative Adversarial Networks (GANs)#83

The generator's adversarial loss tries to make the discriminator's output for generated samples:

Close to 0 (fake)

Close to 1 (real)

Exactly -1

Undefined

The correct answer is: Close to 1 (real).

Part 6 — Generative Adversarial Networks (GANs)#84

A typical fix for mode collapse is:

Careful architecture choices, regularization, or alternative losses

Deleting the discriminator entirely

Using smaller batches only

No training at all

Mode collapse, where the generator produces only a limited variety of outputs, is typically mitigated through careful architecture choices, regularization, or alternative losses such as the Wasserstein loss.

Part 6 — Generative Adversarial Networks (GANs)#85

GANs are categorized as:

Explicit density models

Implicit density models

Supervised classifiers

Rule-based systems

The correct answer is: Implicit density models.

Part 6 — Generative Adversarial Networks (GANs)#86

Which is a common component of GAN training to stabilize it?

Batch normalization and careful learning rates

Removing noise

Using sigmoid only

No discriminator updates

The correct answer is: Batch normalization and careful learning rates.

Part 6 — Generative Adversarial Networks (GANs)#87

Which GAN variant gives control over style at multiple scales?

VAE

StyleGAN

HMM

Normalizing Flow

The correct answer is: StyleGAN.

Part 6 — Generative Adversarial Networks (GANs)#88

Discriminator's role is to:

Generate images

Classify inputs as real or fake

Compute ELBO

Encode latent vectors

The correct answer is: Classify inputs as real or fake.

Part 6 — Generative Adversarial Networks (GANs)#89

GAN training objective is best described as:

Supervised regression

Minimax optimization

Single-player optimization

Clustering objective

The correct answer is: Minimax optimization.

Part 6 — Generative Adversarial Networks (GANs)#90

A challenge when training GANs is:

Never any convergence issues

Sensitivity to hyperparameters and oscillations

No need for GPUs

Automatic perfect quality

The correct answer is: Sensitivity to hyperparameters and oscillations.

Part 7 — Sequence Models (RNN/LSTM/GRU)#91

RNNs maintain memory via:

A hidden state passed across time steps

External databases

Batch normalization only

ELBO terms

The correct answer is: A hidden state passed across time steps.

Part 7 — Sequence Models (RNN/LSTM/GRU)#92

Vanishing gradient makes it hard to learn:

Short-term dependencies only

Long-range dependencies in sequences

Image edges

Latent codes

The correct answer is: Long-range dependencies in sequences.

Part 7 — Sequence Models (RNN/LSTM/GRU)#93

LSTM introduces which mechanism to control information?

Attention only

Gating (forget, input, output gates)

Convolutions

GAN adversaries

The correct answer is: Gating (forget, input, output gates).

Part 7 — Sequence Models (RNN/LSTM/GRU)#94

GRU differs from LSTM by:

Having more gates

Being simpler with fewer gates

Using convolutional cells

Being non-recurrent

The correct answer is: Being simpler with fewer gates.

Part 7 — Sequence Models (RNN/LSTM/GRU)#95

Sequence generation can be performed by training models to predict:

Previous token given next

Next token given previous tokens

Labels only

Loss functions

The correct answer is: Next token given previous tokens.

Part 7 — Sequence Models (RNN/LSTM/GRU)#96

Teacher forcing is a training technique where:

Model is given ground-truth previous tokens during training

Model is never updated

Only use reinforcement learning

Always use GANs

The correct answer is: Model is given ground-truth previous tokens during training.

Part 7 — Sequence Models (RNN/LSTM/GRU)#97

Which is a limitation of RNNs compared to Transformers?

Parallelization and long-range learning

Ability to handle small sequences

Usefulness on text

Being neural networks

Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.

Part 7 — Sequence Models (RNN/LSTM/GRU)#98

RNN backpropagation through time requires:

Only single-step gradients

Unrolling the network across time steps

No memory

Closed-form solutions

The correct answer is: Unrolling the network across time steps.

Part 7 — Sequence Models (RNN/LSTM/GRU)#99

Applications of sequence models include:

Time-series forecasting and language modeling

Only image classification

Static clustering

Hash table design

The correct answer is: Time-series forecasting and language modeling.

Part 7 — Sequence Models (RNN/LSTM/GRU)#100

Beam search is used in generation to:

Find top-k likely sequences approximately

Compute exact integrals

Train discriminators

Normalize inputs

The correct answer is: Find top-k likely sequences approximately.
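
A compact sketch of the procedure is given below. The transition table stands in for a real model and is entirely made up; in it, the next-token probabilities depend only on the previous token.

```python
import math

# Hypothetical toy model: next-token probabilities depend only on the previous token.
NEXT_TOKEN_PROBS = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"a": 0.1, "b": 0.2, "</s>": 0.7},
    "b": {"a": 0.5, "b": 0.1, "</s>": 0.4},
}

def beam_search(beam_width, max_len):
    beams = [(0.0, ["<s>"])]  # (log-probability, tokens)
    for _ in range(max_len):
        candidates = []
        for score, tokens in beams:
            if tokens[-1] == "</s>":
                candidates.append((score, tokens))  # finished hypotheses carry over
                continue
            for tok, p in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(p), tokens + [tok]))
        # Keep only the beam_width highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams

best = beam_search(beam_width=2, max_len=3)
for score, tokens in best:
    print(tokens, math.exp(score))
```

Only `beam_width` hypotheses survive each step, which is why the result is approximate: a sequence pruned early can never be recovered.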

Part 7 — Sequence Models (RNN/LSTM/GRU)#101

Scheduled sampling mixes:

Only reinforcement signals

Model predictions and ground-truth tokens during training

ELBO and adversarial losses

Two discriminators

The correct answer is: Model predictions and ground-truth tokens during training.

Part 7 — Sequence Models (RNN/LSTM/GRU)#102

An RNN cell output depends on:

Only current input

Current input and previous hidden state

Only previous input

No inputs

The correct answer is: Current input and previous hidden state.
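
This dependence can be sketched with a scalar recurrent cell. The tanh update and the weight values are assumptions chosen for illustration; real cells use weight matrices and vector states.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """Scalar recurrent update: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0  # initial hidden state
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, w_x, w_h, b)
        states.append(h)
    return states

# Only the first input is nonzero, yet later states remain nonzero:
# the hidden state carries information forward in time.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```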

Part 7 — Sequence Models (RNN/LSTM/GRU)#103

Which cell is computationally lighter?

LSTM

GRU

Transformer encoder

VAE encoder

The correct answer is: GRU.

Part 7 — Sequence Models (RNN/LSTM/GRU)#104

Sequence-to-sequence (seq2seq) models typically have:

Only a decoder

An encoder and a decoder

No neural components

Two discriminators

The correct answer is: An encoder and a decoder.

Part 7 — Sequence Models (RNN/LSTM/GRU)#105

Teacher forcing can lead to:

Exposure bias at inference time

Perfect robustness

No need for evaluation

Faster inference always

The correct answer is: Exposure bias at inference time.

Part 8 — Transformers & Attention#106

Self-attention allows tokens to:

Only attend to their immediate neighbor

Attend to all tokens in the sequence

Ignore other tokens

Compute ELBO directly

The correct answer is: Attend to all tokens in the sequence.

Part 8 — Transformers & Attention#107

Positional encoding provides:

Model weights

A sense of token order to the model

A new activation function

A training optimizer

The correct answer is: A sense of token order to the model.

Part 8 — Transformers & Attention#108

Multi-head attention helps by:

Computing the same attention repeatedly

Learning multiple types of relationships in parallel

Removing the need for encoders

Guaranteeing perfect generalization

The correct answer is: Learning multiple types of relationships in parallel.

Part 8 — Transformers & Attention#109

Transformers are more parallelizable than RNNs because:

They remove recurrence and process sequences simultaneously

They use smaller models

They use GPUs only

They use SVMs internally

Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.

Part 8 — Transformers & Attention#110

Decoder-only models like GPT are trained to:

Predict masked tokens bidirectionally

Predict the next token in an autoregressive fashion

Compute exact likelihoods always

Implement HMMs

The correct answer is: Predict the next token in an autoregressive fashion.

Part 8 — Transformers & Attention#111

BERT is primarily used for:

Text generation

Understanding tasks like classification and QA

Image generation

Training discriminators

The correct answer is: Understanding tasks like classification and QA.

Part 8 — Transformers & Attention#112

Transformer encoder blocks include:

Self-attention + feed-forward layers

Only LSTM cells

Only convolutional layers

No nonlinearities

The correct answer is: Self-attention + feed-forward layers.

Part 8 — Transformers & Attention#113

Masked self-attention prevents a token from attending to:

All tokens

Future tokens during autoregressive generation

Past tokens only

Its own embedding

The correct answer is: Future tokens during autoregressive generation.

Part 8 — Transformers & Attention#114

Scaling transformers (more params + data) led to:

Emergence of strong few/zero-shot capabilities

Smaller vocabulary always

Removal of attention

Instant training

The correct answer is: Emergence of strong few/zero-shot capabilities.

Part 8 — Transformers & Attention#115

A positional encoding can be:

Learned or fixed sinusoidal

Only learned with random numbers

A type of optimizer

Irrelevant for order

The correct answer is: Learned or fixed sinusoidal.
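
The fixed sinusoidal variant can be sketched as below, following the sine/cosine scheme from the original Transformer paper. The function name is hypothetical, and a learned encoding would instead be a trainable lookup table.

```python
import math

def sinusoidal_encoding(position, d_model):
    # Even dimensions use sine, odd dimensions use cosine; each pair of
    # dimensions shares one frequency, and frequencies decrease with the index.
    encoding = []
    for i in range(d_model):
        angle = position / (10000 ** ((2 * (i // 2)) / d_model))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

for pos in range(3):
    print(pos, sinusoidal_encoding(pos, 4))
```

Each position gets a distinct vector, which is what lets the otherwise order-agnostic attention layers tell positions apart.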

Part 8 — Transformers & Attention#116

Which model is decoder-only?

BERT

GPT

Both BERT and GPT

Neither

The correct answer is: GPT.

Part 8 — Transformers & Attention#117

Attention is computed from queries, keys and values using:

Element-wise product followed by softmax

Dot-products and softmax (scaled)

Only LeakyReLU

Only KL divergence

Transformer attention computes similarity scores via scaled dot-products of queries (Q) and keys (K), normalizes with softmax, then produces a weighted sum of values (V).
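
The three steps (scaled dot-products, softmax, weighted sum) can be sketched with plain lists. This is a single-head sketch with no masking or learned projections, and the helper names are illustrative.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Scaled dot-product of the query with every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
print(attention([[10.0, 0.0]], keys, values))
```

A query aligned with the first key puts almost all of its weight on the first value, while a zero query attends uniformly.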

Part 8 — Transformers & Attention#118

Transformer attention is typically multi-head to:

Reduce model parameters

Capture different relations using different projection subspaces

Remove positional info

Enforce Gaussian priors

The correct answer is: Capture different relations using different projection subspaces.
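
A simplified sketch of the idea: each head sees its own slice of the query and key vectors and so can weight the keys differently. Real implementations use learned projection matrices per head plus an output projection; the slicing below is an assumption made to keep the example short.

```python
import math


def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(vector, num_heads):
    """Split one vector into equal slices, one per head."""
    size = len(vector) // num_heads
    return [vector[h * size:(h + 1) * size] for h in range(num_heads)]


def head_weights(query, keys, num_heads):
    """Attention weights of one query over all keys, computed per head."""
    q_heads = split_heads(query, num_heads)
    k_heads = [split_heads(k, num_heads) for k in keys]
    d_head = len(q_heads[0])
    result = []
    for h in range(num_heads):
        scores = [sum(a * b for a, b in zip(q_heads[h], k[h])) / math.sqrt(d_head)
                  for k in k_heads]
        result.append(softmax(scores))
    return result


query = [1.0, 0.0, 0.0, 1.0]
keys = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
for h, weights in enumerate(head_weights(query, keys, num_heads=2)):
    print("head", h, weights)
```

In this toy input, head 0 favors the first key and head 1 favors the second, which a single head would have to blend into one set of weights.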

Part 8 — Transformers & Attention#119

Encoder-decoder transformers are commonly used for:

Machine translation and conditional generation

Unsupervised clustering only

Training GANs

Image denoising with VAEs only

Encoder-decoder transformers encode source sequences and condition the decoder for tasks like machine translation and conditional text generation.

Part 8 — Transformers & Attention#120

Which is an advantage of Transformers over RNNs?

Better at long-range dependencies and parallel training

Require sequential computation only

Always fewer params

No need for GPUs

Transformers process all tokens in parallel using self-attention (unlike sequential RNNs), making them faster to train and better at capturing long-range dependencies.

Part 9 — Applications, Ethics & Future#121

Generative AI in healthcare can help by:

Generating synthetic medical images for augmentation

Replacing doctors entirely

Creating irreproducible science

Avoiding regulatory review

The correct answer is: Generating synthetic medical images for augmentation.

Part 9 — Applications, Ethics & Future#122

In drug discovery generative models can:

Design novel molecular structures

Guarantee success in human trials

Replace lab experiments entirely

Provide exact dosages without testing

The correct answer is: Design novel molecular structures.

Part 9 — Applications, Ethics & Future#123

A major ethical risk of generative AI is:

Improved dataset size only

Misinformation and deepfakes

Reduced compute costs

Guaranteed unbiased models

The correct answer is: Misinformation and deepfakes.

Part 9 — Applications, Ethics & Future#124

Which practice helps reduce model bias?

Careful data curation and auditing

Ignoring training data

Only using smaller models

Never using validation data

Model bias can be reduced through careful data curation, diverse training data, bias audits, and fairness-aware evaluation metrics.

Part 9 — Applications, Ethics & Future#125

Copyright concerns arise because generative models may:

Train only on public domain data

Reproduce or remix copyrighted works from training data

Always remove copyrighted content

Automatically clear rights

Generative models trained on large corpora may reproduce or remix copyrighted content from training data, raising intellectual property and fair-use legal questions.

Part 9 — Applications, Ethics & Future#126

Responsible deployment includes:

Transparency, monitoring and human oversight

No testing

Unlimited release without guardrails

Replacing human oversight entirely

The correct answer is: Transparency, monitoring and human oversight.

Part 9 — Applications, Ethics & Future#127

Which industry widely uses generative AI for creative media?

Agriculture only

Advertising, entertainment and design

Only aerospace

Only compiler development

The correct answer is: Advertising, entertainment and design.

Part 9 — Applications, Ethics & Future#128

Data augmentation via generative models mainly helps to:

Reduce model capacity

Increase effective data diversity

Eliminate test sets

Remove need for evaluation

The correct answer is: Increase effective data diversity.

Part 9 — Applications, Ethics & Future#129

Regulation and policy are needed because:

Models are trivial to interpret

Harm can be widespread and hard to control

There is no public interest

They reduce compute automatically

The correct answer is: Harm can be widespread and hard to control.

Part 9 — Applications, Ethics & Future#130

A practical mitigation for deepfakes is:

Detection models and provenance tracking

Never publishing any images

Removing GPUs

Always relying only on manual checks

Deepfakes are AI-generated synthetic media (video/audio) that can be used to spread misinformation. Detection models and provenance tracking are key mitigation strategies.
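
One way to picture provenance tracking is a record that binds a media file's hash to its creator, so later edits can be detected. The sketch below is a simplification under stated assumptions: it uses a shared secret key (hypothetical, for demonstration only), whereas deployed provenance systems typically rely on public-key signatures and standardized manifests.

```python
import hashlib
import hmac

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical shared secret


def _tag(creator, digest):
    """Keyed tag over the creator name and the media hash."""
    message = (creator + ":" + digest).encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def make_record(media_bytes, creator):
    """Create a provenance record for a piece of media."""
    digest = hashlib.sha256(media_bytes).hexdigest()
    return {"creator": creator, "sha256": digest, "tag": _tag(creator, digest)}


def verify(media_bytes, record):
    """Check that the media still matches the record it was published with."""
    digest = hashlib.sha256(media_bytes).hexdigest()
    return hmac.compare_digest(_tag(record["creator"], digest), record["tag"])


record = make_record(b"original video bytes", "studio-a")
print(verify(b"original video bytes", record))  # True
print(verify(b"edited video bytes", record))    # False
```

A check like this shows only that a file is unchanged since it was recorded; detecting synthetic content with no record at all is the job of detection models.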

Part 9 — Applications, Ethics & Future#131

Multi-modal generative models combine:

Only text and rules

Text, images, audio and other modalities

Hardware sensors only

Only GANs

Multi-modal generative models process and generate across multiple data types — text, images, audio — enabling richer cross-modal understanding and generation.

Part 9 — Applications, Ethics & Future#132

Job displacement risk suggests:

Workforce transitions and reskilling policies are important

No changes are needed

Immediate mass layoffs

All jobs vanish overnight

The correct answer is: Workforce transitions and reskilling policies are important.

Part 9 — Applications, Ethics & Future#133

Which direction is important for future generative AI?

More controllable, reliable and multimodal systems

Less interpretability

Larger unregulated releases with no oversight

No alignment efforts

The correct answer is: More controllable, reliable and multimodal systems.

Part 9 — Applications, Ethics & Future#134

Intellectual property questions involve:

Who owns AI-generated content and training data usage

Only model architectures

Only optimizer choices

No legal concerns

The correct answer is: Who owns AI-generated content and training data usage.

Part 9 — Applications, Ethics & Future#135

When deploying a generative model for production, you should:

Skip monitoring

Implement monitoring, rate-limits, and human-in-the-loop

Train once and forget

Never evaluate outputs

The correct answer is: Implement monitoring, rate-limits, and human-in-the-loop.
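
A minimal sketch of how these three safeguards might fit around a model call. Everything here is hypothetical: `generate` and `needs_review` are placeholder callables supplied by the caller, the log is a plain list, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds interval."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.timestamps = []

    def allow(self, now):
        cutoff = now - self.window_seconds
        self.timestamps = [t for t in self.timestamps if t > cutoff]
        if len(self.timestamps) >= self.max_requests:
            return False
        self.timestamps.append(now)
        return True


def handle_request(prompt, now, limiter, generate, needs_review, log):
    """Rate-limit, generate, route risky output to a human, and log each event."""
    if not limiter.allow(now):
        log.append(("rate_limited", prompt))
        return None
    output = generate(prompt)
    if needs_review(output):
        log.append(("held_for_review", prompt))  # human-in-the-loop queue
        return None
    log.append(("served", prompt))
    return output


log = []
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
fake_model = lambda p: p.upper()          # stand-in for a generative model
flagged = lambda text: "FLAG" in text     # stand-in for a safety classifier
for t, prompt in [(0, "hello"), (1, "flag this"), (2, "again"), (100, "later")]:
    handle_request(prompt, t, limiter, fake_model, flagged, log)
print(log)
```

The log doubles as a simple monitoring signal: a rising share of held or rate-limited requests suggests something worth investigating.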

Key Topics to Study

Based on our question bank analysis, master these concepts to score high in Generative AI.

Generative, GANs, VAEs, Transformers, Attention, RNN, LSTM, Training