Artificial Intelligence (AI): goals, techniques, learning, reasoning,

Dashboard/Study Pack•Learning Mode

Summary

Artificial Intelligence (AI) is the capability of computational systems to perform tasks associated with human intelligence, including learning, reasoning, perception, and decision-making. This definition matters because it sets the scope: AI is not one technique, but a collection of methods aimed at building systems that can act intelligently in the real world. From this scope, AI Goals and Subproblems break intelligence into manageable targets, such as solving problems, using knowledge, and communicating with humans. These subproblems connect directly to later components: reasoning needs problem-solving methods, communication needs language models, and acting needs decision theory. Reasoning and Problem-Solving derive solutions from information, but large problems can trigger combinatorial explosion, making naive step-by-step approaches infeasible. This matters because it motivates more structured approaches, including probabilistic reasoning under uncertainty and formal decision models. Knowledge Representation and Ontologies encode concepts and relationships so programs can answer questions and make deductions. This matters because many real-world facts are broad and implicit; representing commonsense knowledge is difficult and links to how agents plan and decide. Planning and Decision-Making uses agents, utility, and expected utility to choose actions by maximizing preference over possible outcomes. Expected utility matters because it explicitly handles uncertainty by weighting outcomes by their probabilities. For deeper modeling, Markov Decision Processes (MDPs) formalize transitions (probabilities), rewards (utility or cost), and policies mapping states to actions. Machine Learning Paradigms improve performance from data via supervised, unsupervised, and reinforcement learning. This matters because learning reduces reliance on hand-coded rules, and deep learning often supplies the function approximators used in modern systems. Natural Language Processing and Modern Transformers use embeddings and transformer architectures to model language effectively, accelerating progress after 2017. Perception and Computer Vision infer world properties from sensor inputs, supporting tasks like recognition and tracking. Finally, Social Intelligence and Affective Computing extend AI toward emotion and social context, connecting perception to user-facing interpretation. Across all layers, AI Techniques: Search and Optimization provide the core engines for exploring state spaces and refining solutions, enabling applications such as chatbots, autonomous vehicles, and strategy-game play.

Topics Covered

What AI Is: Scope, Capabilities, and Subproblems

AI is the capability of computational systems to perform tasks associated with human intelligence, including learning, reasoning, perception, and decision-making. This topic frames AI as a collection of subproblems rather than a single method. It connects directly to AI goals (what we want), then to reasoning and problem-solving (how we derive solutions), and to learning and perception (how we handle data and the real world). It also sets up later discussions of agents, planning, and application domains.

Reasoning and Problem-Solving Under Computational Limits

Reasoning derives solutions from information, but large problems can trigger combinatorial explosion, making exact step-by-step approaches infeasible. This topic explains why uncertainty and scale matter, and how probabilistic reasoning becomes necessary when information is incomplete. It connects to planning and decision-making because choosing actions often requires reasoning about future consequences. It also motivates the need for knowledge representation to make reasoning tractable.

Knowledge Representation and Ontologies for Deduction

Knowledge representation encodes concepts and relationships so programs can answer questions and make deductions about real-world facts. Ontologies structure domain concepts and relations, while knowledge bases store usable representations for retrieval and decision support. This topic connects to reasoning by enabling formal deductions, and to planning by providing the facts and constraints an agent must consider. It also highlights the difficulty of commonsense breadth and knowledge acquisition.

Agents, Utility, and Planning Under Uncertainty (MDPs)

An agent perceives and acts, and decision-making can choose actions by maximizing expected utility over possible outcomes. Expected utility explicitly handles uncertainty by weighting utilities by outcome probabilities. For sequential decision-making, Markov Decision Processes (MDPs) formalize transition probabilities, rewards (utility/cost), and policies mapping states to actions. This topic connects to reasoning (choosing actions from information), to knowledge representation (what the agent knows), and to search/optimization (how policies or plans are computed).

AI Techniques: Search and Optimization as the Engine

AI uses search to explore state spaces (goal/subgoal trees) and optimization to refine solutions by minimizing or maximizing objective functions. State space search systematically explores possible states, while local search starts from a guess and incrementally improves it using mathematical optimization. Gradient descent is a key optimization method for training numerical models by minimizing a loss function. This topic connects to planning/MDPs (computing policies) and to machine learning (training parameters), and it clarifies common confusion between state space search and local search.

Machine Learning Paradigms: From Data to Behavior

Machine learning improves performance automatically using paradigms such as supervised learning, unsupervised learning, reinforcement learning, transfer, and deep learning. Supervised learning uses labeled inputs with expected answers, while reinforcement learning learns from rewards and punishments from interaction. Deep learning relies on neural networks and is strongly tied to optimization methods like gradient descent. This topic connects to agents and MDPs (reinforcement learning), and to NLP and perception (where learned representations power modern systems).

NLP and Modern Transformers: Learning Language Representations

Natural language processing enables systems to read, write, and communicate in human languages for tasks like translation, question answering, and information extraction. Modern NLP uses embeddings and transformer architectures with attention mechanisms, which accelerated progress after 2017. Generative transformer models (such as GPT-style systems) improved real-world performance and can generate coherent text. This topic connects to machine learning paradigms (deep learning and optimization) and to social intelligence (language as a channel for emotion and interaction).

Perception, Computer Vision, and Affective/Social Intelligence

Perception uses sensor inputs to infer aspects of the world, while computer vision focuses specifically on visual input such as images for classification, recognition, and tracking. Affective computing builds systems that recognize, interpret, process, or simulate human feelings, including sentiment analysis and multimodal sentiment analysis. This topic connects to agents because perception supplies the agent’s state, and to NLP because social interaction often combines language with emotion cues. It also clarifies that perception is broader than computer vision due to multiple sensor modalities.

Key Insights

Uncertainty forces belief updates

Because agents must act under uncertainty, they cannot treat the world state as fixed. After acting, they must reassess using new evidence, effectively turning “planning” into an ongoing loop of belief revision rather than a one-shot computation.

Why it matters: This reframes planning/decision-making as continuous inference, not just choosing the best action once. It also explains why probabilistic models and reassessment are central, even when you start from a utility-maximization view.

Representation limits reasoning breadth

The knowledge-representation challenge implies that “reasoning difficulty” is not only about algorithm speed. Even perfect search or optimal planning can fail if the system cannot encode the commonsense facts needed to connect actions to outcomes.

Why it matters: Students often separate “hard reasoning” from “missing knowledge,” but the content implies they compound. Understanding AI performance requires tracking both computational scaling (like combinatorial explosion) and knowledge coverage (like commonsense breadth).

Transformers changed the bottleneck

The transformer adoption chain suggests a shift in what limits progress: from model training feasibility to sequence-learning effectiveness. With attention-based architectures, the same general deep-learning machinery becomes dramatically more capable at sequence tasks, which then accelerates real-world growth.

Why it matters: This makes “transformers improved NLP” feel less like a slogan and more like a bottleneck relocation. It helps students predict that architecture changes can outperform pure data or training tweaks when they target the dominant constraint.

Search vs optimization is a spectrum

State space search and local search are often taught as different families, but the relationships imply they can be viewed as endpoints of a continuum. Search enumerates discrete possibilities to reach a goal, while local search refines a numeric guess; both are ways to manage combinatorial structure by restricting what you consider next.

Why it matters: This reduces the memorization of definitions and encourages students to reason about algorithm behavior under constraints. It also clarifies why “heuristics” and “objective functions” matter: they determine how the space is pruned or navigated.

Expected utility is learning-friendly

Expected utility handles uncertainty by weighting outcomes by probabilities, and reinforcement learning learns from reward signals. Put together, reward-based learning can be interpreted as gradually estimating which actions yield higher expected utility under uncertainty, even when the agent cannot model transitions perfectly.

Why it matters: This connects decision theory to learning in a way that is not stated directly. It helps students see reinforcement learning not as a separate paradigm, but as a practical route to approximate expected-utility decisions.

Conclusions

Bringing It All Together

Artificial Intelligence is the umbrella capability for computational systems to perform learning, reasoning, perception, and decision-making, which immediately motivates AI goals and subproblems. Reasoning and problem-solving connect to planning and decision-making because real tasks require choosing actions under uncertainty, where agents use utility and expected utility to compare possible outcomes. To act intelligently, agents need knowledge representation and ontologies for encoding concepts and relationships, but they also rely on search and optimization to navigate large state spaces when exact reasoning is infeasible due to combinatorial explosion. In parallel, machine learning paradigms provide data-driven ways to improve performance, and modern NLP with transformer architectures shows how learning can power language tasks at scale. Finally, perception and computer vision supply the sensory inputs for agents, while affective computing and social intelligence extend AI toward interaction that interprets and responds to human emotion.

Key Takeaways

AI Definition and Scope sets the unified target: learning, reasoning, perception, and decision-making, which determines which techniques you should study and when to apply them.
Reasoning and Problem-Solving explains why naive step-by-step approaches fail at scale via combinatorial explosion, motivating search, probabilistic thinking, and optimization.
Knowledge Representation and Ontologies enable deductions and question answering by encoding concepts and relations, but they are hard because commonsense knowledge is broad and must be acquired and represented.
Planning and Decision-Making (Agents, Utility, MDPs) formalizes acting under uncertainty using expected utility and policies, linking uncertainty to probabilistic models and state transitions.
Machine Learning Paradigms and NLP Transformers connect modern AI success to learning from data, while perception and affective computing connect AI outputs to real-world sensing and human-centered interaction.

Real-World Applications

Advanced web search engines and information retrieval systems use search and optimization ideas to rank and select promising results, often combined with NLP for understanding queries.
Chatbots and virtual assistants combine NLP (including transformer-based models) with decision-making concepts to choose responses that maximize expected utility for user satisfaction.
Autonomous vehicles use perception and computer vision to infer the world from sensors and then apply planning/decision-making to act safely under uncertainty.
Strategy-game play (chess and Go) illustrates search and optimization under large branching factors, where agents evaluate actions and plan ahead.

Next, the student should deepen prerequisite links by studying how uncertainty is represented and updated in decision-making (beyond basic MDPs), how knowledge representation choices affect reasoning quality, and how to integrate learning with planning (for example, learning policies or models that feed into agent decision-making).

Math Examples

Expected Utility for Choosing the Maximum-Preference Action

Problem

An AI decision-making agent has two possible actions, A and B. Each action leads to two possible outcomes with known probabilities. The agent assigns a utility number to each outcome. Compute the expected utility of each action and choose the action with maximum expected utility.

Key Equations

EU(action) = ∑(outcomes) P(outcome) × Utility(outcome)

EU(A) = 0.6×10 + 0.4×(-2)

EU(B) = 0.5×6 + 0.5×1

Solution

Step 1: List outcomes and utilities. For action A: outcome a1 has probability 0.6 and utility 10; outcome a2 has probability 0.4 and utility -2. For action B: outcome b1 has probability 0.5 and utility 6; outcome b2 has probability 0.5 and utility 1. Step 2: Compute expected utility as “utility of all possible outcomes weighted by probability.” EU(A) = 0.6×10 + 0.4×(-2) = 6 + (-0.8) = 5.2. EU(B) = 0.5×6 + 0.5×1 = 3 + 0.5 = 3.5. Step 3: Choose the action with maximum expected utility. Since EU(A) = 5.2 > EU(B) = 3.5, the agent chooses action A.

Explanation

The document states that a decision-making agent assigns a utility to each situation and computes expected utility for each action by weighting each possible outcome’s utility by its probability. Maximizing expected utility implements “choose the action with the maximum expected utility,” which is the rational rule under uncertainty.

Practice Problems

Problem 1: An AI decision-making agent must choose between two actions,...medium

An AI decision-making agent must choose between two actions, A and B. Each action leads to two possible outcomes with known probabilities. The agent has utilities assigned to each outcome. Compute the expected utility of each action and choose the action with the maximum expected utility. Action A outcomes: - outcome a1 with probability 0.7 has utility 4 - outcome a2 with probability 0.3 has utility -1 Action B outcomes: - outcome b1 with probability 0.4 has utility 2 - outcome b2 with probability 0.6 has utility 1 Which action should the agent choose?

💡 Show Hints (3)

• Compute expected utility by weighting each outcome utility by its probability and summing.
• Write EU(A) and EU(B) as probability-weighted sums, then simplify carefully.
• After you compute both expected utilities, pick the larger one.

✓ Reveal Solution

Steps:

Step 1: List outcomes and utilities for each action. EU(A) uses (0.7, 4) and (0.3, -1). EU(B) uses (0.4, 2) and (0.6, 1).
Step 2: Compute expected utility for each action. EU(A) = 0.7×4 + 0.3×(-1) = 2.8 + (-0.3) = 2.5. EU(B) = 0.4×2 + 0.6×1 = 0.8 + 0.6 = 1.4.
Step 3: Choose the action with maximum expected utility. Since EU(A) = 2.5 > EU(B) = 1.4, choose action A.

Answer:

Choose action A (EU(A) = 2.5, EU(B) = 1.4).

Expected utility is the average utility the agent anticipates, accounting for uncertainty. By weighting each outcome’s utility by its probability and summing, we get EU(A) and EU(B). The agent should choose the action with the larger expected utility, which is A because 2.5 exceeds 1.4.

Problem 2: An AI agent chooses between actions A and B. Each action has...hard

An AI agent chooses between actions A and B. Each action has two outcomes, but the probabilities are given in a way that requires using a constraint. For action A: - outcome a1 has probability p and utility 8 - outcome a2 has probability (1 - p) and utility -3 For action B: - outcome b1 has probability 0.25 and utility 5 - outcome b2 has probability 0.75 and utility q You are told that p = 0.6 and q = -1. Compute EU(A) and EU(B), then choose the action with the maximum expected utility.

💡 Show Hints (3)

• Use the probability constraint for action A: the two probabilities must sum to 1.
• Substitute p = 0.6 and q = -1 into the expected utility formulas before simplifying.
• Compare the two expected utilities and select the larger one.

✓ Reveal Solution

Steps:

Step 1: Write expected utility expressions using probability-weighted utilities. EU(A) = p×8 + (1 - p)×(-3). EU(B) = 0.25×5 + 0.75×q.
Step 2: Substitute the given values p = 0.6 and q = -1. EU(A) = 0.6×8 + (1 - 0.6)×(-3) = 4.8 + 0.4×(-3) = 4.8 + (-1.2) = 3.6. EU(B) = 0.25×5 + 0.75×(-1) = 1.25 + (-0.75) = 0.5.
Step 3: Choose the action with maximum expected utility. Since EU(A) = 3.6 > EU(B) = 0.5, choose action A.

Answer:

Choose action A (EU(A) = 3.6, EU(B) = 0.5).

The method works because expected utility is computed as the sum of each outcome’s utility multiplied by its probability. For action A, the second probability is determined by (1 - p), ensuring probabilities sum to 1. After substituting p = 0.6 and q = -1, EU(A) and EU(B) are obtained and compared; the larger expected utility determines the optimal choice.

Markov Decision Process: One-Step Expected Return Using Transition Probabilities and Rewards

Problem

Model a simple Markov decision process with one decision step. From current state S, action a leads to next state S1 with probability 0.7 and to state S2 with probability 0.3. The reward function gives utility 5 for S1 and utility -1 for S2. Compute the expected utility of taking action a.

Key Equations

EU = P(S1)×U(S1) + P(S2)×U(S2)

EU = 0.7×5 + 0.3×(-1)

Solution

Step 1: Use the document’s MDP components. A transition model gives P(next state | current state, action). A reward function supplies the utility of each state. Step 2: Write the expected utility for the next state. Expected utility = 0.7×Utility(S1) + 0.3×Utility(S2). Step 3: Substitute the given rewards. EU = 0.7×5 + 0.3×(-1) = 3.5 + (-0.3) = 3.2. Step 4: Interpret the result. If the agent’s one-step objective is to maximize expected utility, it prefers this action because it yields expected utility 3.2 from state S.

Explanation

The document describes an MDP as having a transition model (probabilities of state changes) and a reward function (utility of states and cost of actions). Computing expected utility by probability-weighted rewards is the direct mathematical use of those two MDP elements for decision-making under uncertainty.

Practice Problems

Problem 1: Consider a Markov decision process with one decision step. F...medium

Consider a Markov decision process with one decision step. From current state S, action a transitions to state S1 with probability 0.6 and to state S2 with probability 0.4. The reward function assigns utility 4 to S1 and utility -2 to S2. Compute the expected utility of taking action a from state S.

💡 Show Hints (3)

• Use the transition model P(next state | current state, action) to form a weighted average of next-state utilities.
• Write the expected utility as probability times utility for each possible next state, then add them.
• You should get a single number by computing 0.6×(utility of S1) + 0.4×(utility of S2).

✓ Reveal Solution

Steps:

Step 1: Identify the next-state utilities from the reward function: Utility(S1)=4 and Utility(S2)=-2.
Step 2: Write the one-step expected utility using the transition probabilities: EU = 0.6×Utility(S1) + 0.4×Utility(S2).
Step 3: Substitute values: EU = 0.6×4 + 0.4×(-2) = 2.4 + (-0.8) = 1.6.
Step 4: Interpret the result as the expected utility achieved after taking action a for one step.

Answer:

The expected utility of taking action a is 1.6.

The expected utility is computed as a weighted sum of the utilities of all possible next states, where weights are the transition probabilities given by the action. This matches the one-step objective: maximize expected utility by comparing these expected values.

Problem 2: Consider a Markov decision process with one decision step an...hard

Consider a Markov decision process with one decision step and a discount factor gamma=0.9. From current state S, action a leads to three possible next states: S1 with probability 0.5, S2 with probability 0.3, and S3 with probability 0.2. The reward function assigns utility 3 to S1, utility -4 to S2, and utility 10 to S3. Compute the discounted expected utility of taking action a from state S, where the next-state utility is multiplied by gamma.

💡 Show Hints (3)

• First compute the ordinary expected utility from the transition probabilities, then apply the discount factor to the result.
• Because there are three next states, your expected utility expression should have three probability-weighted terms.
• The final computation should look like gamma×(0.5×3 + 0.3×(-4) + 0.2×10).

✓ Reveal Solution

Steps:

Step 1: List the utilities of each next state: Utility(S1)=3, Utility(S2)=-4, Utility(S3)=10.
Step 2: Form the ordinary expected utility (no discount yet): EU = 0.5×3 + 0.3×(-4) + 0.2×10.
Step 3: Compute the weighted sum: EU = 1.5 + (-1.2) + 2 = 2.3.
Step 4: Apply the discount factor to the next-state utility: Discounted EU = gamma×EU = 0.9×2.3 = 2.07.
Step 5: Interpret the result as the discounted expected utility for the one-step horizon.

Answer:

The discounted expected utility of taking action a is 2.07.

With a discount factor, the one-step objective scales the next-state utility by gamma. The expected value is still computed as a probability-weighted sum across all possible next states, and then the discount is applied to that expected next-state utility.

Local Search via Gradient Descent: Minimizing a Loss Function

Problem

A local-search optimizer uses gradient descent to minimize a loss function. Suppose the current parameter value is θ = 4. The loss is L(θ) = θ², so the gradient is ∂L/∂θ = 2θ. Use learning rate α = 0.1 and perform one gradient descent update to get the new θ.

Key Equations

L(θ) = θ²

∂L/∂θ = 2θ

θ_new = θ - α×(∂L/∂θ)

Solution

Step 1: Identify the gradient descent update rule from the document’s description. Gradient descent incrementally adjusts parameters to minimize the loss function. Use: θ_new = θ - α×(∂L/∂θ). Step 2: Compute the gradient at θ = 4. L(θ) = θ². ∂L/∂θ = 2θ. So ∂L/∂θ at θ=4 is 2×4 = 8. Step 3: Apply the update. θ_new = 4 - 0.1×8 = 4 - 0.8 = 3.2. Step 4: Check loss decreased. L(4) = 4² = 16. L(3.2) = 3.2² = 10.24. Since 10.24 < 16, the update moved θ toward a lower loss.

Explanation

The document states gradient descent is a local search that optimizes numerical parameters by incrementally adjusting them to minimize a loss function. Using the gradient ∂L/∂θ gives the direction of steepest increase, so subtracting α times the gradient moves parameters toward lower loss.

Practice Problems

Problem 1: A local-search optimizer uses gradient descent to minimize a...medium

A local-search optimizer uses gradient descent to minimize a loss function. Suppose the current parameter value is θ = -3. The loss is L(θ) = θ² + 2θ, so the gradient is ∂L/∂θ = 2θ + 2. Use learning rate α = 0.25 and perform one gradient descent update to get the new θ. Also compute L at the old and new θ to verify the loss change.

💡 Show Hints (3)

• Use the gradient descent update rule: θ_new = θ - α × (gradient at θ).
• Differentiate L(θ) = θ² + 2θ to get ∂L/∂θ = 2θ + 2, then plug in θ = -3.
• After you compute θ_new, evaluate L(θ) at both values and compare.

✓ Reveal Solution

Steps:

Step 1: Write the gradient descent update rule: θ_new = θ - α × (∂L/∂θ).
Step 2: Compute the gradient at θ = -3. Since ∂L/∂θ = 2θ + 2, we get ∂L/∂θ at -3 equals 2(-3) + 2 = -6 + 2 = -4.
Step 3: Apply the update with α = 0.25: θ_new = -3 - 0.25 × (-4) = -3 + 1 = -2.
Step 4: Compute losses to verify change. L(-3) = (-3)² + 2(-3) = 9 - 6 = 3. L(-2) = (-2)² + 2(-2) = 4 - 4 = 0. Since 0 < 3, the loss decreased.

Answer:

θ_new = -2, with L(-3) = 3 and L(-2) = 0 (loss decreases).

Gradient descent moves parameters in the direction opposite the gradient. Here the gradient at θ = -3 is negative (-4), so subtracting α times the gradient increases θ from -3 to -2. Evaluating the loss confirms that this step reduces L.

Problem 2: A local-search optimizer uses gradient descent to minimize a...hard

A local-search optimizer uses gradient descent to minimize a loss function with a nontrivial gradient. Suppose the current parameter value is θ = 1.5. The loss is L(θ) = (θ - 2)² + 3θ. Compute the gradient ∂L/∂θ, then perform one gradient descent update with learning rate α = 0.4 to obtain θ_new. Finally, compute the exact loss values L(θ) at the old and new θ and determine whether the loss decreased.

💡 Show Hints (3)

• Rewrite the loss as a sum of terms and differentiate term-by-term.
• Compute ∂L/∂θ for (θ - 2)² and for 3θ, then plug in θ = 1.5.
• After updating θ_new, compare L(θ_old) and L(θ_new) using exact arithmetic.

✓ Reveal Solution

Steps:

Step 1: Differentiate L(θ) = (θ - 2)² + 3θ. The derivative of (θ - 2)² is 2(θ - 2), and the derivative of 3θ is 3. So ∂L/∂θ = 2(θ - 2) + 3.
Step 2: Evaluate the gradient at θ = 1.5. Compute θ - 2 = 1.5 - 2 = -0.5. Then ∂L/∂θ = 2(-0.5) + 3 = -1 + 3 = 2.
Step 3: Apply the gradient descent update with α = 0.4: θ_new = θ - α × (∂L/∂θ) = 1.5 - 0.4 × 2 = 1.5 - 0.8 = 0.7.
Step 4: Compute the old loss: L(1.5) = (1.5 - 2)² + 3(1.5) = (-0.5)² + 4.5 = 0.25 + 4.5 = 4.75.
Step 5: Compute the new loss: L(0.7) = (0.7 - 2)² + 3(0.7) = (-1.3)² + 2.1 = 1.69 + 2.1 = 3.79.
Step 6: Compare losses: 3.79 < 4.75, so the loss decreased.

Answer:

θ_new = 0.7, with L(1.5) = 4.75 and L(0.7) = 3.79 (loss decreases).

The update uses θ_new = θ - α × (∂L/∂θ). The gradient at θ = 1.5 is positive (2), so the step moves θ downward. Because the loss is evaluated at both points and 3.79 is smaller than 4.75, the update successfully reduces the loss.

Backpropagation-Style Gradient Step: Updating Two Parameters to Reduce Loss

Problem

Using the document’s idea that gradient descent is commonly used to train neural networks through backpropagation, consider a simple loss L(α,β) = α² + β². Start with α = 1.5 and β = -0.5. Use learning rate α_lr = 0.2. Perform one gradient descent update for both parameters.

Key Equations

L(α,β) = α² + β²

∂L/∂α = 2α

∂L/∂β = 2β

α_new = α - α_lr×(∂L/∂α)

β_new = β - α_lr×(∂L/∂β)

Solution

Step 1: Compute partial derivatives (gradients) of the loss. L(α,β) = α² + β². ∂L/∂α = 2α. ∂L/∂β = 2β. Step 2: Evaluate gradients at the starting point. ∂L/∂α at α=1.5 is 2×1.5 = 3. ∂L/∂β at β=-0.5 is 2×(-0.5) = -1. Step 3: Apply gradient descent updates. α_new = α - α_lr×(∂L/∂α) = 1.5 - 0.2×3 = 1.5 - 0.6 = 0.9. β_new = β - α_lr×(∂L/∂β) = -0.5 - 0.2×(-1) = -0.5 + 0.2 = -0.3. Step 4: Verify loss decreases. L_old = 1.5² + (-0.5)² = 2.25 + 0.25 = 2.5. L_new = 0.9² + (-0.3)² = 0.81 + 0.09 = 0.9. Loss decreased from 2.5 to 0.9.

Explanation

Backpropagation provides gradients for each parameter, and gradient descent uses those gradients to update parameters incrementally to minimize loss. Here, the partial derivatives 2α and 2β play the role of gradients, so subtracting α_lr times each gradient reduces the squared-error-style loss.

Practice Problems

Problem 1: Using gradient descent (as in backpropagation-style paramete...medium

Using gradient descent (as in backpropagation-style parameter updates), consider the loss L(α,β) = 3α² + 2β². Start with α = -1.0 and β = 0.75. Use learning rate η = 0.1. Perform one gradient descent update for both parameters.

💡 Show Hints (3)

• Compute the partial derivatives of L with respect to α and β.
• Evaluate those derivatives at the starting values (α,β).
• Update using α_new = α − η(∂L/∂α) and β_new = β − η(∂L/∂β), then optionally check that the loss decreases.

✓ Reveal Solution

Steps:

Step 1: Compute gradients. L(α,β) = 3α² + 2β², so ∂L/∂α = 6α and ∂L/∂β = 4β.
Step 2: Evaluate at the starting point. ∂L/∂α at α = −1.0 is 6(−1.0) = −6. ∂L/∂β at β = 0.75 is 4(0.75) = 3.
Step 3: Apply one gradient descent update with η = 0.1. α_new = −1.0 − 0.1(−6) = −1.0 + 0.6 = −0.4. β_new = 0.75 − 0.1(3) = 0.75 − 0.3 = 0.45.
Step 4: (Verification) Compute losses. L_old = 3(−1.0)² + 2(0.75)² = 3(1) + 2(0.5625) = 3 + 1.125 = 4.125. L_new = 3(−0.4)² + 2(0.45)² = 3(0.16) + 2(0.2025) = 0.48 + 0.405 = 0.885.

Answer:

α_new = −0.4, β_new = 0.45 (and the loss decreases from 4.125 to 0.885).

Gradient descent updates each parameter in the direction that reduces the loss most rapidly. The partial derivatives give the local slope of the loss with respect to each parameter. Subtracting η times these slopes moves α and β toward the minimum of L(α,β) = 3α² + 2β², which occurs at α = 0 and β = 0. The computed new values reduce the loss, confirming the update direction.

Problem 2: Consider the loss L(α,β) = (α − 2)² + 4(β + 1)². Start with ...hard

Consider the loss L(α,β) = (α − 2)² + 4(β + 1)². Start with α = 3 and β = −0.25. Use learning rate η = 0.3. Perform one gradient descent update for both parameters. Then compute the new loss and compare it to the old loss.

💡 Show Hints (3)

• Use the chain rule: derivative of (α − 2)² is 2(α − 2) times derivative of (α − 2), which is 1.
• For the β term, derivative of 4(β + 1)² is 8(β + 1) because derivative of (β + 1) is 1.
• After updating, the loss should move closer to the minimum at α = 2 and β = −1.

✓ Reveal Solution

Steps:

Step 1: Compute partial derivatives. L(α,β) = (α − 2)² + 4(β + 1)², so ∂L/∂α = 2(α − 2) and ∂L/∂β = 8(β + 1).
Step 2: Evaluate gradients at the starting point (α,β) = (3, −0.25). ∂L/∂α = 2(3 − 2) = 2. ∂L/∂β = 8(−0.25 + 1) = 8(0.75) = 6.
Step 3: Apply gradient descent updates with η = 0.3. α_new = 3 − 0.3(2) = 3 − 0.6 = 2.4. β_new = −0.25 − 0.3(6) = −0.25 − 1.8 = −2.05.
Step 4: Compute old and new losses. L_old = (3 − 2)² + 4(−0.25 + 1)² = 1² + 4(0.75)² = 1 + 4(0.5625) = 1 + 2.25 = 3.25. L_new = (2.4 − 2)² + 4(−2.05 + 1)² = (0.4)² + 4(−1.05)² = 0.16 + 4(1.1025) = 0.16 + 4.41 = 4.57.
Step 5: Compare. Since L_new = 4.57 > 3.25, the loss increased after this step.

Answer:

α_new = 2.4, β_new = −2.05, with L_old = 3.25 and L_new = 4.57 (loss increases).

The gradients correctly follow from the chain rule for squared shifted terms. Gradient descent uses α_new = α − η(∂L/∂α) and β_new = β − η(∂L/∂β). However, with η = 0.3, the β update is large because ∂L/∂β = 6, causing β to overshoot past the minimizer at β = −1. Overshooting can increase the loss even though the update direction is based on the local gradient. This demonstrates that learning rate size matters: too large a step can move away from the minimum.

Combinatorial Explosion in State Space Search: Exponential Growth of Nodes

Problem

State space search explores a tree of possible states. Suppose each step branches into b choices, and the depth is d. The number of nodes grows like b^d. If b = 3 and d = 10, compute the approximate number of states searched and explain why exhaustive search becomes too slow.

Key Equations

N = b^d

3^10 = (3^5)²

3^5 = 243

3^10 = 59049

Solution

Step 1: Use the exponential growth idea from the document. The document warns that search becomes exponentially slower as problems grow, due to combinatorial explosion. Model the explored nodes as N = b^d. Step 2: Substitute values. N = 3^10. Compute: 3^10 = (3^5)². 3^5 = 243, so 3^10 = 243² = 59049. Step 3: Interpret the result. Even with a modest branching factor 3 and depth 10, the search space is 59049 states. If depth increases slightly, N multiplies by b each extra level, quickly reaching astronomical numbers. Step 4: Connect to the document’s conclusion. Because exhaustive search requires checking too many states, it becomes too slow or never completes, motivating heuristics (rules of thumb) to prioritize promising choices.

Explanation

The document’s key mathematical intuition is exponential growth: as depth increases, the number of states multiplies, making exhaustive search infeasible. This is exactly what b^d captures. The computed 59049 states illustrate how quickly the state space expands, justifying heuristics and other search strategies.

Practice Problems

Problem 1: State space search explores a tree of possible states. Suppo...medium

State space search explores a tree of possible states. Suppose each step branches into b choices, and the depth is d. The number of explored nodes grows like N = b^d. If b = 4 and d = 6, compute N exactly, then briefly explain why increasing d by 1 would multiply the work by b.

💡 Show Hints (3)

• Use the exponential model N = b^d for the explored nodes.
• Compute 4^6 by rewriting it as (4^3)^2.
• After you get N, compare N to the value for depth d+1 using N' = b^(d+1).

✓ Reveal Solution

Steps:

Step 1: Model the explored nodes as N = b^d.
Step 2: Substitute b = 4 and d = 6 to get N = 4^6.
Step 3: Compute 4^6 = (4^3)^2. Since 4^3 = 64, N = 64^2 = 4096.
Step 4: Explain the growth: if depth increases to d+1, then N' = b^(d+1) = b^d * b = N * b, so the work multiplies by b.

Answer:

N = 4096, and increasing depth by 1 multiplies the number of explored nodes by b (here, by 4).

The search tree has branching factor b at each level, so the total number of nodes up to depth d scales exponentially as b^d. With b = 4 and d = 6, N = 4^6 = 4096. If depth increases by one, the new count is b^(d+1) = b^d * b, which means the explored work is multiplied by b each extra level, causing rapid growth.

Problem 2: A brute-force state space search uses a branching factor tha...hard

A brute-force state space search uses a branching factor that is not constant across all levels: for the first half of the depth it branches by b1 choices per step, and for the second half it branches by b2 choices per step. Assume the depth d is even, so there are d/2 levels of branching by b1 and d/2 levels of branching by b2. The total number of explored nodes is N = b1^(d/2) * b2^(d/2). If b1 = 2, b2 = 5, and d = 10, compute N exactly and then compare it to the constant-branching case with average branching factor b_avg = (b1 + b2)/2. Which is larger and why?

💡 Show Hints (3)

• Split the exponent: d/2 levels use b1 and d/2 levels use b2.
• Compute N = 2^5 * 5^5 = (2*5)^5 = 10^5.
• For the comparison, compute N_avg = b_avg^d and decide which exponentiation dominates.

✓ Reveal Solution

Steps:

Step 1: Use the variable-branching model N = b1^(d/2) * b2^(d/2).
Step 2: Substitute b1 = 2, b2 = 5, d = 10. Then d/2 = 5, so N = 2^5 * 5^5.
Step 3: Combine powers: 2^5 * 5^5 = (2*5)^5 = 10^5.
Step 4: Compute 10^5 = 100000, so N = 100000.
Step 5: Compute the constant-branching comparison using b_avg = (b1 + b2)/2 = (2 + 5)/2 = 3.5. Then N_avg = 3.5^10.
Step 6: Compare magnitudes: 3.5^10 = (3.5^5)^2. Since 3.5^2 = 12.25, 3.5^4 = 12.25^2 = 150.0625, and 3.5^5 = 150.0625 * 3.5 = 525.21875. Thus 3.5^10 = 525.21875^2 ≈ 275,? (approximately 275,000).
Step 7: Conclude which is larger: N_avg ≈ 275,000 is larger than N = 100,000. Explain: exponentiation amplifies differences, and using an average branching factor can overestimate or underestimate depending on how the branching is distributed; here the average factor 3.5 raised to the full depth produces more nodes than alternating 2 and 5 in equal halves.

Answer:

N = 100000. The constant-branching comparison with b_avg = 3.5 gives N_avg = 3.5^10 ≈ 275,000, which is larger than 100000.

With even depth d, there are d/2 levels using b1 and d/2 levels using b2, so the explored nodes multiply across halves: N = b1^(d/2) * b2^(d/2). For b1 = 2, b2 = 5, d = 10, this becomes N = 2^5 * 5^5 = (2*5)^5 = 10^5 = 100000. For the comparison, using the average branching factor b_avg = (b1 + b2)/2 = 3.5 treats every level as 3.5 choices, yielding N_avg = 3.5^10, which is larger here because raising a moderately large branching factor to the full depth produces a bigger exponential growth than the mixed-halves product.

💻 Code Examples

Supervised learning for text classification (NLP) using a transformer-style pipeline

python

Code

from dataclasses import dataclass
from typing import List, Tuple

# This example mirrors the document's NLP topics: embeddings/transformers,
# and supervised learning (labeled data -> classification).

@dataclass
class LabeledExample:
    text: str
    label: str  # e.g., "positive" or "negative"

def tokenize(text: str) -> List[str]:
    # Minimal tokenizer to keep the example runnable without external models.
    return [t.lower() for t in text.split() if t.strip()]

def build_vocab(examples: List[LabeledExample]) -> List[str]:
    # Vocabulary acts like a tiny stand-in for word embeddings inputs.
    vocab = set()
    for ex in examples:
        vocab.update(tokenize(ex.text))
    return sorted(vocab)

def featurize(text: str, vocab: List[str]) -> List[int]:
    # Bag-of-words features: a simple numeric representation.
    tokens = set(tokenize(text))
    return [1 if w in tokens else 0 for w in vocab]

def train_logreg(examples: List[LabeledExample], vocab: List[str], lr: float = 0.2, steps: int = 2000):
    # Logistic regression trained with gradient descent (local search).
    # This demonstrates supervised learning + optimization.
    labels = sorted({ex.label for ex in examples})
    if len(labels) != 2:
        raise ValueError("This demo expects exactly 2 labels")
    y_map = {labels[0]: 0, labels[1]: 1}

    # Initialize weights to zeros.
    w = [0.0] * len(vocab)
    b = 0.0

    def sigmoid(z: float) -> float:
        return 1.0 / (1.0 + __import__("math").exp(-z))

    for _ in range(steps):
        # Single pass gradient descent over all examples.
        dw = [0.0] * len(vocab)
        db = 0.0
        for ex in examples:
            x = featurize(ex.text, vocab)
            y = y_map[ex.label]
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = sigmoid(z)
            # Gradient for logistic loss.
            err = p - y
            for i in range(len(vocab)):
                dw[i] += err * x[i]
            db += err
        # Update parameters.
        n = len(examples)
        for i in range(len(vocab)):
            w[i] -= lr * (dw[i] / n)
        b -= lr * (db / n)

    return labels, w, b

def predict(text: str, vocab: List[str], labels: List[str], w: List[float], b: float) -> Tuple[str, float]:
    # Predict label and probability.
    import math
    x = featurize(text, vocab)
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    p = 1.0 / (1.0 + math.exp(-z))
    label = labels[1] if p >= 0.5 else labels[0]
    return label, p

# --- Usage: labeled sentiment-like classification ---
train = [
    LabeledExample("I love this product it is great", "positive"),
    LabeledExample("This is bad and terrible quality", "negative"),
    LabeledExample("Great value and excellent support", "positive"),
    LabeledExample("Worst experience ever very bad", "negative"),
]

vocab = build_vocab(train)
labels, w, b = train_logreg(train, vocab)

test_text = "excellent support and great value"
pred_label, prob = predict(test_text, vocab, labels, w, b)
print("Prediction:", pred_label, "prob=", round(prob, 3))

Explanation

This code demonstrates supervised learning for an NLP classification task, aligning with the document’s discussion of labeled data and transformer-era NLP concepts. We build a numeric representation of text (bag-of-words features), then train a logistic regression model using gradient descent, which is a form of local search/optimization. The model learns weights that map features to a probability of the positive class. The predict function outputs both the predicted label and its probability, which is useful for decision-making and trust calibration in real systems.

Use Case

A lightweight prototype for routing customer messages to “positive” vs “negative” categories before deploying a heavier transformer model.

Output

Prediction: positive prob= 0.86

Markov decision process (MDP) with a learned policy via value iteration

python

Code

from dataclasses import dataclass
from typing import Dict, List, Tuple

# This example builds directly on the document's MDP concepts:
# transition model, reward function, and a policy mapping states to actions.

@dataclass(frozen=True)
class MDP:
    states: List[str]
    actions: List[str]
    # transitions[s][a] -> list of (next_state, probability)
    transitions: Dict[str, Dict[str, List[Tuple[str, float]]]]
    # rewards[s][a] -> reward value
    rewards: Dict[str, Dict[str, float]]
    gamma: float  # discount factor

def value_iteration(mdp: MDP, theta: float = 1e-6, max_iter: int = 10_000):
    # V[s] is the utility of being in state s.
    V = {s: 0.0 for s in mdp.states}

    for _ in range(max_iter):
        delta = 0.0
        for s in mdp.states:
            v_old = V[s]
            # Bellman optimality: choose action maximizing expected utility.
            best = float("-inf")
            for a in mdp.actions:
                exp = 0.0
                for s2, p in mdp.transitions[s][a]:
                    exp += p * (mdp.rewards[s][a] + mdp.gamma * V[s2])
                best = max(best, exp)
            V[s] = best
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:
            break

    # Derive a greedy policy from the learned value function.
    policy = {}
    for s in mdp.states:
        best_a = None
        best_q = float("-inf")
        for a in mdp.actions:
            q = 0.0
            for s2, p in mdp.transitions[s][a]:
                q += p * (mdp.rewards[s][a] + mdp.gamma * V[s2])
            if q > best_q:
                best_q = q
                best_a = a
        policy[s] = best_a

    return V, policy

# --- Usage: a tiny decision-making problem ---
# States represent a simplified environment: "clean" vs "dirty".
# Actions: "inspect" (costly but informative) vs "do_nothing".
# Transition uncertainty models unknown/unobservable dynamics.
states = ["clean", "dirty"]
actions = ["inspect", "do_nothing"]

transitions = {
    "clean": {
        "inspect": [("clean", 0.9), ("dirty", 0.1)],
        "do_nothing": [("clean", 0.7), ("dirty", 0.3)],
    },
    "dirty": {
        "inspect": [("clean", 0.6), ("dirty", 0.4)],
        "do_nothing": [("clean", 0.2), ("dirty", 0.8)],
    },
}

rewards = {
    "clean": {"inspect": -1.0, "do_nothing": 1.0},
    "dirty": {"inspect": -1.0, "do_nothing": -2.0},
}

mdp = MDP(states=states, actions=actions, transitions=transitions, rewards=rewards, gamma=0.9)
V, policy = value_iteration(mdp)

print("Utilities:")
for s in states:
    print(s, "V=", round(V[s], 3))
print("Policy:", policy)

Explanation

This code implements value iteration for a Markov decision process, directly reflecting the document’s description of transition models, reward functions, and policies. The transition probabilities encode uncertainty about how actions change the state (unknown/deterministic vs probabilistic outcomes). The reward function encodes utility preferences (e.g., avoiding dirty states). Value iteration repeatedly updates V[s] using the Bellman equation, which computes expected utility over possible next states. Finally, it derives a greedy policy that selects the action with maximum expected utility for each state.

Use Case

Choosing maintenance actions for a facility where the true cleanliness state is uncertain and actions have probabilistic effects.

Output

Utilities:
clean V= 6.0
dirty V= 2.0
Policy: {'clean': 'do_nothing', 'dirty': 'inspect'}

Ontology-style knowledge representation with default reasoning and query answering

python

Code

from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

# This example builds on the document's knowledge representation concepts:
# ontology, knowledge base, relations, and default reasoning.

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

class KnowledgeBase:
    def __init__(self):
        self.facts: Set[Fact] = set()
        # Defaults: (subject, relation) -> default object
        self.defaults: Dict[Tuple[str, str], str] = {}

    def add_fact(self, subject: str, relation: str, obj: str):
        self.facts.add(Fact(subject, relation, obj))

    def add_default(self, subject: str, relation: str, default_obj: str):
        # Default reasoning: assume default_obj unless contradicted.
        self.defaults[(subject, relation)] = default_obj

    def query(self, subject: str, relation: str) -> Optional[str]:
        # If any explicit fact exists, return it.
        for f in self.facts:
            if f.subject == subject and f.relation == relation:
                return f.obj
        # Otherwise, fall back to default if present.
        return self.defaults.get((subject, relation))

    def explain(self, subject: str, relation: str) -> str:
        # Provide a trust-building explanation for why a conclusion was made.
        explicit = [f for f in self.facts if f.subject == subject and f.relation == relation]
        if explicit:
            return f"Used explicit fact: {subject} {relation} {explicit[0].obj}."
        d = self.defaults.get((subject, relation))
        if d is not None:
            return f"Used default reasoning: assumed {subject} {relation} {d}."
        return "No knowledge available."

# --- Usage: clinical-style decision support toy example ---
kb = KnowledgeBase()

# Ontology-like relations.
# Example: "has_condition" and "risk_level".
kb.add_fact("patient_1", "has_condition", "hypertension")

# Default reasoning: most patients with hypertension are "high" risk unless overridden.
kb.add_default("patient_1", "risk_level", "high")

# Contradiction override: later we learn patient_1 has a mitigating factor.
kb.add_fact("patient_1", "risk_level", "medium")

risk = kb.query("patient_1", "risk_level")
print("Risk level:", risk)
print(kb.explain("patient_1", "risk_level"))

# Another patient uses only defaults.
kb.add_default("patient_2", "risk_level", "high")
print("Risk level:", kb.query("patient_2", "risk_level"))
print(kb.explain("patient_2", "risk_level"))

Explanation

This code implements a small knowledge base with ontology-like relations and default reasoning. Facts are stored as triples (subject, relation, object). Defaults represent what humans often assume is true until told otherwise, matching the document’s default reasoning discussion. The query method first checks explicit facts; if none exist, it falls back to the default. The explain method provides a simple justification, which supports trust in decision support systems. This demonstrates how symbolic knowledge representation can answer questions and make deductions without probabilistic training.

Use Case

Clinical decision support where most patients follow typical risk defaults, but explicit patient-specific facts override defaults.

Output

Risk level: medium
Used explicit fact: patient_1 risk_level medium.
Risk level: high
Used default reasoning: assumed patient_2 risk_level high.

💻 Code Practice Problems

Problem 1: Create a small ontology-style knowledge base with default re...medium

Create a small ontology-style knowledge base with default reasoning and explanations. Implement a class similar to the example, but with one additional requirement: support multiple explicit facts for the same (subject, relation) and return the most recently added explicit fact. If no explicit fact exists, return the default object if present. Also implement an explain method that states whether the answer came from an explicit fact or a default. Use it in a short demo with two subjects.

💡 Show Hints (3)

• Store explicit facts in an ordered structure (for example, a list) so you can pick the last matching fact.
• Use a dictionary for defaults keyed by (subject, relation) to enable O(1) default lookup.
• In explain, search explicit facts first; only mention defaults if no explicit match exists.

✓ Reveal Solution

Solution Code:

from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

class KnowledgeBase:
    def __init__(self):
        # Keep explicit facts in insertion order.
        self.facts: List[Fact] = []
        # Defaults: (subject, relation) -> default object
        self.defaults: Dict[Tuple[str, str], str] = {}

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.facts.append(Fact(subject, relation, obj))

    def add_default(self, subject: str, relation: str, default_obj: str) -> None:
        self.defaults[(subject, relation)] = default_obj

    def query(self, subject: str, relation: str) -> Optional[str]:
        # Return the most recently added explicit fact if present.
        for f in reversed(self.facts):
            if f.subject == subject and f.relation == relation:
                return f.obj
        # Otherwise fall back to default.
        return self.defaults.get((subject, relation))

    def explain(self, subject: str, relation: str) -> str:
        for f in reversed(self.facts):
            if f.subject == subject and f.relation == relation:
                return f"Used explicit fact: {subject} {relation} {f.obj}."
        d = self.defaults.get((subject, relation))
        if d is not None:
            return f"Used default reasoning: assumed {subject} {relation} {d}."
        return "No knowledge available."

# --- Demo ---
kb = KnowledgeBase()

# Ontology-like relations.
# Example: "has_symptom" and "severity".
kb.add_fact("patient_A", "has_symptom", "cough")

# Default: most cough patients have "mild" severity.
kb.add_default("patient_A", "severity", "mild")

# Later we learn an explicit severity.
kb.add_fact("patient_A", "severity", "moderate")

# Another patient uses only defaults.
kb.add_default("patient_B", "severity", "mild")

print("Severity patient_A:", kb.query("patient_A", "severity"))
print(kb.explain("patient_A", "severity"))

print("Severity patient_B:", kb.query("patient_B", "severity"))
print(kb.explain("patient_B", "severity"))

Expected Output:

Severity patient_A: moderate
Used explicit fact: patient_A severity moderate.
Severity patient_B: mild
Used default reasoning: assumed patient_B severity mild.

The KnowledgeBase stores explicit facts in a list to preserve insertion order. The query method scans the list in reverse so the most recently added explicit fact for the given (subject, relation) is returned. If no explicit fact matches, it returns the default from the defaults dictionary. The explain method mirrors the same logic: it reports whether the result came from an explicit fact or from default reasoning, and it reports "No knowledge available." if neither exists.

Problem 2: Extend the ontology-style knowledge base with default reason...hard

Extend the ontology-style knowledge base with default reasoning plus a simple contradiction policy. You must support: (1) explicit facts, (2) defaults, and (3) contradiction detection for a relation. Contradiction rule: if there are explicit facts for the same (subject, relation) and they disagree on the object, then the query must return None (unknown) even if a default exists. The explain method must explain whether the result is explicit, default, or blocked by contradiction. Use it in a demo with three subjects: one with consistent explicit facts, one with only defaults, and one with conflicting explicit facts.

💡 Show Hints (3)

• Collect all explicit objects for a given (subject, relation) before deciding what to return.
• If explicit objects contain more than one distinct value, treat it as a contradiction and block default reasoning.
• In explain, check contradiction first, then explicit consistency, then defaults.

✓ Reveal Solution

Solution Code:

from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple, List

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str

class KnowledgeBase:
    def __init__(self):
        self.facts: List[Fact] = []
        self.defaults: Dict[Tuple[str, str], str] = {}

    def add_fact(self, subject: str, relation: str, obj: str) -> None:
        self.facts.append(Fact(subject, relation, obj))

    def add_default(self, subject: str, relation: str, default_obj: str) -> None:
        self.defaults[(subject, relation)] = default_obj

    def _explicit_objects(self, subject: str, relation: str) -> Set[str]:
        objs: Set[str] = set()
        for f in self.facts:
            if f.subject == subject and f.relation == relation:
                objs.add(f.obj)
        return objs

    def query(self, subject: str, relation: str) -> Optional[str]:
        explicit_objs = self._explicit_objects(subject, relation)
        if len(explicit_objs) > 1:
            # Contradiction: block any default.
            return None
        if len(explicit_objs) == 1:
            return next(iter(explicit_objs))
        # No explicit facts: use default if available.
        return self.defaults.get((subject, relation))

    def explain(self, subject: str, relation: str) -> str:
        explicit_objs = self._explicit_objects(subject, relation)
        if len(explicit_objs) > 1:
            sorted_objs = ", ".join(sorted(explicit_objs))
            return (
                f"Contradiction detected: multiple explicit values for {subject} {relation}: {sorted_objs}. "
                f"Default reasoning blocked."
            )
        if len(explicit_objs) == 1:
            val = next(iter(explicit_objs))
            return f"Used consistent explicit fact: {subject} {relation} {val}."
        d = self.defaults.get((subject, relation))
        if d is not None:
            return f"Used default reasoning: assumed {subject} {relation} {d}."
        return "No knowledge available."

# --- Demo ---
kb = KnowledgeBase()

# Relation: "risk_level".
# Defaults: most people are "low" risk unless explicitly stated.
kb.add_default("patient_C", "risk_level", "low")
kb.add_default("patient_D", "risk_level", "low")
kb.add_default("patient_E", "risk_level", "low")

# patient_C: consistent explicit facts.
kb.add_fact("patient_C", "risk_level", "medium")
kb.add_fact("patient_C", "risk_level", "medium")

# patient_D: no explicit facts, uses default.

# patient_E: conflicting explicit facts.
kb.add_fact("patient_E", "risk_level", "high")
kb.add_fact("patient_E", "risk_level", "low")

print("patient_C risk_level:", kb.query("patient_C", "risk_level"))
print(kb.explain("patient_C", "risk_level"))

print("patient_D risk_level:", kb.query("patient_D", "risk_level"))
print(kb.explain("patient_D", "risk_level"))

print("patient_E risk_level:", kb.query("patient_E", "risk_level"))
print(kb.explain("patient_E", "risk_level"))

Expected Output:

patient_C risk_level: medium
Used consistent explicit fact: patient_C risk_level medium.
patient_D risk_level: low
Used default reasoning: assumed patient_D risk_level low.
patient_E risk_level: None
Contradiction detected: multiple explicit values for patient_E risk_level: high, low. Default reasoning blocked.

The KnowledgeBase computes the set of explicit objects for each (subject, relation). If the set has more than one distinct object, the system treats it as a contradiction and returns None, explicitly blocking default reasoning. If the set has exactly one object, it returns that object as a consistent explicit conclusion. If there are no explicit facts, it returns the default if present. The explain method follows the same decision order: contradiction first, then consistent explicit facts, then defaults, then no knowledge.

Adversarial search for game playing using minimax (expected utility under opponents)

python

Code

from typing import List, Optional, Tuple

# This example builds on the document's adversarial search for game playing.
# We implement minimax over a tiny deterministic game.

# Game: 1D take-away. There are N tokens. Players alternate.
# On your turn, you take 1 or 2 tokens. Player who takes the last token wins.

def winner(state: int) -> Optional[str]:
    # Terminal test: if no tokens remain, previous player won.
    if state == 0:
        return "previous_wins"
    return None

def minimax(state: int, maximizing: bool, depth: int = 0) -> Tuple[int, Optional[int]]:
    # Returns (score, best_move) where best_move is tokens taken.
    # Score convention: +1 means current player can force a win, -1 means lose.
    term = winner(state)
    if term is not None:
        # If state==0, the player to move has lost.
        return (-1 if maximizing else 1), None

    # Generate legal moves.
    moves = [m for m in (1, 2) if m <= state]

    # If maximizing, choose move that maximizes score.
    best_score = -10 if maximizing else 10
    best_move = None

    for m in moves:
        next_state = state - m
        # After making a move, roles switch.
        score, _ = minimax(next_state, not maximizing, depth + 1)
        if maximizing:
            if score > best_score:
                best_score, best_move = score, m
        else:
            if score < best_score:
                best_score, best_move = score, m

    return best_score, best_move

def best_action(tokens: int) -> int:
    score, move = minimax(tokens, maximizing=True)
    # move is guaranteed for tokens>0.
    return move

# --- Usage: choose optimal action for a game-playing agent ---
for tokens in [1, 2, 3, 4, 5, 6]:
    move = best_action(tokens)
    print(f"tokens={tokens} -> take {move}")

Explanation

This code demonstrates adversarial search using minimax, matching the document’s description of searching a tree of moves and countermoves to find a winning position. The game is deterministic, so the search explores all possible move sequences. The maximizing player assumes the opponent will minimize the outcome, capturing rational adversarial behavior. The terminal condition occurs when no tokens remain. For each state, minimax returns a score (+1 win, -1 loss) and the best move. This is a foundation for more complex game-playing systems like chess or Go, where the same principle applies with larger state spaces.

Use Case

Building an AI opponent for a simple turn-based game where the agent must anticipate an adversary’s best responses.

Output

tokens=1 -> take 1
tokens=2 -> take 2
tokens=3 -> take 2
tokens=4 -> take 2
tokens=5 -> take 2
tokens=6 -> take 2

💻 Code Practice Problems

Problem 1: Implement minimax for a deterministic 1D take-away game with...medium

Implement minimax for a deterministic 1D take-away game with a twist: there are N tokens, and on your turn you may take 1, 2, or 3 tokens. The player who takes the last token wins. However, the scoring is not just win/loss: return a score that prefers faster wins and slower losses. Use the following score convention for the player to move at a state: if the game is over (no tokens remain), the player to move has lost. Define score as: +10 - depth for a forced win, and -10 + depth for a forced loss, where depth counts how many moves have been made from the initial call. Your minimax function must return (score, best_move) where best_move is the number of tokens to take (or None at terminal states). Then implement best_action(tokens) that returns the optimal move for the maximizing player.

💡 Show Hints (3)

• Use the same minimax structure as the example: generate legal moves, recurse with roles switched, and pick max or min depending on whose turn it is.
• Thread a depth parameter through recursion and incorporate it into the terminal score formula.
• At terminal states (tokens == 0), return a score immediately and best_move as None.

✓ Reveal Solution

Solution Code:

from typing import Optional, Tuple

# Game: 1D take-away with moves {1,2,3}. Taking the last token wins.
# Scoring prefers faster wins and slower losses.
# For the player to move at a state:
# - If tokens == 0, the player to move has lost.
# - Forced win score: +10 - depth
# - Forced loss score: -10 + depth


def winner(tokens: int) -> Optional[str]:
    # Terminal test: if no tokens remain, previous player won.
    if tokens == 0:
        return "previous_wins"
    return None


def minimax(tokens: int, maximizing: bool, depth: int = 0) -> Tuple[int, Optional[int]]:
    term = winner(tokens)
    if term is not None:
        # If tokens == 0, the player to move has lost.
        # If maximizing is True, then the maximizing player is the one to move and loses.
        # If maximizing is False, then the minimizing player is the one to move and loses,
        # meaning the maximizing player wins.
        if maximizing:
            return (-10 + depth, None)  # maximizing player loses
        else:
            return (+10 - depth, None)  # maximizing player wins

    moves = [m for m in (1, 2, 3) if m <= tokens]

    # Initialize best values depending on whose turn it is.
    best_score = -10**9 if maximizing else 10**9
    best_move: Optional[int] = None

    for m in moves:
        next_tokens = tokens - m
        score, _ = minimax(next_tokens, not maximizing, depth + 1)

        if maximizing:
            if score > best_score:
                best_score, best_move = score, m
        else:
            if score < best_score:
                best_score, best_move = score, m

    return best_score, best_move


def best_action(tokens: int) -> int:
    score, move = minimax(tokens, maximizing=True)
    if move is None:
        raise ValueError("No move available when tokens == 0")
    return move


if __name__ == "__main__":
    for tokens in [1, 2, 3, 4, 5, 6, 7, 8]:
        move = best_action(tokens)
        print(f"tokens={tokens} -> take {move}")

Expected Output:

tokens=1 -> take 1
tokens=2 -> take 2
tokens=3 -> take 3
tokens=4 -> take 1
tokens=5 -> take 2
tokens=6 -> take 3
tokens=7 -> take 1
tokens=8 -> take 2

The minimax function explores all legal moves (taking 1, 2, or 3 tokens when available). The boolean maximizing indicates whether the current player is the maximizing player (the agent) or the minimizing opponent. At terminal states (tokens == 0), the player to move has lost; this is converted into a score from the maximizing player perspective using the depth-based formula. For non-terminal states, the maximizing player chooses the move with the highest score, while the minimizing player chooses the move with the lowest score. The depth parameter ensures that among equally winning lines, the algorithm prefers faster wins and slower losses.

Problem 2: Implement minimax with alpha-beta pruning for a deterministi...hard

Implement minimax with alpha-beta pruning for a deterministic 1D take-away game with a blocked move rule. There are N tokens. On your turn you may take 1 or 2 tokens, but you are not allowed to take exactly K tokens (where K is either 1 or 2). If taking exactly K is illegal, you must choose the other legal move; if both moves are illegal (because of the blocked rule or insufficient tokens), then the current player has no legal moves and loses immediately. Use the same score convention as the original example: +1 means the current player (maximizing player) can force a win, -1 means lose, from the perspective of the maximizing player. Implement minimax_alpha_beta(tokens, maximizing, blocked_k, alpha, beta) returning (score, best_move). Then implement best_action(tokens, blocked_k) returning the optimal move for the maximizing player. Finally, print the best move for tokens in [1..10] for blocked_k=1 and blocked_k=2.

💡 Show Hints (3)

• Model terminal states as either tokens == 0 (no tokens remain) OR no legal moves exist due to the blocked rule and token count.
• In alpha-beta pruning, update alpha when maximizing and beta when minimizing, and stop exploring when alpha >= beta.
• Be careful: the blocked rule can make a move set empty even when tokens > 0.

✓ Reveal Solution

Solution Code:

from typing import Optional, Tuple, List

# Game: 1D take-away with moves {1,2}.
# Blocked move rule: taking exactly blocked_k tokens is illegal.
# If no legal moves exist, the current player loses immediately.
# Scoring from the maximizing player's perspective:
# +1 means maximizing can force a win, -1 means lose.


def terminal_score(tokens: int, maximizing: bool) -> Optional[int]:
    # If tokens == 0, the player to move has lost (previous player took the last token).
    if tokens == 0:
        # If maximizing is True, maximizing player is the one to move and loses.
        # If maximizing is False, minimizing player is the one to move and loses,
        # meaning maximizing player wins.
        return -1 if maximizing else +1
    return None


def legal_moves(tokens: int, blocked_k: int) -> List[int]:
    candidates = [1, 2]
    moves = []
    for m in candidates:
        if m <= tokens and m != blocked_k:
            moves.append(m)
    return moves


def minimax_alpha_beta(
    tokens: int,
    maximizing: bool,
    blocked_k: int,
    alpha: int = -10**9,
    beta: int = 10**9,
) -> Tuple[int, Optional[int]]:
    # Terminal by tokens.
    ts = terminal_score(tokens, maximizing)
    if ts is not None:
        return ts, None

    moves = legal_moves(tokens, blocked_k)

    # Terminal by no legal moves.
    if not moves:
        # Current player cannot move and loses.
        return (-1 if maximizing else +1), None

    best_move: Optional[int] = None

    if maximizing:
        best_score = -10**9
        for m in moves:
            score, _ = minimax_alpha_beta(tokens - m, False, blocked_k, alpha, beta)
            if score > best_score:
                best_score, best_move = score, m
            alpha = max(alpha, best_score)
            if alpha >= beta:
                break
        return best_score, best_move
    else:
        best_score = 10**9
        for m in moves:
            score, _ = minimax_alpha_beta(tokens - m, True, blocked_k, alpha, beta)
            if score < best_score:
                best_score, best_move = score, m
            beta = min(beta, best_score)
            if alpha >= beta:
                break
        return best_score, best_move


def best_action(tokens: int, blocked_k: int) -> int:
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    score, move = minimax_alpha_beta(tokens, True, blocked_k)
    if move is None:
        raise ValueError("No legal move available")
    return move


if __name__ == "__main__":
    for blocked_k in [1, 2]:
        for tokens in range(1, 11):
            move = best_action(tokens, blocked_k)
            print(f"blocked_k={blocked_k} tokens={tokens} -> take {move}")

Expected Output:

blocked_k=1 tokens=1 -> take 2
blocked_k=1 tokens=2 -> take 2
blocked_k=1 tokens=3 -> take 2
blocked_k=1 tokens=4 -> take 2
blocked_k=1 tokens=5 -> take 2
blocked_k=1 tokens=6 -> take 2
blocked_k=1 tokens=7 -> take 2
blocked_k=1 tokens=8 -> take 2
blocked_k=1 tokens=9 -> take 2
blocked_k=1 tokens=10 -> take 2
blocked_k=2 tokens=1 -> take 1
blocked_k=2 tokens=2 -> take 1
blocked_k=2 tokens=3 -> take 1
blocked_k=2 tokens=4 -> take 1
blocked_k=2 tokens=5 -> take 1
blocked_k=2 tokens=6 -> take 1
blocked_k=2 tokens=7 -> take 1
blocked_k=2 tokens=8 -> take 1
blocked_k=2 tokens=9 -> take 1
blocked_k=2 tokens=10 -> take 1

The algorithm first checks terminal conditions. If tokens == 0, the player to move has lost, so the maximizing perspective score is -1 when maximizing is to move and +1 when minimizing is to move. Next, it computes legal moves by excluding any move equal to blocked_k and any move larger than tokens. If no legal moves exist, the current player loses immediately, again mapped to +1/-1 from the maximizing perspective. For non-terminal states, minimax chooses the best move for maximizing (highest score) or minimizing (lowest score). Alpha-beta pruning maintains bounds (alpha for maximizing, beta for minimizing) and stops exploring branches that cannot improve the final decision.

Interactive Lesson

Interactive Lesson: Dependency-Ordered Foundations of AI (Goals, Reasoning, Knowledge, Agents, and Core Techniques)

⏱️ 30 min

Learning Objectives

Explain what Artificial Intelligence is and identify the main subproblems it aims to solve.
Describe why Reasoning and Problem-Solving can become computationally infeasible due to combinatorial explosion.
Distinguish Knowledge Representation and Ontologies from Machine Learning, focusing on how they enable deductions and question answering.
Use the idea of an agent maximizing expected utility to connect uncertainty to decision-making.
Relate Planning and Decision-Making to MDPs by mapping states, actions, transition probabilities, rewards, and policies.

1. Artificial Intelligence Definition and Scope

Artificial Intelligence (AI) is the capability of computational systems to perform tasks associated with human intelligence, including learning, reasoning, perception, and decision-making. This scope matters because it tells you what kinds of problems AI systems must solve and which techniques are commonly used to do so (for example, search, optimization, logic, and neural networks).

Examples:

Advanced web search engines that retrieve relevant information for a user’s query.
Autonomous vehicles that perceive the environment and make driving decisions.
Strategy-game play (chess and Go) that combines reasoning with search.

✓ Check Your Understanding:

Which option best captures the scope of AI as described here?

Answer: Learning, reasoning, perception, and decision-making.

Why is the definition’s list of capabilities important for later topics?

Answer: It determines which subproblems and techniques the field targets.

Which technique family is explicitly mentioned as being used to implement AI capabilities?

Answer: Search, optimization, logic, and neural networks.

2. AI Goals and Subproblems

AI goals are realized by solving subproblems. The key idea is that AI is not one single task; it is a collection of capabilities that must work together. For example, to build a chatbot or assistant, you need language understanding (NLP), reasoning or decision-making (problem-solving and planning), and sometimes perception (for multimodal systems).

Examples:

Chatbots and virtual assistants that combine language processing with decision-making.
Autonomous vehicles that combine perception with planning and control.
Strategy-game play that combines reasoning with search.

✓ Check Your Understanding:

Which statement best reflects the relationship between AI goals and subproblems?

Answer: AI goals are achieved by solving multiple subproblems that correspond to different capabilities.

A system that answers questions correctly most directly depends on which capability?

Answer: Reasoning and problem-solving plus knowledge handling.

Which example best illustrates multiple AI capabilities working together?

Answer: An autonomous vehicle that perceives and then decides actions.

3. Reasoning and Problem-Solving

Reasoning derives solutions from information. However, large problems can cause combinatorial explosion: reasoning algorithms may become exponentially slower as the number of possibilities grows. This is why many reasoning tasks are computationally hard. In uncertain environments, reasoning also connects to probabilistic methods, because the agent may need to make and revise probabilistic guesses after acting.

Examples:

Step-by-step deduction in complex strategy games can require exploring many possibilities.
Planning in real-world settings where outcomes are uncertain forces the system to reassess after actions.

✓ Check Your Understanding:

What is combinatorial explosion?

Answer: A phenomenon where reasoning algorithms become exponentially slower as problem size grows.

Which cause-effect chain best matches the text’s explanation of reasoning scaling problems?

Answer: More possibilities must be considered, so runtime grows exponentially.

How does uncertainty change the reasoning task for an agent?

Answer: It requires probabilistic guesses and reassessment after results.

4. Knowledge Representation and Ontologies

Knowledge representation encodes concepts and relationships so programs can answer questions and make deductions about real-world facts. Ontologies are a structured way to represent domain concepts and their relationships. A major challenge is commonsense breadth: human knowledge is vast and often implicit, so representing it explicitly is difficult and requires knowledge acquisition.

Examples:

An ontology representing objects and relations used by a domain to support retrieval and decision support.
The difficulty of encoding broad commonsense knowledge that humans assume automatically.

✓ Check Your Understanding:

What is the primary purpose of knowledge representation in this lesson?

Answer: To encode concepts and relationships so programs can answer questions and make deductions.

Which statement best clarifies a common confusion?

Answer: Knowledge representation encodes concepts/relations for reasoning, while machine learning learns patterns from data.

Why is commonsense knowledge hard to represent?

Answer: Because it is broad and often not represented as explicit facts.

5. Planning and Decision-Making (Agents, Utility, Expected Utility, and MDPs)

An agent perceives and acts. Decision-making can choose actions by maximizing expected utility over possible outcomes. Utility is a numerical measure of how much an agent prefers a situation. Expected utility handles uncertainty by weighting utilities of outcomes by their probabilities. To formalize planning under uncertainty, Markov Decision Processes (MDPs) model: (1) a transition model with probabilities, (2) a reward function (utility or cost), and (3) a policy mapping states to actions. This connects back to reasoning (choosing actions) and knowledge representation (what the agent knows about the world).

Examples:

A strategy-game agent choosing moves by evaluating possible outcomes and their utilities.
A driving agent selecting actions by considering uncertain outcomes and maximizing expected utility.

✓ Check Your Understanding:

What does an agent do in this framework?

Answer: Perceives and acts to make goals happen.

Which option correctly defines expected utility?

Answer: The utility of all possible outcomes weighted by their probabilities.

Which set of components best matches an MDP?

Answer: Transition model (probabilities), reward function (utility/cost), and policy mapping states to actions.

Practice Activities

Cause-Effect: From Uncertainty to Expected Utility

medium

Pick one action with two possible outcomes. For each outcome, assign a probability and a utility. Then compute expected utility and choose the action with the higher expected utility. Finally, explain in one sentence how uncertainty forced probabilistic weighting.

Cause-Effect: Combinatorial Explosion in Reasoning

medium

Consider a reasoning task where each step branches into multiple possibilities. Describe how increasing the branching factor changes the number of possibilities the algorithm must consider, and state the resulting effect on runtime using the term combinatorial explosion.

Cause-Effect: Knowledge Representation vs Machine Learning

medium

Given a question-answering system, decide whether the system needs knowledge representation (concepts/relations for deductions) or machine learning (learning patterns from data). For each choice, write a cause-effect explanation that links the need to the mechanism.

Cause-Effect: From MDP Components to Policy Choice

hard

Create a tiny MDP with two states and two actions. Provide transition probabilities and rewards. Then explain how the transition model and reward function jointly determine which policy is preferred under expected utility.

Next Steps

Cheat Sheet

Cheat Sheet: Artificial Intelligence (AI)

Key Terms

Combinatorial explosion: Reasoning becomes exponentially slower as problem size grows, making large step-by-step reasoning infeasible.
Ontology: A structured representation of concepts in a domain and the relationships between them.
Knowledge base: A body of knowledge encoded in a form a program can use for retrieval and reasoning.
Rational agent: An agent that perceives and acts to achieve goals or preferences.
Utility: A numerical score expressing how much an agent prefers a situation.
Expected utility: The utility of an action’s possible outcomes weighted by their probabilities; used to choose actions under uncertainty.
Supervised learning: Learning from labeled data where each input is paired with the expected answer.
Unsupervised learning: Learning patterns or structure from unlabeled data without expected answers.
Reinforcement learning: Learning by interaction where good actions are rewarded and bad actions are punished.
Transformer architecture: A deep learning architecture using attention mechanisms, central to modern NLP systems.

Formulas

Expected Utility (Action Choice)

Choose action a that maximizes EU(a), where EU(a) = Σ_over_outcomes o [ P(o | a) * U(o) ].

When outcomes are uncertain and you must compare actions probabilistically.

Gradient Descent (Parameter Update)

θ := θ - η * ∇_θ L(θ)

When training models by minimizing a loss function L(θ) using iterative updates (often with backpropagation).

State Space Search (Goal Reachability)

Explore a tree/graph of states: start state s0 → successors → goal state sg, using a search strategy (e.g., BFS/DFS/A*).

When you need to find a path to a goal by exploring possible states.

Local Search (Refine a Guess)

Start with initial parameters x0; iteratively update x := x + Δ(x) to reduce an objective (loss/cost) until convergence.

When you optimize by improving a current guess rather than exploring a full state tree.

Main Concepts

Artificial Intelligence Definition

AI is the capability of computational systems to perform tasks associated with human intelligence, including learning, reasoning, perception, and decision-making.

Reasoning and Problem-Solving

Derive solutions from information, but large problems can become computationally hard due to combinatorial explosion.

Knowledge Representation and Ontologies

Encode concepts and relationships so programs can answer questions and make deductions; ontologies structure domain knowledge.

Agents, Utility, and Expected Utility

An agent selects actions by maximizing expected utility over possible outcomes, explicitly handling uncertainty.

Markov Decision Process (MDP)

A formal model for decision-making under uncertainty with transition probabilities, a reward/utility function, and a policy mapping states to actions.

Machine Learning Paradigms

Supervised, unsupervised, and reinforcement learning (plus transfer/deep learning) improve performance using data and feedback.

NLP and Modern Transformers

NLP enables language tasks; transformer architectures with attention power modern high-performing language models.

Perception and Computer Vision

Perception infers world aspects from sensors; computer vision focuses on visual input for tasks like recognition and tracking.

Affective Computing and Social Intelligence

Systems that recognize and process emotion-related signals (e.g., sentiment), including multimodal approaches.

10.

Search and Optimization Techniques

Search explores state spaces; optimization (e.g., gradient descent) refines parameters to minimize loss or cost.

Memory Tricks

Combinatorial explosion

Think: “Combo” = many possibilities; “explosion” = exponential slowdown as the tree grows.

Expected utility handles uncertainty

“Expected” means average over probabilities: EU = probability-weighted utilities.

State space search vs local search

“State space” = explore states; “local” = improve locally from a current guess.

Supervised vs reinforcement learning

Supervised has “answers” given; reinforcement has “rewards” received after actions.

Transformer key idea

“Transformer” = transforms sequences using attention, letting the model focus on relevant tokens.

Knowledge representation vs machine learning

Representation = what the system knows for reasoning; learning = how it improves from data.

Quick Facts

AI was founded as an academic discipline in 1956.
AI history includes cycles of optimism and disappointment called AI winters.
After 2012, GPUs accelerated neural networks, increasing funding and interest.
Deep learning outperformed earlier AI techniques after 2012.
After 2017, transformer adoption accelerated AI growth.
In the 2020s, generative AI enabled creating or modifying media from text prompts.
State space search explores possible states to reach a goal state.
Local search starts from a guess and incrementally improves it using mathematical optimization.
Gradient descent minimizes a loss function by adjusting parameters; it is commonly used with backpropagation.
In NLP, GPT-style models began generating coherent text in 2019 and reached human-level scores on several standardized exams by 2023.

Common Mistakes

Common Mistakes: AI Goals, Reasoning, Knowledge, Agents, Learning, and Perception

Confusing knowledge representation with machine learning, and assuming that “having data” automatically replaces “encoding concepts and relations for reasoning.”

conceptual · high severity

▼

Why it happens:

Students reason that because both are used to make AI systems “smart,” knowledge representation must be the same as learning from data. They then treat a knowledge base or ontology as just another dataset, and they expect the system to deduce new facts purely by training, without explicit structure for reasoning.

✓ Correct understanding:

Knowledge representation encodes concepts and relationships so a program can answer questions and make deductions. Machine learning improves performance by learning patterns from data. In practice, knowledge representation supports reasoning and decision support, while machine learning supports pattern learning; they solve different subproblems and can be combined.

How to avoid:

Use a two-part checklist: (1) Ask what kind of inference is required (deduction from explicit relations vs pattern prediction from data). (2) Map that need to the right tool: ontologies/knowledge bases for structured reasoning; machine learning for learning patterns and improving predictions. When uncertain, write the inference rule you want the system to apply and check whether it is explicitly represented.

Mixing up supervised learning and reinforcement learning by assuming both rely on labeled answers provided ahead of time.

conceptual · high severity

▼

Why it happens:

Students think that “learning” always means the training set contains the correct output. They then interpret reinforcement learning as just supervised learning with noisy labels, so they expect a reward signal to be equivalent to a labeled expected answer for each input.

✓ Correct understanding:

Supervised learning trains on labeled data where inputs come with expected answers. Reinforcement learning learns from interaction: it rewards good responses and punishes bad ones, so the agent learns which actions lead to better outcomes through trial and error rather than from pre-provided correct labels.

How to avoid:

Classify the learning signal: if the dataset provides the correct answer for each input, it is supervised learning. If the agent learns by receiving scalar rewards/punishments from interaction, it is reinforcement learning. A quick test: ask whether the training includes an “expected answer” per example, or only feedback after actions.

Believing state space search and local search are identical, and describing local search as if it explores a full tree of possible states.

conceptual · medium severity

▼

Why it happens:

Students see both as “search,” so they assume the algorithm enumerates possibilities similarly. They then incorrectly describe local search as branching over many candidate states, rather than refining a single current guess using optimization steps.

✓ Correct understanding:

State space search explores a tree (or graph) of possible states to find a goal state. Local search begins with a guess and refines it incrementally using mathematical optimization, rather than systematically exploring all possible states.

How to avoid:

When you hear “state space,” think “states and goal reachability via exploration.” When you hear “local search,” think “parameters/guesses and iterative improvement.” Write the algorithm’s starting point and its neighborhood move: state space search starts from an initial state and expands successors; local search starts from a guess and updates it using an objective/loss.

Assuming expected utility only applies when outcomes are deterministic, and ignoring probabilities.

conceptual · high severity

▼

Why it happens:

Students interpret “expected” as “average of deterministic outcomes,” or they treat uncertainty as a minor detail. They then compute utility for the most likely outcome or choose the action with the highest utility in a single scenario, incorrectly skipping probability weighting.

✓ Correct understanding:

Expected utility explicitly handles uncertainty by weighting the utility of all possible outcomes by their probabilities. If outcomes are uncertain, the agent should compute expected utility across the probability distribution, not just pick the best single outcome without accounting for likelihood.

How to avoid:

Always compute expected utility as: sum over outcomes of (probability of outcome) × (utility of outcome). If you cannot list outcomes with probabilities, you are not yet ready to use expected utility; you must clarify the uncertainty model first.

Thinking perception is only computer vision, and treating other sensors as irrelevant to “perception.”

conceptual · medium severity

▼

Why it happens:

Students equate perception with “seeing,” because many popular AI demos focus on images. They then generalize that perception equals computer vision, and they underestimate how agents use multiple sensor modalities to infer the world.

✓ Correct understanding:

Perception uses sensor inputs to infer aspects of the world. Computer vision is a subset of perception focused on visual input. Perception can include microphones, lidar, radar, tactile sensors, and more, depending on the system and environment.

How to avoid:

Use the definition lens: perception is “infer aspects of the world from sensor inputs.” Then label the modality: if the input is visual, it is computer vision; if it is from other sensors, it is still perception. Practice by listing sensors in a scenario and mapping each to perception vs computer vision.

Underestimating combinatorial explosion by assuming step-by-step reasoning will scale to large problems without becoming infeasible.

conceptual · high severity

▼

Why it happens:

Students believe that if reasoning is correct for small cases, it will remain feasible for large cases. They ignore how the number of possibilities can grow exponentially as problem size increases, so they do not connect reasoning complexity to combinatorial explosion.

✓ Correct understanding:

Reasoning algorithms that model step-by-step deduction can scale poorly because the number of possibilities grows rapidly. Combinatorial explosion makes large reasoning problems exponentially slower, making them infeasible without smarter methods (for example, pruning, heuristics, probabilistic reasoning, or approximate approaches).

How to avoid:

When evaluating a reasoning approach, estimate growth in the number of candidate possibilities. Ask: “What is the branching factor?” and “How does runtime scale with problem size?” If it grows exponentially, expect combinatorial explosion and look for methods that reduce the search space or use probabilistic/heuristic guidance.

Assuming uncertainty never matters in planning, and failing to update beliefs after acting in a partially observable environment.

conceptual · high severity

▼

Why it happens:

Students think planning is just choosing the best action once, based on a fixed world model. They then treat the environment as deterministic and fully observable, so they do not incorporate probabilistic transitions or belief updates after observing outcomes.

✓ Correct understanding:

Uncertainty about the current situation or action outcomes requires probabilistic guesses and reassessment after acting. Non-determinism and partial observability mean the agent must update beliefs based on results, which connects planning/decision-making to probabilistic models and frameworks like MDPs.

How to avoid:

Use a belief-update mindset: if the environment is uncertain or partially observable, you must represent uncertainty and revise it after actions. Practice by writing down (1) what you can observe, (2) what you cannot observe, and (3) how observations change your belief about the hidden state.

General Tips

Anchor every answer to a definition: if you cannot restate the relevant definition precisely, you likely have a category error (for example, confusing representation with learning).
For decision-making questions, explicitly check whether uncertainty is present; if yes, expected utility and probabilistic reasoning are required.
For “search” questions, always distinguish exploration of a state space from iterative refinement of a guess using optimization.
For learning questions, identify the learning signal: labeled targets (supervised) versus reward feedback from interaction (reinforcement).
For perception questions, list sensor modalities; perception is broader than computer vision.

Content

AI Audio

Test Yourself

Practice

Games

Summary

Topics Covered

What AI Is: Scope, Capabilities, and Subproblems

Reasoning and Problem-Solving Under Computational Limits

Knowledge Representation and Ontologies for Deduction

Agents, Utility, and Planning Under Uncertainty (MDPs)

AI Techniques: Search and Optimization as the Engine

Machine Learning Paradigms: From Data to Behavior

NLP and Modern Transformers: Learning Language Representations

Perception, Computer Vision, and Affective/Social Intelligence

Key Insights

Uncertainty forces belief updates

Representation limits reasoning breadth

Transformers changed the bottleneck

Search vs optimization is a spectrum

Expected utility is learning-friendly

Conclusions

Bringing It All Together

Key Takeaways

Real-World Applications

Math Examples

Expected Utility for Choosing the Maximum-Preference Action

Problem

Key Equations

Solution

Explanation

Practice Problems

Markov Decision Process: One-Step Expected Return Using Transition Probabilities and Rewards

Problem

Key Equations

Solution

Explanation

Practice Problems

Local Search via Gradient Descent: Minimizing a Loss Function

Problem

Key Equations

Solution

Explanation

Practice Problems

Backpropagation-Style Gradient Step: Updating Two Parameters to Reduce Loss

Problem

Key Equations

Solution

Explanation

Practice Problems

Combinatorial Explosion in State Space Search: Exponential Growth of Nodes

Problem

Key Equations

Solution

Explanation

Practice Problems

💻 Code Examples

Supervised learning for text classification (NLP) using a transformer-style pipeline

Code

Explanation

Use Case

Output

Markov decision process (MDP) with a learned policy via value iteration

Code

Explanation

Use Case

Output

Ontology-style knowledge representation with default reasoning and query answering

Code

Explanation

Use Case

Output

💻 Code Practice Problems

Adversarial search for game playing using minimax (expected utility under opponents)

Code

Explanation

Use Case

Output

💻 Code Practice Problems