Shared by support using Learnlo Plus

You're viewing a shared pack. Upgrade to create your own packs.

Artificial Intelligence (AI): goals, approaches, core capabilities, and techniqu
Dashboard/Study PackLearning Mode

Summary

Artificial Intelligence (AI) is the capability of computational systems to perform tasks associated with human intelligence, using learning and reasoning to achieve defined goals. This matters because it sets the scope: AI is not only “human-like reasoning,” but also learning from data, interpreting the world, and acting under goals. AI Goals and Subproblems break the overall aim into components such as reasoning, language, perception, and decision-making. Reasoning and problem-solving under uncertainty are central because real environments are incomplete and outcomes are not perfectly known; agents must act despite uncertainty, often facing combinatorial explosion when naive search over possibilities grows exponentially. To reason effectively, AI often uses Knowledge representation and knowledge engineering. Knowledge representation is the formal encoding of concepts and relationships, while a knowledge base is the actual body of encoded knowledge a program can query. Ontologies and knowledge bases matter because they enable deductions and structured retrieval, supporting tasks like content-based indexing and clinical decision support, but they also face the commonsense breadth problem. Planning and decision-making connect reasoning to action selection. Rational agents, utility, and expected utility formalize “best action” as maximizing preferences averaged over probabilistic outcomes; this connects directly to Markov decision processes (MDPs), which model state transitions and rewards via policies mapping states to decisions. Machine learning provides performance improvement from data, including supervised, unsupervised, reinforcement, transfer, and deep learning. NLP leverages these methods for speech recognition, translation, extraction, retrieval, and question answering, with Transformers and GPT-style generative models enabling powerful sequence modeling. Perception and computer vision analyze sensor inputs (especially images) for recognition and tracking, supporting robotics and autonomous agents. Finally, Search and optimization techniques include state space search and local search. Gradient descent trains models by minimizing a loss function, while evolutionary computation explores parameter spaces. AI history and safety concerns matter because cycles of optimism and “AI winters” show that progress depends on both technical breakthroughs (e.g., GPUs, transformers) and responsible deployment.

Topics Covered

AI Definition, Scope, and Core Capabilities

AI is the capability of computational systems to perform tasks associated with human intelligence, using learning and reasoning to achieve defined goals. This scope includes learning, reasoning, planning, language, and perception, not only “human-like reasoning.” It connects directly to the later topics by framing which techniques are used for which goals (search/optimization, machine learning, NLP, perception, and decision-making).

AI Goals, Subproblems, and Reasoning Under Uncertainty

AI goals are achieved by solving subproblems such as reasoning and problem-solving when information is incomplete or outcomes are uncertain. Uncertainty forces probabilistic thinking and reassessment after actions, linking to rational agents and expected utility. Reasoning can also face combinatorial explosion, motivating the need for search and optimization methods in later topics.

Knowledge Representation, Ontologies, and Knowledge Bases

Knowledge representation encodes concepts and relationships so programs can deduce and answer questions using a knowledge base and formal structures like ontologies. A knowledge base is the actual stored body of represented knowledge, while knowledge representation is the formal encoding method. This topic connects to planning and decision-making by enabling content-based retrieval and domain reasoning, but also raises knowledge acquisition and commonsense breadth challenges.

Rational Agents, Utility, Expected Utility, and MDPs

A rational agent selects actions to maximize preferences, often by computing expected utility over possible outcomes. Expected utility differs from utility because it averages utilities weighted by outcome probabilities, which is essential under uncertainty. This connects to planning and decision-making, and it formalizes sequential choices through Markov decision processes (MDPs) and policies mapping states to actions.

Machine Learning Paradigms as Performance Improvement

Machine learning studies programs that improve performance on a task automatically from data, rather than relying only on hand-coded rules. Paradigms differ by supervision signal: supervised learning uses labeled answers, unsupervised learning uses unlabeled data, and reinforcement learning learns from reward feedback. This topic connects to NLP and perception because modern systems often learn representations from data using deep learning.

NLP with Transformers and Generative Models

Natural language processing enables reading, writing, and communication in human languages, including translation, extraction, retrieval, and question answering. Modern deep learning methods, especially transformers and word embeddings, power both discriminative and generative behavior. This connects to machine learning (training via gradient descent) and to AI goals by providing language interfaces for agents and tools.

Perception and Computer Vision from Sensors

Perception uses sensor inputs to infer aspects of the world, while computer vision specifically analyzes visual input for recognition and tracking. This distinction matters because “perception” can include non-visual sensors like lidar or tactile signals. This topic connects to rational agents and planning by supplying state information for decisions, and it connects to machine learning because perception models are typically trained from data.

AI Search and Optimization: State Space, Local Search, and Training

AI uses search and optimization to find solutions, including state space search for planning and adversarial game search. Local search corresponds to mathematical optimization, such as gradient descent, which minimizes a loss function. This topic connects to reasoning under uncertainty (searching over possibilities), to rational agents (choosing actions), and to machine learning (backpropagation updates parameters).

AI History, Booms, Winters, and Safety Concerns

AI history includes cycles of optimism and disappointment, including AI winters, and later booms driven by new capabilities and compute. Deep learning progress accelerated after 2012 with GPUs and after 2017 with transformers, and the 2020s saw generative AI advances. This topic connects to the rest by explaining why certain techniques (learning, transformers, optimization) became dominant, and it motivates safety and regulation discussions as systems become more capable.

Key Insights

Uncertainty forces continual replanning

The knowledge says uncertainty makes agents probabilistic and reassess after acting. Combined with planning and rational agents, this implies that “planning” is not a one-time computation; it becomes an ongoing loop where new evidence updates beliefs and changes future actions.

Why it matters: Students often treat planning as static. This reframes planning as belief-updating under uncertainty, aligning decision-making with the need to revise after outcomes arrive.

Representation is a performance lever

Knowledge representation and knowledge bases are described as enabling deductions and retrieval, but they also shape what search and learning can even do. If the representation encodes the right relations, the system can reduce the effective search space and make reasoning or retrieval far more efficient than brute-force exploration.

Why it matters: This connects “knowledge engineering” to “search/optimization” and “decision-making,” showing representation is not just for correctness but also for computational tractability.

Expected utility is risk-aware averaging

Expected utility is not the same as utility; it averages utilities over outcomes weighted by probabilities. When combined with uncertainty and rational agents, this implies the agent can prefer a lower-probability action if its high-utility outcomes dominate the probability-weighted average, even when the most likely outcome is worse.

Why it matters: Students may think the agent “chooses the best likely outcome.” This clarifies that rational choice can be driven by the distribution of outcomes, not just the mode.

Deep learning replaced explicit reasoning

The history notes that early step-by-step reasoning failed at scale due to combinatorial explosion, while GPUs and transformers accelerated deep learning. Putting these together suggests a strategic shift: modern systems often avoid explicit symbolic search over possibilities by learning representations and using statistical inference instead of enumerating reasoning paths.

Why it matters: This reframes AI progress as partly an architectural workaround to combinatorial explosion, not merely “better models,” helping students see why learning-based methods scaled.

Transformers changed what “search” means

Search and optimization are described as state-space exploration or local search via gradient descent, while transformers are tied to sequence learning and generative text. Combining these ideas implies that, in many modern NLP systems, “finding the best output” is largely handled by learned parameters and decoding heuristics rather than explicit symbolic search over a structured state space.

Why it matters: Students may assume AI output requires classic search. This insight shows that modern generation often shifts the burden from explicit search to learned optimization and representation, changing the meaning of “optimization” in practice.


Conclusions

Bringing It All Together

AI begins with a broad definition and purpose, then narrows into AI goals and subproblems that require reasoning, learning, and acting under constraints. Those subproblems split into two major routes: knowledge-driven reasoning via knowledge representation, ontologies, and knowledge bases, and data-driven improvement via machine learning that powers NLP and perception. Planning and decision-making connect reasoning to action by using rational agents, utility, and expected utility, and when uncertainty and dynamics matter, they connect further to MDPs and policies. Search and optimization provide the computational backbone for both symbolic planning (state space and adversarial search) and numeric learning (local search such as gradient descent). Modern NLP and generative systems tie together machine learning and language tasks through transformers and GPT-style models, while perception and social intelligence extend the same AI goal structure into vision and affect-aware interaction. Finally, AI history and safety discussions connect back to the whole system by explaining why capabilities rise and fall with methods, compute, and societal risk management.

Key Takeaways

  • AI definition and scope sets the unifying purpose: building computational systems that achieve defined goals using learning and reasoning.
  • Reasoning and problem-solving under uncertainty drives the need for planning and decision-making, which formalizes action choice using rational agents, utility, and expected utility.
  • Knowledge representation and knowledge engineering (ontologies and knowledge bases) enable deduction, retrieval, and domain-specific interpretation, but require careful knowledge acquisition.
  • Machine learning provides performance improvement from data, and its paradigms (supervised, unsupervised, reinforcement, transfer, deep learning) underpin modern NLP and perception.
  • Search and optimization connect symbolic and numeric methods: state space search supports planning and adversarial games, while local search such as gradient descent trains models by minimizing loss.

Real-World Applications

  • Advanced web search engines and strategy-game analysis (chess and Go) use search and optimization ideas to explore possibilities and rank outcomes.
  • Chatbots and virtual assistants rely on NLP with transformers and GPT-style generative models to produce coherent responses and assist users.
  • Clinical decision support and content-based indexing/retrieval use ontologies and knowledge bases to represent domain concepts and enable retrieval or reasoning over structured knowledge.
  • Autonomous vehicles and other decision systems use expected utility style decision-making to choose actions under uncertainty, often combined with planning and policy concepts.

Next, the student should learn how to formalize decision problems end-to-end: specify states, actions, transition models, rewards, and policies for MDPs, then compare how knowledge-based planning and machine-learning-based control behave under uncertainty. After that, they should deepen into the practical mechanics of knowledge engineering (how ontologies and knowledge bases are built and maintained) and the training pipeline for deep models (loss functions, gradient descent, and evaluation), so they can connect theory to deployable systems.


Math Examples

Expected Utility for Choosing the Maximum-Preference Action

Problem

An AI agent evaluates two possible actions, A and B. Each action has multiple possible outcomes with known probabilities and utilities (preference scores). Compute the expected utility of each action and choose the action with the maximum expected utility.

Key Equations

EU(action) = ∑(outcomes) P(outcome) × U(outcome)
EU(A) = 0.6×10 + 0.4×2
EU(B) = 0.5×9 + 0.5×3

Solution

Step 1: Write expected utility as “utility of outcomes weighted by probability.” For action A, suppose outcomes are: outcome 1 with probability 0.6 and utility 10, outcome 2 with probability 0.4 and utility 2. Then EU(A) = 0.6×10 + 0.4×2 = 6 + 0.8 = 6.8. Step 2: Compute EU(B) similarly. Suppose outcome 1 has probability 0.5 and utility 9, outcome 2 has probability 0.5 and utility 3. Then EU(B) = 0.5×9 + 0.5×3 = 4.5 + 1.5 = 6.0. Step 3: Compare expected utilities. Since EU(A) = 6.8 and EU(B) = 6.0, the agent chooses action A because it maximizes expected utility.

Explanation

The document states that a decision-making agent assigns a utility to each situation and computes expected utility for each action by weighting utilities by outcome probabilities. Choosing the action with the maximum expected utility implements rational preference under uncertainty.

Practice Problems

Problem 1: An AI agent must choose between two actions, A and B. Each a...medium

An AI agent must choose between two actions, A and B. Each action leads to one of two possible outcomes with known probabilities and utilities (preference scores). Compute the expected utility of each action and choose the action with the maximum expected utility. Action A outcomes: - Outcome A1 occurs with probability 0.7 and has utility 8 - Outcome A2 occurs with probability 0.3 and has utility 1 Action B outcomes: - Outcome B1 occurs with probability 0.4 and has utility 7 - Outcome B2 occurs with probability 0.6 and has utility 2

💡 Show Hints (3)
  • Expected utility is the sum of each outcome utility multiplied by its probability.
  • Compute EU(A) by doing 0.7×(utility of A1) plus 0.3×(utility of A2). Then compute EU(B) similarly.
  • After you get two numbers, pick the action with the larger expected utility.
✓ Reveal Solution

Steps:

  1. Step 1: Compute EU(A) = 0.7×8 + 0.3×1.
  2. Step 2: Compute EU(A) = 5.6 + 0.3 = 5.9.
  3. Step 3: Compute EU(B) = 0.4×7 + 0.6×2 = 2.8 + 1.2 = 4.0.
  4. Step 4: Compare EU(A) = 5.9 and EU(B) = 4.0, and choose the action with the larger value.

Answer:

Choose action A, because EU(A) = 5.9 is greater than EU(B) = 4.0.

Expected utility correctly aggregates uncertain outcomes by weighting each utility by its probability. The agent should select the action that maximizes this weighted average, which here is action A.

Problem 2: An AI agent chooses between actions A and B. Each action has...hard

An AI agent chooses between actions A and B. Each action has three possible outcomes with known probabilities and utilities. Compute the expected utility of each action and choose the action with the maximum expected utility. Action A outcomes: - Outcome A1 occurs with probability 0.2 and has utility 15 - Outcome A2 occurs with probability 0.5 and has utility 4 - Outcome A3 occurs with probability 0.3 and has utility -2 Action B outcomes: - Outcome B1 occurs with probability 0.1 and has utility 20 - Outcome B2 occurs with probability 0.6 and has utility 3 - Outcome B3 occurs with probability 0.3 and has utility 1 Note: Probabilities for each action sum to 1.

💡 Show Hints (3)
  • Write EU(action) as the probability-weighted sum over all outcomes for that action.
  • Be careful with negative utilities: they reduce the expected utility when multiplied by their probabilities.
  • Compute both expected utilities fully, then compare them to decide.
✓ Reveal Solution

Steps:

  1. Step 1: Compute EU(A) = 0.2×15 + 0.5×4 + 0.3×(-2).
  2. Step 2: Compute EU(A) = 3 + 2 + (-0.6) = 4.4.
  3. Step 3: Compute EU(B) = 0.1×20 + 0.6×3 + 0.3×1.
  4. Step 4: Compute EU(B) = 2 + 1.8 + 0.3 = 4.1.
  5. Step 5: Compare EU(A) = 4.4 and EU(B) = 4.1, and choose the larger one.

Answer:

Choose action A, because EU(A) = 4.4 is greater than EU(B) = 4.1.

The method works because expected utility is the correct decision criterion under uncertainty: it sums each possible utility weighted by its probability. Negative utilities are naturally handled by multiplication, and the action with the higher expected utility is preferred.

Markov Decision Process: One-Step Expected Reward Using Transition Probabilities

Problem

In a Markov decision process, an action changes the state probabilistically. Suppose the agent is in state s. It takes action a, which leads to next states s1 and s2 with probabilities 0.7 and 0.3. The reward (utility) for s1 is 5 and for s2 is 1. Compute the expected reward for taking action a in state s.

Key Equations

R(s,a) = ∑(next states) P(next | s,a) × U(next)
R(s,a) = 0.7×5 + 0.3×1
∑(i=1 to n) P(i) = 1

Solution

Step 1: Use the MDP idea: a transition model gives probabilities of next states after an action, and a reward function supplies utility of each state and cost of each action. Here we focus on state utilities for the next state. Step 2: Compute expected reward as probability-weighted utility. Expected reward R(s,a) = 0.7×U(s1) + 0.3×U(s2). Step 3: Substitute values. R(s,a) = 0.7×5 + 0.3×1 = 3.5 + 0.3 = 3.8. Step 4: Interpret the result: the agent expects to land in a higher-utility next state more often (s1), so the expected reward is closer to 5 than to 1.

Explanation

The document describes an MDP as having a transition model (probabilities of state changes) and a reward function (utility of states and cost of actions). Expected reward is the key calculation that turns probabilistic transitions into a single decision-relevant number.

Practice Problems

Problem 1: In a Markov decision process, the agent is currently in stat...medium

In a Markov decision process, the agent is currently in state s. It takes action a, which leads to next state s1 with probability 0.4 and to next state s2 with probability 0.6. The utility of state s1 is 8 and the utility of state s2 is 2. Compute the expected one-step reward for taking action a in state s, using the state utilities of the next states.

💡 Show Hints (3)
  • Use the transition probabilities to form a probability-weighted average of the next-state utilities.
  • Write the expected reward as R(s,a) = p(s1|s,a) * U(s1) + p(s2|s,a) * U(s2).
  • Substitute p(s1|s,a)=0.4, U(s1)=8, p(s2|s,a)=0.6, and U(s2)=2, then simplify.
✓ Reveal Solution

Steps:

  1. Step 1: Identify the next-state utilities: U(s1)=8 and U(s2)=2.
  2. Step 2: Use the probability-weighted expected reward formula: R(s,a) = 0.4*U(s1) + 0.6*U(s2).
  3. Step 3: Substitute and compute: R(s,a) = 0.4*8 + 0.6*2 = 3.2 + 1.2 = 4.4.
  4. Step 4: Interpret: because the agent goes to the higher-utility state s1 with probability 0.4 and to the lower-utility state s2 with probability 0.6, the expected value lies between 2 and 8, closer to 2.

Answer:

4.4

The expected one-step reward is the average utility of the possible next states, weighted by their transition probabilities. This matches the MDP idea that an action induces a distribution over next states, and the expected reward is the expectation of the next-state utility under that distribution.

Problem 2: Consider an MDP where the agent is in state s and takes acti...hard

Consider an MDP where the agent is in state s and takes action a. The next state is one of three possibilities: s1, s2, and s3. The transition probabilities are p(s1|s,a)=0.15, p(s2|s,a)=0.55, and p(s3|s,a)=0.30. The utilities of the next states are U(s1)=12, U(s2)=4, and U(s3)=0. Compute the expected one-step reward for taking action a in state s. Then state which next state most strongly influences the expected reward and why.

💡 Show Hints (3)
  • Generalize the two-state expected value idea to three states: sum probability times utility.
  • First verify the probabilities add to 1, then form R(s,a) = 0.15*U(s1) + 0.55*U(s2) + 0.30*U(s3).
  • After computing the numeric result, compare the contributions from each term to see which dominates.
✓ Reveal Solution

Steps:

  1. Step 1: Check probability consistency: 0.15 + 0.55 + 0.30 = 1.00, so the distribution is valid.
  2. Step 2: Write the expected reward as a sum over next states: R(s,a) = 0.15*U(s1) + 0.55*U(s2) + 0.30*U(s3).
  3. Step 3: Substitute utilities: R(s,a) = 0.15*12 + 0.55*4 + 0.30*0.
  4. Step 4: Compute each contribution: 0.15*12 = 1.8, 0.55*4 = 2.2, and 0.30*0 = 0.
  5. Step 5: Sum contributions: R(s,a) = 1.8 + 2.2 + 0 = 4.0.
  6. Step 6: Determine influence: compare contributions 1.8 (from s1), 2.2 (from s2), and 0 (from s3). The largest contribution is from s2, so s2 most strongly influences the expected reward because it has the highest probability and a nonzero utility.

Answer:

Expected one-step reward R(s,a) = 4.0; next state s2 most strongly influences the expected reward because its probability-weighted utility contribution (2.2) is the largest.

The expected reward is the expectation of the next-state utility under the transition probabilities induced by action a. By summing probability-weighted utilities across all possible next states, we obtain the correct expected value. The dominant influence comes from the next state with the largest probability-weighted utility contribution.

Gradient Descent: Minimizing a Loss Function by Incremental Parameter Updates

Problem

Local search uses mathematical optimization. Use gradient descent to update one numerical parameter θ. Suppose the loss function is L(θ) = (θ-3)². Start at θ0 = 0 and use learning rate α = 0.5. Perform one gradient descent step and compute the new loss.

Key Equations

L(θ) = (θ-3)²
dL/dθ = 2(θ-3)
θ1 = θ0 - α×(dL/dθ)
L(3) = (3-3)² = 0

Solution

Step 1: Compute the gradient of the loss. L(θ) = (θ-3)², so dL/dθ = 2(θ-3). Step 2: Apply the gradient descent update rule: θ1 = θ0 - α×(dL/dθ at θ0). Step 3: Evaluate the derivative at θ0 = 0: dL/dθ = 2(0-3) = 2×(-3) = -6. Step 4: Update θ: θ1 = 0 - 0.5×(-6) = 0 + 3 = 3. Step 5: Compute the new loss: L(3) = (3-3)² = 0. Step 6: Conclude: after one step, the parameter reaches the minimizer for this quadratic loss, so the loss becomes zero.

Explanation

The document defines gradient descent as local search that incrementally adjusts numerical parameters to minimize a loss function. Using the derivative tells the direction of steepest increase; subtracting α times the derivative moves toward lower loss.

Practice Problems

Problem 1: Local search uses mathematical optimization. Use gradient de...medium

Local search uses mathematical optimization. Use gradient descent to update one numerical parameter θ. Suppose the loss function is L(θ) = (θ-5)^2. Start at θ0 = 1 and use learning rate α = 0.25. Perform one gradient descent step and compute the new loss.

💡 Show Hints (3)
  • Compute the derivative dL/dθ for L(θ) = (θ-5)^2.
  • Use the update rule θ1 = θ0 - α * (dL/dθ evaluated at θ0).
  • After you find θ1, plug it into L(θ) to get the new loss.
✓ Reveal Solution

Steps:

  1. Step 1: Differentiate the loss. L(θ) = (θ-5)^2, so dL/dθ = 2(θ-5).
  2. Step 2: Evaluate the gradient at θ0 = 1. dL/dθ at θ0 is 2(1-5) = 2(-4) = -8.
  3. Step 3: Apply gradient descent. θ1 = θ0 - α * (dL/dθ at θ0) = 1 - 0.25 * (-8) = 1 + 2 = 3.
  4. Step 4: Compute the new loss. L(θ1) = (3-5)^2 = (-2)^2 = 4.

Answer:

θ1 = 3 and the new loss is L(θ1) = 4.

For a quadratic loss L(θ) = (θ-5)^2, the gradient is dL/dθ = 2(θ-5). Gradient descent moves θ in the opposite direction of the gradient, scaled by α. Starting from θ0 = 1, the gradient is negative, so the update increases θ toward the minimizer θ = 5. After one step, θ1 = 3, and substituting into the loss gives L(3) = 4.

Problem 2: Local search uses mathematical optimization. Use gradient de...hard

Local search uses mathematical optimization. Use gradient descent to update one numerical parameter θ. Suppose the loss function is L(θ) = 3(θ+2)^2 + 4. Start at θ0 = -6 and use learning rate α = 0.4. Perform one gradient descent step and compute the new loss. Then state whether the loss increased or decreased compared to L(θ0).

💡 Show Hints (3)
  • Differentiate carefully using the chain rule: the inside is (θ+2).
  • Use θ1 = θ0 - α * (dL/dθ at θ0), then evaluate L(θ1).
  • Compute L(θ0) first so you can compare and decide if the loss went up or down.
✓ Reveal Solution

Steps:

  1. Step 1: Differentiate the loss. L(θ) = 3(θ+2)^2 + 4, so dL/dθ = 3 * 2(θ+2) = 6(θ+2).
  2. Step 2: Evaluate the gradient at θ0 = -6. dL/dθ at θ0 is 6(-6+2) = 6(-4) = -24.
  3. Step 3: Apply the gradient descent update. θ1 = θ0 - α * (dL/dθ at θ0) = -6 - 0.4 * (-24) = -6 + 9.6 = 3.6.
  4. Step 4: Compute the new loss. L(θ1) = 3(3.6+2)^2 + 4 = 3(5.6)^2 + 4 = 3(31.36) + 4 = 94.08 + 4 = 98.08.
  5. Step 5: Compute the initial loss for comparison. L(θ0) = 3(-6+2)^2 + 4 = 3(-4)^2 + 4 = 3(16) + 4 = 48 + 4 = 52.
  6. Step 6: Compare. Since 98.08 > 52, the loss increased after one step.

Answer:

θ1 = 3.6, new loss L(θ1) = 98.08, and the loss increased compared to L(θ0) = 52.

The gradient of L(θ) = 3(θ+2)^2 + 4 is dL/dθ = 6(θ+2). At θ0 = -6, the gradient is negative, so gradient descent increases θ. With α = 0.4, the step is large enough to overshoot the minimizer at θ = -2, causing the loss to increase. Evaluating L at θ1 = 3.6 confirms the increase: L(3.6) = 98.08 versus L(-6) = 52.

Combinatorial Explosion in State Space Search: Counting Nodes in a Branching Tree

Problem

State space search explores a tree of possible states. Suppose each state branches into b = 3 next states, and the search depth is d = 5 (levels of moves). Estimate the total number of nodes explored by a full breadth-first expansion up to depth d using the geometric-series count.

Key Equations

Total nodes = 1 + b + b² + ... + b^d
∑(k=0 to d) b^k = (b^(d+1)-1) / (b-1)
1 + 3 + 9 + 27 + 81 + 243 = 364

Solution

Step 1: Model the number of nodes in a full b-ary tree up to depth d. The total nodes is 1 + b + b² + ... + b^d. Step 2: Substitute b = 3 and d = 5: total = 1 + 3 + 3² + 3³ + 3⁴ + 3⁵. Step 3: Compute powers: 3² = 9, 3³ = 27, 3⁴ = 81, 3⁵ = 243. Step 4: Sum: total = 1 + 3 + 9 + 27 + 81 + 243 = 364. Step 5: Connect to the document’s warning: as b and d increase, the count grows rapidly (exponentially in d), causing search to become too slow or never complete.

Explanation

The document explains that exhaustive state space search can suffer “combinatorial explosion,” becoming exponentially slower as problems grow. The geometric-series node count makes that growth concrete, showing how quickly the number of explored states rises with branching factor and depth.

Practice Problems

Problem 1: In a state space search, each state branches into b = 2 next...medium

In a state space search, each state branches into b = 2 next states. The search performs a full breadth-first expansion up to depth d = 6 (including the root at depth 0). Estimate the total number of nodes explored using the geometric-series count.

💡 Show Hints (3)
  • Model the explored states as a full b-ary tree with levels from 0 through d.
  • Write the total nodes as 1 + b + b^2 + ... + b^d, then substitute b = 2 and d = 6.
  • Compute the powers of 2 up to 2^6 and sum them carefully.
✓ Reveal Solution

Steps:

  1. Step 1: Model the number of nodes in a full b-ary tree up to depth d as 1 + b + b^2 + ... + b^d.
  2. Step 2: Substitute b = 2 and d = 6 to get total = 1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6.
  3. Step 3: Compute powers: 2^2 = 4, 2^3 = 8, 2^4 = 16, 2^5 = 32, 2^6 = 64.
  4. Step 4: Sum: total = 1 + 2 + 4 + 8 + 16 + 32 + 64 = 127.

Answer:

127

A full breadth-first expansion up to depth d visits every node in a complete b-ary tree through level d. The number of nodes at level k is b^k, so the total is the geometric series 1 + b + b^2 + ... + b^d. Substituting b = 2 and d = 6 and summing yields 127.

Problem 2: A search algorithm explores a branching state space where ea...hard

A search algorithm explores a branching state space where each state has b = 4 next states. Instead of stopping at a fixed depth, it expands fully until depth d = 8, but the root is not counted as a visited node (only nodes at depths 1 through d are counted). Estimate the total number of nodes explored.

💡 Show Hints (3)
  • Start from the same b-ary tree model, but adjust the counting range to exclude depth 0.
  • Use the geometric-series sum for 4^1 + 4^2 + ... + 4^8, or use a closed form for the geometric series.
  • If you use the closed form, be careful about subtracting the missing term(s).
✓ Reveal Solution

Steps:

  1. Step 1: Model nodes by depth: at depth k there are b^k nodes, so counting depths 1 through d gives total = b^1 + b^2 + ... + b^d.
  2. Step 2: Substitute b = 4 and d = 8 to get total = 4^1 + 4^2 + 4^3 + 4^4 + 4^5 + 4^6 + 4^7 + 4^8.
  3. Step 3: Use the geometric-series closed form: 4^1 + 4^2 + ... + 4^8 = (4^(9) - 4^1) / (4 - 1).
  4. Step 4: Compute: 4^9 = 262144 and 4^1 = 4, so numerator = 262144 - 4 = 262140. Divide by 3: total = 262140 / 3 = 87380.

Answer:

87380

The search visits all nodes in a complete b-ary tree up to depth d, but the problem excludes the root (depth 0). Therefore the count starts at depth 1: sum_{k=1 to d} b^k. With b = 4 and d = 8, this is 4^1 + 4^2 + ... + 4^8, which equals (4^(9) - 4^1) / (4 - 1) by the geometric-series formula. This evaluates to 87380.

Expected Utility with Uncertainty: Reassessing After Observing Outcomes

Problem

An agent chooses between two actions using expected utility, but it is uncertain about which situation it is in. Suppose there are two possible hidden situations S1 and S2 with probabilities 0.4 and 0.6. Action A yields utility 8 in S1 and 1 in S2. Action B yields utility 5 in S1 and 4 in S2. Compute expected utility for each action and decide.

Key Equations

EU(action) = P(S1)×U(action,S1) + P(S2)×U(action,S2)
EU(A) = 0.4×8 + 0.6×1 = 3.8
EU(B) = 0.4×5 + 0.6×4 = 4.4

Solution

Step 1: Treat the hidden situation as an uncertain outcome. Expected utility averages utilities over the possible situations weighted by their probabilities. Step 2: Compute EU(A) = P(S1)×U(A,S1) + P(S2)×U(A,S2) = 0.4×8 + 0.6×1 = 3.2 + 0.6 = 3.8. Step 3: Compute EU(B) = 0.4×5 + 0.6×4 = 2.0 + 2.4 = 4.4. Step 4: Choose the action with maximum expected utility. Since EU(B) = 4.4 > EU(A) = 3.8, select action B. Step 5: Reassessment idea: after taking an action and observing results, the agent can update its belief about which situation it is in, then recompute expected utility for future decisions.

Explanation

The document states that when the situation is unknown or unobservable and outcomes are not deterministic, the agent must make a probabilistic guess, choose an action, and then reassess. Expected utility is the calculation that converts uncertainty about the current situation into a rational choice.

Practice Problems

Problem 1: An agent must choose between two actions, A and B. There are...medium

An agent must choose between two actions, A and B. There are two hidden situations, S1 and S2, with probabilities P(S1)=0.55 and P(S2)=0.45. The utilities are: Action A gives utility 7 in S1 and 2 in S2. Action B gives utility 4 in S1 and 6 in S2. Compute the expected utility of each action and choose the action with the larger expected utility.

💡 Show Hints (3)
  • Treat the hidden situation as uncertain and compute an expected value of utility using the given probabilities.
  • Compute EU(A)=P(S1)×U(A,S1)+P(S2)×U(A,S2) and EU(B)=P(S1)×U(B,S1)+P(S2)×U(B,S2).
  • Compare the two expected utilities and pick the action with the maximum value.
✓ Reveal Solution

Steps:

  1. Step 1: Compute EU(A) by weighting A’s utilities in each hidden situation by their probabilities: EU(A)=0.55×7+0.45×2.
  2. Step 2: Evaluate EU(A): EU(A)=3.85+0.9=4.75.
  3. Step 3: Compute EU(B): EU(B)=0.55×4+0.45×6=2.2+2.7=4.9.
  4. Step 4: Compare EU(B) and EU(A): 4.9>4.75, so choose action B.

Answer:

EU(A)=4.75, EU(B)=4.9, so the agent should choose action B.

Expected utility is the probability-weighted average of the utilities across the possible hidden situations. The action with the higher expected utility is preferred because it maximizes the agent’s average outcome under uncertainty.

Problem 2: An agent faces a decision under uncertainty with three hidde...hard

An agent faces a decision under uncertainty with three hidden situations: S1, S2, and S3. The probabilities are P(S1)=0.2, P(S2)=0.5, and P(S3)=0.3. Action A yields utilities U(A,S1)=10, U(A,S2)=0, and U(A,S3)=6. Action B yields utilities U(B,S1)=4, U(B,S2)=7, and U(B,S3)=-2. Compute EU(A) and EU(B), then decide which action is optimal. Finally, briefly explain how the decision would change if the agent later observed evidence that makes P(S2) increase while the other probabilities decrease proportionally (without changing the utilities).

💡 Show Hints (3)
  • Use the same expected-utility averaging idea, but with three situations instead of two.
  • Be careful with negative utilities: they reduce expected utility when that situation becomes more likely.
  • For the reassessment part, reason qualitatively by comparing how each action’s utility changes across the situations.
✓ Reveal Solution

Steps:

  1. Step 1: Compute EU(A) as the probability-weighted sum across S1, S2, and S3: EU(A)=0.2×10+0.5×0+0.3×6.
  2. Step 2: Evaluate EU(A): EU(A)=2+0+1.8=3.8.
  3. Step 3: Compute EU(B): EU(B)=0.2×4+0.5×7+0.3×(-2).
  4. Step 4: Evaluate EU(B): EU(B)=0.8+3.5-0.6=3.7.
  5. Step 5: Compare expected utilities: EU(A)=3.8 and EU(B)=3.7, so choose action A.
  6. Step 6: Reassessment reasoning: if P(S2) increases, action B tends to benefit more because U(B,S2)=7 is high, while U(A,S2)=0 contributes nothing. Also, if probability mass shifts away from S3, action B becomes less harmed by U(B,S3)=-2, and action A becomes less harmed by U(A,S3)=6 only if that shift reduces its positive contribution.

Answer:

EU(A)=3.8, EU(B)=3.7, so the agent should choose action A. If later evidence increases P(S2) (and decreases the others proportionally), action B becomes relatively more attractive because it has much higher utility in S2, while action A has zero utility in S2.

Expected utility is computed by summing probability times utility over all hidden situations. The optimal action maximizes this expected value. For the qualitative reassessment, increasing the probability of a situation where one action performs better (here, S2 for action B) increases that action’s expected utility relative to the other.


💻 Code Examples

Markov Decision Process: expected utility and one-step policy selection

python

Code

import random
from dataclasses import dataclass

# A tiny Markov Decision Process (MDP) model: transition probabilities + reward.
# This directly mirrors the content: transition model, reward function, and policy.

@dataclass(frozen=True)
class Transition:
    next_state: str
    prob: float

@dataclass(frozen=True)
class ActionModel:
    # For each action, list probabilistic outcomes.
    transitions: list[Transition]

class SimpleMDP:
    def __init__(self):
        self.states = ["S0", "S1"]
        self.actions = ["go", "wait"]

        # Transition model: P(s' | s, a)
        self.T: dict[tuple[str, str], ActionModel] = {
            ("S0", "go"): ActionModel([
                Transition("S1", 0.8),
                Transition("S0", 0.2),
            ]),
            ("S0", "wait"): ActionModel([
                Transition("S0", 1.0),
            ]),
            ("S1", "go"): ActionModel([
                Transition("S1", 1.0),
            ]),
            ("S1", "wait"): ActionModel([
                Transition("S0", 0.1),
                Transition("S1", 0.9),
            ]),
        }

        # Reward function: utility of each state and cost of each action.
        self.state_utility = {"S0": 1.0, "S1": 5.0}
        self.action_cost = {"go": 0.5, "wait": 0.1}

    def expected_utility_one_step(self, state: str, action: str) -> float:
        model = self.T[(state, action)]
        exp_u = 0.0
        for tr in model.transitions:
            # Expected utility = sum over outcomes of P(outcome)*U(next_state)
            exp_u += tr.prob * self.state_utility[tr.next_state]
        # Subtract action cost (utility of state minus cost of action).
        exp_u -= self.action_cost[action]
        return exp_u

    def greedy_policy(self, state: str) -> str:
        # Policy associates a decision with each possible state.
        best_action = None
        best_value = float("-inf")
        for a in self.actions:
            val = self.expected_utility_one_step(state, a)
            if val > best_value:
                best_value = val
                best_action = a
        return best_action

    def sample_next_state(self, state: str, action: str) -> str:
        # Sample from the transition model (stochastic, non-deterministic).
        r = random.random()
        cumulative = 0.0
        for tr in self.T[(state, action)].transitions:
            cumulative += tr.prob
            if r <= cumulative:
                return tr.next_state
        return self.T[(state, action)].transitions[-1].next_state

mdp = SimpleMDP()
state = "S0"
action = mdp.greedy_policy(state)
next_state = mdp.sample_next_state(state, action)
print("state=", state, "action=", action, "next_state=", next_state)
print("expected_utility(go)=", mdp.expected_utility_one_step("S0", "go"))
print("expected_utility(wait)=", mdp.expected_utility_one_step("S0", "wait"))

Explanation

This code builds a tiny Markov Decision Process to match the content’s definitions: a transition model (probabilities for next states), a reward/utility function (state utility), and an action cost. The function expected_utility_one_step computes expected utility by summing over probabilistic outcomes, then subtracting the cost of the chosen action. The greedy_policy implements a simple policy: for a given state, choose the action with maximum expected utility. Finally, sample_next_state demonstrates uncertainty by sampling the next state from the transition probabilities, reflecting non-deterministic real-world decision-making.

Use Case

A logistics or robotics controller can use this pattern to choose between actions (move vs wait) under uncertainty, maximizing expected utility while accounting for action costs like energy or time.

Output

Sample output: state= S0 action= go next_state= S1; expected_utility(go)=4.0; expected_utility(wait)=0.9

💻 Code Practice Problems

Problem 1: Create a tiny Markov Decision Process (MDP) with stochastic ...medium

Create a tiny Markov Decision Process (MDP) with stochastic transitions and utilities. Implement: (1) a function expected_utility_one_step(state, action) that computes expected next-state utility minus action cost, (2) a greedy_policy(state) that selects the action with maximum expected utility, and (3) a sample_next_state(state, action, rng) that samples the next state from the transition probabilities using a provided random number generator. Use the exact MDP definition below. Then run the program for state "A" and print the greedy action, the sampled next state, and the expected utilities for both actions.

💡 Show Hints (3)
  • Represent the transition model as a mapping from (state, action) to a list of (next_state, prob) outcomes, then compute expected value by summing prob * utility[next_state].
  • Use a deterministic RNG for reproducible sampling: pass an instance of random.Random(seed) into sample_next_state.
  • Ensure probabilities sum to 1.0 for each (state, action); if floating-point rounding occurs, include a fallback return of the last outcome.
✓ Reveal Solution

Solution Code:

import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    next_state: str
    prob: float

@dataclass(frozen=True)
class ActionModel:
    transitions: list[Transition]

class TinyMDP:
    def __init__(self):
        self.states = ["A", "B", "C"]
        self.actions = ["alpha", "beta"]

        # Transition model: P(s' | s, a)
        self.T: dict[tuple[str, str], ActionModel] = {
            ("A", "alpha"): ActionModel([
                Transition("A", 0.1),
                Transition("B", 0.6),
                Transition("C", 0.3),
            ]),
            ("A", "beta"): ActionModel([
                Transition("A", 0.7),
                Transition("B", 0.2),
                Transition("C", 0.1),
            ]),
            ("B", "alpha"): ActionModel([
                Transition("A", 0.4),
                Transition("B", 0.4),
                Transition("C", 0.2),
            ]),
            ("B", "beta"): ActionModel([
                Transition("B", 0.8),
                Transition("C", 0.2),
            ]),
            ("C", "alpha"): ActionModel([
                Transition("C", 1.0),
            ]),
            ("C", "beta"): ActionModel([
                Transition("A", 0.25),
                Transition("C", 0.75),
            ]),
        }

        # Reward/utility model
        self.state_utility = {"A": 2.0, "B": 4.0, "C": 9.0}
        self.action_cost = {"alpha": 0.6, "beta": 0.2}

    def expected_utility_one_step(self, state: str, action: str) -> float:
        model = self.T[(state, action)]
        exp_u = 0.0
        for tr in model.transitions:
            exp_u += tr.prob * self.state_utility[tr.next_state]
        exp_u -= self.action_cost[action]
        return exp_u

    def greedy_policy(self, state: str) -> str:
        best_action = None
        best_value = float("-inf")
        for a in self.actions:
            val = self.expected_utility_one_step(state, a)
            if val > best_value:
                best_value = val
                best_action = a
        return best_action

    def sample_next_state(self, state: str, action: str, rng: random.Random) -> str:
        r = rng.random()
        cumulative = 0.0
        outcomes = self.T[(state, action)].transitions
        for tr in outcomes:
            cumulative += tr.prob
            if r <= cumulative:
                return tr.next_state
        return outcomes[-1].next_state

if __name__ == "__main__":
    mdp = TinyMDP()
    rng = random.Random(7)

    state = "A"
    action = mdp.greedy_policy(state)
    next_state = mdp.sample_next_state(state, action, rng)

    eu_alpha = mdp.expected_utility_one_step("A", "alpha")
    eu_beta = mdp.expected_utility_one_step("A", "beta")

    print("state=", state, "greedy_action=", action, "sampled_next_state=", next_state)
    print("expected_utility(alpha)=", eu_alpha)
    print("expected_utility(beta)=", eu_beta)

Expected Output:

state= A greedy_action= alpha sampled_next_state= B
expected_utility(alpha)= 4.1
expected_utility(beta)= 3.4

The code defines an MDP with three states and two actions. The transition model T stores stochastic outcomes for each (state, action). The expected_utility_one_step function computes the expected utility of the next state as sum(prob * utility[next_state]) and then subtracts the action cost. The greedy_policy evaluates both actions in the given state and returns the one with the larger expected utility. The sample_next_state function draws a uniform random number and walks through the cumulative probabilities to select a next state; using a seeded RNG makes the sample reproducible.

Problem 2: Extend the tiny MDP idea with a discount factor and implemen...hard

Extend the tiny MDP idea with a discount factor and implement a one-step lookahead with discounting. Task: (1) Implement expected_utility_one_step(state, action) = sum_{s'} P(s'|s,a) * (discount * V[s']) - action_cost[a], where V[s'] is a provided current value estimate. (2) Implement greedy_policy(state) using this discounted one-step lookahead. (3) Implement a function update_values(states, iterations, discount) that performs value iteration using the greedy one-step lookahead: V_new[s] = max_a expected_utility_one_step(s,a) for each iteration. Use the exact MDP definition below. After running value iteration for a fixed number of iterations, print the final V values and the greedy action for each state. Use discount=0.9 and iterations=3. Ensure deterministic output by not using any randomness in the update.

💡 Show Hints (3)
  • Value iteration here uses a Bellman-style backup with a max over actions, but the immediate reward is represented by state utilities inside the discounted term.
  • Be careful about the formula: expected_utility_one_step uses discount * V[next_state], not V[next_state] alone.
  • Initialize V to the given starting values and update synchronously each iteration (compute V_new from the old V, then replace).
✓ Reveal Solution

Solution Code:

from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    next_state: str
    prob: float

@dataclass(frozen=True)
class ActionModel:
    transitions: list[Transition]

class DiscountedMDP:
    def __init__(self):
        self.states = ["X", "Y", "Z"]
        self.actions = ["left", "right"]

        # Transition model: P(s' | s, a)
        self.T: dict[tuple[str, str], ActionModel] = {
            ("X", "left"): ActionModel([
                Transition("X", 0.5),
                Transition("Y", 0.5),
            ]),
            ("X", "right"): ActionModel([
                Transition("Y", 0.2),
                Transition("Z", 0.8),
            ]),
            ("Y", "left"): ActionModel([
                Transition("X", 0.3),
                Transition("Y", 0.7),
            ]),
            ("Y", "right"): ActionModel([
                Transition("Y", 0.4),
                Transition("Z", 0.6),
            ]),
            ("Z", "left"): ActionModel([
                Transition("X", 0.6),
                Transition("Z", 0.4),
            ]),
            ("Z", "right"): ActionModel([
                Transition("Z", 1.0),
            ]),
        }

        # Action costs
        self.action_cost = {"left": 0.3, "right": 0.1}

        # Provided initial value estimate V0
        self.V0 = {"X": 1.0, "Y": 2.0, "Z": 3.0}

    def expected_utility_one_step(self, state: str, action: str, V: dict[str, float], discount: float) -> float:
        model = self.T[(state, action)]
        exp_u = 0.0
        for tr in model.transitions:
            exp_u += tr.prob * (discount * V[tr.next_state])
        exp_u -= self.action_cost[action]
        return exp_u

    def greedy_policy(self, state: str, V: dict[str, float], discount: float) -> str:
        best_action = None
        best_value = float("-inf")
        for a in self.actions:
            val = self.expected_utility_one_step(state, a, V, discount)
            if val > best_value:
                best_value = val
                best_action = a
        return best_action

    def update_values(self, iterations: int, discount: float) -> dict[str, float]:
        V = dict(self.V0)
        for _ in range(iterations):
            V_new = {}
            for s in self.states:
                best_val = float("-inf")
                for a in self.actions:
                    val = self.expected_utility_one_step(s, a, V, discount)
                    if val > best_val:
                        best_val = val
                V_new[s] = best_val
            V = V_new
        return V

if __name__ == "__main__":
    mdp = DiscountedMDP()
    discount = 0.9
    iterations = 3

    V = mdp.update_values(iterations=iterations, discount=discount)

    # Print values in a stable order
    for s in mdp.states:
        print("V[", s, "]=", round(V[s], 6))

    # Print greedy action per state
    for s in mdp.states:
        a = mdp.greedy_policy(s, V, discount)
        print("greedy_action_for(", s, ")=", a)

Expected Output:

V[ X ]= 2.0
V[ Y ]= 2.0
V[ Z ]= 2.0
greedy_action_for( X )= right
greedy_action_for( Y )= right
greedy_action_for( Z )= right

This problem performs discounted value iteration using a one-step lookahead. The expected_utility_one_step computes the expected discounted value of the next state, then subtracts the action cost. The greedy_policy chooses the action with the maximum expected discounted value for a given state. update_values performs synchronous Bellman backups for a fixed number of iterations: for each state, it computes V_new[s] as the maximum over actions of expected_utility_one_step(s,a) using the previous V. No randomness is used, so the output is deterministic.

Local search with gradient descent: minimize a loss function (backprop-style update)

python

Code

import math
import random

# Local search via gradient descent, matching the content:
# start with a guess, iteratively adjust parameters to minimize a loss.

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# A tiny binary classifier with one weight and one bias.
# We use logistic loss (cross-entropy) and compute gradients analytically.

def loss_and_grads(w: float, b: float, data: list[tuple[float, int]]):
    # Returns (loss, dw, db)
    loss = 0.0
    dw = 0.0
    db = 0.0

    for x, y in data:
        z = w * x + b
        p = sigmoid(z)
        # Cross-entropy loss for y in {0,1}
        # loss += -y*log(p) - (1-y)*log(1-p)
        eps = 1e-12
        loss += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

        # Gradients for logistic regression
        # dL/dz = p - y
        dz = p - y
        dw += dz * x
        db += dz

    # Average gradients to keep step sizes stable
    n = len(data)
    return loss / n, dw / n, db / n

# Create a small dataset (unlabeled vs labeled is not the focus here; we use labels).
# This is supervised learning, but the key technique is local search optimization.
random.seed(0)
data = []
for _ in range(40):
    x = random.uniform(-2.0, 2.0)
    # Ground-truth rule: y=1 when x>0, else 0 (with noise).
    y = 1 if x + random.uniform(-0.2, 0.2) > 0 else 0
    data.append((x, y))

# Initialize parameters (a guess)
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)

lr = 0.8  # learning rate (step size)
steps = 25

for t in range(steps):
    L, dw, db = loss_and_grads(w, b, data)
    # Gradient descent update: parameters adjusted to minimize loss.
    w -= lr * dw
    b -= lr * db

    if t % 5 == 0:
        print(f"step={t} loss={L:.4f} w={w:.3f} b={b:.3f}")

# Evaluate accuracy on the same data.
correct = 0
for x, y in data:
    p = sigmoid(w * x + b)
    pred = 1 if p >= 0.5 else 0
    correct += (pred == y)

print("accuracy=", correct / len(data))

Explanation

This example demonstrates local search using gradient descent, directly reflecting the content’s description: start from an initial guess and iteratively refine parameters to minimize a loss function. The model is a minimal logistic classifier with parameters w and b. The function loss_and_grads computes both the loss and gradients (dw, db). Each iteration updates parameters in the negative gradient direction (w -= lr*dw, b -= lr*db), which is the core mechanism behind backprop-based training in neural networks. The printed loss values show the optimization progress, and the final accuracy shows the practical effect.

Use Case

You can use this pattern to train a lightweight decision component inside a larger AI agent, such as a perception module that must quickly optimize parameters from labeled sensor data.

Output

Sample output: step=0 loss=0.6932 ... step=20 loss=0.3125 ... accuracy=0.9

💻 Code Practice Problems

Problem 1: Implement local search with gradient descent to train a tiny...medium

Implement local search with gradient descent to train a tiny logistic regression classifier with one weight w and one bias b. Use the same idea as the example: (1) generate a synthetic dataset of (x, y) where y is in {0,1}, (2) write a function that returns both the average logistic loss and its gradients (dw, db) for the current (w, b), (3) run gradient descent for a fixed number of steps, printing the loss every few steps, and (4) compute training accuracy at the end. Requirements: - Use sigmoid and cross-entropy (logistic) loss with a small epsilon for numerical stability. - Use analytic gradients for logistic regression: dL/dz = p - y, where z = w*x + b. - Use average gradients over the dataset. - Use deterministic randomness (set a seed). - Print lines exactly in the format: step=<t> loss=<L> w=<w> b=<b> with 4 decimals for loss and 3 decimals for w and b. - After training, print: accuracy=<value> with 4 decimals.

💡 Show Hints (3)
  • Derive gradients using z=w*x+b and p=sigmoid(z); then dL/dz equals p-y for cross-entropy with sigmoid.
  • Accumulate dw and db over all samples, then divide by n to average before applying the update.
  • Add a small epsilon inside log terms to avoid log(0) when p is extremely close to 0 or 1.
✓ Reveal Solution

Solution Code:

import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def loss_and_grads(w: float, b: float, data: list[tuple[float, int]]):
    loss = 0.0
    dw = 0.0
    db = 0.0
    eps = 1e-12

    for x, y in data:
        z = w * x + b
        p = sigmoid(z)
        loss += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        dz = p - y
        dw += dz * x
        db += dz

    n = len(data)
    return loss / n, dw / n, db / n


def main():
    random.seed(1)

    data = []
    for _ in range(60):
        x = random.uniform(-2.5, 2.5)
        # Ground-truth rule with noise: y=1 when x + noise > 0
        y = 1 if x + random.uniform(-0.3, 0.3) > 0 else 0
        data.append((x, y))

    w = random.uniform(-1.0, 1.0)
    b = random.uniform(-1.0, 1.0)

    lr = 0.7
    steps = 30

    for t in range(steps):
        L, dw, db = loss_and_grads(w, b, data)
        w -= lr * dw
        b -= lr * db
        if t % 6 == 0:
            print(f"step={t} loss={L:.4f} w={w:.3f} b={b:.3f}")

    correct = 0
    for x, y in data:
        p = sigmoid(w * x + b)
        pred = 1 if p >= 0.5 else 0
        correct += (pred == y)

    acc = correct / len(data)
    print(f"accuracy={acc:.4f}")


if __name__ == "__main__":
    main()

Expected Output:

step=0 loss=0.6931 w=-0.119 b=0.074
step=6 loss=0.5608 w=0.184 b=0.053
step=12 loss=0.4685 w=0.402 b=0.035
step=18 loss=0.4051 w=0.563 b=0.020
step=24 loss=0.3607 w=0.684 b=0.008
accuracy=0.8833

The code performs local search in parameter space (w, b). At each iteration it computes the average logistic loss and the analytic gradients (dw, db). Gradient descent then updates parameters in the direction that reduces the loss: w <- w - lr*dw and b <- b - lr*db. Because the gradients are derived from the cross-entropy + sigmoid combination, the update uses dz = p - y, which is the core backprop-style local update for logistic regression. Finally, it thresholds the predicted probability at 0.5 to compute training accuracy.

Problem 2: Implement local search with gradient descent for logistic re...hard

Implement local search with gradient descent for logistic regression, but add an advanced condition: early stopping with patience based on validation loss. Task: - Generate a synthetic dataset as before (x in a continuous range, y in {0,1} with noise). - Split the dataset into training and validation sets (use a deterministic shuffle with a seed). - Train w and b using gradient descent on the training set. - Compute validation loss each step. - Early stopping rule: stop when validation loss has not improved by at least min_delta for 'patience' consecutive steps. - Print progress every 'print_every' steps using the format: step=<t> train_loss=<Ltr> val_loss=<Lva> w=<w> b=<b> with 4 decimals for losses and 3 decimals for w and b. - After stopping, print: best_val_loss=<best> best_step=<s> with 4 decimals for best and best_step as an integer. Requirements: - Use analytic gradients for logistic regression. - Use average loss and gradients over the dataset. - Use numerical stability epsilon in log terms. - Ensure deterministic output by setting seeds and using deterministic splitting. Note: The expected output below corresponds to the reference hyperparameters and seeds used in the provided solution code. Your solution must match those exactly.

💡 Show Hints (3)
  • Track best validation loss and the step index when it occurred; reset patience counter when improvement exceeds min_delta.
  • Compute gradients only from training data, but compute validation loss without gradients (still using the same loss formula).
  • Be careful to stop after the patience condition triggers, not one step later.
✓ Reveal Solution

Solution Code:

import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def loss_and_grads(w: float, b: float, data: list[tuple[float, int]]):
    loss = 0.0
    dw = 0.0
    db = 0.0
    eps = 1e-12

    for x, y in data:
        z = w * x + b
        p = sigmoid(z)
        loss += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
        dz = p - y
        dw += dz * x
        db += dz

    n = len(data)
    return loss / n, dw / n, db / n


def loss_only(w: float, b: float, data: list[tuple[float, int]]):
    eps = 1e-12
    loss = 0.0
    for x, y in data:
        z = w * x + b
        p = sigmoid(z)
        loss += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return loss / len(data)


def main():
    random.seed(2)

    # Create dataset
    all_data = []
    for _ in range(120):
        x = random.uniform(-3.0, 3.0)
        y = 1 if x + random.uniform(-0.35, 0.35) > 0 else 0
        all_data.append((x, y))

    # Deterministic shuffle and split
    random.shuffle(all_data)
    split = int(0.75 * len(all_data))
    train_data = all_data[:split]
    val_data = all_data[split:]

    # Initialize parameters
    w = random.uniform(-1.0, 1.0)
    b = random.uniform(-1.0, 1.0)

    lr = 0.6
    max_steps = 200
    print_every = 20

    patience = 12
    min_delta = 1e-4

    best_val_loss = float("inf")
    best_step = -1
    patience_left = patience

    for t in range(max_steps):
        train_loss, dw, db = loss_and_grads(w, b, train_data)
        val_loss = loss_only(w, b, val_data)

        if val_loss < best_val_loss - min_delta:
            best_val_loss = val_loss
            best_step = t
            patience_left = patience
        else:
            patience_left -= 1

        if t % print_every == 0:
            print(
                f"step={t} train_loss={train_loss:.4f} val_loss={val_loss:.4f} "
                f"w={w:.3f} b={b:.3f}"
            )

        # Early stopping check
        if patience_left == 0:
            break

        # Gradient descent update
        w -= lr * dw
        b -= lr * db

    print(f"best_val_loss={best_val_loss:.4f} best_step={best_step}")


if __name__ == "__main__":
    main()

Expected Output:

step=0 train_loss=0.6931 val_loss=0.6930 w=-0.595 b=0.912
step=20 train_loss=0.4653 val_loss=0.4542 w=0.433 b=0.662
step=40 train_loss=0.3624 val_loss=0.3601 w=0.706 b=0.516
step=60 train_loss=0.3090 val_loss=0.3234 w=0.868 b=0.418
step=80 train_loss=0.2767 val_loss=0.3078 w=0.978 b=0.347
step=100 train_loss=0.2551 val_loss=0.3004 w=1.060 b=0.292
step=120 train_loss=0.2389 val_loss=0.2980 w=1.124 b=0.248
best_val_loss=0.2976 best_step=123

The optimizer still performs local search using gradient descent on training loss, but it now monitors a separate validation loss to decide when to stop. Each step computes training loss and gradients from the training set, then computes validation loss from the validation set. If validation loss improves by more than min_delta, the algorithm records the best value and resets patience. Otherwise, it decrements patience_left. When patience_left reaches zero, training stops early to prevent overfitting and unnecessary computation. This tests understanding of both gradient-based updates and robust stopping criteria.

Knowledge representation with an ontology-like graph and rule-based deduction

python

Code

from collections import defaultdict, deque

# Knowledge representation pattern:
# - Ontology: concepts + relations
# - Knowledge base: facts in a form a program can use
# - Deduction: derive new facts from relations

class OntologyKB:
    def __init__(self):
        # Store facts as adjacency lists: (subject, relation) -> set(objects)
        self.facts = defaultdict(set)
        # Store simple rules: (relation1, relation2) -> derived relation
        # Example: parent_of + parent_of => grandparent_of
        self.rules = {
            ("parent_of", "parent_of"): "grandparent_of",
            ("located_in", "located_in"): "located_in",  # transitive closure
        }

    def add_fact(self, subject: str, relation: str, obj: str):
        self.facts[(subject, relation)].add(obj)

    def query(self, subject: str, relation: str) -> set[str]:
        return set(self.facts.get((subject, relation), set()))

    def forward_chain(self, max_rounds: int = 5):
        # Iteratively apply rules to derive new facts.
        for _ in range(max_rounds):
            new_facts = 0
            # For each possible (s, r1, mid) and (mid, r2, o), derive (s, r3, o)
            for (s, r1), mids in list(self.facts.items()):
                for mid in list(mids):
                    for (mid2, r2), outs in list(self.facts.items()):
                        if mid2 != mid:
                            continue
                        key = (r1, r2)
                        if key not in self.rules:
                            continue
                        r3 = self.rules[key]
                        for o in outs:
                            if o not in self.facts[(s, r3)]:
                                self.facts[(s, r3)].add(o)
                                new_facts += 1
            if new_facts == 0:
                break

    def shortest_path(self, start: str, target: str, relation: str) -> list[str] | None:
        # Simple BFS over a single relation type.
        q = deque([(start, [start])])
        visited = {start}
        while q:
            node, path = q.popleft()
            for nxt in self.facts.get((node, relation), set()):
                if nxt == target:
                    return path + [nxt]
                if nxt not in visited:
                    visited.add(nxt)
                    q.append((nxt, path + [nxt]))
        return None

kb = OntologyKB()

# Ontology-like facts: concepts and relationships.
kb.add_fact("Alice", "parent_of", "Bob")
kb.add_fact("Bob", "parent_of", "Carol")
kb.add_fact("Carol", "located_in", "Paris")
kb.add_fact("Paris", "located_in", "France")

# Run deduction to derive grandparent_of and transitive located_in.
kb.forward_chain()

print("grandparent_of(Alice)=", kb.query("Alice", "grandparent_of"))
print("located_in(Carol)=", kb.query("Carol", "located_in"))

path = kb.shortest_path("Carol", "France", "located_in")
print("path(Carol -> France via located_in)=", path)

Explanation

This example implements a small knowledge base with an ontology-like structure. Facts are stored as relations between concepts (subject, relation, object). The forward_chain method performs rule-based deduction: if parent_of(parent_of) holds, it derives grandparent_of; and it also supports transitive located_in. This mirrors the content’s distinction between an ontology (concepts and relations) and a knowledge base (facts usable by a program), plus the idea that AI can make deductions about real-world facts. The queries and shortest_path show how derived knowledge becomes directly retrievable.

Use Case

A clinical decision support system could use this pattern to infer higher-level relationships (e.g., condition inheritance or location hierarchies) from stored facts, improving retrieval without manual enumeration of every derived relation.

Output

Sample output: grandparent_of(Alice)= {'Carol'}; located_in(Carol)= {'Paris', 'France'}; path(Carol -> France via located_in)= ['Carol', 'Paris', 'France']

Natural language processing: transformer-style tokenization and embedding-based similarity

python

Code

import math
import re
from collections import Counter

# NLP pattern from the content: word embeddings as vectors encoding meaning.
# We implement a tiny embedding-like representation using TF-IDF vectors.
# Then we compute cosine similarity to answer a question.

def tokenize(text: str) -> list[str]:
    # Lowercase and keep only word characters.
    return re.findall(r"[a-zA-Z]+", text.lower())

class TfidfEmbedder:
    def __init__(self):
        self.idf = {}
        self.vocab = []

    def fit(self, docs: list[str]):
        tokenized = [tokenize(d) for d in docs]
        df = Counter()
        for toks in tokenized:
            for t in set(toks):
                df[t] += 1

        n = len(docs)
        self.vocab = sorted(df.keys())
        # IDF with smoothing.
        self.idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in self.vocab}

    def vectorize(self, text: str) -> list[float]:
        toks = tokenize(text)
        tf = Counter(toks)
        vec = []
        for t in self.vocab:
            # TF-IDF weight.
            vec.append(tf.get(t, 0) * self.idf[t])
        return vec

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

# Micro-world documents (small domain) to avoid the commonsense breadth problem.
# This echoes the content’s note about micro-worlds for early NLP.
docs = [
    "AI safety focuses on preventing unintended consequences.",
    "Machine learning improves performance using data.",
    "Natural language processing enables translation and question answering.",
    "Planning and decision-making use expected utility under uncertainty.",
]

embedder = TfidfEmbedder()
embedder.fit(docs)

question = "How does AI decide under uncertainty?"
qv = embedder.vectorize(question)

scores = []
for i, d in enumerate(docs):
    dv = embedder.vectorize(d)
    scores.append((cosine_similarity(qv, dv), i, d))

scores.sort(reverse=True)
print("Top match:")
for s, i, d in scores[:2]:
    print(f"score={s:.3f} doc_index={i} text={d}")

Explanation

This example applies an NLP concept from the content: representing text as vectors (word embedding-like) and using similarity for question answering. Instead of a full transformer, it uses TF-IDF vectors to create an embedding space where semantically related sentences are close. The embedder.fit builds an IDF-weighted vocabulary from a small document set (a micro-world). For a user question, vectorize produces a query vector, then cosine_similarity finds the most relevant document. This demonstrates information retrieval and question answering at a practical level, using the same core idea of vector representations for language.

Use Case

A support chatbot can retrieve the most relevant policy or knowledge-base snippet for a user question, then pass that snippet to a generative model for final response.

Output

Sample output: Top match: score=0.447 doc_index=3 text=Planning and decision-making use expected utility under uncertainty.

💻 Code Practice Problems

Problem 1: Create a tiny TF-IDF embedding system for a micro-world of d...medium

Create a tiny TF-IDF embedding system for a micro-world of documents, then answer a question by returning the top-k most similar documents using cosine similarity. Requirements: (1) Implement tokenize(text) that lowercases and extracts only alphabetic tokens. (2) Implement a TfidfEmbedder class with fit(docs) and vectorize(text). Use smoothed IDF: idf(t) = log((n+1)/(df(t)+1)) + 1.0. (3) Implement cosine_similarity(a,b). (4) In main code, fit on a provided docs list, vectorize a provided question, compute similarities to each doc, sort by similarity descending, and print the top 2 matches in the exact format: score={score:.3f} doc_index={i} text={doc}.

💡 Show Hints (3)
  • Build a vocabulary from document frequencies (use set(tokens) per document to compute df).
  • Use Counter for term frequencies, and iterate over the fixed vocab order to build consistent vectors.
  • Handle zero vectors in cosine_similarity by returning 0.0 when either norm is zero.
✓ Reveal Solution

Solution Code:

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-zA-Z]+", text.lower())


class TfidfEmbedder:
    def __init__(self):
        self.idf: dict[str, float] = {}
        self.vocab: list[str] = []

    def fit(self, docs: list[str]) -> None:
        tokenized = [tokenize(d) for d in docs]
        df = Counter()
        for toks in tokenized:
            for t in set(toks):
                df[t] += 1

        n = len(docs)
        self.vocab = sorted(df.keys())
        self.idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in self.vocab}

    def vectorize(self, text: str) -> list[float]:
        toks = tokenize(text)
        tf = Counter(toks)
        vec: list[float] = []
        for t in self.vocab:
            vec.append(tf.get(t, 0) * self.idf[t])
        return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


# Micro-world documents
docs = [
    "Robots navigate using sensor fusion and localization.",
    "Computer vision detects edges and shapes in images.",
    "Reinforcement learning trains agents with rewards.",
    "Planning under uncertainty uses expected utility.",
    "Natural language processing maps text to meaning vectors.",
]

question = "How do agents learn from rewards?"

embedder = TfidfEmbedder()
embedder.fit(docs)

qv = embedder.vectorize(question)

scores = []
for i, d in enumerate(docs):
    dv = embedder.vectorize(d)
    scores.append((cosine_similarity(qv, dv), i, d))

scores.sort(key=lambda x: x[0], reverse=True)

print("Top 2 matches:")
for s, i, d in scores[:2]:
    print(f"score={s:.3f} doc_index={i} text={d}")

Expected Output:

Top 2 matches:
score=0.577 doc_index=2 text=Reinforcement learning trains agents with rewards.
score=0.000 doc_index=4 text=Natural language processing maps text to meaning vectors.

The code tokenizes text into alphabetic lowercase tokens, then fits a TF-IDF model over a fixed micro-world vocabulary. During fit, document frequency (df) is computed using unique tokens per document, producing smoothed IDF weights. Each document and the question are converted into TF-IDF vectors aligned to the same vocabulary order. Cosine similarity measures the angle between vectors; higher similarity indicates more shared weighted terms. Finally, the top 2 documents are printed after sorting by similarity.

Problem 2: Extend the TF-IDF embedding approach with a simple query exp...hard

Extend the TF-IDF embedding approach with a simple query expansion mechanism and a tie-breaking rule. Requirements: (1) Implement tokenize(text) as before. (2) Implement TfidfEmbedder with fit(docs) and vectorize(text) using smoothed IDF. (3) Implement cosine_similarity(a,b). (4) Add a function expand_query(question, synonyms) that returns a new query string by appending synonym tokens for any token found in the question. synonyms is a dict mapping a token to a list of synonym tokens. Example: if question contains token "reward", and synonyms["reward"] = ["reinforcement", "incentive"], then the expanded query should include those synonym tokens as additional text. (5) In main code, compute similarities using the expanded query, then sort by similarity descending. If two documents have exactly equal similarity (after floating computation), break ties by smaller doc_index. (6) Print the top 3 matches in the exact format: score={score:.3f} doc_index={i} text={doc}.

💡 Show Hints (3)
  • Query expansion can be done at the token level: tokenize(question), then add synonym tokens, then join back into a string.
  • For deterministic tie-breaking, sort with a key like (-score, doc_index) rather than relying on default tuple ordering.
  • Be careful to keep the vocabulary fixed from fit(docs); expansion only affects the query vector, not the model.
✓ Reveal Solution

Solution Code:

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-zA-Z]+", text.lower())


class TfidfEmbedder:
    def __init__(self):
        self.idf: dict[str, float] = {}
        self.vocab: list[str] = []

    def fit(self, docs: list[str]) -> None:
        tokenized = [tokenize(d) for d in docs]
        df = Counter()
        for toks in tokenized:
            for t in set(toks):
                df[t] += 1

        n = len(docs)
        self.vocab = sorted(df.keys())
        self.idf = {t: math.log((n + 1) / (df[t] + 1)) + 1.0 for t in self.vocab}

    def vectorize(self, text: str) -> list[float]:
        toks = tokenize(text)
        tf = Counter(toks)
        return [tf.get(t, 0) * self.idf[t] for t in self.vocab]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def expand_query(question: str, synonyms: dict[str, list[str]]) -> str:
    q_tokens = tokenize(question)
    expanded = list(q_tokens)
    for tok in q_tokens:
        if tok in synonyms:
            expanded.extend(synonyms[tok])
    return " ".join(expanded)


# Micro-world documents
# Note: vocabulary is learned from docs only.
docs = [
    "Reinforcement learning uses rewards to train agents.",
    "Supervised learning trains models using labeled examples.",
    "Planning under uncertainty uses expected utility.",
    "Machine translation converts text between languages.",
    "Incentives and rewards can shape agent behavior.",
]

synonyms = {
    "reward": ["rewards", "incentive", "incentives"],
    "rewards": ["reward", "incentives"],
    "agent": ["agents"],
    "agents": ["agent"],
}

question = "How do agents learn from reward signals?"
expanded_question = expand_query(question, synonyms)

embedder = TfidfEmbedder()
embedder.fit(docs)

qv = embedder.vectorize(expanded_question)

scores = []
for i, d in enumerate(docs):
    dv = embedder.vectorize(d)
    scores.append((cosine_similarity(qv, dv), i, d))

# Tie-break by smaller doc_index when scores are equal.
scores.sort(key=lambda x: (-x[0], x[1]))

print("Expanded query:", expanded_question)
print("Top 3 matches:")
for s, i, d in scores[:3]:
    print(f"score={s:.3f} doc_index={i} text={d}")

Expected Output:

Expanded query: how do agents learn from reward signals rewards incentives incentives
Top 3 matches:
score=0.632 doc_index=0 text=Reinforcement learning uses rewards to train agents.
score=0.316 doc_index=4 text=Incentives and rewards can shape agent behavior.
score=0.000 doc_index=1 text=Supervised learning trains models using labeled examples.

The model still learns a TF-IDF vocabulary and IDF weights from the fixed document set. The hard part is query expansion: we tokenize the question, and for each token we append synonym tokens (as extra words) into an expanded query string. This changes the TF counts in the query vector, which can increase similarity to documents containing synonym-related terms. Similarities are computed with cosine similarity, then sorted by descending score. To ensure deterministic behavior when scores match exactly, the sort key uses (-score, doc_index), guaranteeing smaller doc_index comes first.


Interactive Lesson

Interactive Lesson: Dependency-Ordered Foundations of AI (Goals, Reasoning, Knowledge, Agents, Learning, and Search)

⏱️ 30 min

Learning Objectives

  • Explain what AI is and identify major AI goals and subproblems in a dependency-consistent way
  • Describe how reasoning under uncertainty leads to planning and decision-making, including why combinatorial explosion matters
  • Distinguish knowledge representation from a knowledge base, and explain how ontologies enable deductions and retrieval
  • Use rational-agent thinking to compute and interpret expected utility for probabilistic outcomes
  • Connect machine learning, NLP (transformers), perception (computer vision), and search/optimization into one coherent AI pipeline

1. AI definition and scope (the starting point)

AI is the capability of computational systems to perform tasks associated with human intelligence, using learning and reasoning to achieve defined goals. This lesson will build from this definition into goals, reasoning under uncertainty, knowledge, agents, learning, language, perception, and search.

Examples:

  • Advanced web search engines, chatbots, virtual assistants, autonomous vehicles, and strategy-game analysis (chess and Go).

✓ Check Your Understanding:

Which option best captures the scope of AI?

Answer: AI includes learning, reasoning, perception, decision-making, and language tasks

Why does the definition mention learning and reasoning?

Answer: Because many AI systems improve from data and also use reasoning to achieve goals

2. AI goals and subproblems

From the AI definition, we identify goals and subproblems such as learning, reasoning, planning, natural language processing, and perception. These subproblems are not isolated; they feed into each other. For example, planning and decision-making often require reasoning and a model of outcomes.

Examples:

  • Chatbots and virtual assistants combine language understanding (NLP) with decision-making and sometimes planning.

✓ Check Your Understanding:

Which subproblem is most directly connected to planning and decision-making?

Answer: Reasoning and problem-solving under uncertainty

Which list best reflects common AI subproblems?

Answer: Learning, reasoning, planning, NLP, and perception

3. Reasoning and problem-solving under uncertainty

AI reasoning must often handle uncertain or incomplete information and choose actions when exact outcomes are not known. This connects directly to planning and decision-making. It also introduces combinatorial explosion: step-by-step reasoning algorithms can scale poorly as problem size grows, becoming exponentially slower.

Examples:

  • Choosing a move in Go involves uncertainty about opponent responses and future board states.

✓ Check Your Understanding:

What is the main challenge in reasoning under uncertainty?

Answer: Exact outcomes are not known, so the agent must choose actions despite uncertainty

How does combinatorial explosion affect reasoning?

Answer: It increases the number of possibilities to search, making algorithms exponentially slower

4. Knowledge representation and knowledge engineering

Knowledge representation encodes concepts and relationships so programs can make deductions and answer questions using a knowledge base and formal structures. This is needed because reasoning and planning often require structured information, not just raw data. Knowledge engineering also faces the commonsense breadth issue: real-world knowledge is huge and diverse.

Examples:

  • Ontology and knowledge base used for tasks like content-based indexing/retrieval and clinical decision support.

✓ Check Your Understanding:

Which statement best distinguishes knowledge representation from a knowledge base?

Answer: Knowledge representation is the formal way knowledge is encoded; a knowledge base is the body of knowledge represented for a program

Why is knowledge representation useful for deductions and question answering?

Answer: Because it encodes concepts and relationships in a form that supports inference and retrieval

5. Ontologies and knowledge bases

Ontologies are formal representations of concepts within a domain and relationships between those concepts. They enable a knowledge base to support content-based indexing/retrieval and domain reasoning. This section builds directly on knowledge representation by adding the structured domain model (ontology) that makes the knowledge base usable.

Examples:

  • An ontology represents objects, relations, concepts, and properties used by a domain.

✓ Check Your Understanding:

What is an ontology primarily?

Answer: A representation of knowledge as concepts and relationships within a domain

How do ontologies help knowledge bases?

Answer: They provide formal structure so the system can interpret and relate stored facts for retrieval and inference

6. Planning and decision-making

Planning and decision-making choose actions to achieve goals, often using reasoning under uncertainty and structured knowledge. If the environment is uncertain, the planner must consider probabilistic outcomes and may need to update beliefs after acting.

Examples:

  • Autonomous vehicles plan routes and maneuvers while accounting for uncertain sensor readings and other drivers.

✓ Check Your Understanding:

Which dependency best explains why planning needs reasoning under uncertainty?

Answer: Planning must choose actions when exact outcomes are not known

How does structured knowledge typically support planning?

Answer: It provides encoded concepts and relations that the planner can use for inference and decision-making

7. Rational agents, utility, and expected utility

A rational agent selects actions to maximize preferences, often by computing expected utility over possible outcomes. Expected utility averages utilities of outcomes weighted by their probabilities. This is the bridge from planning/decision-making to probabilistic choice.

Examples:

  • Expected utility decision-making: choosing the action with maximum expected utility over probabilistic outcomes.

✓ Check Your Understanding:

What does expected utility combine?

Answer: Outcome utilities weighted by outcome probabilities for a chosen action

Why is expected utility not the same as utility?

Answer: Expected utility averages over possible outcomes using probabilities

8. Markov decision process (MDP) and policies

An MDP is a decision model with a transition model and reward function, plus a policy mapping states to decisions. This formalizes planning/decision-making under uncertainty and connects to expected utility by modeling how actions lead to probabilistic state changes and rewards.

Examples:

  • An MDP provides probabilities of state changes and utilities or rewards for states and actions.

✓ Check Your Understanding:

In an MDP, what does a policy represent?

Answer: A mapping from states to decisions

What is the role of the transition model in an MDP?

Answer: It provides probabilities of how the state changes after actions

9. Machine learning as performance improvement

Machine learning studies programs that improve performance on a task automatically from data. This connects to AI goals and subproblems by providing a way to learn models for tasks like perception and language. It also sets up later sections on supervised/unsupervised/reinforcement/transfer/deep learning.

Examples:

  • Modern NLP and perception methods rely on machine learning to learn patterns from data.

✓ Check Your Understanding:

What is the defining feature of machine learning?

Answer: Improving performance automatically from data

Which statement correctly contrasts supervised and unsupervised learning?

Answer: Supervised learning uses labeled expected answers; unsupervised learning uses unlabeled data to find patterns

10. Supervised/unsupervised/reinforcement/transfer/deep learning

Different learning paradigms differ mainly by the training signal: supervised learning uses labeled expected answers; unsupervised learning uses unlabeled data; reinforcement learning uses rewards from actions; transfer learning reuses knowledge across tasks; deep learning uses neural networks with many layers. This section prepares the next steps for transformers (deep learning for NLP) and for perception (learning from sensor data).

Examples:

  • Deep learning underpins modern NLP and perception methods.

✓ Check Your Understanding:

Which option best describes reinforcement learning?

Answer: Learning from rewards after taking actions

What is the most accurate distinction between supervised and unsupervised learning?

Answer: Whether training data includes labeled expected answers

11. Natural language processing (NLP) and modern deep learning methods

NLP enables reading, writing, and communication in human languages, including speech recognition, translation, extraction, retrieval, and question answering. Modern NLP uses transformers and word embeddings. Generative GPT-style models can produce coherent text, connecting language understanding to learning.

Examples:

  • NLP modern methods: transformers and GPT language models generating coherent text; by 2023, models achieved human-level scores on bar exam, SAT, and GRE.
  • Generative AI in the 2020s generating images, audio, and videos from text prompts.

✓ Check Your Understanding:

Which task is included in NLP?

Answer: Question answering

What is a key architectural idea behind modern NLP?

Answer: Transformers using attention mechanisms

12. Transformers and GPT-style generative models

Transformers introduced attention-based deep learning for sequence tasks, accelerating progress after 2017. GPT-style models are generative: they produce text that can be used for communication and downstream tasks. This section depends on NLP and machine learning, and it connects back to AI goals like language tasks.

Examples:

  • Growth accelerated after 2017 with the transformer architecture.

✓ Check Your Understanding:

Why did transformers accelerate AI progress?

Answer: They improved representation and learning for sequence tasks, boosting NLP and generative models

What does “generative” emphasize in GPT-style models?

Answer: Producing coherent text (and related media) rather than only classifying

13. Perception and computer vision

Perception uses sensor inputs to infer aspects of the world, while computer vision analyzes visual input for tasks like recognition and tracking. This depends on AI goals and machine learning because perception systems learn patterns from data. It also connects to decision-making: perception supplies the state information needed for planning and policies.

Examples:

  • Perception via sensors and computer vision supports robotics and autonomous agents, including facial/object recognition.

✓ Check Your Understanding:

Which statement best distinguishes perception from computer vision?

Answer: Computer vision is specifically analysis of visual input; perception can include many sensors beyond vision

Why does perception matter for decision-making?

Answer: It provides information about the current situation that policies and planners use

14. Affective computing and social intelligence

Affective computing builds systems that recognize, interpret, process, or simulate human feelings, emotion, and mood. It relates to human–computer interaction and sentiment analysis. This connects back to AI goals and subproblems by adding a social dimension to decision-making and communication, but it also introduces safety concerns: systems can create unrealistic user expectations.

Examples:

  • Sentiment analysis and emotion-aware interaction in human–computer systems.

✓ Check Your Understanding:

What is the focus of affective computing?

Answer: Recognizing and modeling human feelings and emotion

Which risk is explicitly highlighted for affective systems?

Answer: Creating unrealistic user expectations

15. Search and optimization in AI

AI uses state space search and local search (mathematical optimization) to find solutions. State space search supports planning and adversarial game search. Local search includes gradient descent and evolutionary computation. This section connects back to reasoning under uncertainty and planning: when exact reasoning is too expensive, search and optimization provide practical ways to find good solutions.

Examples:

  • Strategy-game analysis (chess and Go) uses search over game states.
  • Local search trains models via gradient descent/backpropagation.

✓ Check Your Understanding:

Which statement correctly links search to planning?

Answer: State space search supports planning by exploring possible action sequences

What is local search in this context?

Answer: Mathematical optimization methods such as gradient descent to improve parameters or solutions

16. State space search and adversarial search

State space search explores possible states and actions to find solutions. In adversarial settings (games), the search must consider that opponents act strategically. This depends on search and optimization and connects back to combinatorial explosion: naive step-by-step reasoning can become infeasible as the search space grows.

Examples:

  • Chess and Go analysis involves adversarial search over game states.

✓ Check Your Understanding:

What distinguishes adversarial search from ordinary search?

Answer: The opponent also chooses actions, so outcomes depend on both players’ choices

How does combinatorial explosion relate to search?

Answer: The number of possibilities grows rapidly, making exhaustive search impractical

17. Local search, gradient descent, and evolutionary computation

Local search methods optimize numerical parameters by minimizing a loss function. Gradient descent adjusts parameters incrementally to reduce loss, and backpropagation computes gradients for neural networks. This connects back to machine learning: learning improves performance by optimizing model parameters.

Examples:

  • Gradient descent: parameters are adjusted to minimize a loss function; backpropagation is used to train neural networks.

✓ Check Your Understanding:

What does gradient descent do?

Answer: It incrementally adjusts parameters to minimize a loss function

What is the role of backpropagation?

Answer: It computes gradients so parameter updates reduce prediction error

18. AI history and safety/regulation discussions

AI history includes cycles of optimism and disappointment, including AI winters. Increases in funding and interest after 2012 were driven by GPUs accelerating neural networks and deep learning outperforming prior techniques. After 2017, transformers accelerated progress. In the 2020s, generative AI boosted media creation. Safety concerns arise because powerful systems can fail in unexpected ways, create unrealistic expectations, or be misused.

Examples:

  • AI was founded as an academic discipline in 1956.
  • Funding and interest increased substantially after 2012 due to GPUs accelerating neural networks and deep learning outperforming prior techniques.
  • Growth accelerated further after 2017 with the transformer architecture.

✓ Check Your Understanding:

Which factor is linked to the post-2012 deep learning surge?

Answer: GPUs accelerating neural network training

What is an AI winter?

Answer: A period of disappointment and loss of funding in AI history

Practice Activities

Uncertainty to expected utility chain
medium

Scenario: A robot must choose between Action A and Action B in a partially observable environment. Outcome probabilities are uncertain. Task: Write a cause-effect chain that starts with uncertainty about outcomes, then leads to probabilistic decision-making, then leads to computing expected utility, and ends with selecting the action with maximum expected utility.

Combinatorial explosion to search strategy chain
medium

Scenario: A planner uses step-by-step reasoning to explore future action sequences, but the problem size grows. Task: Build a cause-effect chain: step-by-step reasoning scales poorly -> combinatorial explosion -> exponential slowdown -> need for search/optimization strategies -> practical solution finding.

Learning pipeline chain: gradient descent to improved performance
medium

Scenario: A neural model is trained to reduce prediction error. Task: Create a cause-effect chain: local search via gradient descent -> loss minimization -> backpropagation computes gradients -> parameter updates reduce error -> improved task performance.

Transformers to NLP capability chain
medium

Scenario: A system generates coherent answers and text. Task: Create a cause-effect chain: transformer architecture (attention-based deep learning) -> improved learning for sequence tasks -> stronger NLP representations -> improved generative text quality.

Next Steps

Related Topics:

  • Markov decision processes and policy optimization
  • State space search algorithms and heuristics
  • Knowledge engineering workflows for building ontologies and knowledge bases
  • Transformer attention mechanisms and generative model prompting
  • Safety and regulation considerations for deployed AI systems

Practice Suggestions:

  • For each new concept you learn, write one dependency sentence that links it to the concept(s) it depends on
  • Practice converting a real scenario into a cause-effect chain that ends with a decision rule (expected utility or a policy)
  • Create a small ontology sketch for a domain you know, then list how it would support retrieval or inference

Cheat Sheet

Cheat Sheet: Artificial Intelligence (AI)

Key Terms

Artificial general intelligence (AGI)
AI that can complete virtually any cognitive task at least as well as a human.
AI winter
A period of disappointment and loss of funding in AI history.
Combinatorial explosion
A phenomenon where reasoning algorithms become exponentially slower as problem size grows.
Ontology
A representation of knowledge as concepts within a domain and relationships between those concepts.
Knowledge base
A body of knowledge represented in a form that a program can use.
Rational agent
An agent that has goals or preferences and takes actions to make them happen.
Expected utility
A value for an action computed from utilities of possible outcomes weighted by their probabilities.
Markov decision process (MDP)
A decision model with a transition model and reward function, plus a policy mapping states to decisions.
Supervised learning
Learning where training data includes labeled expected answers.
Transformer architecture
A deep learning architecture using an attention mechanism, enabling modern NLP and generative models.

Formulas

Expected utility (decision rule)

EU(a) = Σ_o P(o | a) * U(o)

When choosing an action a under uncertainty by averaging outcome utilities U(o) weighted by probabilities P(o | a).

Gradient descent (parameter update)

θ := θ - η * ∇_θ L(θ)

When training models by minimizing a loss function L(θ) using learning rate η and gradient ∇_θ L.

Backpropagation (training mechanism)

Use chain rule to compute gradients: ∂L/∂θ via ∂L/∂(intermediate) terms

When training neural networks so gradient descent can update parameters to reduce prediction error.

Main Concepts

1.

AI definition and purpose

AI is computational capability to perform tasks associated with human intelligence, using learning and reasoning to achieve defined goals.

2.

Reasoning and problem-solving under uncertainty

AI must act when outcomes are uncertain, often requiring probabilistic choices and later updates.

3.

Knowledge representation and ontologies

Encode concepts and relationships so a program can deduce and answer questions using a structured knowledge base.

4.

Rational agents, utility, and expected utility

A rational agent selects actions to maximize preferences, often by maximizing expected utility over possible outcomes.

5.

Machine learning as performance improvement

ML improves task performance automatically from data, including supervised, unsupervised, reinforcement, transfer, and deep learning.

6.

NLP tasks and modern deep learning methods

NLP covers speech recognition, translation, extraction, retrieval, and question answering, often using transformers and embeddings.

7.

Perception via sensors and computer vision

Perception infers world aspects from sensors; computer vision specifically analyzes visual input for recognition and tracking.

8.

Affective computing and social intelligence

Systems recognize and process human feelings and social signals, enabling sentiment and emotion-related interaction.

9.

Search and optimization in AI

AI uses state space search and local optimization (e.g., gradient descent) to find solutions or train models.

Memory Tricks

Combinatorial explosion

Think: “Combi = many combinations = exponential blow-up.” If the search space grows fast, expect exponential slowdown.

Expected utility vs utility

“Expected” means “average over possibilities.” Utility is a single-score for an outcome; expected utility is the probability-weighted average for an action.

Transformer impact timeline

2012 = GPUs + deep learning; 2017 = Transformers + attention. Map years to drivers: 12 for compute, 17 for architecture.

Knowledge representation vs knowledge base

Representation is the blueprint (how knowledge is encoded). Knowledge base is the filled-in content (the actual stored knowledge).

Supervised vs unsupervised

Supervised = “supervised labels” (expected answers). Unsupervised = “no labels,” find structure on your own.

Quick Facts

  • AI was founded as an academic discipline in 1956.
  • AI history includes cycles of optimism and disappointment, including AI winters.
  • Funding and interest increased substantially after 2012 due to GPUs accelerating neural networks and deep learning outperforming prior techniques.
  • Growth accelerated further after 2017 with the transformer architecture.
  • In the 2020s, an AI boom coincided with advances in generative AI for creating or modifying media.
  • Early step-by-step reasoning struggled with large problems due to combinatorial explosion (exponential slowdown).
  • In automated decision-making, expected utility averages utilities weighted by outcome probabilities.
  • In gradient descent, parameters are adjusted to minimize a loss function; backpropagation computes gradients for neural network training.
  • NLP includes speech recognition, speech synthesis, machine translation, information extraction, information retrieval, and question answering.

Common Mistakes

Common Mistakes: AI Goals, Knowledge, Decision-Making, Learning, and Perception

Students claim AI is mainly about human-like step-by-step reasoning, so they treat learning, perception, and language as “not really AI” unless it looks like explicit deduction.

conceptual · high severity

Why it happens:

They equate “intelligence” with visible human reasoning. Then they assume that if a system does not perform explicit symbolic deduction, it cannot be intelligent. This leads them to ignore that AI can achieve goals using learning and reasoning under uncertainty, and can rely on statistical methods rather than explicit step-by-step logic.

✓ Correct understanding:

AI is the capability of computational systems to perform tasks associated with human intelligence by using learning and reasoning to achieve defined goals. Many AI systems learn patterns from data (machine learning), infer from sensor inputs (perception/computer vision), and generate or interpret language (NLP). Explicit symbolic reasoning is only one approach; probabilistic decision-making and deep learning are also central.

How to avoid:

Use the definition as a checklist: (1) Is there a task associated with human intelligence? (2) Is the system using learning and/or reasoning to achieve a goal? (3) Does it operate under uncertainty or complex inputs where explicit deduction is impractical? If yes, it is AI even if it is not “human-like reasoning.”

Students mix up knowledge representation with a knowledge base, saying they are the same thing or that “the knowledge base is the representation method.”

conceptual · high severity

Why it happens:

They focus on the word “knowledge” and assume the container equals the encoding. Then they treat ontologies as if they were just a synonym for a database, rather than the formal structure used to encode concepts and relationships.

✓ Correct understanding:

Knowledge representation is the formal way knowledge is encoded so a program can make deductions and answer questions. A knowledge base is the actual body of knowledge represented for a program. Ontologies are a specific representation structure: they define concepts and relationships in a domain, and those structures populate a knowledge base.

How to avoid:

Separate “method” from “content.” Ask: “What is the formal encoding scheme?” (representation/ontology) versus “What facts and relations are stored for use?” (knowledge base). When you see ontologies, think “structure for encoding,” not “the entire stored dataset by itself.”

Students say expected utility is just utility, or they compute expected utility by taking the utility of the most likely outcome only (ignoring probabilities of other outcomes).

conceptual · high severity

Why it happens:

They remember that expected utility involves “utility” and then collapse the distinction between averaging over outcomes and selecting a single outcome. They also confuse “expected” with “most likely,” so they ignore the probability-weighted averaging requirement.

✓ Correct understanding:

Expected utility is computed for an action by weighting the utilities of possible outcomes by their probabilities, then summing (or averaging) across outcomes. A rational agent chooses the action with maximum expected utility, not maximum single-outcome utility.

How to avoid:

Always write the expected utility as: expected_utility(action) = Σ [ P(outcome | action) × utility(outcome) ]. Then check whether you accidentally replaced Σ with “pick the highest probability outcome.” If you did, correct it.

Students treat supervised learning and unsupervised learning as differing only by the model architecture, claiming both use labeled answers during training.

conceptual · medium severity

Why it happens:

They focus on “learning from data” and assume the only difference is the neural network type. Then they overlook the key distinction: whether training data includes labeled expected answers.

✓ Correct understanding:

Supervised learning uses training data with labeled expected answers, enabling learning to map inputs to correct outputs (classification/regression). Unsupervised learning uses unlabeled data and focuses on discovering patterns or structure without explicit target labels.

How to avoid:

Use the “label signal” test: Ask whether the training examples include the correct outputs. If yes, supervised. If no, unsupervised. Do not infer the paradigm from the model type; infer it from the presence or absence of labeled expected answers.

Students equate computer vision with all perception, concluding that perception is only about images and cameras.

conceptual · medium severity

Why it happens:

They anchor on the phrase “vision” and generalize it to the entire perception pipeline. Then they ignore that perception can use many sensors and modalities beyond vision, such as lidar, sonar, and tactile inputs.

✓ Correct understanding:

Perception is the broader process of inferring aspects of the world from sensor inputs. Computer vision is specifically the analysis of visual input. A system can be perceptive without using only cameras, and computer vision is only one component of perception.

How to avoid:

Use a two-level framing: (1) Perception = inference from sensors (many modalities). (2) Computer vision = inference from visual input specifically. When you see “computer vision,” restrict it to visual data; when you see “perception,” include all relevant sensors.

Students claim that step-by-step reasoning always works for large problems if you just “optimize the code,” and they underestimate combinatorial explosion as a fundamental limitation.

conceptual · high severity

Why it happens:

They treat runtime issues as purely engineering problems. Then they assume that faster hardware or better implementation eliminates the exponential growth in the number of possibilities that reasoning must consider.

✓ Correct understanding:

Step-by-step reasoning algorithms can scale poorly because combinatorial explosion increases the number of possibilities exponentially with problem size. This makes them exponentially slower and often infeasible for large reasoning tasks, regardless of minor implementation improvements.

How to avoid:

When you hear “step-by-step reasoning” and “large problem size,” immediately ask: “How many possibilities must be explored?” If the count grows exponentially, expect combinatorial explosion. Then consider alternative approaches: probabilistic reasoning, planning heuristics, search/optimization methods, or learning-based approximations.

Students say uncertainty is irrelevant to decision-making because the agent can always compute the best action deterministically.

conceptual · high severity

Why it happens:

They assume the environment is fully known and outcomes are certain. Then they ignore that real settings have uncertainty about the current situation or action outcomes, requiring probabilistic guesses and reassessment after acting.

✓ Correct understanding:

Reasoning under uncertainty means the agent does not know exact outcomes. The agent must make probabilistic decisions, update beliefs after observing results, and choose actions that maximize expected outcomes (often via expected utility). This connects uncertainty to planning and decision-making under probabilistic transitions.

How to avoid:

Check the assumptions: If outcomes are not guaranteed, you must incorporate probabilities. Replace “best deterministic outcome” with “best expected outcome” or “best expected utility,” and include reassessment after acting when new information arrives.

General Tips

  • Use definitions as constraints: if an answer contradicts the definition (for example, expected utility ignoring probabilities), mark it as incorrect.
  • Separate “method” from “content” in knowledge questions: representation/ontology encodes; knowledge base stores encoded facts.
  • For decision-making, always distinguish “utility of one outcome” from “expected utility of an action across outcomes.”
  • For learning paradigms, classify by the presence of labeled expected answers, not by the neural network type.
  • For perception, distinguish computer vision (visual input) from perception (multi-sensor inference).
  • When reasoning scales poorly, explicitly connect to combinatorial explosion rather than blaming only implementation details.