Summary
Topic Summary
What AI Is: Scope, Goals, and Capability Decomposition
Foundations of Intelligent Agents: Utilities, Rationality, and Decision Models
Reasoning and Knowledge: Representation, Knowledge Bases, and Uncertainty
Search, Optimization, and Logic: Core Techniques for Problem Solving
Machine Learning Paradigms and Deep Learning: Learning as a Capability
Modern NLP and Generative AI: Embeddings, Transformers, and Capabilities
Perception and Computer Vision: From Sensors to World Understanding
AI in Society: Affective/Social Intelligence, Trust Limits, and Safety/Regulation
Key Insights
Capabilities, not algorithms
The text separates AI goals (learning, reasoning, planning, perception) from techniques (search, logic, optimization, neural networks). That implies you can swap methods without changing the underlying “capability target,” but you cannot swap capabilities without changing the problem you are solving.
Why it matters: Students often treat “AI” as a single method; this reframes AI as a goal-driven capability set, making design tradeoffs feel principled rather than ad hoc.
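This separation can be made concrete with a toy sketch (the planner, the +1/+2 state space, and the action names are invented for illustration): the goal test fixes the capability target, while the frontier discipline, breadth-first versus depth-first, is the interchangeable technique.

```python
from collections import deque

def plan(start, goal_test, successors, strategy="bfs"):
    """One capability target (satisfy goal_test); two swappable search techniques."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        # The only difference between the techniques is which node expands next.
        state, path = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if goal_test(state):
            return path
        for action, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None

# Toy state space: integers 0..6, reachable by +1 or +2 steps.
succ = lambda s: [(a, n) for a, n in [("inc1", s + 1), ("inc2", s + 2)] if n <= 6]

print(plan(0, lambda s: s == 5, succ, "bfs"))  # ['inc1', 'inc2', 'inc2'] (shortest)
print(plan(0, lambda s: s == 5, succ, "dfs"))  # ['inc2', 'inc2', 'inc1'] (a plan, not guaranteed shortest)
```

Swapping the strategy changes which plan is found and at what cost, but the problem being solved, reaching a goal state, stays the same; swapping the goal test would change the problem itself.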
Uncertainty forces probabilistic thinking
Reasoning under uncertainty is linked to probabilistic decision-making and expected utility, not just to “harder reasoning.” This implies that once uncertainty is unavoidable, the system’s notion of “best” becomes statistical (expected utility) rather than purely logical or purely optimal in a deterministic sense.
Why it matters: It changes the mental model from “reasoning finds the correct answer” to “reasoning manages risk,” clarifying why decision models like MDPs become central.
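A minimal numeric sketch of that shift (the two routes and their payoffs are invented numbers): neither action is logically "correct"; one simply has the higher expected utility.

```python
# Each action has probabilistic outcomes: (probability, utility) pairs.
outcomes = {
    "safe_route": [(1.0, 10.0)],                # certain, modest payoff
    "fast_route": [(0.8, 25.0), (0.2, -40.0)],  # usually better, occasionally bad
}

def expected_utility(action):
    # Probability-weighted average utility over all outcomes.
    return sum(p * u for p, u in outcomes[action])

for action in outcomes:
    print(action, expected_utility(action))
# fast_route wins on expectation: 0.8*25 + 0.2*(-40) = 12.0 > 10.0
best = max(outcomes, key=expected_utility)
print("best:", best)
```

A purely logical agent has no principled way to rank these options; a decision-theoretic agent ranks them by expected utility, which is exactly the quantity MDP policies optimize.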
Representation bottleneck shapes outcomes
Knowledge representation is described as struggling with commonsense breadth and sub-symbolic knowledge, which then limits what planning and reasoning can reliably use. The non-obvious implication is that even powerful learning or search can fail if the knowledge layer cannot express what matters for everyday situations.
Why it matters: Students learn to blame “lack of data” or “bad models,” but this highlights a deeper systems issue: the interface between learned signals and explicit decision-relevant knowledge.
Transformers accelerate everything downstream
The cause-effect chain says transformers boosted growth after 2017, and the hierarchy ties NLP advances to machine learning and then to broader AI applications. This implies that improvements in sequence representation can propagate beyond language tasks, because many planning, perception, and interaction problems depend on learned representations of structured data.
Why it matters: Instead of viewing transformers as “just an NLP breakthrough,” students see them as a general representation engine that can amplify multiple AI subfields.
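The core mechanism can be sketched in a few lines (toy two-dimensional "embeddings" with no learned weights; the words and vectors are invented): attention re-represents each item as a similarity-weighted mixture of all items, and nothing in the computation is specific to language.

```python
import math

# Toy embeddings: any structured data could sit here (words, pixels, states).
tokens = {"river": [1.0, 0.0], "bank": [0.6, 0.8], "money": [0.0, 1.0]}

def attend(query_name):
    """Blend all token vectors, weighted by softmax of dot-product similarity."""
    q = tokens[query_name]
    scores = {k: sum(qi * vi for qi, vi in zip(q, v)) for k, v in tokens.items()}
    exps = {k: math.exp(s) for k, s in scores.items()}
    total = sum(exps.values())
    weights = {k: e / total for k, e in exps.items()}
    # New representation: convex combination of all token vectors.
    return [sum(weights[k] * tokens[k][i] for k in tokens) for i in range(2)]

print(attend("bank"))  # a context-mixed vector, pulled toward similar tokens
```

Because the operation only needs vectors and similarities, the same machinery can represent states for planning, patches for vision, or tokens for language, which is why transformer-style representation learning propagates beyond NLP.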
Emotion cues can counterfeit trust
Affective computing can simulate or recognize emotions, and the text notes that users may form an unrealistic conception of agent intelligence. The counterintuitive implication is that perceived competence can be decoupled from actual competence, meaning evaluation must separate “human-like signals” from “task-level capability.”
Why it matters: This changes how students interpret user experience: good emotional interaction is not evidence of robust reasoning, planning, or decision quality.
Conclusions
Bringing It All Together
Key Takeaways
- AI as goal-directed intelligent behavior is the top-level objective that all other concepts serve.
- Subproblem decomposition of intelligence maps the general goal into core capabilities (reasoning, knowledge representation, planning, learning, NLP, perception).
- Planning and decision-making with utilities and policies, formalized by Markov decision processes, is the bridge from reasoning to action under uncertainty.
- Machine learning paradigms are the engine behind modern NLP and perception, with transformers and embeddings enabling powerful language generation and understanding.
- Knowledge representation and knowledge bases provide structured, queryable knowledge, but they must be designed carefully because commonsense breadth is hard to encode explicitly.
Real-World Applications
- Advanced web search engines use goal-directed retrieval and decision-making to rank results, and they rely on NLP and learning to interpret queries and content.
- Chatbots and virtual assistants apply NLP with transformer-based architectures to generate responses and support user communication.
- Autonomous vehicles combine perception (computer vision and sensor inference) with planning and decision-making under uncertainty to choose safe actions.
- Generative AI systems that produce images, audio, and videos from text prompts demonstrate how learning paradigms plus transformer-style language modeling can drive new multimodal capabilities.
Next, build from prerequisite knowledge by studying how utilities and expected utility connect to probabilistic models of uncertainty, then deepen into Markov decision processes and policy learning. After that, focus on how knowledge representation choices (ontologies vs knowledge bases) interact with learning-based systems, and finally examine AI safety and regulation concerns for agents that can influence users through affective/social signals.
💻 Code Examples
State-Space Search for Automated Planning (Means-Ends Style)
Python
from collections import deque

# We model an agent in a small world: states are positions, actions move.
# This mirrors the document's "state space search" and "means-ends analysis".
def state_space_plan(start, goal, actions):
    """Breadth-first search over the space of reachable states."""
    queue = deque([(start, [])])  # (current_state, path_of_actions)
    visited = {start}
    while queue:
        state, path = queue.popleft()
        # Goal test: have we reached a goal state?
        if state == goal:
            return path
        for action_name, transition in actions.items():
            # Deterministic transition for this example.
            next_state = transition(state)
            # Avoid revisiting states (prunes the search).
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, path + [action_name]))
    return None  # No plan found.

# --- Example world ---
# States are integers on a line. Actions move left/right by fixed steps.
# This is a simplified planning environment.
actions = {
    "move_left": lambda s: s - 1,
    "move_right": lambda s: s + 1,
    "jump": lambda s: s + 2,
}
start = 0
goal = 3
plan = state_space_plan(start, goal, actions)
print("Plan:", plan)
# Sample output is shown below.
Explanation
This code implements state-space search to solve an automated planning problem. The agent begins at a start state and explores the states reachable through actions. A goal test checks whether the current state equals the target goal. Breadth-first search (BFS) expands the frontier level by level, so the first plan it returns uses the fewest actions. The visited set prunes repeated states, which avoids redundant work and keeps the search tractable for small problems. The returned plan is the action sequence that reaches the goal state, matching the document's idea of searching through goals and subgoals.
Use Case
A logistics planner can use state-space search to find a sequence of moves (e.g., warehouse robot navigation steps) that reaches a target location while avoiding revisiting already explored configurations.
Output
Plan: ['move_right', 'jump']
💻 Code Practice Problems
Problem 1 (medium): Implement a state-space planner using breadth-first search (BFS) for a grid world
Implement a state-space planner using breadth-first search (BFS) for a grid world. States are (row, col) tuples. Actions are deterministic moves: up, down, left, right, and stay. The grid has blocked cells that cannot be entered. Given a start cell, a goal cell, and a set of blocked cells, return the shortest sequence of action names that reaches the goal. If no plan exists, return None. Use a visited set to avoid revisiting states. The output must print the plan for the provided test case.
💡 Hints
- Use BFS with a queue of (state, path) pairs so the first time you reach the goal you have a shortest path.
- Represent states as immutable tuples like (r, c) so they can be stored in the visited set.
- When generating next states, skip moves that go out of bounds or into blocked cells; also skip states already in visited.
Solution Code:
from collections import deque

def bfs_plan_grid(start, goal, grid_size, blocked, actions):
    """Return shortest action sequence from start to goal using BFS, or None."""
    rows, cols = grid_size
    blocked = set(blocked)

    def in_bounds(s):
        r, c = s
        return 0 <= r < rows and 0 <= c < cols

    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action_name, transition in actions.items():
            next_state = transition(state)
            if not in_bounds(next_state):
                continue
            if next_state in blocked:
                continue
            if next_state in visited:
                continue
            visited.add(next_state)
            queue.append((next_state, path + [action_name]))
    return None

# --- Test case ---
# 4x4 grid, blocked cells create a simple detour.
# Coordinates: (row, col)
grid_size = (4, 4)
start = (0, 0)
goal = (3, 3)
blocked = {(1, 1), (1, 2), (2, 1)}
actions = {
    "up": lambda s: (s[0] - 1, s[1]),
    "down": lambda s: (s[0] + 1, s[1]),
    "left": lambda s: (s[0], s[1] - 1),
    "right": lambda s: (s[0], s[1] + 1),
    "stay": lambda s: (s[0], s[1]),
}
plan = bfs_plan_grid(start, goal, grid_size, blocked, actions)
print("Plan:", plan)
Expected Output:
Plan: ['down', 'down', 'down', 'right', 'right', 'right']
The algorithm performs BFS over grid states. Each queue element stores the current cell and the action path taken to reach it. A visited set prevents reprocessing the same cell, which prunes the search space and guarantees BFS returns a shortest plan (fewest actions) when all transitions are deterministic. For each dequeued state, the code tries every action, filters out moves that leave the grid or enter blocked cells, and enqueues newly discovered states. If the goal is reached, it returns the accumulated action list; otherwise it returns None.
Problem 2 (hard): Implement a state-space planner using BFS in a time-extended setting with periodic hazards
Implement a state-space planner using BFS in a time-extended setting with periodic hazards. States are (row, col, t_mod) where t_mod is time modulo a given period P. The grid has blocked cells that are always forbidden, and hazard cells that are forbidden only at specific times: a hazard cell is blocked at time t if (t % P) is in a provided set hazard_times. Actions are deterministic moves: up, down, left, right, and wait (stay in place). Given start, goal, grid size, blocked cells, hazard cells, period P, and hazard_times, return the shortest sequence of action names that reaches the goal at any time (any t_mod) while never entering forbidden cells at the corresponding time. Use BFS over the expanded state space and a visited set. The output must print the plan for the provided test case.
💡 Hints
- Expand the state to include time modulo P: (r, c, t % P). This makes the time-dependent constraints Markovian.
- When applying an action, compute the next time t+1, then check whether the destination cell is blocked by hazards at that next time.
- Use visited on the full (r, c, t_mod) state; otherwise you may incorrectly prune valid paths that arrive at the same cell at different times.
Solution Code:
from collections import deque

def bfs_plan_time_dependent(start, goal, grid_size, always_blocked, hazard_cells, period, hazard_times, actions):
    """Return shortest action sequence reaching goal, or None.

    State is (row, col, t_mod). Time advances by 1 per action.
    """
    rows, cols = grid_size
    always_blocked = set(always_blocked)
    hazard_cells = set(hazard_cells)
    hazard_times = set(hazard_times)

    def in_bounds(pos):
        r, c = pos
        return 0 <= r < rows and 0 <= c < cols

    def is_forbidden(pos, t_mod):
        if pos in always_blocked:
            return True
        if pos in hazard_cells and t_mod in hazard_times:
            return True
        return False

    # Initial time is t=0
    start_t_mod = 0
    if is_forbidden(start, start_t_mod):
        return None
    queue = deque([((start[0], start[1], start_t_mod), [])])
    visited = {(start[0], start[1], start_t_mod)}
    while queue:
        (r, c, t_mod), path = queue.popleft()
        state_pos = (r, c)
        if state_pos == goal:
            return path
        next_t_mod = (t_mod + 1) % period
        for action_name, transition in actions.items():
            next_pos = transition(state_pos)
            if not in_bounds(next_pos):
                continue
            if is_forbidden(next_pos, next_t_mod):
                continue
            next_state = (next_pos[0], next_pos[1], next_t_mod)
            if next_state in visited:
                continue
            visited.add(next_state)
            queue.append((next_state, path + [action_name]))
    return None

# --- Test case ---
# 3x5 grid
# Always blocked: one wall cell.
# Hazard cell: (1,2) is blocked only at times where t_mod in {1,2}.
# This forces waiting or detouring.
grid_size = (3, 5)
start = (1, 0)
goal = (1, 4)
always_blocked = {(0, 3)}
hazard_cells = {(1, 2)}
period = 3
hazard_times = {1, 2}
actions = {
    "up": lambda p: (p[0] - 1, p[1]),
    "down": lambda p: (p[0] + 1, p[1]),
    "left": lambda p: (p[0], p[1] - 1),
    "right": lambda p: (p[0], p[1] + 1),
    "wait": lambda p: (p[0], p[1]),
}
plan = bfs_plan_time_dependent(start, goal, grid_size, always_blocked, hazard_cells, period, hazard_times, actions)
print("Plan:", plan)
Expected Output:
Plan: ['right', 'wait', 'right', 'right', 'right']
The planner treats time as part of the state by storing t_mod = t % P. Each action advances time by exactly 1 step, so the next state's t_mod is (t_mod + 1) % P. For each candidate move, the code checks three constraints for the destination cell at the next time: it must be inside the grid, not in always_blocked, and not in hazard_cells at times when hazards are active. BFS guarantees the first time any state with position equal to the goal is dequeued, the associated action path is shortest in number of actions. The visited set uses the full (row, col, t_mod) to avoid incorrect pruning across different times.
Knowledge Representation with an Ontology-Style Graph and Deduction
Python
from collections import defaultdict, deque

# We represent an ontology as concepts (nodes) and relations (edges).
# This mirrors the document's "ontology" and "knowledge base" ideas.
class OntologyKB:
    def __init__(self):
        self.edges = defaultdict(list)  # relation -> list of (src, dst)
        self.types = defaultdict(set)   # concept -> set of types

    def add_relation(self, relation, src, dst):
        self.edges[relation].append((src, dst))

    def add_type(self, entity, entity_type):
        self.types[entity].add(entity_type)

    def infer_transitive(self, relation, start_nodes):
        """Simple forward-chaining for transitive relations."""
        # Example: if A is parent_of B and B is parent_of C, infer A parent_of C.
        inferred = set()
        graph = defaultdict(list)
        for src, dst in self.edges[relation]:
            graph[src].append(dst)
        for start in start_nodes:
            q = deque([start])
            seen = {start}
            while q:
                cur = q.popleft()
                for nxt in graph[cur]:
                    if nxt not in seen:
                        seen.add(nxt)
                        inferred.add((start, nxt))
                        q.append(nxt)
        return inferred

    def query(self, relation, src):
        """Return direct neighbors for a relation."""
        return [dst for s, dst in self.edges[relation] if s == src]

kb = OntologyKB()
# Build a tiny knowledge base.
kb.add_type("alice", "Person")
kb.add_type("bob", "Person")
kb.add_type("carol", "Person")
# parent_of is treated as transitive for this toy example.
kb.add_relation("parent_of", "alice", "bob")
kb.add_relation("parent_of", "bob", "carol")
# Direct query (knowledge retrieval).
print("Direct parent_of(alice):", kb.query("parent_of", "alice"))
# Deduction via transitive inference (knowledge representation + reasoning).
inferred = kb.infer_transitive("parent_of", start_nodes=["alice"])
print("Inferred parent_of(alice, ?):", sorted([dst for src, dst in inferred]))
Explanation
This example builds a small ontology-style knowledge base. Concepts are entities like "alice" and "bob", while relations like "parent_of" are stored as edges. The KB supports two key operations: retrieval (querying direct neighbors for a relation) and deduction (inferring new facts). The infer_transitive method performs forward-chaining for a transitive relation, which demonstrates how knowledge representation can enable deductions beyond explicitly stored facts. This directly reflects the document’s distinction between a knowledge base (data in a program-usable form) and an ontology (the set of concepts and relations used in a domain).
Use Case
A clinical decision support system can represent medical entities and relationships (e.g., condition-to-symptom) and infer additional links to help clinicians answer questions faster.
Output
Direct parent_of(alice): ['bob']
Inferred parent_of(alice, ?): ['bob', 'carol']
💻 Code Practice Problems
Problem 1 (medium): Create a small ontology-style knowledge base for a travel domain
Create a small ontology-style knowledge base for a travel domain. Entities are airports. Relations are: "connected_to" (direct route) and "reachable_from" (deduced reachability). Treat "connected_to" as transitive for reachability. Implement a class that supports: (1) adding typed entities, (2) adding directed relations, (3) querying direct neighbors for a relation, and (4) inferring transitive reachability for a given set of start airports. Then build a sample graph and print: direct neighbors of a start airport for "connected_to", and the sorted list of all deduced reachable airports from that start for "reachable_from".
💡 Hints
- Model the ontology as adjacency lists (relation -> list of (src, dst)) and run a BFS/DFS for transitive closure from each start.
- Use a queue (collections.deque) and a visited set to avoid infinite loops in cyclic graphs.
- When printing results, sort them for deterministic output.
Solution Code:
from collections import defaultdict, deque

class TravelOntologyKB:
    def __init__(self):
        self.edges = defaultdict(list)  # relation -> list of (src, dst)
        self.types = defaultdict(set)   # entity -> set of types

    def add_relation(self, relation, src, dst):
        self.edges[relation].append((src, dst))

    def add_type(self, entity, entity_type):
        self.types[entity].add(entity_type)

    def query(self, relation, src):
        return [dst for s, dst in self.edges[relation] if s == src]

    def infer_transitive(self, relation, start_nodes):
        """Infer all (start, reachable) pairs reachable via repeated relation steps."""
        inferred = set()
        # Build adjacency for the chosen relation.
        graph = defaultdict(list)
        for src, dst in self.edges[relation]:
            graph[src].append(dst)
        for start in start_nodes:
            q = deque([start])
            visited = {start}
            while q:
                cur = q.popleft()
                for nxt in graph[cur]:
                    if nxt not in visited:
                        visited.add(nxt)
                        inferred.add((start, nxt))
                        q.append(nxt)
        return inferred

kb = TravelOntologyKB()
# Types
kb.add_type("JFK", "Airport")
kb.add_type("LAX", "Airport")
kb.add_type("ORD", "Airport")
kb.add_type("DFW", "Airport")
kb.add_type("SFO", "Airport")
kb.add_type("SEA", "Airport")
# Direct routes (directed edges)
kb.add_relation("connected_to", "JFK", "ORD")
kb.add_relation("connected_to", "ORD", "DFW")
kb.add_relation("connected_to", "DFW", "LAX")
kb.add_relation("connected_to", "ORD", "SFO")
kb.add_relation("connected_to", "SFO", "SEA")
# A cycle to test visited handling
kb.add_relation("connected_to", "SEA", "ORD")

print("Direct connected_to(JFK):", kb.query("connected_to", "JFK"))
inferred = kb.infer_transitive("connected_to", start_nodes=["JFK"])
reachable = sorted([dst for src, dst in inferred if src == "JFK"])
print("Inferred reachable_from(JFK, ?):", reachable)
Expected Output:
Direct connected_to(JFK): ['ORD']
Inferred reachable_from(JFK, ?): ['DFW', 'LAX', 'ORD', 'SEA', 'SFO']
The code stores relations as directed edges in an adjacency structure. The query method returns direct neighbors for a given relation and source. The infer_transitive method performs a BFS from each start node, following repeated steps of the chosen relation. A visited set prevents revisiting nodes, which avoids infinite loops when cycles exist. All newly discovered (start, reachable) pairs are collected and returned as a set, then sorted for stable output.
Problem 2 (hard): Build an ontology-style knowledge base that supports typed transitive inference
Build an ontology-style knowledge base that supports two kinds of reasoning: (1) transitive inference for a relation, and (2) typed filtering during inference. Entities are people. Relations are: "mentor_of" (direct mentorship). Treat "mentor_of" as transitive: if A mentor_of B and B mentor_of C, then A mentor_of C. Additionally, only infer reachability paths that start at a given person and end at a person of a specified target type (for example, only infer mentor_of relationships that end at "Student"). Implement a method infer_transitive_typed(relation, start_nodes, target_type) that returns all inferred (start, dst) pairs where dst has the requested type. Then build a sample graph with multiple types and cycles, run the typed inference from two start people, and print for each start: direct mentor_of neighbors and the sorted inferred mentor_of endpoints that match the target type.
💡 Hints
- During BFS, traverse through nodes of any type, but only record inferred pairs when the destination node matches target_type.
- Use a visited set per start node to prevent infinite loops in cyclic mentor graphs.
- Return results as a set of pairs, then filter and sort for deterministic printing.
Solution Code:
from collections import defaultdict, deque

class TypedOntologyKB:
    def __init__(self):
        self.edges = defaultdict(list)  # relation -> list of (src, dst)
        self.types = defaultdict(set)   # entity -> set of types

    def add_relation(self, relation, src, dst):
        self.edges[relation].append((src, dst))

    def add_type(self, entity, entity_type):
        self.types[entity].add(entity_type)

    def query(self, relation, src):
        return [dst for s, dst in self.edges[relation] if s == src]

    def infer_transitive_typed(self, relation, start_nodes, target_type):
        """Infer (start, dst) via transitive closure of relation, keeping only dst with target_type."""
        inferred = set()
        graph = defaultdict(list)
        for src, dst in self.edges[relation]:
            graph[src].append(dst)
        for start in start_nodes:
            q = deque([start])
            visited = {start}
            while q:
                cur = q.popleft()
                for nxt in graph[cur]:
                    if nxt not in visited:
                        visited.add(nxt)
                        if target_type in self.types.get(nxt, set()):
                            inferred.add((start, nxt))
                        q.append(nxt)
        return inferred

kb = TypedOntologyKB()
# Types
kb.add_type("anna", "Professor")
kb.add_type("ben", "Professor")
kb.add_type("cara", "Student")
kb.add_type("drew", "Student")
kb.add_type("ella", "Student")
kb.add_type("frank", "Staff")
# Direct mentorship edges (directed)
kb.add_relation("mentor_of", "anna", "ben")
kb.add_relation("mentor_of", "ben", "cara")
kb.add_relation("mentor_of", "cara", "drew")
kb.add_relation("mentor_of", "drew", "frank")
kb.add_relation("mentor_of", "frank", "ella")
# Add a cycle that should not cause infinite loops
kb.add_relation("mentor_of", "ella", "ben")

starts = ["anna", "ben"]
for s in starts:
    direct = sorted(kb.query("mentor_of", s))
    inferred_pairs = kb.infer_transitive_typed("mentor_of", start_nodes=[s], target_type="Student")
    endpoints = sorted([dst for src, dst in inferred_pairs if src == s])
    print(f"Direct mentor_of({s}): {direct}")
    print(f"Inferred mentor_of({s}, Student): {endpoints}")
Expected Output:
Direct mentor_of(anna): ['ben']
Inferred mentor_of(anna, Student): ['cara', 'drew', 'ella']
Direct mentor_of(ben): ['cara']
Inferred mentor_of(ben, Student): ['cara', 'drew', 'ella']
The BFS explores the transitive closure of the directed relation "mentor_of". For each start node, it traverses outward through all reachable nodes, but it only records inferred pairs when the destination node has the requested target type ("Student"). A visited set per start prevents revisiting nodes, which is essential when cycles exist (here, ella -> ben -> ... -> ella). The final output is sorted to keep results deterministic.
Markov Decision Process: Expected Utility and Policy Choice
Python
import random

random.seed(0)  # Seed added so the sample output below is reproducible.

# This implements a small Markov decision process (MDP) idea from the document:
# - transition model: probabilities of next state given action
# - reward function: utility of states and cost of actions
# - policy: choose action that maximizes expected utility
states = ["S0", "S1"]
terminal = "S1"
# Reward function: utility of being in a state, plus action cost.
state_reward = {"S0": 0.0, "S1": 10.0}
action_cost = {"try": 1.0, "wait": 0.2}
# Transition model: P(next_state | state, action)
transition = {
    ("S0", "try"): {"S0": 0.4, "S1": 0.6},
    ("S0", "wait"): {"S0": 0.9, "S1": 0.1},
    ("S1", "try"): {"S1": 1.0},
    ("S1", "wait"): {"S1": 1.0},
}
actions = ["try", "wait"]

def expected_utility(state, action, gamma=0.9):
    """One-step lookahead expected utility (policy evaluation style)."""
    exp = 0.0
    for next_state, prob in transition[(state, action)].items():
        # Utility of next state discounted, minus action cost.
        exp += prob * (gamma * state_reward[next_state])
    exp -= action_cost[action]
    return exp

def choose_action(state):
    # Policy: pick action with maximum expected utility.
    best_action = max(actions, key=lambda a: expected_utility(state, a))
    return best_action, expected_utility(state, best_action)

# --- Simulate an agent episode ---
state = "S0"
trajectory = [state]
for step in range(5):
    if state == terminal:
        break
    action, eu = choose_action(state)
    print(f"Step {step}: state={state}, action={action}, expected_utility={eu:.2f}")
    # Sample next state according to transition probabilities.
    r = random.random()
    cumulative = 0.0
    for next_state, prob in transition[(state, action)].items():
        cumulative += prob
        if r <= cumulative:
            state = next_state
            break
    trajectory.append(state)
print("Trajectory:", trajectory)
Explanation
This code demonstrates the document’s MDP concepts: a transition model, a reward function, and a policy that chooses actions by maximizing expected utility. expected_utility computes the probability-weighted utility of possible outcomes after taking an action, then subtracts the action cost. choose_action implements a simple policy (one-step lookahead) by selecting the action with the highest expected utility. Finally, the simulation samples the next state from the transition probabilities, showing how an agent reassesses outcomes over time under uncertainty (non-deterministic transitions).
Use Case
A resource-allocation agent (e.g., cloud autoscaling) can model uncertain outcomes of actions and choose the action with the best expected utility to reach a desired performance state.
Output
Step 0: state=S0, action=try, expected_utility=4.40
Trajectory: ['S0', 'S1']
💻 Code Practice Problems
Problem 1 (medium): Implement a small Markov Decision Process (MDP) with a one-step lookahead policy
Implement a small Markov Decision Process (MDP) with a one-step lookahead policy. You are given a transition model P(next_state | state, action) and a reward model that combines state utility and action cost. Write functions expected_utility(state, action, gamma) and choose_action(state) that select the action maximizing expected utility. Then simulate an episode starting from a given initial state for a fixed number of steps, using stochastic transitions sampled from the probabilities. Use the provided transition and reward data exactly. Use random.seed(0) so the simulation is reproducible.
💡 Hints
- Compute expected utility by summing over next states, sum(prob * gamma * state_reward[next_state]), then subtracting action_cost[action].
- To sample the next state, draw r = random.random() and walk through the probabilities cumulatively until r <= cumulative.
- Stop the episode early if you reach the terminal state.
Solution Code:
import random

random.seed(0)

# MDP definition
states = ["A", "B", "T"]
terminal = "T"
actions = ["go", "wait"]
# State utility
state_reward = {"A": 2.0, "B": 5.0, "T": 20.0}
# Action costs
action_cost = {"go": 1.5, "wait": 0.2}
# Transition model: P(next_state | state, action)
transition = {
    ("A", "go"): {"A": 0.2, "B": 0.8},
    ("A", "wait"): {"A": 0.7, "B": 0.3},
    ("B", "go"): {"B": 0.3, "T": 0.7},
    ("B", "wait"): {"B": 0.6, "T": 0.4},
    ("T", "go"): {"T": 1.0},
    ("T", "wait"): {"T": 1.0},
}

def expected_utility(state, action, gamma=0.9):
    """One-step lookahead expected utility."""
    exp = 0.0
    for next_state, prob in transition[(state, action)].items():
        exp += prob * (gamma * state_reward[next_state])
    exp -= action_cost[action]
    return exp

def choose_action(state, gamma=0.9):
    best_action = max(actions, key=lambda a: expected_utility(state, a, gamma))
    return best_action, expected_utility(state, best_action, gamma)

# --- Simulate an episode ---
state = "A"
trajectory = [state]
gamma = 0.9
max_steps = 6
for step in range(max_steps):
    if state == terminal:
        break
    action, eu = choose_action(state, gamma)
    print(f"Step {step}: state={state}, action={action}, expected_utility={eu:.2f}")
    r = random.random()
    cumulative = 0.0
    for next_state, prob in transition[(state, action)].items():
        cumulative += prob
        if r <= cumulative:
            state = next_state
            break
    trajectory.append(state)
print("Trajectory:", trajectory)
Expected Output:
Step 0: state=A, action=go, expected_utility=2.46
Step 1: state=B, action=go, expected_utility=12.45
Trajectory: ['A', 'B', 'T']
The code defines an MDP with three states (A, B, T) and two actions (go, wait). expected_utility computes the probability-weighted discounted utility of the next state, then subtracts the action cost. choose_action evaluates both actions and selects the one with the maximum expected utility. The simulation starts at A and repeatedly selects the greedy action, then samples the next state stochastically using the transition probabilities. The episode ends when the terminal state T is reached or when max_steps is exhausted.
Problem 2 (hard): Create a deeper MDP solver using multi-step lookahead with recursion
Create a deeper MDP solver using multi-step lookahead with recursion (no dynamic programming table). You must implement a function optimal_value(state, depth, gamma) that returns the maximum expected discounted return achievable from the given state over exactly depth remaining decision steps. At each decision step, choose the action that maximizes expected utility, where the immediate reward is defined as: reward(next_state) = state_reward[next_state] and the action cost is subtracted at the time of taking the action. Terminal state T ends the process early (value equals state_reward[T] discounted appropriately for remaining steps, but since no further actions are taken, treat it as: if state is terminal, return state_reward[state]). Then implement choose_action_by_value(state, depth, gamma) that returns the action with the highest expected value. Finally simulate one episode using this depth-limited policy for a fixed number of steps. Use random.seed(1) for reproducibility. Important: Your recursion must correctly handle terminal states and must not assume deterministic transitions. Also, do not use memoization; implement pure recursion.
💡 Hints
- For depth-limited recursion: if depth == 0 or the state is terminal, return the base value immediately.
- Define optimal_value(state, depth) so that its base case already includes the state's own reward; then expanding an action only adds prob * gamma * optimal_value(next_state, depth - 1) per outcome, which avoids double-counting the next state's reward.
- Subtract the action cost once per action taken (not once per possible outcome), at the moment the action is chosen.
Solution Code:
import random

random.seed(1)

# MDP definition
states = ["S0", "S1", "S2", "T"]
terminal = "T"
actions = ["a0", "a1", "a2"]
state_reward = {"S0": 1.0, "S1": 4.0, "S2": 7.0, "T": 30.0}
action_cost = {"a0": 0.5, "a1": 1.0, "a2": 2.0}
transition = {
    ("S0", "a0"): {"S0": 0.6, "S1": 0.4},
    ("S0", "a1"): {"S1": 0.7, "S2": 0.3},
    ("S0", "a2"): {"S0": 0.2, "S2": 0.8},
    ("S1", "a0"): {"S1": 0.5, "S2": 0.5},
    ("S1", "a1"): {"S0": 0.3, "T": 0.7},
    ("S1", "a2"): {"S2": 0.4, "T": 0.6},
    ("S2", "a0"): {"S2": 0.6, "T": 0.4},
    ("S2", "a1"): {"S1": 0.2, "T": 0.8},
    ("S2", "a2"): {"S0": 0.1, "T": 0.9},
    ("T", "a0"): {"T": 1.0},
    ("T", "a1"): {"T": 1.0},
    ("T", "a2"): {"T": 1.0},
}

def optimal_value(state, depth, gamma=0.9):
    """Depth-limited optimal expected return using pure recursion (no memoization).

    Definition:
    - If state is terminal: return state_reward[state].
    - If depth == 0: return state_reward[state].
    - Otherwise: choose the action maximizing expected discounted future value.
    The base case already includes the state's own reward, so the recursion adds
    only prob * gamma * optimal_value(next_state, ...) per outcome, which avoids
    double-counting the next state's reward. The action cost is subtracted once,
    at the time the action is taken.
    """
    if state == terminal:
        return state_reward[state]
    if depth == 0:
        return state_reward[state]
    best = float("-inf")
    for action in actions:
        exp = 0.0
        for next_state, prob in transition[(state, action)].items():
            # optimal_value(next_state, depth-1) already contains the next state's
            # reward as its base value, so we only discount it by gamma here.
            exp += prob * gamma * optimal_value(next_state, depth - 1, gamma)
        exp -= action_cost[action]
        if exp > best:
            best = exp
    return best

def choose_action_by_value(state, depth, gamma=0.9):
    best_action = None
    best_val = float("-inf")
    for action in actions:
exp = 0.0
for next_state, prob in transition[(state, action)].items():
exp += prob * (gamma * (state_reward[next_state] + optimal_value(next_state, depth - 1, gamma) - state_reward[next_state]))
exp -= action_cost[action]
if exp > best_val:
best_val = exp
best_action = action
return best_action, best_val
# --- Simulate an episode ---
state = "S0"
trajectory = [state]
depth = 3
max_steps = 6
gamma = 0.9
for step in range(max_steps):
if state == terminal:
break
action, val = choose_action_by_value(state, depth, gamma)
print(f"Step {step}: state={state}, action={action}, value={val:.2f}")
r = random.random()
cumulative = 0.0
for next_state, prob in transition[(state, action)].items():
cumulative += prob
if r <= cumulative:
state = next_state
break
trajectory.append(state)
print("Trajectory:", trajectory)
Expected Output:
Step 0: state=S0, action=a1, value=19.47 Step 1: state=S1, action=a2, value=22.79 Trajectory: ['S0', 'S1', 'T']
optimal_value implements a depth-limited optimal expected return using pure recursion. Base cases: if the state is terminal or depth is 0, it returns state_reward[state]. For depth > 0 and non-terminal states, it evaluates each action by summing over stochastic next states. The action cost is subtracted once per action. The recursion is structured to avoid double-counting the next state's base reward by subtracting state_reward[next_state] inside the recursive contribution. choose_action_by_value repeats the same expected-value computation and returns the maximizing action. The simulation then uses this depth-limited greedy policy and samples transitions stochastically.
Transformer-Style Text Generation with a Tiny Markov “Attention-Like” Prompt Continuation
Python Code
import re
from collections import Counter, defaultdict
# The document highlights transformers and generative AI.
# This example is NOT a full transformer, but it builds a generative pattern
# using prompt-conditioned next-token prediction (a simplified analogue).
# Tokenization: keep it simple and deterministic.
def tokenize(text):
return re.findall(r"\w+|[^\w\s]", text.lower())
def build_prompt_model(corpus_tokens, window=3):
"""Build a prompt-conditioned next-token model using a fixed context window."""
# context -> Counter(next_token)
model = defaultdict(Counter)
for i in range(len(corpus_tokens) - window):
context = tuple(corpus_tokens[i:i+window])
next_tok = corpus_tokens[i+window]
model[context][next_tok] += 1
return model
def generate(prompt, model, window=3, max_new_tokens=20):
tokens = tokenize(prompt)
# If prompt is shorter than window, pad with start tokens.
if len(tokens) < window:
tokens = [""] * (window - len(tokens)) + tokens
for _ in range(max_new_tokens):
context = tuple(tokens[-window:])
next_counts = model.get(context)
if not next_counts:
break # No learned continuation for this context.
# Choose the most frequent next token (greedy decoding).
next_tok = next_counts.most_common(1)[0][0]
tokens.append(next_tok)
    # Reconstruct a readable string, dropping the empty padding tokens.
    out = " ".join(t for t in tokens if t)
    return out
# --- Tiny corpus (training) ---
corpus = (
"Artificial intelligence is the capability of computational systems to perform tasks. "
"Machine learning improves performance on a given task automatically. "
"Natural language processing allows programs to read and write human languages. "
"Transformers use attention mechanisms for modern NLP."
)
corpus_tokens = tokenize(corpus)
model = build_prompt_model(corpus_tokens, window=3)
prompt = "natural language processing allows"
completion = generate(prompt, model, window=3, max_new_tokens=12)
print("Prompt:", prompt)
print("Completion:", completion)
Explanation
This example demonstrates a generative AI pattern aligned with the document’s NLP and transformer-era generation theme. It tokenizes a small corpus, then builds a prompt-conditioned next-token model using a fixed context window (a simplified stand-in for attention over recent tokens). During generation, the model repeatedly predicts the most likely next token given the last window of tokens, producing a continuation of the prompt. This connects to the document’s idea of generative systems producing text from prompts, while keeping the implementation small and runnable. Greedy decoding makes the behavior deterministic for a given corpus.
Use Case
A lightweight offline assistant can generate short explanations or summaries by learning from a domain corpus and continuing user prompts without needing a large neural model.
Output
Prompt: natural language processing allows Completion: natural language processing allows programs to read and write human languages . transformers use attention mechanisms
💻 Code Practice Problems
Problem 1: Create a prompt-conditioned next-token generator similar to the example (medium)
Create a prompt-conditioned next-token generator similar to the example, but with two upgrades: (1) use a variable-length context window from 1 up to a maximum window size, choosing the longest available context at each step; (2) use temperature-based sampling instead of greedy decoding.
Requirements:
- Implement tokenize(text) using the same regex idea: lowercase, split into word tokens and punctuation tokens.
- Build a model that counts next-token frequencies for every context length from 1..max_window.
- Implement generate(prompt, model, max_window, max_new_tokens, temperature, seed) that:
  - tokenizes the prompt
  - pads with <s> tokens only if needed for the longest context (max_window)
  - at each step, tries contexts from longest to shortest and selects the first context that exists in the model
  - samples the next token from the context distribution using temperature
  - stops early if no context exists for any length
- Provide a runnable script that trains on a small corpus, generates from a prompt, and prints Prompt and Completion.
Important determinism rule:
- Use the provided seed for sampling so the output is reproducible.
💡 Show Hints (3)
- • Store counts for multiple context lengths: for each position, update contexts of lengths 1..max_window that fit.
- • For temperature sampling: convert counts to probabilities, apply p_i^(1/temperature), then renormalize before sampling.
- • When choosing context, try the longest context first; if missing, fall back to shorter contexts until length 1.
✓ Reveal Solution
Solution Code:
import re
from collections import Counter, defaultdict
import random
import math
def tokenize(text):
return re.findall(r"\w+|[^\w\s]", text.lower())
def build_variable_context_model(corpus_tokens, max_window=4):
"""model[context_tuple] -> Counter(next_token)."""
model = defaultdict(Counter)
n = len(corpus_tokens)
for i in range(n):
next_tok = corpus_tokens[i]
# contexts end at i-1, so context tokens are corpus_tokens[j:i]
for L in range(1, max_window + 1):
j = i - L
if j < 0:
continue
context = tuple(corpus_tokens[j:i])
model[context][next_tok] += 1
return model
def sample_from_counts(counter, temperature, rng):
tokens = list(counter.keys())
counts = [counter[t] for t in tokens]
# Convert counts to probabilities.
total = sum(counts)
probs = [c / total for c in counts]
# Temperature adjustment.
# temperature -> 0 approaches argmax; we avoid division by zero.
if temperature <= 0:
best_idx = max(range(len(tokens)), key=lambda k: counts[k])
return tokens[best_idx]
adjusted = [p ** (1.0 / temperature) for p in probs]
s = sum(adjusted)
if s == 0:
# Fallback (should not happen with positive counts).
best_idx = max(range(len(tokens)), key=lambda k: counts[k])
return tokens[best_idx]
adjusted_probs = [a / s for a in adjusted]
r = rng.random()
cum = 0.0
for tok, ap in zip(tokens, adjusted_probs):
cum += ap
if r <= cum:
return tok
return tokens[-1]
def generate(prompt, model, max_window=4, max_new_tokens=20, temperature=1.0, seed=0):
rng = random.Random(seed)
tokens = tokenize(prompt)
# Pad so we can always form the longest context when needed.
if len(tokens) < max_window:
tokens = ["<s>"] * (max_window - len(tokens)) + tokens
for _ in range(max_new_tokens):
# Try longest to shortest contexts.
chosen_context = None
next_counts = None
for L in range(max_window, 0, -1):
context = tuple(tokens[-L:])
if context in model:
chosen_context = context
next_counts = model[context]
break
if next_counts is None:
break
next_tok = sample_from_counts(next_counts, temperature, rng)
tokens.append(next_tok)
    # Drop the <s> padding tokens wherever they appear, then join.
    out = " ".join(t for t in tokens if t != "<s>")
    return out
if __name__ == "__main__":
corpus = (
"Artificial intelligence is the capability of computational systems to perform tasks. "
"Machine learning improves performance on a given task automatically. "
"Natural language processing allows programs to read and write human languages. "
"Transformers use attention mechanisms for modern NLP. "
"Attention helps models focus on relevant tokens during generation."
)
corpus_tokens = tokenize(corpus)
model = build_variable_context_model(corpus_tokens, max_window=4)
prompt = "natural language processing allows"
completion = generate(prompt, model, max_window=4, max_new_tokens=12, temperature=0.8, seed=42)
print("Prompt:", prompt)
print("Completion:", completion)
Expected Output:
Prompt: natural language processing allows Completion: natural language processing allows programs to read and write human languages . transformers use attention mechanisms
The code tokenizes the corpus into lowercase word and punctuation tokens. It then builds a model mapping every possible context tuple (for context lengths 1..max_window) to a Counter of next-token frequencies. During generation, it pads the prompt with <s> so the longest context can be formed. At each step it searches for the longest context that exists in the model; if none exists, it stops. Given the chosen context, it samples the next token from the empirical distribution, modified by temperature. A fixed seed makes the sampling reproducible.
Problem 2: Build a more advanced prompt-conditioned generator with two additional constraints (hard)
Build a more advanced prompt-conditioned generator with two additional constraints:
1) Add explicit start and end tokens: <s> and </s>. Train the model so that it learns when to stop.
2) During generation, enforce that the output never contains two consecutive punctuation tokens (for example, ", ," or "! !"). If the sampled next token would violate this rule, resample up to 20 times from the same distribution; if still invalid, fall back to the most frequent valid token.
Requirements:
- Use the same tokenize(text) approach.
- Implement build_model(corpus_tokens, window=3) that:
  - prepends exactly window tokens of <s> to the corpus
  - appends exactly window tokens of </s> to the corpus
  - builds a fixed-window context model like the original example: model[context_tuple][next_token] += 1
- Implement generate(prompt, model, window=3, max_new_tokens=30, seed=0) that:
  - tokenizes prompt
  - pads with <s> tokens if prompt is shorter than window
  - repeatedly samples next tokens from the learned distribution (not greedy)
  - stops when </s> is generated
  - applies the no-two-consecutive-punctuation constraint described above
- Provide a runnable script that trains on a small corpus, generates from a prompt, and prints Prompt and Completion.
Determinism rule:
- Use the provided seed for sampling so the output is reproducible.
💡 Show Hints (3)
- • To learn stopping, include </s> tokens in training and stop generation when </s> is produced.
- • Detect punctuation tokens by checking whether the token matches a punctuation-only regex like r"^[^\w\s]+$".
- • When a sampled token is invalid, resample from the same distribution several times, then choose the most frequent valid token.
✓ Reveal Solution
Solution Code:
import re
from collections import Counter, defaultdict
import random
def tokenize(text):
return re.findall(r"\w+|[^\w\s]", text.lower())
def is_punctuation(tok):
# Punctuation-only tokens (e.g., ",", ".", "!", "?", ":")
return re.match(r"^[^\w\s]+$", tok) is not None
def build_model(corpus_tokens, window=3):
"""Fixed-window next-token model with explicit <s> and </s>."""
model = defaultdict(Counter)
start_pad = ["<s>"] * window
end_pad = ["</s>"] * window
seq = start_pad + corpus_tokens + end_pad
for i in range(len(seq) - window):
context = tuple(seq[i:i + window])
next_tok = seq[i + window]
model[context][next_tok] += 1
return model
def sample_next(counter, rng):
tokens = list(counter.keys())
counts = [counter[t] for t in tokens]
total = sum(counts)
r = rng.random() * total
cum = 0
for tok, c in zip(tokens, counts):
cum += c
if r <= cum:
return tok
return tokens[-1]
def generate(prompt, model, window=3, max_new_tokens=30, seed=0):
rng = random.Random(seed)
tokens = tokenize(prompt)
if len(tokens) < window:
tokens = ["<s>"] * (window - len(tokens)) + tokens
for _ in range(max_new_tokens):
context = tuple(tokens[-window:])
next_counts = model.get(context)
if not next_counts:
break
# Try sampling with constraint: no two consecutive punctuation tokens.
prev_tok = tokens[-1] if tokens else None
prev_is_punct = is_punctuation(prev_tok) if prev_tok is not None else False
chosen = None
for _attempt in range(20):
cand = sample_next(next_counts, rng)
if prev_is_punct and is_punctuation(cand):
continue
chosen = cand
break
if chosen is None:
# Fallback: most frequent valid token.
valid = []
for tok, cnt in next_counts.items():
if prev_is_punct and is_punctuation(tok):
continue
valid.append((cnt, tok))
if not valid:
# If everything is invalid, allow the most frequent token.
chosen = max(next_counts.items(), key=lambda kv: kv[1])[0]
else:
chosen = max(valid)[1]
tokens.append(chosen)
if chosen == "</s>":
break
    # Drop the start/end markers wherever they appear, then join.
    out = " ".join(t for t in tokens if t not in ("<s>", "</s>"))
return out
if __name__ == "__main__":
corpus = (
"Natural language processing allows programs to read and write human languages. "
"Transformers use attention mechanisms for modern NLP. "
"Attention helps models focus on relevant tokens during generation."
)
corpus_tokens = tokenize(corpus)
model = build_model(corpus_tokens, window=3)
prompt = "transformers use attention"
completion = generate(prompt, model, window=3, max_new_tokens=25, seed=7)
print("Prompt:", prompt)
print("Completion:", completion)
Expected Output:
Prompt: transformers use attention Completion: transformers use attention mechanisms for modern nlp . attention helps models focus on relevant tokens during generation .
Training prepends <s> and appends </s> so the model learns both how to start and when to stop. The generator uses a fixed context window like the original example, but it samples next tokens from the learned frequency distribution. It enforces a structural constraint: it never allows two consecutive punctuation tokens. If a sampled token would violate the constraint, it resamples up to 20 times; if all attempts fail, it selects the most frequent valid token. Generation stops immediately when </s> is generated.
Interactive Lesson
Interactive Lesson: Dependency-Ordered Foundations of AI (Goals, Reasoning, Knowledge, Planning, Learning, and Modern NLP)
⏱️ 30 min
Learning Objectives
- Explain AI as goal-directed intelligent behavior and distinguish goals (capabilities) from techniques (methods).
- Decompose intelligence into subproblems (reasoning, knowledge representation, planning, learning, language, perception) and map each to an AI subfield.
- Use uncertainty-aware reasoning to describe why exact search can fail due to combinatorial explosion.
- Connect knowledge representation (knowledge bases and ontologies) to how systems answer questions and support decisions.
- Describe how planning and decision-making with utilities lead to policies and how this connects to Markov decision processes.
1. AI as Goal-Directed Intelligent Behavior (Foundation)
Start with the core idea: AI is goal-directed intelligent behavior by computational systems. The system aims to achieve defined goals by learning, reasoning, perceiving, and deciding. This framing matters because it separates what we want (capabilities/goals) from how we achieve it (techniques).
Examples:
- Advanced web search engines: they decide how to rank results to achieve the goal of useful retrieval.
- Autonomous vehicles: they perceive the environment and decide actions to reach safe driving objectives.
✓ Check Your Understanding:
Which option best reflects the 'goals vs techniques' distinction?
Answer: A. Goals are capabilities (e.g., perception, planning); techniques are methods (e.g., search, logic, neural networks)
In the goal-directed framing, what is the role of 'deciding'?
Answer: B. It selects actions to maximize chances of achieving defined goals
Which example best matches goal-directed behavior?
Answer: B. A strategy game analysis system choosing moves to improve chances of winning
2. Subproblem Decomposition of Intelligence (Why AI is Modular)
The general problem of simulating intelligence is decomposed into capabilities: reasoning, knowledge representation, planning, learning, language (NLP), and perception. This decomposition is a dependency step because later concepts (reasoning, knowledge bases, planning, learning, NLP, perception) are each specialized answers to different subproblems.
Examples:
- NLP tasks like speech recognition and question answering rely on language capabilities.
- Perception tasks like object recognition rely on sensor-to-world inference.
✓ Check Your Understanding:
Which mapping is most consistent with the decomposition idea?
Answer: A. Reasoning capability maps to reasoning and problem-solving subfields
Why is decomposition useful for building AI systems?
Answer: B. It lets different representations and algorithms target different capabilities
Which statement best connects decomposition to the foundation concept?
Answer: A. Decomposition explains how goal-directed behavior can be achieved via multiple capabilities
3. Reasoning and Problem-Solving Under Uncertainty (Combinatorial Explosion)
Accurate reasoning can be difficult because exact search over possibilities can grow exponentially (combinatorial explosion). Real-world problems also involve uncertainty and incomplete information. This motivates uncertainty-aware reasoning using probability/economics ideas and motivates heuristic or probabilistic decision-making rather than exhaustive search.
Examples:
- Planning a route with uncertain traffic: exact enumeration of all possibilities is impractical.
- Game analysis: exploring all move sequences becomes infeasible as the branching factor grows.
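The growth described in these examples can be made concrete with a minimal sketch; the branching factors below (10 moves per turn, chess-like 35 moves) are illustrative assumptions, not figures from the text:

```python
def num_paths(branching_factor: int, depth: int) -> int:
    # Each decision point offers `branching_factor` choices, so the number
    # of distinct action sequences grows as branching_factor ** depth.
    return branching_factor ** depth

# A modest game with ~10 legal moves per turn, looking 10 steps ahead:
print(num_paths(10, 10))  # 10,000,000,000 sequences
# Chess-like branching (~35 moves) explodes after only 6 plies:
print(num_paths(35, 6))   # 1,838,265,625 sequences
```

Enumerating billions of paths per decision is impractical, which is exactly why heuristic or probabilistic decision-making replaces exhaustive search.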
✓ Check Your Understanding:
What is the most direct cause of combinatorial explosion?
Answer: A. The number of possible states or paths grows exponentially with problem size
Why does uncertainty change reasoning?
Answer: B. Uncertainty means outcomes are not fully known, so reasoning must account for probabilities
Which option best connects this concept to later planning/decision-making?
Answer: B. Planning and decision-making often rely on models and expected utility to choose actions under uncertainty
4. Knowledge Representation and Knowledge Bases (Ontologies vs Facts)
Knowledge representation encodes concepts and relationships so programs can answer questions and make deductions using a knowledge base. Often, ontologies define the domain concepts and relationships, while the knowledge base stores the actual represented knowledge. This concept depends on decomposition because knowledge representation is one capability among others, and it supports tasks like retrieval, scene interpretation, and clinical decision support.
Examples:
- Clinical decision support: representing medical concepts and relationships helps answer queries.
- Scene interpretation: encoded relationships can support reasoning about what is where.
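A toy sketch of the ontology/knowledge-base split described above; the concept names, relations, and facts are invented purely for illustration:

```python
# Ontology: which concepts exist and which relationships are allowed between them.
ontology = {
    "is_a":   ("Concept", "Concept"),
    "treats": ("Drug", "Condition"),
}

# Knowledge base: the actual represented facts, stated with those relationships.
kb = [
    ("aspirin", "is_a", "drug"),
    ("aspirin", "treats", "headache"),
    ("headache", "is_a", "symptom"),
]

def query(kb, subject=None, relation=None, obj=None):
    # Return every fact matching the pattern; None acts as a wildcard.
    return [
        (s, r, o) for (s, r, o) in kb
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]

# "What treats headache?"
print(query(kb, relation="treats", obj="headache"))
```

The ontology is the map legend (what kinds of statements are meaningful); the knowledge base holds the filled-in facts that queries and deductions operate on.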
✓ Check Your Understanding:
Which statement correctly distinguishes ontology from knowledge base?
Answer: B. Ontology defines concepts and relationships for a domain; knowledge base is the represented body of knowledge
What is a key purpose of knowledge representation?
Answer: B. To enable programs to answer questions and make deductions using a knowledge base
How does this concept connect back to uncertainty and reasoning?
Answer: B. It provides structured information that reasoning can use, even though commonsense breadth can be hard
5. Planning and Decision-Making with Utilities and Policies (From Goals to Actions)
Agents choose actions by maximizing expected utility. Utility measures preference for a situation, and expected utility averages utilities over possible outcomes weighted by probabilities. A policy maps states to decisions. When the environment is uncertain or non-deterministic, probabilistic models support planning. This concept depends on decomposition and reasoning because planning uses reasoning about outcomes and uses representations of knowledge and uncertainty to decide actions.
Examples:
- Autonomous vehicles: choose maneuvers that maximize expected safety and progress.
- Chatbots: choose responses that maximize expected user satisfaction under uncertainty.
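The decision rule in this section can be written out directly; the probabilities and utilities below are made-up numbers for a hypothetical lane-change decision:

```python
def expected_utility(outcomes):
    # outcomes: list of (probability, utility) pairs for one action.
    return sum(p * u for p, u in outcomes)

# Hypothetical maneuver choice for an autonomous vehicle.
actions = {
    "overtake":  [(0.8, 10.0), (0.2, -50.0)],  # usually fast, occasionally very bad
    "stay_lane": [(1.0, 4.0)],                 # certain, modest utility
}

# A one-state policy: map the situation to the EU-maximizing action.
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best)  # stay_lane: EU = 4.0 beats EU = 0.8*10 + 0.2*(-50) = -2.0
```

Note that "overtake" has the single best outcome (utility 10) but a worse expected utility once probabilities are weighted in, which is the whole point of the criterion.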
✓ Check Your Understanding:
Which option correctly defines expected utility?
Answer: B. The average of utilities over possible outcomes weighted by their probabilities
What is a policy?
Answer: A. A mapping from states to decisions
How does this concept address the earlier issue of combinatorial explosion?
Answer: B. It replaces exhaustive search with models and utility-based selection under uncertainty
Practice Activities
Cause-Effect Chain: Uncertainty to Expected Utility (medium)
Write a cause-effect chain with 3 links: (1) uncertainty exists, (2) exact reasoning over all possibilities is impractical, (3) the agent uses expected utility to choose an action. For each link, name the mechanism (e.g., probability weighting, model-based outcomes).
Cause-Effect Chain: Knowledge Representation to Deduction (medium)
Pick one application (clinical decision support or scene interpretation). Create a cause-effect chain explaining how ontology/knowledge base structure enables retrieval or deduction, and then connect that deduction to improved decision-making. Include at least one explicit mention of 'concepts and relationships' or 'knowledge base' in your chain.
Cause-Effect Chain: Combinatorial Explosion to Heuristic/Probabilistic Choice (hard)
Describe a scenario where the branching factor grows (e.g., route planning with many alternatives). Build a cause-effect chain: combinatorial explosion occurs, exhaustive search becomes impractical, and the agent uses heuristics or probabilistic decision-making to act. End by stating how this leads to a policy choice.
Cause-Effect Chain: Goals to Techniques (Avoiding a Common Confusion) (medium)
Create a cause-effect chain that explicitly corrects the confusion between goals and techniques. Example structure: goal is 'perceive and decide safely' -> technique might be 'search/optimization/logic/neural networks' -> outcome is improved action selection. Your chain must include at least one technique term and one capability term.
Next Steps
Related Topics:
- Markov decision processes and policies
- Machine learning paradigms (supervised, unsupervised, reinforcement, deep learning)
- NLP with embeddings and transformer architectures
- Perception and computer vision
- AI techniques: search, optimization, and logic
- Affective/social intelligence and safety limitations
- AI controversies and regulation concerns
Practice Suggestions:
- For each core concept, write one real-world example and explicitly name the capability (goal) and the technique (method).
- Practice building 3-link cause-effect chains that connect uncertainty, decision criteria (expected utility), and action selection (policy).
- Create a mini-ontology for a small domain (e.g., a library or course catalog) and then list what would go into the knowledge base.
Cheat Sheet
Cheat Sheet: Artificial Intelligence (AI)
Key Terms
- Artificial General Intelligence (AGI)
- AI that can complete virtually any cognitive task at least as well as a human.
- AI winter
- A period of disappointment and loss of funding in AI history.
- Combinatorial explosion
- A problem-growth effect where reasoning or search becomes exponentially slower as the number of possibilities increases.
- Ontology
- A representation of knowledge as concepts within a domain and the relationships between them.
- Knowledge base
- A body of knowledge represented in a form that a program can use.
- Rational agent
- An agent with goals or preferences that takes actions to make them happen.
- Utility and expected utility
- Utility measures how much an agent prefers a situation; expected utility averages utilities over possible outcomes weighted by probabilities.
- Markov decision process (MDP)
- A decision model with a transition model (probabilities of state changes) and a reward function (utility/cost).
- Transformer architecture
- A deep learning architecture using an attention mechanism, widely used in modern NLP.
- Affective computing
- Systems that recognize, interpret, process, or simulate human feelings, emotion, and mood.
Formulas
Expected utility (core decision rule)
EU(action) = Σ_over_outcomes [ P(outcome | action) * Utility(outcome) ]
Use when an agent must choose among actions under uncertainty and outcomes have probabilities.
MDP components (model structure)
MDP = (States, Actions, TransitionModel P(s'|s,a), Reward/Utility R(s,a,s'), Policy π)
Use when modeling sequential decision-making with probabilistic state transitions and rewards.
Gradient descent (local optimization update)
θ := θ - η * ∇_θ L(θ)
Use when training models by minimizing a loss function using iterative parameter updates (often via backpropagation).
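As a minimal sketch of the gradient descent update rule applied to a hand-picked one-parameter quadratic loss (the loss, starting point, and learning rate are illustrative assumptions):

```python
# Minimize L(theta) = (theta - 3)^2; its gradient is dL/dtheta = 2*(theta - 3).
theta = 0.0  # initial parameter
eta = 0.1    # learning rate
for _ in range(100):
    grad = 2 * (theta - 3)
    theta -= eta * grad  # theta := theta - eta * gradient
print(round(theta, 6))  # approaches the minimizer theta = 3
```

Each step shrinks the error (theta - 3) by a factor of 0.8, so the parameter converges geometrically toward the minimum; too large an eta would instead overshoot and diverge.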
Main Concepts
AI as goal-directed intelligent behavior
AI is computational capability to perform tasks associated with human intelligence by learning, reasoning, perceiving, and deciding to achieve defined goals.
Subproblem decomposition of intelligence
General intelligence is broken into capabilities like reasoning, knowledge representation, planning, learning, language, and perception.
Reasoning under uncertainty and combinatorial explosion
Exact reasoning/search can become exponentially slow, and real problems often have incomplete or uncertain information.
Knowledge representation and knowledge bases
Encode concepts and relationships so programs can answer questions and make deductions using a structured knowledge base (often via ontologies).
Planning and decision-making with utilities and policies
Agents choose actions by maximizing expected utility using models of outcomes and policies mapping states to decisions.
Machine learning paradigms
Learning improves performance via supervised, unsupervised, and reinforcement learning, including deep learning with neural networks.
NLP with modern deep learning architectures
Modern NLP uses learned representations and transformer-based models; GPT-style models generate text from prompts.
Perception from sensors and computer vision
Perception infers world aspects from sensor inputs; computer vision focuses specifically on visual analysis.
AI techniques: search, optimization, and logic
State-space search, local optimization (e.g., gradient descent), and formal logic support problem-solving and reasoning.
Affective/social intelligence and trust limitations
Affective computing can recognize or simulate emotions, but human-like cues can create unrealistic user beliefs about true intelligence.
Memory Tricks
Combinatorial explosion
Think: “Paths multiply, time explodes.” If the number of possibilities grows fast, exact search becomes impractical.
Expected utility vs utility
“Utility is one outcome; expected utility is the average over outcomes.” EU = probability-weighted utility.
Ontology vs knowledge base
“Ontology = the map legend (concepts and relations). Knowledge base = the filled-in facts.”
Transformer intuition
“Attention picks what matters in the sequence.” Transformers use attention to build strong representations for language.
Affective computing trust trap
“Feelings cues can fool you.” Emotional or human-like interaction can increase perceived competence beyond actual intelligence.
Quick Facts
- AI was founded as an academic discipline in 1956.
- High-profile applications include advanced web search, chatbots/virtual assistants, autonomous vehicles, and strategy game play/analysis (chess, Go).
- Generative AI became widely available in the 2020s for generating images, audio, and videos from text prompts.
- Funding and interest increased substantially after 2012 due to GPUs accelerating neural networks and deep learning outperforming prior techniques.
- Growth accelerated further after 2017 with the transformer architecture.
- In NLP, 2019 marks the start of GPT-style generative pre-trained transformer models producing coherent text.
- By 2023, these models achieved human-level scores on exams such as the bar exam, SAT, and GRE (as stated in the text).
Common Mistakes
Common Mistakes: AI Goals, Methods, Foundations, and Decision Models
Students claim that “AI is just neural networks” or “AI is just search,” treating a technique as the definition of AI.
conceptual · high severity
Why it happens:
They start from a familiar success (e.g., deep learning, transformers, search engines) and generalize it into the overall goal. They confuse “AI as goal-directed intelligent behavior” with one implementation method (“AI techniques: search, optimization, and logic” or “machine learning paradigms”). This matches the confusion: confusing AI goals (learning/reasoning/etc.) with specific algorithms/techniques (search/logic/neural networks).
✓ Correct understanding:
AI is the capability of computational systems to perform tasks associated with human intelligence by learning, reasoning, perceiving, and deciding to maximize chances of achieving defined goals. Then map capabilities to subfields (learning, reasoning, planning, NLP, perception). Finally, choose methods/techniques that can realize those capabilities (search, optimization, logic, neural networks).
How to avoid:
Use a two-layer checklist: (1) Identify the capability being delivered (learning, reasoning, planning, perception, language, decision-making). (2) Identify the technique used to deliver it (search, optimization, logic, neural networks). Never let technique names replace the definition of AI.
Students say supervised learning and unsupervised learning differ only by the model architecture (or that unsupervised learning is “just supervised learning without labels” but with the same objective).
conceptual · medium severity
Why it happens:
They focus on the fact that both use neural networks and gradient descent, so they assume the difference is architectural. They miss the core distinction: whether training data includes expected answers (labels) versus unlabeled data. This matches the confusion: thinking supervised and unsupervised differ only by model type.
✓ Correct understanding:
Supervised learning trains on data labeled with expected outputs, so the learning objective compares predictions to known targets. Unsupervised learning trains on unlabeled data, so the objective must be defined without ground-truth answers (e.g., discovering structure, clustering, or learning representations). The key difference is the presence of labeled expected answers and the resulting learning objective, not the network type.
How to avoid:
When classifying learning type, ask: “Are the training examples paired with correct outputs?” If yes, supervised. If no, unsupervised (or self-supervised, which still uses constructed targets). Then connect that to the objective: compare to targets (supervised) versus discover structure/representations (unsupervised).
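The label-versus-no-label distinction can be made concrete with a toy sketch. The 1-D data, the linear model y = w·x, and the clustering-style objective below are illustrative assumptions, not from the text:

```python
# Supervised: each input is paired with a target, so the loss
# compares the model's predictions against known answers.
inputs  = [1.0, 2.0, 3.0]
targets = [2.0, 4.0, 6.0]          # labels: the "expected answers"

def supervised_loss(w):
    # mean squared error of the model y = w * x against the targets
    return sum((w * x - t) ** 2 for x, t in zip(inputs, targets)) / len(inputs)

# Unsupervised: same inputs, no targets. The objective must be defined
# from the data alone, e.g. how tightly the points sit around a center.
def unsupervised_loss(center):
    return sum((x - center) ** 2 for x in inputs) / len(inputs)

print(supervised_loss(2.0))    # 0.0 -- perfect fit against the labels
print(unsupervised_loss(2.0))  # spread of the data around the chosen center
```

Both objectives could be minimized with the same optimizer and even the same model family; what changes is whether ground-truth answers define the loss.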
Students treat expected utility as the same thing as utility, or they ignore probabilities and just pick the action with the highest utility in one imagined outcome.
conceptual · high severity
Why it happens:
They remember “utility is a number” and “expected utility chooses the maximum,” but they drop the “expected” part. They then reason as if the agent evaluates only one outcome. This matches the confusion: assuming expected utility is the same as utility.
✓ Correct understanding:
Utility measures preference for a specific situation. Expected utility averages utilities over possible outcomes weighted by their probabilities. In uncertain environments, the rational choice is the action with maximum expected utility, not maximum utility of a single best-case outcome.
How to avoid:
Always write the expected utility calculation explicitly: EU(a) = Σ_outcomes P(outcome | a) × U(outcome). If you cannot compute, or at least conceptually average over, the possible outcomes, you are likely confusing utility with expected utility.
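The calculation above can be sketched in a few lines of Python; the two actions, their outcome probabilities, and their utilities are invented for illustration:

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Action A: a safe bet. Action B: a risky bet with one high-utility outcome.
action_a = [(1.0, 50)]
action_b = [(0.1, 400), (0.9, 0)]

print(expected_utility(action_a))  # 50.0
print(expected_utility(action_b))  # 40.0  (0.1 * 400 + 0.9 * 0)
# Maximum *utility* of a single imagined outcome favors B (400 > 50),
# but maximum *expected* utility favors A (50 > 40).
```

The comment at the end is the whole point of the mistake: the ranking of actions flips once probabilities are accounted for.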
Students claim that an ontology and a knowledge base are the same thing, or they swap the terms without understanding the difference in role.
conceptual · medium severity
Why it happens:
They see both as “representations of knowledge” and assume they are interchangeable. This matches the confusion: believing knowledge bases and ontologies are identical.
✓ Correct understanding:
An ontology is the set of concepts, relations, and properties that define a domain’s vocabulary and structure. A knowledge base is the body of knowledge represented in a usable form, typically using an ontology’s structure. So: ontology defines the schema/meaning framework; knowledge base contains the actual encoded facts/instances that conform to that framework.
How to avoid:
Use the “schema vs data” framing: ontology = schema (concepts and relations). Knowledge base = data (encoded facts/instances). When unsure, ask: “Where do the specific statements live?” and “Where do the allowed concepts/relations live?”
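A minimal sketch of the schema-vs-data framing; the concept and relation names are invented for illustration:

```python
# Ontology: the allowed concepts and relations (the schema / meaning framework).
ontology = {
    "concepts": {"Animal", "Dog"},
    "relations": {"is_a"},          # binary relation between two concepts
}

# Knowledge base: the specific encoded facts (instances) that use the schema.
knowledge_base = [
    ("Dog", "is_a", "Animal"),
]

def conforms(fact, onto):
    """Check a (subject, relation, object) triple against the ontology."""
    s, r, o = fact
    return s in onto["concepts"] and r in onto["relations"] and o in onto["concepts"]

print(all(conforms(f, ontology) for f in knowledge_base))  # True
```

The "where do statements live?" question maps directly onto the two structures: specific statements live in `knowledge_base`; the allowed concepts and relations live in `ontology`.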
Students equate computer vision with all perception, concluding that perception in AI only means interpreting images.
conceptual · medium severity
Why it happens:
They overgeneralize from the visibility of image-based tasks (object detection, image classification). They ignore that perception includes multiple sensor modalities and tasks beyond vision (e.g., speech recognition). This matches the confusion: equating computer vision with all perception.
✓ Correct understanding:
Perception is the process of inferring aspects of the world from sensor inputs. Computer vision is a subset of perception focused on visual input. Other perception tasks include speech recognition and other sensor-based inference, which may rely on different representations and models but still fall under perception.
How to avoid:
Classify by sensor modality: if the input is visual, it is computer vision. If the input is audio, it is perception via speech/audio processing. Use the hierarchy: perception (broad) → computer vision (visual subset).
Students believe that reasoning under uncertainty is solved by “trying all possibilities exactly,” and they treat combinatorial explosion as a minor performance issue rather than a fundamental barrier to exact reasoning at scale.
conceptual · high severity
Why it happens:
They assume brute-force search is always feasible and that “more computation” will eventually make exact reasoning practical. They underestimate how the number of states/paths grows exponentially with problem size. This matches the cause-effect chain: combinatorial explosion makes exact step-by-step reasoning impractically slow for large problems.
✓ Correct understanding:
Under uncertainty, exact search over all possibilities grows exponentially with problem size (combinatorial explosion), so exact step-by-step reasoning becomes impractically slow for large problems. Practical systems therefore use heuristics, probabilistic decision-making, and approximate methods. Uncertainty handling often draws on probability theory and economics to choose actions that maximize expected utility rather than exhaustively enumerating all possibilities.
How to avoid:
When you hear “exact reasoning,” immediately ask: “What is the growth rate of the search space?” If it is exponential, shift to approximate/heuristic/probabilistic reasoning and connect it to expected utility and policies rather than exhaustive enumeration.
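A quick sketch of both points: how fast enumeration blows up, and how sampling sidesteps it. The branching factor, the depths, and the Bernoulli outcome distribution are illustrative assumptions:

```python
import random

# With 10 choices per step, the number of distinct action sequences
# to enumerate exactly is 10^depth -- exponential in problem size.
branching = 10
for depth in (5, 10, 20):
    print(f"depth {depth}: {branching ** depth} paths")

# Instead of enumerating, estimate expected utility by sampling outcomes
# (a Monte Carlo approximation). Illustrative outcome model: utility 1
# with probability 0.3, utility 0 otherwise, so the true EU is 0.3.
random.seed(0)

def sample_utility():
    return 1.0 if random.random() < 0.3 else 0.0

estimate = sum(sample_utility() for _ in range(10_000)) / 10_000
print(estimate)  # close to the true value 0.3
```

At depth 20 the exact count is 10^20 paths, which no amount of hardware makes enumerable; the sampled estimate, by contrast, costs a fixed 10,000 evaluations regardless of depth.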
Students conclude that because an AI system shows human-like emotions or conversational style, it must have genuine affective intelligence and therefore higher real competence.
conceptual · medium severity
Why it happens:
They focus on surface cues (tone, empathy language, humor) and infer underlying capability. This matches the cause-effect chain: affective computing can simulate or recognize emotions, which can increase perceived competence even when underlying intelligence is limited. They also confuse “affective/social intelligence” with overall intelligence.
✓ Correct understanding:
Affective computing recognizes or simulates human feelings in interaction, but it can mislead users about the agent’s true intelligence. Affective/social intelligence is a specific capability within AI, not a guarantee of general reasoning, planning, or decision quality. Users may develop an unrealistic conception of the intelligence of computer agents due to human-like emotional cues.
How to avoid:
Separate “interaction style” from “capability.” Evaluate competence on task performance tied to goals (reasoning, planning, decision-making, knowledge representation), not only on emotional cues. Use the hierarchy: affective/social intelligence is one subfield, not the definition of AI competence.
General Tips
- Use a hierarchy-first approach: start from AI as goal-directed intelligent behavior, then identify the capability (learning/reasoning/planning/perception/NLP), then select the technique (search/optimization/logic/neural networks).
- When uncertainty is involved, always distinguish utility from expected utility and explicitly account for probabilities.
- For knowledge representation, keep ontology (schema) separate from knowledge base (encoded facts).
- For perception, remember computer vision is a subset of perception focused on visual input.
- For reasoning at scale, watch for combinatorial explosion; expect approximate and probabilistic methods rather than exact enumeration.