Summary
Topic Summary
Core Terminology: Database vs DBMS vs Database System
DBMS Functional Groups: Definition, Update, Retrieval, Administration
Database Models and Classification: Relational, NoSQL, and Beyond
Relational Model Foundations: Tables, Keys, Declarative Queries, Normalization
Relational Databases and SQL: Rows, Columns, Joins, and Views
NoSQL Databases: Key-Value, Document Stores, and Query Language Differences
Technology Eras and Access Paradigms: Navigational vs SQL/Relational vs Post-Relational
Navigational Models and Physical Deployment: Hierarchical/CODASYL and Server Architecture
Key Insights
Keys Enable Physical Freedom
Because relationships use primary keys rather than disk addresses, the database can be reorganized physically without rewriting every cross-table link. This means logical design choices (keys) directly reduce the cost of physical storage changes.
Why it matters: Students often treat keys as just identifiers for queries; this reframes keys as a mechanism that decouples logical relationships from physical layout, enabling easier evolution of storage.
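This decoupling can be made concrete with a toy sketch in JavaScript (illustrative only; a real DBMS stores rows very differently). The "physical" storage is an array whose order can change at any time, while the cross-record link is a primary key, so the link keeps resolving after reorganization:

```javascript
// Toy model: rows linked by primary key survive physical reorganization.
// "Physical" storage: an array whose order can change at any time.
let userStorage = [
  { user_id: 2, name: 'Sam' },
  { user_id: 1, name: 'Avery' }
];

// Logical link: an address record references a user by primary key,
// not by the user's position in storage.
const address = { city: 'Springfield', owner_user_id: 1 };

// Resolving the link searches by key, not by stored position.
function findUserByKey(storage, userId) {
  return storage.find(row => row.user_id === userId) ?? null;
}

console.log(findUserByKey(userStorage, address.owner_user_id).name); // 'Avery'

// Physical reorganization: re-sort the storage (e.g., for locality).
userStorage = [...userStorage].sort((a, b) => a.user_id - b.user_id);

// The key-based link still resolves; no link had to be rewritten.
console.log(findUserByKey(userStorage, address.owner_user_id).name); // 'Avery'
```

Had the address stored an array index instead of a key, the sort would have silently broken the link.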
Declarative Queries Move Complexity
Relational systems do not merely change how users ask questions; they shift the burden of choosing efficient access paths into the DBMS. As a result, “what” is requested becomes separable from “how” it is executed, which is why query optimization becomes central.
Why it matters: This counters the intuition that relational vs navigational is only about syntax. It shows that the real architectural difference is who decides the execution strategy.
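A minimal sketch of "what vs how", using in-memory arrays as hypothetical stand-ins for tables: the same declarative intent (users who have at least one phone) is satisfied by two different execution strategies, and choosing between them is exactly the optimizer's job.

```javascript
// Stand-ins for tables (hypothetical sample data).
const users = [
  { user_id: 1, name: 'Avery' },
  { user_id: 2, name: 'Sam' },
  { user_id: 3, name: 'Jordan' }
];
const phones = [
  { user_id: 1, number: '+1-555-0101' },
  { user_id: 2, number: '+1-555-0202' }
];

// Plan A: nested-loop scan, no auxiliary structures.
function usersWithPhoneScan() {
  return users.filter(u => phones.some(p => p.user_id === u.user_id));
}

// Plan B: build a hash "index" once, then probe it per user.
function usersWithPhoneIndex() {
  const idx = new Set(phones.map(p => p.user_id));
  return users.filter(u => idx.has(u.user_id));
}

// Both plans answer the same declarative question.
console.log(usersWithPhoneScan().map(u => u.name));  // [ 'Avery', 'Sam' ]
console.log(usersWithPhoneIndex().map(u => u.name)); // [ 'Avery', 'Sam' ]
```

In SQL the application writes only the question; the DBMS picks Plan A, Plan B, or something better based on statistics and available indexes.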
Normalization Changes Update Shape
Normalization replaces internal repeating or nested structures with multiple relations connected by logical keys. That implies updates become more like coordinated changes across tables rather than editing one complex record, which can simplify correctness but complicate multi-step writes.
Why it matters: Students may think normalization only improves storage efficiency. This insight highlights a deeper tradeoff: normalization reshapes the update workflow and therefore interacts with the DBMS’s update and administration responsibilities.
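The changed "shape" of updates can be sketched in JavaScript (the structures below are hypothetical stand-ins, not DBMS internals): in a denormalized layout a repeated fact must be edited everywhere it is copied, while in a normalized layout it is a single write, at the cost of multi-structure writes when one logical change spans entities.

```javascript
// Denormalized: the user's city is copied into every order record.
const ordersDenorm = [
  { order_id: 10, user: { name: 'Avery', city: 'Springfield' } },
  { order_id: 11, user: { name: 'Avery', city: 'Springfield' } }
];

// Renaming the city requires editing every embedded copy.
for (const o of ordersDenorm) {
  if (o.user.name === 'Avery') o.user.city = 'Shelbyville';
}

// Normalized: one users "table", orders reference it by key.
const usersTable = new Map([[1, { name: 'Avery', city: 'Springfield' }]]);
const ordersNorm = [
  { order_id: 10, user_id: 1 },
  { order_id: 11, user_id: 1 }
];

// The same rename is now a single write to a single fact.
usersTable.get(1).city = 'Shelbyville';

// But a multi-entity change (new user plus their first order) spans
// two structures and should succeed or fail together - this is where
// the DBMS's update and administration responsibilities come in.
usersTable.set(2, { name: 'Sam', city: 'Springfield' });
ordersNorm.push({ order_id: 12, user_id: 2 });
```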
NoSQL Is Access-Pattern First
The post-relational rise of NoSQL is described as a next-generation shift in data models and query languages, not as a simple SQL replacement. That implies the dominant design driver becomes the expected access patterns, so the “shape” of queries and stored data co-evolve.
Why it matters: Instead of viewing NoSQL as a different database language, students learn to see it as a different optimization target: matching model and query language to how data is actually accessed.
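A small sketch of "access-pattern first" design (hypothetical in-memory stand-ins for NoSQL stores): the same records are stored in two different shapes because two different queries are expected, so the stored shape and the query shape co-evolve.

```javascript
// The same logical records, to be served under two access patterns.
const profiles = [
  { user_id: 1, name: 'Avery', city: 'Springfield' },
  { user_id: 2, name: 'Sam', city: 'Shelbyville' },
  { user_id: 3, name: 'Jordan', city: 'Springfield' }
];

// Access pattern 1: "profile by id" -> store keyed by user_id.
const byId = new Map(profiles.map(p => [p.user_id, p]));

// Access pattern 2: "everyone in a city" -> store keyed by city.
const byCity = new Map();
for (const p of profiles) {
  if (!byCity.has(p.city)) byCity.set(p.city, []);
  byCity.get(p.city).push(p.name);
}

// Each query is a single key lookup in the store shaped for it.
console.log(byId.get(2).name);          // 'Sam'
console.log(byCity.get('Springfield')); // [ 'Avery', 'Jordan' ]
```

In a relational system one normalized schema would serve both questions via queries; in the key-value style, the data is duplicated into one shape per dominant access pattern.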
Eras Reflect Hardware-Driven Semantics
The technology eras are tied to storage and deployment capabilities: direct-access media enabled interactive retrieval, and later hardware improvements allowed relational dominance at scale. This implies database “semantics” (navigational vs declarative) became practical only when the underlying system could support the required access behavior efficiently.
Why it matters: Students often treat eras as purely conceptual history. This reframes them as co-evolution of data models with feasible physical execution, explaining why certain paradigms could not dominate earlier.
Conclusions
Bringing It All Together
Key Takeaways
- Understand terminology first: a database is the data, a DBMS is the software that manages it, and a database system comprises the database, the DBMS, and the applications.
- Learn the DBMS functional groups as the backbone of any design: data definition, update, retrieval, and administration map to how systems create, change, read, and protect data.
- Grasp the relational model core: tables plus primary keys enable joins, declarative queries let the DBMS optimize access paths, and normalization stores each fact once.
- Contrast access paradigms: declarative relational querying states what data is needed, while navigational models specify how to traverse links using pointers or keys.
- Recognize why NoSQL exists: post-relational databases, popularized in the late 2000s, use different structures and query languages to match access patterns that go beyond SQL.
Real-World Applications
- Design a contact management system by normalizing user data into separate relations (user, address, phone) and creating optional records only when information exists.
- Implement a navigational-style lookup in a legacy-like application where records are found by a primary key, then related records are reached by traversing sets or scanning sequentially.
- Use a key-value store for fast retrieval of items by identifier (for example, caching or indexing), reflecting the NoSQL key-value approach described for late-2000s systems.
- Represent semi-structured documents (for example, profiles or configurations) using a document-oriented NoSQL model, aligning with the post-relational shift to different data models and query languages.
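The navigational-style lookup mentioned in the applications above can be sketched as a toy model of CODASYL-style sets in JavaScript (illustrative only; the record identifiers and link fields below are invented). The program, not the DBMS, decides the access path: find an owner record by key, then follow explicit links record-to-record.

```javascript
// Toy navigational store: each record carries explicit links to the
// next member of its "set" (a chain of phone records for one user).
const records = {
  'user:1':    { name: 'Avery', firstPhone: 'phone:100' },
  'phone:100': { number: '+1-555-0101', nextPhone: 'phone:101' },
  'phone:101': { number: '+1-555-0103', nextPhone: null }
};

// Step 1: direct lookup of the owner record by primary key.
const owner = records['user:1'];

// Step 2: navigate the member chain via stored links - the caller
// spells out HOW to reach the data, unlike a declarative SQL join.
const numbers = [];
let cursor = owner.firstPhone;
while (cursor !== null) {
  const rec = records[cursor];
  numbers.push(rec.number);
  cursor = rec.nextPhone;
}
console.log(numbers); // [ '+1-555-0101', '+1-555-0103' ]
```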
Next, build deeper competence in relational query processing and design: focus on declarative SQL semantics, join behavior, and normalization tradeoffs. Then connect those ideas to physical deployment choices (file systems, clusters, cloud) and server architecture, since those determine performance, concurrency, and recovery in real DBMS deployments.
💻 Code Examples
Relational data modeling with SQL: users, addresses, and phones (normalized tables + joins)
SQL
/* Relational model: separate tables per entity, linked by logical keys (primary/foreign keys). */
-- Data definition (DBMS function group: data definition)
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
full_name TEXT NOT NULL
);
CREATE TABLE addresses (
address_id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
street TEXT NOT NULL,
city TEXT NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
CREATE TABLE phones (
phone_id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
phone_number TEXT NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
-- Update (DBMS function group: update)
INSERT INTO users(user_id, full_name) VALUES
(1, 'Avery Chen'),
(2, 'Sam Rivera');
-- Optional tables only when data exists (matches the document example)
INSERT INTO addresses(address_id, user_id, street, city) VALUES
(10, 1, '12 Market St', 'Springfield');
INSERT INTO phones(phone_id, user_id, phone_number) VALUES
(100, 1, '+1-555-0101'),
(101, 2, '+1-555-0202');
-- Retrieval (DBMS function group: retrieval) using declarative SQL joins
-- Query expresses WHAT data is needed, not the access path.
SELECT
u.user_id,
u.full_name,
a.street,
a.city,
p.phone_number
FROM users u
LEFT JOIN addresses a ON a.user_id = u.user_id -- logical key relationship
LEFT JOIN phones p ON p.user_id = u.user_id; -- logical key relationship
-- Optional: view to present different shapes for different users
CREATE VIEW user_contact_view AS
SELECT
u.user_id,
u.full_name,
a.city,
p.phone_number
FROM users u
LEFT JOIN addresses a ON a.user_id = u.user_id
LEFT JOIN phones p ON p.user_id = u.user_id;
-- Retrieval from the view
SELECT * FROM user_contact_view ORDER BY user_id;
Explanation
This example implements the relational model described in the document: data is normalized into separate tables (users, addresses, phones) and linked using logical keys rather than disk addresses. The INSERT statements demonstrate update operations, while the SELECT with LEFT JOIN demonstrates retrieval by content using declarative SQL. The query does not “navigate” record-to-record; instead, it states what combined result is required. The optional CREATE VIEW shows how relational systems can present data in different forms for different users without changing the underlying normalized storage.
Use Case
A customer support system stores user profiles and contact methods. Normalizing addresses and phone numbers prevents duplication and makes updates safer when a user changes only one contact attribute.
Output
Sample output rows (order may vary):
1 | Avery Chen | 12 Market St | Springfield | +1-555-0101
2 | Sam Rivera | NULL | NULL | +1-555-0202
💻 Code Practice Problems
Problem 1 (medium)
Design a normalized relational schema for a small bookstore. Create three tables: customers, orders, and order_items. Each order belongs to exactly one customer, and each order can have multiple order_items. Then insert sample data for 3 customers and several orders (some customers may have no orders). Finally, write a SELECT query that returns one row per order item, including customer name, order date, product name, and quantity. Use LEFT JOIN so that customers with no orders still appear (with NULL order and item fields).
💡 Hints
- • Use primary keys for entity tables and foreign keys to link orders to customers and order_items to orders.
- • To include customers with no orders, start from customers and use LEFT JOIN to orders, then LEFT JOIN to order_items.
- • Because LEFT JOIN can produce NULLs, expect NULLs in order_date, product_name, and quantity for customers without orders.
Solution Code:
/* Bookstore relational model: normalized tables + joins */
-- Data definition
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
customer_name TEXT NOT NULL
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_date TEXT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
CREATE TABLE order_items (
item_id INTEGER PRIMARY KEY,
order_id INTEGER NOT NULL,
product_name TEXT NOT NULL,
quantity INTEGER NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(order_id)
);
-- Update (insert sample data)
INSERT INTO customers(customer_id, customer_name) VALUES
(1, 'Jordan Lee'),
(2, 'Priya Nair'),
(3, 'Mateo Garcia');
INSERT INTO orders(order_id, customer_id, order_date) VALUES
(10, 1, '2026-01-15'),
(11, 1, '2026-02-03'),
(12, 2, '2026-02-20');
INSERT INTO order_items(item_id, order_id, product_name, quantity) VALUES
(100, 10, 'Notebook', 2),
(101, 10, 'Pen', 5),
(102, 11, 'Backpack', 1),
(103, 12, 'Eraser', 3);
-- Retrieval using declarative SQL joins
SELECT
c.customer_id,
c.customer_name,
o.order_id,
o.order_date,
oi.product_name,
oi.quantity
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
LEFT JOIN order_items oi ON oi.order_id = o.order_id
ORDER BY c.customer_id, o.order_id, oi.item_id;
Expected Output:
customer_id | customer_name | order_id | order_date | product_name | quantity
1 | Jordan Lee | 10 | 2026-01-15 | Notebook | 2
1 | Jordan Lee | 10 | 2026-01-15 | Pen | 5
1 | Jordan Lee | 11 | 2026-02-03 | Backpack | 1
2 | Priya Nair | 12 | 2026-02-20 | Eraser | 3
3 | Mateo Garcia | NULL | NULL | NULL | NULL
The schema is normalized: customers store customer identity, orders store order-level data, and order_items store product-level line items. The query starts from customers and uses LEFT JOIN to orders and then to order_items. This expresses the requested combined result declaratively: for each customer, include matching orders and matching items when they exist; otherwise, the joined columns from orders and order_items remain NULL.
Problem 2 (hard)
Create a normalized schema for a sports league. Tables: teams, matches, and match_events. Each match has exactly two teams (home_team_id and away_team_id). Each match can have multiple match_events (goals, cards, substitutions). Then insert sample data for 3 teams and 2 matches. Write a query that returns, for each match, the home team name, away team name, total goals scored by the home team, total goals scored by the away team, and a comma-separated list of all event types in chronological order. Use LEFT JOIN so that matches with no events still appear. Assume match_events has columns: event_id, match_id, minute, event_type, scoring_team_id (NULL unless the event is a goal), and scoring_team_id references teams(team_id).
💡 Hints
- • Model goals by summing events where event_type = 'GOAL' and scoring_team_id matches the relevant team.
- • To keep matches with no events, LEFT JOIN matches to match_events and use conditional aggregation with CASE expressions.
- • For the ordered comma-separated list, use a string aggregation function with ORDER BY (for example, GROUP_CONCAT in SQLite).
Solution Code:
/* Sports league relational model: conditional aggregation + ordered event list */
-- Data definition
CREATE TABLE teams (
team_id INTEGER PRIMARY KEY,
team_name TEXT NOT NULL
);
CREATE TABLE matches (
match_id INTEGER PRIMARY KEY,
home_team_id INTEGER NOT NULL,
away_team_id INTEGER NOT NULL,
match_date TEXT NOT NULL,
FOREIGN KEY (home_team_id) REFERENCES teams(team_id),
FOREIGN KEY (away_team_id) REFERENCES teams(team_id)
);
CREATE TABLE match_events (
event_id INTEGER PRIMARY KEY,
match_id INTEGER NOT NULL,
minute INTEGER NOT NULL,
event_type TEXT NOT NULL,
scoring_team_id INTEGER,
FOREIGN KEY (match_id) REFERENCES matches(match_id),
FOREIGN KEY (scoring_team_id) REFERENCES teams(team_id)
);
-- Update (insert sample data)
INSERT INTO teams(team_id, team_name) VALUES
(1, 'North FC'),
(2, 'East United'),
(3, 'South City');
INSERT INTO matches(match_id, home_team_id, away_team_id, match_date) VALUES
(1000, 1, 2, '2026-03-01'),
(1001, 2, 3, '2026-03-08');
-- Match 1000 has events; match 1001 has no events (to test LEFT JOIN)
INSERT INTO match_events(event_id, match_id, minute, event_type, scoring_team_id) VALUES
(1, 1000, 12, 'GOAL', 1),
(2, 1000, 25, 'YELLOW_CARD', NULL),
(3, 1000, 40, 'GOAL', 2),
(4, 1000, 78, 'GOAL', 1),
(5, 1000, 88, 'RED_CARD', NULL);
-- Retrieval: one row per match with conditional aggregation and ordered event list
SELECT
m.match_id,
ht.team_name AS home_team,
at.team_name AS away_team,
SUM(CASE WHEN me.event_type = 'GOAL' AND me.scoring_team_id = ht.team_id THEN 1 ELSE 0 END) AS home_goals,
SUM(CASE WHEN me.event_type = 'GOAL' AND me.scoring_team_id = at.team_id THEN 1 ELSE 0 END) AS away_goals,
GROUP_CONCAT(me.event_type, ',') AS event_types_in_order
FROM matches m
JOIN teams ht ON ht.team_id = m.home_team_id
JOIN teams at ON at.team_id = m.away_team_id
LEFT JOIN match_events me ON me.match_id = m.match_id
GROUP BY m.match_id, ht.team_id, ht.team_name, at.team_id, at.team_name
ORDER BY m.match_id;
Expected Output:
match_id | home_team | away_team | home_goals | away_goals | event_types_in_order
1000 | North FC | East United | 2 | 1 | GOAL,YELLOW_CARD,GOAL,GOAL,RED_CARD
1001 | East United | South City | 0 | 0 | NULL
The query produces one row per match by grouping on match and team identities. LEFT JOIN ensures matches without events still appear; in that case, the joined event columns are NULL. Conditional aggregation counts goals for the home and away teams by checking both event_type and scoring_team_id. The event list is built with GROUP_CONCAT over match_events, using SQLite-style two-argument syntax. Note that SQLite's GROUP_CONCAT does not guarantee element order; if your DBMS supports ORDER BY inside the aggregate (for example, ordering by me.minute), add it to guarantee chronological ordering.
SQL query optimization mindset: rewritable declarative queries with consistent results
SQL
/* Demonstrates declarative retrieval and how equivalent logical queries can be rewritten. */
CREATE TABLE users (
user_id INTEGER PRIMARY KEY,
full_name TEXT NOT NULL
);
CREATE TABLE phones (
phone_id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL,
phone_number TEXT NOT NULL,
FOREIGN KEY (user_id) REFERENCES users(user_id)
);
INSERT INTO users(user_id, full_name) VALUES
(1, 'Avery Chen'),
(2, 'Sam Rivera'),
(3, 'Jordan Patel');
INSERT INTO phones(phone_id, user_id, phone_number) VALUES
(100, 1, '+1-555-0101'),
(101, 2, '+1-555-0202');
-- Retrieval: declarative query for users who have at least one phone
-- (Query optimization can choose different physical plans.)
SELECT u.user_id, u.full_name
FROM users u
WHERE EXISTS (
SELECT 1
FROM phones p
WHERE p.user_id = u.user_id
);
-- Equivalent logical query using JOIN + DISTINCT
-- Both express the same intent: users with matching phone rows.
SELECT DISTINCT u.user_id, u.full_name
FROM users u
JOIN phones p ON p.user_id = u.user_id;
-- Another rewrite: filter by phone_number pattern (still declarative)
-- This shows content-based search rather than pointer navigation.
SELECT u.user_id, u.full_name, p.phone_number
FROM users u
JOIN phones p ON p.user_id = u.user_id
WHERE p.phone_number LIKE '+1-555-0%';
-- Update: add a phone for Jordan, then retrieval changes automatically
INSERT INTO phones(phone_id, user_id, phone_number)
VALUES (102, 3, '+1-555-0303');
-- Retrieval again after update
SELECT u.user_id, u.full_name
FROM users u
WHERE EXISTS (
SELECT 1 FROM phones p WHERE p.user_id = u.user_id
)
ORDER BY u.user_id;
Explanation
This example focuses on the document’s relational idea: applications ask for data by content using declarative SQL, and the DBMS can optimize execution. Two retrieval queries are logically equivalent: one uses EXISTS, the other uses JOIN with DISTINCT. A DBMS can rewrite them internally to produce the same result efficiently. The final INSERT demonstrates the update/retrieval loop: after adding a phone row, the retrieval query automatically includes the new user without any manual navigation logic.
Use Case
A billing or contact verification workflow needs to list only users who have reachable phone numbers. Using declarative SQL keeps the application simple while the DBMS handles efficient access paths.
Output
Initial retrieval output:
1 | Avery Chen
2 | Sam Rivera
After update:
1 | Avery Chen
2 | Sam Rivera
3 | Jordan Patel
💻 Code Practice Problems
Problem 1 (medium)
Create a small relational schema for customers and orders. Then write two logically equivalent declarative queries that return the same set of customers: customers who have at least one order. Use (1) EXISTS and (2) JOIN with DISTINCT. Finally, insert a new order for a customer who previously had no orders, and re-run both queries to show that the result updates automatically without changing any query logic.
💡 Hints
- • Model the relationship with a foreign key from orders to customers, then express "customers with at least one order" as an existence condition.
- • For the EXISTS version, correlate the subquery using the outer customer key (e.g., o.customer_id = c.customer_id).
- • For the JOIN version, use DISTINCT to avoid duplicate customers when multiple orders exist for the same customer.
Solution Code:
/* Problem 1: EXISTS vs JOIN+DISTINCT for customers with at least one order */
DROP TABLE IF EXISTS orders;
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
customer_id INTEGER PRIMARY KEY,
full_name TEXT NOT NULL
);
CREATE TABLE orders (
order_id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
order_total_cents INTEGER NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
INSERT INTO customers(customer_id, full_name) VALUES
(1, 'Avery Chen'),
(2, 'Sam Rivera'),
(3, 'Jordan Patel'),
(4, 'Taylor Brooks');
INSERT INTO orders(order_id, customer_id, order_total_cents) VALUES
(10, 1, 2500),
(11, 1, 9900),
(12, 2, 1500);
-- Query A (EXISTS): customers who have at least one order
SELECT c.customer_id, c.full_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
)
ORDER BY c.customer_id;
-- Query B (JOIN + DISTINCT): logically equivalent set of customers
SELECT DISTINCT c.customer_id, c.full_name
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
ORDER BY c.customer_id;
-- Update: add a new order for a customer who previously had no orders
INSERT INTO orders(order_id, customer_id, order_total_cents) VALUES (13, 4, 4200);
-- Re-run Query A
SELECT c.customer_id, c.full_name
FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
)
ORDER BY c.customer_id;
-- Re-run Query B
SELECT DISTINCT c.customer_id, c.full_name
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
ORDER BY c.customer_id;
Expected Output:
First pair of queries (before the insert) should both return:
customer_id | full_name
1 | Avery Chen
2 | Sam Rivera
Second pair of queries (after inserting order_id 13 for customer_id 4) should both return:
customer_id | full_name
1 | Avery Chen
2 | Sam Rivera
4 | Taylor Brooks
The schema uses a foreign key so each order points to a customer. "Customers with at least one order" can be expressed declaratively in two equivalent ways. The EXISTS query checks whether at least one matching order row exists for each customer. The JOIN query matches customers to their orders, but because customers can have multiple orders, DISTINCT is required to collapse duplicates. After inserting a new order for customer_id = 4, both queries automatically include that customer because the underlying data now satisfies the same declarative condition.
Problem 2 (hard)
Create a schema for authors and books, where each book has a publication year and each author can have multiple books. Write two declarative queries that return the same set of authors under a more complex condition: Condition: Return authors who have at least one book published in 2010 or later AND have at least one book published before 2010. You must implement: 1) An EXISTS-based query using two correlated EXISTS subqueries. 2) A JOIN-based query that achieves the same logic using joins and DISTINCT (you may use two joins to the books table with different year filters). After creating and populating the data, run both queries and confirm they return identical author sets.
💡 Hints
- • Think of the condition as the intersection of two independent existence requirements: (has any book >= 2010) AND (has any book < 2010).
- • For the EXISTS solution, use two separate correlated EXISTS clauses, each with its own year predicate.
- • For the JOIN solution, you can join books twice with different aliases and year filters, then DISTINCT authors to remove duplicates.
Solution Code:
/* Problem 2: Authors with books both before 2010 and in 2010 or later */
DROP TABLE IF EXISTS books;
DROP TABLE IF EXISTS authors;
CREATE TABLE authors (
author_id INTEGER PRIMARY KEY,
full_name TEXT NOT NULL
);
CREATE TABLE books (
book_id INTEGER PRIMARY KEY,
author_id INTEGER NOT NULL,
title TEXT NOT NULL,
publication_year INTEGER NOT NULL,
FOREIGN KEY (author_id) REFERENCES authors(author_id)
);
INSERT INTO authors(author_id, full_name) VALUES
(1, 'Avery Chen'),
(2, 'Sam Rivera'),
(3, 'Jordan Patel'),
(4, 'Taylor Brooks'),
(5, 'Morgan Lee');
-- Data setup:
-- author 1: has both <2010 and >=2010
-- author 2: only >=2010
-- author 3: only <2010
-- author 4: has both
-- author 5: no books
INSERT INTO books(book_id, author_id, title, publication_year) VALUES
(100, 1, 'Old Tales', 2005),
(101, 1, 'New Horizons', 2012),
(102, 2, 'Future Plans', 2018),
(103, 2, 'Modern Systems', 2020),
(104, 3, 'Ancient Myths', 1999),
(105, 4, 'Pre-2010 Volume', 2001),
(106, 4, 'Post-2010 Volume', 2011);
-- Query A (EXISTS): intersection of two existence conditions
SELECT a.author_id, a.full_name
FROM authors a
WHERE EXISTS (
SELECT 1
FROM books b
WHERE b.author_id = a.author_id
AND b.publication_year >= 2010
)
AND EXISTS (
SELECT 1
FROM books b
WHERE b.author_id = a.author_id
AND b.publication_year < 2010
)
ORDER BY a.author_id;
-- Query B (JOIN + DISTINCT): two joins with different year filters
SELECT DISTINCT a.author_id, a.full_name
FROM authors a
JOIN books b_ge ON b_ge.author_id = a.author_id
AND b_ge.publication_year >= 2010
JOIN books b_lt ON b_lt.author_id = a.author_id
AND b_lt.publication_year < 2010
ORDER BY a.author_id;
Expected Output:
Both queries must return the same two authors:
author_id | full_name
1 | Avery Chen
4 | Taylor Brooks
The EXISTS query expresses the logic directly: an author qualifies only if there exists at least one book with publication_year >= 2010 and at least one book with publication_year < 2010. The JOIN query recreates the same intersection by joining the authors table to two separate filtered views of books: one join for books in 2010 or later (b_ge) and another join for books before 2010 (b_lt). If both joins succeed for an author, then the author has both types of books. DISTINCT is necessary because multiple qualifying book pairs can create duplicate author rows.
NoSQL key-value style with JSON documents: fast retrieval by key (contrast with relational joins)
JavaScript
/* Simulates a NoSQL key-value/document store pattern in JavaScript. */
// In a real NoSQL DB, each document is stored by a key (e.g., user_id).
// Here we use an in-memory Map to model the pattern.
const kvStore = new Map();
// Data definition in NoSQL is often schema-flexible (not enforced here).
function upsertUserDocument(userId, doc) {
// Update: insert or replace the document by key
kvStore.set(String(userId), doc);
}
// Retrieval by key: content is fetched via the key, not via joins.
function getUserContactByKey(userId) {
const doc = kvStore.get(String(userId));
if (!doc) return null;
// Document-oriented shape: address and phones embedded
return {
user_id: userId,
full_name: doc.full_name,
address: doc.address ?? null,
phones: doc.phones ?? []
};
}
// Seed documents (NoSQL style: embed related data)
upsertUserDocument(1, {
full_name: 'Avery Chen',
address: { street: '12 Market St', city: 'Springfield' },
phones: ['+1-555-0101']
});
upsertUserDocument(2, {
full_name: 'Sam Rivera',
phones: ['+1-555-0202']
});
// Retrieval: fast key-based access
console.log('User 1:', getUserContactByKey(1));
console.log('User 2:', getUserContactByKey(2));
console.log('User 3 (missing):', getUserContactByKey(3));
// Update: add embedded address without touching separate tables
upsertUserDocument(2, {
full_name: 'Sam Rivera',
address: { street: '9 Elm St', city: 'Shelbyville' },
phones: ['+1-555-0202']
});
console.log('User 2 after address update:', getUserContactByKey(2));
Explanation
This example models the NoSQL document/key-value approach mentioned in the document: data is retrieved quickly by key, and related information is often embedded inside a single document. The Map acts like a key-value store where user_id is the key. The upsertUserDocument function demonstrates update by replacing the document. The getUserContactByKey function demonstrates retrieval without joins, because address and phones are already inside the document. This contrasts with the relational approach where addresses and phones live in separate tables and are combined via SQL joins.
Use Case
A profile service in a high-throughput system may store each user as a single document to serve “get profile by user_id” requests with minimal latency.
Output
User 1: { user_id: 1, full_name: 'Avery Chen', address: { street: '12 Market St', city: 'Springfield' }, phones: [ '+1-555-0101' ] }
User 2: { user_id: 2, full_name: 'Sam Rivera', address: null, phones: [ '+1-555-0202' ] }
User 3 (missing): null
User 2 after address update: { user_id: 2, full_name: 'Sam Rivera', address: { street: '9 Elm St', city: 'Shelbyville' }, phones: [ '+1-555-0202' ] }
💻 Code Practice Problems
Problem 1 (medium)
Simulate a NoSQL key-value/document store using a JavaScript Map. Implement two functions: (1) upsertOrderDocument(orderId, doc) that inserts or replaces an order document by key, and (2) getOrderSummaryByKey(orderId) that retrieves the document by key and returns a summary object. The summary must embed related fields inside one returned object (no joins). If the order is missing, return null. If optional fields are missing, apply defaults: address should be null and items should be an empty array. Seed at least three orders, update one order by upserting a new version, and print summaries for two existing orders and one missing order.
💡 Hints
- • Use Map.set(String(orderId), doc) for upsert and Map.get(String(orderId)) for retrieval.
- • When building the summary, use nullish coalescing (??) to apply defaults for missing fields.
- • Be careful to convert orderId to a string consistently for both storing and retrieving.
Solution Code:
/* Simulates a NoSQL key-value/document store pattern in JavaScript. */
const kvStore = new Map();
function upsertOrderDocument(orderId, doc) {
kvStore.set(String(orderId), doc);
}
function getOrderSummaryByKey(orderId) {
const doc = kvStore.get(String(orderId));
if (!doc) return null;
return {
order_id: orderId,
customer_name: doc.customer_name,
address: doc.address ?? null,
items: doc.items ?? [],
total_cents: doc.total_cents
};
}
// Seed documents (NoSQL style: embed related data inside one document)
upsertOrderDocument(101, {
customer_name: 'Jordan Lee',
address: { street: '1 Pine Rd', city: 'Austin' },
items: [
{ sku: 'A1', qty: 2 },
{ sku: 'B2', qty: 1 }
],
total_cents: 3499
});
upsertOrderDocument(202, {
customer_name: 'Taylor Brooks',
items: [{ sku: 'C3', qty: 3 }],
total_cents: 1599
});
upsertOrderDocument(303, {
customer_name: 'Morgan Patel',
address: { street: '77 Lake St', city: 'Denver' },
total_cents: 799
});
console.log('Order 101 summary:', getOrderSummaryByKey(101));
console.log('Order 202 summary:', getOrderSummaryByKey(202));
console.log('Order 999 summary (missing):', getOrderSummaryByKey(999));
// Update one order by upserting a new document version
upsertOrderDocument(202, {
customer_name: 'Taylor Brooks',
address: { street: '9 Cedar Ave', city: 'Portland' },
items: [
{ sku: 'C3', qty: 3 },
{ sku: 'D4', qty: 2 }
],
total_cents: 2599
});
console.log('Order 202 summary after update:', getOrderSummaryByKey(202));
Expected Output:
Order 101 summary: { order_id: 101, customer_name: 'Jordan Lee', address: { street: '1 Pine Rd', city: 'Austin' }, items: [ { sku: 'A1', qty: 2 }, { sku: 'B2', qty: 1 } ], total_cents: 3499 }
Order 202 summary: { order_id: 202, customer_name: 'Taylor Brooks', address: null, items: [ { sku: 'C3', qty: 3 } ], total_cents: 1599 }
Order 999 summary (missing): null
Order 202 summary after update: { order_id: 202, customer_name: 'Taylor Brooks', address: { street: '9 Cedar Ave', city: 'Portland' }, items: [ { sku: 'C3', qty: 3 }, { sku: 'D4', qty: 2 } ], total_cents: 2599 }
The Map kvStore models a key-value/document store where each order document is stored under a key derived from orderId. upsertOrderDocument uses set to insert or replace the entire document, matching NoSQL upsert behavior. getOrderSummaryByKey retrieves the document by key only (no joins) and constructs a summary object that embeds address and items directly from the stored document. Defaults are applied with doc.address ?? null and doc.items ?? [] so missing fields do not break the output. If the key does not exist, the function returns null.
Problem 2 (hard)
Extend the NoSQL key-value/document store simulation with versioning and validation. Implement: (1) upsertUserProfileDocument(userId, doc) that stores a user profile document by key, but also maintains a monotonically increasing version number per user. Each time you upsert, the stored document must include version: previousVersion + 1. (2) getUserProfileByKey(userId) that returns the stored document with embedded fields (no joins). (3) updatePhoneListByKey(userId, newPhones) that updates only the phones field inside the stored document, preserving other fields and incrementing version. Validation rules: newPhones must be an array of non-empty strings; otherwise throw an Error. Also, if the user does not exist when calling updatePhoneListByKey, throw an Error. Seed two users, perform an upsert that replaces a document, then call updatePhoneListByKey, and finally print the stored profiles for both users and the versions after each change.
💡 Hints
- Store both the document and its version together, for example as { doc, version } in the Map.
- For updatePhoneListByKey, retrieve the existing record, validate newPhones, then replace only the phones field while keeping other fields.
- Use Array.isArray(newPhones) and check each phone is a non-empty string; throw errors with clear messages.
Solution Code:
/* Simulates a NoSQL key-value/document store with per-key versioning in JavaScript. */
const kvStore = new Map();
function upsertUserProfileDocument(userId, doc) {
const key = String(userId);
const existing = kvStore.get(key);
const nextVersion = existing ? existing.version + 1 : 1;
// Replace the entire document (NoSQL upsert), but attach version.
kvStore.set(key, {
version: nextVersion,
doc: {
user_id: userId,
full_name: doc.full_name,
address: doc.address ?? null,
phones: doc.phones ?? []
}
});
}
function getUserProfileByKey(userId) {
const key = String(userId);
const record = kvStore.get(key);
if (!record) return null;
// Return the stored document with embedded fields and version.
return {
...record.doc,
version: record.version
};
}
function updatePhoneListByKey(userId, newPhones) {
const key = String(userId);
const record = kvStore.get(key);
if (!record) {
throw new Error(`User ${userId} not found`);
}
if (!Array.isArray(newPhones)) {
throw new Error('newPhones must be an array');
}
for (const p of newPhones) {
if (typeof p !== 'string' || p.trim().length === 0) {
throw new Error('Each phone must be a non-empty string');
}
}
const nextVersion = record.version + 1;
// Update only phones, preserve other embedded fields.
const updatedDoc = {
...record.doc,
phones: newPhones
};
kvStore.set(key, {
version: nextVersion,
doc: updatedDoc
});
}
// Seed two users
upsertUserProfileDocument(1, {
full_name: 'Avery Chen',
address: { street: '12 Market St', city: 'Springfield' },
phones: ['+1-555-0101']
});
upsertUserProfileDocument(2, {
full_name: 'Sam Rivera',
phones: ['+1-555-0202']
});
console.log('Initial user 1:', getUserProfileByKey(1));
console.log('Initial user 2:', getUserProfileByKey(2));
// Replace user 2 document via upsert (version must increment)
upsertUserProfileDocument(2, {
full_name: 'Sam Rivera',
address: { street: '9 Elm St', city: 'Shelbyville' },
phones: ['+1-555-0202']
});
console.log('User 2 after upsert replace:', getUserProfileByKey(2));
// Update only phones for user 1
updatePhoneListByKey(1, ['+1-555-0111', '+1-555-0122']);
console.log('User 1 after phone update:', getUserProfileByKey(1));
// Demonstrate validation (commented out to keep output clean)
// updatePhoneListByKey(1, [' ', '+1-555-0999']); // should throw
Expected Output:
Initial user 1: { user_id: 1, full_name: 'Avery Chen', address: { street: '12 Market St', city: 'Springfield' }, phones: [ '+1-555-0101' ], version: 1 }
Initial user 2: { user_id: 2, full_name: 'Sam Rivera', address: null, phones: [ '+1-555-0202' ], version: 1 }
User 2 after upsert replace: { user_id: 2, full_name: 'Sam Rivera', address: { street: '9 Elm St', city: 'Shelbyville' }, phones: [ '+1-555-0202' ], version: 2 }
User 1 after phone update: { user_id: 1, full_name: 'Avery Chen', address: { street: '12 Market St', city: 'Springfield' }, phones: [ '+1-555-0111', '+1-555-0122' ], version: 2 }

The Map stores per-user records as { version, doc }. upsertUserProfileDocument computes the nextVersion by checking whether a record already exists for the key; it then replaces the entire embedded document and attaches the incremented version. getUserProfileByKey returns the embedded document plus version, with no joins. updatePhoneListByKey first ensures the user exists, then validates newPhones as an array of non-empty strings. It increments the version and updates only the phones field while preserving address and full_name from the existing embedded document.
DBMS administration: concurrency control with transactions (lost update prevention)
SQL Code:
/* Demonstrates administration concerns: concurrency control and integrity via transactions. */
CREATE TABLE accounts (
account_id INTEGER PRIMARY KEY,
balance INTEGER NOT NULL
);
INSERT INTO accounts(account_id, balance) VALUES
(1, 1000),
(2, 500);
-- Transfer function-like pattern using a transaction.
-- In a real DB, this would be inside a stored procedure.
-- Key idea: lock rows so concurrent updates do not corrupt state.
BEGIN TRANSACTION;
-- Administration: enforce integrity by checking and updating atomically.
-- The exact locking syntax varies by DBMS; this pattern is widely supported.
-- Use SELECT ... FOR UPDATE to serialize concurrent transfers.
-- Example: move 200 from account 1 to account 2
-- Step 1: lock source row
SELECT balance
FROM accounts
WHERE account_id = 1
FOR UPDATE;
-- Step 2: update source
UPDATE accounts
SET balance = balance - 200
WHERE account_id = 1;
-- Step 3: update destination
UPDATE accounts
SET balance = balance + 200
WHERE account_id = 2;
-- Step 4: optional integrity check (prevent negative balances)
-- If your DB supports constraints, prefer them; otherwise validate here.
-- If invalid, rollback.
-- Example validation query
SELECT balance
FROM accounts
WHERE account_id = 1;
-- Commit makes the changes durable after successful integrity checks
COMMIT;
-- If a concurrent transfer happens, row locks prevent lost updates.
-- In case of error, you would use ROLLBACK instead of COMMIT.
-- Retrieval to verify final state
SELECT * FROM accounts ORDER BY account_id;
Explanation
This example targets the document’s “Administration” functional group: concurrency control and data integrity. It uses a transaction to perform a multi-step transfer atomically. The SELECT ... FOR UPDATE locks the relevant rows so concurrent transactions cannot interleave updates and cause lost updates. The UPDATE statements then adjust balances consistently. COMMIT finalizes the transaction, while a failure path would use ROLLBACK. Finally, a retrieval SELECT verifies the resulting state. This demonstrates how DBMS facilities protect correctness beyond simple CRUD operations.
Use Case
A banking or wallet service must prevent race conditions when multiple transfers occur simultaneously, ensuring balances remain accurate under concurrent access.
Output
After the transfer, expected balances:
account_id=1 -> 800
account_id=2 -> 700
💻 Code Practice Problems
Problem 1: Create a transactional money transfer that prevents lost updates using row-level locking (medium)
Create a transactional money transfer that prevents lost updates using row-level locking. Use a table named wallets with columns (wallet_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL). Insert at least two wallets. Implement a transfer of 150 units from wallet 1 to wallet 2 inside a single transaction. Use SELECT ... FOR UPDATE to lock the source and destination rows before updating. Add an integrity check that prevents the source balance from becoming negative; if it would become negative, rollback. Finally, select all rows ordered by wallet_id to show the final balances.
💡 Show Hints (3)
- Use one transaction that includes: lock rows, validate, update source, update destination, then commit.
- Most SQL databases support SELECT ... FOR UPDATE to lock rows until COMMIT or ROLLBACK.
- Perform the negative-balance check after locking but before applying updates; rollback on failure.
Solution Code:
/* Medium: transactional transfer with lost-update prevention and negative-balance rollback. */
CREATE TABLE wallets (
  wallet_id INTEGER PRIMARY KEY,
  balance INTEGER NOT NULL
);
INSERT INTO wallets(wallet_id, balance) VALUES
  (1, 1000),
  (2, 300);
-- Transfer 150 from wallet 1 to wallet 2 with concurrency-safe locking
BEGIN TRANSACTION;
-- Lock source row
SELECT balance FROM wallets WHERE wallet_id = 1 FOR UPDATE;
-- Lock destination row
SELECT balance FROM wallets WHERE wallet_id = 2 FOR UPDATE;
-- Integrity check: ensure the source has enough funds.
-- NOTE: Many DBs do not allow conditional ROLLBACK in a plain SQL script
-- without procedural code; in a stored procedure you would branch on the check.
-- This script assumes manual execution flow: run the check, then decide.
-- Validation query (rows are still locked)
SELECT balance AS source_balance FROM wallets WHERE wallet_id = 1;
-- If source_balance >= 150, apply the updates; otherwise issue ROLLBACK.
UPDATE wallets SET balance = balance - 150 WHERE wallet_id = 1;
UPDATE wallets SET balance = balance + 150 WHERE wallet_id = 2;
COMMIT;
-- Verify final state
SELECT * FROM wallets ORDER BY wallet_id;
Expected Output:
The final SELECT should return two rows ordered by wallet_id:

wallet_id | balance
        1 |     850
        2 |     450

(Assuming the integrity check passes because wallet 1 starts at 1000.)
The script wraps the transfer in a single transaction. It first locks the source and destination rows using SELECT ... FOR UPDATE, preventing concurrent transactions from interleaving updates to those same rows (which would otherwise risk lost updates). After locking, it performs an integrity check by selecting the source balance. If the source has enough funds, it updates both wallets consistently: subtracts from the source and adds to the destination. COMMIT makes the changes durable. The final SELECT shows the resulting balances.
Problem 2: Implement a concurrent-safe batch transfer with strict all-or-nothing semantics (hard)
Implement a concurrent-safe batch transfer with strict all-or-nothing semantics. Use a table named ledger with columns (account_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL). Insert at least three accounts with different balances. You must transfer amounts from multiple source accounts to multiple destination accounts in one transaction, but only commit if every source account has sufficient funds for its outgoing transfers. If any source would go negative, rollback the entire transaction. Use SELECT ... FOR UPDATE to lock every account that will be read or updated (all sources and all destinations) before performing any balance changes. Then compute and apply updates. Finally, select all rows ordered by account_id to show the final balances.
💡 Show Hints (3)
- Lock all involved rows up front: every source and every destination that will be updated in this batch.
- Validate all sources before applying any UPDATE statements; this ensures atomic all-or-nothing behavior.
- Use a single transaction with COMMIT only after validation succeeds; otherwise use ROLLBACK.
Solution Code:
/* Hard: batch transfer with all-or-nothing validation and row-level locking. */
CREATE TABLE ledger (
  account_id INTEGER PRIMARY KEY,
  balance INTEGER NOT NULL
);
INSERT INTO ledger(account_id, balance) VALUES
  (1, 500),
  (2, 120),
  (3, 300);
-- Batch plan (outgoing transfers):
--   1 -> 2 : 200
--   1 -> 3 : 150
--   2 -> 3 : 50
-- Validation requirement: every source must have enough funds for its total outgoing.
BEGIN TRANSACTION;
-- Lock all accounts that will be involved (sources: 1, 2; destinations: 2, 3)
SELECT balance FROM ledger WHERE account_id IN (1, 2, 3) FOR UPDATE;
-- Compute and validate total outgoing per source:
--   Source 1 total outgoing = 200 + 150 = 350
--   Source 2 total outgoing = 50
-- Validation query for source 1
SELECT balance AS source1_balance FROM ledger WHERE account_id = 1;
-- Validation query for source 2
SELECT balance AS source2_balance FROM ledger WHERE account_id = 2;
-- If both validations pass (source1_balance >= 350 AND source2_balance >= 50),
-- apply the updates below; otherwise issue ROLLBACK.
UPDATE ledger SET balance = balance - 200 WHERE account_id = 1;
UPDATE ledger SET balance = balance + 200 WHERE account_id = 2;
UPDATE ledger SET balance = balance - 150 WHERE account_id = 1;
UPDATE ledger SET balance = balance + 150 WHERE account_id = 3;
UPDATE ledger SET balance = balance - 50 WHERE account_id = 2;
UPDATE ledger SET balance = balance + 50 WHERE account_id = 3;
COMMIT;
-- Verify final state
SELECT * FROM ledger ORDER BY account_id;
Expected Output:
The final SELECT should return three rows ordered by account_id:

account_id | balance
         1 |     150
         2 |     270
         3 |     500

Arithmetic:
- Account 1: 500 - 200 - 150 = 150
- Account 2: 120 + 200 - 50 = 270
- Account 3: 300 + 150 + 50 = 500
This solution performs a batch of multiple transfers inside one transaction. It locks every involved row (accounts 1, 2, and 3) using SELECT ... FOR UPDATE before any updates, preventing concurrent transactions from modifying these balances mid-batch. It then validates that each source account can cover its total outgoing amount: account 1 must cover 200 + 150 = 350, and account 2 must cover 50. Only after validation succeeds should the script commit; otherwise it should rollback (the script is written so that COMMIT is the final step after updates). Because all updates occur within one transaction, the batch is atomic: either all balance changes are committed together or none are.
Interactive Lesson
Interactive Lesson: Databases and DBMS Concepts in Dependency Order
⏱️ 30 min
Learning Objectives
- Distinguish database, DBMS, and database system terminology without mixing software with data
- Explain DBMS functional groups (data definition, update, retrieval, administration) and how they interact in a typical workflow
- Classify database models (relational vs NoSQL) and describe how query style differs (declarative vs navigational)
- Describe relational foundations: tables, primary keys as logical identifiers, joins, and normalization as a design principle
- Connect database technology eras and models (navigational, SQL/relational, post-relational) to the motivations behind modern designs
1. Database vs DBMS vs Database System Terminology
A database is organized data accessed through a DBMS. A database system includes the database, the DBMS, and associated applications. The term "database" is sometimes used loosely, so you must rely on the intended meaning in context.
Examples:
- At home, index cards were used to record and store recipes, shopping lists, contact information, and other organizational data.
- In business, index cards were used for presentation notes, project research and notes, and contact information.
✓ Check Your Understanding:
Which option correctly describes a database system?
Answer: Database + DBMS + associated applications
If someone says "database" but means the software that provides access, what is the best interpretation?
Answer: They are using the term "database" loosely to mean the DBMS
2. DBMS Functional Groups: Data Definition, Update, Retrieval, Administration
DBMS capabilities can be grouped into four functions. Data definition creates/modifies/removes how data is organized. Update changes the data itself. Retrieval selects data by criteria and returns it, possibly transformed. Administration manages security, integrity, concurrency control, and recovery.
Examples:
- DBMS functions are classified into four main groups: data definition, update, retrieval, and administration.
- Administration includes enforcing data security and recovering corrupted information after system failure.
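The four groups can be made concrete with a small in-memory sketch, in the same style as the Map-based simulations used elsewhere in this lesson. All names here (defineTable, insertRow, select, grantRead) are invented for illustration, not a real DBMS API:

```javascript
// Hypothetical in-memory sketch mapping operations to the four DBMS functional groups.
const schema = new Map();   // table name -> column list (data definition)
const data = new Map();     // table name -> array of rows (update/retrieval)
const grants = new Map();   // user -> Set of readable table names (administration)

// Data definition: create/modify how data is organized.
function defineTable(name, columns) {
  schema.set(name, columns);
  data.set(name, []);
}

// Update: change the data itself.
function insertRow(name, row) {
  data.get(name).push(row);
}

// Retrieval: select data by criteria and return it.
function select(name, predicate) {
  return data.get(name).filter(predicate);
}

// Administration: manage security (here, a toy read-permission grant).
function grantRead(user, table) {
  if (!grants.has(user)) grants.set(user, new Set());
  grants.get(user).add(table);
}

defineTable('customers', ['id', 'name']);            // data definition
insertRow('customers', { id: 1, name: 'Ada' });      // update
insertRow('customers', { id: 2, name: 'Grace' });    // update
const result = select('customers', r => r.id === 2); // retrieval
grantRead('analyst', 'customers');                   // administration
console.log(result); // prints [ { id: 2, name: 'Grace' } ]
```

A real DBMS also covers concurrency control and recovery under administration; this sketch only shows where each kind of operation belongs in the four-way classification.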
✓ Check Your Understanding:
A user runs a query to find customers matching criteria and the system returns results. Which functional group is this?
Answer: Retrieval
A DBA sets access permissions and ensures recovery after a failure. Which functional group is this?
Answer: Administration
Changing the schema (for example, adding a new attribute to a table definition) is primarily which functional group?
Answer: Data definition
3. Database Models and Classification (Relational, NoSQL, Others)
A database and its DBMS conform to a particular database model. Relational systems organize data into tables and use SQL. NoSQL systems use different data models and query languages. This classification matters because it determines how data is structured and how queries are expressed.
Examples:
- Relational model uses tables and SQL.
- NoSQL models use different query languages and structures.
- NoSQL next generation in the late 2000s introduced key–value and document-oriented databases.
✓ Check Your Understanding:
Which statement best captures "database model conformance"?
Answer: A database and its DBMS follow principles of a particular database model
Which pairing is correct according to the knowledge base?
Answer: Relational model -> tables and SQL
4. Relational Databases and SQL (Rows/Columns, Tables, Joins)
In the relational model, data is organized into tables. Rows represent records, columns represent attributes, and relationships across tables are expressed using keys and joins. This connects back to earlier ideas: retrieval is the DBMS function that executes declarative requests, and primary keys enable cross-table references without relying on physical storage addresses.
Examples:
- In the relational approach, user contact data could be normalized into a user table, an address table, and a phone number table, with optional records created only when data is provided.
- Query optimization relies on declarative mathematical logic.
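As a rough illustration of key-based joins, the following JavaScript sketch (table contents and names are invented for the example) matches rows across two tables on a key relationship, the way an inner join conceptually works:

```javascript
// Hypothetical normalized tables: rows are records, columns are attributes.
const users = [
  { user_id: 1, full_name: 'Avery Chen' },
  { user_id: 2, full_name: 'Sam Rivera' }
];
const phones = [
  { phone_id: 10, user_id: 1, number: '+1-555-0101' },
  { phone_id: 11, user_id: 1, number: '+1-555-0111' },
  { phone_id: 12, user_id: 2, number: '+1-555-0202' }
];

// Inner join on the key relationship users.user_id = phones.user_id, analogous to:
// SELECT u.full_name, p.number FROM users u JOIN phones p ON u.user_id = p.user_id
function joinUsersPhones() {
  const rows = [];
  for (const u of users) {
    for (const p of phones) {
      if (p.user_id === u.user_id) {
        rows.push({ full_name: u.full_name, number: p.number });
      }
    }
  }
  return rows;
}

console.log(joinUsersPhones()); // 3 rows: two for Avery Chen, one for Sam Rivera
```

The application only states the matching condition; in a real relational DBMS the optimizer would choose the access path (index lookup, hash join, and so on), not the nested loop written here.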
✓ Check Your Understanding:
In relational systems, joins are used to connect data across tables based on what?
Answer: Key relationships (such as primary keys)
Which statement best matches the role of retrieval in this section?
Answer: Retrieval is about selecting data by criteria and returning it
5. Relational Model Foundations (Codd, Primary Keys, Declarative Queries, Normalization)
Codd’s relational model emphasizes declarative querying: applications ask for what data they need, not how to follow links. Primary keys are logical identifiers used for relationships, not physical disk addresses. Normalization aims to store each fact once across relations, replacing complex internal repeating structures with multiple tables connected by logical keys.
Examples:
- Codd’s relational model uses tables for entities and joins based on primary key relationships.
- Normalization replaced repeating groups with multiple tables connected by logical keys.
- Use of primary keys (logical identifiers) instead of disk addresses for cross-table relationships enables tables to be relocated and resized without expensive reorganization.
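The normalization idea above can be sketched in JavaScript. The nested-phones record and the two-table redesign below are hypothetical data invented for illustration:

```javascript
// Unnormalized: a repeating group (phones) nested inside one record.
const unnormalized = {
  user_id: 1,
  full_name: 'Avery Chen',
  phones: ['+1-555-0101', '+1-555-0111']
};

// Normalized: the repeating group becomes its own relation,
// linked back to the user by the logical key user_id (not a disk address).
const userTable = [{ user_id: 1, full_name: 'Avery Chen' }];
const phoneTable = [
  { user_id: 1, number: '+1-555-0101' },
  { user_id: 1, number: '+1-555-0111' }
];

// Adding a phone is now an insert into phoneTable,
// not an edit of a nested structure inside the user record.
phoneTable.push({ user_id: 1, number: '+1-555-0122' });

// Reconstructing the nested view is a join on the key.
const phonesForUser1 = phoneTable
  .filter(p => p.user_id === 1)
  .map(p => p.number);
console.log(phonesForUser1.length); // prints 3 (two seeded phones plus the insert)
```

Note how the update changed shape: instead of editing one complex record, the write became an insert into a separate relation, which is exactly the tradeoff normalization introduces.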
✓ Check Your Understanding:
Which statement correctly describes primary keys?
Answer: Primary keys are logical identifiers used for relationships across tables
Normalization primarily helps by doing what?
Answer: Storing each fact once across relations and reducing repeating internal structures
Declarative querying means the application focuses on what?
Answer: What data is needed, leaving access-path selection to the DBMS
6. NoSQL Databases (Key-Value, Document-Oriented, Different Query Languages)
NoSQL is described as a post-relational next generation that became popular in the late 2000s. It uses different data models and query languages, such as key-value and document-oriented stores. This contrasts with relational systems where SQL and declarative joins are central.
Examples:
- NoSQL examples include fast key–value stores and document-oriented databases introduced in the late 2000s.
- NoSQL contrasts with the dominant relational/SQL approach.
✓ Check Your Understanding:
Which statement best matches the knowledge base description of NoSQL?
Answer: NoSQL uses different data models and query languages than SQL
A system stores data primarily as key-value pairs. Which NoSQL category does this resemble?
Answer: Key-value store
7. History of Database Technology Eras (Navigational, SQL/Relational, Post-Relational)
Database evolution is described in eras: navigational, SQL/relational, and post-relational. Early systems were navigational, where applications guided access. Relational dominance grew after hardware improvements enabled widespread deployment. Later, post-relational approaches included the NoSQL next generation.
Examples:
- Database technology evolution is divided into three eras: navigational, SQL/relational, and post-relational.
- Relational databases became dominant in the 1980s and typically use SQL for writing and querying data.
- NoSQL databases became popular in the 2000s and use different query languages than SQL.
✓ Check Your Understanding:
Which era is associated with SQL/relational dominance?
Answer: SQL/relational era
Which statement best connects NoSQL to the historical eras?
Answer: NoSQL is part of the post-relational next generation that became popular in the late 2000s
8. Navigational Models: Hierarchical and CODASYL/Network (Pointers and Navigation)
Navigational models require applications to guide how to find data, often by traversing pointers or relationships. Two main early navigational models were hierarchical and CODASYL (network). In CODASYL-like navigation, applications could find records using a primary key, navigating relationships (sets), or scanning sequentially.
Examples:
- Navigational CODASYL applications could find records using a primary key (CALC key), navigating relationships (sets), or scanning all records sequentially.
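A minimal JavaScript sketch can contrast the two access styles. The linked-record structure below only mimics CODASYL-style sets; field names such as nextInDept are invented for illustration:

```javascript
// Toy data: employee records chained with explicit "next" links per department,
// mimicking a navigational (CODASYL-style) set. Not actual CODASYL syntax.
const employees = {
  e1: { id: 'e1', name: 'Avery', dept: 'Sales', nextInDept: 'e2' },
  e2: { id: 'e2', name: 'Sam', dept: 'Sales', nextInDept: null },
  e3: { id: 'e3', name: 'Kai', dept: 'Support', nextInDept: null }
};
const departments = {
  Sales: { firstEmployee: 'e1' },
  Support: { firstEmployee: 'e3' }
};

// Navigational access: the APPLICATION guides the traversal, link by link.
function namesNavigational(dept) {
  const names = [];
  let cursor = departments[dept].firstEmployee; // enter the "set"
  while (cursor !== null) {
    names.push(employees[cursor].name);
    cursor = employees[cursor].nextInDept;      // follow the pointer
  }
  return names;
}

// Declarative-style access: state WHAT is wanted; the engine
// (here, filter stands in for the DBMS) decides how to find it.
function namesDeclarative(dept) {
  return Object.values(employees)
    .filter(e => e.dept === dept)
    .map(e => e.name);
}

console.log(namesNavigational('Sales'));  // prints [ 'Avery', 'Sam' ]
console.log(namesDeclarative('Sales'));   // same result, no link-following by the app
```

Both functions return the same answer; the difference is who owns the access strategy, which is the core distinction between the navigational and relational eras.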
✓ Check Your Understanding:
How does navigational access differ from declarative access (relational)?
Answer: Navigational systems guide how to follow links to find data
In the CODASYL example, which access strategies are mentioned?
Answer: Primary key lookup, navigating relationships, or sequential scanning
9. Database Server Architecture and Physical Deployment
DBMS functional groups (definition, update, retrieval, administration) must be supported by a server architecture and deployment strategy. This section connects earlier conceptual functions to the practical reality that systems are hosted on machines and must handle concurrency, integrity, and recovery.
Examples:
- Administration includes security, integrity, concurrency control, and recovery.
- Retrieval supports queries and may transform data for users or further processing.
✓ Check Your Understanding:
Which functional group is most directly tied to concurrency control and recovery?
Answer: Administration
Why does server architecture matter for the DBMS functional groups?
Answer: Because the architecture must support execution of definition, update, retrieval, and administration at runtime
10. Database Storage and Deployment (File System, Clusters, Cloud)
Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. This connects to earlier points: retrieval and administration depend on where and how data is physically stored, while logical design choices like primary keys and normalization help keep relationships stable even when physical storage changes.
Examples:
- Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage.
- Use of primary keys (logical identifiers) instead of disk addresses for cross-table relationships enables tables to be relocated and resized without expensive database reorganization.
✓ Check Your Understanding:
Which deployment choice is most consistent with the knowledge base?
Answer: Small databases can be stored on a file system, while large databases are hosted on clusters or cloud storage
How do logical identifiers (primary keys) help when physical storage changes?
Answer: They decouple relationships from physical disk addresses
Practice Activities
Cause-Effect Chain: Direct Access Enables Interactive Databases
medium
Complete the chain: Availability of direct-access storage media in the mid-1960s -> what effect on database usage -> what mechanism explains it. Choose the best option for each link.
Cause-Effect Chain: Declarative Queries Shift Work to the DBMS
medium
Build the chain: Relational model insists applications search by content -> effect on access-path selection -> mechanism. Then explain how this relates to retrieval as a DBMS functional group.
Cause-Effect Chain: Primary Keys Decouple Relationships from Physical Layout
medium
Complete: Use of primary keys (logical identifiers) instead of disk addresses -> effect on table relocation/resizing -> mechanism. Provide one concrete example scenario (e.g., moving a table to a different storage location) and state what remains stable.
Cause-Effect Chain: Normalization Removes Repeating Internal Structures
medium
Complete: Normalization splits complex internal structures into multiple normalized tables -> effect on how repeating facts are represented -> mechanism. Use the idea of salary history or phone numbers as an example and describe what changes in the data design.
Next Steps
Related Topics:
- Relational Model: Tables, Keys, and Joins
- Normalization and Views
- Declarative Querying vs Navigational Access
- Database Server Architecture and Physical Deployment
- Database Storage and Deployment (File System, Clusters, Cloud)
Practice Suggestions:
- Create a one-page concept map linking: database system -> DBMS functional groups -> retrieval vs administration -> relational declarative querying -> primary keys -> normalization
- For each era (navigational, SQL/relational, post-relational), write one sentence describing typical access style and typical query language characteristics
- Take one real-world dataset (contacts, recipes, shopping lists) and propose both a normalized relational schema and a high-level NoSQL data model category
Cheat Sheet
Cheat Sheet: Databases and DBMS (Intermediate Quick Reference)
Key Terms
- DBMS (Database Management System)
- Integrated software that lets users and applications interact with one or more databases and provides access to the data.
- Database System
- The collective set of the database, the DBMS, and associated applications.
- Data Definition
- Creation, modification, and removal of definitions describing how data is organized.
- Update
- Insertion, modification, and deletion of the data itself.
- Retrieval
- Selecting data by criteria (e.g., queries) and returning it directly or for further processing.
- Administration
- Managing users and security, monitoring performance, maintaining integrity, handling concurrency, and recovering from failures.
- Relational Model
- A model that organizes data into tables and uses primary keys and declarative queries with joins.
- Primary Key
- A designated column (or set of columns) that uniquely identifies rows and is used for cross-table references.
- Normalization
- Designing relations so each fact is stored once, simplifying updates and replacing internal repeating structures with separate tables.
- NoSQL
- A family of post-relational databases that became popular in the late 2000s and use different query languages.
Formulas
DBMS Functional Groups (4-way classification)
DBMS = {Data Definition, Update, Retrieval, Administration}
When you need to categorize a DBMS feature or decide where a requirement belongs (schema vs data change vs query vs security/recovery).
Declarative vs Navigational Access (decision rule)
Relational: ask for WHAT → DBMS finds HOW (access paths)
Navigational: follow links/pointers to find data
When you are unsure whether the system is content-based (relational) or link-traversal based (navigational).
Logical identifiers for relationships
Cross-table relationship uses Primary Key (logical id) ≠ Disk address (physical location)
When reasoning about why tables can be moved/resized without expensive reorganization.
Normalization effect (structure replacement)
Repeating internal structure → split into multiple normalized tables connected by logical keys
When you see nested/repeating groups inside a record and want to redesign the schema.
Main Concepts
Database vs DBMS vs Database System Terminology
A database is organized data; a DBMS is the software that provides access; a database system includes the database + DBMS + related applications.
DBMS Functional Groups
DBMS capabilities split into Data Definition, Update, Retrieval, and Administration.
Database Models and Conformance
A database and its DBMS conform to a specific model; relational uses tables + SQL-like declarative querying, while NoSQL uses different models and query languages.
Relational Model: Tables, Keys, Joins
Relational data is organized into tables; primary keys identify rows; joins connect tables via key relationships; queries are declarative.
Declarative Querying vs Navigational Access
Relational systems ask for needed data (content-based), while navigational systems guide how to traverse links/pointers (access-path based).
Normalization and Views
Normalization stores each fact once using multiple related tables; views provide alternative presentations and may not be directly updatable.
NoSQL as Post-Relational Next Generation
NoSQL became popular in the late 2000s with key-value and document-oriented models and query languages that differ from SQL.
Database Technology Eras
Evolution is described as navigational, SQL/relational, and post-relational.
Navigational Models: Hierarchical and CODASYL/Network
Early navigational systems used pointers/links and navigation (hierarchical or network/CODASYL) to reach records.
Relational Model Foundations (Codd)
Codd’s relational model emphasizes primary keys, declarative queries, and normalization principles for relational design.
Memory Tricks
Relational vs Navigational access style
Relational = “R” for “Request content”; Navigational = “N” for “Navigate links”.
Primary keys are not disk addresses
Primary key = “P” for “Portable identifier” (logical id), not “Physical location”.
Normalization purpose
Normalize = “No repeats” (store each fact once) and “split nested” into separate tables.
DBMS functional groups
D-U-R-A: Data Definition, Update, Retrieval, Administration (think: define, change, fetch, govern).
Technology eras order
N-S-P: Navigational → SQL/Relational → Post-relational (NoSQL era comes later).
NoSQL query language confusion
NoSQL is “No SQL required” (different query languages, not necessarily SQL).
Quick Facts
- A database is organized data accessed through a DBMS; the DBMS interacts with end users, applications, and the database.
- The term database is sometimes used loosely to mean DBMS, database system, or an application tied to the database.
- Small databases can be stored on a file system; large databases are hosted on clusters or cloud storage.
- Relational databases became dominant in the 1980s and typically use SQL for writing and querying.
- NoSQL databases became popular in the 2000s and use different query languages than SQL.
- Direct-access storage media (magnetic disks and drums) in the mid-1960s enabled the database concept and interactive use.
- Two early navigational models were hierarchical and CODASYL (network).
- Codd proposed the relational model in 1970.
- Relational systems became widely deployed after mid-1980s hardware improvements; by early 1990s they dominated large-scale processing.
- NoSQL next generation in the late 2000s introduced key-value and document-oriented databases; NewSQL aimed to keep relational/SQL semantics while improving performance.
Common Mistakes
Common Mistakes: Databases and DBMS (Models, Functions, and Storage)
Confusing the term "database" with the DBMS software, and then concluding that "updating the database" means "changing the DBMS code."
conceptual · high severity
Why it happens:
Students start from the common loose usage of "database" and map it to the only thing they can name (the DBMS). Then they treat DBMS as the data itself, so any action described as affecting the database is interpreted as an action on the DBMS program rather than on the stored data.
✓ Correct understanding:
A database is organized data accessed through a DBMS. The DBMS is integrated software that provides access and functions (data definition, update, retrieval, administration). A database system is the database + DBMS + associated applications. So "update" changes the data in the database, while changing DBMS code would be outside the described DBMS functions.
How to avoid:
Use a three-part mental checklist: (1) What is the stored data (database)? (2) What software provides access and functions (DBMS)? (3) What user/application layer uses it (applications). When a question says "update the database," translate it to "update the data managed by the DBMS," not "modify the DBMS itself."
Assuming relational databases require navigational link-following between records (like pointers), so query performance depends mainly on how the application walks relationships.
conceptual · high severity
Why it happens:
Students import the mental model from navigational systems (hierarchical or CODASYL/network) where applications follow links or pointers. Then they generalize that behavior to relational systems, concluding that the application must "navigate" joins by following links step-by-step.
✓ Correct understanding:
Relational systems are content-based: applications ask for what data is needed using declarative queries. The DBMS determines how to find it efficiently (query optimization and access path selection). Joins are expressed declaratively using key relationships, not by the application manually traversing pointers.
How to avoid:
When you see "relational" and "SQL," switch to a declarative mindset: ask for results, not traversal. Explicitly contrast: navigational = guide how to follow links; relational = specify desired content; DBMS chooses the execution strategy.
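The contrast above can be made concrete with a minimal sketch using Python's built-in sqlite3 module (table and column names here are illustrative, not from the notes): the application states *what* it wants via a key-based join, and the DBMS chooses the access path.

```python
# Declarative join sketch: the application never follows pointers;
# it expresses the relationship through the shared key dept_id.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee   (emp_id  INTEGER PRIMARY KEY, name TEXT,
                             dept_id INTEGER REFERENCES department(dept_id));
    INSERT INTO department VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employee   VALUES (10, 'Ada', 1), (11, 'Grace', 1), (12, 'Sam', 2);
""")

# "What", not "how": the DBMS decides whether to scan, use an index, etc.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee e JOIN department d ON e.dept_id = d.dept_id
    WHERE d.name = 'Engineering'
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Ada', 'Engineering'), ('Grace', 'Engineering')]
```

Note that no line of application code walks from an employee record to a department record; the join condition alone carries the relationship.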
Believing primary keys are physical disk addresses, so relocating or resizing storage would require rewriting primary keys and all references.
conceptual · high severity
Why it happens:
Students see that keys are used to connect records and assume they must be low-level locations. Then they conclude that if storage layout changes, the key values must change too, because the key seems like an address.
✓ Correct understanding:
Primary keys are logical identifiers used for cross-table relationships. Cross-references use keys, not disk addresses. This decouples relationships from physical storage, so tables can be relocated or resized without expensive reorganization of references.
How to avoid:
Anchor the concept: "primary key = logical identifier." Whenever you think "address," replace it with "identifier used in relationships." Then connect to the cause-effect: logical keys decouple relationships from physical storage.
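A toy illustration in plain Python (all names hypothetical) shows why "key = identifier, not address" matters: cross-references hold key *values*, so copying the data into brand-new containers (new memory addresses) leaves every reference valid.

```python
# Cross-references store logical key values ("ENG"), never storage locations.
employees = {
    101: {"name": "Ada",   "dept_key": "ENG"},
    102: {"name": "Grace", "dept_key": "ENG"},
}
departments = {"ENG": {"title": "Engineering"}}

def dept_of(emp_id, emp_table, dept_table):
    """Resolve a cross-reference by key lookup, wherever the data lives."""
    return dept_table[emp_table[emp_id]["dept_key"]]["title"]

print(dept_of(101, employees, departments))   # Engineering

# "Relocate" the data: fresh containers at new addresses, same key values.
employees2 = {k: dict(v) for k, v in employees.items()}
departments2 = {k: dict(v) for k, v in departments.items()}
print(dept_of(101, employees2, departments2))  # Engineering — keys still resolve
```

Had the references been physical addresses, the relocation step would have required rewriting every reference; with logical keys, nothing changes.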
Mixing up DBMS functional groups: treating retrieval as administration (or treating administration as retrieval), such as saying "retrieval handles security and recovery" or "administration is just running queries."
conceptual · high severity
Why it happens:
Students memorize the four categories as a list but do not internalize the boundaries. They then map "retrieval" to any system activity that returns something, including security checks, integrity enforcement, concurrency management, and recovery, because these also affect what the user sees.
✓ Correct understanding:
Retrieval is selecting data by criteria and returning it, possibly transforming it for users or further processing. Administration includes security, integrity, concurrency control, and recovery. These are separate responsibilities even though they may interact with retrieval.
How to avoid:
Use role-based definitions: retrieval = "return requested data"; administration = "keep the system correct and safe" (security, integrity, concurrency, recovery). When a scenario mentions crash recovery, locking, permissions, or integrity constraints, classify it as administration.
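As a study aid, the classification rule can be written down as a lookup table (the scenario phrasings are my own, but the group assignments follow the definitions above):

```python
# Map each described activity to its DBMS functional group.
FUNCTIONAL_GROUP = {
    "select rows matching criteria": "retrieval",
    "crash recovery":                "administration",
    "concurrency control / locking": "administration",
    "grant or revoke permissions":   "administration",
    "enforce integrity constraints": "administration",
    "insert, modify, delete data":   "update",
    "define or change the schema":   "data definition",
}

def classify(activity):
    """Look up the functional group for a described activity."""
    return FUNCTIONAL_GROUP[activity]

print(classify("crash recovery"))  # administration
```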
Thinking normalization means "making the database smaller" or "removing all redundancy" in a way that forces everything into one table, rather than splitting repeating internal structures into multiple related tables.
conceptual · medium severity
Why it happens:
Students interpret normalization as a generic "clean-up" rule. They often focus on the word "redundancy" and assume the goal is to eliminate any repeated values by collapsing tables, which can lead to storing multiple facts in one row or losing the ability to represent histories cleanly.
✓ Correct understanding:
Normalization aims to store each fact once across relations and to replace complex internal repeating structures (like salary history) with separate relations connected by logical keys. The cause-effect chain is: normalization splits complex internal structures into multiple normalized tables connected by keys, so repeating groups are represented across tables rather than inside a single record.
How to avoid:
When you see "repeating groups" or "variable-length internal data," think "split into multiple tables connected by keys." Do not equate normalization with "one-table design"; instead, equate it with "fact separation across relations" and "clean representation of repeating structures."
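A hedged sketch with sqlite3 (schema names are illustrative) shows the salary-history case: instead of nesting a variable-length history inside the employee record, the repeating group becomes its own relation linked back by the employee key.

```python
# Normalization sketch: the repeating group (salary history) is pulled
# out into a separate table keyed by emp_id, not stored inside employee.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE salary_history (
        emp_id     INTEGER REFERENCES employee(emp_id),
        start_date TEXT,
        salary     INTEGER
    );
    INSERT INTO employee VALUES (1, 'Ada');
    INSERT INTO salary_history VALUES
        (1, '2022-01-01', 90000),
        (1, '2023-01-01', 98000);
""")

# The history is reassembled on demand via the key, one row per entry.
history = conn.execute("""
    SELECT start_date, salary FROM salary_history
    WHERE emp_id = 1 ORDER BY start_date
""").fetchall()
print(history)  # [('2022-01-01', 90000), ('2023-01-01', 98000)]
```

Adding a third salary entry is just another `INSERT` into `salary_history`; the employee row itself never changes shape, which is exactly the "updates become coordinated changes across tables" tradeoff noted earlier.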
Assuming NoSQL databases use SQL because they are still "databases," and then concluding that NoSQL is just a different storage engine with the same query language.
conceptual · high severity
Why it happens:
Students treat "database" as implying "SQL" by default, especially if they learned relational databases first. Then they generalize that any database system must support SQL, so they interpret NoSQL differences as superficial (indexes, performance) rather than model and query-language differences.
✓ Correct understanding:
NoSQL is a post-relational, next-generation approach that became popular in the late 2000s; it uses different data models and different query languages. Examples include fast key-value stores and document-oriented databases, which do not necessarily use SQL semantics.
How to avoid:
Use the classification rule: relational uses tables and SQL; NoSQL uses different models and query languages. When a question explicitly mentions key-value or document-oriented stores, do not default to SQL.
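A toy contrast in plain Python (a deliberately simplified, hypothetical API, not any real store's interface) illustrates how key-value and document models expose model-specific operations rather than SQL over fixed columns:

```python
# Key-value model: an opaque value per key; the only operations are
# put/get/delete by key.
kv = {}
kv["session:42"] = b"payload"

# Document model: self-describing documents queried by their content.
docs = [
    {"_id": 1, "type": "user", "name": "Ada", "tags": ["admin"]},
    {"_id": 2, "type": "user", "name": "Sam", "tags": []},
]

# Content-based selection over documents — no SQL statement, no schema
# of fixed columns, just the document structure itself.
admins = [d["name"] for d in docs if "admin" in d.get("tags", [])]
print(kv["session:42"], admins)  # b'payload' ['Ada']
```

The point of the sketch is the interface: neither model requires SQL, and the document model tolerates per-document structural variation (Sam has no tags) that a fixed relational schema would have to plan for.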
Misattributing the rise of interactive databases to relational model theory alone, ignoring the role of direct-access storage media in enabling interactive retrieval.
conceptual · medium severity
Why it happens:
Students focus on the "relational" or "Codd" milestones and assume the breakthrough was purely conceptual. They then conclude that once relational ideas existed, interactive database use was immediately feasible, without considering the hardware requirement for random/direct access.
✓ Correct understanding:
The cause-effect chain is hardware-first: availability of direct-access storage media (magnetic disks and drums) in the mid-1960s enabled the concept of databases and supported shared interactive use rather than tape-based batch processing. Relational dominance later depended on hardware improvements that made relational systems widely deployed.
How to avoid:
When questions ask "why did this become possible," check whether the knowledge base points to a technology-enabling cause (direct-access storage) versus a model/semantics cause (relational model, declarative querying). Separate "conceptual model" from "feasibility due to storage/access."
General Tips
- When you see a term, map it to the correct layer: data (database), software (DBMS), and system (database + DBMS + applications).
- Classify by responsibility: retrieval returns data; administration ensures security/integrity/concurrency/recovery; update changes stored data; data definition changes schema/organization.
- Use declarative vs navigational as a litmus test: relational asks for content; navigational guides traversal.
- Treat keys as logical identifiers, not physical addresses.
- For normalization, think "repeating internal structures become separate relations connected by keys," not "everything goes into one table."
- For NoSQL, do not assume SQL; focus on different data models and different query languages.