Loading
Turn business requirements into well-structured tables using entity relationships, normalization, and practical naming conventions.
A bad schema haunts you for years. Every query becomes a workaround, every feature requires a migration, and every performance problem traces back to a table that should have been two tables. Getting the schema right upfront saves more engineering time than almost any other decision.
Start with the business domain, not the database. Read the requirements and circle every noun — those are your candidate entities.
Take a course platform as an example. The requirements say: "Students enroll in courses. Each course has multiple lessons. Students track their progress through each lesson. Instructors create and manage courses."
The nouns: Student, Course, Lesson, Progress, Instructor.
Now ask three questions about each:
If the answer is yes to all three, it is a table. If it is just a property of something else — like a student's name — it is a column.
Every relationship between entities is one of three types:
One-to-many (1:N) — One course has many lessons. Put a foreign key on the "many" side.
Many-to-many (M:N) — Students enroll in many courses; courses have many students. Create a junction table.
One-to-one (1:1) — Rare. Usually means the two things are the same entity. Only split when one side is optional or has access control concerns (e.g., users and user_settings).
Normalization eliminates redundancy and prevents update anomalies. You need to understand three levels.
First Normal Form (1NF): Every column holds a single value. No arrays, no comma-separated lists, no JSON blobs for structured data.
Second Normal Form (2NF): Every non-key column depends on the entire primary key, not just part of it. This matters when you have composite keys.
Third Normal Form (3NF): No transitive dependencies. Every non-key column depends on the key, the whole key, and nothing but the key.
Normalization is the default. Denormalization is a deliberate tradeoff — you accept redundancy to avoid expensive joins.
Denormalize when:
Do not denormalize when:
The rule: normalize first, measure second, denormalize surgically where proven necessary.
Inconsistent naming creates confusion that compounds over time. Pick conventions and enforce them everywhere.
Tables: Plural, snake_case. students, course_tags, lesson_progress.
Columns: Singular, snake_case. first_name, created_at, is_published.
Primary keys: Always id. Not student_id in the students table — that is redundant.
Foreign keys: {referenced_table_singular}_id. So course_id in the lessons table references courses.id.
Booleans: Prefix with is_, has_, or can_. is_published, has_certificate, can_enroll.
Timestamps: Suffix with _at. created_at, updated_at, completed_at.
Indexes: idx_{table}_{columns}. idx_lessons_course_id, idx_enrollments_student_id_course_id.
A schema without constraints is a schema that will contain garbage data. Be aggressive with constraints.
Key practices:
NOT NULL is the default mindset. Only allow NULL when absence of a value has meaning (e.g., completed_at is NULL until the student finishes).CHECK constraints catch bad data at the database level, regardless of which application writes to it.WHERE clause) keep the index small and fast for common queries.ON DELETE CASCADE vs ON DELETE RESTRICT — decide what happens to children when a parent is deleted. Get this wrong and you either orphan data or block legitimate deletions.Design your schema as if the application layer does not exist. The database should enforce its own integrity. Applications come and go; the data outlives all of them.
-- These are tables (own identity, attributes, relationships)
CREATE TABLE students (...);
CREATE TABLE courses (...);
CREATE TABLE lessons (...);
-- This is NOT a table, it is a column on students
-- student_name → students.nameCREATE TABLE courses (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
title TEXT NOT NULL,
instructor_id BIGINT NOT NULL REFERENCES instructors(id)
);
CREATE TABLE lessons (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
course_id BIGINT NOT NULL REFERENCES courses(id),
title TEXT NOT NULL,
sort_order INT NOT NULL
);CREATE TABLE enrollments (
student_id BIGINT NOT NULL REFERENCES students(id),
course_id BIGINT NOT NULL REFERENCES courses(id),
enrolled_at TIMESTAMPTZ NOT NULL DEFAULT now(),
PRIMARY KEY (student_id, course_id)
);-- BAD: Violates 1NF
CREATE TABLE courses (
id BIGINT PRIMARY KEY,
tags TEXT -- "javascript,react,frontend"
);
-- GOOD: Separate table for the multi-valued attribute
CREATE TABLE course_tags (
course_id BIGINT REFERENCES courses(id),
tag TEXT NOT NULL,
PRIMARY KEY (course_id, tag)
);-- BAD: student_name depends only on student_id, not on course_id
CREATE TABLE enrollments (
student_id BIGINT,
course_id BIGINT,
student_name TEXT, -- Partial dependency!
enrolled_at TIMESTAMPTZ,
PRIMARY KEY (student_id, course_id)
);
-- GOOD: student_name lives on the students table-- BAD: instructor_email depends on instructor_id, not on course_id
CREATE TABLE courses (
id BIGINT PRIMARY KEY,
title TEXT,
instructor_id BIGINT,
instructor_email TEXT -- Transitive dependency!
);
-- GOOD: instructor_email lives on the instructors table-- Denormalized: precomputed count to avoid COUNT(*) on every page load
ALTER TABLE courses ADD COLUMN enrollment_count INT NOT NULL DEFAULT 0;
-- Keep it in sync with a trigger
CREATE FUNCTION update_enrollment_count() RETURNS TRIGGER AS $$
BEGIN
IF TG_OP = 'INSERT' THEN
UPDATE courses SET enrollment_count = enrollment_count + 1
WHERE id = NEW.course_id;
ELSIF TG_OP = 'DELETE' THEN
UPDATE courses SET enrollment_count = enrollment_count - 1
WHERE id = OLD.course_id;
END IF;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;CREATE TABLE lessons (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
course_id BIGINT NOT NULL REFERENCES courses(id) ON DELETE CASCADE,
title TEXT NOT NULL CHECK (char_length(title) BETWEEN 1 AND 200),
slug TEXT NOT NULL,
sort_order INT NOT NULL CHECK (sort_order >= 0),
is_published BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
UNIQUE (course_id, slug),
UNIQUE (course_id, sort_order)
);
CREATE INDEX idx_lessons_course_id ON lessons(course_id);
CREATE INDEX idx_lessons_is_published ON lessons(is_published)
WHERE is_published = true;