Entity-Relationship Modeling

Every system is just entities and relationships. Before you think about scale, think about what you're storing and why.

🌱 NoviceData Modeling & Schema DesignConcept

Step 1: Data Modeling & Schema Design

Mental Model

Before you write a single table, you need to understand what your system is actually tracking and how those things connect to each other. Skip this step and your schema will tell you what it can store -- but not what your product actually needs.

Take an e-commerce system. The requirements mention users, orders, products, and categories. Each of these is something the system tracks independently. A user exists whether or not they have placed an order. A product exists whether or not it appears in any order. These are your entities.

But entities alone are not enough. A user places orders. A product belongs to categories. An order contains products. These connections -- relationships -- are just as important as the entities themselves, because most of the interesting questions your application asks will span more than one entity.

The process of naming your entities, reading the relationships between them, and deciding how each relationship maps to a table structure is entity-relationship modeling. Done before you write schema, it prevents the kind of structural mistakes that require dropping columns and rebuilding tables later.

Name the things your system tracks. Then name how they connect.

Entities

An entity is any distinct thing your system needs to track independently. Given a requirement like:

"Users can browse products, add them to a cart, and place orders. Each product belongs to one or more categories."

Start by pulling out the nouns: users, products, cart, orders, categories. These are your candidates. But not every noun becomes an entity -- some are just attributes of another thing. Three questions help you decide:

Can it exist on its own? A user exists whether or not they have placed an order. A product exists whether or not it is in anyone's cart. These pass. "Email" does not exist independently -- it is a fact about a user. It becomes a column, not a table.

Would you need a separate row for each instance? If you would need to store 500 users, you need 500 rows -- that is a table. If you store one email per user, that is a column on the users table.

Does the system need to ask questions about it directly? "Give me all orders placed this week." "Which products are in this category." If the system needs to query it by itself, it is an entity. If it only ever appears as part of something else, it is probably an attribute.

Applying those tests to the e-commerce requirements:

CandidateEntity or attribute?Reason
UserEntityExists independently, has its own attributes, queried directly
ProductEntityExists independently, appears in many orders and categories
OrderEntityExists independently, has status and history, queried directly
CategoryEntityExists independently, groups many products
CartDependsIf carts persist and are queried, entity. If cart is just a session, it may not need a table at all
EmailAttributeA fact about a user, not independently tracked
PriceAttributeA fact about a product, stored as a column

The cart row is worth noticing: context changes the answer. An entity is not a fixed property of a noun -- it depends on what the system needs to do with it.

Once you have your entities confirmed, each becomes a table. Each attribute becomes a column. And each entity needs a primary key -- a unique identifier for every row -- so that other tables can reference a specific instance later.

EntityKey attributes
Userid, email, created_at
Profileid, bio, avatar_url
Orderid, status, created_at
Productid, name, price
Categoryid, name

With entities established, the next question is how they connect.

Check your understanding

You are designing a database for a hospital system. The requirements say: "Doctors treat patients. Each doctor has a specialty and a license number. Patients have a date of birth and a phone number. The hospital needs to record which doctor treated which patient on which date." Pull out the candidate nouns and apply the three identification tests to each. Which are entities? Which are attributes? Is there anything that does not fit cleanly into either -- and if so, what does that tell you?

Relationships

Entities do not exist in isolation. A relationship describes how one entity connects to another. Every relationship has a cardinality -- how many instances of each entity can exist on each side of the connection.

There are three patterns:

PatternWhat it meansExample
One-to-oneEach A has one B, each B has one AA user has one profile
One-to-manyEach A has many Bs, each B belongs to one AA user places many orders
Many-to-manyEach A has many Bs, each B has many AsA product belongs to many categories

To identify the pattern, read the relationship in both directions. One direction alone can mislead you. "Products have categories" sounds like one-to-many until you read it back: a category contains many products, and a product belongs to many categories. Both sides are "many" -- it is many-to-many.

The test is always the same two questions:

Can one [A] connect to many [B]? Can one [B] connect to many [A]?

First questionSecond questionPattern
NoNoOne-to-one
YesNoOne-to-many
YesYesMany-to-many
Loading chart...

Read every relationship in both directions before settling on a pattern. The cardinality you land on determines exactly how you structure the schema.

Check your understanding

Read each of these requirements in both directions and identify the cardinality. Explain how you applied the two-question test to reach your answer for each. 1. "A blog post can have many comments, and each comment belongs to one post." 2. "Authors write books. A book can have multiple authors, and an author can write multiple books." 3. "Each employee is assigned to exactly one parking spot, and no two employees share a spot."

Schema

Once you have your entities and their cardinalities, you can build the schema. Each entity becomes a table. The cardinality of each relationship then tells you how to connect them -- which table gets the connecting column, and what rules enforce the shape.

One-to-One

A user has one profile. The profile needs to know which user it belongs to, so user_profiles gets a user_id column that references users.id. That column is a foreign key -- it is how one row in one table points to a specific row in another.

A foreign key alone does not make this one-to-one. Without any further constraint, two profile rows could both reference the same user. That is a one-to-many structure. Adding UNIQUE on user_id tells the database to reject any second profile row for the same user -- now the constraint enforces the relationship.

Loading chart...

Check your understanding

A system tracks users and their notification settings. Each user has exactly one settings record, and no settings record exists without a user. Which table holds the foreign key? Write out what the relevant column definition looks like, including the constraint that makes it one-to-one. Then explain: what would happen to the relationship if that constraint were removed?

One-to-Many

A user places many orders. Each order belongs to exactly one user. The foreign key goes on the "many" side -- orders gets a user_id column referencing users.id.

Multiple order rows can share the same user_id. That is what makes it one-to-many: one user row, many order rows pointing back to it. No additional constraint is needed -- the structure already allows exactly this shape.

Loading chart...

Check your understanding

In a blogging system, each post is written by one author, but one author can write many posts. Where does the foreign key go -- on `posts` or on `authors`? Explain why it goes there and not the other way. Then describe what the data looks like in the table for two authors who have each written three posts.

Many-to-Many

A product belongs to many categories, and a category contains many products. Neither table can hold the foreign key cleanly.

Here is what happens when you try. You add category_id to products. A laptop belongs to Electronics -- fine. Then it also belongs to Computers -- you add category_id_2. Then it goes on sale -- you add category_id_3. Now a fourth category comes along and there is no column for it. You have to alter the table to add one.

But that is not even the worst part. Every query that asks "which categories does this product belong to?" now has to check three separate columns:

WHERE category_id = ? OR category_id_2 = ? OR category_id_3 = ?

And it is wrong the moment a fourth column exists. The column structure cannot represent a variable-length relationship -- you have forced a fixed number of slots onto something that has no fixed limit.

The solution is a junction table. Instead of columns on either parent, you create a third table that holds one row per pairing:

product_idcategory_id
110 (Electronics)
111 (Computers)
112 (Sale)
113 (New Arrivals)

Adding a fourth category is a new row. Removing one is a deleted row. Querying all categories for a product is a single join. The relationship lives in the junction table, not in either parent.

The same pattern applies to products and orders. An order contains many products, and a product appears in many orders -- so order_items is the junction table between them. It also carries quantity, which is an attribute of the relationship itself: it describes how many of a product are in a specific order, not a fact about either the order or the product alone.

Loading chart...

Check your understanding

A university needs to track student course enrollments. Each student can enroll in many courses, and each course has many students enrolled. Explain why adding `course_id_1`, `course_id_2`, `course_id_3` columns to `students` is the wrong structure -- specifically what breaks when a student enrolls in a fourth course, and what goes wrong with queries before that happens. Then describe the junction table you would use instead, and name one attribute that belongs on the junction table rather than on either parent.

Check your understanding

You are designing a database for a recipe management application. The requirements include: 1. Each recipe belongs to exactly one author (a registered user). 2. A recipe can have multiple ingredients, and the same ingredient can appear in many recipes -- each with a different quantity. 3. Each ingredient belongs to one ingredient category (such as "dairy" or "produce"). 4. A recipe can be tagged with multiple dietary labels (such as "vegan" or "gluten-free"), and each label applies to many recipes. 5. Each user can save recipes to a personal collection, and the same recipe can be saved by many users. For each requirement, identify the cardinality, name the entities involved, and describe the table structure you would use to implement it. For requirements 2 and 5, explain why a foreign key column on either parent table is not a sufficient structure.