(click anywhere to close)
OPEN MENU

[SQL] Structure

category: Website | course: SQL | difficulty:

Databases from professional grade applications usually contain thousands and thousands of rows. For example, imagine the sheer amount of users that Twitter has, or the amount of videos uploaded to Youtube. In order to quickly create new entries, and select the exact entries you want from such huge databases, you will need to be smart about your table structure.

More specifically, you want to have the least data possible stored in the database, and you need to tell the database system exactly what type of data it’s holding, so that working with it becomes much faster. This is done with two things: constraints and keys. This chapter will mainly talk about things that must be done for every table.

Primary Keys

The first thing that must be true for every table, is that every entry must be unique. If multiple entries are identical, which means the value for every column is exactly the same, you can’t select only one of them – which is a problem.

For example, say we have a table containing user accounts. When a user logs in, we fetch its profile information from the database and display it. If we had two identical entries, however, we would always select more than one row, and our script wouldn’t know what to do! It expects one row and wants to deal with that, but now it will always have two, which it can’t distinguish in any way whatsoever.

So, to make every entry unique, every table must have exactly one primary key. A column that is a primary key must contain unique values and may never contain no value (so-called NULL). If you try to insert a value that already exists, SQL will throw an error and your query will not be executed.

It is possible to create a table without primary key, but that’s only recommended if you’re storing lots and lots of data for personal use (as you don’t need it then, and it saves time). Otherwise, even when highly in doubt, use a primary key.

Usually, a column called id is used to assign a numeric id to every entry. This way, even if all the other values are the same, the id will be different, and can be used to uniquely identify every entry.

To set a column to be primary key, use

column_name type, PRIMARY KEY (column_name)MySQL column_name type PRIMARY KEYOthers
CREATE TABLE users
	(id  INT(11) PRIMARY KEY,
	 usn VARCHAR(50));

INSERT INTO users VALUES (1,"Panda");
INSERT INTO users VALUES (2,"Juan");
-- This is not allowed.
INSERT INTO users VALUES (1,"Gandalf");
idusn
1Panda
2Juan

Advanced Primary Keys

It’s also possible to create a primary key out of multiple columns, in which case you need to give it a special name yourself. Remember that this is still just one, unique primary key column – it’s just created by adding together several other columns.

CONSTRAINT primaryKeyName PRIMARY KEY (column_1, , column_n)

The query above creates a primary key called primaryKeyName which holds the result of column_1 + … + column_n for every entry. Obviously, this value should be unique for each entry.

-- This table stores courses. However, a course is given multiple times a year, so we need a second id to distinguish each one of them.
CREATE TABLE courses
	(id        INT(11),
	 second_id INT(11),
	 name      VARCHAR(50),
	 CONSTRAINT primKey PRIMARY KEY (id, signup_date));

-- Has primKey 2
INSERT INTO courses VALUES (1,1,"Calculus");
-- Has primKey 3
INSERT INTO courses VALUES (1,2,"Calculus");
idsecond_idcourse
11Calculus
12Calculus

Defaults

The second thing that must be true for every table, is that every field must have a value. Even if you don’t know the value yet, or the value has somehow (accidentally) disappeared, there has to be some value.

It is, therefore, wise to set a default for such columns. If you don’t do so, columns will default to the value NULL if you don’t specify a value yourself. (More on that in the next section.)

To set a default for a column, use

column_name type DEFAULT default_value
-- If a user leaves a comment, but isn't logged in, set the name to "Anonymous" by default
CREATE TABLE comments
	(id      INT(11),
	 name    VARCHAR(50) DEFAULT "Anonymous",
	 comment VARCHAR(255));

INSERT INTO comments VALUES (1,"Panda","Nice article!");
INSERT INTO comments VALUES (2, NULL  ,"This article is garbage. I'm cool because I anonymously leave hateful comments online.");
idnamecomment
1PandaNice article!
2AnonymousThis article is garbage. I'm cool because I anonymously leave hateful comments online.

Null

The NULL value is a special one. It represents missing or unknown data, which means it’s useful if you don’t know a field’s value yet, but also that you can’t really perform any meaningful operations with it.

To forbid a column from having NULL values, use the following constraint

column_name type NOT NULL
-- But, anonymous users don't leave very nice comments on our website, so we decide it's not allowed anymore. 
ALTER TABLE comments ALTER name SET VARCHAR(50) NOT NULL;

-- This is not allowed anymore
INSERT INTO users (id, comment) VALUES (3,"Thanks for sharing!");

-- However, this works fine; an empty string, or 0, is NOT the same as NULL
INSERT INTO users VALUES (3,"", "No problemo!");
idnamecomment
1PandaNice article!
2AnonymousThis article is garbage. I'm cool because I anonymously leave hateful comments online.
3 No problemo!

This is a constraint you’ll need very often, and I advise you to use it on every column that is critical to your application.

Best Relational Practices

At the start of this course, I listed all the advantages of relational database systems. But, you might be wondering, how do I actually use this in the best way possible?

Well, we use something called normalization. Normalization is a process where you minimize redundancy and dependency – in other words, you make sure there’s no duplicate data in your table.

For example, say you have a table containing your blog posts, and a table containing all the categories and brief descriptions of whatever that category is about. What you could do, is create a column in the posts table that holds the exact name of the category a post was put in. What’s better, however, is to let that column hold references to the entries in the categories table.

This has two huge benefits:

  • If the name of the category ever changes, everything is automatically updated
  • The table needs to store less data, which saves space and makes queries faster.
SQLNormalization

In conclusion, try to make sure you never repeat yourself. Think about which tables you need (and their structure) upfront, and make sure they reference each other in a meaningful way. Make sure you never store more data than needed, because it would waste valuable space and resources. More about this, and about setting other important constraints and data types, is covered in next chapters.

CONTINUE WITH THIS COURSE
Do you like my tutorials?
To keep this site running, donate some motivational food!
Crisps
(€2.00)
Chocolate Milk
(€3.50)
Pizza
(€5.00)