DATA

SQL Tutorial Series Vol. 2 — Syntax

Foundational layers

Bruce A. Lee

--

Photo by Roman Synkevych 🇺🇦 on Unsplash

Hello everyone! This is part two of my SQL Tutorial Series, where I break down the fundamentals for those just learning the language.

Part 1 & Part 3.

Here in volume 2, I’m going to be outlining and explaining the syntax behind SQL, the “mother clauses”, the SQL order of operations and why it helps to “sound out” queries before and as you type them.

Syntax

First thing, let’s break down the syntax of SQL. Syntax is how a language is structured. This is not exclusive to only the computer domain, actual real life languages each have their own syntactical nuances.

Think of how I can say “We have fun with our Computers!” in English, and then comparing that same phrase in Spanish: “¡Nos divertimos con nuestras Computadoras!”. The content of the sentence is the same in either language, but how the words themselves are arranged and expressed are different.

This is especially the case for the different computer languages that you’ll learn throughout your career and SQL especially is no different.

The “Mother Clauses”

For some reason, I’m been stuck on this analogy ever since it came to mind because of my obsession with food, but it just makes sense. When we talk about syntax, you can’t not have these.

A lot like how in French cooking, you have the mother sauces béchamel, velouté, espagnole, hollandaise, and tomato. You have the SQL Mother Clauses, or the building blocks that comprise each query you’ll type out.

In any case, here they are:

SELECT
FROM
WHERE
GROUP BY
ORDER BY

Now, if we were to put this in further context:

SELECT *
FROM sample_data AS sd
-- LEFT JOIN sample_data2 AS sd2
-- ON sd.employee_id = sd2.employee_id
WHERE name = 'John Doe'
-- Alternatively, you can use the IN operator to look for specifc things:
-- WHERE name IN ('John Doe', 'Jane Doe', 'Tom Anderson')
-- Also, note the '=' next to 'name'. The = is a comparison operator.
-- Others include != and <> (not equal), < and >, <= and >=, etc.
GROUP BY area_code
-- HAVING
ORDER BY name -- ASC, DESC
-- LIMIT

Note: The double dashes before the lines in my queries are how you detail notes. This is great practice when working with others.

When starting out and moving forward, you’ll be relying on these 5 clauses when building out your queries. There’s many layers that can be found when combining certain aspects of SQL with these clauses. SQL also has it’s own order of operations that you’re going to be building everything around, and I’ll go over that soon enough.

Later down the road, when we talk about CTEs (Common Table Expressions), subqueries, indexes, aggregate clauses, CASE WHEN, window functions, multi-table joins, etc.. These are aspects of SQL that are more advanced than what we’re covering here, but they play off of these basic concepts. Once we’ve gotten a good grip of what’s occurring here, you’ll see how everything works together.

So what’s actually going here? Well we’ve got:

SELECT: As it says on the tin, you’re selecting the columns in your data by name. The * next to SELECT represents the word “all”, so in English this translates to “SELECT all of the columns FROM this database”.

We try to abstain from doing this, because databases can get extremely massive.

Selecting 500 millions rows from a database will usually slow things down for everyone and can cost your company money by needlessly computing big things you don’t need. Always select specific columns if possible. SELECT can also be used in a myriad of ways that i’ll cover later on.

FROM: This is the database that you are sourcing your data from. This can be combined with JOINs if you are looking to pull data with a common identifying column (usually an ID) from an adjacent table. You can use multiple joins and a few varieties of joins exist.

WHERE: This is how you specify certain criteria that you’re looking for when it comes to filtering data from your table.

Usually logical operators like IN, LIKE, AND, EXISTS, NOT, OR, UNIQUE, BETWEEN are used here; along with the comparison operators I listed in the example above.

Also, consider that there is a fundamental difference between WHERE and HAVING.

WHERE filters data based on individual rows, where HAVING applies to groups as a whole. Also, WHERE cannot use aggregate functions, whereas HAVING can. WHERE is also heavily used in subqueries!

GROUP BY: Here you’ll use this clause to group your resulting data from the query by specific columns, grouping them in the process.

A big thing to consider here would be that whenever you use aggregate functions or certain kinds of functions in queries, for the query to work, you’ll always need to list the columns they are being based on in the GROUP BY clause. You can use the HAVING clause to filter between groups.

ORDER BY: You’ll use this clause to order everything by a column you specify, SQL always assumes alphabetical order and you aren’t able to tell it anything differently.

You can specify if you want your results to be returned in ASC (ascending) or DESC (descending) order and you can list multiple columns if you want a certain order to the proceedings.

So then if we were to build out a query, how do these keywords fall in order and why is that important? Well…

SQL Order of Operations

Yes, SQL does have it’s own order of operations, much like “PEMDAS” of childhood arithmetic fame. It’s important to know this when building out a query, as it can help things click and make more sense when putting together your logic.

The SQL Order of Operations goes a little something like this:

FROM/JOINs -> WHERE -> GROUP BY -> HAVING -> SELECT -> DISTINCT -> ORDER BY -> LIMIT/OFFSET

You’re essentially telling your database when you write a query: “From this database, where these columns fits my criteria, group them by this column and select them by these column names. They do not need to be distinct, but order them by this column and limit the results you give me to only 100.”

That’s why it helps to sound out your queries in English (or your native language), before and as you type them. For example, take this query below:

-- This isn't from a dataset that exists by the way, I just made this up 

SELECT firstname, lastname, phone_number, email, year_received, country
FROM noble_prize_winners
WHERE firstname != 'Bruce' AND lastname != 'Lee'
GROUP BY country
HAVING country = 'USA'
ORDER BY lastname ASC
LIMIT 1000

When we look at the above, the first thing I did was start my query using ‘FROM’, as it gives me a frame of reference for how I want to structure everything else.

If I want to sound this out in English, I’ll say as I’m typing:

This all has to come from the noble_prize_winners table, where they cannot have the first name Bruce and the last name Lee.

Group everything by the person’s country and make sure the groups have their country be the USA.

Next, select everyone’s first name, last name, phone number, email address, the year a person received a Noble Prize and the country they came from.

Lastly, order everything by their last name in ascending order and limit the number of the results set to 1000 rows.

It’s certainly a mouthful, but as I was starting, doing this really helped me to grasp the nuances of SQL’s logic as I was typing everything out.

The words have to make sense yourself and to SQL, let alone anyone else reading your query for that matter, so planning out your query logic beforehand goes a long way.

That’ll do it for volume 2 of the SQL Tutorial Series, check back for part 3 where I tell you how to install an IDE (Integrated Development Environment)! These can also be referred to as ‘query editors’ or ‘SQL editors’ as well. I’ll also list ways that you can find decent resources to practice with — such as sample datasets and practice websites. See you soon and thank you again for reading!

--

--

Bruce A. Lee

Sometimes I'll write about data, sometimes I'll just rant but either way, it's sure to be entertaining.