Introduction to SQL set operators

Retrieving data stored in databases is part of the data-gathering process. Many organizations have their data already stored in databases, data is generated internally which eliminates the need for further search for data through third-party agencies, web scraping, etc.

It becomes the responsibility of the analyst to connect to the available data stored in the organization's database and perform the preferred analysis on it.

Sometimes when working with this data stored in the database, we have to make queries from two or more tables. Instead of merging these two tables as performed by an SQL Join, we would love to have the result from both tables brought together into a single table.

Rather than join our dataset from the two tables horizontally, which is performed by an SQL join, we join them vertically, made possible using SQL set operators.

Note: I am using two tables here to simplify the concept, however, you are not limited to the number of tables to use. Just make sure you aren't overdoing it, in order to keep your sanity in check.

The SQL term that makes it possible to achieve this result is termed set operators.

Set operators are operators which are used to combine the result of two or more queries.

These include:
UNION, UNION ALL, INTERSECT, MINUS/EXCEPT

All code examples shown here will be written using MySQL

Set Operators Syntax

SELECT select_query
  set_operator
SELECT another_select_query
CREATE TABLE students1(
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(100),
    last_name  VARCHAR(100),
    department VARCHAR(100),
    level INT,
    residence VARCHAR(100)
)
CREATE TABLE students2(
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
    first_name VARCHAR(100),
    last_name  VARCHAR(100),
    department VARCHAR(100),
    level INT,
    residence VARCHAR(100)
)

With our table created, insert some data into it.

Inserting records into our table

INSERT INTO `students1`(`id`, `first_name`, `last_name`,
`department`, `level`, `residence`)
VALUES (1,"John","Mark","Economics",300,"Hostel"),
(2,"Ikenna","Matthew","Psychology",100,"Hostel"),
(3,"Jones","Jessica","Chemistry",400,"Off_Campus"),
(4,"Sarah","Mary","Sociology",200,"Off_Campus"),
(5,"King", "James", "Social Work", 300, "Hostel")


INSERT INTO `students2`(`id`, `first_name`, `last_name`, `department`, `level`, `residence`)
VALUES (1,"John","Mark","Economics",300,"Hostel"),
(2,"Ikenna","Matthew","Psychology",100,"Hostel"),
(3,"James","Bill","Agric",200,"Hostel"),
(4,"Sarah","Mary","Sociology",200,"Off_Campus"),
(5,"Queen", "Sharon","English", 100, "Off_Campus")

Set Operators


UNION

This set operator is used to combine the result of two select statements.
Duplicate rows will be eliminated from the results obtained after performing the UNION operation.

SELECT id, first_name, last_name, department, level,residence 
FROM students1
UNION
SELECT id, first_name, last_name, department, level,residence 
FROM students2

Since the union operator eliminates duplicate values, all values that appear in the second table already existing in the first table will be eliminated.

UNION ALL

This operator combines all the records from both queries.
Duplicate rows will not be eliminated from the results obtained after performing the UNION ALL operation.

SELECT id, first_name, last_name, department, level,residence 
FROM students1
UNION ALL
SELECT id, first_name, last_name, department, level,residence 
FROM students2

UNION vs UNION ALL: Which should you choose?

The distinguishing difference between UNION and UNION ALL is what happens when duplicate records(rows) are found in both tables.

Using the UNION operator, duplicate records are eliminated/removed. That is, if the record(row) found in the second table already exists in the first table, the row from the first table is returned but the row from the second table won't be returned because there already exists an exact record for it from the first table.

UNION operator says, I have scanned your tables and there appears to be a record in the first table that also appears in the second table. Now am going to help you return distinct records by not returning the same record found in the second table.

With the UNION ALL operator, all records are returned irrespective of whether they appear in the second table.
The UNION ALL operator says, I don't care if there are duplicates, this is all the record you have in both tables. The ALL in UNION ALL means show all records.

UNION is often slower than UNION ALL because there is an additional operation to remove duplicate values, which is usually a costly query step.
UNION ALL does not perform a distinct query, so it is usually faster.

So, with no intention of having unique records in your result, or if you are sure that the records in the table are already unique, use UNION ALL. But if you feel that your tables will contain duplicate values, use UNION.

INTERSECT

Returns the records which are common from both tables and removes duplicates.
Take note: it removes duplicates.

SELECT id, first_name, last_name, department, level,residence 
FROM students1
INTERSECT
SELECT id, first_name, last_name, department, level,residence 
FROM students2

MINUS/EXCEPT

Return rows that exist in the result of the first query but not in the results of the second query.

SELECT id, first_name, last_name, department, level,residence 
FROM students1
EXCEPT
SELECT id, first_name, last_name, department, level,residence 
FROM students2

Except and Minus perform the same query operation. Minus however is an Oracle relational database keyword. Except is used in other relational databases such as MySQL, and PostgreSQL.

Important points to note

1 When selecting columns from both tables, the number of columns returned needs to match between queries. If three columns are selected from the first query, you need to select three columns from the second query.

2 The order of the columns must be the same.

SELECT id, first_name, last_name, department, level,residence 
FROM students1
UNION
SELECT id, last_name, department, first_name, level,residence 
FROM students2
ORDER BY  department

Executing the above query will not generate an error but will however cause a disorder in the result.

3 The data type of each column needs to be compatible with both tables. If the data type of the column named first_name in the first table is a string, then the datatype of the column named first_name in the second table should also be a string.

4 If you wish to order the results of the query, then the ORDER BY must be written at the end of the last query.

SELECT id, first_name, last_name, department, level,residence
FROM students1
UNION
SELECT id, first_name, last_name, department, level,residence 
FROM students2
ORDER BY  department

If the ORDER BY is placed in the first query, it will generate an error. You can try it out for yourself.

5 For the final result set, alias names assigned to columns are gotten from the names of the columns in the first query. Thus, aliasing column names in the second query won't be useful.

SELECT id AS studentID, first_name, last_name, department, level, residence AS location FROM students1
UNION
SELECT id as id_of_student, first_name, last_name, department,level, residence AS address FROM students2

Here, we have performed query operations using two tables. It is important to note that we can also add more tables to the set operations. Just make sure you aren't adding more tables that makes it difficult to keep track of operations.

Conclusion

Although an underused concept in SQL, set operators can be useful when we want to combine multiple tables with the same structure into a single table.

In this tutorial, we have learned:

What are set operators
Reasons for using set operators
Types of set operators
Points to note about set operators

I do hope you find this tutorial useful in helping you learn about SQL set operators
Any questions or feedback? Please comment below.

Connect with me on Linkedin or Twitter