Graeme Birchall (until Version 9.7, 2011)
Version 1.0, Rodney Krick, November 2019
Contributors: Andres Gomez Casanova, Robert Mala

Db2 SQL Cookbook by Graeme Birchall is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://github.com/rodneykrick/db2-sql-cookbook.
Permissions beyond the scope of this license may be available at https://github.com/rodneykrick/db2-sql-cookbook
Introduction
Project in GitHub: https://github.com/rodneykrick/db2-sql-cookbook The PDF version can be found here: http://db2-sql-cookbook.org/pdf/Db2_SQL_Cookbook.pdf.
This version of the book is intended to be a community effort to help people learn SQL. Join, improve, enjoy, have fun! == Document history
(The history of the original book can bee seen at Graeme Birchall Book Editions Upload Dates)
Version | Date | Content |
---|---|---|
1.0 |
9.10.2019 |
Started new project in GitHub. Text transformed to asciidoc. |
1.1 |
24.11.2019 |
Added license terms |
1.2 |
06.12.2019 |
Update functions to Db2 version 11.1 |
1.3 |
18.01.2020 |
Added Time Travel chapter. |
1.4 |
30.08.2022 |
Personal notes removed from introduction. |
1.5 |
06.10.2022 |
VARCHAR_FORMAT explation corrected. New contributor added. |
1. Quick find
1.1. Index of Concepts
1.1.1. Join rows
To combine matching rows in multiple tables, use a join (see Joins).
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
EMP_JB
ID | JOB |
---|---|
10 |
Sales |
20 |
Clerk |
SELECT nm.id
, nm.name
, jb.job
FROM emp_nm nm
, emp_jb jb
WHERE nm.id = jb.id
ORDER BY 1;
ANSWER
ID | NAME | JOB |
---|---|---|
10 |
Sanders |
Sales |
20 |
Pernal |
Clerk |
1.1.2. Outer Join
To get all of the rows from one table, plus the matching rows from another table (if there are any), use an outer join (see Join Types).
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
EMP_JB
ID | JOB |
---|---|
10 |
Sales |
20 |
Clerk |
SELECT nm.id ,nm.name ,jb.job
FROM emp_nm nm
LEFT OUTER JOIN emp_jb jb
ON nm.id = jb.id
ORDER BY nm.id;
ANSWER
ID | NAME | JOB |
---|---|---|
10 |
Sanders |
Sales |
20 |
Pernal |
Clerk |
50 |
Hanes |
- |
To get rows from either side of the join, regardless of whether they match (the join) or not, use a full outer join (see Full Outer Joins).
1.1.3. Null Values & Replace
Use the COALESCE function (see COALESCE) to replace a null value (e.g. generated in an outer join) with a non-null value.
1.1.4. Select Where No Match
To get the set of the matching rows from one table where something is true or false in another table (e.g. no corresponding row), use a sub-query (see Sub-Query).
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
EMP_JB
ID | JOB |
---|---|
10 |
Sales |
20 |
Clerk |
SELECT * FROM emp_nm nm
WHERE NOT EXISTS
(SELECT * FROM emp_jb jb
WHERE nm.id = jb.id)
ORDER BY id;
ANSWER
ID | NAME |
---|---|
50 |
Hanes |
1.1.5. Append Rows
To add (append) one set of rows to another set of rows, use a union (see Union, Intersect, and Except).
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
EMP_JB
ID | JOB |
---|---|
10 |
Sales |
20 |
Clerk |
SELECT *
FROM emp_nm
WHERE emp_nm name < 'S'
UNION
SELECT *
FROM emp_jb
ORDER BY 1, 2;
ANSWER
ID | 2 |
---|---|
10 |
Sales |
20 |
Clerk |
20 |
Pernal |
50 |
Hanes |
1.1.6. Assign Output Numbers
To assign line numbers to SQL output, use the ROW_NUMBER function (see ROW_NUMBER).
EMP_JB
ID | JOB |
---|---|
10 |
Sales |
20 |
Clerk |
SELECT id
, job
, ROW_NUMBER() OVER(ORDER BY job) AS R
FROM emp_jb
ORDER BY job;
ANSWER
ID | JOB | R |
---|---|---|
20 |
Clerk |
1 |
10 |
Sales |
2 |
1.1.7. Assign Unique Key Numbers
To make each row inserted into a table automatically get a unique key value, use an identity column, or a sequence, when creating the table (see Identity Columns and Sequences).
1.1.8. If-Then-Else Logic
To include if-then-else logical constructs in SQL stmts, use the CASE phrase (see CASE Expression).
EMP_JB
ID | JOB |
---|---|
10 |
Sales |
20 |
Clerk |
SELECT id
, job
, CASE
WHEN job = 'Sales' THEN 'Fire'
ELSE 'Demote'
END AS STATUS
FROM emp_jb;
ANSWER
ID | JOB | STATUS |
---|---|---|
10 |
Sales |
Fire |
20 |
Clerk |
Demote |
1.1.9. Get Dependents
To get all of the dependents of some object, regardless of the degree of separation from the parent to the child, use recursion (see Recursive SQL).
FAMILY
PARNT | CHILD |
---|---|
GrDad |
Dad |
Dad |
Dghtr |
Dghtr |
GrSon |
Dghtr |
GrDtr |
WITH temp (persn, lvl)
AS (SELECT parnt, 1
FROM family
WHERE parnt = 'Dad'
UNION ALL
SELECT child, Lvl + 1
FROM temp, family
WHERE persn = parnt)
SELECT * FROM temp;
ANSWER
PERSN | LVL |
---|---|
Dad |
1 |
Dghtr |
2 |
GrSon |
3 |
GrDtr |
3 |
1.1.10. Convert String to Rows
To convert a (potentially large) set of values in a string (character field) into separate rows (e.g. one row per word), use recursion (see Recursive SQL).
INPUT DATA "Some silly text" Use Recursive SQL
ANSWER
TEXT | LINE# |
---|---|
Some |
1 |
silly |
2 |
text |
3 |
Be warned - in many cases, the code is not pretty.
1.1.11. Convert Rows to String
To convert a (potentially large) set of values that are in multiple rows into a single combined field, use recursion (see Recursive SQL).
INPUT DATA
TEXT | LINE# |
---|---|
Some |
1 |
silly |
2 |
text |
3 |
Use Recursive SQL
ANSWER: "Some silly text"
1.1.12. Fetch First "n" Rows
To fetch the first "n" matching rows, use the FETCH FIRST notation (see FETCH FIRST Clause).
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
SELECT * FROM
emp_nm
ORDER BY id DESC
FETCH FIRST 2 ROWS ONLY;
ANSWER
ID | NAME |
---|---|
50 |
Hanes |
20 |
Pernal |
Another way to do the same thing is to assign row numbers to the output, and then fetch those rows where the row-number is less than "n" (see Selecting "n" Rows).
1.1.13. Fetch Subsequent "n" Rows
To the fetch the "n" through "n + m" rows, first use the ROW_NUMBER function to assign output numbers, then put the result in a nested-table-expression, and then fetch the rows with desired numbers.
1.1.14. Fetch Uncommitted Data
To retrieve data that may have been changed by another user, but which they have yet to commit, use the WITH UR (Uncommitted Read) notation.
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
SELECT *
FROM emp_nm
WHERE name like 'S%'
WITH UR;
ANSWER
ID | NAME |
---|---|
10 |
Sanders |
Using this option can result in one fetching data that is subsequently rolled back, and so was never valid. Use with extreme care.
1.1.15. Summarize Column Contents
Use a column function (see Column Functions or Aggregate Functions) to summarize the contents of a column.
EMP_NM
ID | NAME |
---|---|
10 |
Sanders |
20 |
Pernal |
50 |
Hanes |
SELECT AVG(id)AS avg
,MAX(name) AS maxn
,COUNT(*) AS #rows
FROM emp_nm;
ANSWER
AVG | MAXN | #ROWS |
---|---|---|
26 |
Sanders |
3 |
1.1.16. Subtotals and Grand Totals
To obtain subtotals and grand-totals, use the ROLLUP or CUBE statements (see ROLLUP Statement).
SELECT job
,dept
,SUM(salary) AS sum_sal
,COUNT(*) AS #emps
FROM staff
WHERE dept < 30
AND salary < 90000
AND job < 'S'
GROUP BY ROLLUP(job, dept)
ORDER BY job, dept;
ANSWER
JOB | DEPT | SUM_SAL | #EMPS |
---|---|---|---|
Clerk |
15 |
84766.70 |
2 |
Clerk |
20 |
77757.35 |
2 |
Clerk |
- |
162524.05 |
4 |
Mgr |
10 |
243453.45 |
3 |
Mgr |
15 |
80659.80 |
1 |
Mgr |
- |
324113.25 |
4 |
- |
- |
486637.30 |
8 |
1.1.17. Enforcing Data Integrity
When a table is created, various Db2 features can be used to ensure that the data entered in the table is always correct:
-
Uniqueness (of values) can be enforced by creating unique indexes.
-
Check constraints can be defined to limit the values that a column can have.
-
Default values (for a column) can be defined - to be used when no value is provided.
-
Identity columns (see Identity Columns and Sequences), can be defined to automatically generate unique numeric values (e.g. invoice numbers) for all of the rows in a table. Sequences can do the same thing over multiple tables.
-
Referential integrity rules can be created to enforce key relationships between tables.
-
Triggers can be defined to enforce more complex integrity rules, and also to do things (e.g. populate an audit trail) whenever data is changed.
See the Db2 manuals for documentation or Protecting Your Data for more information about the above.
1.1.18. Hide Complex SQL
One can create a view (see View) to hide complex SQL that is run repetitively. Be warned however that doing so can make it significantly harder to tune the SQL - because some of the logic will be in the user code, and some in the view definition.
1.2. Summary Table
Some queries that use a GROUP BY can be made to run much faster by defining a summary table (see Materialized Query Tables) that Db2 automatically maintains. Subsequently, when the user writes the original GROUP BY against the source-data table, the optimizer substitutes with a much simpler (and faster) query against the summary table.
2. Introduction to SQL
This chapter contains a basic introduction to Db2 SQL. It also has numerous examples illustrating how to use this language to answer particular business problems. However, it is not meant to be a definitive guide to the language. Please refer to the relevant IBM manuals for a more detailed description.
2.1. Syntax Diagram
The original book has lots of syntax diagrams in it. We decided not to put them here anymore. Syntax diagrams can be found in the SQL Reference of Db2 (or your prefered RDBMS). We only give you lots of examples how to use SQL. When you master this stuff, you should learn more and improve your knowledge using the "real" manuals.
2.2. SQL Comments
A comment in a SQL statement starts with two dashes and goes to the end of the line:
SELECT name
FROM staff
ORDER BY id;
-- this is a comment.
-- this is another comment.
Some Db2 command processors (e.g. db2batch on the PC, or SPUFI on the mainframe) can process intelligent comments. These begin the line with a --#SET
phrase, and then identify the value to be set. In the following example, the statement delimiter is changed using an intelligent comment:
--#SET DELIMITER !
SELECT name
FROM staff
WHERE id = 10!
--#SET DELIMITER ;
SELECT name
FROM staff
WHERE id = 20;
When using the Db2 Command Processor (batch) script, the default statement terminator can be set using the "-tdx" option, where "x" is the value have chosen.
See the section titled Special Character Usage for notes on how to refer to the statement delimiter in the SQL text. |
2.3. Statement Delimiter
Db2 SQL does not come with a designated statement delimiter (terminator), though a semicolon is often used. A semi-colon cannot be used when writing a compound SQL statement (see Compound SQL) because that character is used to terminate the various subcomponents of the statement.
2.4. SQL Components
2.4.1. Db2 Objects
Db2 is a relational database that supports a variety of object types. In this section we shall overview those items which one can obtain data from using SQL.
Table
A table is an organized set of columns and rows. The number, type, and relative position, of the various columns in the table is recorded in the Db2 catalogue. The number of rows in the table will fluctuate as data is inserted and deleted. The CREATE TABLE statement is used to define a table. The following example will define the EMPLOYEE table, which is found in the Db2 sample database.
CREATE TABLE employee
( empno CHARACTER(6) NOT NULL
, firstnme VARCHAR(12) NOT NULL
, midinit CHARACTER(1) NOT NULL
, lastname VARCHAR(15) NOT NULL
, workdept CHARACTER(3)
, phoneno CHARACTER(4)
, hiredate DATE
, job CHARACTER(8)
, edlevel SMALLINT NOT NULL
, sex CHARACTER(1)
, birthdate DATE
, salary DECIMAL(9,02)
, bonus DECIMAL(9,02)
, comm DECIMAL(9,02)
);
View
A view is another way to look at the data in one or more tables (or other views). For example, a user of the following view will only see those rows (and certain columns) in the EMPLOYEE table where the salary of a particular employee is greater than or equal to the average salary for their particular department.
CREATE VIEW employee_view AS
SELECT a.empno
, a.firstnme
, a.salary
, a.workdept
FROM employee a
WHERE a.salary >=
(SELECT AVG(b.salary)
FROM employee b
WHERE a.workdept = b.workdept
);
A view need not always refer to an actual table. It may instead contain a list of values:
CREATE VIEW silly (c1, c2, c3)
AS VALUES
(11, 'AAA', SMALLINT(22))
,(12, 'BBB', SMALLINT(33))
,(13, 'CCC', NULL);
Selecting from the above view works the same as selecting from a table:
SELECT c1, c2, c3
FROM silly
ORDER BY c1 ASC;
ANSWER
C1 | C2 | C3 |
---|---|---|
11 |
AAA |
22 |
12 |
BBB |
33 |
13 |
CCC |
- |
We can go one step further and define a view that begins with a single value that is then manipulated using SQL to make many other values. For example, the following view, when selected from, will return 10,000 rows. Note however that these rows are not stored anywhere in the database - they are instead created on the fly when the view is queried.
CREATE VIEW test_data AS
WITH temp1 (num1) AS
(VALUES (1)
UNION ALL
SELECT num1 + 1
FROM temp1
WHERE num1 < 10000)
SELECT *
FROM temp1;
Alias
An alias is an alternate name for a table or a view. Unlike a view, an alias can not contain any processing logic. No authorization is required to use an alias other than that needed to access to the underlying table or view.
CREATE ALIAS employee_al1 FOR employee;
COMMIT;
CREATE ALIAS employee_al2 FOR employee_al1;
COMMIT;
CREATE ALIAS employee_al3 FOR employee_al2;
COMMIT;
Neither a view, nor an alias, can be linked in a recursive manner (e.g. V1 points to V2, which points back to V1). Also, both views and aliases still exist after a source object (e.g. a table) has been dropped. In such cases, a view, but not an alias, is marked invalid.
Nickname
A nickname is the name that one provides to Db2 for either a remote table, or a non-relational object that one wants to query as if it were a table.
CREATE NICKNAME emp FOR unixserver.production.employee;
Tablesample
Use of the optional TABLESAMPLE reference enables one to randomly select (sample) some fraction of the rows in the underlying base table:
SELECT *
FROM staff
TABLESAMPLE BERNOULLI(10);
See Randomly Sample Data for information on using the TABLESAMPLE feature.
2.4.2. Db2 Data Types
Db2 comes with the following standard data types:
-
SMALLINT, INT, and BIGINT (i.e. integer numbers).
-
FLOAT, REAL, and DOUBLE (i.e. floating point numbers).
-
DECIMAL and NUMERIC (i.e. decimal numbers).
-
DECFLOAT (i.e. decimal floating-point numbers).
-
CHAR, VARCHAR, and LONG VARCHAR (i.e. character values).
-
GRAPHIC, VARGRAPHIC, and LONG VARGRAPHIC (i.e. graphical values).
-
BLOB, CLOB, and DBCLOB (i.e. binary and character long object values).
-
DATE, TIME, and TIMESTAMP (i.e. date/time values).
-
DATALINK (i.e. link to external object).
-
XML (i.e. contains well formed XML data).
Below is a simple table definition that uses some of the above data types:
CREATE TABLE sales_record
(sales# INTEGER NOT NULL
GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1)
, sale_ts TIMESTAMP NOT NULL
, num_items SMALLINT NOT NULL
, payment_type CHAR(2) NOT NULL
, sale_value DECIMAL(12,2) NOT NULL
, sales_tax DECIMAL(12,2)
, employee# INTEGER NOT NULL
, CONSTRAINT sales1 CHECK (payment_type IN ('CS','CR'))
, CONSTRAINT sales2 CHECK (sale_value > 0)
, CONSTRAINT sales3 CHECK (num_items > 0)
, CONSTRAINT sales4 FOREIGN KEY (employee#)
REFERENCES staff (id) ON DELETE RESTRICT
, PRIMARY KEY (sales#)
);
In the above table, we have listed the relevant columns, and added various checks to ensure that the data is always correct. In particular, we have included the following:
-
The sales# is automatically generated (see Identity Columns and Sequences for details). It is also the primary key of the table, and so must always be unique.
-
The payment-type must be one of two possible values.
-
Both the sales-value and the num-items must be greater than zero.
-
The employee# must already exist in the staff table. Furthermore, once a row has been inserted into this table, any attempt to delete the related row from the staff table will fail.
Default Lengths
The following table has two columns:
CREATE TABLE default_values
(c1 CHAR NOT NULL
,d1 DECIMAL NOT NULL);
The length has not been provided for either of the above columns. In this case, Db2 defaults to CHAR(1) for the first column and DECIMAL(5,0) for the second column.
Data Type Usage
In general, use the standard Db2 data types as follows:
-
Always store monetary data in a decimal field.
-
Store non-fractional numbers in one of the integer field types.
-
Use floating-point when absolute precision is not necessary.
A Db2 data type is not just a place to hold data. It also defines what rules are applied when the data in manipulated. For example, storing monetary data in a Db2 floating-point field is a no-no, in part because the data-type is not precise, but also because a floating-point number is not manipulated (e.g. during division) according to internationally accepted accounting rules.
DECFLOAT Arithmetic
DECFLOAT numbers have quite different processing characteristics from the other number types. For a start, they support more values:
-
Zero.
-
Negative and positive numbers (e.g. -1234.56).
-
Negative and positive infinity.
-
Negative and positive NaN (i.e. Not a Number).
-
Negative and positive sNaN (i.e. signaling Not a Number).
NaN Usage
The value NaN represents the result of an arithmetic operation that does not return a number (e.g. the square root of a negative number), but is also not infinity. For example, the expression 0/0 returns NaN, while 1/0 returns infinity.
The value NaN propagates through any arithmetic expression. Thus the final result is always either positive or negative NaN, as the following query illustrates:
SELECT DECFLOAT(+1.23) + NaN AS " NaN"
, DECFLOAT(-1.23) + NaN AS " NaN"
, DECFLOAT(-1.23) + -NaN AS " -NaN"
, DECFLOAT(+infinity) + NaN AS " NaN"
, DECFLOAT(+sNaN) + NaN AS " NaN"
, DECFLOAT(-sNaN) + NaN AS " -NaN"
, DECFLOAT(+NaN) + NaN AS " NaN"
, DECFLOAT(-NaN) + NaN AS " -NaN"
FROM sysibm.sysdummy1;
Any reference to a signaling NaN value in a statement (as above) will result in a warning message being generated. |
Infinity Usage
The value infinity works similar to NaN. Its reference in an arithmetic expression almost always returns either positive or negative infinity (assuming NaN is not also present). The one exception is division by infinity, which returns a really small, but still finite, number:
SELECT DECFLOAT(1) / +infinity AS " 0E-6176"
, DECFLOAT(1) * +infinity AS " Infinity"
, DECFLOAT(1) + +infinity AS " Infinity"
, DECFLOAT(1) - +infinity AS "-Infinity"
, DECFLOAT(1) / -infinity AS " -0E-6176"
, DECFLOAT(1) * -infinity AS "-Infinity"
, DECFLOAT(1) + -infinity AS "-Infinity"
, DECFLOAT(1) - -infinity AS " Infinity"
FROM sysibm.sysdummy1;
The next query shows some situations where either infinity or NaN is returned:
SELECT DECFLOAT(+1.23) / 0 AS " Infinity"
, DECFLOAT(-1.23) / 0 AS "-Infinity"
, DECFLOAT(+1.23) + infinity AS " Infinity"
, DECFLOAT(0) / 0 AS "NaN"
, DECFLOAT(infinity) + -infinity AS "NaN"
, LOG(DECFLOAT(0)) AS "-Infinity"
, LOG(DECFLOAT(-123)) AS "NaN"
, SQRT(DECFLOAT(-123)) AS "NaN"
FROM sysibm.sysdummy1;
DECFLOAT Value Order
The DECFLOAT values have the following order, from low to high:
-NaN -sNan -infinity -1.2 -1.20 0 1.20 1.2 infinity sNaN NaN
Please note that the numbers 1.2 and 1.200 are "equal", but they will be stored as different values, and will have a different value order. The TOTALORDER function can be used to illustrate this. It returns one of three values:
-
Zero if the two values have the same order.
-
+1 if the first value has a higher order (even if it is equal).
-
-1 if the first value has a lower order (even if it is equal).
WITH temp1 (d1, d2) AS
(VALUES (DECFLOAT(+1.0), DECFLOAT(+1.00))
,(DECFLOAT(-1.0), DECFLOAT(-1.00))
,(DECFLOAT(+0.0), DECFLOAT(+0.00))
,(DECFLOAT(-0.0), DECFLOAT(-0.00))
,(DECFLOAT(+0), DECFLOAT(-0)) )
SELECT TOTALORDER(d1,d2) AS TOTALORDER
FROM temp1;
ANSWER
TOTALORDER |
---|
1 |
-1 |
1 |
1 |
0 |
The NORMALIZE_DECFLOAT scalar function can be used to strip trailing zeros from a DECFLOAT value:
WITH temp1 (d1) AS
(VALUES (DECFLOAT(+0 ,16))
,(DECFLOAT(+0.0 ,16))
,(DECFLOAT(+0.00 ,16))
,(DECFLOAT(+0.000 ,16))
)
SELECT d1
, HEX(d1) AS hex_d1
, NORMALIZE_DECFLOAT(d1) AS d2
, HEX(NORMALIZE_DECFLOAT(d1)) AS hex_d2
FROM temp1;
ANSWER
D1 | HEX_D1 | D2 | HEX_D2 |
---|---|---|---|
0 |
0000000000003822 |
0 |
0000000000003822 |
0.0 |
0000000000003422 |
0 |
0000000000003822 |
0.00 |
0000000000003022 |
0 |
0000000000003822 |
0.000 |
0000000000002C22 |
0 |
0000000000003822 |
DECFLOAT Scalar Functions
The following scalar functions support the DECFLOAT data type:
-
COMPARE_DECFLOAT: Compares order of two DECFLOAT values.
-
DECFLOAT: Converts input value to DECFLOAT.
-
NORMALIZE_DECFLOAT: Removes trailing blanks from DECFLOAT value.
-
QUANTIZE: Converts number to DECFLOAT, using mask to define precision.
-
TOTALORDER: Compares order of two DECFLOAT values.
Date/Time Arithmetic
Manipulating date/time values can sometimes give unexpected results. What follows is a brief introduction to the subject. The basic rules are:
-
Multiplication and division is not allowed.
-
Subtraction is allowed using date/time values, date/time durations, or labeled durations.
-
Addition is allowed using date/time durations, or labeled durations.
The valid labeled durations are listed below:
LABELED DURATIONS | WORKS WITH DATE/TIME | ||||
---|---|---|---|---|---|
SINGULAR |
PLURAL |
ITEM FIXED SIZE |
DATE |
TIME |
TIMESTAMP |
YEAR |
YEARS |
N |
Y |
- |
Y |
MONTH |
MONTHS |
N |
Y |
- |
Y |
DAY |
DAYS |
Y |
Y |
- |
Y |
HOUR |
HOURS |
Y |
- |
Y |
Y |
MINUTE |
MINUTES |
Y |
- |
Y |
Y |
SECOND |
SECONDS |
Y |
- |
Y |
Y |
MICROSECOND |
MICROSECONDS |
Y |
- |
Y |
Y |
Usage Notes
-
It doesn’t matter if one uses singular or plural. One can add "4 day" to a date.
-
Some months and years are longer than others. So when one adds "2 months" to a date the result is determined, in part, by the date that you began with. More on this below.
-
One cannot add "minutes" to a date, or "days" to a time, etc.
-
One cannot combine labeled durations in parenthesis: "date - (1 day + 2 months)" will fail. One should instead say: "date - 1 day - 2 months".
-
Adding too many hours, minutes or seconds to a time will cause it to wrap around. The overflow will be lost.
-
Adding 24 hours to the time '00.00.00' will get '24.00.00'. Adding 24 hours to any other time will return the original value.
-
When a decimal value is used (e.g. 4.5 days) the fractional part is discarded. So to add (to a timestamp value) 4.5 days, add 4 days and 12 hours.
Now for some examples:
SELECT sales_date
, sales_date - 10 DAY AS d1
, sales_date + -1 MONTH AS d2
, sales_date + 99 YEARS AS d3
, sales_date + 55 DAYS
- 22 MONTHS AS d4
, sales_date + (4+6) DAYS AS d5
FROM sales
WHERE sales_person = 'GOUNOT'
AND sales_date = '1995-12-31';
ANSWER
sales_date | d1 | d2 | d3 | d4 | d5 |
---|---|---|---|---|---|
1995-12-31 |
1995-12-21 |
1995-11-30 |
2094-12-31 |
1994-04-24 |
1996-01-10 |
Adding or subtracting months or years can give somewhat odd results when the month of the beginning date is longer than the month of the ending date. For example, adding 1 month to '2004-01-31' gives '2004-02-29', which is not the same as adding 31 days, and is not the same result that one will get in 2005. Likewise, adding 1 month, and then a second 1 month to '2004-01-31' gives '2004-03-29', which is not the same as adding 2 months. Below are some examples of this issue:
SELECT sales_date
, sales_date + 2 MONTH AS d1
, sales_date + 3 MONTHS AS d2
, sales_date + 2 MONTH + 1 MONTH AS d3
, sales_date + (2+1) MONTHS AS d4
FROM sales
WHERE sales_person = 'GOUNOT'
AND sales_date = '1995-12-31';
ANSWER
sales_date | d1 | d2 | d3 | d4 |
---|---|---|---|---|
1995-12-31 |
1996-02-29 |
1996-03-31 |
1996-03-29 |
1996-03-31 |
Date/Time Duration Usage
When one date/time value is subtracted from another date/time value the result is a date, time,or timestamp duration. This decimal value expresses the difference thus:
DURATION-TYPE | FORMAT | NUMBER-REPRESENTS | USE-WITH-D-TYPE |
---|---|---|---|
DATE |
DECIMAL(8,0) |
yyyymmdd |
TIMESTAMP, DATE |
TIME |
DECIMAL(6,0) |
hhmmss |
TIMESTAMP, TIME |
TIMESTAMP |
DECIMAL(20,6) |
yyyymmddhhmmss.zzzzzz |
TIMESTAMP |
Below is an example of date duration generation:
SELECT empno
, hiredate
, birthdate
, hiredate - birthdate
FROM employee
WHERE workdept = 'D11'
AND lastname < 'L'
ORDER BY empno;
ANSWER
EMPNO | HIREDATE | BIRTHDATE | - |
---|---|---|---|
000150 |
1972-02-12 |
1947-05-17 |
240826 |
000200 |
1966-03-03 |
1941-05-29 |
240905 |
000210 |
1979-04-11 |
1953-02-23 |
260116 |
A date/time duration can be added to or subtracted from a date/time value, but it does not make for very pretty code:
SELECT hiredate
, hiredate - 12345678
, hiredate - 1234 years
- 56 months
- 78 days
FROM employee
WHERE empno = '000150';
ANSWER
HIREDATE | - | - |
---|---|---|
1972-02-12 |
0733-03-26 |
0733-03-26 |
Date/Time Subtraction
One date/time can be subtracted (only) from another valid date/time value. The result is a date/time duration value. Date Duration Generation above has an example.
Db2 Special Registers
A special register is a Db2 variable that contains information about the state of the system. The complete list follows:
Special Register | Updatable | Data type |
---|---|---|
CURRENT CLIENT_ACCTNG |
no |
VARCHAR(255) |
CURRENT CLIENT_APPLNAME |
no |
VARCHAR(255) |
CURRENT CLIENT_USERID |
no |
VARCHAR(255) |
CURRENT CLIENT_WRKSTNNAME |
no |
VARCHAR(255) |
CURRENT DATE |
no |
DATE |
CURRENT DBPARTITIONNUM |
no |
INTEGER |
CURRENT DECFLOAT ROUNDING MODE |
no |
VARCHAR(128) |
CURRENT DEFAULT TRANSFORM GROUP |
yes |
VARCHAR(18) |
CURRENT DEGREE |
yes |
CHAR(5) |
CURRENT EXPLAIN MODE |
yes |
VARCHAR(254) |
CURRENT EXPLAIN SNAPSHOT |
yes |
CHAR(8) |
CURRENT FEDERATED ASYNCHRONY |
yes |
INTEGER |
CURRENT IMPLICIT XMLPARSE OPTION |
yes |
VARCHAR(19) |
CURRENT ISOLATION |
yes |
CHAR(2) |
CURRENT LOCK TIMEOUT |
yes |
INTEGER |
CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION |
yes |
VARCHAR(254) |
CURRENT MDC ROLLOUT MODE |
yes |
VARCHAR(9) |
CURRENT OPTIMIZATION PROFILE |
yes |
VARCHAR(261) |
CURRENT PACKAGE PATH |
yes |
VARCHAR(4096) |
CURRENT PATH |
yes |
VARCHAR(2048) |
CURRENT QUERY OPTIMIZATION |
yes |
INTEGER |
CURRENT REFRESH AGE |
yes |
DECIMAL(20,6) |
CURRENT SCHEMA |
yes |
VARCHAR(128) |
CURRENT SERVER |
no |
VARCHAR(128) |
CURRENT TIME |
no |
TIME |
CURRENT TIMESTAMP |
no |
TIMESTAMP |
CURRENT TIMEZONE |
no |
DECIMAL(6,0) |
CURRENT USER |
no |
VARCHAR(128) |
SESSION_USER |
yes |
VARCHAR(128) |
SYSTEM_USER |
no |
VARCHAR(128) |
USER |
yes |
VARCHAR(128) |
Usage Notes
-
Some special registers can be referenced using an underscore instead of a blank in the name - as in: CURRENT_DATE.
-
Some special registers can be updated using the SET command (see list above).
-
All special registers can be queried using the SET command. They can also be referenced in ordinary SQL statements.
-
Those special registers that automatically change over time (e.g. current timestamp) are always the same for the duration of a given SQL statement. So if one inserts a thousand rows in a single insert, all will get the same current timestamp.
-
One can reference the current timestamp in an insert or update, to record in the target table when the row was changed. To see the value assigned, query the DML statement. See Select DML Changes for details.
Refer to the Db2 SQL Reference Volume 1 for a detailed description of each register.
Sample SQL
SET CURRENT ISOLATION = RR;
SET CURRENT SCHEMA = 'ABC';
SELECT CURRENT TIME AS cur_TIME
, CURRENT ISOLATION AS cur_ISO
, CURRENT SCHEMA AS cur_ID
FROM sysibm.sysdummy1;
ANSWER
CUR_TIME | CUR_ISO | CUR_ID |
---|---|---|
12:15:16 |
RR |
ABC |
Distinct Types
A distinct data type is a field type that is derived from one of the base Db2 field types. It is used when one wants to prevent users from combining two separate columns that should never be manipulated together (e.g. adding US dollars to Japanese Yen).
The following source type do not support distinct types: XML, Array. |
The creation of a distinct type, under the covers, results in the creation of two implied functions that can be used to convert data to and from the source type and the distinct type. Support for the basic comparison operators ( =, <>, <, < =, >, and > =
) is also provided. Below is a typical create and drop statement:
CREATE DISTINCT TYPE JAP_YEN AS DECIMAL(15,2) WITH COMPARISONS;
DROP DISTINCT TYPE JAP_YEN;
A distinct type cannot be dropped if it is currently being used in a table. |
Usage Example
Imagine that we had the following customer table:
CREATE TABLE customer
( id INTEGER NOT NULL
, fname VARCHAR(10) NOT NULL WITH DEFAULT ''
, lname VARCHAR(15) NOT NULL WITH DEFAULT ''
, date_of_birth DATE
, citizenship CHAR(3) NOT NULL WITH DEFAULT ''
, usa_sales DECIMAL(9,2)
, eur_sales DECIMAL(9,2)
, sales_office# SMALLINT
, last_updated TIMESTAMP
, PRIMARY KEY(id));
One problem with the above table is that the user can add the American and European sales values, which if they are expressed in dollars and euros respectively, is silly:
SELECT id
, usa_sales + eur_sales AS tot_sales
FROM customer;
To prevent the above, we can create two distinct types:
CREATE DISTINCT TYPE USA_DOLLARS AS DECIMAL(9,2) WITH COMPARISONS;
CREATE DISTINCT TYPE EUROS AS DECIMAL(9,2) WITH COMPARISONS;
Now we can define the customer table thus:
CREATE TABLE customer
( id INTEGER NOT NULL
, fname VARCHAR(10) NOT NULL WITH DEFAULT ''
, lname VARCHAR(15) NOT NULL WITH DEFAULT ''
, date_of_birth DATE
, citizenship CHAR(3)
, usa_sales USA_DOLLARS
, eur_sales EUROS
, sales_office# SMALLINT
, last_updated TIMESTAMP
, PRIMARY KEY(id));
Now, when we attempt to run the following, it will fail:
SELECT id
, usa_sales + eur_sales AS tot_sales
FROM customer;
The creation of a distinct type, under the covers, results in the creation two implied functions that can be used to convert data to and from the source type and the distinct type. In the next example, the two monetary values are converted to their common decimal source type, and then added together:
SELECT id
, DECIMAL(usa_sales) + DECIMAL(eur_sales) AS tot_sales
FROM customer;
Fullselect, Subselect, & Common Table Expression
It is not the purpose of this book to give you detailed description of SQL terminology, but there are a few words that you should know. For example, the following diagram illustrates the various components of a query:
WITH get_matching_rows AS
(SELECT id
, name
, salary
FROM staff
WHERE id < 50
UNION ALL
SELECT id
, name
, salary
FROM staff
WHERE id = 100
)
SELECT *
FROM get_matching_rows
ORDER BY id
FETCH FIRST 10 ROWS ONLY
FOR FETCH ONLY
WITH UR;
-
The structure from WITH until the last parenthesis is a COMMON TABLE EXPRESSION.
-
Each select block is called a SUBSELECT.
-
The block inside the WITH with two SUBSELECTS is a FULLSELECT.
2.4.3. Query Components
-
SUBSELECT: A query that selects zero or more rows from one or more tables.
-
FULLSELECT: One or more subselects or VALUES clauses, connected using a UNION, INTERSECT, or EXCEPT, all enclosed in parenthesis.
-
COMMON TABLE EXPRESSION: A named fullselect that can be referenced one more times in another subselect. See Common Table Expression for a more complete definition.
2.5. SELECT Statement
A SELECT statement is used to query the database. It has the following components, not all of which need be used in any particular query:
-
SELECT clause. One of these is required, and it must return at least one item, be it a column, a literal, the result of a function, or something else. One must also access at least one table, be that a true table, a temporary table, a view, an alias or a table function.
-
WITH clause. This clause is optional. Use this phrase to include independent SELECT statements that are subsequently accessed in a final SELECT (see Common Table Expression).
-
ORDER BY clause. Optionally, order the final output (see Order By, Group By, and Having).
-
FETCH FIRST clause. Optionally, stop the query after "n" rows (see FETCH FIRST Clause). If an optimize-for value is also provided, both values are used independently by the optimizer.
-
READ-ONLY clause. Optionally, state that the query is read-only. Some queries are inherently read-only, in which case this option has no effect.
-
FOR UPDATE clause. Optionally, state that the query will be used to update certain columns that are returned during fetch processing.
-
OPTIMIZE FOR n ROWS clause. Optionally, tell the optimizer to tune the query assuming that not all of the matching rows will be retrieved. If a first-fetch value is also provided, both values are used independently by the optimizer.
Refer to the IBM manuals for a complete description of all of the above. Some of the more interesting options are described below.
SELECT Clause
Every query must have at least one SELECT statement, and it must return at least one item, and access at least one object.
2.5.1. SELECT Items
-
Column: A column in one of the table being selected from.
-
Literal: A literal value (e.g. "ABC"). Use the AS expression to name the literal.
-
Special Register: A special register (e.g. CURRENT TIME).
-
Expression: An expression result (e.g. MAX(COL1*10)).
-
Full Select: An embedded SELECT statement that returns a single row.
2.5.2. FROM Objects
-
Table: Either a permanent or temporary Db2 table.
-
View: A standard Db2 view.
-
Alias: A Db2 alias that points to a table, view, or another alias.
-
Full Select: An embedded SELECT statement that returns a set of rows.
-
Table function: A kind of function that returns a table.
Sample SQL
SELECT deptno
, admrdept
, 'ABC' AS abc
FROM department
WHERE deptname LIKE '%ING%'
ORDER BY 1;
ANSWER
DEPTNO | ADMRDEPT | ABC |
---|---|---|
B01 |
A00 |
ABC |
D11 |
D01 |
ABC |
To select all of the columns in a table (or tables) one can use the "*" notation:
SELECT *
FROM department
WHERE deptname LIKE '%ING%'
ORDER BY 1;
ANSWER (part of)
DEPTNO | etc… |
---|---|
B01 |
PLANNING |
D11 |
MANUFACTU |
To select both individual columns, and all of the columns (using the "" notation), in a single SELECT statement, one can still use the "", but it must fully-qualified using either the object name, or a correlation name:
SELECT deptno
, department.*
FROM department
WHERE deptname LIKE '%ING%'
ORDER BY 1;
ANSWER (part of)
DEPTNO | DEPTNO | etc… |
---|---|---|
B01 |
B01 |
PLANNING |
D11 |
D11 |
MANUFACTU |
Use the following notation to select all the fields in a table twice:
SELECT department.*
, department.*
FROM department
WHERE eptname LIKE '%NING%'
ORDER BY 1;
ANSWER (part of)
DEPTNO | etc… | … | DEPTNO | etc… | … |
---|---|---|---|---|---|
B01 |
PLANNING |
… |
B01 |
PLANNING |
… |
D11 |
MANUFACTU |
… |
D11 |
MANUFACTU |
… |
2.5.3. FETCH FIRST Clause
The fetch first clause limits the cursor to retrieving "n" rows. If the clause is specified and no number is provided, the query will stop after the first fetch. If this clause is used, and there is no ORDER BY, then the query will simply return a random set of matching rows, where the randomness is a function of the access path used and/or the physical location of the rows in the table:
SELECT years
, name
, id
FROM staff
FETCH FIRST 3 ROWS ONLY;
ANSWER
YEARS | NAME | ID |
---|---|---|
7 |
Sanders |
10 |
8 |
Pernal |
20 |
5 |
Marenghi |
30 |
Using the FETCH FIRST clause to get the first "n" rows can sometimes return an answer that is not what the user really intended. See below for details. |
If an ORDER BY is provided, then the FETCH FIRST clause can be used to stop the query after a certain number of what are, perhaps, the most desirable rows have been returned. However, the phrase should only be used in this manner when the related ORDER BY uniquely identifies each row returned. To illustrate what can go wrong, imagine that we wanted to query the STAFF table in order to get the names of those three employees that have worked for the firm the longest - in order to give them a little reward (or possibly to fire them). The following query could be run:
SELECT years
, name
, id
FROM staff
WHERE years IS NOT NULL
ORDER BY years DESC
FETCH FIRST 3 ROWS ONLY;
ANSWER
YEARS | NAME | ID |
---|---|---|
13 |
Graham |
310 |
12 |
Jones |
260 |
10 |
Hanes |
50 |
The above query answers the question correctly, but the question was wrong, and so the answer is wrong. The problem is that there are two employees that have worked for the firm for ten years, but only one of them shows, and the one that does show was picked at random by the query processor. This is almost certainly not what the business user intended. The next query is similar to the previous, but now the ORDER ID uniquely identifies each row returned (presumably as per the end-user’s instructions):
SELECT years
, name
, id
FROM staff
WHERE years IS NOT NULL
ORDER BY years DESC
, id DESC
FETCH FIRST 3 ROWS ONLY;
ANSWER
YEARS | NAME | ID |
---|---|---|
13 |
Graham |
310 |
12 |
Jones |
260 |
10 |
Quill |
290 |
Getting the first "n" rows from a query is actually quite a complicated problem. Refer to Selecting "n" or more Rows for a more complete discussion. |
2.5.4. Correlation Name
The correlation name is defined in the FROM clause and relates to the preceding object name. In some cases, it is used to provide a short form of the related object name. In other situations, it is required in order to uniquely identify logical tables when a single physical table is referred to twice in the same query. Some sample SQL follows:
SELECT a.empno
, a.lastname
, (SELECT MAX(empno)AS empno
FROM employee) AS b
FROM employee a
WHERE a.empno = b.empno;
ANSWER
EMPNO | LASTNAME |
---|---|
000340 |
GOUNOT |
SELECT a.empno
, a.lastname
, b.deptno AS dept
FROM employee a
, department b
WHERE a.workdept = b.deptno
AND a.job <> 'SALESREP'
AND b.deptname = 'OPERATIONS'
AND a.sex IN ('M','F')
AND b.location IS NULL
ORDER BY 1;
ANSWER
EMPNO | LASTNAME | DEPT |
---|---|---|
000090 |
HENDERSON |
E11 |
000280 |
SCHNEIDER |
E11 |
000290 |
PARKER |
E11 |
000300 |
SMITH |
E11 |
000310 |
SETRIGHT |
E11 |
2.5.5. Renaming Fields
The AS phrase can be used in a SELECT list to give a field a different name. If the new name is an invalid field name (e.g. contains embedded blanks), then place the name in quotes:
SELECT empno AS e_num
, midinit AS "m int"
, phoneno AS "..."
FROM employee
WHERE empno < '000030'
ORDER BY 1;
ANSWER
E_NUM | M INT | … |
---|---|---|
000010 |
I |
3978 |
000020 |
L |
3476 |
The new field name must not be qualified (e.g. A.C1), but need not be unique. Subsequent usage of the new name is limited as follows:
-
It can be used in an order by clause.
-
It cannot be used in other part of the select (where-clause, group-by, or having).
-
It cannot be used in an update clause.
-
It is known outside of the fullselect of nested table expressions, common table expressions, and in a view definition.
CREATE view emp2
AS SELECT empno AS e_num
, midinit AS "m int"
, phoneno AS "..."
FROM employee;
SELECT * FROM emp2 WHERE "..." = '3978';
ANSWER
E_NUM | M INT | … |
---|---|---|
000010 |
I |
3978 |
2.5.6. Working with Nulls
In SQL something can be true, false, or NULL. This three-way logic has to always be considered when accessing data. To illustrate, if we first select all the rows in the STAFF table where the SALARY is < $10,000, then all the rows where the SALARY is >= $10,000, we have not necessarily found all the rows in the table because we have yet to select those rows where the SALARY is null. The presence of null values in a table can also impact the various column functions. For example, the AVG function ignores null values when calculating the average of a set of rows. This means that a user-calculated average may give a different result from a Db2 calculated equivalent:
SELECT AVG(comm) AS a1
, SUM(comm) / COUNT(*) AS a2
FROM staff
WHERE id < 100;
ANSWER
A1 | A2 |
---|---|
796.025 |
530.68 |
Null values can also pop in columns that are defined as NOT NULL. This happens when a field is processed using a column function and there are no rows that match the search criteria:
SELECT COUNT(*) AS num
, MAX(lastname) AS max
FROM employee
WHERE firstnme = 'FRED';
ANSWER
NUM | MAX |
---|---|
0 |
- |
Why Null Exist
NULL values can represent two kinds of data. In first case, the value is unknown (e.g. we do not know the name of the person’s spouse). Alternatively, the value is not relevant to the situation (e.g. the person does not have a spouse). Many people prefer not to have to bother with nulls, so they use instead a special value when necessary (e.g. an unknown employee name is blank). This trick works OK with character data, but it can lead to problems when used on numeric values (e.g. an unknown salary is set to zero).
Locating Null Values
One can not use an equal predicate to locate those values that are null because a null value does not actually equal anything, not even null, it is simply null. The IS NULL or IS NOT NULL phrases are used instead. The following example gets the average commission of only those rows that are not null. Note that the second result differs from the first due to rounding loss.
SELECT AVG(comm) AS a1
, SUM(comm) / COUNT(*) AS a2
FROM staff
WHERE id < 100
AND comm IS NOT NULL;
ANSWER
A1 | A2 |
---|---|
796.025 |
796.02 |
2.5.7. Quotes and Double-quotes
To write a string, put it in quotes. If the string contains quotes, each quote is represented by a pair of quotes:
SELECT 'JOHN' AS J1
, 'JOHN''S' AS J2
, '''JOHN''S''' AS J3
, '"JOHN''S"' AS J4
FROM staff
WHERE id = 10;
ANSWER
J1 | J2 | J3 | J4 |
---|---|---|---|
JOHN |
JOHN’S |
'JOHN’S' |
"JOHN’S" |
Double quotes can be used to give a name to an output field that would otherwise not be valid. To put a double quote in the name, use a pair of quotes:
SELECT id AS "USER ID"
, dept AS "D#"
, years AS "#Y"
, 'ABC' AS "'TXT'"
, '"' AS """quote"" fld"
FROM staff s
WHERE id < 40
ORDER BY "USER ID";
ANSWER
USER ID | D# | #Y | 'TXT' | "quote" fld |
---|---|---|---|---|
10 |
20 |
7 |
ABC |
" |
20 |
20 |
8 |
ABC |
" |
30 |
38 |
5 |
ABC |
" |
2.6. SQL Predicates
A predicate is used in either the WHERE or HAVING clauses of a SQL statement. It specifies a condition that true, false, or unknown about a row or a group.
2.6.1. Predicate Precedence
As a rule, a query will return the same result regardless of the sequence in which the various predicates are specified. However, note the following:
-
Predicates separated by an OR may need parenthesis - see AND/OR Precedence.
-
Checks specified in a CASE statement are done in the order written - see CASE Expression.
Basic Predicate
A basic predicate compares two values. If either value is null, the result is unknown. Otherwise the result is either true or false.
SELECT id, job, dept
FROM staff
WHERE job = 'Mgr'
AND NOT job <> 'Mgr'
AND NOT job = 'Sales'
AND id <> 100
AND id >= 0
AND id <= 150
AND NOT dept = 50
ORDER BY id;
ANSWER
ID | JOB | DEPT |
---|---|---|
10 |
Mgr |
20 |
30 |
Mgr |
38 |
50 |
Mgr |
15 |
140 |
Mgr |
51 |
A variation of this predicate type can be used to compare sets of columns/values. Everything on both sides must equal in order for the expressions to match:
SELECT id, dept, job
FROM staff
WHERE (id,dept) = (30,28)
OR (id,years) = (90, 7)
OR (dept,job) = (38,'Mgr')
ORDER BY 1;
ANSWER
ID | DEPT | JOB |
---|---|---|
30 |
38 |
Mgr |
Below is the same query written the old fashioned way:
SELECT id, dept, job
FROM staff
WHERE (id = 30 AND dept = 28)
OR (id = 90 AND years = 7)
OR (dept = 38 AND job = 'Mgr')
ORDER BY 1;
ANSWER
ID | DEPT | JOB |
---|---|---|
30 |
38 |
Mgr |
2.6.2. Quantified Predicate
A quantified predicate compares one or more values with a collection of values.
SELECT id, job
FROM staff
WHERE job = ANY (SELECT job FROM staff)
AND id <= ALL (SELECT id FROM staff)
ORDER BY id;
ANSWER
ID | JOB |
---|---|
10 |
Mgr |
SELECT id, dept, job
FROM staff
WHERE (id,dept) = ANY
(SELECT dept, id
FROM staff
)
ORDER BY 1;
ANSWER
ID | DEPT | JOB |
---|---|---|
20 |
20 |
Sales |
See the sub-query chapter on Sub-Query for more data on this predicate type.
2.6.3. BETWEEN Predicate
The BETWEEN predicate compares a value within a range of values.
The between check always assumes that the first value in the expression is the low value and the second value is the high value. For example, BETWEEN 10 AND 12 may find data, but BETWEEN 12 AND 10 never will.
SELECT id, job
FROM staff
WHERE id BETWEEN 10 AND 30
AND id NOT BETWEEN 30 AND 10
AND NOT id NOT BETWEEN 10 AND 30
ORDER BY id;
ANSWER
ID | JOB |
---|---|
10 |
Mgr |
20 |
Sales |
30 |
Mgr |
2.6.4. EXISTS Predicate
An EXISTS predicate tests for the existence of matching rows.
SELECT id, job
FROM staff a
WHERE EXISTS
(SELECT *
FROM staff b
WHERE b.id = a.id
AND b.id < 50
)
ORDER BY id;
ANSWER
ID | JOB |
---|---|
10 |
Mgr |
20 |
Sales |
30 |
Mgr |
40 |
Sales |
See the sub-query chapter on Sub-Query for more data on this predicate type. |
2.6.5. IN Predicate
The IN predicate compares one or more values with a list of values.
The list of values being compared in the IN statement can either be a set of in-line expressions (e.g. ID in (10,20,30)), or a set rows returned from a sub-query. Either way, Db2 simply goes through the list until it finds a match.
SELECT id, job
FROM staff a
WHERE id IN (10,20,30)
AND id IN
(SELECT id
FROM staff
)
AND id NOT IN 99
ORDER BY id;
ANSWER
ID | JOB |
---|---|
10 |
Mgr |
20 |
Sales |
30 |
Mgr |
The IN statement can also be used to compare multiple fields against a set of rows returned from a sub-query. A match exists when all fields equal. This type of statement is especially useful when doing a search against a table with a multi-columns key.
Be careful when using the NOT IN expression against a sub-query result. If any one row in the sub-query returns null, the result will be no match. See Sub-Query for more details. |
SELECT empno, lastname
FROM employee
WHERE (empno, 'AD3113') IN
(SELECT empno, projno
FROM emp_act
WHERE emptime > 0.5
)
ORDER BY 1;
ANSWER
EMPNO | LASTNAME |
---|---|
000260 |
JOHNSON |
000270 |
PEREZ |
See the sub-query chapter on Sub-Query for more data on this statement type. |
2.6.6. LIKE Predicate
The LIKE predicate does partial checks on character strings.
The percent and underscore characters have special meanings. The first means skip a string of any length (including zero) and the second means skip one byte. For example:
-
LIKE 'AB_D%' Finds 'ABCD' and 'ABCDE', but not 'ABD', nor 'ABCCD'.
-
LIKE '_X' Finds 'XX' and 'DX', but not 'X', nor 'ABX', nor 'AXB'.
-
LIKE '%X' Finds 'AX', 'X', and 'AAX', but not 'XA'.
SELECT id
, name
FROM staff
WHERE name LIKE 'S%n'
OR name LIKE '_a_a%'
OR name LIKE '%r_%a'
ORDER BY id;
ANSWER
ID | NAME |
---|---|
130 |
Yamaguchi |
200 |
Scoutten |
The ESCAPE Phrase
The escape character in a LIKE statement enables one to check for percent signs and/or underscores in the search string. When used, it precedes the '%' or '_' in the search string indicating that it is the actual value and not the special character which is to be checked for. When processing the LIKE pattern, Db2 works thus: Any pair of escape characters is treated as the literal value (e.g. "+" means the string ""). Any single occurrence of an escape character followed by either a "%" or a "\_" means the literal "%" or "_" (e.g. "+%" means the string "%"). Any other "%" or "\_" is used as in a normal LIKE pattern.
LIKE STATEMENT TEXT | WHAT VALUES MATCH |
---|---|
LIKE 'AB%' |
Finds AB, any string |
LIKE 'AB%' ESCAPE '+' |
Finds AB, any string |
LIKE 'AB+%' ESCAPE '+' |
Finds AB% |
LIKE 'AB+' ESCAPE '' |
Finds AB+ |
LIKE 'AB+%%' ESCAPE '+' |
Finds AB%, any string |
LIKE 'AB+%' ESCAPE '' |
Finds AB+, any string |
LIKE 'AB++%' ESCAPE '' |
Finds AB+% |
LIKE 'AB++%%' ESCAPE '' |
Finds AB+%, any string |
LIKE 'AB+%%%' ESCAPE '' |
Finds AB%%, any string |
LIKE 'AB' ESCAPE '+' |
Finds AB++ |
LIKE 'AB%' ESCAPE '' |
Finds AB++% |
LIKE 'AB%' ESCAPE '+' |
Finds AB++, any string |
LIKE 'AB+%+%' ESCAPE '' |
Finds AB%+, any string |
Now for sample SQL:
SELECT id
FROM staff
WHERE id = 10
AND 'ABC' LIKE 'AB%'
AND 'A%C' LIKE 'A/%C' ESCAPE '/'
AND 'A_C' LIKE 'A\_C' ESCAPE '\'
AND 'A_$' LIKE 'A$_$$' ESCAPE '$';
ANSWER
ID |
---|
10 |
2.6.7. LIKE_COLUMN Function
The LIKE predicate cannot be used to compare one column against another. One may need to do this when joining structured to unstructured data. For example, imagine that one had a list of SQL statements (in a table) and a list of view names in a second table. One might want to scan the SQL text (using a LIKE predicate) to find those statements that referenced the views. The LOCATE function can be used to do a simple equality check. The LIKE predicate allows a more sophisticated search. The following code creates a scalar function and dependent procedure that can compare one column against another (by converting both column values into input variables). The function is just a stub. It passes the two input values down to the procedure where they are compared using a LIKE predicate. If there is a match, the function returns one, else zero.
These examples use an "!" as the stmt delimiter. |
--#SET DELIMITER !
CREATE PROCEDURE LIKE_COLUMN
( IN instr1 VARCHAR(4000)
, IN instr2 VARCHAR(4000)
, OUT outval SMALLINT)
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
NO EXTERNAL ACTION
BEGIN
SET outval =
CASE
WHEN instr1 LIKE instr2
THEN 1
ELSE 0
END;
RETURN;
END!
CREATE FUNCTION LIKE_COLUMN
( instr1 VARCHAR(4000)
, instr2 VARCHAR(4000))
RETURNS SMALLINT
LANGUAGE SQL
CONTAINS SQL
DETERMINISTIC
NO EXTERNAL ACTION
BEGIN ATOMIC
DECLARE outval SMALLINT;
CALL LIKE_COLUMN(instr1, instr2, outval);
RETURN outval;
END!
Below is an example of the above function being used to compare to the contents of one column against another:
WITH temp1 (jtest) AS
(VALUES ('_gr%')
, ('S_le%')
)
SELECT
s.id
, s.name
, s.job
, t.jtest
FROM staff s
, temp1 t
WHERE LIKE_COLUMN(s.job , t.jtest) = 1
AND s.id < 70
ORDER BY s.id;
ANSWER
ID | NAME | JOB | JTEST |
---|---|---|---|
10 |
Sanders |
Mgr |
_gr% |
20 |
Pernal |
Sales |
S_le% |
30 |
Marenghi |
Mgr |
_gr% |
40 |
O’Brien |
Sales |
S_le% |
50 |
Hanes |
Mgr |
_gr% |
60 |
Quigley |
Sales |
S_le% |
2.6.8. NULL Predicate
The NULL predicate checks for null values. The result of this predicate cannot be unknown. If the value of the expression is null, the result is true. If the value of the expression is not null, the result is false.
SELECT id, comm
FROM staff
WHERE id < 100
AND id IS NOT NULL
AND comm IS NULL
AND NOT comm IS NOT NULL
ORDER BY id;
ANSWER
ID | COMM |
---|---|
10 |
- |
30 |
- |
50 |
- |
Use the COALESCE function to convert null values into something else. |
2.6.9. Special Character Usage
To refer to a special character in a predicate, or anywhere else in a SQL statement, use the "X" notation to substitute with the ASCII hex value. For example, the following query will list all names in the STAFF table that have an "a" followed by a semicolon:
SELECT id
, name
FROM staff
WHERE name LIKE '%a' || X'3B' || '%'
ORDER BY id;
2.6.10. Precedence Rules
Expressions within parentheses are done first, then prefix operators (e.g. -1), then multiplication and division, then addition and subtraction. When two operations of equal precedence are together (e.g. 1 * 5 / 4) they are done from left to right.
Example: 555 + -22 / (12 - 3) * 66 ^ ^ ^ ^ ^ 5th 2nd 3rd 1st 4th ANSWER: 423
Be aware that the result that you get depends very much on whether you are doing integer or decimal arithmetic. Below is the above done using integer numbers:
SELECT (12 - 3) AS int1
, -22 / (12 - 3) AS int2
, -22 / (12 - 3) * 66 AS int3
, 555 + -22 / (12 - 3) * 66 AS int4
FROM sysibm.sysdummy1;
ANSWER
INT1 | INT2 | INT3 | INT4 |
---|---|---|---|
9 |
-2 |
-132 |
423 |
Db2 truncates, not rounds, when doing integer arithmetic. |
Here is the same done using decimal numbers:
SELECT (12.0 - 3) AS dec1
, -22 / (12.0 - 3) AS dec2
, -22 / (12.0 - 3) * 66 AS dec3
, 555 + -22 / (12.0 - 3) * 66 AS dec4
FROM sysibm.sysdummy1;
ANSWER
DEC1 | DEC2 | DEC3 | DEC4 |
---|---|---|---|
9.0 |
-2.4 |
-161.3 |
393.6 |
2.6.11. AND/OR Precedence
AND operations are done before OR operations. This means that one side of an OR is fully processed before the other side is begun. To illustrate:
TABLE1
col1 | col2 |
---|---|
A |
AA |
B |
BB |
C |
CC |
SELECT *
FROM table1
WHERE col1 = 'C'
AND col1 >= 'A'
OR col2 >= 'AA'
ORDER BY col1;
ANSWER
COL1 | COL2 |
---|---|
A |
AA |
B |
BB |
C |
CC |
SELECT *
FROM table1
WHERE (col1 = 'C'
AND col1 >= 'A')
OR col2 >= 'AA'
ORDER BY col1;
ANSWER
COL1 | COL2 |
---|---|
A |
AA |
B |
BB |
C |
CC |
SELECT *
FROM table1
WHERE col1 = 'C'
AND (col1 >= 'A'
OR col2 >= 'AA')
ORDER BY col1;
ANSWER
COL1 | COL2 |
---|---|
C |
CC |
The omission of necessary parenthesis surrounding OR operators is a very common mistake. The result is usually the wrong answer. One symptom of this problem is that many more rows are returned (or updated) than anticipated. |
2.6.12. Processing Sequence
The various parts of a SQL statement are always executed in a specific sequence in order to avoid semantic ambiguity:
-
FROM clause.
-
JOIN ON clause.
-
WHERE clause.
-
GROUP BY and aggregate.
-
HAVING clause.
-
SELECT list.
-
ORDER BY clause.
-
FETCH FIRST.
Observe that ON predicates (e.g. in an outer join) are always processed before any WHERE predicates (in the same join) are applied. Ignoring this processing sequence can cause what looks like an outer join to run as an inner join (see ON and WHERE Usage). Likewise, a function that is referenced in the SELECT section of a query (e.g. row-number) is applied after the set of matching rows has been identified, but before the data has been ordered.
2.7. CAST Expression
The CAST expression is used to convert one data type to another. It is similar to the various field-type functions (e.g. CHAR, SMALLINT) except that it can also handle null values and host-variable parameter markers.
2.7.1. Input vs. Output Rules
-
EXPRESSION: If the input is neither null, nor a parameter marker, the input data-type is converted to the output data-type. Truncation and/or padding with blanks occur as required. An error is generated if the conversion is illegal.
-
NULL: If the input is null, the output is a null value of the specified type.
-
PARAMETER MAKER: This option is only used in programs and need not concern us here. See the Db2 SQL Reference for details.
Examples
Use the CAST expression to convert the SALARY field from decimal to integer:
SELECT id
, salary
, CAST(salary AS INTEGER) AS sal2
FROM staff
WHERE id < 30
ORDER BY id;
ANSWER
ID | SALARY | SAL2 |
---|---|---|
10 |
98357.50 |
98357 |
20 |
78171.25 |
78171 |
Use the CAST expression to truncate the JOB field. A warning message will be generated for the second line of output because non-blank truncation is being done.
SELECT id
, job
, CAST(job AS CHAR(3)) AS job2
FROM staff
WHERE id < 30
ORDER BY id;
ANSWER
ID | JOB | JOB2 |
---|---|---|
10 |
Mgr |
Mgr |
20 |
Sales |
Sal |
Use the CAST expression to make a derived field called JUNK of type SMALLINT where all of the values are null.
SELECT id
, CAST(NULL AS SMALLINT) AS junk
FROM staff
WHERE id < 30
ORDER BY id;
ANSWER
ID | JUNK |
---|---|
10 |
- |
20 |
- |
The CAST expression can also be used in a join, where the field types being matched differ:
SELECT stf.id
, emp.empno
FROM staff stf
LEFT OUTER JOIN employee emp
ON stf.id = CAST(emp.empno AS INTEGER)
AND emp.job = 'MANAGER'
WHERE stf.id < 60
ORDER BY stf.id;
ANSWER
ID | EMPNO |
---|---|
10 |
- |
20 |
000020 |
30 |
000030 |
40 |
- |
50 |
000050 |
Of course, the same join can be written using the raw function:
SELECT stf.id
, emp.empno
FROM staff stf
LEFT OUTER JOIN employee emp
ON stf.id = INTEGER(emp.empno)
AND emp.job = 'MANAGER'
WHERE stf.id < 60
ORDER BY stf.id;
ANSWER
ID | EMPNO |
---|---|
10 |
- |
20 |
000020 |
30 |
000030 |
40 |
- |
50 |
000050 |
2.8. VALUES Statement
The VALUES clause is used to define a set of rows and columns with explicit values. The clause is commonly used in temporary tables, but can also be used in view definitions. Once defined in a table or view, the output of the VALUES clause can be grouped by, joined to, and otherwise used as if it is an ordinary table - except that it can not be updated.
Each column defined is separated from the next using a comma. Multiple rows (which may also contain multiple columns) are separated from each other using parenthesis and a comma. When multiple rows are specified, all must share a common data type. Some examples follow:
VALUES 6 <== 1 row, 1 column VALUES(6) <== 1 row, 1 column VALUES 6, 7, 8 <== 1 row, 3 columns VALUES (6), (7), (8) <== 3 rows, 1 column VALUES (6,66), (7,77), (8,NULL) <== 3 rows, 2 column
2.8.1. Sample SQL
The VALUES clause can be used by itself as a very primitive substitute for the SELECT statement. One key difference is that output columns cannot be named. But they can be ordered, and fetched, and even named externally, as the next example illustrates:
PLAIN VALUES
VALUES
(1,2)
, (2,3)
, (3,4)
ORDER BY 2 DESC;
VALUES + WITH
WITH temp (c1,c2) AS
( VALUES (1,2)
, (2,3)
, (3,4)
)
SELECT *
FROM temp
ORDER BY 2 DESC;
VALUES + SELECT
SELECT *
FROM (VALUES (1,2)
, (2,3)
, (3,4)
) temp (c1,c2)
ORDER BY 2 DESC;
ANSWER
1 | 2 |
---|---|
3 |
4 |
2 |
3 |
1 |
2 |
The VALUES clause can encapsulate several independent queries:
VALUES
(
(SELECT COUNT(*) FROM employee)
, (SELECT AVG(salary) FROM staff)
, (SELECT MAX(deptno) FROM department)
)
FOR FETCH ONLY
WITH UR;
ANSWER
1 | 2 | 3 |
---|---|---|
42 |
67932.78 |
J22 |
The next statement defines a temporary table containing two columns and three rows. The first column defaults to type integer and the second to type varchar.
WITH temp1 (col1, col2) AS
(VALUES
(0, 'AA')
, (1, 'BB')
, (2, NULL)
)
SELECT *
FROM temp1;
ANSWER
COL1 | COL2 |
---|---|
0 |
AA |
1 |
BB |
2 |
- |
If we wish to explicitly control the output field types we can define them using the appropriate function. This trick does not work if even a single value in the target column is null.
WITH temp1 (col1, col2) AS
(VALUES
(DECIMAL(0 ,3, 1), 'AA')
, (DECIMAL(1 ,3, 1), 'BB')
, (DECIMAL(2 ,3, 1), NULL)
)
SELECT *
FROM temp1;
ANSWER
COL1 | COL2 |
---|---|
0.0 |
AA |
1.0 |
BB |
2.0 |
- |
If any one of the values in the column that we wish to explicitly define has a null value, we have to use the CAST expression to set the output field type:
WITH temp1 (col1,col2) AS
(VALUES
(0, CAST('AA' AS CHAR(1)))
, (1, CAST('BB' AS CHAR(1)))
, (2, CAST(NULL AS CHAR(1)))
)
SELECT *
FROM temp1;
ANSWER
COL1 | COL2 |
---|---|
0 |
A |
1 |
B |
2 |
- |
Alternatively, we can set the output type for all of the not-null rows in the column. Db2 will then use these rows as a guide for defining the whole column:
WITH temp1 (col1,col2) AS
(VALUES
(0, CHAR('AA', 1))
, (1, CHAR('BB', 1))
, (2, NULL)
)
SELECT *
FROM temp1;
ANSWER
COL1 | COL2 |
---|---|
0 |
A |
1 |
B |
2 |
- |
2.8.2. More Sample SQL
Temporary tables, or (permanent) views, defined using the VALUES expression can be used much like a Db2 table. They can be joined, unioned, and selected from. They can not, however, be updated, or have indexes defined on them. Temporary tables can not be used in a sub-query.
WITH temp1 (col1, col2,col3) AS
(VALUES
(0, 'AA', 0.00)
, (1, 'BB', 1.11)
, (2, 'CC', 2.22)
)
, temp2 (col1b, colx) AS
(SELECT col1
, col1 + col3
FROM temp1
)
SELECT *
FROM temp2;
ANSWER
COL1B | COLX |
---|---|
0 |
0.00 |
1 |
2.11 |
2 |
4.22 |
CREATE VIEW silly (c1, c2, c3)
AS VALUES
(11, 'AAA', SMALLINT(22))
, (12, 'BBB', SMALLINT(33))
, (13, 'CCC', NULL);
COMMIT;
WITH temp1 (col1) AS
(VALUES 0
UNION ALL
SELECT col1 + 1
FROM temp1
WHERE col1 + 1 < 100
)
SELECT *
FROM temp1;
ANSWER
COL1 |
---|
0 |
1 |
2 |
3 |
etc |
All of the above examples have matched a VALUES statement up with a prior WITH expression, so as to name the generated columns. One doesn’t have to use the latter, but if you don’t, you get a table with unnamed columns, which is pretty useless:
SELECT *
FROM
(VALUES
(123, 'ABC')
, (234, 'DEF')
) AS ttt
ORDER BY 1 DESC;
ANSWER
- | - |
---|---|
234 |
DEF |
123 |
ABC |
Combine Columns
The VALUES statement can be used inside a TABLE function to combine separate columns into one. In the following example, three columns in the STAFF table are combined into a single column – with one row per item:
SELECT id
, salary AS sal
, comm AS com
, combo
, typ
FROM staff
, TABLE( VALUES(salary , 'SAL')
, (comm , 'COM')
) AS tab(combo, typ)
WHERE id < 40
ORDER BY id
, typ;
ANSWER
ID | SAL | COM | COMBO | TYP |
---|---|---|---|---|
10 |
98357.50 |
- |
COM |
|
10 |
98357.50 |
- |
98357.50 |
SAL |
20 |
78171.25 |
612.45 |
612.45 |
COM |
20 |
78171.25 |
612.45 |
78171.25 |
SAL |
30 |
77506.75 |
- |
COM |
|
30 |
77506.75 |
- |
77506.75 |
SAL |
The above query works as follows:
-
The set of matching rows are obtained from the STAFF table.
-
For each matching row, the TABLE function creates two rows, the first with the salary value, and the second with the commission.
-
Each new row as gets a second literal column – indicating the data source.
-
Finally, the "AS" expression assigns a correlation name to the table output, and also defines two column names.
The TABLE function is resolved row-by-row, with the result being joined to the current row in the STAFF table. This explains why we do not get a Cartesian product, even though no join criteria are provided.
The keyword LATERAL can be used instead of TABLE in the above query. |
2.9. CASE Expression
CASE expressions enable one to do if-then-else type processing inside of SQL statements.
The sequence of the CASE conditions can affect the answer. The first WHEN check that matches is the one used. |
2.9.1. CASE Syntax Styles
There are two general flavors of the CASE expression. In the first kind, each WHEN statement does its own independent check. In the second kind, all of the WHEN conditions do similar "equal" checks against a common reference expression.
SELECT Lastname
, sex AS sx
, CASE sex
WHEN 'F' THEN 'FEMALE'
WHEN 'M' THEN 'MALE'
ELSE NULL
END AS sexx
FROM employee
WHERE lastname LIKE 'J%'
ORDER BY 1;
ANSWER
LASTNAME | SX | SEXX |
---|---|---|
JEFFERSON |
M |
MALE |
JOHN |
F |
FEMALE |
JOHNSON |
F |
FEMALE |
JONES |
M |
MALE |
SELECT lastname
, sex AS sx
, CASE WHEN sex = 'F' THEN 'FEMALE'
WHEN sex = 'M' THEN 'MALE'
ELSE NULL
END AS sexx
FROM employee
WHERE lastname LIKE 'J%'
ORDER BY 1;
ANSWER
LASTNAME | SX | SEXX |
---|---|---|
JEFFERSON |
M |
MALE |
JOHN |
F |
FEMALE |
JOHNSON |
F |
FEMALE |
JONES |
M |
MALE |
Notes & Restrictions
-
If more than one WHEN condition is true, the first one processed that matches is used.
-
If no WHEN matches, the value in the ELSE clause applies. If no WHEN matches and there is no ELSE clause, the result is NULL.
-
There must be at least one non-null result in a CASE statement. Failing that, one of the NULL results must be inside of a CAST expression.
-
All result values must be of the same type.
-
Functions that have an external action (e.g. RAND) can not be used in the expression part of a CASE statement.
2.9.2. Sample SQL
SELECT lastname
, midinit AS mi
, sex AS sx
, CASE WHEN midinit > SEX THEN midinit
ELSE sex
END AS mx
FROM employee
WHERE lastname LIKE 'J%'
ORDER BY 1;
ANSWER
LASTNAME | MI | SX | MX |
---|---|---|---|
JEFFERSON |
J |
M |
M |
JOHN |
K |
K |
K |
JOHNSON |
P |
F |
P |
JONES |
T |
M |
T |
SELECT COUNT(*) AS tot
, SUM(CASE sex WHEN 'F' THEN 1 ELSE 0 END) AS #f
, SUM(CASE sex WHEN 'M' THEN 1 ELSE 0 END) AS #m
FROM employee
WHERE lastname LIKE 'J%';
ANSWER
TOT | #F | #M |
---|---|---|
4 |
2 |
2 |
SELECT lastname
, LENGTH(RTRIM(lastname)) AS len
, SUBSTR(lastname , 1 ,
CASE WHEN LENGTH(RTRIM(lastname)) > 6 THEN 6
ELSE LENGTH(RTRIM(lastname))
END
) AS lastnm
FROM employee
WHERE lastname LIKE 'J%'
ORDER BY 1;
ANSWER
LASTNAME | LEN | LASTNM |
---|---|---|
JEFFERSON |
9 |
JEFFER |
JOHN |
4 |
JOHN |
JOHNSON |
7 |
JOHNSO |
JONES |
5 |
JONES |
The CASE expression can also be used in an UPDATE statement to do any one of several alternative updates to a particular field in a single pass of the data:
UPDATE staff
SET comm =
CASE dept
WHEN 15 THEN comm * 1.1
WHEN 20 THEN comm * 1.2
WHEN 38 THEN
CASE
WHEN years < 5 THEN comm * 1.3
WHEN years >= 5 THEN comm * 1.4
ELSE NULL
END
ELSE comm
END
WHERE comm IS NOT NULL
AND dept < 50;
In the next example a CASE expression is used to avoid a divide-by-zero error:
WITH temp1 (c1, c2) AS
(VALUES
(88, 9),(44, 3),(22, 0),(0, 1))
SELECT c1
, c2
, CASE c2
WHEN 0 THEN NULL
ELSE c1/c2
END AS c3
FROM temp1;
ANSWER
C1 | C2 | C3 |
---|---|---|
88 |
9 |
9 |
44 |
3 |
14 |
22 |
0 |
0 |
1 |
0 |
- |
At least one of the results in a CASE expression must be a value (i.e. not null). This is so that Db2 will know what output type to make the result.
2.9.3. Problematic CASE Statements
The case WHEN checks are always processed in the order that they are found. The first one that matches is the one used. This means that the answer returned by the query can be affected by the sequence on the WHEN checks. To illustrate this, the next statement uses the SEX field (which is always either "F" or "M") to create a new field called SXX. In this particular example, the SQL works as intended.
SELECT lastname
, sex
, CASE
WHEN sex >= 'M' THEN 'MAL'
WHEN sex >= 'F' THEN 'FEM'
END AS sxx
FROM employee
WHERE lastname LIKE 'J%'
ORDER BY 1;
ANSWER
LASTNAME | SX | SXX |
---|---|---|
JEFFERSON |
M |
MAL |
JOHN |
F |
FEM |
JOHNSON |
F |
FEM |
JONES |
M |
MAL |
In the example below all of the values in SXX field are "FEM". This is not the same as what happened above, yet the only difference is in the order of the CASE checks.
SELECT lastname
, sex
, CASE
WHEN sex >= 'F' THEN 'FEM'
WHEN sex >= 'M' THEN 'MAL'
END AS sxx
FROM employee
WHERE lastname LIKE 'J%'
ORDER BY 1;
ANSWER
LASTNAME | SX | SXX |
---|---|---|
JEFFERSON |
M |
FEM |
JOHN |
F |
FEM |
JOHNSON |
F |
FEM |
JONES |
M |
FEM |
In the prior statement the two WHEN checks overlap each other in terms of the values that they include. Because the first check includes all values that also match the second, the latter never gets invoked. Note that this problem can not occur when all of the WHEN expressions are equality checks.
2.9.4. CASE in Predicate
The result of a CASE expression can be referenced in a predicate:
SELECT id
, dept
, salary
, comm
FROM staff
WHERE CASE
WHEN comm < 70 THEN 'A'
WHEN name LIKE 'W%' THEN 'B'
WHEN salary < 11000 THEN 'C'
WHEN salary < 18500 AND dept <> 33 THEN 'D'
WHEN salary < 19000 THEN 'E'
END IN ('A','C','E')
ORDER BY id;
ANSWER
ID | DEPT | SALARY | COMM |
---|---|---|---|
130 |
42 |
10505.90 |
75.60 |
270 |
66 |
18555.50 |
|
330 |
66 |
10988.00 |
55.50 |
The above query is arguably more complex than it seems at first glance, because unlike in an ordinary query, the CASE checks are applied in the sequence they are defined. So a row will only match "B" if it has not already matched "A". In order to rewrite the above query using standard AND/OR predicates, we have to reproduce the CASE processing sequence. To this end, the three predicates in the next example that look for matching rows also apply any predicates that preceded them in the CASE statement:
SELECT id
, name
, salary
, comm
FROM staff
WHERE (comm < 70)
OR (salary < 11000 AND NOT name LIKE 'W%')
OR (salary < 19000 AND NOT (name LIKE 'W%'
OR (salary < 18500 AND dept <> 33)
)
)
ORDER BY id;
ANSWER
ID | DEPT | SALARY | COMM |
---|---|---|---|
130 |
42 |
10505.90 |
75.60 |
270 |
66 |
18555.50 |
|
330 |
66 |
10988.00 |
55.50 |
2.10. Miscellaneous SQL Statements
This section will briefly discuss several miscellaneous SQL statements. See the Db2 manuals for more details.
2.10.1. Cursor
A cursor is used in an application program to retrieve and process individual rows from a result set. To use a cursor, one has to do the following:
-
DECLARE the cursor. The declare statement has the SQL text that the cursor will run. If the cursor is declared "with hold", it will remain open after a commit, otherwise it will be closed at commit time.
The declare cursor statement is not actually executed when the program is run. It simply defines the query that will be run. |
-
OPEN the cursor. This is when the contents of on any host variables referenced by the cursor (in the predicate part of the query) are transferred to Db2.
-
FETCH rows from the cursor. One does as many fetches as is needed. If no row is found, the SQLCODE from the fetch will be 100.
-
CLOSE the cursor.
Syntax Notes
-
The cursor-name must be unique with the application program.
-
The WITH HOLD phrase indicates that the cursor will remain open if the unit of work ends with a commit. The cursor will be closed if a rollback occurs.
-
The WITH RETURN phrase is used when the cursor will generate the result set returned by a stored procedure. If the cursor is open when the stored procedure ends the result set will be return either to the calling procedure, or directly to the client application.
-
The FOR phrase can either refer to a select statement, the text for which will follow, or to the name of a statement has been previously prepared.
Usage notes
-
Cursors that require a sort (e.g. to order the output) will obtain the set of matching rows at open time, and then store them in an internal temporary table. Subsequent fetches will be from the temporary table.
-
Cursors that do not require a sort are resolved as each row is fetched from the data table.
-
All references to the current date, time, and timestamp will return the same value (i.e. as of when the cursor was opened) for all fetches in a given cursor invocation.
-
One does not have to close a cursor, but one cannot reopen it until it is closed. All open cursors are automatically closed when the thread terminates, or when a rollback occurs, or when a commit is done - except if the cursor is defined "with hold".
-
One can both update and delete "where current of cursor". In both cases, the row most recently fetched is updated or deleted. An update can only be used when the cursor being referenced is declared "for update of".
Examples
DECLARE fred CURSOR FOR
WITH RETURN TO CALLER
SELECT id
, name
, salary
, comm
FROM staff
WHERE id < :id-var
AND salary > 1000
ORDER BY id ASC
FETCH FIRST 10 ROWS ONLY
OPTIMIZE FOR 10 ROWS
FOR FETCH ONLY
WITH UR
DECLARE fred CURSOR WITH HOLD FOR
SELECT name
, salary
FROM staff
WHERE id > :id-var
FOR UPDDATE OF salary, comm
OPEN fred
DO UNTIL SQLCODE = 100
FETCH fred INTO :name-var
, :salary-var
IF salary < 1000 THEN
DO
UPDATE staff
SET salary = :new-salary-var
WHERE CURRENT OF fred
END-IF
END-DO
CLOSE fred
2.10.2. Select Into
A SELECT-INTO statement is used in an application program to retrieve a single row. If more than one row matches, an error is returned. The statement text is the same as any ordinary query, except that there is an INTO section (listing the output variables) between the SELECT list and the FROM section.
Example
SELECT name
, salary
INTO :name-var
, :salary-var
FROM staff
WHERE id = :id-var
2.10.3. Prepare
The PREPARE statement is used in an application program to dynamically prepare a SQL statement for subsequent execution.
Syntax Notes
-
The statement name names the statement. If the name is already in use, it is overridden.
-
The OUTPUT descriptor will contain information about the output parameter markers.
-
The DESCRIBE statement may be used instead of this clause.
-
The INPUT descriptor will contain information about the input parameter markers.
-
The FROM phrase points to the host-variable which contains the SQL statement text.
Prepared statement can be used by the following:
STATEMENT CAN BE USED BY | STATEMENT TYPE |
---|---|
DESCRIBE |
Any statement |
DECLARE CURSOR |
Must be SELECT |
EXECUTE |
Must not be SELECT |
2.10.4. Describe
The DESCRIBE statement is typically used in an application program to get information about a prepared statement. It can also be used in the Db2 command processor (but not in Db2BATCH) to get a description of a table, or the output columns in a SQL statement:
Below are some examples of using the statement:
DESCRIBE OUTPUT SELECT * FROM staff SQLDA Information sqldaid : SQLDA sqldabc: 896 sqln: 20 sqld: 7 Column Information sqltype sqllen sqlname.data sqlname.length ----------------- ----- ------------------------- ------------- 500 SMALLINT 2 ID 2 449 VARCHAR 9 NAME 4 501 SMALLINT 2 DEPT 4 453 CHARACTER 5 JOB 3 501 SMALLINT 2 YEARS 5 485 DECIMAL 7, 2 SALARY 6 485 DECIMAL 7, 2 COMM 4
DESCRIBE TABLE staff Column name Type schema Type name Length Scale Nulls ---------------------- ------- ----------- ------ ---- ---- ID SYSIBM SMALLINT 2 0 No NAME SYSIBM VARCHAR 9 0 Yes DEPT SYSIBM SMALLINT 2 0 Yes JOB SYSIBM CHARACTER 5 0 Yes YEARS SYSIBM SMALLINT 2 0 Yes SALARY SYSIBM DECIMAL 7 2 Yes COMM SYSIBM DECIMAL 7 2 Yes
2.10.5. Execute
The EXECUTE statement is used in an application program to execute a prepared statement. The statement can not be a select.
2.10.6. Execute Immediate
The EXECUTE IMMEDIATE statement is used in an application program to prepare and execute a statement. Only certain kinds of statement (e.g. insert, update, delete, commit) can be run this way. The statement can not be a select.
2.10.7. Set Variable
The SET statement is used in an application program to set one or more program variables to values that are returned by Db2.
Examples
SET :host-var = CURRENT TIMESTAMP
SET :host-v1 = CURRENT TIME
, :host-v2 = CURRENT DEGREE
, :host-v3 = NULL
The SET statement can also be used to get the result of a select, as long as the select only returns a single row:
SET
( :hv1
, :hv2
, :hv3) =
(SELECT id
, name
, salary
FROM staff
WHERE id = :id-var)
2.10.8. Set Db2 Control Structures
In addition to setting a host-variable, one can also set various Db2 control structures:
SET CONNECTION SET CURRENT DEFAULT TRANSFORM GROUP SET CURRENT DEGREE SET CURRENT EXPLAIN MODE SET CURRENT EXPLAIN SNAPSHOT SET CURRENT ISOLATION SET CURRENT LOCK TIMEOUT SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION SET CURRENT PACKAGE PATH SET CURRENT PACKAGESET SET CURRENT QUERY OPTIMIZATION SET CURRENT REFRESH AGE SET ENCRYPTION PASSWORD SET EVENT MONITOR STATE SET INTEGRITY SET PASSTHRU SET PATH SET SCHEMA SET SERVER OPTION SET SESSION AUTHORIZATION
2.11. Unit-of-Work Processing
No changes that you make are deemed to be permanent until they are committed. This section briefly lists the commands one can use to commit or rollback changes.
2.11.1. Commit
The COMMIT statement is used to commit whatever changes have been made. Locks that were taken as a result of those changes are freed. If no commit is specified, an implicit one is done when the thread terminates.
2.11.2. Savepoint
The SAVEPOINT statement is used in an application program to set a savepoint within a unit of work. Subsequently, the program can be rolled back to the savepoint, as opposed to rolling back to the start of the unit of work.
Notes
-
If the savepoint name is the same as a savepoint that already exists within the same level, it overrides the prior savepoint - unless the latter was defined a being unique, in which case an error is returned.
-
The RETAIN CURSORS phrase tells Db2 to, if possible, keep open any active cursors.
-
The RETAIN LOCKS phrase tells Db2 to retain any locks that were obtained subsequent to the savepoint. In other words, the changes are rolled back, but the locks that came with those changes remain.
Savepoint Levels
Savepoints exist within a particular savepoint level, which can be nested within another level. A new level is created whenever one of the following occurs:
-
A new unit of work starts.
-
A procedure defined with NEW SAVEPOINT LEVEL is called.
-
An atomic compound SQL statement starts.
A savepoint level ends when the process that caused its creation finishes. When a savepoint level ends, all of the savepoints created within it are released. The following rules apply to savepoint usage:
-
Savepoints can only be referenced from within the savepoint level in which they were created. Active savepoints in prior levels are not accessible.
-
The uniqueness of savepoint names is only enforced within a given savepoint level. The same name can exist in multiple active savepoint levels.
Example
Savepoints are especially useful when one has multiple SQL statements that one wants to run or rollback as a whole, without affecting other statements in the same transaction. For example, imagine that one is transferring customer funds from one account to another. Two updates will be required - and if one should fail, both should fail:
INSERT INTO transaction_audit_table;
SAVEPOINT before_updates ON ROLLBACK RETAIN CURSORS;
UPDATE savings_account
SET balance = balance - 100
WHERE cust# = 1234;
IF SQLCODE <> 0 THEN
ROLLBACK TO SAVEPOINT before_updates;
ELSE
UPDATE checking_account
SET balance = balance + 100
WHERE cust# = 1234;
IF SQLCODE <> 0 THEN
ROLLBACK TO SAVEPOINT before_updates;
END
END
COMMIT;
In the above example, if either of the update statements fail, the transaction is rolled back to the predefined savepoint. And regardless of what happens, there will still be a row inserted into the transaction-audit table.
2.11.3. Savepoints vs. Commits
Savepoints differ from commits in the following respects:
-
One cannot rollback changes that have been committed.
-
Only a commit guarantees that the changes are stored in the database. If the program subsequently fails, the data will still be there.
-
Once a commit is done, other users can see the changed data. After a savepoint, the data is still not visible to other users.
2.11.4. Release Savepoint
The RELEASE SAVEPOINT statement will remove the named savepoint. Any savepoints nested within the named savepoint are also released. Once run, the application can no longer rollback to any of the released savepoints.
2.11.5. Rollback
The ROLLBACK statement is used to rollback any database changes since the beginning of the unit of work, or since the named savepoint - if one is specified.
3. Data Manipulation Language
The chapter has a very basic introduction to the DML (Data Manipulation Language) statements. See the Db2 manuals for more details.
3.1. Insert
The INSERT statement is used to insert rows into a table, view, or fullselect. To illustrate how it is used, this section will use a copy of the EMP_ACT sample table:
CREATE TABLE emp_act_copy
( empno CHARACTER (00006) NOT NULL
, projno CHARACTER (00006) NOT NULL
, actno SMALLINT NOT NULL
, emptime DECIMAL (05,02)
, emstdate DATE
, emendate DATE);
3.1.1. Target Objects
One can insert into a table, view, nickname, or SQL expression. For views and SQL expressions, the following rules apply:
-
The list of columns selected cannot include a column function (e.g. MIN).
-
There must be no GROUP BY or HAVING acting on the select list.
-
The list of columns selected must include all those needed to insert a new row.
-
The list of columns selected cannot include one defined from a constant, expression, or a scalar function.
-
Sub-queries, and other predicates, are fine, but are ignored (see Insert into a fullselect).
-
The query cannot be a join, nor (plain) union.
-
A "union all" is permitted - as long as the underlying tables on either side of the union have check constraints such that a row being inserted is valid for one, and only one, of the tables in the union.
All bets are off if the insert is going to a table that has an INSTEAD OF trigger defined.
Usage Notes
-
One has to provide a list of the columns (to be inserted) if the set of values provided does not equal the complete set of columns in the target table, or are not in the same order as the columns are defined in the target table.
-
The columns in the INCLUDE list are not inserted. They are intended to be referenced in a SELECT statement that encompasses the INSERT (see Select DML Changes).
-
The input data can either be explicitly defined using the VALUES statement, or retrieved from some other table using a fullselect.
3.1.2. Direct Insert
To insert a single row, where all of the columns are populated, one lists the input the values in the same order as the columns are defined in the table:
INSERT INTO emp_act_copy VALUES
('100000' ,'ABC' ,10 ,1.4 ,'2003-10-22', '2003-11-24');
To insert multiple rows in one statement, separate the row values using a comma:
INSERT INTO emp_act_copy VALUES
('200000' ,'ABC' ,10 ,1.4 ,'2003-10-22', '2003-11-24')
, ('200000' ,'DEF' ,10 ,1.4 ,'2003-10-22', '2003-11-24')
, ('200000' ,'IJK' ,10 ,1.4 ,'2003-10-22', '2003-11-24');
If multiple rows are inserted in one statement, and one of them violates a unique index check, all of the rows are rejected. |
The NULL and DEFAULT keywords can be used to assign these values to columns. One can also refer to special registers, like the current date and current time:
INSERT INTO emp_act_copy VALUES
('400000' ,'ABC' ,10 ,NULL ,DEFAULT, CURRENT DATE);
To leave some columns out of the insert statement, one has to explicitly list the columns that are included. When this is done, one can refer to the columns used in any order:
INSERT INTO emp_act_copy (projno, emendate, actno, empno)
VALUES
('ABC' ,DATE(CURRENT TIMESTAMP) ,123 ,'500000');
3.1.3. Insert into Full-Select
The next statement inserts a row into a fullselect that just happens to have a predicate which, if used in a subsequent query, would not find the row inserted. The predicate has no impact on the insert itself:
INSERT INTO
(SELECT *
FROM emp_act_copy
WHERE empno < '1'
)
VALUES
('510000' ,'ABC' ,10 ,1.4 ,'2003-10-22', '2003-11-24');
One can insert rows into a view (with predicates in the definition) that are outside the bounds of the predicates. To prevent this, define the view WITH CHECK OPTION. |
3.1.4. Insert from Select
One can insert a set of rows that is the result of a query using the following notation:
INSERT INTO emp_act_copy
SELECT LTRIM(CHAR(id + 600000))
, SUBSTR(UCASE(name),1,6)
, salary / 229
, 123
, CURRENT DATE
, '2003-11-11'
FROM staff
WHERE id < 50;
In the above example, the fractional part of the SALARY value is eliminated when the data is inserted into the ACTNO field, which only supports integer values. |
If only some columns are inserted using the query, they need to be explicitly listed:
INSERT INTO emp_act_copy (empno, actno, projno)
SELECT LTRIM(CHAR(id + 700000))
, MINUTE(CURRENT TIME)
, 'DEF'
FROM staff
WHERE id < 40;
One reason why tables should always have unique indexes is to stop stupid SQL statements like the following, which will double the number of rows in the table:
INSERT INTO emp_act_copy
SELECT *
FROM emp_act_copy;
The select statement using the insert can be as complex as one likes. In the next example, it contains the union of two queries:
INSERT INTO emp_act_copy (empno, actno, projno)
SELECT LTRIM(CHAR(id + 800000))
, 77
, 'XYZ'
FROM staff
WHERE id < 40
UNION
SELECT LTRIM(CHAR(id + 900000))
, SALARY / 100
, 'DEF'
FROM staff
WHERE id < 50;
The select can also refer to a common table expression. In the following example, six values are first generated, each in a separate row. These rows are then selected during the insert:
INSERT INTO emp_act_copy (empno, actno, projno, emptime)
WITH temp1 (col1) AS
( VALUES (1),(2),(3),(4),(5),(6))
SELECT LTRIM(CHAR(col1 + 910000))
, col1
, CHAR(col1)
, col1 / 2
FROM temp1;
The next example inserts multiple rows - all with an EMPNO beginning "92". Three rows are found in the STAFF table, and all three are inserted, even though the sub-query should get upset once the first row has been inserted. This doesn’t happen because all of the matching rows in the STAFF table are retrieved and placed in a work-file before the first insert is done:
INSERT INTO emp_act_copy (empno, actno, projno)
SELECT LTRIM(CHAR(id + 920000))
, id
, 'ABC'
FROM staff
WHERE id < 40
AND NOT EXISTS
( SELECT *
FROM emp_act_copy
WHERE empno LIKE '92%'
);
3.1.5. Insert into Multiple Tables
Below are two tables that hold data for US and international customers respectively:
CREATE TABLE us_customer
( cust# INTEGER NOT NULL
, cname CHAR(10) NOT NULL
, country CHAR(03) NOT NULL
, CHECK (country = 'USA')
, PRIMARY KEY (cust#));
CREATE TABLE intl_customer
( cust# INTEGER NOT NULL
, cname CHAR(10) NOT NULL
, country CHAR(03) NOT NULL
, CHECK (country <> 'USA')
, PRIMARY KEY (cust#));
One can use a single insert statement to insert into both of the above tables because they have mutually exclusive check constraints. This means that a new row will go to one table or the other, but not both, and not neither. To do so one must refer to the two tables using a "union all" phrase - either in a view, or a query, as is shown below:
INSERT INTO
(SELECT *
FROM us_customer
UNION ALL
SELECT *
FROM intl_customer
)
VALUES
(111,'Fred','USA')
,(222,'Dave','USA')
,(333,'Juan','MEX');
The above statement will insert two rows into the table for US customers, and one row into the table for international customers.
3.2. Update
The UPDATE statement is used to change one or more columns/rows in a table, view, or fullselect. Each column that is to be updated has to specified. Here is an example:
UPDATE emp_act_copy
SET emptime = NULL
, emendate = DEFAULT
, emstdate = CURRENT DATE + 2 DAYS
, actno = ACTNO / 2
, projno = 'ABC'
WHERE empno = '100000';
Usage Notes
-
One can update rows in a table, view, or fullselect. If the object is not a table, then it must be updateable (i.e. refer to a single table, not have any column functions, etc).
-
The correlation name is optional, and is only needed if there is an expression or predicate that references another table.
-
The columns in the INCLUDE list are not updated. They are intended to be referenced in a SELECT statement that encompasses the UPDATE (see Select DML Changes).
-
The SET statement lists the columns to be updated, and the new values they will get.
-
Predicates are optional. If none are provided, all rows in the table are updated.
-
Usually, all matching rows are updated. To update some fraction of the matching rows, use a fullselect (see Use Full-Select).
Update Examples
To update all rows in a table, leave off all predicates:
UPDATE emp_act_copy
SET actno = actno / 2;
In the next example, both target columns get the same values. This happens because the result for both columns is calculated before the first column is updated:
UPDATE emp_act_copy ac1
SET actno = actno * 2
, emptime = actno * 2
WHERE empno LIKE '910%';
One can also have an update refer to the output of a select statement - as long as the result of the select is a single row:
UPDATE emp_act_copy
SET actno =
( SELECT MAX(salary) / 10
FROM staff)
WHERE empno = '200000';
The following notation lets one update multiple columns using a single select:
UPDATE emp_act_copy
SET (actno, emstdate, projno) =
( SELECT MAX(salary) / 10
, CURRENT DATE + 2 DAYS
, MIN(CHAR(id))
FROM staff
WHERE id <> 33
)
WHERE empno LIKE '600%';
Multiple rows can be updated using multiple different values, as long as there is a one-to-one relationship between the result of the select, and each row to be updated.
UPDATE emp_act_copy ac1
SET (actno, emptime) =
(SELECT ac2.actno + 1
, ac1.emptime / 2
FROM emp_act_copy ac2
WHERE ac2.empno LIKE '60%'
AND SUBSTR(ac2.empno,3) = SUBSTR(ac1.empno,3)
)
WHERE EMPNO LIKE '700%';
3.2.1. Use Full-Select
An update statement can be run against a table, a view, or a fullselect. In the next example, the table is referred to directly:
UPDATE emp_act_copy
SET emptime = 10
WHERE empno = '000010'
AND projno = 'MA2100';
Below is a logically equivalent update that pushes the predicates up into a fullselect:
UPDATE
(SELECT *
FROM emp_act_copy
WHERE empno = '000010'
AND projno = 'MA2100'
) AS ea
SET emptime = 20;
3.2.2. Update First "n" Rows
An update normally changes all matching rows. To update only the first "n" matching rows do the following:
-
In a fullselect, retrieve the first "n" rows that you want to update.
-
Update all rows returned by the fullselect.
In the next example, the first five staff with the highest salary get a nice fat commission:
UPDATE
(SELECT *
FROM staff
ORDER BY salary DESC
FETCH FIRST 5 ROWS ONLY
) AS xxx
SET comm = 10000;
The above statement may update five random rows – if there is more than one row with the ordering value. To prevent this, ensure that the list of columns provided in the ORDER BY identify each matching row uniquely. |
3.2.3. Use OLAP Function
Imagine that we want to set the employee-time for a particular row in the EMP_ACT table to the MAX time for that employee. Below is one way to do it:
UPDATE emp_act_copy ea1
SET emptime =
(SELECT MAX(emptime)
FROM emp_act_copy ea2
WHERE ea1.empno = ea2.empno
)
WHERE empno = '000010'
AND projno = 'MA2100';
The same result can be achieved by calling an OLAP function in a fullselect, and then updating the result. In next example, the MAX employee-time per employee is calculated (for each row), and placed in a new column. This column is then used to do the final update:
UPDATE
(SELECT ea1.*
, MAX(emptime) OVER(PARTITION BY empno) AS maxtime
FROM emp_act_copy ea1
) AS ea2
SET emptime = maxtime
WHERE empno = '000010'
AND projno = 'MA2100';
The above statement has the advantage of only accessing the EMP_ACT table once. If there were many rows per employee, and no suitable index (i.e. on EMPNO and EMPTIME), it would be much faster than the prior update. The next update is similar to the prior - but it does the wrong update! In this case, the scope of the OLAP function is constrained by the predicate on PROJNO, so it no longer gets the MAX time for the employee:
UPDATE emp_act_copy
SET emptime = MAX(emptime) OVER(PARTITION BY empno)
WHERE empno = '000010'
AND projno = 'MA2100';
3.2.4. Correlated and Uncorrelated Update
In the next example, regardless of the number of rows updated, the ACTNO will always come out as one. This is because the sub-query that calculates the row-number is correlated, which means that it is resolved again for each row to be updated in the "AC1" table. At most, one "AC2" row will match, so the row-number must always equal one:
UPDATE emp_act_copy ac1
SET (actno, emptime)
= (SELECT ROW_NUMBER() OVER()
, ac1.emptime / 2
FROM emp_act_copy ac2
WHERE ac2.empno LIKE '60%'
AND SUBSTR(ac2.empno,3) = SUBSTR(ac1.empno,3)
)
WHERE EMPNO LIKE '800%';
In the next example, the ACTNO will be updated to be values 1, 2, 3, etc, in order that the rows are updated. In this example, the sub-query that calculates the row-number is uncorrelated, so all of the matching rows are first resolved, and then referred to in the next, correlated, step:
UPDATE emp_act_copy ac1
SET (actno, emptime) =
(SELECT c1
, c2
FROM (SELECT ROW_NUMBER() OVER() AS c1
, actno / 100 AS c2
, empno
FROM emp_act_copy
WHERE empno LIKE '60%'
) AS ac2
WHERE SUBSTR(ac2.empno,3) = SUBSTR(ac1.empno,3)
)
WHERE empno LIKE '900%';
3.3. Delete
The DELETE statement is used to remove rows from a table, view, or fullselect. The set of rows deleted depends on the scope of the predicates used. The following example would delete a single row from the EMP_ACT sample table:
DELETE
FROM emp_act_copy
WHERE empno = '000010'
AND projno = 'MA2100'
AND actno = 10;
Usage Notes
-
One can delete rows from a table, view, or fullselect. If the object is not a table, then it must be deletable (i.e. refer to a single table, not have any column functions, etc).
-
The correlation name is optional, and is only needed if there is a predicate that references another table.
-
The columns in the INCLUDE list are not updated. They are intended to be referenced in a SELECT statement that encompasses the DELETE (see Select DML Changes).
-
Predicates are optional. If none are provided, all rows are deleted.
-
Usually, all matching rows are deleted. To delete some fraction of the matching rows, use a fullselect (see Use Full-Select).
3.3.1. Basic Delete
This statement would delete all rows in the EMP_ACT table:
DELETE
FROM emp_act_copy;
This statement would delete all the matching rows in the EMP_ACT:
DELETE
FROM emp_act_copy
WHERE empno LIKE '00%'
AND projno >= 'MA';
3.3.2. Correlated Delete
The next example deletes all the rows in the STAFF table - except those that have the highest ID in their respective department:
DELETE
FROM staff s1
WHERE id NOT IN
(SELECT MAX(id)
FROM staff s2
WHERE s1.dept = s2.dept
);
Here is another way to write the same:
DELETE
FROM staff s1
WHERE EXISTS
(SELECT *
FROM staff s2
WHERE s2.dept = s1.dept
AND s2.id > s1.id
);
The next query is logically equivalent to the prior two, but it works quite differently. It uses a fullselect and an OLAP function to get, for each row, the ID, and also the highest ID value in the current department. All rows where these two values do not match are then deleted:
DELETE
FROM
(SELECT id
, MAX(id) OVER(PARTITION BY dept) AS max_id
FROM staff
) AS ss
WHERE id <> max_id;
3.3.3. Delete First "n" Rows
A delete removes all encompassing rows. Sometimes this is not desirable - usually because an unknown, and possibly undesirably large, number rows is deleted. One can write a delete that stops after "n" rows using the following logic:
-
In a fullselect, retrieve the first "n" rows that you want to delete.
-
Delete all rows returned by the fullselect.
In the following example, those five employees with the highest salary are deleted:
DELETE
FROM
(SELECT *
FROM staff
ORDER BY salary DESC
FETCH FIRST 5 ROWS ONLY
) AS xxx;
The above statement may delete five random rows – if there is more than one row with the same salary. To prevent this, ensure that the list of columns provided in the ORDER BY identify each matching row uniquely. |
3.4. Select DML Changes
A special kind of SELECT statement (see Select DML Changes) can encompass an INSERT, UPDATE, or DELETE statement to get the before or after image of whatever rows were changed (e.g. select the list of rows deleted). This kind of SELECT can be very useful when the DML statement is internally generating a value that one needs to know (e.g. an INSERT automatically creates a new invoice number using a sequence column), or get the set of rows that were removed by a delete. All of this can be done by coding a special kind of select.
3.4.1. Table Types
-
OLD: Returns the state of the data prior to the statement being run. This is allowed for an update and a delete.
-
NEW: Returns the state of the data prior to the application of any AFTER triggers or referential constraints. Data in the table will not equal what is returned if it is subsequently changed by AFTER triggers or R.I. This is allowed for an insert and an update.
-
FINAL: Returns the final state of the data. If there AFTER triggers that alter the target table after running of the statement, an error is returned. Ditto for a view that is defined with an INSTEAD OF trigger. This is allowed for an insert and an update.
Usage Notes
-
Only one of the above tables can be listed in the FROM statement.
-
The table listed in the FROM statement cannot be given a correlation name.
-
No other table can be listed (i.e. joined to) in the FROM statement. One can reference another table in the SELECT list (see example Join result to another table), or by using a sub-query in the predicate section of the statement.
-
The SELECT statement cannot be embedded in a nested-table expression.
-
The SELECT statement cannot be embedded in an insert statement.
-
To retrieve (generated) columns that are not in the target table, list them in an INCLUDE phrase in the DML statement. This technique can be used to, for example, assign row numbers to the set of rows entered during an insert.
-
Predicates (on the select) are optional. They have no impact on the underlying DML.
-
The INPUT SEQUENCE phrase can be used in the ORDER BY to retrieve the rows in the same sequence as they were inserted. It is not valid in an update or delete.
-
The usual scalar functions, OLAP functions, and column functions, plus the GROUP BY phrase, can be applied to the output - as desired.
3.4.2. Insert Examples
The example below selects from the final result of the insert:
SELECT empno
, projno AS prj
, actno AS act
FROM FINAL TABLE
(INSERT INTO emp_act_copy
VALUES
('200000', 'ABC', 10, 1, '2003-10-22', '2003-11-24')
, ('200000', 'DEF', 10, 1, '2003-10-22', '2003-11-24')
)
ORDER BY 1,2,3;
ANSWER
EMPNO | PRJ | ACT |
---|---|---|
200000 |
ABC |
10 |
200000 |
DEF |
10 |
One way to retrieve the new rows in the order that they were inserted is to include a column in the insert statement that is a sequence number:
SELECT empno
, projno AS prj
, actno AS act
, row#
AS r#
FROM FINAL TABLE
(INSERT INTO emp_act_copy (empno, projno, actno)
INCLUDE (row# SMALLINT)
VALUES
('300000','ZZZ',999,1)
,('300000','VVV',111,2)
)
ORDER BY row#;
ANSWER
EMPNO | PRJ | ACT | R# |
---|---|---|---|
300000 |
ZZZ |
999 |
1 |
300000 |
VVV |
111 |
2 |
The next example uses the INPUT SEQUENCE phrase to select the new rows in the order that they were inserted. Row numbers are assigned to the output:
SELECT empno
, projno AS prj
, actno AS act
, ROW_NUMBER() OVER() AS r#
FROM FINAL TABLE
(INSERT INTO emp_act_copy (empno, projno, actno)
VALUES
('400000','ZZZ',999)
, ('400000','VVV',111)
)
ORDER BY INPUT SEQUENCE;
ANSWER
EMPNO | PRJ | ACT | R# |
---|---|---|---|
400000 |
ZZZ |
999 |
1 |
400000 |
VVV |
111 |
2 |
The INPUT SEQUENCE phrase only works in an insert statement. It can be listed in the ORDER BY part of the statement, but not in the SELECT part. The only way to display the row number of each row inserted is to explicitly assign row numbers. |
In the next example, the only way to know for sure what the insert has done is to select from the result. This is because the select statement (in the insert) has the following unknowns:
-
We do not, or may not, know what ID values were selected, and thus inserted.
-
The project-number is derived from the current-time special register.
-
The action-number is generated using the RAND function.
Now for the insert:
SELECT empno
,projno AS prj
,actno AS act
,ROW_NUMBER() OVER() AS r#
FROM NEW TABLE
(INSERT INTO emp_act_copy (empno, actno, projno)
SELECT LTRIM(CHAR(id + 600000))
, SECOND(CURRENT TIME)
, CHAR(SMALLINT(RAND(1) * 1000))
FROM staff
WHERE id < 40
)
ORDER BY INPUT SEQUENCE;
ANSWER
EMPNO | PRJ | ACT | R# |
---|---|---|---|
600010 |
1 |
59 |
1 |
600020 |
563 |
59 |
2 |
600030 |
193 |
59 |
3 |
3.4.3. Update Examples
The statement below updates the matching rows by a fixed amount. The select statement gets the old EMPTIME values:
SELECT empno
, projno AS prj
, emptime AS etime
FROM OLD TABLE
(UPDATE emp_act_copy
SET emptime = emptime * 2
WHERE empno = '200000')
ORDER BY projno;
ANSWER
EMPNO | PRJ | ETIME |
---|---|---|
200000 |
ABC |
1.00 |
200000 |
DEF |
1.00 |
The next statement updates the matching EMPTIME values by random amount. To find out exactly what the update did, we need to get both the old and new values. The new values are obtained by selecting from the NEW table, while the old values are obtained by including a column in the update which is set to them, and then subsequently selected:
SELECT projno AS prj
, old_t AS old_t
, emptime AS new_t
FROM NEW TABLE
(UPDATE emp_act_copy
INCLUDE (old_t DECIMAL(5,2))
SET emptime = emptime * RAND(1) * 10
, old_t = emptime
WHERE empno = '200000'
)
ORDER BY 1;
ANSWER
PRJ | OLD_T | NEW_T |
---|---|---|
ABC |
2.00 |
0.02 |
DEF |
2.00 |
11.27 |
3.4.4. Delete Examples
The following example lists the rows that were deleted:
SELECT projno AS prj
, actno AS act
FROM OLD TABLE
(DELETE
FROM emp_act_copy
WHERE empno = '300000'
)
ORDER BY 1,2;
ANSWER
PRJ | ACT |
---|---|
VVV |
111 |
ZZZ |
999 |
The next query deletes a set of rows, and assigns row-numbers (to the included field) as the rows are deleted. The subsequent query selects every second row:
SELECT empno
, projno
, actno AS act
, row# AS r#
FROM OLD TABLE
(DELETE
FROM emp_act_copy
INCLUDE (row# SMALLINT)
SET row# = ROW_NUMBER() OVER()
WHERE empno = '000260'
)
WHERE row# = row# / 2 * 2
ORDER BY 1, 2, 3;
ANSWER
EMPNO | PROJNO | ACT | R# |
---|---|---|---|
000260 |
AD3113 |
70 |
2 |
000260 |
AD3113 |
80 |
4 |
000260 |
AD3113 |
180 |
6 |
Predicates (in the select result phrase) have no impact on the range of rows changed by the underlying DML, which is determined by its own predicates. |
One cannot join the table generated by a DML statement to another table, nor include it in a nested table expression, but one can join in the SELECT phrase. The following delete illustrates this concept by joining to the EMPLOYEE table:
SELECT empno
, (SELECT lastname
FROM
(SELECT empno AS e#
, lastname
FROM employee
) AS xxx
WHERE empno = e#)
, projno AS projno
, actno AS act
FROM OLD TABLE
(DELETE
FROM emp_act_copy
WHERE empno < '0001')
ORDER BY 1, 2, 3
FETCH FIRST 5 ROWS ONLY;
ANSWER
EMPNO | LASTNAME | PROJNO | ACT |
---|---|---|---|
000010 |
HAAS |
AD3100 |
10 |
000010 |
HAAS |
MA2100 |
10 |
000010 |
HAAS |
MA2110 |
10 |
000020 |
THOMPSON |
PL2100 |
30 |
000030 |
KWAN |
IF1000 |
10 |
Observe above that the EMPNO field in the EMPLOYEE table was be renamed (before doing the join) using a nested table expression. This was necessary because one cannot join on two fields that have the same name, without using correlation names. A correlation name cannot be used on the OLD TABLE, so we had to rename the field to get around this problem.
3.5. Merge
The MERGE statement is a combination insert and update, or delete, statement on steroids. It can be used to take the data from a source table, and combine it with the data in a target table.
The qualifying rows in the source and target tables are first matched by unique key value, and then evaluated:
-
If the source row is already in the target, the latter can be either updated or deleted.
-
If the source row is not in the target, it can be inserted.
-
If desired, a SQL error can also be generated.
Usage Rules
The following rules apply to the merge statement:
-
Correlation names are optional, but are required if the field names are not unique.
-
If the target of the merge is a fullselect or a view, it must allow updates, inserts, and deletes - as if it were an ordinary table.
-
At least one ON condition must be provided.
-
The ON conditions must uniquely identify the matching rows in the target table.
-
Each individual WHEN check can only invoke a single modification statement.
-
When a MATCHED search condition is true, the matching target row can be updated, deleted, or an error can be flagged.
-
When a NOT MATCHED search condition is true, the source row can be inserted into the target table, or an error can be flagged.
-
When more than one MATCHED or NOT MATCHED search condition is true, the first one that matches (for each type) is applied. This prevents any target row from being updated or deleted more than once. Ditto for any source row being inserted.
-
The ELSE IGNORE phrase specifies that no action be taken if no WHEN check evaluates to true.
-
If an error is encountered, all changes are rolled back.
-
Row-level triggers are activated for each row merged, depending on the type of modification that is made. So if the merge initiates an insert, all insert triggers are invoked. If the same input initiates an update, all update triggers are invoked.
-
Statement-level triggers are activated, even if no rows are processed. So if a merge does either an insert, or an update, both types of statement triggers are invoked, even if all of the input is inserted.
3.5.1. Sample Tables
To illustrate the merge statement, the following test tables were created and populated:
CREATE TABLE old_staff AS
(SELECT id
, job
, salary
FROM staff
)
WITH NO DATA;
CREATE TABLE new_staff AS
(SELECT id
, salary
FROM staff
)
WITH NO DATA;
INSERT INTO old_staff
SELECT id
, job
, salary
FROM staff
WHERE id BETWEEN 20 and 40;
OLD_STAFF
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
77506.75 |
40 |
Sales |
78006.00 |
INSERT INTO new_staff
SELECT id, salary / 10
FROM staff
WHERE id BETWEEN 30 and 50;
NEW_STAFF
ID | SALARY |
---|---|
30 |
7750.67 |
40 |
7800.60 |
50 |
8065.98 |
3.5.2. Update or Insert Merge
The next statement merges the new staff table into the old, using the following rules:
-
The two tables are matched on common ID columns.
-
If a row matches, the salary is updated with the new value.
-
If there is no matching row, a new row is inserted.
Now for the code:
MERGE INTO old_staff oo
USING new_staff nn
ON oo.id = nn.id
WHEN MATCHED THEN
UPDATE
SET oo.salary = nn.salary
WHEN NOT MATCHED THEN
INSERT
VALUES (nn.id,'?',nn.salary);
OLD_STAFF
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
77506.75 |
40 |
Sales |
78006.00 |
NEW_STAFF
ID | SALARY |
---|---|
30 |
7750.67 |
40 |
7800.60 |
50 |
8065.98 |
AFTER-MERGE
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
7750.67 |
40 |
Sales |
7800.60 |
50 |
? |
8065.98 |
3.5.3. Delete-only Merge
The next statement deletes all matching rows:
MERGE INTO old_staff oo
USING new_staff nn
ON oo.id = nn.id
WHEN MATCHED THEN
DELETE;
AFTER-MERGE
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
3.5.4. Complex Merge
The next statement has the following options:
-
The two tables are matched on common ID columns.
-
If a row matches, and the old salary is < 18,000, it is updated.
-
If a row matches, and the old salary is > 18,000, it is deleted.
-
If no row matches, and the new ID is > 10, the new row is inserted.
-
If no row matches, and (by implication) the new ID is ⇐ 10, an error is flagged.
Now for the code:
MERGE INTO old_staff oo
USING new_staff nn
ON oo.id = nn.id
WHEN MATCHED
AND oo.salary < 78000 THEN
UPDATE
SET oo.salary = nn.salary
WHEN MATCHED
AND oo.salary > 78000 THEN
DELETE
WHEN NOT MATCHED
AND nn.id > 10 THEN
INSERT
VALUES (nn.id,'?',nn.salary)
WHEN NOT MATCHED THEN
SIGNAL SQLSTATE '70001'
SET MESSAGE_TEXT = 'New ID <= 10';
OLD_STAFF
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
77506.75 |
40 |
Sales |
78006.00 |
NEW_STAFF
ID | SALARY |
---|---|
30 |
7750.67 |
40 |
7800.60 |
50 |
8065.98 |
AFTER-MERGE
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
7750.67 |
50 |
? |
8065.98 |
The merge statement is like the case statement (see CASE Expression) in that the sequence in which one writes the WHEN checks determines the processing logic. In the above example, if the last check was written before the prior, any non-match would generate an error.
3.5.5. Using a Fullselect
The following merge generates an input table (i.e. fullselect) that has a single row containing the MAX value of every field in the relevant table. This row is then inserted into the table:
MERGE INTO old_staff
USING
(SELECT MAX(id) + 1 AS max_id
, MAX(job) AS max_job
, MAX(salary) AS max_sal
FROM old_staff
) AS mx
ON id = max_id
WHEN NOT MATCHED THEN
INSERT
VALUES (max_id, max_job, max_sal);
AFTER-MERGE
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
77506.75 |
40 |
Sales |
78006.00 |
41 |
Sales |
78171.25 |
Here is the same thing written as a plain on insert:
INSERT INTO old_staff
SELECT MAX(id) + 1 AS max_id
, MAX(job) AS max_job
, MAX(salary) AS max_sal
FROM old_staff;
Use a fullselect on the target and/or source table to limit the set of rows that are processed during the merge:
MERGE INTO
(SELECT *
FROM old_staff
WHERE id < 40
) AS oo
USING
(SELECT *
FROM new_staff
WHERE id < 50
) AS nn
ON oo.id = nn.id
WHEN MATCHED THEN
DELETE
WHEN NOT MATCHED THEN
INSERT
VALUES (nn.id,'?',nn.salary);
OLD_STAFF
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
Mgr |
77506.75 |
40 |
Sales |
78006.00 |
NEW_STAFF
ID | SALARY |
---|---|
30 |
7750.67 |
40 |
7800.60 |
50 |
8065.98 |
AFTER-MERGE
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
40 |
? |
7800.60 |
40 |
Sales |
78006.00 |
Observe that the above merge did the following:
-
The target row with an ID of 30 was deleted - because it matched.
-
The target row with an ID of 40 was not deleted, because it was excluded in the fullselect that was done before the merge.
-
The source row with an ID of 40 was inserted, because it was not found in the target fullselect. This is why the base table now has two rows with an ID of 40.
-
The source row with an ID of 50 was not inserted, because it was excluded in the fullselect that was done before the merge.
3.5.6. Listing Columns
The next example explicitly lists the target fields in the insert statement - so they correspond to those listed in the following values phrase:
MERGE INTO old_staff oo
USING new_staff nn
ON oo.id = nn.id
WHEN MATCHED THEN
UPDATE
SET (salary, job) = (1234, '?')
WHEN NOT MATCHED THEN
INSERT (id,salary,job)
VALUES (id,5678.9,'?');
AFTER-MERGE
ID | JOB | SALARY |
---|---|---|
20 |
Sales |
78171.25 |
30 |
? |
1234.00 |
40 |
? |
1234.00 |
50 |
? |
5678.90 |
4. Compound SQL
A compound statement groups multiple independent SQL statements into a single executable. In addition, simple processing logic can be included to create what is, in effect, a very basic program. Such statements can be embedded in triggers, SQL functions, SQL methods, and dynamic SQL statements.
4.1. Introduction
A compound SQL statement begins with an (optional) name, followed by the variable declarations, followed by the procedural logic.
Below is a compound statement that reads a set of rows from the STAFF table and, for each row fetched, updates the COMM field to equal the current fetch number.
BEGIN ATOMIC
DECLARE cntr SMALLINT DEFAULT 1;
FOR V1 AS
SELECT id as idval
FROM staff
WHERE id < 80
ORDER BY id
DO
UPDATE staff
SET comm = cntr
WHERE id = idval;
SET cntr = cntr + 1;
END FOR;
END
4.1.1. Statement Delimiter
Db2 SQL does not come with a designated statement delimiter (terminator), though a semicolon is typically used. However, a semi-colon cannot be used in a compound SQL statement because that character is used to differentiate the sub-components of the statement. In Db2BATCH, one can run the SET DELIMITER command (intelligent comment) to use something other than a semi-colon. The following script illustrates this usage:
--#SET DELIMITER !
SELECT NAME FROM STAFF WHERE id = 10!
--#SET DELIMITER ;
SELECT NAME FROM STAFF WHERE id = 20;
In the Db2 command processor one can do the same thing using the terminator keyword:
--#SET TERMINATOR !
SELECT NAME FROM STAFF WHERE id = 10!
--#SET TERMINATOR ;
SELECT NAME FROM STAFF WHERE id = 20;
4.1.2. SQL Statement Usage
When used in dynamic SQL, the following control statements can be used:
-
FOR statement
-
GET DIAGNOSTICS statement
-
IF statement
-
ITERATE statement
-
LEAVE statement
-
SIGNAL statement
-
WHILE statement
There are many more PSM (persistent stored modules) control statements than what is shown above. But only these ones can be used in Compound SQL statements. |
The following SQL statements can be issued:
-
fullselect
-
UPDATE
-
DELETE
-
INSERT
-
SET variable statement
4.1.3. DECLARE Variables
All variables have to be declared at the start of the compound statement. Each variable must be given a name and a type and, optionally, a default (start) value.
BEGIN ATOMIC
DECLARE aaa, bbb, ccc SMALLINT DEFAULT 1;
DECLARE ddd CHAR(10) DEFAULT NULL;
DECLARE eee INTEGER;
SET eee = aaa + 1;
UPDATE staff
SET comm = aaa
, salary = bbb
, years = eee
WHERE id = 10;
END
FOR Statement
The FOR statement executes a group of statements for each row fetched from a query.
In the next example one row is fetched per year of service (for selected years) in the STAFF table. That row is then used to do two independent updates to the three matching rows:
BEGIN ATOMIC
FOR V1 AS
SELECT years AS yr_num
, max(id) AS max_id
FROM staff
WHERE years < 4
GROUP BY years
ORDER BY years
DO
UPDATE staff
SET salary = salary / 10
WHERE id = max_id;
UPDATE staff
set comm = 0
WHERE years = yr_num;
END FOR;
END
BEFORE
ID | SALARY | COMM |
---|---|---|
180 |
37009.75 |
236.50 |
230 |
83369.80 |
189.65 |
330 |
49988.00 |
55.50 |
AFTER
ID | SALARY | COMM |
---|---|---|
180 |
37009.75 |
0.00 |
230 |
8336.98 |
0.00 |
330 |
4998.80 |
0.00 |
4.1.4. GET DIAGNOSTICS Statement
The GET DIAGNOSTICS statement returns information about the most recently run SQL statement. One can either get the number of rows processed (i.e. inserted, updated, or deleted), or the return status (for an external procedure call).
In the example below, some number of rows are updated in the STAFF table. Then the count of rows updated is obtained, and used to update a row in the STAFF table:
BEGIN ATOMIC
DECLARE numrows INT DEFAULT 0;
UPDATE staff
SET salary = 12345
WHERE id < 100;
GET DIAGNOSTICS numrows = ROW_COUNT;
UPDATE staff
SET salary = numrows
WHERE id = 10;
END
4.1.5. IF Statement
The IF statement is used to do standard if-then-else branching logic. It always begins with an IF THEN statement and ends with and END IF statement.
The next example uses if-then-else logic to update one of three rows in the STAFF table, depending on the current timestamp value:
BEGIN ATOMIC
DECLARE cur INT;
SET cur = MICROSECOND(CURRENT TIMESTAMP);
IF cur > 600000 THEN
UPDATE staff
SET name = CHAR(cur)
WHERE id = 10;
ELSEIF cur > 300000 THEN
UPDATE staff
SET name = CHAR(cur)
WHERE id = 20;
ELSE
UPDATE staff
SET name = CHAR(cur)
WHERE id = 30;
END IF;
END
4.1.6. ITERATE Statement
The ITERATE statement causes the program to return to the beginning of the labeled loop.
In next example, the second update statement will never get performed because the ITERATE will always return the program to the start of the loop:
BEGIN ATOMIC
DECLARE cntr INT DEFAULT 0;
whileloop:
WHILE cntr < 60 DO
SET cntr = cntr + 10;
UPDATE staff
SET salary = cntr
WHERE id = cntr;
ITERATE whileloop;
UPDATE staff
SET comm = cntr + 1
WHERE id = cntr;
END WHILE;
END
4.1.7. LEAVE Statement
The LEAVE statement exits the labeled loop.
In the next example, the WHILE loop would continue forever, if left to its own devices. But after some random number of iterations, the LEAVE statement will exit the loop:
BEGIN ATOMIC
DECLARE cntr INT DEFAULT 1;
whileloop:
WHILE 1 <> 2 DO
SET cntr = cntr + 1;
IF RAND() > 0.99 THEN
LEAVE whileloop;
END IF;
END WHILE;
UPDATE staff
SET salary = cntr
WHERE id = 10;
END
4.1.8. SIGNAL Statement
The SIGNAL statement is used to issue an error or warning message.
The next example loops a random number of times, and then generates an error message using the SIGNAL command, saying how many loops were done:
BEGIN ATOMIC
DECLARE cntr INT DEFAULT 1;
DECLARE emsg CHAR(20);
whileloop:
WHILE RAND() < .99 DO
SET cntr = cntr + 1;
END WHILE;
SET emsg = '#loops: ' || CHAR(cntr);
SIGNAL SQLSTATE '75001' SET MESSAGE_TEXT = emsg;
END
4.1.9. WHILE Statement
The WHILE statement repeats one or more statements while some condition is true.
The next statement has two nested WHILE loops, and then updates the STAFF table:
BEGIN ATOMIC
DECLARE c1, C2 INT DEFAULT 1;
WHILE c1 < 10 DO
WHILE c2 < 20 DO
SET c2 = c2 + 1;
END WHILE;
SET c1 = c1 + 1;
END WHILE;
UPDATE staff
SET salary = c1
, comm = c2
WHERE id = 10;
END
4.1.10. Other Usage
The following Db2 objects also support the language elements described above:
-
Triggers
-
Stored procedures
-
User-defined functions
-
Embedded compound SQL (in programs).
Some of the above support many more language elements. For example stored procedures that are written in SQL also allow the following:
ASSOCIATE, CASE, GOTO, LOOP, REPEAT, RESIGNAL
, and RETURN
.
4.1.11. Test Query
To illustrate some of the above uses of compound SQL, we are going to get from the STAFF table a complete list of departments, and the number of rows in each department. Here is the basic query, with the related answer:
SELECT dept
, count(*) as #rows
FROM staff
GROUP BY dept
ORDER BY dept;
ANSWER
DEPT | #ROWS |
---|---|
10 |
4 |
15 |
4 |
20 |
4 |
38 |
5 |
42 |
4 |
51 |
5 |
66 |
5 |
84 |
4 |
If all you want to get is this list, the above query is the way to go. But we will get the same answer using various other methods, just to show how it can be done using compound SQL statements.
Trigger
One cannot get an answer using a trigger. All one can do is alter what happens during an insert, update, or delete. With this in mind, the following example does the following:
-
Sets the statement delimiter to an "!". Because we are using compound SQL inside the trigger definition, we cannot use the usual semi-colon.
-
Creates a new table (note: triggers are not allowed on temporary tables).
-
Creates an INSERT trigger on the new table. This trigger gets the number of rows per department in the STAFF table - for each row (department) inserted.
-
Inserts a list of departments into the new table.
-
Selects from the new table.
Now for the code:
--#SET DELIMITER !
CREATE TABLE dpt
( dept SMALLINT NOT NULL
, #names SMALLINT
, PRIMARY KEY(dept))!
COMMIT!
CREATE TRIGGER dpt1 AFTER INSERT ON dpt
REFERENCING NEW AS NNN
FOR EACH ROW
MODE Db2SQL
BEGIN ATOMIC
DECLARE namecnt SMALLINT DEFAULT 0;
FOR getnames AS
SELECT COUNT(*) AS #n
FROM staff
WHERE dept = nnn.dept
DO
SET namecnt = #n;
END FOR;
UPDATE dpt
SET #names = namecnt
WHERE dept = nnn.dept;
END!
COMMIT!
INSERT INTO dpt (dept)
SELECT DISTINCT dept
FROM staff!
COMMIT!
SELECT *
FROM dpt
ORDER BY dept!
This example uses an "!" as the stmt delimiter. |
ANSWER
DEPT | #NAMES |
---|---|
10 |
4 |
15 |
4 |
20 |
4 |
38 |
5 |
42 |
4 |
51 |
5 |
66 |
5 |
84 |
4 |
The above code was designed to be run in Db2BATCH. The "set delimiter" notation will probably not work in other environments. |
Scalar Function
One can do something very similar to the above that is almost as stupid using a user-defined scalar function, that calculates the number of rows in a given department. The basic logic will go as follows:
-
Set the statement delimiter to an "!".
-
Create the scalar function.
-
Run a query that first gets a list of distinct departments, then calls the function.
Here is the code:
--#SET DELIMITER !
CREATE FUNCTION dpt1 (deptin SMALLINT)
RETURNS SMALLINT
BEGIN ATOMIC
DECLARE num_names SMALLINT;
FOR getnames AS
SELECT COUNT(*) AS #n
FROM staff
WHERE dept = deptin
DO
SET num_names = #n;
END FOR;
RETURN num_names;
END!
COMMIT!
SELECT XXX.*
, dpt1(dept) as #names
FROM
(SELECT dept
FROM staff
GROUP BY dept
) AS XXX
ORDER BY dept!
This example uses an "!" as the stmt delimiter. |
ANSWER
DEPT | #NAMES |
---|---|
10 |
4 |
15 |
4 |
20 |
4 |
38 |
5 |
42 |
4 |
51 |
5 |
66 |
5 |
84 |
4 |
Because the query used in the above function will only ever return one row, we can greatly simplify the function definition thus:
--#SET DELIMITER !
CREATE FUNCTION dpt1 (deptin SMALLINT)
RETURNS SMALLINT
BEGIN ATOMIC
RETURN
SELECT COUNT(*)
FROM staff
WHERE dept = deptin;
END!
COMMIT!
This example uses an "!" as the stmt delimiter. |
SELECT XXX.* , dpt1(dept) as #names FROM (SELECT dept FROM staff GROUP BY dept ) AS XXX ORDER BY dept!
In the above example, the RETURN statement is directly finding the one matching row, and then returning it to the calling statement.
Table Function
Below is almost exactly the same logic, this time using a table function:
Table Function with compound SQL
--#SET DELIMITER !
CREATE FUNCTION dpt2 ()
RETURNS TABLE ( dept SMALLINT
, #names SMALLINT)
BEGIN ATOMIC
RETURN
SELECT dept
, count(*)
FROM staff
GROUP BY dept
ORDER BY dept;
END!
COMMIT!
--#SET DELIMITER ;
SELECT *
FROM TABLE(dpt2()) T1
ORDER BY dept;
This example uses an "!" as the stmt delimiter. |
ANSWER
DEPT | #NAMES |
---|---|
10 |
4 |
15 |
4 |
20 |
4 |
38 |
5 |
42 |
4 |
51 |
5 |
66 |
5 |
84 |
4 |
BEGIN ATOMIC
DECLARE cntr SMALLINT DEFAULT 1;
FOR V1 AS
SELECT id as idval
FROM staff
WHERE id < 80
ORDER BY id
DO
UPDATE staff SET comm = cntr
WHERE id = idval;
SET cntr = cntr + 1;
END FOR;
END
5. Column Functions or Aggregate Functions
By themselves, column functions work on the complete set of matching rows. One can use a GROUP BY expression to limit them to a subset of matching rows. One can also use them in an OLAP function to treat individual rows differently.
Be very careful when using either a column function, or the DISTINCT clause, in a join. If the join is incorrectly coded, and does some form of Cartesian Product, the column function may get rid of the all the extra (wrong) rows so that it becomes very hard to confirm that the answer is incorrect. Likewise, be appropriately suspicious whenever you see that someone (else) has used a DISTINCT statement in a join. Sometimes, users add the DISTINCT clause to get rid of duplicate rows that they didn’t anticipate and don’t understand. |
5.1. Column Functions, Definitions
5.1.1. ARRAY_AGG
Aggregate the set of elements in an array. If an ORDER BY is provided, it determines the order in which the elements are entered into the array.
5.1.2. AVG
Get the average (mean) value of a set of non-null rows. The columns(s) must be numeric. ALL is the default. If DISTINCT is used duplicate values are ignored. If no rows match, the null value is returned.
SELECT AVG(dept) AS a1
, AVG(ALL dept) AS a2
, AVG(DISTINCT dept) AS a3
, AVG(dept/10) AS a4
, AVG(dept)/10 AS a5
FROM staff
HAVING AVG(dept) > 40;
ANSWER
A1 | A2 | A3 | A4 | A5 |
---|---|---|---|---|
41 |
41 |
40 |
3 |
4 |
Observe columns A4 and A5 above. Column A4 has the average of each value divided by 10. Column A5 has the average of all of the values divided by 10. In the former case, precision has been lost due to rounding of the original integer value and the result is arguably incorrect. This problem also occurs when using the SUM function. |
Averaging Null and Not-Null Values
Some database designers have an intense and irrational dislike of using nullable fields. What they do instead is define all columns as not-null and then set the individual fields to zero (for numbers) or blank (for characters) when the value is unknown. This solution is reasonable in some situations, but it can cause the AVG function to give what is arguably the wrong answer. One solution to this problem is some form of counseling or group therapy to overcome the phobia. Alternatively, one can use the CASE expression to put null values back into the answer-set being processed by the AVG function. The following SQL statement uses a modified version of the IBM sample STAFF table (all null COMM values were changed to zero) to illustrate the technique:
UPDATE staff
SET comm = 0
WHERE comm IS NULL;
SELECT AVG(salary) AS salary
, AVG(comm) AS comm1
, AVG(CASE comm
WHEN 0 THEN NULL
ELSE comm
END) AS comm2
FROM staff;
ANSWER
SALARY | COMM1 | COMM2 |
---|---|---|
67932.78 |
351.98 |
513.31 |
UPDATE staff
SET comm = NULL
WHERE comm = 0;
The COMM2 field above is the correct average. The COMM1 field is incorrect because it has factored in the zero rows with really represent null values. Note that, in this particular query, one cannot use a WHERE to exclude the "zero" COMM rows because it would affect the average salary value.
Dealing with Null Output
The AVG, MIN, MAX, and SUM functions almost always return a null value when there are no matching rows (see No Rows Match for exceptions). One can use the COALESCE function, or a CASE expression, to convert the null value into a suitable substitute. Both methodologies are illustrated below:
SELECT COUNT(*) AS c1
, AVG(salary) AS a1
, COALESCE(AVG(salary),0) AS a2
, CASE
WHEN AVG(salary) IS NULL THEN 0
ELSE AVG(salary)
END AS a3
FROM staff
WHERE id < 10;
ANSWER
C1 | A1 | A2 | A3 |
---|---|---|---|
0 |
- |
0 |
0 |
AVG Date/Time Values
The AVG function only accepts numeric input. However, one can, with a bit of trickery, also use the AVG function on a date field. First convert the date to the number of days since the start of the Current Era, then get the average, then convert the result back to a date. Please be aware that, in many cases, the average of a date does not really make good business sense. Having said that, the following SQL gets the average birth-date of all employees:
SELECT AVG(DAYS(birthdate))
, DATE(AVG(DAYS(birthdate)))
FROM employee;
ANSWER
1 | 2 |
---|---|
721092 |
1975-04-14 |
Time data can be manipulated in a similar manner using the MIDNIGHT_SECONDS function. If one is really desperate (or silly), the average of a character field can also be obtained using the ASCII and CHR functions.
Average of an Average
In some cases, getting the average of an average gives an overflow error. Inasmuch as you shouldn’t do this anyway, it is no big deal:
SELECT AVG(avg_sal) AS avg_avg
FROM (SELECT dept
, AVG(salary) AS avg_sal
FROM staff
GROUP BY dept
) AS xxx;
ANSWER: Overflow error
5.1.3. CORRELATION
I don’t know a thing about statistics, so I haven’t a clue what this function does. But I do know that the SQL Reference is wrong - because it says the value returned will be between 0 and 1. I found that it is between -1 and +1 (see below). The output type is float.
WITH temp1(col1, col2, col3, col4) AS
(VALUES (0, 0, 0, RAND(1))
UNION ALL
SELECT col1 + 1
, col2 - 1
, RAND()
, RAND()
FROM temp1
WHERE col1 < = 1000
)
SELECT DEC(CORRELATION(col1, col1), 5, 3) AS cor11
, DEC(CORRELATION(col1, col2), 5, 3) AS cor12
, DEC(CORRELATION(col2, col3), 5, 3) AS cor23
, DEC(CORRELATION(col3, col4), 5, 3) AS cor34
FROM temp1;
ANSWER
COR11 | COR12 | COR23 | COR34 |
---|---|---|---|
1.000 |
-1.000 |
-0.017 |
-0.005 |
5.1.4. COUNT
Get the number of values in a set of rows. The result is an integer. The value returned depends upon the options used:
-
COUNT(*) gets a count of matching rows.
-
COUNT(expression) gets a count of rows with a non-null expression value.
-
COUNT(ALL expression) is the same as the COUNT(expression) statement.
-
COUNT(DISTINCT expression) gets a count of distinct non-null expression values.
SELECT COUNT(*) AS c1
, COUNT(INT(comm/10)) AS c2
, COUNT(ALL INT(comm/10)) AS c3
, COUNT(DISTINCT INT(comm/10)) AS c4
, COUNT(DISTINCT INT(comm)) AS c5
, COUNT(DISTINCT INT(comm))/10 AS c6
FROM staff;
ANSWER
C1 | C2 | C3 | C4 | C5 | C6 |
---|---|---|---|---|---|
35 |
24 |
24 |
19 |
24 |
2 |
There are 35 rows in the STAFF table (see C1 above), but only 24 of them have non-null commission values (see C2 above). If no rows match, the COUNT returns zero - except when the SQL statement also contains a GROUP BY. In this latter case, the result is no row.
SELECT 'NO GP-BY' AS c1
, COUNT(*) AS c2
FROM staff
WHERE id = -1
UNION
SELECT 'GROUP-BY' AS c1
, COUNT(*) AS c2
FROM staff
WHERE id = -1
GROUP BY dept;
ANSWER
C1 | C2 |
---|---|
NO GP-BY |
0 |
5.1.5. COUNT_BIG
Get the number of rows or distinct values in a set of rows. Use this function if the result is too large for the COUNT function. The result is of type decimal 31. If the DISTINCT option is used both duplicate and null values are eliminated. If no rows match, the result is zero.
SELECT COUNT_BIG(*) AS c1
, COUNT_BIG(dept) AS c2
, COUNT_BIG(DISTINCT dept) AS c3
, COUNT_BIG(DISTINCT dept/10) AS c4
, COUNT_BIG(DISTINCT dept)/10 AS c5
FROM STAFF;
ANSWER
C1 | C2 | C3 | C4 | C5 |
---|---|---|---|---|
35. |
35. |
8. |
7. |
0. |
5.1.6. COVARIANCE
Returns the covariance of a set of number pairs. The output type is float.
WITH temp1(c1, c2, c3, c4) AS
(VALUES (0 , 0 , 0 , RAND(1))
UNION ALL
SELECT c1 + 1
, c2 - 1
, RAND()
, RAND()
FROM temp1
WHERE c1 <= 1000
)
SELECT DEC(COVARIANCE(c1,c1),6,0) AS cov11
, DEC(COVARIANCE(c1,c2),6,0) AS cov12
, DEC(COVARIANCE(c2,c3),6,4) AS cov23
, DEC(COVARIANCE(c3,c4),6,4) AS cov34
FROM temp1;
ANSWER
COV11 | COV12 | COV23 | COV34 |
---|---|---|---|
83666. |
-83666. |
-1.4689 |
-0.0004 |
5.1.7. COVARIANCE_SAMP
Returns the sample covariance of a set of number pairs.
5.1.8. CUME_DIST
Returns the cumulative distribution of a row that is hypothetically inserted into a group of rows.
5.1.9. GROUPING
The GROUPING function is used in CUBE, ROLLUP, and GROUPING SETS statements to identify what rows come from which particular GROUPING SET. A value of 1 indicates that the corresponding data field is null because the row is from of a GROUPING SET that does not involve this row. Otherwise, the value is zero.
SELECT dept
, AVG(salary) AS salary
, GROUPING(dept) AS df
FROM staff
GROUP BY ROLLUP(dept)
ORDER BY dept;
ANSWER
DEPT | SALARY | DF |
---|---|---|
10 |
83365.86 |
0 |
15 |
60482.33 |
0 |
20 |
63571.52 |
0 |
38 |
60457.11 |
0 |
42 |
49592.26 |
0 |
51 |
83218.16 |
0 |
66 |
73015.24 |
0 |
84 |
66536.75 |
0 |
- |
67932.78 |
1 |
See the section titled "Group By and Having" for more information on this function. |
5.1.10. LISTAGG
Aggregates a set of string elements into one string by concatenating the strings. Optionally, a separator string can be provided which is inserted between contiguous input strings.
5.1.11. MAX
Get the maximum value of a set of rows. The use of the DISTINCT option has no affect. If no rows match, the null value is returned.
SELECT MAX(dept)
, MAX(ALL dept)
, MAX(DISTINCT dept)
, MAX(DISTINCT dept/10)
FROM staff;
ANSWER
1 | 2 | 3 | 4 |
---|---|---|---|
84 |
84 |
84 |
8 |
MAX and MIN usage with Scalar Functions
Several Db2 scalar functions convert a value from one format to another, for example from numeric to character. The function output format will not always shave the same ordering sequence as the input. This difference can affect MIN, MAX, and ORDER BY processing.
SELECT MAX(hiredate)
, CHAR(MAX(hiredate),USA)
, MAX(CHAR(hiredate,USA))
FROM employee;
ANSWER
1 | 2 | 3 |
---|---|---|
2006-12-15 |
12/15/2006 |
12/15/2006 |
In the above the SQL, the second field gets the MAX before doing the conversion to character whereas the third field works the other way round. In most cases, the later is wrong. In the next example, the MAX function is used on a small integer value that has been converted to character. If the CHAR function is used for the conversion, the output is left justified, which results in an incorrect answer. The DIGITS output is correct (in this example).
SELECT MAX(id) AS id
, MAX(CHAR(id)) AS chr
, MAX(DIGITS(id)) AS dig
FROM staff;
ANSWER
ID | CHR | DIG |
---|---|---|
350 |
90 |
00350 |
The DIGITS function can also give the wrong answer - if the input data is part positive and part negative. This is because this function does not put a sign indicator in the output.
SELECT MAX(id - 250) AS id
, MAX(CHAR(id - 250)) AS chr
, MAX(DIGITS(id - 250)) AS dig
FROM staff;
ANSWER
D | CHR | DIG |
---|---|---|
100 |
90 |
0000000240 |
Be careful when using a column function on a field that has been converted from number to character, or from date/time to character. The result may not be what you intended. |
5.1.12. MEDIAN
Returns the median value in a set of values.
5.1.13. MIN
Get the minimum value of a set of rows. The use of the DISTINCT option has no affect. If no rows match, the null value is returned.
SELECT MIN(dept)
, MIN(ALL dept)
, MIN(DISTINCT dept)
, MIN(DISTINCT dept/10)
FROM staff;
ANSWER
1 | 2 | 3 | 4 |
---|---|---|---|
10 |
10 |
10 |
1 |
5.1.14. PERCENTILE_CONT
Returns the value that corresponds to the specified percentile given a sort specification by using a continuous distribution model.
5.1.15. PERCENTILE_DISC
Returns the value that corresponds to the specified percentile given a sort specification by using a discrete distribution model.
5.1.16. PERCENT_RANK
Returns the relative percentile rank of a row that is hypothetically inserted into a group of rows.
5.1.17. Regression Functions
The various regression functions support the fitting of an ordinary-least-squares regression line of the form y = a * x + b to a set of number pairs.
REGR_AVGX returns a quantity that than can be used to compute the validity of the regression model. The output is of type float.
REGR_AVGY (see REGR_AVGX).
REGR_COUNT returns the number of matching non-null pairs. The output is integer.
REGR_INTERCEPT returns the y-intercept of the regression line.
REGR_R2 returns the coefficient of determination for the regression.
REGR_SLOPE returns the slope of the line.
REGR_SXX (see REGR_AVGX).
REGR_SXY (see REGR_AVGX).
REGR_SYY (see REGR_AVGX).
See the IBM SQL Reference for more details on the above functions.
SELECT DEC(REGR_SLOPE(bonus,salary),7,5) AS r_slope
, DEC(REGR_INTERCEPT(bonus,salary),7,3) AS r_icpt
, INT(REGR_COUNT(bonus,salary)) AS r_count
, INT(REGR_AVGX(bonus,salary)) AS r_avgx
, INT(REGR_AVGY(bonus,salary)) AS r_avgy
, DEC(REGR_SXX(bonus,salary),10) AS r_sxx
, INT(REGR_SXY(bonus,salary)) AS r_sxy
, INT(REGR_SYY(bonus,salary)) AS r_syy
FROM employee
WHERE workdept = 'A00';
ANSWERS
r_slope | r_icpt | r_count | r_avgx | r_avgy | r_sxx | r_sxy | r_syy |
---|---|---|---|---|---|---|---|
0.00247 |
644.862 |
5 |
70850 |
820 |
8784575000 |
21715000 |
168000 |
5.1.18. STDDEV
Get the standard deviation of a set of numeric values. If DISTINCT is used, duplicate values are ignored. If no rows match, the result is null. The output format is double.
SELECT AVG(dept) AS a1
,STDDEV(dept) AS s1
,DEC(STDDEV(dept),3,1) AS s2
,DEC(STDDEV(ALL dept),3,1) AS s3
,DEC(STDDEV(DISTINCT dept),3,1) AS s4
FROM staff;
ANSWER
A1 | S1 | S2 | S3 | S4 |
---|---|---|---|---|
41 |
+2.3522355E+1 |
23.5 |
23.5 |
24.1 |
5.1.19. STDDEV_SAMP
The STDDEV_SAMP function returns the sample standard deviation (division by [n-1]) of a set of numbers.
5.1.20. SUM
Get the sum of a set of numeric values. If DISTINCT is used, duplicate values are ignored. Null values are always ignored. If no rows match, the result is null.
SELECT SUM(dept) AS s1
, SUM(ALL dept) AS s2
, SUM(DISTINCT dept) AS s3
, SUM(dept/10) AS s4
, SUM(dept)/10 AS s5
FROM staff;
ANSWER
S1 | S2 | S3 | S4 | S5 |
---|---|---|---|---|
1459 |
1459 |
326 |
134 |
145 |
The answers S4 and S5 above are different. This is because the division is done before the SUM in column S4, and after in column S5. In the former case, precision has been lost due to rounding of the original integer value and the result is arguably incorrect. When in doubt, use the S5 notation. |
5.1.21. VAR or VARIANCE
Get the variance of a set of numeric values. If DISTINCT is used, duplicate values are ignored. If no rows match, the result is null. The output format is double.
SELECT AVG(dept) AS a1
, VARIANCE(dept) AS s1
, DEC(VARIANCE(dept),4,1) AS s2
, DEC(VARIANCE(ALL dept),4,1) AS s3
, DEC(VARIANCE(DISTINCT dept),4,1) AS s4
FROM staff;
ANSWER
A1 | V1 | V2 | V3 | V4 |
---|---|---|---|---|
41 |
+5.533012244E+2 |
553 |
553 |
582 |
5.1.22. VARIANCE_SAMP
Returns the sample variance (division by [n-1]) of a set of numbers.
5.1.23. XMLAGG
Returns an XML sequence containing an item for each non-null value in a set of XML values.
5.1.24. XMLGROUP
The XMLGROUP function returns an XML value with a single XQuery document node containing one top-level element node. This is an aggregate expression that will return a single-rooted XML document from a group of rows where each row is mapped to a row subelement.
6. OLAP Functions
=== Introduction
Online Analytical Processing (OLAP) functions enable one to sequence and rank query rows. They are especially useful when the calling program is very simple.
6.1. The Bad Old Days
To really appreciate the value of the OLAP functions, one should try to do some seemingly trivial task without them. To illustrate this point, consider the following query:
SELECT s1.job
, s1.id
, s1.salary
FROM staff s1
WHERE s1.name LIKE '%s%'
AND s1.id < 90
ORDER BY s1.job
, s1.id;
ANSWER
JOB | ID | SALARY |
---|---|---|
Clerk |
80 |
43504.60 |
Mgr |
10 |
98357.50 |
Mgr |
50 |
80659.80 |
Let us now add two fields to this query:
-
A running sum of the salaries selected.
-
A running count of the rows retrieved.
Adding these fields is easy - when using OLAP functions:
SELECT s1.job
, s1.id
, s1.salary
, SUM(salary) OVER(ORDER BY job, id) AS sumsal
, ROW_NUMBER() OVER(ORDER BY job, id) AS r
FROM staff s1
WHERE s1.name LIKE '%s%'
AND s1.id < 90
ORDER BY s1.job
, s1.id;
ANSWER
JOB | ID | SALARY | SUMSAL | R |
---|---|---|---|---|
Clerk |
80 |
43504.60 |
43504.60 |
1 |
Mgr |
10 |
98357.50 |
141862.10 |
2 |
Mgr |
50 |
80659.80 |
222521.90 |
3 |
6.1.1. Write Query without OLAP Functions
If one does not have OLAP functions, one can still get the required answer, but the code is quite tricky. The problem is that this seemingly simple query contains two nasty tricks:
-
Not all of the rows in the table are selected.
-
The output is ordered on two fields, the first of which is not unique.
Below is the arguably the most elegant way to write the above query without using OLAP functions. There query has the following basic characteristics:
-
Define a common-table-expression with the set of matching rows.
-
Query from this common-table-expression.
-
For each row fetched, do two nested select statements. The first gets a running sum of the salaries, and the second gets a running count of the rows retrieved.
Now for the code:
WITH temp1 AS
(SELECT *
FROM staff s1
WHERE s1.name LIKE '%s%'
AND s1.id < 90
)
SELECT s1.job
, s1.id
, s1.salary
, (SELECT SUM(s2.salary)
FROM temp1 s2
WHERE (s2.job < s1.job)
OR (s2.job = s1.job AND s2.id <= s1.id)
) AS sumsal
, (SELECT COUNT(*)
FROM temp1 s2
WHERE (s2.job < s1.job)
OR (s2.job = s1.job AND s2.id <= s1.id)
) AS r
FROM temp1 s1
ORDER BY s1.job
, s1.id;
ANSWER
JOB | ID | SALARY | SUMSAL | R |
---|---|---|---|---|
Clerk |
80 |
43504.60 |
43504.60 |
1 |
Mgr |
10 |
98357.50 |
141862.10 |
2 |
Mgr |
50 |
80659.80 |
222521.90 |
3 |
6.1.2. Concepts
Below are some of the basic characteristics of OLAP functions:
-
OLAP functions are column functions that work (only) on the set of rows that match the predicates of the query.
-
Unlike ordinarily column functions, (e.g. SUM), OLAP functions do not require that the whole answer-set be summarized. In fact, OLAP functions never change the number of rows returned by the query.
-
OLAP functions work on sets of values, but the result is always a single value.
-
OLAP functions are used to return individual rows from a table (e.g. about each staff member), along with related summary data (e.g. average salary in department).
-
OLAP functions are often applied on some set (i.e. of a moving window) of rows that is defined relative to the current row being processed. These matching rows are classified using an ORDER BY as being one of three types:
-
Preceding rows are those that have already been processed.
-
Following rows are those that have yet to be processed.
-
Current row is the one currently being processed.
-
-
The ORDER BY used in an OLAP function is not related to the ORDER BY expression used to define the output order of the final answer set.
-
OLAP functions can summarize the matching rows by a subset (i.e. partition). When this is done, it is similar to the use of a GROUP BY in an ordinary column function.
Below is a query that illustrates these concepts. It gets some individual rows from the STAFF table, while using an OLAP function to calculate a running average salary within the DEPT of the current row. The average is calculated using one preceding row (in ID order), the current row, and two following rows:
SELECT dept
, id
, salary
, DEC(AVG(salary) OVER(PARTITION BY dept
ORDER BY id
ROWS BETWEEN 1 PRECEDING
AND 2 FOLLOWING)
, 8, 2) AS avb_sal
FROM staff
WHERE dept IN (20, 38)
ORDER BY dept
, id;
ANSWER
DEPT | ID | SALARY | AVG_SAL |
---|---|---|---|
20 |
10 |
98357.50 |
73344.45 |
20 |
20 |
78171.25 |
63571.52 |
20 |
80 |
43504.60 |
51976.20 |
20 |
190 |
34252.75 |
38878.67 |
38 |
30 |
77506.75 |
74107.01 |
38 |
40 |
78006.00 |
66318.95 |
38 |
60 |
66808.30 |
56194.70 |
38 |
120 |
42954.75 |
48924.26 |
38 |
180 |
37009.75 |
39982.25 |
TABLE
DEPT | ID | SALARY | Matching? | Partition | Relation to row [38 60 66808.30 56194.70] |
---|---|---|---|---|---|
15 |
110 |
42508.20 |
N |
- |
- |
15 |
170 |
42258.50 |
N |
- |
- |
20 |
10 |
98357.50 |
Y |
1 |
- |
20 |
20 |
78171.25 |
Y |
1 |
- |
20 |
80 |
43504.60 |
Y |
1 |
- |
20 |
190 |
34252.75 |
Y |
1 |
- |
38 |
30 |
77506.75 |
Y |
2 |
Preceding row |
38 |
40 |
78006.00 |
Y |
2 |
Preceding row |
38 |
60 |
66808.30 |
Y |
2 |
Current row |
38 |
120 |
42954.75 |
Y |
2 |
Following row |
38 |
180 |
37009.75 |
Y |
2 |
Following row |
42 |
90 |
38001.75 |
N |
- |
- |
42 |
100 |
78352.80 |
N |
- |
- |
42 |
130 |
40505.90 |
N |
- |
- |
Below is another query that calculates various running averages:
SELECT dept
, id
, salary
, DEC(AVG(salary) OVER() ,8,2) AS avg1
, DEC(AVG(salary) OVER(PARTITION BY dept) ,8,2) AS avg2
, DEC(AVG(salary) OVER(PARTITION BY dept
ORDER BY id
ROWS UNBOUNDED PRECEDING)
, 8, 2) AS avg3
, DEC(AVG(salary) OVER(PARTITION BY dept
ORDER BY id
ROWS BETWEEN 1 PRECEDING AND 2 FOLLOWING)
, 8, 2) AS avg4
FROM staff
WHERE dept IN (15,20)
AND id > 20
ORDER BY dept
, id;
ANSWER
DEPT | ID | SALARY | AVG1 | AVG2 | AVG3 | AVG4 |
---|---|---|---|---|---|---|
15 |
50 |
80659.80 |
53281.11 |
60482.33 |
80659.80 |
66556.94 |
15 |
70 |
76502.83 |
53281.11 |
60482.33 |
78581.31 |
60482.33 |
15 |
110 |
42508.20 |
53281.11 |
60482.33 |
66556.94 |
53756.51 |
15 |
170 |
42258.50 |
53281.11 |
60482.33 |
60482.33 |
42383.35 |
20 |
80 |
43504.60 |
53281.11 |
38878.67 |
43504.60 |
38878.67 |
20 |
190 |
34252.75 |
53281.11 |
38878.67 |
38878.67 |
38878.67 |
-
AVG1: An average of all matching rows
-
AVG2: An average of all matching rows within a department.
-
AVG3: An average of matching rows within a department, from the first matching row (ordered by ID), up to and including the current row.
-
AVG4: An average of matching rows within a department, starting with one preceding row (i.e. the highest, ordered by ID), the current row, and the next two following rows.
6.1.3. PARTITION Expression
The PARTITION BY expression, which is optional, defines the set of rows that are used in each OLAP function calculation.
Below is a query that uses different partitions to average sets of rows:
SELECT id
, dept
, job
, years
, salary
, DEC(AVG(salary) OVER(PARTITION BY dept) ,7,2) AS dpt_avg
, DEC(AVG(salary) OVER(PARTITION BY job) ,7,2) AS job_avg
, DEC(AVG(salary) OVER(PARTITION BY years/2) ,7,2) AS yr2_avg
, DEC(AVG(salary) OVER(PARTITION BY dept, job) ,7,2) AS d_j_avg
FROM staff
WHERE dept IN (15,20)
AND id > 20
ORDER BY id;
ANSWER
ID | DEPT | JOB | YEARS | SALARY | DPT_AVG | JOB_AVG | YR2_AVG | D_J_AVG |
---|---|---|---|---|---|---|---|---|
50 |
15 |
Mgr |
10 |
80659.80 |
60482.33 |
80659.80 |
80659.80 |
80659.80 |
70 |
15 |
Sales |
7 |
76502.83 |
60482.33 |
76502.83 |
76502.83 |
76502.83 |
80 |
20 |
Clerk |
- |
43504.60 |
38878.67 |
40631.01 |
43504.60 |
38878.67 |
110 |
15 |
Clerk |
5 |
42508.20 |
60482.33 |
40631.01 |
42383.35 |
42383.35 |
170 |
15 |
Clerk |
4 |
42258.50 |
60482.33 |
40631.01 |
42383.35 |
42383.35 |
190 |
20 |
Clerk |
8 |
34252.75 |
38878.67 |
40631.01 |
34252.75 |
38878.67 |
6.1.4. PARTITION vs. GROUP BY
The PARTITION clause, when used by itself, returns a very similar result to a GROUP BY, except that like all OLAP functions, it does not remove the duplicate rows. To illustrate, below is a simple query that does a GROUP BY:
SELECT dept
, SUM(years) AS sum
, AVG(years) AS avg
, COUNT(*) AS row
FROM staff
WHERE id BETWEEN 40 AND 120
AND years IS NOT NULL
GROUP BY dept;
ANSWER
DEPT | SUM | AVG | ROW |
---|---|---|---|
15 |
22 |
7 |
3 |
38 |
6 |
6 |
1 |
42 |
13 |
6 |
2 |
Below is a similar query that uses a PARTITION phrase. Observe that each value calculated is the same, but duplicate rows have not been removed:
SELECT dept
, SUM(years) OVER(PARTITION BY dept) AS sum
, AVG(years) OVER(PARTITION BY dept) AS avg
, COUNT(*) OVER(PARTITION BY dept) AS row
FROM staff
WHERE id BETWEEN 40 AND 120
AND years IS NOT NULL
ORDER BY dept;
ANSWER
DEPT | SUM | AVG | ROW |
---|---|---|---|
15 |
22 |
7 |
3 |
15 |
22 |
7 |
3 |
15 |
22 |
7 |
3 |
38 |
6 |
6 |
1 |
42 |
13 |
6 |
2 |
42 |
13 |
6 |
2 |
Below is a similar query that uses the PARTITION phrase, and then uses a DISTINCT clause to remove the duplicate rows:
SELECT DISTINCT dept
, SUM(years) OVER(PARTITION BY dept) AS sum
, AVG(years) OVER(PARTITION BY dept) AS avg
, COUNT(*) OVER(PARTITION BY dept) AS row
FROM staff
WHERE id BETWEEN 40 AND 120
AND years IS NOT NULL
ORDER BY dept;
ANSWER
DEPT | SUM | AVG | ROW |
---|---|---|---|
15 |
22 |
7 |
3 |
38 |
6 |
6 |
1 |
42 |
13 |
6 |
2 |
Even though the above statement gives the same answer as the prior GROUP BY example, it is not the same internally. Nor is it (probably) as efficient, and it is certainly not as easy to understand. Therefore, when in doubt, use the GROUP BY syntax. |
6.1.5. Window Definition
An OLAP function works on a "window" of matching rows. This window can be defined as:
-
All matching rows.
-
All matching rows within a partition.
-
Some moving subset of the matching rows (within a partition, if defined).
A moving window has to have an ORDER BY clause so that the set of matching rows can be determined.
Window Size Partitions
-
UNBOUNDED PRECEDING: All of the preceding rows.
-
Number PRECEDING: The "n" preceding rows.
-
UNBOUNDED FOLLOWING: All of the following rows.
-
Number FOLLOWING: The "n" following rows.
-
CURRENT ROW: Only the current row.
Defaults
-
No ORDER BY: UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING.
-
ORDER BY only: UNBOUNDED PRECEDING to CURRENT ROW.
-
No BETWEEN: CURRENT ROW to "n" preceding/following row or rank.
-
BETWEEN stmt: From "n" to "n" preceding/following row or rank. The end-point must be greater than or equal to the starting point.
6.1.6. Sample Queries
Below is a query that illustrates some of the above concepts:
SELECT id
, salary
, DEC(AVG(salary) OVER() ,7,2) AS avg_all
, DEC(AVG(salary) OVER(ORDER BY id) ,7,2) AS avg_odr
, DEC(AVG(salary)
OVER(ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) ,7,2) AS avg_p_f
, DEC(AVG(salary)
OVER(ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW) ,7,2) AS avg_p_c
, DEC(AVG(salary)
OVER(ORDER BY id
ROWS BETWEEN CURRENT ROW
AND UNBOUNDED FOLLOWING) ,7,2) AS avg_c_f
, DEC(AVG(salary)
OVER(ORDER BY id
ROWS BETWEEN 2 PRECEDING
AND 1 FOLLOWING) ,7,2) AS avg_2_1
FROM staff
WHERE dept IN (15,20)
AND id > 20
ORDER BY id;
ANSWER
ID | SALARY | AVG_ALL | AVG_ODR | AVG_P_F | AVG_P_C | AVG_C_F | AVG_2_1 |
---|---|---|---|---|---|---|---|
50 |
80659.80 |
53281.11 |
80659.80 |
53281.11 |
80659.80 |
53281.11 |
78581.31 |
70 |
76502.83 |
53281.11 |
78581.31 |
53281.11 |
78581.31 |
47805.37 |
66889.07 |
80 |
43504.60 |
53281.11 |
66889.07 |
53281.11 |
66889.07 |
40631.01 |
60793.85 |
110 |
42508.20 |
53281.11 |
60793.85 |
53281.11 |
60793.85 |
39673.15 |
51193.53 |
170 |
42258.50 |
53281.11 |
57086.78 |
53281.11 |
57086.78 |
38255.62 |
40631.01 |
190 |
34252.75 |
53281.11 |
53281.11 |
53281.11 |
53281.11 |
34252.75 |
39673.15 |
When the BETWEEN syntax is used, the start of the range/rows must be less than or equal to the end of the range/rows. |
When no BETWEEN is used, the set of rows to be evaluated goes from the current row up or down to the end value:
SELECT id
, SUM(id) OVER(ORDER BY id) AS sum1
, SUM(id) OVER(ORDER BY id ROWS 1 PRECEDING) AS sum2
, SUM(id) OVER(ORDER BY id ROWS UNBOUNDED PRECEDING) AS sum3
, SUM(id) OVER(ORDER BY id ROWS CURRENT ROW) AS sum4
, SUM(id) OVER(ORDER BY id ROWS2 FOLLOWING) AS sum6
, SUM(id) OVER(ORDER BY id ROWS UNBOUNDED FOLLOWING) AS sum6
FROM staff
WHERE id < 40
ORDER BY id;
ANSWER
ID | SUM1 | SUM2 | SUM3 | SUM4 | SUM5 | SUM6 |
---|---|---|---|---|---|---|
10 |
10 |
10 |
10 |
10 |
60 |
60 |
20 |
30 |
30 |
30 |
20 |
50 |
50 |
30 |
60 |
50 |
60 |
30 |
30 |
30 |
6.1.7. ROWS vs. RANGE
(OLAP, ROWS A moving window of rows to be evaluated (relative to the current row) can be defined using either the ROW or RANGE expressions. These differ as follows:
-
ROWS: Refers to the "n" rows before and/or after (within the partition), as defined by the ORDER BY.
-
RANGE: Refers to those rows before and/or after (within the partition) that are within an arithmetic range of the current row, as defined by the ORDER BY.
The next query compares the ROW and RANGE expressions:
SELECT id
, SMALLINT(SUM(id)
OVER(ORDER BY id
RANGE BETWEEN 10 PRECEDING AND 10 FOLLOWING)) AS rng1
, SMALLINT(SUM(id)
OVER(ORDER BY id
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)) AS row1
, SMALLINT(SUM(id)
OVER(ORDER BY id
RANGE BETWEEN 10 PRECEDING AND CURRENT ROW)) AS rng2
, SMALLINT(SUM(id)
OVER(ORDER BY id
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)) AS row2
, SMALLINT(SUM(id)
OVER(ORDER BY id DESC
ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING)) AS row3
, SMALLINT(SUM(id)
OVER(ORDER BY id
RANGE BETWEEN UNBOUNDED PRECEDING AND 20 FOLLOWING)) AS rng3
FROM staff
WHERE id < 60
ORDER BY id;
ANSWER
ID | RNG1 | ROW1 | RNG2 | ROW2 | ROW3 | RNG3 |
---|---|---|---|---|---|---|
10 |
30 |
30 |
10 |
- |
90 |
60 |
20 |
60 |
60 |
30 |
10 |
120 |
100 |
30 |
90 |
90 |
50 |
30 |
90 |
150 |
40 |
120 |
120 |
70 |
60 |
50 |
150 |
50 |
90 |
90 |
90 |
90 |
- |
150 |
Usage Notes
-
An ORDER BY statement is required when using either expression.
-
If no RANGE or ROWS expression was provided, the default range (assuming there was an ORDER BY) is all preceding rows – up to the current row.
-
When using the RANGE expression, only one expression can be specified in the ORDER BY, and that expression must be numeric.
6.1.8. ORDER BY Expression
The ORDER BY phrase has several purposes:
-
It defines the set of rows that make up a moving window.
-
It provides a set of values to do aggregations on. Each distinct value gets a new result.
-
It gives a direction to the aggregation function processing (i.e. ASC or DESC).
An ORDER BY expression is required for the RANK and DENSE_RANK functions. It is optional for all others (except of using ROWS or RANGE).
Usage Notes
-
ASC: Sorts the values in ascending order. This is the default.
-
DESC: Sorts the values in descending order.
-
NULLS: Determines whether null values are sorted high or low, relative to the non-null values present. Note that the default option differs for ascending and descending order.
-
Sort Expression: The sort-key expression can be any valid column, or any scalar expression is deterministic, and has no external action.
-
ORDER BY ORDER OF table-designator: The table designator refers to a subselect or fullselect in the query and any ordering defined on columns in that subselect or fullselect (note: if there is no explicit ordering the results are unpredictable). If the subselect or fullselect ORDER BY is changed, the ordering sequence will automatically change to match. Note that the final query may have an ordering that differs from that in the subselect or fullselect.
When the table designator refers to a table in the current subselect or fullselect, as opposed to the results of a nested subselect or fullselect, the values are unpredictable. |
6.1.9. Sample Query
In the next query, various aggregations are done on a variety of fields, and on a nested-tableexpression that contains an ORDER BY. Observe that the ascending fields sum or count up, while the descending fields sum down. Also observe that each aggregation field gets a separate result for each new set of rows, as defined in the ORDER BY phrase:
SELECT dept
, name
, salary
, DEC(SUM(salary) OVER(ORDER BY dept) ,8,2) AS sum1
, DEC(SUM(salary) OVER(ORDER BY dept DESC) ,8,2) AS sum2
, DEC(SUM(salary) OVER(ORDER BY ORDER OF s1) ,8,2) AS sum3
, SMALLINT(RANK() OVER(ORDER BY salary, name, dept) ) AS r1
, SMALLINT(RANK() OVER(ORDER BY ORDER OF s1) AS r2
, ROW_NUMBER() OVER(ORDER) AS w1
, COUNT(*) OVER(ORDER BY salary) AS w2
FROM (SELECT *
FROM staff
WHERE id < 60
ORDER BY dept
, name
) AS s1
ORDER BY 1, 2;
ANSWER
DEPT | NAME | SALARY | SUM1 | SUM2 | SUM3 | R1 | R2 | W1 | W2 |
---|---|---|---|---|---|---|---|---|---|
15 |
Hanes |
80659.80 |
80659.80 |
412701.30 |
80659.80 |
4 |
1 |
4 |
4 |
20 |
Pernal |
78171.25 |
257188.55 |
332041.50 |
158831.05 |
3 |
2 |
3 |
3 |
20 |
Sanders |
98357.50 |
257188.55 |
332041.50 |
257188.55 |
5 |
3 |
5 |
5 |
38 |
Marenghi |
77506.75 |
412701.30 |
155512.75 |
334695.30 |
1 |
4 |
1 |
1 |
38 |
O’Brien |
78006.00 |
412701.30 |
155512.75 |
412701.30 |
2 |
5 |
2 |
2 |
There is no relationship between the ORDER BY used in an OLAP function, and the final ordering of the answer. Both are calculated independently. |
6.1.10. Table Designator
The next two queries illustrate referencing a table designator in a subselect. Observe that as the ORDER BY in the subselect changes, the ordering sequence changes. Note that the final query output order does match that of the subselect:
SELECT id
, name
, ROW_NUMBER()
OVER(ORDER BY ORDER OF s) od
FROM (SELECT *
FROM staff
WHERE id < 50
ORDER BY name ASC
) AS s
ORDER BY id ASC;
ANSWER
ID | NAME | OD |
---|---|---|
10 |
Sanders |
4 |
20 |
Pernal |
3 |
30 |
Marenghi |
1 |
40 |
O’Brien |
2 |
SELECT id
, name
, ROW_NUMBER()
OVER(ORDER BY ORDER OF s) od
FROM (SELECT *
FROM staff
WHERE id < 50
ORDER BY name DESC
) AS s
ORDER BY id ASC;
ANSWER
ID | NAME | OD |
---|---|---|
10 |
Sanders |
1 |
20 |
Pernal |
2 |
30 |
Marenghi |
4 |
40 |
O’Brien |
3 |
6.1.11. Nulls Processing
When writing the ORDER BY, one can optionally specify whether or not null values should be counted as high or low. The default, for an ascending field is that they are counted as high (i.e. come last), and for a descending field, that they are counted as low:
SELECT id
, years AS yr
, salary
, DENSE_RANK() OVER(ORDER BY years ASC) AS a
, DENSE_RANK() OVER(ORDER BY years ASC NULLS FIRST) AS af
, DENSE_RANK() OVER(ORDER BY years ASC NULLS LAST ) AS al
, DENSE_RANK() OVER(ORDER BY years DESC) AS d
, DENSE_RANK() OVER(ORDER BY years DESC NULLS FIRST) AS df
, DENSE_RANK() OVER(ORDER BY years DESC NULLS LAST ) AS dl
FROM staff
WHERE id < 100
ORDER BY years
, salary;
ANSWER
ID | YR | SALARY | A | AF | AL | D | DF | DL |
---|---|---|---|---|---|---|---|---|
30 |
5 |
77506.75 |
1 |
2 |
1 |
6 |
6 |
5 |
90 |
6 |
38001.75 |
2 |
3 |
2 |
5 |
5 |
4 |
40 |
6 |
78006.00 |
2 |
3 |
2 |
5 |
5 |
4 |
70 |
7 |
76502.83 |
3 |
4 |
3 |
4 |
4 |
3 |
10 |
7 |
98357.50 |
3 |
4 |
3 |
4 |
4 |
3 |
20 |
8 |
78171.25 |
4 |
5 |
4 |
3 |
3 |
2 |
50 |
10 |
80659.80 |
5 |
6 |
5 |
2 |
2 |
1 |
80 |
- |
43504.60 |
6 |
1 |
6 |
1 |
1 |
6 |
60 |
- |
66808.30 |
6 |
1 |
6 |
1 |
1 |
6 |
In general, one null value does not equal another null value. But, as is illustrated above, for purposes of assigning rank, all null values are considered equal. |
6.1.12. Counting Nulls
The DENSE RANK and RANK functions include null values when calculating rankings. By contrast the COUNT DISTINCT statement excludes null values when counting values. Thus, as is illustrated below, the two methods will differ (by one) when they are used get a count of distinct values - if there are nulls in the target data:
SELECT COUNT(DISTINCT years) AS y#1
, MAX(y#) AS y#2
FROM (SELECT years
, DENSE_RANK() OVER(ORDER BY years) AS y#
FROM staff
WHERE id < 100
) AS xxx
ORDER BY 1;
ANSWER
Y#1 | Y#2 |
---|---|
5 |
6 |
6.1.13. OLAP Functions
RANK and DENSE_RANK
The RANK and DENSE_RANK functions enable one to rank the rows returned by a query. The result is of type BIGINT.
The ORDER BY phrase, which is required, is used to both sequence the values, and to tell Db2 when to generate a new value. |
RANK vs. DENSE_RANK
The two functions differ in how they handle multiple rows with the same value:
-
The RANK function returns the number of proceeding rows, plus one. If multiple rows have equal values, they all get the same rank, while subsequent rows get a ranking that counts all of the prior rows. Thus, there may be gaps in the ranking sequence.
-
The DENSE_RANK function returns the number of proceeding distinct values, plus one. If multiple rows have equal values, they all get the same rank. Each change in data value causes the ranking number to be incremented by one.
Usage Notes
-
The ORDER BY expression is mandatory.
-
The PARTITION BY expression is optional.
Compare Functions
The following query illustrates the use of the two functions:
SELECT id
, years
, salary
, RANK()
OVER(ORDER BY years) AS rank#
, DENSE_RANK()
OVER(ORDER BY years) AS dense#
, ROW_NUMBER()
OVER(ORDER BY years) AS row#
FROM staff
WHERE id < 100
AND years < 10
ORDER BY years;
ANSWER
ID | YEARS | SALARY | RANK# | DENSE# | ROW# |
---|---|---|---|---|---|
30 |
5 |
77506.75 |
1 |
1 |
1 |
40 |
6 |
78006.00 |
2 |
2 |
2 |
90 |
6 |
38001.75 |
2 |
2 |
3 |
10 |
7 |
98357.50 |
4 |
3 |
4 |
70 |
7 |
76502.83 |
4 |
3 |
5 |
20 |
8 |
78171.25 |
6 |
4 |
6 |
6.1.14. ORDER BY Usage
The mandatory ORDER BY phrase gives a sequence to the ranking, and also tells Db2 when to start a new rank value. The following query illustrates both uses:
SELECT job AS job
, years AS yr
, id AS id
, name AS name
, RANK() OVER(ORDER BY job ASC ) AS a1
, RANK() OVER(ORDER BY job ASC, years ASC) AS a2
, RANK() OVER(ORDER BY job ASC, years ASC ,id ASC ) AS a3
, RANK() OVER(ORDER BY job DESC) AS d1
, RANK() OVER(ORDER BY job DESC, years DESC) AS d2
, RANK() OVER(ORDER BY job DESC, years DESC, id DESC) AS d3
, RANK() OVER(ORDER BY job ASC, years DESC, id ASC ) AS m1
, RANK() OVER(ORDER BY job DESC, years ASC, id DESC) AS m2
FROM staff
WHERE id < 150
AND years IN (6,7)
AND job > 'L'
ORDER BY job
, years
, id;
ANSWER
JOB | YR | ID | NAME | A1 | A2 | A3 | D1 | D2 | D3 | M1 | M2 |
---|---|---|---|---|---|---|---|---|---|---|---|
Mgr |
6 |
140 |
Fraye |
1 |
1 |
1 |
4 |
6 |
6 |
3 |
4 |
Mgr |
7 |
10 |
Sanders |
1 |
2 |
2 |
4 |
4 |
5 |
1 |
6 |
Mgr |
7 |
100 |
Plotz |
1 |
2 |
3 |
4 |
4 |
4 |
2 |
5 |
Sales |
6 |
40 |
O’Brien |
4 |
4 |
4 |
1 |
2 |
3 |
5 |
2 |
Sales |
6 |
90 |
Koonitz |
4 |
4 |
5 |
1 |
2 |
2 |
6 |
1 |
Sales |
7 |
70 |
Rothman |
4 |
6 |
6 |
1 |
1 |
1 |
4 |
3 |
Observe above that adding more fields to the ORDER BY phrase resulted in more ranking values being generated.
6.1.15. PARTITION Usage
The optional PARTITION phrase lets one rank the data by subsets of the rows returned. In the following example, the rows are ranked by salary within year:
SELECT id
, years AS yr
, salary
, RANK() OVER(PARTITION BY years
ORDER BY salary) AS r1
FROM staff
WHERE id < 80
AND years IS NOT NULL
ORDER BY years
, salary;
ANSWER
ID | YR | SALARY | R1 |
---|---|---|---|
30 |
5 |
77506.75 |
1 |
40 |
6 |
78006.00 |
1 |
70 |
7 |
76502.83 |
1 |
10 |
7 |
98357.50 |
2 |
20 |
8 |
78171.25 |
1 |
50 |
0 |
80659.80 |
1 |
6.1.16. Multiple Rankings
One can do multiple independent rankings in the same query:
SELECT id
, years
, salary
, SMALLINT(RANK() OVER(ORDER BY years ASC)) AS rank_a
, SMALLINT(RANK() OVER(ORDER BY years DESC)) AS rank_d
, SMALLINT(RANK() OVER(ORDER BY id, years)) AS rank_iy
FROM staff
WHERE id < 100
AND years IS NOT NULL
ORDER BY years;
6.1.17. Dumb Rankings
If one wants to, one can do some really dumb rankings. All of the examples below are fairly stupid, but arguably the dumbest of the lot is the last. In this case, the "ORDER BY 1" phrase ranks the rows returned by the constant "one", so every row gets the same rank. By contrast the "ORDER BY 1" phrase at the bottom of the query sequences the rows, and so has valid business meaning:
SELECT id
, years
, name
, salary
, SMALLINT(RANK() OVER(ORDER BY SUBSTR(name,3,2))) AS dumb1
, SMALLINT(RANK() OVER(ORDER BY salary / 1000)) AS dumb2
, SMALLINT(RANK() OVER(ORDER BY years * ID)) AS dumb3
, SMALLINT(RANK() OVER(ORDER BY 1)) AS dumb4
FROM staff
WHERE id < 40
AND years IS NOT NULL
ORDER BY 1;
ID | YEARS | NAME | SALARY | DUMB1 | DUMB2 | DUMB3 | DUMB4 |
---|---|---|---|---|---|---|---|
10 |
7 |
Sanders |
98357.50 |
1 |
3 |
1 |
1 |
20 |
8 |
Pernal |
78171.25 |
3 |
2 |
3 |
1 |
30 |
5 |
Marenghi |
77506.75 |
2 |
1 |
2 |
1 |
6.1.18. Subsequent Processing
The ranking function gets the rank of the value as of when the function was applied. Subsequent processing may mean that the rank no longer makes sense. To illustrate this point, the following query ranks the same field twice. Between the two ranking calls, some rows were removed from the answer set, which has caused the ranking results to differ:
SELECT xxx.*
, RANK()OVER(ORDER BY id) AS r2
FROM (SELECT id
, name
, RANK() OVER(ORDER BY id) AS r1
FROM staff
WHERE id < 100
AND years IS NOT NULL
) AS xxx
WHERE id > 30
ORDER BY id;
ANSWER
ID | NAME | R1 | R2 |
---|---|---|---|
40 |
O’Brien |
4 |
1 |
50 |
Hanes |
5 |
2 |
70 |
Rothman |
6 |
3 |
90 |
Koonitz |
7 |
4 |
6.1.19. Ordering Rows by Rank
One can order the rows based on the output of a ranking function. This can let one sequence the data in ways that might be quite difficult to do using ordinary SQL. For example, in the following query the matching rows are ordered so that all those staff with the highest salary in their respective department come first, followed by those with the second highest salary, and so on. Within each ranking value, the person with the highest overall salary is listed first:
SELECT id
, RANK() OVER(PARTITION BY dept
ORDER BY salary DESC) AS r1
, salary
, dept AS dp
FROM staff
WHERE id < 80
AND years IS NOT NULL
ORDER BY r1 ASC
, salary DESC;
ANSWER
ID | R1 | SALARY | DP |
---|---|---|---|
10 |
1 |
98357.50 |
20 |
50 |
1 |
80659.80 |
15 |
40 |
1 |
78006.00 |
38 |
20 |
2 |
78171.25 |
20 |
30 |
2 |
77506.75 |
38 |
70 |
2 |
76502.83 |
15 |
Here is the same query, written without the ranking function:
SELECT id
, (SELECT COUNT(*)
FROM staff s2
WHERE s2.id < 80
AND s2.years IS NOT NULL
AND s2.dept = s1.dept
AND s2.salary >= s1.salary
) AS R1
, salary
, dept AS dp
FROM staff s1
WHERE id < 80
AND years IS NOT NULL
ORDER BY r1 ASC
, salary DESC;
ANSWER
ID | R1 | SALARY | DP |
---|---|---|---|
10 |
1 |
98357.50 |
20 |
50 |
1 |
80659.80 |
15 |
40 |
1 |
78006.00 |
38 |
20 |
2 |
78171.25 |
20 |
30 |
2 |
77506.75 |
38 |
70 |
2 |
76502.83 |
15 |
The above query has all of the failings that were discussed at the beginning of this chapter:
-
The nested table expression has to repeat all of the predicates in the main query, and have predicates that define the ordering sequence. Thus it is hard to read.
-
The nested table expression will (inefficiently) join every matching row to all prior rows.
6.1.20. Selecting the Highest Value
The ranking functions can also be used to retrieve the row with the highest value in a set of rows. To do this, one must first generate the ranking in a nested table expression, and then query the derived field later in the query. The following statement illustrates this concept by getting the person, or persons, in each department with the highest salary:
SELECT id
, salary
, dept AS dp
FROM
(SELECT s1.*
, RANK() OVER(PARTITION BY dept
ORDER BY salary DESC) AS r1
FROM staff s1
WHERE id < 80
AND years IS NOT NULL
) AS xxx
WHERE r1 = 1
ORDER BY dp;
ANSWER
ID | SALARY | DP |
---|---|---|
50 |
80659.80 |
15 |
10 |
98357.50 |
20 |
40 |
78006.00 |
38 |
Here is the same query, written using a correlated sub-query:
SELECT id
, salary
, dept AS dp
FROM staff s1
WHERE id < 80
AND years IS NOT NULL
AND NOT EXISTS
(SELECT *
FROM staff s2
WHERE s2.id < 80
AND s2.years IS NOT NULL
AND s2.dept = s1.dept
AND s2.salary > s1.salary)
ORDER BY dp;
ID | SALARY | DP |
---|---|---|
50 |
80659.80 |
15 |
10 |
98357.50 |
20 |
40 |
78006.00 |
38 |
Here is the same query, written using an uncorrelated sub-query:
SELECT id
, salary
, dept AS dp
FROM staff
WHERE id < 80
AND years IS NOT NULL
AND (dept, salary) IN
(SELECT dept, MAX(salary)
FROM staff
WHERE id < 80
AND years IS NOT NULL
GROUP BY dept)
ORDER BY dp;
ANSWER
ID | SALARY | DP |
---|---|---|
50 |
80659.80 |
15 |
10 |
98357.50 |
20 |
40 |
78006.00 |
38 |
Arguably, the first query above (i.e. the one using the RANK function) is the most elegant of the series because it is the only statement where the basic predicates that define what rows match are written once. With the two sub-query examples, these predicates have to be repeated, which can often lead to errors.
6.1.21. ROW_NUMBER
The ROW_NUMBER function lets one number the rows being returned. The result is of type BIGINT. A syntax diagram follows. Observe that unlike with the ranking functions, the ORDER BY is not required.
ORDER BY Usage
You don’t have to provide an ORDER BY when using the ROW_NUMBER function, but not doing so can be considered to be either brave or foolish, depending on one’s outlook on life. To illustrate this issue, consider the following query:
SELECT id
, name
, ROW_NUMBER() OVER() AS r1
, ROW_NUMBER() OVER(ORDER BY id) AS r2
FROM staff
WHERE id < 50
AND years IS NOT NULL
ORDER BY id;
ANSWER
ID | NAME | R1 | R2 |
---|---|---|---|
10 |
Sanders |
1 |
1 |
20 |
Pernal |
2 |
2 |
30 |
Marenghi |
3 |
3 |
40 |
O’Brien |
4 |
4 |
In the above example, both ROW_NUMBER functions return the same set of values, which happen to correspond to the sequence in which the rows are returned. In the next query, the second ROW_NUMBER function purposely uses another sequence:
SELECT id
, name
, ROW_NUMBER() OVER() AS r1
, ROW_NUMBER() OVER(ORDER BY name) AS r2
FROM staff
WHERE id < 50
AND years IS NOT NULL
ORDER BY id;
ANSWER
ID | NAME | R1 | R2 |
---|---|---|---|
10 |
Sanders |
4 |
4 |
20 |
Pernal |
3 |
3 |
30 |
Marenghi |
1 |
1 |
40 |
O’Brien |
2 |
2 |
Observe that changing the second function has had an impact on the first. Now lets see what happens when we add another ROW_NUMBER function:
SELECT id
,name
,ROW_NUMBER() OVER() AS r1
,ROW_NUMBER() OVER(ORDER BY ID) AS r2
,ROW_NUMBER() OVER(ORDER BY NAME) AS r3
FROM staff
WHERE id < 50
AND years IS NOT NULL
ORDER BY id;
ANSWER
ID | NAME | R1 | R2 | R3 |
---|---|---|---|---|
10 |
Sanders |
1 |
1 |
4 |
20 |
Pernal |
2 |
2 |
3 |
30 |
Marenghi |
3 |
3 |
1 |
40 |
O’Brien |
4 |
4 |
2 |
Observe that now the first function has reverted back to the original sequence.
When not given an explicit ORDER BY, the ROW_NUMBER function, may create a value in any odd order. Usually, the sequence will reflect the order in which the rows are returned - but not always. |
6.1.22. PARTITION Usage
The PARTITION phrase lets one number the matching rows by subsets of the rows returned. In the following example, the rows are both ranked and numbered within each JOB:
SELECT job
, years
, id
, name
, ROW_NUMBER() OVER(PARTITION BY job ORDER BY years) AS row#
, RANK() OVER(PARTITION BY job ORDER BY years) AS rn1#
, DENSE_RANK() OVER(PARTITION BY job ORDER BY years) AS rn2#
FROM staff
WHERE id < 150
AND years IN (6,7)
AND job > 'L'
ORDER BY job,years;
ANSWER
JOB | YEARS | ID | NAME | ROW# | RN1# | RN2# |
---|---|---|---|---|---|---|
Mgr |
6 |
140 |
Fraye |
1 |
1 |
1 |
Mgr |
7 |
10 |
Sanders |
2 |
2 |
2 |
Mgr |
7 |
100 |
Plotz |
3 |
2 |
2 |
Sales |
6 |
40 |
O’Brien |
1 |
1 |
1 |
Sales |
6 |
90 |
Koonitz |
2 |
1 |
1 |
Sales |
7 |
70 |
Rothman |
3 |
3 |
2 |
One problem with the above query is that the final ORDER BY that sequences the rows does not identify a unique field (e.g. ID). Consequently, the rows can be returned in any sequence within a given JOB and YEAR. Because the ORDER BY in the ROW_NUMBER function also fails to identify a unique row, this means that there is no guarantee that a particular row will always give the same row number. For consistent results, ensure that both the ORDER BY phrase in the function call, and at the end of the query, identify a unique row. And to always get the rows returned in the desired row-number sequence, these phrases must be equal.
Selecting "n" Rows
To query the output of the ROW_NUMBER function, one has to make a nested temporary table that contains the function expression. In the following example, this technique is used to limit the query to the first three matching rows:
SELECT *
FROM
(SELECT id
, name
, ROW_NUMBER() OVER(ORDER BY id) AS r
FROM staff
WHERE id < 100
AND years IS NOT NULL
) AS xxx
WHERE r <= 3
ORDER BY id;
ANSWER
ID | NAME | R |
---|---|---|
10 |
Sanders |
1 |
20 |
Pernal |
2 |
30 |
Marenghi |
3 |
In the next query, the FETCH FIRST "n" ROWS notation is used to achieve the same result:
SELECT id
, name
, ROW_NUMBER() OVER(ORDER BY id) AS r
FROM staff
WHERE id < 100
AND years IS NOT NULL
ORDER BY id
FETCH FIRST 3 ROWS ONLY;
ANSWER
ID | NAME | R |
---|---|---|
10 |
Sanders |
1 |
20 |
Pernal |
2 |
30 |
Marenghi |
3 |
So far, the ROW_NUMBER and the FETCH FIRST notations seem to be about the same. But the former is much more flexible. To illustrate, the next query gets the 3rd through 6th rows:
SELECT *
FROM
(SELECT id
, name
, ROW_NUMBER() OVER(ORDER BY id) AS r
FROM staff
WHERE id < 200
AND years IS NOT NULL
) AS xxx
WHERE r BETWEEN 3 AND 6
ORDER BY id;
ANSWER
ID | NAME | R |
---|---|---|
30 |
Marenghi |
3 |
40 |
O’Brien |
4 |
50 |
Hanes |
5 |
70 |
Rothman |
6 |
In the next query we get every 5th matching row - starting with the first:
SELECT *
FROM
(SELECT id
, name
, ROW_NUMBER() OVER(ORDER BY id) AS r
FROM staff
WHERE id < 200
AND years IS NOT NULL
) AS xxx
WHERE (r - 1) = ((r - 1) / 5) * 5
ORDER BY id;
ANSWER
ID | NAME | R |
---|---|---|
10 |
Sanders |
1 |
70 |
Rothman |
6 |
140 |
Fraye |
11 |
190 |
Sneider |
16 |
In the next query we get the last two matching rows:
SELECT *
FROM
(SELECT id
, name
, ROW_NUMBER() OVER(ORDER BY id DESC) AS r
FROM staff
WHERE id < 200
AND years IS NOT NULL
) AS xxx
WHERE r <= 2
ORDER BY id;
ANSWER
ID | NAME | R |
---|---|---|
180 |
Abrahams |
2 |
190 |
Sneider |
1 |
6.1.23. Selecting "n" or more Rows
Imagine that one wants to fetch the first "n" rows in a query. This is easy to do, and has been illustrated above. But imagine that one also wants to keep on fetching if the following rows have the same value as the "nth". In the next example, we will get the first three matching rows in the STAFF table, ordered by years of service. However, if the 4th row, or any of the following rows, has the same YEAR as the 3rd row, then we also want to fetch them.
The query logic goes as follows:
-
Select every matching row in the STAFF table, and give them all both a row-number and a ranking value. Both values are assigned according to the order of the final output. Do all of this work in a nested table expression.
-
Select from the nested table expression where the rank is three or less.
The query relies on the fact that the RANK function (see RANK and DENSE_RANK) assigns the lowest common row number to each row with the same ranking:
SELECT *
FROM
(SELECT years
, id
, name
, RANK() OVER(ORDER BY years) AS rnk
, ROW_NUMBER() OVER(ORDER BY years, id) AS row
FROM staff
WHERE id < 200
AND years IS NOT NULL
) AS xxx
WHERE rnk <= 3
ORDER BY years
, id;
ANSWER
YEARS | ID | NAME | RNK | ROW |
---|---|---|---|---|
3 |
180 |
Abrahams |
1 |
1 |
4 |
170 |
Kermisch |
2 |
2 |
5 |
30 |
Marenghi |
3 |
3 |
5 |
110 |
Ngan |
3 |
4 |
The type of query illustrated above can be extremely useful in certain business situations. To illustrate, imagine that one wants to give a reward to the three employees that have worked for the company the longest. Stopping the query that lists the lucky winners after three rows are fetched can get one into a lot of trouble if it happens that there are more than three employees that have worked for the company for the same number of years.
6.1.24. Selecting "n" Rows - Efficiently
Sometimes, one only wants to fetch the first "n" rows, where "n" is small, but the number of matching rows is extremely large. In this section, we will discus how to obtain these "n" rows efficiently, which means that we will try to fetch just them without having to process any of the many other matching rows. Below is an invoice table. Observe that we have defined the INV# field as the primary key, which means that Db2 will build a unique index on this column:
CREATE TABLE invoice
( inv# INTEGER NOT NULL
, customer# INTEGER NOT NULL
, sale_date DATE NOT NULL
, sale_value DECIMAL(9,2) NOT NULL
, CONSTRAINT ctx1 PRIMARY KEY (inv#)
, CONSTRAINT ctx2 CHECK(inv# >= 0));
The next SQL statement will insert 1,000,000 rows into the above table. After the rows are inserted a REORG and RUNSTATS is run, so the optimizer can choose the best access path.
INSERT INTO invoice
WITH temp (n, m) AS
(VALUES
(INTEGER(0), RAND(1))
UNION ALL
SELECT n+1, RAND()
FROM temp
WHERE n+1 < 1000000
)
SELECT n AS inv#
, INT(m * 1000) AS customer#
, DATE('2000-11-01') + (m*40) DAYS AS sale_date
, DECIMAL((m * m * 100),8,2) AS sale_value
FROM temp;
Imagine we want to retrieve the first five rows (only) from the above table. Below are several queries that get this result. For each query, the elapsed time, as measured by Db2BATCH, is provided. Below we use the "FETCH FIRST n ROWS" notation to stop the query at the 5th row. The query scans the primary index to get first five matching rows, and thus is cheap:
SELECT s.*
FROM invoice s
ORDER BY inv#
FETCH FIRST 5 ROWS ONLY;
The next query is essentially the same as the prior, but this time we tell Db2 to optimize the query for fetching five rows. Nothing has changed, and all is good:
SELECT s.*
FROM invoice s
ORDER BY inv#
FETCH FIRST 5 ROWS ONLY
OPTIMIZE FOR 5 ROWS;
The next query is the same as the first, except that it invokes the ROW_NUMBER function to passively sequence the output. This query also uses the primary index to identify the first five matching rows, and so is cheap:
SELECT s.*
, ROW_NUMBER() OVER() AS row#
FROM invoice s
ORDER BY inv#
FETCH FIRST 5 ROWS ONLY;
The next query is the same as the previous. It uses a nested-table-expression, but no action is taken subsequently, so this code is ignored:
SELECT *
FROM
(SELECT s.*
, ROW_NUMBER() OVER() AS row#
FROM invoice s
) xxx
ORDER BY inv#
FETCH FIRST 5 ROWS ONLY;
All of the above queries processed only five matching rows. The next query will process all one million matching rows in order to calculate the ROW_NUMBER value, which is on no particular column. It will cost:
SELECT *
FROM
(SELECT s.*
, ROW_NUMBER() OVER() AS row#
FROM invoice s
) xxx
WHERE row# <= 5
ORDER BY inv#;
In the above query the "OVER()" phrase told Db2 to assign row numbers to each row. In the next query we explicitly provide the ROW_NUMBER with a target column, which happens to be the same at the ORDER BY sequence, and is also an indexed column. Db2 can use all this information to confine the query to the first "n" matching rows:
SELECT *
FROM
(SELECT s.*
, ROW_NUMBER() OVER(ORDER BY inv#) AS row#
FROM invoice s
) xxx
WHERE row# <= 5
ORDER BY inv#;
Changing the above predicate to: "WHERE row# BETWEEN 1 AND 5" will get the same answer, but use a much less efficient access path. |
One can also use recursion to get the first "n" rows. One begins by getting the first matching row, and then uses that row to get the next, and then the next, and so on (in a recursive join), until the required number of rows have been obtained. In the following example, we start by getting the row with the MIN invoice-number. This row is then joined to the row with the next to lowest invoice-number, which is then joined to the next, and so on. After five such joins, the cycle is stopped and the result is selected:
WITH temp (inv#, c#, sd, sv, n) AS
(SELECT inv.*
, 1
FROM invoice inv
WHERE inv# =
(SELECT MIN(inv#)
FROM invoice)
UNION ALL
SELECT new.*
, n + 1
FROM temp old
, invoice new
WHERE old.inv# < new.inv#
AND old.n < 5
AND new.inv# =
(SELECT MIN(xxx.inv#)
FROM invoice xxx
WHERE xxx.inv# > old.inv#)
)
SELECT *
FROM temp;
The above technique is nice to know, but it has several major disadvantages:
-
It is not exactly easy to understand.
-
It requires that all primary predicates (e.g. get only those rows where the sale-value is greater than $10,000) be repeated four times. In the above example there are none, which is unusual in the real world.
-
It quickly becomes both very complicated and quite inefficient when the sequencing value is made up of multiple fields. In the above example, we sequenced by the INV# column, but imagine if we had used the sale-date, sale-value, and customer-number.
-
It is extremely vulnerable to inefficient access paths. For example, if instead of joining from one (indexed) invoice-number to the next, we joined from one (non-indexed) customer-number to the next, the query would run forever.
In this section we have illustrated how minor changes to the SQL syntax can cause major changes in query performance. But to illustrate this phenomenon, we used a set of queries with 1,000,000 matching rows. In situations where there are far fewer matching rows, one can reasonably assume that this problem is not an issue.
6.1.25. FIRST_VALUE and LAST_VALUE
The FIRST_VALUE and LAST_VALUE functions get first or last value in the (moving) window of matching rows.
Usage Notes
-
An expression value must be provided in the first set of parenthesis. Usually this will be a column name, but any valid scalar expression is acceptable.
-
The PARTITION BY expression is optional.
-
The ORDER BY expression is optional.
-
See Window Definition for notes on how to define a moving-window of rows to process.
-
If no explicit moving-window definition is provided, the default window size is between UNBOUNDED PRECEDING (of the partition and/or range) and the CURRENT ROW. This can sometimes cause logic errors when using the LAST_VALUE function. The last value is often simply the current row. To get the last matching value within the partition and/or range, set the upper bound to UNBOUNDED FOLLOWING.
-
If IGNORE NULLS is specified, null values are ignored, unless all values are null, in which case the result is null. The default is RESPECT NULLS.
Examples
The following query illustrates the basics. The first matching name (in ID order) within each department is obtained:
SELECT dept
, id
, name
, FIRST_VALUE(name) OVER(PARTITION BY dept
ORDER BY id) AS frst
FROM staff
WHERE dept <= 15
AND id > 160
ORDER BY dept ,id;
ANSWER
DEPT | ID | NAME | FRST |
---|---|---|---|
10 |
210 |
Lu |
Lu |
10 |
240 |
Daniels |
Lu |
10 |
260 |
Jones |
Lu |
15 |
170 |
Kermisch |
Kermisch |
The next uses various ordering schemas and moving-window sizes the get a particular first or last value (within a department):
SELECT dept
, id
, comm
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm) AS first1
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm NULLS FIRST) AS first2
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm NULLS LAST) AS first3
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm NULLS LAST
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS first4
, LAST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm) AS last1
, LAST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm NULLS FIRST
ROWS UNBOUNDED FOLLOWING) AS last2
FROM
staff
WHERE id < 100
AND dept < 30
ORDER BY dept ,comm;
ANSWER
DEPT | ID | COMM | FIRST1 | FIRST2 | FIRST3 | FIRST4 | LAST1 | LAST2 |
---|---|---|---|---|---|---|---|---|
15 |
70 |
1152.00 |
1152.00 |
- |
1152.00 |
1152.00 |
1152.00 |
1152.00 |
15 |
50 |
- |
1152.00 |
- |
1152.00 |
1152.00 |
- |
1152.00 |
20 |
80 |
128.20 |
128.20 |
- |
128.20 |
128.20 |
128.20 |
612.45 |
20 |
20 |
612.45 |
128.20 |
- |
128.20 |
128.20 |
612.45 |
612.45 |
20 |
10 |
- |
128.20 |
- |
128.20 |
612.45 |
- |
612.45 |
The next query illustrates what happens when one, or all, of the matching values are null:
SELECT dept
, id
, comm
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm) AS rn_lst
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm NULLS LAST) AS rn_ls2
, FIRST_VALUE(comm) OVER(PARTITION BY dept
ORDER BY comm NULLS FIRST) AS rn_fst
, FIRST_VALUE(comm,'IGNORE NULLS') OVER(PARTITION BY dept
ORDER BY comm NULLS FIRST) AS in_fst
FROM staff
WHERE id BETWEEN 20 AND 160
AND dept <= 20
ORDER BY dept ,comm;
ANSWER
DEPT | ID | COMM | RN_LST | RN_LS2 | RN_FST | IN_FST |
---|---|---|---|---|---|---|
10 |
160 |
|||||
15 |
110 |
206.60 |
206.60 |
206.60 |
- |
206.60 |
15 |
70 |
1152.00 |
206.60 |
206.60 |
- |
206.60 |
15 |
50 |
- |
206.60 |
206.60 |
||
20 |
80 |
128.20 |
128.20 |
128.20 |
128.20 |
128.20 |
20 |
20 |
612.45 |
128.20 |
128.20 |
128.20 |
128.20 |
6.1.26. LAG and LEAD
The LAG, and LEAD functions get the previous or next value from the (moving) window of matching rows:
-
LAG: Get previous value. Return null if at first value.
-
LEAD: Get next value. Return null if at last value.
Usage Notes
-
An expression value must be provided in the first set of parenthesis. Usually this will be a column name, but any valid scalar expression is acceptable.
-
The PARTITION BY expression is optional.
-
The ORDER BY expression is mandatory.
-
The default OFFSET value is 1. A value of zero refers to the current row. An offset that is outside of the moving-window returns null.
-
If IGNORE NULLS is specified, a default (override) value must also be provided.
Examples
The next query uses the LAG function to illustrate what happens when one messes around with the ORDER BY expression:
SELECT dept
, id
, comm
, LAG(comm) OVER(PARTITION BY dept ORDER BY comm) AS lag1
, LAG(comm,0) OVER(PARTITION BY dept ORDER BY comm) AS lag2
, LAG(comm,2) OVER(PARTITION BY dept ORDER BY comm) AS lag3
, LAG(comm,1,-1,'IGNORE NULLS') OVER(PARTITION BY dept ORDER BY comm) AS lag4
, LEAD(comm) OVER(PARTITION BY dept ORDER BY comm) AS led1
FROM staff
WHERE id BETWEEN 20 AND 160
AND dept <= 20
ORDER BY dept ,comm;
ANSWER
DEPT | ID | COMM | LAG1 | LAG2 | LAG3 | LAG4 | LED1 |
---|---|---|---|---|---|---|---|
10 |
160 |
- |
- |
- |
- |
-1.00 |
- |
15 |
110 |
206.60 |
- |
206.60 |
- |
-1.00 |
1152.00 |
15 |
70 |
1152.00 |
206.60 |
1152.00 |
- |
206.60 |
- |
15 |
50 |
- |
1152.00 |
- |
206.60 |
1152.00 |
- |
20 |
80 |
128.20 |
- |
128.20 |
- |
-1.00 |
612.45 |
20 |
20 |
612.45 |
128.20 |
612.45 |
- |
128.20 |
- |
6.1.27. Aggregation
The various aggregation functions let one do cute things like get cumulative totals or running averages. In some ways, they can be considered to be extensions of the existing Db2 column functions. The output type is dependent upon the input type.
Syntax Notes
Guess what - this is a complicated function. Be aware of the following:
-
Any Db2 column function (e.g. AVG, SUM, COUNT), except ARRAY_AGG, can use the aggregation function.
-
The OVER() usage aggregates all of the matching rows. This is equivalent to getting the current row, and also applying a column function (e.g. MAX, SUM) against all of the matching rows.
-
The PARTITION BY expression is optional.
-
The ORDER BY expression is mandatory if the aggregation is confined to a set of rows or range of values. Otherwise it is optional. If a RANGE is specified (see ROWS vs. RANGE for definition), then the ORDER BY expression must be a single value that allows subtraction.
-
If an ORDER BY phrase is provided, but neither a RANGE nor ROWS is specified, then the aggregation is done from the first row to the current row.
-
See Window Definition for notes on how to define a moving-window of rows to process.
Basic Usage
In its simplest form, with just an "OVER()" phrase, an aggregation function works on all of the matching rows, running the column function specified. Thus, one gets both the detailed data, plus the SUM, or AVG, or whatever, of all the matching rows. In the following example, five rows are selected from the STAFF table. Along with various detailed fields, the query also gets sum summary data about the matching rows:
SELECT id
, name
, salary
, SUM(salary) OVER() AS sum_sal
, AVG(salary) OVER() AS avg_sal
, MIN(salary) OVER() AS min_sal
, MAX(salary) OVER() AS max_sal
, COUNT(*) OVER() AS #rows
FROM staff
WHERE id < 30
ORDER BY id;
ANSWER
ID | NAME | SALARY | SUM_SAL | AVG_SAL | MIN_SAL | MAX_SAL | #ROWS |
---|---|---|---|---|---|---|---|
10 |
Sanders |
98357.50 |
254035.50 |
84678.50 |
77506.75 |
98357.50 |
3 |
20 |
Pernal |
78171.25 |
254035.50 |
84678.50 |
77506.75 |
98357.50 |
3 |
30 |
Marenghi |
77506.75 |
254035.50 |
84678.50 |
77506.75 |
98357.50 |
3 |
An aggregation function with just an "OVER()" phrase is logically equivalent to one that has an ORDER BY on a field that has the same value for all matching rows. To illustrate, in the following query, the four aggregation functions are all logically equivalent:
SELECT id
, name
, salary
, SUM(salary) OVER() AS sum1
, SUM(salary) OVER(ORDER BY id * 0) AS sum2
, SUM(salary) OVER(ORDER BY 'ABC') AS sum3
, SUM(salary) OVER(ORDER BY 'ABC'
RANGE BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) AS sum4
FROM staff
WHERE id < 60
ORDER BY id;
ANSWER
ID | NAME | SALARY | SUM1 | SUM2 | SUM3 | SUM4 |
---|---|---|---|---|---|---|
10 |
Sanders |
98357.50 |
412701.30 |
412701.30 |
412701.30 |
412701.30 |
20 |
Pernal |
78171.25 |
412701.30 |
412701.30 |
412701.30 |
412701.30 |
30 |
Marenghi |
77506.75 |
412701.30 |
412701.30 |
412701.30 |
412701.30 |
40 |
O’Brien |
78006.00 |
412701.30 |
412701.30 |
412701.30 |
412701.30 |
50 |
Hanes |
80659.80 |
412701.30 |
412701.30 |
412701.30 |
412701.30 |
6.1.28. ORDER BY Usage
The ORDER BY phrase (see Order By, Group By, and Having for syntax) has two main purposes:
-
It provides a set of values to do aggregations on. Each distinct value gets a new result.
-
It gives a direction to the aggregation function processing (i.e. ASC or DESC).
In the next query, various aggregations are run on the DEPT field, which is not unique, and on the DEPT and NAME fields combined, which are unique (for these rows). Both ascending and descending aggregations are illustrated. Observe that the ascending fields sum or count up, while the descending fields sum down. Also observe that each aggregation field gets a separate result for each new set of rows, as defined in the ORDER BY phrase:
SELECT dept
, name
, salary
, SUM(salary) OVER(ORDER BY dept) AS sum1
, SUM(salary) OVER(ORDER BY dept DESC) AS sum2
, SUM(salary) OVER(ORDER BY dept, NAME) AS sum3
, SUM(salary) OVER(ORDER BY dept DESC, name DESC) AS sum4
, COUNT(*) OVER(ORDER BY dept) AS rw1
, COUNT(*) OVER(ORDER BY dept, NAME) AS rw2
FROM staff
WHERE id < 60
ORDER BY dept
, name;
ANSWER
DEPT | NAME | SALARY | SUM1 | SUM2 | SUM3 | SUM4 | RW1 | RW2 |
---|---|---|---|---|---|---|---|---|
15 |
Hanes |
80659.80 |
80659.80 |
412701.30 |
80659.80 |
412701.30 |
1 |
1 |
20 |
Pernal |
78171.25 |
257188.55 |
332041.50 |
158831.05 |
332041.50 |
3 |
2 |
20 |
Sanders |
98357.50 |
257188.55 |
332041.50 |
257188.55 |
253870.25 |
3 |
3 |
38 |
Marenghi |
77506.75 |
412701.30 |
155512.75 |
334695.30 |
155512.75 |
5 |
4 |
38 |
O’Brien |
78006.00 |
412701.30 |
155512.75 |
412701.30 |
78006.00 |
5 |
5 |
6.1.29. ROWS Usage
The ROWS phrase (see Window Definition for syntax) is used to limit the aggregation function to a subset of the matching rows. The set of rows to process are defined thus:
-
No ORDER BY: UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING.
-
ORDER BY only: UNBOUNDED PRECEDING to CURRENT ROW.
-
No BETWEEN: CURRENT ROW to "n" preceding/following row.
-
BETWEEN stmt: From "n" to "n" preceding/following row. The end-point must be greater than or equal to the starting point.
The following query illustrates these concepts:
SELECT id
, years
, AVG(years) OVER() AS "p_f"
, AVG(years) OVER(ORDER BY id ROWS
BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "p_f"
, AVG(years) OVER(ORDER BY id) AS "p_c"
, AVG(years) OVER(ORDER BY id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "p_c"
, AVG(years) OVER(ORDER BY id
ROWS UNBOUNDED PRECEDING) AS "p_c"
, AVG(years) OVER(ORDER BY id
ROWS UNBOUNDED FOLLOWING) AS "c_f"
, AVG(years) OVER(ORDER BY id
ROWS 2 FOLLOWING) AS "c_2"
, AVG(years) OVER(ORDER BY id
ROWS 1 PRECEDING) AS "1_c"
, AVG(years) OVER(ORDER BY id
ROWS BETWEEN 1 FOLLOWING AND 2 FOLLOWING) AS "1_2"
FROM staff
WHERE dept IN (15,20)
AND id > 20
AND years > 1
ORDER BY id;
ID | YEARS | p_f | p_f | p_c | p_c | p_c | c_f | c_2 | 1_c | 1_2 |
---|---|---|---|---|---|---|---|---|---|---|
50 |
10 |
6 |
6 |
10 |
10 |
10 |
6 |
7 |
10 |
6 |
70 |
7 |
6 |
6 |
8 |
8 |
8 |
6 |
5 |
8 |
4 |
110 |
5 |
6 |
6 |
7 |
7 |
7 |
5 |
5 |
6 |
6 |
170 |
4 |
6 |
6 |
6 |
6 |
6 |
6 |
6 |
4 |
8 |
190 |
8 |
6 |
6 |
6 |
6 |
6 |
8 |
8 |
6 |
- |
6.1.30. RANGE Usage
The RANGE phrase limits the aggregation result to a range of numeric values - defined relative to the value of the current row being processed (see Window Definition). The range is obtained by taking the value in the current row (defined by the ORDER BY expression) and adding to and/or subtracting from it, then seeing what other matching rows are in the range.
When using a RANGE, only one expression can be specified in the ORDER BY, and that expression must be numeric. |
In the following example, the RANGE function adds to and/or subtracts from the DEPT field. For example, in the function that is used to populate the RG10 field, the current DEPT value is checked against the preceding DEPT values. If their value is within 10 digits of the current value, the related YEARS field is added to the SUM:
SELECT dept
, name
, years
, SMALLINT(SUM(years) OVER(ORDER BY dept
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) AS row1
, SMALLINT(SUM(years) OVER(ORDER BY dept
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)) AS row2
, SMALLINT(SUM(years) OVER(ORDER BY dept
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)) AS rg01
, SMALLINT(SUM(years) OVER(ORDER BY dept
RANGE BETWEEN 10 PRECEDING AND CURRENT ROW)) AS rg10
, SMALLINT(SUM(years) OVER(ORDER BY dept
RANGE BETWEEN 20 PRECEDING AND CURRENT ROW)) AS rg20
, SMALLINT(SUM(years) OVER(ORDER BY dept
RANGE BETWEEN 10 PRECEDING AND 20 FOLLOWING)) AS rg11
, SMALLINT(SUM(years) OVER(ORDER BY dept
RANGE BETWEEN CURRENT ROW AND 20 FOLLOWING)) AS rg99
FROM staff
WHERE id < 100
AND years IS NOT NULL
ORDER BY dept
, name;
ANSWER
DEPT | NAME | YEARS | ROW1 | ROW2 | RG01 | RG10 | RG20 | RG11 | RG99 |
---|---|---|---|---|---|---|---|---|---|
15 |
Hanes |
10 |
10 |
10 |
17 |
17 |
17 |
32 |
32 |
15 |
Rothman |
7 |
17 |
17 |
17 |
17 |
17 |
32 |
32 |
20 |
Pernal |
8 |
15 |
25 |
15 |
32 |
32 |
43 |
26 |
20 |
Sanders |
7 |
15 |
22 |
15 |
32 |
32 |
43 |
26 |
38 |
Marengh |
5 |
12 |
20 |
11 |
11 |
26 |
17 |
17 |
38 |
O’Brien |
6 |
11 |
18 |
11 |
11 |
26 |
17 |
17 |
42 |
Koonitz |
6 |
12 |
17 |
6 |
17 |
17 |
17 |
6 |
Note the difference between the ROWS as RANGE expressions:
-
The ROWS expression refers to the "n" rows before and/or after (within the partition), as defined by the ORDER BY.
-
The RANGE expression refers to those before and/or after rows (within the partition) that are within an arithmetic range of the current row.
6.1.31. BETWEEN vs. ORDER BY
The BETWEEN predicate in an ordinary SQL statement is used to get those rows that have a value between the specified low-value (given first) and the high value (given last). Thus the predicate "BETWEEN 5 AND 10" may find rows, but the predicate "BETWEEN 10 AND 5" will never find any. The BETWEEN phrase in an aggregation function has a similar usage in that it defines the set of rows to be aggregated. But it differs in that the answer depends upon the related ORDER BY sequence, and a non-match returns a null value, not no-rows. Below is some sample SQL. Observe that the first two aggregations are ascending, while the last two are descending:
SELECT id
, name
, SMALLINT(SUM(id) OVER(ORDER BY id ASC
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) AS apc
, SMALLINT(SUM(id) OVER(ORDER BY id ASC
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)) AS acf
, SMALLINT(SUM(id) OVER(ORDER BY id DESC
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)) AS dpc
, SMALLINT(SUM(id) OVER(ORDER BY id DESC
ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING)) AS dcf
FROM staff
WHERE id < 50
AND years IS NOT NULL
ORDER BY id;
ANSWER
ID | NAME | APC | ACF | DPC | DCF |
---|---|---|---|---|---|
10 |
Sanders |
10 |
30 |
30 |
10 |
20 |
Pernal |
30 |
50 |
50 |
30 |
30 |
Marenghi |
50 |
70 |
70 |
50 |
40 |
O’Brien |
70 |
40 |
40 |
70 |
The following table illustrates the processing sequence in the above query. Each BETWEEN is applied from left to right, while the rows are read either from left to right (ORDER BY ID ASC) or right to left (ORDER BY ID DESC):
ASC id (10,20,30,40)
READ ROWS, LEFT to RIGHT | 1ST-ROW | 2ND-ROW | 3RD-ROW | 4TH-ROW |
---|---|---|---|---|
1 PRECEDING to CURRENT ROW |
10=10 |
10+20=30 |
20+30=40 |
30+40=70 |
CURRENT ROW to 1 FOLLOWING |
10+20=30 |
20+30=50 |
30+40=70 |
40 =40 |
DESC id (40,30,20,10)
READ ROWS, RIGHT to LEFT | 1ST-ROW | 2ND-ROW | 3RD-ROW | 4TH-ROW |
---|---|---|---|---|
1 PRECEDING to CURRENT ROW |
20+10=30 |
30+20=50 |
40+30=70 |
40 =40 |
CURRENT ROW to 1 FOLLOWING |
10 =10 |
20+10=30 |
30+20=50 |
40+30=70 |
Preceding row is always on LEFT of current row. Following row_ is always on RIGHT of current row. |
The BETWEEN predicate, when used in an ordinary SQL statement, is not affected by the sequence of the input rows. But the BETWEEN phrase, when used in an aggregation function, is affected by the input sequence. |
7. Scalar Functions
=== Introduction
Scalar functions act on a single row at a time. In this section we shall list all of the ones that come with Db2 and look in detail at some of the more interesting ones. Refer to the SQL Reference for information on those functions not fully described here.
7.1. Sample Data
The following self-defined view will be used throughout this section to illustrate how some of the following functions work. Observe that the view has a VALUES expression that defines the contents- three rows and nine columns.
CREATE VIEW scalar
(d1, f1, s1, c1, v1, ts1, dt1, tm1, tc1) AS
WITH temp1 (n1, c1, t1) AS
( VALUES ( -2.4
, 'ABCDEF'
, '1996-04-22-23.58.58.123456')
, ( +0.0
, 'ABCD '
, '1996-08-15-15.15.15.151515')
, ( +1.8
, 'AB'
, '0001-01-01-00.00.00.000000')
)
SELECT DECIMAL(n1,3,1)
, DOUBLE(n1)
, SMALLINT(n1)
, CHAR(c1,6)
, VARCHAR(RTRIM(c1),6)
, TIMESTAMP(t1)
, DATE(t1)
, TIME(t1)
, CHAR(t1)
FROM temp1;
Below are the view contents:
D1 | F1 | S1 | C1 | V1 | TS1 | DT1 | TM1 | TC1 |
---|---|---|---|---|---|---|---|---|
-2.4 |
-2.4e+000 |
-2 |
ABCDEF |
ABCDEF |
1996-04-22-23.58.58.123456 |
1996-04-22 |
23:58:58 |
1996-04-22-23.58.58.123456 |
0.0 |
0.0e+000 |
0 |
ABCD |
ABCD |
1996-08-15-15.15.15.151515 |
1996-08-15 |
15:15:15 |
1996-08-15-15.15.15.151515 |
1.8 |
1.8e+000 |
1 |
AB |
AB |
0001-01-01-00.00.00.000000 |
0001-01-01 |
00:00:00 |
0001-01-01-00.00.00.000000 |
7.2. Scalar Functions, Definitions
7.2.1. ABS or ABSVAL
Returns the absolute value of a number (e.g. -0.4 returns + 0.4). The output field type will equal the input field type (i.e. double input returns double output).
SELECT d1 AS d1
, ABS(D1) AS d2
, f1 AS f1
, ABS(f1) AS F2
FROM scalar;
ANSWER (float output shortened)
D1 | D2 | F1 | F2 |
---|---|---|---|
-2.4 |
2.4 |
-2.400e+0 |
2.400e+00 |
0.0 |
0.0 |
0.000e+0 |
0.000e+00 |
1.8 |
1.8 |
1.800e+0 |
1.800e+00 |
7.2.2. ACOS
Returns the arccosine of the argument as an angle expressed in radians. The output format is double.
7.2.3. ADD_DAYS
Returns a datetime value that represents the first argument plus a specified number of days.
7.2.4. ADD_HOURS
Returns a timestamp value that represents the first argument plus a specified number of hours.
7.2.5. ADD_MINUTES
Returns a timestamp value that represents the first argument plus a specified number of minutes.
7.2.6. ADD_MONTHS
Returns a datetime value that represents expression plus a specified number of months.
7.2.7. ADD_SECONDS
Returns a timestamp value that represents the first argument plus a specified number of seconds and fractional seconds.
7.2.8. ADD_YEARS
Returns a datetime value that represents the first argument plus a specified number of years.
7.2.9. AGE
Returns a numeric value that represents the number of full years, full months, and full days between the current timestamp and the argument.
7.2.10. ARRAY_DELETE
Deletes elements from an array.
7.2.11. ARRAY_FIRST
Returns the minimum array index value of the array.
7.2.12. ARRAY_LAST
Returns the maximum array index value of the array.
7.2.13. ARRAY_NEXT
Returns the next larger array index value for an array relative to the specified array index argument.
7.2.14. ARRAY_PRIOR
Returns the next smaller array index value for an array relative to the specified array index argument.
7.2.15. ASCII
Returns the ASCII code value of the leftmost input character. Valid input types are any valid character type up to 1 MEG. The output type is integer.
SELECT c1
, ASCII(c1) AS ac1
, ASCII(SUBSTR(c1,2)) AS ac2
FROM scalar
WHERE c1 = 'ABCDEF';
ANSWER
C1 | AC1 | AC2 |
---|---|---|
ABCDEF |
65 |
66 |
The CHR function is the inverse of the ASCII function.
7.2.16. ASIN
Returns the arcsine of the argument as an angle expressed in radians. The output format is double.
7.2.17. ATAN
Returns the arctangent of the argument as an angle expressed in radians. The output format is double.
7.2.18. ATAN2
Returns the arctangent of x and y coordinates, specified by the first and second arguments, as an angle, expressed in radians. The output format is double.
7.2.19. ATANH
Returns the hyperbolic acrctangent of the argument, where the argument is and an angle expressed in radians. The output format is double.
7.2.20. BIGINT
Converts the input value to bigint (big integer) format. The input can be either numeric or character. If character, it must be a valid representation of a number.
WITH temp (big) AS
(VALUES BIGINT(1)
UNION ALL
SELECT big * 256
FROM temp
WHERE big < 1E16)
SELECT big
FROM temp;
ANSWER
BIG |
---|
1 |
256 |
65536 |
16777216 |
4294967296 |
1099511627776 |
281474976710656 |
72057594037927936 |
Converting certain float values to both BIGINT and decimal will result in different values being returned (see below). Both results are arguably correct, it is simply that the two functions use different rounding methods:
WITH temp (f1) AS
(VALUES FLOAT(1.23456789)
UNION ALL
SELECT f1 * 100
FROM temp
WHERE f1 < 1E18
)
SELECT f1 AS float1
, DEC(f1,19) AS decimal1
, BIGINT(f1) AS bigint1
FROM temp;
FLOAT1 | DECIMAL1 | BIGINT1 |
---|---|---|
+1.23456789000000E+000 |
1. |
1 |
+1.23456789000000E+002 |
123. |
123 |
+1.23456789000000E+004 |
12345. |
12345 |
+1.23456789000000E+006 |
1234567. |
1234567 |
+1.23456789000000E+008 |
123456789. |
123456788 |
+1.23456789000000E+010 |
12345678900. |
12345678899 |
+1.23456789000000E+012 |
1234567890000. |
1234567889999 |
+1.23456789000000E+014 |
123456789000000. |
123456788999999 |
+1.23456789000000E+016 |
12345678900000000. |
12345678899999996 |
+1.23456789000000E+018 |
1234567890000000000. |
1234567889999999488 |
See Floating Point Numbers for a discussion on floating-point number manipulation.
7.2.21. BINARY
Returns a fixed-length binary string representation of a string of any data type.
7.2.22. BIT Functions
There are five BIT functions:
-
BITAND 1 if both arguments are 1.
-
BITANDNOT Zero if bit in second argument is 1, otherwise bit in first argument.
-
BITOR 1 if either argument is 1.
-
BITXOR 1 if both arguments differ.
-
BITNOT Returns opposite of the single argument.
The arguments can be SMALLINT (16 bits), INTEGER (32 bits), BIGINT (64 bits), or DECFLOAT (113 bits). The result is the same as the argument with the largest data type.
Negative numbers can be used in bit manipulation. For example the SMALLINT value -1 will have all 16 bits set to "1" (see example on Updating BIT Values). As their name implies, the BIT functions can be used to do bit-by-bit comparisons between two numbers:
WITH temp1 (b1, b2) AS
(VALUES ( 1, 0) ,( 0, 1)
, ( 0, 0) ,( 1, 1)
, ( 2, 1) ,(15,-7)
, (15, 7) ,(-1, 1)
, (15,63) ,(63,31)
, (99,64) ,( 0,-2)
)
, temp2 (b1, b2) AS
(SELECT SMALLINT(b1)
, SMALLINT(b2)
FROM temp1)
SELECT b1
, b2
, HEX(b1) AS "hex1"
, HEX(b2) AS "hex2"
, BITAND(b1,b2) AS "and"
, BITANDNOT(b1,b2) AS "ano"
, BITOR(b1,b2) AS "or"
, BITXOR(b1,b2) AS "xor"
FROM temp2;
ANSWER
B1 | B2 | hex1 | hex2 | and | ano | or | xor |
---|---|---|---|---|---|---|---|
1 |
0 |
0100 |
0000 |
0 |
1 |
1 |
1 |
0 |
1 |
0000 |
0100 |
0 |
0 |
1 |
1 |
0 |
0 |
0000 |
0000 |
0 |
0 |
0 |
0 |
1 |
1 |
0100 |
0100 |
1 |
0 |
1 |
0 |
2 |
1 |
0200 |
0100 |
0 |
2 |
3 |
3 |
15 |
-7 |
0F00 |
F9FF |
9 |
6 |
-1 |
-10 |
15 |
7 |
0F00 |
0700 |
7 |
8 |
15 |
8 |
-1 |
1 |
FFFF |
0100 |
1 |
-2 |
-1 |
-2 |
15 |
63 |
0F00 |
3F00 |
15 |
0 |
63 |
48 |
63 |
31 |
3F00 |
1F00 |
31 |
32 |
63 |
32 |
99 |
64 |
6300 |
4000 |
64 |
35 |
99 |
35 |
0 |
-2 |
0000 |
FEFF |
0 |
0 |
-2 |
-2 |
Displaying BIT Values
It can sometimes be hard to comprehend what a given BASE 10 value is in BIT format. To help, the following user-defined-function converts SMALLINT numbers to BIT values:
CREATE FUNCTION bitdisplay(inparm SMALLINT)
RETURNS CHAR(16)
BEGIN ATOMIC
DECLARE outstr VARCHAR(16);
DECLARE inval INT;
IF inparm >= 0 THEN
SET inval = inparm;
ELSE
SET inval = INT(65536) + inparm;
END IF;
SET outstr = '';
WHILE inval > 0 DO
SET outstr = STRIP(CHAR(MOD(inval,2))) || outstr;
SET inval = inval / 2;
END WHILE;
RETURN RIGHT(REPEAT('0',16) || outstr,16);
END!
Below is an example of the above function in use:
WITH temp1 (b1) AS
(VALUES (32767)
, (16383)
, ( 4096)
, ( 118)
, ( 63)
, ( 16)
, ( 2)
, ( 1)
, ( 0)
, ( -1)
, ( -2)
, ( -3)
, ( -64)
, (-32768)
)
, temp2 (b1) AS
(SELECT SMALLINT(b1)
FROM temp1
)
SELECT b1
, HEX(b1) AS "hex1"
, BITDISPLAY(b1) AS "bit_display"
FROM temp2;
ANSWER
B1 | hex1 | bit_display |
---|---|---|
32767 |
FF7F |
0111111111111111 |
16383 |
FF3F |
0011111111111111 |
4096 |
0010 |
0001000000000000 |
118 |
7600 |
0000000001110110 |
63 |
3F00 |
0000000000111111 |
16 |
1000 |
0000000000010000 |
2 |
0200 |
0000000000000010 |
1 |
0100 |
0000000000000001 |
0 |
0000 |
0000000000000000 |
-1 |
FFFF |
1111111111111111 |
-2 |
FEFF |
1111111111111110 |
-3 |
FDFF |
1111111111111101 |
-64 |
C0FF |
1111111111000000 |
-32768 |
0080 |
1000000000000000 |
Updating BIT Values
Use the BITXOR function to toggle targeted bits in a value. Use the BITANDNOT function to clear the same targeted bits. To illustrate, the next query uses these two functions to toggle and clear the last four bits, because the second parameter is 15, which is b"1111":
WITH temp1 (b1) AS
( VALUES (32767)
, (21845)
, (4096)
, (0)
, (-1)
, (-64)
)
, temp2 (b1, s15) AS
(SELECT SMALLINT(b1)
, SMALLINT(15)
FROM temp1
)
SELECT b1
, BITDISPLAY(b1) AS "b1_display"
, BITXOR(b1,s15) AS "xor"
, BITDISPLAY(BITXOR(b1,s15)) AS "xor_display"
, BITANDNOT(b1,s15) AS "andnot"
, BITDISPLAY(BITANDNOT(b1,s15)) AS "andnot_display"
FROM temp2;
Below is the answer:
B1 | b1_display | xor | xor_display | andnot | andnot_display |
---|---|---|---|---|---|
32767 |
0111111111111111 |
32752 |
0111111111110000 |
32752 |
0111111111110000 |
21845 |
0101010101010101 |
21850 |
0101010101011010 |
21840 |
0101010101010000 |
4096 |
0001000000000000 |
4111 |
0001000000001111 |
4096 |
0001000000000000 |
0 |
0000000000000000 |
15 |
0000000000001111 |
0 |
0000000000000000 |
-1 |
1111111111111111 |
-16 |
1111111111110000 |
-16 |
1111111111110000 |
-64 |
1111111111000000 |
-49 |
1111111111001111 |
-64 |
1111111111000000 |
The next query illustrate the use of the BITAND function to return those bits that match both parameters, and the BITNOT function to toggle all bits:
WITH temp1 (b1) AS
(VALUES (32767)
, (21845)
, (4096)
, (0)
, (-1)
, (-64)
)
, temp2 (b1, s15) AS
(SELECT SMALLINT(b1)
, SMALLINT(15)
FROM temp1
)
SELECT b1
, BITDISPLAY(b1) AS "b1_display"
, BITAND(b1,s15) AS "and"
, BITDISPLAY(BITAND(b1,s15)) AS "and_display"
, BITNOT(b1) AS "not"
, BITDISPLAY(BITNOT(b1)) AS "not_display"
FROM temp2;
Below is the answer:
B1 | b1_display | and | and_display | not | not_display |
---|---|---|---|---|---|
32767 |
0111111111111111 |
15 |
0000000000001111 |
-32768 |
1000000000000000 |
21845 |
0101010101010101 |
5 |
0000000000000101 |
-21846 |
1010101010101010 |
4096 |
0001000000000000 |
0 |
0000000000000000 |
-4097 |
1110111111111111 |
0 |
0000000000000000 |
0 |
0000000000000000 |
-1 |
1111111111111111 |
-1 |
1111111111111111 |
15 |
0000000000001111 |
0 |
0000000000000000 |
-64 |
1111111111000000 |
0 |
0000000000000000 |
63 |
0000000000111111 |
7.2.23. BLOB
Converts the input (1st argument) to a blob. The output length (2nd argument) is optional.
7.2.24. BOOLEAN
Returns the actual Boolean value that corresponds to a non-Boolean representation of a Boolean value.
7.2.25. BTRIM
Removes the characters that are specified in a trim string from the beginning and end of a source string.
7.2.26. CARDINALITY
Returns a value of type BIGINT that is the number of elements in an array.
7.2.27. CEIL or CEILING
Returns the next smallest integer value that is greater than or equal to the input (e.g. 5.045 returns 6.000). The output field type will equal the input field type.
SELECT d1
, CEIL(d1) AS d2
, f1
, CEIL(f1) AS f2
FROM scalar;
ANSWER (float output shortened)
D1 | D2 | F1 | F2 |
---|---|---|---|
-2.4 |
-2. |
-2.400E+0 |
-2.000E+0 |
0.0 |
0. |
+0.000E+0 |
+0.000E+0 |
1.8 |
2. |
+1.800E+0 |
+2.000E+0 |
Usually, when Db2 converts a number from one format to another, any extra digits on the right are truncated, not rounded. For example, the output of INTEGER(123.9) is 123. Use the CEIL or ROUND functions to avoid truncation. |
7.2.28. CHAR
The CHAR function has a multiplicity of uses. The result is always a fixed-length character value, but what happens to the input along the way depends upon the input type:
-
For character input, the CHAR function acts a bit like the SUBSTR function, except that it can only truncate starting from the left-most character. The optional length parameter, if provided, must be a constant or keyword.
-
Date-time input is converted into an equivalent character string. Optionally, the external format can be explicitly specified (i.e. ISO, USA, EUR, JIS, or LOCAL).
-
Integer and double input is converted into a left-justified character string.
-
Decimal input is converted into a right-justified character string with leading zeros. The format of the decimal point can optionally be provided. The default decimal point is a dot. The '+' and '-' symbols are not allowed as they are used as sign indicators.
Below are some examples of the CHAR function in action:
SELECT name
, CHAR(name, 3)
, comm
, CHAR(comm)
, CHAR(comm,'@')
FROM staff
WHERE id BETWEEN 80 AND 100
ORDER BY id;
ANSWER
NAME | 2 | COMM | 4 | 5 |
---|---|---|---|---|
James |
Jam |
128.20 |
00128.20 |
00128@20 |
Koonitz |
Koo |
1386.70 |
01386.70 |
01386@70 |
Plotz |
Plo |
- |
- |
- |
The CHAR function treats decimal numbers quite differently from integer and real numbers. In particular, it right-justifies the former (with leading zeros), while it left-justifies the latter (with trailing blanks). The next example illustrates this point:
WITH temp1 (n) AS
(VALUES (3)
UNION ALL
SELECT n * n
FROM temp1
WHERE n < 9000
)
SELECT n AS int
, CHAR(INT(n)) AS char_int
, CHAR(FLOAT(n)) AS char_flt
, CHAR(DEC(n)) AS char_dec
FROM temp1;
ANSWER
INT | CHAR_INT | CHAR_FLT | CHAR_DEC |
---|---|---|---|
3 |
3 |
3.0E0 |
00000000003. |
9 |
9 |
9.0E0 |
00000000009. |
81 |
81 |
8.1E1 |
00000000081. |
6561 |
6561 |
6.561E3 |
00000006561. |
43046721 |
43046721 |
4.3046721E7 |
00043046721. |
Negative numeric input is given a leading minus sign. This messes up the alignment of digits in the column (relative to any positive values). In the following query, a leading blank is put in front of all positive numbers in order to realign everything:
WITH temp1 (n1, n2) AS
(VALUES (SMALLINT(+3)
, SMALLINT(-7))
UNION ALL
SELECT n1 * n2
, n2
FROM temp1
WHERE n1 < 300
)
SELECT n1
, CHAR(n1) AS i1
, CASE
WHEN n1 < 0 THEN CHAR(n1)
ELSE '+' CONCAT CHAR(n1)
END AS i2
, CHAR(DEC(n1)) AS d1
, CASE
WHEN n1 < 0 THEN CHAR(DEC(n1))
ELSE '+' CONCAT CHAR(DEC(n1))
END AS d2
FROM temp1;
ANSWER
N1 | I1 | I2 | D1 | D2 |
---|---|---|---|---|
3 |
3 |
+3 |
00003. |
+00003. |
-21 |
-21 |
-21 |
-00021. |
-00021. |
147 |
147 |
+147 |
00147. |
+00147. |
-1029 |
-1029 |
-1029 |
-01029. |
-01029. |
7203 |
7203 |
+7203 |
07203. |
+07203. |
Both the I2 and D2 fields above will have a trailing blank on all negative values - that was added during the concatenation operation. The RTRIM function can be used to remove it.
DATE-TIME Conversion
The CHAR function can be used to convert a date-time value to character. If the input is not a timestamp, the output layout can be controlled using the format option:
-
ISO: International Standards Organization.
-
USA: American.
-
EUR: European, which is usually the same as ISO.
-
JIS: Japanese Industrial Standard, which is usually the same as ISO.
-
LOCAL: Whatever your computer is set to.
Below are some DATE examples:
SELECT CHAR(CURRENT DATE,ISO) AS iso
, CHAR(CURRENT DATE,EUR) AS eur
, CHAR(CURRENT DATE,JIS) AS jis
, CHAR(CURRENT DATE,USA) AS usa
FROM sysibm.sysdummy1;
ANSWER
ISO | EUR | JIS | USA |
---|---|---|---|
2005-11-30 |
30.11.2005 |
2005-11-30 |
11/30/2005 |
Below are some TIME examples: .CHAR function examples - time value
SELECT CHAR(CURRENT TIME,ISO) AS iso
, CHAR(CURRENT TIME,EUR) AS eur
, CHAR(CURRENT TIME,JIS) AS jis
, CHAR(CURRENT TIME,USA) AS usa
FROM sysibm.sysdummy1;
ANSWER
ISO | EUR | JIS | USA |
---|---|---|---|
19.42.21 |
19.42.21 |
19:42:21 |
07:42 PM |
A timestamp cannot be formatted to anything other than ISO output:
SELECT CHAR(CURRENT TIMESTAMP) AS TS
FROM sysibm.sysdummy1;
ANSWER
TS |
---|
2005-11-30-19.42.21.873002 |
Converting a date or time value to character, and then ordering the set of matching rows can result in unexpected orders. See CASE Checks in Wrong Sequence for details. |
CHAR vs. DIGITS - A Comparison
Numeric input can be converted to character using either the DIGITS or the CHAR function, though the former does not support float. Both functions work differently, and neither gives perfect output. The CHAR function doesn’t properly align up positive and negative numbers, while the DIGITS function looses both the decimal point and sign indicator:
SELECT d2
, CHAR(d2) AS cd2
, DIGITS(d2) AS dd2
FROM
(SELECT DEC(d1, 4, 1) AS d2
FROM scalar
) AS xxx
ORDER BY 1;
ANSWER
D2 | CD2 | DD2 |
---|---|---|
-2.4 |
-002.4 |
0024 |
0.0 |
000.0 |
0000 |
1.8 |
001.8 |
0018 |
Neither the DIGITS nor the CHAR function do a great job of converting numbers to characters. See Convert Number to Character for some user-defined functions that can be used instead. |
7.2.29. CHARACTER_LENGTH
This function is similar to the LENGTH function, except that it works with different encoding schemas. The result is an integer value that is the length of the input string.
WITH temp1 (c1) AS
(VALUES (CAST('ÁÉÌ' AS VARCHAR(10)))
)
SELECT c1 AS C1
, LENGTH(c1) AS LEN
, OCTET_LENGTH(c1) AS OCT
, CHAR_LENGTH(c1,OCTETS) AS L08
, CHAR_LENGTH(c1,CODEUNITS16) AS L16
, CHAR_LENGTH(c1,CODEUNITS32) AS L32
FROM temp1;
ANSWER
C1 | LEN | OCT | L08 | L16 | L32 |
---|---|---|---|---|---|
ÁÉÌ |
6 |
6 |
6 |
3 |
3 |
7.2.30. CHR
Converts integer input in the range 0 through 255 to the equivalent ASCII character value. An input value above 255 returns 255. The ASCII function (see above) is the inverse of the CHR function.
SELECT 'A' AS "c"
, ASCII('A') AS "c>n"
, CHR(ASCII('A')) AS "c>n>c"
, CHR(333) AS "nl"
FROM staff
WHERE id = 10;
ANSWER
C | C>N | C>N>C | NL |
---|---|---|---|
A |
65 |
A |
ÿ |
At present, the CHR function has a bug that results in it not returning a null value when the input value is greater than 255. |
7.2.31. CLOB
Converts the input (1st argument) to a CLOB. The output length (2nd argument) is optional. If the input is truncated during conversion, a warning message is issued. For example, in the following example the second CLOB statement will induce a warning for the first two lines of input because they have non-blank data after the third byte:
SELECT c1
, CLOB(c1) AS cc1
, CLOB(c1,3) AS cc2
FROM scalar;
ANSWER
C1 | CC1 | CC2 |
---|---|---|
ABCDEF |
ABCDEF |
ABC |
ABCD |
ABCD |
ABC |
AB |
AB |
AB |
The Db2BATCH command processor dies a nasty death whenever it encounters a CLOB field in the output. If possible, convert to VARCHAR first to avoid this problem. |
7.2.32. COALESCE
Returns the first non-null value in a list of input expressions (reading from left to right). Each expression is separated from the prior by a comma. All input expressions must be compatible. VALUE is a synonym for COALESCE.
SELECT id
, comm
, COALESCE(comm, 0)
FROM staff
WHERE id < 30
ORDER BY id;
ANSWER
ID | COMM | 3 |
---|---|---|
10 |
- |
0.00 |
20 |
612.45 |
612.45 |
A CASE expression can be written to do exactly the same thing as the COALESCE function. The following SQL statement shows two logically equivalent ways to replace nulls:
WITH temp1(c1,c2,c3) AS
(VALUES (CAST(NULL AS SMALLINT)
, CAST(NULL AS SMALLINT)
, CAST(10 AS SMALLINT))
)
SELECT COALESCE(c1, c2, c3) AS cc1
, CASE
WHEN c1 IS NOT NULL THEN c1
WHEN c2 IS NOT NULL THEN c2
WHEN c3 IS NOT NULL THEN