Redshift: Loops and Conditional Expressions
January 2023 (1059 Words, 6 Minutes)
When we need to iterate through the data or implement logic conditionally, SQL’s looping and conditional expressions come in particularly handy. In this post, I’ll explain how to use them in Redshift SQL.
Redshift SQL - PL/pgsql
Redshift implements stored procedures using the PostgreSQL procedural language (PL/pgsql). Control structures in pgsqlpl can be utilised to build vital business logics for flexible data manipulation. Redshift can also use majority of plpgsql statements, but not all of them. For future reference, I’m listing the Redshift supported control structures in this post.
Control Structures in Redshift
Redshift’s control structures are usually divided into two categories:
- Conditional
- Iterative
Conditional Control Structures
A task can be carried out using conditional control structures under specific circumstances. In conditional control structures, IF statements are commonly utilised. These operate similarly to general-purpose programming languages. If a value in a table needs to be updated based on a conditional flow, for instance, IF statements can be utilised in this situation. In Redshift’s plpgsql, there are three types of conditional control structures available:
- IF - THEN - END IF
- IF - THEN - ELSE - END IF
- IF - THEN - ELSIF - THEN - ELSE - END IF
CASE vs IF in SQL
CASE expressions and IF statements are similar. CASE can be substituted with IF, but not the other way around. While CASE is an expression that is used to return data values, IF is a statement that is frequently used for logic flow. Although it has several limitations, CASE is essentially like a function with built-in IF logic. CASE is limited in comparison to what an IF statement may accomplish.
Syntax for IF statements
Note: The following syntaxes are taken from the Redshift documentation.
IF-THEN-END IF
IF <boolean-expression> THEN
statements
END IF;
IF-THEN-ELSE-END IF
IF <boolean-expression> THEN
statements
ELSE
statements
END IF;
IF-THEN-ELSIF-THEN-ELSE-END IF
IF <boolean-expression> THEN
statements
[ ELSIF <boolean-expression> THEN
statements
[ ELSIF <boolean-expression> THEN
statements
...] ]
[ ELSE
statements ]
END IF;
Example
Let’s say your teammate inserted a few rows into “open_jobs” table with wrong “profession”s as either ‘Data Analyst’ or ‘Data Scientist’ instead of ‘Data Engineer’. You have to update the “profession” of those rows to ‘Data Engineer’. If no wrong rows are there, then you have to re-insert the rows of ‘Data Engineer’ profession. If rows are there, raise a notice.
CREATE OR REPLACE PROCEDURE data_cleaning()
LANGUAGE plpgsql
AS $$
BEGIN
IF EXISTS (SELECT FROM open_jobs WHERE profession='Data Analyst') THEN
UPDATE open_jobs SET profession='Data Engineer' WHERE profession='Data Analyst';
ELSIF EXISTS (SELECT FROM open_jobs WHERE profession='Data Scientist') THEN
UPDATE open_jobs SET profession='Data Engineer' WHERE profession='Data Scientist';
ELSIF NOT EXISTS (SELECT FROM open_jobs WHERE profession='Data Engineer') THEN
INSERT INTO open_jobs SELECT * FROM stage_jobs WHERE profession='Data Engineer';
ELSE
RAISE INFO 'The Data Engineer open job rows were correctly inserted into the table'
END;
$$;
Iterative Control Structures
The execution flow of plpgsql commands can be managed with the use of iterative control statements. The following iterative control structures are supported by Redshift:
- LOOP
- EXIT
- FOR
- WHILE
- CONTINUE
LOOP and EXIT are keywords used in plpgsql procedures for starting and ending the loop respectively. WHILE and FOR loops operate similarly to general-purpose programming languages.
Lets see simple examples using LOOPS in Redshift SQL.
LOOP - EXIT - WHEN
CREATE OR REPLACE PROCEDURE simple_loop()
LANGUAGE plpgsql
AS $$
DECLARE
count INTEGER := 0;
BEGIN
<<print_waiting>>
LOOP
RAISE INFO 'WAITING: %', count;
cnt = cnt + 1;
EXIT print_waiting WHEN (cnt > 10);
END LOOP;
END;
$$;
output:
CALL redshift_simple_loop();
INFO: WAITING: 0
INFO: WAITING: 1
INFO: WAITING: 2
INFO: WAITING: 3
INFO: WAITING: 4
INFO: WAITING: 5
INFO: WAITING: 6
INFO: WAITING: 7
INFO: WAITING: 8
INFO: WAITING: 9
INFO: WAITING: 10
WHILE LOOP
WHILE revenue > 0 AND cost > 0 LOOP
-- some computations here
END LOOP;
FOR LOOP
CREATE OR REPLACE PROCEDURE records_info_message()
LANGUAGE plpgsql
AS $$
DECLARE
r record;
BEGIN
for f in select title, year
from film_details
where director='RAJAMOULI'
LOOP
RAISE INFO '%(year: %)', f.title, f.director;;
END LOOP;
END;
$$;
output:
INFO: RRR(year: 2022)
INFO: Bahubali 2(year: 2017)
INFO: Bahubali(year: 2015)
Fibonacci using LOOPs in UDFs
I explained about using SQL UDFs with Redshift in my previous blog. We may execute sophisticated logics like computing the Fibonacci number for each value in a table column in SQL by using LOOPs in UDFs.
CREATE FUNCTION f_fibonacci(int)
returns float stable AS $$
DECLARE n INTEGER:= $1;
fib integer := 0;
counter integer := 0 ;
i integer := 0 ;
j integer := 1 ;
BEGIN
IF (n < 1) THEN
fib := 0 ;
END IF;
LOOP
EXIT WHEN counter = n ;
counter := counter + 1 ;
SELECT j,i + j INTO i,j ;
END LOOP;
fib := i;
raise notice '%', fib;
END;
$$ language sql;
Further Information: