home..

Redshift: User-Defined Functions

aws

Today I learned, I can create functions in SQL and use them over and over for repetitive tasks while building or developing pipelines. I’ve been working with AWS Redshift for the past few months. Redshift is a very powerful and cost-effective cloud data warehousing solution provided by Amazon. It has its own Redshift SQL dialect which is a variant of PostgreSQL.

What are User-defined functions?

Redshift offers various inbuilt functions like sysdate or coalesce which can be executed whenever required. As the name suggestes, User-defined functions are the functions defined by a user to extend the inbuilt capabilities of RedshiftSQL. The defined functions are stored inside the database and can be used by any user with sufficient permissions and credentials.

Language support for UDFs

UDFs in general can be created using SQL select statements. Redshift UDFs can also be created using Python functions. While UDFs written in SQL are better at performance and less prone to errors, Python UDF support comes with advantage of inbuilt libraries. In addition to Python STL, modules like pandas, numpy, scipy and dateutil are part of RedshiftSQL.

Redshift also supports AWS Lambda UDFs which are custom functions defined in Lambda. Sometimes, there might be some limitations for SQL and python UDFs. This can be handled using Lambda UDFs since these can be created in any language that is supported by AWS Lambda. This includes Go, Java, C#, Node.js etc. So, most of the languages can be integrated into SQL statemnts in Redshift and can be utilized over and over by all the users.

SQL UDF

SQL UDF integrates SQL select clause that runs whenever it is called and returns a scalar. A function can be created using create function statement. It has the following parameters - input arguments along with data types, one return argument, a SQL select clause. A weird point to note here is - the arguments cannot be named inside the UDF select clause. They must be refered as $1, $2 and so on based on the order of the arguments mentioned in function creation. The select clause should not include any other clauses like from, where, limit etc. instead these should be used in the SQL statament that calls the UDF.

Structure of SQL UDF

create function f_sample_udf(<arg1_datatype>, <arg2_datatype>,...)
    returns <return_datatype>
stable
as $$

    select ...

$$ language sql;

Example - SQL UDF

Example 1:

create function f_clean_string_number(varchar)
    returns float
stable
as $$

    select cast(regex_replace($1, ',', '') as float)

$$ language sql;

Example 2:

create function f_percentage(float, float)
    returns float
stable 
as $$

    select case when $2 > $1 then ($1/$2)*100
        else ($2/$1)*100
    end

$$ language sql;

Now the above UDF can be used as following:

select f_percentage(acheived, monthly_target) from finance; 

Python UDF

Python UDFs are also similar to SQL UDFs but python code can be used inside a function directly. The language definition for a UDF for using python inside it is PLPYTHONU. Python UDF requires argument names inside the select clause. Modules like pandas and numpy can also be imported apart from the python standard library. The input and return data types suppoorted by Python UDFs are SMALLINT, INTEGER, BIGINT, DECIMAL, REAL, DOUBLE PRECISION, BOOLEAN, CHAR, VARCHAR, DATE, and TIMESTAMP. For a mapping of Amazon Redshift data types to Python data types, refer to Python UDF datatypes.

Structure of Python UDF

create function f_sample_udf(arg1 <arg1_datatype>, arg2 <arg2_datatype>, ...))
returns <return_datatype>
stable 
as $$

    <PYTHON CODE>

$$ language plpythonu;

Example - Python UDF

Example 1:

create function f_clean_string_number(txt varchar)
    returns varchar
stable
as $$

    return float(txt.replace(',',''))

$$ language plpythonu;

Example 2:

create function f_percentage(x float, y float)
    returns float
stable 
as $$

    if y > x:
        return (x/y)*100
    else:
        return (y/x)*100

$$ language plpythonu;

Now the above UDF can be used as following:

select f_percentage(acheived, monthly_target) from finance; 

UDF Security and Permissions

grant usage on language plpythonu to group <group_name>;

revoke usage on language sql from group <group_name>;
revoke execute on function f_sample_function(a float, b float) from PUBLIC;

grant execute on function f_sample_function(a float, b float) to group udf_devs;

Note

Further Information

© 2025 Jithendra Yenugula