Useful Teradata Queries

Every now and then I find myself need reminding one of this handy queries. While googling them is easy enough, it is easier to have it all in my own blog.

Here is some of my Teradata Queries that I often need for my work, and perhaps you need for your work.

Create Table with Select

I often forgotten the complete syntax of this very useful query.

--to create one on the database
CREATE MULTISET TABLE db.table_name AS (
	SELECT *
	FROM source_data
	WHERE conditions = value
) WITH DATA PRIMARY INDEX (ID, SNAP_DATE);

-- and the volatile table version
CREATE MULTISET VOLATILE TABLE vt_table_name AS (
	SELECT *
	FROM source_data
	WHERE conditions = value
) WITH DATA PRIMARY INDEX (ID, SNAP_DATE)
ON COMMIT PRESERVE ROWS;

--and to create one without select
CREATE VOLATILE TABLE vt_table (
       ACCT_ID VARCHAR(10)
       ,SNAP_D DATE
) ON COMMIT PRESERVE ROWS;

Qualify Rank or Row_number Statement

Qualify Rank() or Row_number() can be used as a filter condition, or as a created column/value inside the select statement. If this select query is created in a table, the values then can be used in a where clause.

--to use it as a filter
SELECT *
FROM [SOURCE_DATA]
WHERE [CONDITION] = 1
QUALIFY RANK() OVER (PARTITION BY [ID] ORDER BY [ORDERED_COLUMN]) = 1
--or
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [ORDERED_COLUMN]) = 1


--to create rank or row_number as value.
SELECT *
    ,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [ORDERED_COLUMN]) AS ROW_NUM
FROM [SOURCE_DATA]
WHERE [CONDITION] = 1

If you wonder the difference between Rank() and Row_number(), the difference is that the same value can have the same rank number. For example if there are two smallest numbers, then the first two values will both have Rank value 1, then the third number will have rank = 3, and so on. While Row_number will randomly choose and assigned a value for each row, and thus no duplicate value.

Analytical Functions Using Partition By

This is very useful to quickly get a summary or average for a group of data without running the query with multiple where clause or copying to Excel.

--To get total amount per particular value (like customer, or division, etc)
SUM(amount) over (PARTITION BY CUST_ID) As Total_amount_per_customer

--Same as above, but value will be ordered by Account ID
SUM(amount) over (PARTITION BY CUST_ID ORDER BY ACCT_ID) As Total_amount_per_customer

--Same as above, but value will be Cumulative
SUM(amount) over (PARTITION BY CUST_ID ORDER BY ACCT_ID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) As Total_amount_per_customer

Preceding and Following

This syntax is useful to get the value of the previous row. However this works only if the previous value is not calculated value. If it is, then recursive query is the way.

This is useful, for example, when trying to compare the difference between previous and current period balances.

SUM(amount) over (PARTITION BY CUST_ID ORDER BY ACCT_ID ROWS BETWEEN UNBOUNDED PRECEDING AND PRECEDING ROW ) As prev_amount

SUM(amount) over (PARTITION BY CUST_ID ORDER BY ACCT_ID ROWS BETWEEN UNBOUNDED FOLLOWING AND FOLLOWING ROW ) As next_amount

Recursive SQL

This is by far the most often googled query I have ever done.

Recursive Query can be used to setup tree-based/node-based hierarchy, or to get the calculated value of previous row.

In the example below, the code used calculated value of previous level in the current level.

WITH RECURSIVE NSEED AS
(
	SELECT ID
		,10+1 AS VAL
		,LEVEL   --predefined value
	FROM [SOURCE_TABLE]
	WHERE LEVEL = 1
	
	UNION ALL
	
	SELECT CURR.ID
		,PREV.VAL + 1 AS VAL_NEW --new value based on previous level
		,CURR.LEVEL
	FROM NSEED AS PREV
	JOIN [SOURCE_TABLE] AS CURR
		ON PREV.ID = CURR.ID
		AND PREV.LEVEL + 1 = CURR.LEVEL --set the next hierarchy level
)
SELECT *
FROM NSEED

In the above example, column [level] is already predefined. But what if you don’t have such column? You can simply create them based on a rank and populate them with the qualify statement as mentioned above.

Check Table Skew Factor

In Teradata, tables with high skew factor eats a large chunk of storage. It is a good idea to check the efficiency of your table from time to time.

SELECT 
    TABLENAME,
    SUM(CURRENTPERM) /(1024*1024) AS CURRENTPERM, 
    (100 - (AVG(CURRENTPERM)/MAX(CURRENTPERM)*100)) AS SKEWFACTOR 
FROM DBC.TABLESIZE 
WHERE DATABASENAME= <your database> 
    AND TABLENAME =<your table>  
GROUP BY 1;

Find the table owner

If you are like me, often don’t remember the tables that you created, or simply want to find out who is the creator of a certain table, then this query is for you.

SELECT  DatabaseName,
        TableName,
        CreateTimeStamp,
        LastAlterTimeStamp,
        CreatorName
FROM    DBC.TablesV
WHERE   TableKind = 'T'  --T for table, V for View
and     DatabaseName = '[Change to your DB Name]'
ORDER BY    TableName;

Update Query

Simple update query, always forgot the syntax.

--simple basic update query
UPDATE <your table>
SET <your column> = <your value>
WHERE <condition> = 1 --this is most important, otherwise the query will update all records

--update query if you need to join
UPDATE TGT
FROM <your table to update> as TGT
     ,<source table> as SRC
SET <your column> = <your value>
WHERE TGT.ID = SRC.ID --join criteria
     AND TGT.<conditions> = 1

Alter table to Add New Column

Simple enough, code to add new column on existing table.

ALTER TABLE <your table>
ADD <new column> DATE DEFAULT NULL;

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s