Thursday, May 30, 2019

Concatenated string to individual rows in Spark SQL, PG and Snowflake

I had this column named age_band which will have values like "55-64|65-74|75+"
As you can see it contains age groups stored in as a string concatenated with '|' and each age group needs to be compared separately.

I had this taken care in PostgreSQL using:


select unnest(string_to_array('55-64|65-74|75+', '|'));


which results:

# unnest
1 55-64
2 65-74
3 75+

Now, I have to perform the same in Spark SQL and here it is:


select explode(split('55-64|65-74|75+', '[|]'));


and here is the result:

col
55-64
65-74
75+

Update: 11/27/2019

And in SnowFlake:

select col1, c.value::string as age_band
  from san_test,
     lateral flatten(input=>split(col2, '|')) c;

col2 is the column where the age_band has concatenated values with | delimited. 

select c.value::string as age_band from lateral flatten(input=>split('55-64|65-74|75+', '|')) c;

AGE_BAND 55-64 65-74 75+