SanOraLife: May 2019

Thursday, May 30, 2019

Concatenated string to individual rows in Spark SQL, PG and Snowflake

I had this column named age_band which will have values like "55-64|65-74|75+"
As you can see it contains age groups stored in as a string concatenated with '|' and each age group needs to be compared separately.

I had this taken care in PostgreSQL using:

select unnest(string_to_array('55-64|65-74|75+', '|'));

which results:

# unnest
1 55-64
2 65-74
3 75+

Now, I have to perform the same in Spark SQL and here it is:

select explode(split('55-64|65-74|75+', '[|]'));

and here is the result:

col
55-64
65-74
75+

Update: 11/27/2019

And in SnowFlake:

select col1, c.value::string as age_band

from san_test,

lateral flatten(input=>split(col2, '|')) c;

col2 is the column where the age_band has concatenated values with | delimited.

select c.value::string as age_band from lateral flatten(input=>split('55-64|65-74|75+', '|')) c;

AGE_BAND 55-64 65-74 75+

SanOraLife

Thursday, May 30, 2019

Concatenated string to individual rows in Spark SQL, PG and Snowflake

About Me

Popular Posts

select count(1)from page_hits;

Total Pageviews

Blog Archive