How to get array/bag of elements from Hive group by operator?
The built-in aggregate function collect_set (documented here) gets you almost what you want. It would actually work on your example input:
SELECT F1, collect_set(F2)
FROM sample_table
GROUP BY F1
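To make the behaviour of that query concrete, here is a minimal plain-Python model of what `GROUP BY F1` with `collect_set(F2)` produces. This is a sketch, not Hive code. The helper name `collect_set_by_key` and the sample rows are hypothetical, and the question's actual input is not reproduced here. Hive does not guarantee the order of elements in the returned array. The model keeps first-seen order only so the output is predictable.

```python
from collections import defaultdict


def collect_set_by_key(rows):
    """Model GROUP BY f1 with collect_set(f2): one list of distinct f2 values per f1."""
    groups = defaultdict(list)
    for f1, f2 in rows:
        # collect_set drops repeated values within a group; this keeps the first occurrence.
        if f2 not in groups[f1]:
            groups[f1].append(f2)
    return dict(groups)


# Hypothetical (F1, F2) rows standing in for sample_table.
sample_table = [("a", 1), ("a", 2), ("a", 2), ("b", 3)]
print(collect_set_by_key(sample_table))  # {'a': [1, 2], 'b': [3]}
```

Note that the second `("a", 2)` disappears. That is the duplicate-removal problem discussed next.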
Unfortunately, it also removes duplicate elements, and I imagine that isn't the behavior you want. I find it odd that collect_set exists but there is no version that keeps duplicates. Someone else apparently thought the same thing, and it looks like the top and second answers there will give you the UDAF you need.
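A UDAF that keeps duplicates has to work in phases. It collects values per group on the map side, then merges the partial lists on the reduce side. The sketch below models that flow in plain Python. The method names loosely mirror the phases of Hive's `GenericUDAFEvaluator` (iterate, terminatePartial, merge, terminate). However, `CollectAllEvaluator` and `collect_all_by_key` are hypothetical illustrations, not Hive's API and not the code from the linked answers.

```python
class CollectAllEvaluator:
    """Plain-Python model of a duplicate-keeping collect aggregate (NOT Hive code).

    Method names loosely mirror Hive's GenericUDAFEvaluator phases.
    """

    def new_buffer(self):
        return []

    def iterate(self, buffer, value):
        # Unlike collect_set, append every value, duplicates included.
        buffer.append(value)

    def terminate_partial(self, buffer):
        # Map-side partial result: the list collected so far.
        return list(buffer)

    def merge(self, buffer, partial):
        # Reduce-side: concatenate partial lists coming from different mappers.
        buffer.extend(partial)

    def terminate(self, buffer):
        return list(buffer)


def collect_all_by_key(partitions):
    """Simulate GROUP BY f1 over several map-side partitions of (f1, f2) rows."""
    ev = CollectAllEvaluator()
    partials = []
    for rows in partitions:
        buffers = {}
        for f1, f2 in rows:
            ev.iterate(buffers.setdefault(f1, ev.new_buffer()), f2)
        partials.append({k: ev.terminate_partial(b) for k, b in buffers.items()})
    final = {}
    for part in partials:
        for f1, partial in part.items():
            ev.merge(final.setdefault(f1, ev.new_buffer()), partial)
    return {k: ev.terminate(b) for k, b in final.items()}


# Two hypothetical mapper partitions; the repeated ("a", 2) survives.
print(collect_all_by_key([[("a", 1), ("a", 2)], [("a", 2), ("b", 3)]]))  # {'a': [1, 2, 2], 'b': [3]}
```

Concatenation in `merge` works because list order is not meaningful to the caller. As with collect_set, element order across mappers is not something to rely on. If you are on Hive 0.13.0 or later, the built-in collect_list aggregate already keeps duplicates, so a custom UDAF may not be necessary.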