Loading...

How to count a boolean in grouped Spark data frame

Answer #1 100 %

Probably the simplest solution is a plain CAST (C style where TRUE -> 1, FALSE -> 0) with SUM:

(data
    .groupby("Region")
    .agg(F.avg("Salary"), F.sum(F.col("IsUnemployed").cast("long"))))

A little bit more universal and idiomatic solution is CASE WHEN with COUNT:

(data
    .groupby("Region")
    .agg(
        F.avg("Salary"),
        F.count(F.when(F.col("IsUnemployed"), F.col("IsUnemployed")))))

but here it is clearly an overkill.

You’ll also like:


© 2022 CodeForDev.com -