How to count a boolean in grouped Spark data frame

Answer #1 100 %

Probably the simplest solution is a plain CAST (C style where TRUE -> 1, FALSE -> 0) with SUM:

    .agg(F.avg("Salary"), F.sum(F.col("IsUnemployed").cast("long"))))

A little bit more universal and idiomatic solution is CASE WHEN with COUNT:

        F.count(F.when(F.col("IsUnemployed"), F.col("IsUnemployed")))))

but here it is clearly an overkill.

You’ll also like:

© 2022 CodeForDev.com -