Original article was published on Artificial Intelligence on Medium
The main idea behind the development of Vaex is captured by the following note:
“Filtering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk and will be streamed only when needed. Delay the time before you need a cluster.”
All the algorithms work out of core, so the limit is the size of your hard drive rather than the size of your RAM.
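To make the out-of-core idea more concrete, here is a minimal sketch that uses only Python's standard library rather than Vaex itself. It writes a column of float64 values to a file, memory-maps that file, and computes a filtered mean chunk by chunk, so only one chunk is decoded at a time and no filtered copy of the data is ever created. The file name, the chunk size, and the helper names (`write_column`, `filtered_mean`) are assumptions made for illustration. This is not how Vaex is implemented internally; it only illustrates the same principle.

```python
import mmap
import os
import struct
import tempfile

ITEM = struct.Struct("<d")  # one little-endian float64 per row


def write_column(path, values):
    """Write values to disk as raw float64, one after another."""
    with open(path, "wb") as f:
        for v in values:
            f.write(ITEM.pack(float(v)))


def filtered_mean(path, threshold, chunk_rows=1024):
    """Mean of all values greater than threshold, streaming the file in chunks."""
    total, count = 0.0, 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            n_rows = len(mm) // ITEM.size
            for start in range(0, n_rows, chunk_rows):
                stop = min(start + chunk_rows, n_rows)
                # Only this slice of the file is decoded; the rest stays on disk.
                chunk = struct.unpack_from(f"<{stop - start}d", mm, start * ITEM.size)
                for v in chunk:
                    if v > threshold:  # the filter is applied on the fly
                        total += v
                        count += 1
    return total / count if count else float("nan")


# Hypothetical column file holding the values 0, 1, ..., 9999.
path = os.path.join(tempfile.mkdtemp(), "c4.bin")
write_column(path, range(10_000))
result = filtered_mean(path, threshold=4999)
```

Because the filter is evaluated while streaming, memory use depends on `chunk_rows` and not on the size of the file.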
Vaex implements parallelized, highly performant groupby operations, especially when using categories (more than 1 billion rows per second).
# Mean of column c4 for each distinct value of column c1
vaex_df_group = vaex_df.groupby(vaex_df.c1,
                                agg=vaex.agg.mean(vaex_df.c4))
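One plausible reason why categories help is that each category can be represented by a small integer code, which then serves directly as an index into fixed-size arrays of running sums and counts. The sketch below shows this pattern in pure Python: rows are split into chunks, each chunk produces partial sums and counts, and the partials are merged at the end. The names `partial_sums` and `groupby_mean` and the sample data are hypothetical, and this is an illustration of the general technique, not Vaex's actual implementation. Threads are used here only to show the structure; in CPython they will not give a real speedup for pure Python loops.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sums(codes, values, n_categories):
    """Accumulate the per-category sum and count for one chunk of rows."""
    sums = [0.0] * n_categories
    counts = [0] * n_categories
    for code, value in zip(codes, values):
        sums[code] += value  # the category code is the array index
        counts[code] += 1
    return sums, counts


def groupby_mean(codes, values, n_categories, n_chunks=4):
    """Group-by mean: process chunks concurrently, then merge the partials."""
    size = max(1, -(-len(codes) // n_chunks))  # ceiling division
    ranges = [(i, i + size) for i in range(0, len(codes), size)]
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(
            lambda r: partial_sums(codes[r[0]:r[1]], values[r[0]:r[1]], n_categories),
            ranges,
        ))
    sums = [sum(p[0][k] for p in partials) for k in range(n_categories)]
    counts = [sum(p[1][k] for p in partials) for k in range(n_categories)]
    # Categories with no rows get NaN instead of raising a division error.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Hypothetical data: c1 holds category codes 0..2, c4 holds the values.
c1 = [0, 1, 2, 0, 1, 2, 0, 1]
c4 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
means = groupby_mean(c1, c4, n_categories=3)
```

Since every chunk only needs arrays of length `n_categories`, the partial results stay small no matter how many rows are processed, which is what makes merging them cheap.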