Now, load huge datasets within a second ⚡ using lazy computation in Python


Out-of-core DataFrame

Sticking to the main concept behind the development of Vaex, we need to keep the following note in mind:

“Filtering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk and will be streamed only when needed. Delay the time before you need a cluster.”

For example:

vaex_df[vaex_df.c2 > 70]
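To make the laziness concrete, here is a minimal sketch (the file name big_data.hdf5 and the column c2 are just placeholders, not from the original): opening a memory-mappable file and filtering it does not load or copy anything; you only get a lazy view over the rows on disk.

import vaex

# Memory-maps the file; no data is read into RAM yet
vaex_df = vaex.open('big_data.hdf5')

# A lazy, filtered view -- no copy is made; rows are streamed
# from disk only when an expression is actually evaluated
vaex_df_filtered = vaex_df[vaex_df.c2 > 70]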

All of the algorithms work out-of-core; the limit is the size of your hard drive.

For instance:

vaex_df.c2.minmax(progress='widget')
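As a self-contained illustration (using vaex.example(), a small demo dataset bundled with Vaex, instead of the vaex_df from above), the minimum and maximum are computed in a single streaming pass over the column:

import vaex

df = vaex.example()  # small demo dataset bundled with Vaex

# One out-of-core pass over the column; progress=True prints a
# text progress bar (progress='widget' renders one in Jupyter)
print(df.x.minmax(progress=True))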

Vaex implements parallelized, highly performant groupby operations, especially when grouping on categorical columns (over 1 billion rows per second).

For example, two timed cells:

%%time
# Mean of c4 within each group of c1
vaex_df_group = vaex_df.groupby(vaex_df.c1, agg=vaex.agg.mean(vaex_df.c4))
vaex_df_group

%%time
# Row count per group of c1
vaex_df.groupby(vaex_df.c1, agg='count')
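For a runnable stand-in (the tiny frame below mirrors the c1/c4 columns assumed above; with real data, making the grouping column categorical is what unlocks the throughput quoted earlier), both aggregation styles look like this:

import numpy as np
import vaex

# Tiny in-memory stand-in for vaex_df with the same column names
df = vaex.from_arrays(c1=np.array(['a', 'b', 'a', 'b']),
                      c4=np.array([10.0, 20.0, 30.0, 40.0]))

# Mean of c4 within each group of c1
print(df.groupby(df.c1, agg=vaex.agg.mean(df.c4)))

# Row count per group of c1
print(df.groupby(df.c1, agg='count'))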