Load huge datasets within seconds ⚡ using lazy computation in Python

Original article was published on Artificial Intelligence on Medium

Out-of-core DataFrame

Sticking to the main concept behind Vaex's development, keep the following note in mind:

“Filtering and evaluating expressions will not waste memory by making copies; the data is kept untouched on disk and will be streamed only when needed. Delay the time before you need a cluster.”
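The idea of "evaluating expressions without making copies" can be illustrated with a minimal, purely conceptual sketch of deferred evaluation (the `LazyExpr` class below is hypothetical and not Vaex's actual implementation): building an expression only records the computation, and nothing runs until you explicitly evaluate it.

```python
# Minimal sketch of lazy (deferred) expression evaluation.
# Illustrates the idea only -- this is NOT Vaex's internal implementation.

class LazyExpr:
    """Records a computation; nothing runs until evaluate() is called."""
    def __init__(self, func, description):
        self._func = func
        self.description = description

    def evaluate(self):
        return self._func()

# Building the expression is cheap: no data is copied or computed yet.
a = LazyExpr(lambda: list(range(5)), "a")
doubled = LazyExpr(lambda: [x * 2 for x in a.evaluate()], "a * 2")

print(doubled.description)   # prints "a * 2" -- still unevaluated
print(doubled.evaluate())    # prints [0, 2, 4, 6, 8] -- computed on demand
```

Vaex applies the same principle to whole columns: a filter or arithmetic expression is stored symbolically, and the underlying data is streamed from disk only when a result is actually needed.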

For example:


All the algorithms work out of core; the limit is the size of your hard drive.
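Out-of-core processing simply means streaming the data in chunks so that memory use stays constant no matter how large the file is. Here is a hedged, self-contained sketch of that pattern (the `chunked_mean` helper and the temp file are illustrative, not part of Vaex's API):

```python
import os
import tempfile

# Conceptual sketch of out-of-core aggregation (not Vaex's internals):
# stream the data in fixed-size chunks so memory use stays constant
# regardless of file size -- only disk capacity limits the dataset.

def chunked_mean(path, chunk_rows=1000):
    """Compute the mean of one number per line without loading the file."""
    total, count = 0.0, 0
    with open(path) as f:
        while True:
            chunk = [line for _, line in zip(range(chunk_rows), f)]
            if not chunk:
                break
            total += sum(float(x) for x in chunk)
            count += len(chunk)
    return total / count

# Write a small demo file (stands in for a dataset larger than RAM).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines(f"{i}\n" for i in range(10_000))
    path = tmp.name

print(chunked_mean(path))  # 4999.5
os.remove(path)
```

Vaex does this with memory-mapped columnar files (e.g. HDF5/Arrow), so the operating system pages data in and out transparently instead of the user looping over chunks by hand.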



Vaex implements parallelized, high-performance groupby operations, which are especially fast when grouping by categorical columns (over 1 billion rows per second).

For example:

import vaex

# Group by column c1 and compute the mean of c4 within each group
vaex_df_group = vaex_df.groupby(vaex_df.c1,
                                agg=vaex.agg.mean(vaex_df.c4))
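To see what that call computes, here is a plain-Python sketch of a groupby-mean (purely illustrative; the `groupby_mean` helper is hypothetical, and Vaex's real implementation runs in parallel and out of core):

```python
from collections import defaultdict

# Plain-Python illustration of groupby + mean semantics.
# Vaex computes the same result in parallel and out of core.

def groupby_mean(keys, values):
    """Mean of `values` per distinct key in `keys`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

c1 = ["a", "b", "a", "b", "a"]   # stands in for vaex_df.c1
c4 = [1.0, 2.0, 3.0, 4.0, 5.0]   # stands in for vaex_df.c4
print(groupby_mean(c1, c4))      # {'a': 3.0, 'b': 3.0}
```

When the grouping column is categorical, Vaex can map each category to a small integer and aggregate into a dense array instead of a hash table, which is what makes the billion-rows-per-second figure achievable.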