21 NumPy Functions That Will Boost Your Data Analysis Process

Original article was published by Soner Yıldırım on Artificial Intelligence on Medium


Explained with examples

Note: All images created by the author unless stated otherwise.

Everything about data science starts with data and it comes in various formats. Numbers, images, texts, x-rays, sound, and video recordings are just some examples of data sources. Whatever the format data comes in, it needs to be converted to an array of numbers to be analyzed.

One of the foremost tools for handling arrays of numbers is NumPy, a scientific computing package for Python.

In this post, we will go over 21 functions and methods that will boost your data analysis process.

1. Array

It is used to create an array from scratch or to convert a list or a pandas Series object to an array.
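As a minimal sketch (the values here are illustrative and not taken from the original post), creating arrays from a flat list and a nested list might look like this:

```python
import numpy as np

# Build a 1-D array from a flat list and a 2-D array from a nested list
a = np.array([1, 2, 3, 4])
b = np.array([[1, 2], [3, 4]])

print(a.shape)  # (4,)
print(b.shape)  # (2, 2)
```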

2. Arange

It creates an array in a range with a specified increment.

The first two arguments are lower and upper bounds (upper is exclusive). The third argument is the step size.
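A short example with assumed bounds and step size, chosen only for illustration:

```python
import numpy as np

# Start at 5, stop before 15, step by 2
a = np.arange(5, 15, 2)
print(a)  # [ 5  7  9 11 13]
```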

3. Linspace

It creates an array in a specified range with equidistant elements.

The first two arguments determine the lower and upper bounds. Unlike the arange function, the upper bound is inclusive. The third argument specifies how many equidistant elements we want in that range.
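For instance, assuming a range of 0 to 1 with five points (values picked for illustration), a sketch could be:

```python
import numpy as np

# Five evenly spaced values from 0 to 1, both ends included
a = np.linspace(0, 1, 5)
print(a)  # [0.   0.25 0.5  0.75 1.  ]
```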

4. Unique

It returns the sorted unique elements of an array. We can also see how many times each element occurs in the array using the return_counts parameter.
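A small sketch with a made-up input array:

```python
import numpy as np

a = np.array([3, 1, 2, 3, 1, 3])

# With return_counts=True, a second array holds the count of each unique value
values, counts = np.unique(a, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [2 1 3]
```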

5. Argmax and argmin

They return the indices of maximum and minimum values along an axis.

Argmax with axis=1 will return the indices of the maximum values in each row. Argmin with axis=0 will return the indices of the minimum values in each column.
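The following sketch uses an assumed 2×3 array to show both calls:

```python
import numpy as np

A = np.array([[4, 9, 2],
              [7, 1, 8]])

# Index of the largest value in each row
row_max_idx = np.argmax(A, axis=1)
# Index of the smallest value in each column
col_min_idx = np.argmin(A, axis=0)

print(row_max_idx)  # [1 2]
print(col_min_idx)  # [0 1 0]
```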

6. Random.random

It creates an array of random floats in the half-open interval [0, 1). The same operation can be done with the random_sample function as well.
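A possible usage, with an assumed shape of 2×3 and a fixed seed added only to make the run repeatable:

```python
import numpy as np

np.random.seed(0)  # seed is an addition for reproducibility

# Both calls draw floats from the half-open interval [0, 1)
a = np.random.random((2, 3))
b = np.random.random_sample((2, 3))

print(a.shape, b.shape)  # (2, 3) (2, 3)
```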

7. Random.randint

It creates an array of random integers of any shape.

The first two arguments determine the bounds (the upper bound is exclusive). If only one bound is passed, it is treated as the upper bound and the lower bound is taken as 0.
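A brief sketch with assumed bounds and shapes; the seed is only there to make the run repeatable:

```python
import numpy as np

np.random.seed(0)  # seed is an addition for reproducibility

# Integers from 2 (inclusive) to 10 (exclusive) in a 3x4 array
a = np.random.randint(2, 10, size=(3, 4))

# A single bound is treated as the upper bound, so values fall in 0..4
b = np.random.randint(5, size=6)

print(a.shape, b.shape)  # (3, 4) (6,)
```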

8. Random.randn

It returns a sample or samples from the standard normal distribution (i.e. zero mean and unit variance).

Let’s also plot the values to observe the standard normal distribution.
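The original plot is not reproduced here. As a rough stand-in, the sketch below draws 10,000 samples (an assumed sample size) and uses np.histogram to count values per bin, which should show the familiar bell shape with most values near zero:

```python
import numpy as np

np.random.seed(0)  # seed is an addition for reproducibility

samples = np.random.randn(10000)

# Count samples in six equal-width bins between -3 and 3
counts, edges = np.histogram(samples, bins=6, range=(-3, 3))

print(samples.mean(), samples.std())  # close to 0 and 1
print(counts)  # central bins hold far more samples than the outer ones
```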

9. Random.shuffle

It shuffles the elements of an array in place. For multi-dimensional arrays, only the order along the first axis is shuffled.
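A small illustration with an assumed input array. Note that the function changes the array itself and returns None:

```python
import numpy as np

np.random.seed(0)  # seed is an addition for reproducibility

a = np.arange(10)
result = np.random.shuffle(a)  # shuffles a itself

print(result)  # None
print(a)       # same ten values, in a new order
```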

10. Reshape

As the name suggests, it changes the shape of an array. The overall size must be maintained. For instance, an array with a shape of 3×4 can be converted to an array of shape 2×6 (size is 12).

You can also specify the size in one dimension and pass -1 for the other dimension. NumPy will infer the remaining dimension.

Reshape is also used to increase the dimension of an array, which is a common practice when working with machine learning or deep learning models.
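The three uses of reshape described here can be sketched as follows, using an assumed 3×4 array of the integers 0 to 11:

```python
import numpy as np

A = np.arange(12).reshape(3, 4)

B = A.reshape(2, 6)     # same 12 elements, new shape
C = A.reshape(-1, 3)    # -1 is inferred as 4
D = A.reshape(1, 3, 4)  # adds a leading dimension

print(B.shape)  # (2, 6)
print(C.shape)  # (4, 3)
print(D.shape)  # (1, 3, 4)
```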

11. Expand_dims

It expands the dimension of an array.

The axis parameter lets you choose the position at which the new axis is inserted. For a one-dimensional array, expand_dims with axis=1 is equivalent to reshape(-1,1).
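A minimal sketch on an assumed one-dimensional array:

```python
import numpy as np

a = np.array([1, 2, 3])

col = np.expand_dims(a, axis=1)  # column vector
row = np.expand_dims(a, axis=0)  # row vector

print(col.shape)  # (3, 1)
print(row.shape)  # (1, 3)
print(np.array_equal(col, a.reshape(-1, 1)))  # True
```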

12. Count_nonzero

It returns the count of non-zero elements in an array, which may come in handy when working with highly sparse arrays.
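An example with made-up values, counting over the whole array and then per row:

```python
import numpy as np

A = np.array([[0, 1, 0],
              [2, 0, 3]])

total = np.count_nonzero(A)            # whole array
per_row = np.count_nonzero(A, axis=1)  # one count per row

print(total)    # 3
print(per_row)  # [1 2]
```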

13. Argwhere

It returns the indices of nonzero elements in an array.

For instance, the second column in the first row is zero so its index ([0,1]) is not returned by the argwhere function.
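The array from the original example is not shown, so the sketch below uses assumed values, chosen so that the element at index [0, 1] is zero as in the description:

```python
import numpy as np

A = np.array([[3, 0, 5],
              [0, 7, 2]])

# One [row, column] pair per non-zero element
idx = np.argwhere(A)
print(idx.tolist())  # [[0, 0], [0, 2], [1, 1], [1, 2]]
```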

14. Zeros, Ones, Full

These are actually three separate functions but what they do is very similar. They create arrays with zeros, ones, or a specific value.

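For zeros and ones, the default data type is float, but it can be changed (to integers, for example) using the dtype parameter. Full infers the data type from the fill value unless dtype is given.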
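A short sketch with assumed shapes and an assumed fill value of 7:

```python
import numpy as np

z = np.zeros((2, 3))             # floats by default
o = np.ones((2, 3), dtype=int)   # integers via dtype
f = np.full((2, 2), 7)           # every element set to 7

print(z.dtype)     # float64
print(o.tolist())  # [[1, 1, 1], [1, 1, 1]]
print(f.tolist())  # [[7, 7], [7, 7]]
```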

15. Eye and Identity

Both eye and identity create an identity matrix of a specified dimension. An identity matrix, denoted as I, is a square matrix that has 1’s on the diagonal and 0’s at all other positions.

What makes an identity matrix special is that it does not change a matrix when multiplied with it. In this sense, it is similar to the number 1 in real numbers.

The inverse of a matrix is the matrix that gives the identity matrix when multiplied with the original matrix. Not every matrix has an inverse. If matrix A has an inverse, then it is called invertible or non-singular.
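These properties can be checked with a small sketch. The matrix A is an assumed example, and np.linalg.inv is used here only to illustrate the inverse; it is not one of the functions covered in the post:

```python
import numpy as np

I = np.identity(2)
E = np.eye(2)

A = np.array([[2.0, 1.0],
              [1.0, 1.0]])

# Multiplying by the identity leaves A unchanged
print(np.array_equal(A @ I, A))  # True

# A times its inverse gives the identity (up to floating-point error)
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, E))  # True
```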

16. Ravel

Ravel returns a flattened array. If you are familiar with convolutional neural networks (CNN), pooled feature maps are flattened before being fed to a fully connected layer.

By default, the second row is concatenated at the end of the first row. The ravel function also allows column-wise concatenation using the order parameter.
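Both orderings can be seen in this sketch on an assumed 2×3 array:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

row_wise = np.ravel(A)             # rows one after another
col_wise = np.ravel(A, order="F")  # columns one after another

print(row_wise)  # [1 2 3 4 5 6]
print(col_wise)  # [1 4 2 5 3 6]
```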

17. Hsplit and Vsplit

They split arrays vertically (vsplit) or horizontally (hsplit).

A is an array with a shape of 4×4. Splitting A horizontally into two parts will result in two arrays of shape 4×2.

If A is split vertically into two parts, the resulting arrays will have a shape of 2×4.
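A sketch of both splits, assuming A holds the integers 0 to 15:

```python
import numpy as np

A = np.arange(16).reshape(4, 4)

left, right = np.hsplit(A, 2)  # split the columns into two halves
top, bottom = np.vsplit(A, 2)  # split the rows into two halves

print(left.shape, right.shape)  # (4, 2) (4, 2)
print(top.shape, bottom.shape)  # (2, 4) (2, 4)
```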

18. Hstack and Vstack

They stack arrays horizontally (column-wise) and vertically (rows on top of each other).
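For example, with two assumed 2×2 arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h = np.hstack((a, b))  # side by side
v = np.vstack((a, b))  # on top of each other

print(h.tolist())  # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(v.tolist())  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```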

19. Transpose

It transposes an array. In the case of 2-dimensional arrays (i.e. matrices), transposing means switching rows and columns.
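A quick sketch on an assumed 2×3 array; the .T attribute shown alongside gives the same result for this case:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])

T = np.transpose(A)

print(T.shape)     # (3, 2)
print(T.tolist())  # [[1, 4], [2, 5], [3, 6]]
print(np.array_equal(T, A.T))  # True
```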

20. Abs and Absolute

Both abs and absolute return the absolute values of elements in an array.
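An illustration with made-up values:

```python
import numpy as np

a = np.array([-3.0, 2.5, -0.5])

print(np.abs(a))       # [3.  2.5 0.5]
print(np.absolute(a))  # [3.  2.5 0.5]
```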

21. Round and Around

Both of them round floats to the nearest value at a specified number of decimal places.
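A brief sketch with assumed inputs, rounding to two decimals and to one decimal:

```python
import numpy as np

a = np.array([1.2345, 2.6789])

r2 = np.round(a, 2)
r1 = np.around(a, 1)

print(r2)  # [1.23 2.68]
print(r1)  # [1.2 2.7]
```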