Original article was published by Soner Yıldırım on Artificial Intelligence on Medium
Note: We will work through the examples on Pandas Series, which can also be thought of as DataFrame columns.
The str accessor provides methods for working with textual data. Let’s start by creating a series that contains the words of a text.
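As a minimal sketch of that first step, the snippet below builds such a series. The sample sentence and the name `A` are assumptions made for illustration; the text used in the original article is not shown here.

```python
import pandas as pd

# Hypothetical sample text, chosen only for illustration.
text = "Pandas is a highly popular data analysis library"

# Split on whitespace (the default), then give each word its own row.
# reset_index replaces the repeated index that explode leaves behind.
A = pd.Series([text]).str.split().explode().reset_index(drop=True)
print(A)
```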
Split, as the name suggests, splits the data based on a specified character (the default is whitespace). The explode function then turns each separated word into its own item in the series.
Cat is the opposite of split. It concatenates strings in a series.
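A short sketch of cat, assuming a small hypothetical series `A` of words and a space as the separator:

```python
import pandas as pd

# Hypothetical series of words.
A = pd.Series(["Pandas", "is", "a", "popular", "library"])

# With no other series passed, cat joins all items into a single string.
sentence = A.str.cat(sep=" ")
print(sentence)
```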
The upper and lower methods convert all characters to upper case and lower case, respectively.
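For example, on a hypothetical series `A` (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical series with mixed casing.
A = pd.Series(["Pandas", "is", "popular"])

upper = A.str.upper()  # every character in upper case
lower = A.str.lower()  # every character in lower case
print(upper)
print(lower)
```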
The capitalize method converts the first letter to upper case and the remaining letters to lower case.
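A small sketch of capitalize on a hypothetical series `B` with inconsistent casing:

```python
import pandas as pd

# Hypothetical series; casing is deliberately inconsistent.
B = pd.Series(["data", "SCIENCE", "mAcHiNe"])

# First character becomes upper case, the rest lower case.
capitalized = B.str.capitalize()
print(capitalized)
```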
The replace method can be used to replace all or part of a string with another string.
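The sketch below shows both cases on a hypothetical series `A`. Passing `regex=False` is a choice made here so that the pattern is treated as a literal string instead of a regular expression.

```python
import pandas as pd

# Hypothetical series of short phrases.
A = pd.Series(["data science", "data analysis", "big data"])

# Replace part of each string.
partial = A.str.replace("data", "DATA", regex=False)

# Replace a whole string; items that do not contain the pattern are unchanged.
whole = A.str.replace("big data", "statistics", regex=False)
print(partial)
print(whole)
```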
The len method returns the length of the strings in a series.
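For instance, with a hypothetical series `A`:

```python
import pandas as pd

# Hypothetical series of words.
A = pd.Series(["Pandas", "is", "popular"])

# Number of characters in each string.
lengths = A.str.len()
print(lengths)
```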
The str accessor also allows indexing on strings. For instance, we can select only the first two characters as follows:
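A minimal sketch of slicing through the str accessor, using a hypothetical series `A`:

```python
import pandas as pd

# Hypothetical series of words.
A = pd.Series(["Pandas", "is", "popular"])

# Slice each string just like a regular Python string.
first_two = A.str[:2]
print(first_two)
```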
We can also check whether all characters are alphabetical with the isalpha method.
It returns False if there is at least one non-alphabetical character.
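The following sketch uses a hypothetical series `A` in which two items contain a non-alphabetical character (a digit and a space):

```python
import pandas as pd

# Hypothetical series; "data1" has a digit and "big data" has a space.
A = pd.Series(["Pandas", "data1", "big data", "numpy"])

is_alpha = A.str.isalpha()
print(is_alpha)
```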
The startswith and endswith methods check whether strings start or end with particular characters.
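A brief sketch on a hypothetical series `A`. Note that the comparison is case-sensitive.

```python
import pandas as pd

# Hypothetical series of library names.
A = pd.Series(["Pandas", "NumPy", "python"])

starts_with_p = A.str.startswith("P")  # lower-case "p" does not match
ends_with_y = A.str.endswith("y")
print(starts_with_p)
print(ends_with_y)
```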
We can also count the occurrences of a character or a set of characters in each string with the count method.
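As an illustration, the sketch below counts a single character and a two-character sequence in a hypothetical series `A`. The pattern passed to count is interpreted as a regular expression, so special characters would need escaping.

```python
import pandas as pd

# Hypothetical series of words.
A = pd.Series(["banana", "data", "sky"])

count_a = A.str.count("a")    # occurrences of a single character
count_an = A.str.count("an")  # occurrences of a sequence of characters
print(count_a)
print(count_an)
```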
The lstrip and rstrip methods can be used to trim characters from the start and end of strings, respectively. For instance, we can get rid of the numbers, spaces, and commas at the end of the strings in series C.
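Since the contents of series C are not shown here, the sketch below assumes a hypothetical C with trailing digits, spaces, and commas. The series `D` is likewise made up to show lstrip.

```python
import pandas as pd

# Hypothetical stand-in for series C.
C = pd.Series(["Sale 12, ", "Price,  45", "Amount 3,,"])

# Strip any trailing digit, comma, or space; stops at the first other character.
trimmed = C.str.rstrip("0123456789, ")
print(trimmed)

# lstrip works the same way from the start of each string.
D = pd.Series(["--Sale", "  Price"])
left_trimmed = D.str.lstrip("- ")
print(left_trimmed)
```

The argument is a set of characters to remove, not a prefix or suffix, so the order of the characters in it does not matter.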