Handling Missing Values in Pandas



Let’s take a few examples to understand how the parameter values affect the output.

Before we dive deeper into the DataFrame.fillna() method, let us first create a DataFrame to work with.

A. Create a DataFrame:

Create a DataFrame

Python Implementation:

Python Code
Output
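
The code for this step was an image in the original post. As a stand-in, here is a minimal sketch of a DataFrame with missing values. The column names A, B, C and all values are hypothetical, chosen so that gaps appear in the first row, the last row, the first column, the last column, and as a run of two consecutive NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the article's original frame).
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

print(df)
# Count the missing elements in each column.
print(df.isna().sum())
```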

B. Example — 1:

We can use the value parameter to specify the value that replaces the missing elements. In the following example, we specify value=0, so every missing element is filled with 0.

Parameters Used:

value = 0

pd.fillna( ) with value=0

Python Implementation:

Python Code
Output
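
A minimal sketch of this call, assuming a small hypothetical frame (columns A, B, C are illustrative, not the article's data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Replace every missing element with 0; df itself is left unchanged.
filled = df.fillna(value=0)
print(filled)
```

Because fillna() returns a new DataFrame by default, the result has to be assigned to a name to be kept.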

C. Example — 2:

We can also fill different columns with different values by passing a dictionary to the value parameter, where each key is a column name and each value is the fill value for that column. The following example demonstrates this.

Parameters Used:

value = dictionary

pd.fillna( ) with a dictionary of values

Python Implementation:

Python Code
Output
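
A minimal sketch of the dictionary form, again on a hypothetical frame. The fill values 0 and -1 are arbitrary; column C is left out of the dictionary on purpose to show that unlisted columns keep their gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Keys are column names, values are the per-column fill values.
filled = df.fillna(value={"A": 0, "B": -1})
print(filled)
```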

D. Example — 3:

To fill the missing elements, we can also use the method parameter. If we specify method="ffill", each gap is filled with the last valid observation. If we do not specify the axis value, the operation uses axis=0, that is, it moves row by row down each column. By default, there is no limit on how far the last valid observation is propagated: if there are multiple consecutive missing elements, all of them are filled with the last valid observation.

Important Note:

If we specify method="ffill" and axis=0, and the elements in the first row are missing, they will never get filled, because there is no earlier valid observation to carry forward.

Parameters Used:

method = “ffill”

data.fillna( ) with method=”ffill”

Python Implementation:

Python Code
Output
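
A minimal sketch of forward filling on a hypothetical frame. It uses DataFrame.ffill(), pandas' dedicated forward-fill method, in place of fillna(method="ffill"), because recent pandas releases deprecate the method argument; the filling behaviour is the one described above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Forward fill down each column (axis=0 is the default).
filled = df.ffill()
print(filled)
# A becomes 1, 1, 1, 4: both consecutive gaps take the last valid value.
# B keeps its NaN in row 0 because nothing precedes it.
```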

E. Example — 4:

If we specify method="pad", it works the same way as method="ffill".

Parameters Used:

method = “pad”

pd.fillna( ) with method=”pad”

Python Implementation:

Python Code
Output

F. Example — 5:

By default, the missing elements are filled with axis=0, that is, row by row down each column.

Important Note:

If we specify method="ffill" and axis=0, then if the elements in the first row are missing, they will never get filled.

Parameters Used:

method = “ffill”

axis = 0

pd.fillna( ) with method=”ffill” and axis=0

Python Implementation:

Python Code
Output

G. Example — 6:

If we want to fill the missing elements column by column, that is, across each row, we can set the axis parameter to axis=1.

Important Note:

If we specify method="ffill" and axis=1, then if the elements in the first column are missing, they will never get filled, because there is no column to their left to carry a value from.

Parameters Used:

method = “ffill”

axis = 1

pd.fillna( ) with method=”ffill” and axis=1

Python Implementation:

Python Code
Output
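
A minimal sketch of forward filling with axis=1 on a hypothetical all-float frame, written with DataFrame.ffill() rather than the method argument. Each gap takes the value from the nearest valid column to its left in the same row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Forward fill across each row, from left to right.
filled = df.ffill(axis=1)
print(filled)
# Row 3 becomes 4, 4, 4; the NaNs in column A stay because A is the first column.
```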

H. Example — 7:

To fill the missing elements, we can also use the method parameter with method="bfill", which fills each gap with the next valid observation. If we do not specify the axis value, the operation uses axis=0, that is, it moves row by row down each column. By default, there is no limit on how far the next valid observation is propagated backward: if there are multiple consecutive missing elements, all of them are filled with the next valid observation.

Important Note:

If we specify method="bfill" and axis=0, then if the elements in the last row are missing, they will never get filled, because there is no later valid observation to carry backward.

Parameters Used:

method = “bfill”

axis = 0

pd.fillna( ) with method=”bfill”

Python Implementation:

Python Code
Output
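
A minimal sketch of backward filling on a hypothetical frame, using DataFrame.bfill() in place of fillna(method="bfill") for the same deprecation reason that applies to forward fill.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Backward fill down each column (axis=0 is the default).
filled = df.bfill()
print(filled)
# A becomes 1, 4, 4, 4; trailing NaNs in B and C stay because nothing follows them.
```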

I. Example — 8:

If we specify method="backfill", it works the same way as method="bfill".

Parameters Used:

method = “backfill”

pd.fillna( ) with method=”backfill”

Python Implementation:

Python Code
Output

J. Example — 9:

By default, the missing elements are filled with axis=0, that is, row by row down each column.

Important Note:

If we specify method="bfill" and axis=0, then if the elements in the last row are missing, they will never get filled.

Parameters Used:

method = “bfill”

axis = 0

pd.fillna( ) with method=”bfill” and axis=0

Python Implementation:

Python Code
Output

K. Example — 10:

If we want to fill the missing elements column by column, that is, across each row, we can set the axis parameter to axis=1.

Important Note:

If we specify method="bfill" and axis=1, then if the elements in the last column are missing, they will never get filled, because there is no column to their right to carry a value from.

Parameters Used:

method = “bfill”

axis = 1

pd.fillna( ) with method=”bfill” and axis=1

Python Implementation:

Python Code
Output
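
A minimal sketch of backward filling with axis=1 on a hypothetical all-float frame, written with DataFrame.bfill(). Each gap takes the value from the nearest valid column to its right in the same row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Backward fill across each row, from right to left.
filled = df.bfill(axis=1)
print(filled)
# Row 2 becomes 30, 30, 30; in row 3, B and C stay NaN because C is the last column.
```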

L. Example — 11:

If we specify the limit parameter, it restricts the maximum number of consecutive missing values that the forward or backward fill methods will fill. If a gap of consecutive missing elements is longer than the number given by the limit parameter, it is filled only partially. Here we use the forward fill method with axis=0 and a limit of 1 element.

Parameters Used:

method = “ffill”

axis = 0

limit = 1

pd.fillna( ) with method=”ffill” and axis=0 and limit=1

Python Implementation:

Python Code
Output
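
A minimal sketch of a limited forward fill on a hypothetical frame, using DataFrame.ffill(limit=1). Column A contains a run of two consecutive NaNs, so only the first of them is filled.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Fill at most one consecutive NaN after each valid value, down each column.
filled = df.ffill(axis=0, limit=1)
print(filled)
# A becomes 1, 1, NaN, 4: the second NaN in the run is beyond the limit.
```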

M. Example — 12:

In this example, we will use the fill forward method with axis=1 and a limit of 1 element.

Parameters Used:

method = “ffill”

axis = 1

limit = 1

pd.fillna( ) with method=”ffill” and axis=1 and limit=1

Python Implementation:

Python Code
Output
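
The same limit applied across rows can be sketched as follows, again on a hypothetical all-float frame and with DataFrame.ffill() standing in for the method argument.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Fill at most one consecutive NaN to the right of each valid value.
filled = df.ffill(axis=1, limit=1)
print(filled)
# Row 3 becomes 4, 4, NaN: only the first of its two trailing gaps is filled.
```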

N. Example — 13:

In this example, we will use the backward fill method with axis=0 and a limit of 1 element.

Parameters Used:

method = “bfill”

axis = 0

limit = 1

pd.fillna( ) with method=”bfill” and axis=0 and limit=1

Python Implementation:

Python Code
Output

O. Example — 14:

In this example, we will use the backward fill method with axis=1 and a limit of 1 element.

Parameters Used:

method = “bfill”

axis = 1

limit = 1

pd.fillna( ) with method=”bfill” and axis=1 and limit=1

Python Implementation:

Python Code
Output
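
A minimal sketch of the limited backward fill on both axes, assuming a hypothetical all-float frame and using DataFrame.bfill(limit=1) in place of the method argument.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Down each column: only the NaN directly before a valid value is filled.
by_rows = df.bfill(axis=0, limit=1)
print(by_rows)  # A becomes 1, NaN, 4, 4

# Across each row: only the NaN directly left of a valid value is filled.
by_cols = df.bfill(axis=1, limit=1)
print(by_cols)  # row 2 becomes NaN, 30, 30
```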

P. Creating a DataFrame:

Creating a New DataFrame

Python Implementation:

Python Code
Output
Datatypes

Q. Example — 15:

We can use the downcast parameter to downcast the datatype if possible. The string value "infer" tries to downcast to an appropriate equivalent type, for example float64 to int64 when every value in the filled column is a whole number.

Parameters Used:

downcast = “infer”

pd.fillna( ) with value=0 and downcast=”infer”

Python Implementation:

Python Code
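
A minimal sketch of the downcast behaviour, assuming a new hypothetical frame df2 whose non-missing values are all whole numbers. The downcast keyword is deprecated in recent pandas releases, so the sketch falls back to an explicit astype("int64") if the keyword is rejected; that fallback is an assumption for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical float columns whose non-missing values are whole numbers.
df2 = pd.DataFrame({"X": [1.0, np.nan, 3.0], "Y": [np.nan, 5.0, 6.0]})
print(df2.dtypes)  # float64, because NaN forces a float dtype

try:
    # The form described in the text; newer pandas may warn about it.
    filled = df2.fillna(value=0, downcast="infer")
except TypeError:
    # Assumed fallback for pandas versions without the downcast keyword.
    filled = df2.fillna(value=0).astype("int64")

print(filled.dtypes)
```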

R. Example — 16:

If we want the changes to take place in our original DataFrame, we have to specify inplace=True as a parameter. Note that the call then does not return a new DataFrame. After execution, the original DataFrame holds the result of the fillna() call.

Parameters Used:

inplace = True

pd.fillna( ) with value=0 and inplace=True

Python Implementation:

Python Code
Output
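
A minimal sketch of the in-place form on a hypothetical frame. The return value is deliberately not assigned, since the point of inplace=True is that df itself changes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, np.nan, 4.0],
        "B": [np.nan, 2.0, np.nan, np.nan],
        "C": [10.0, 20.0, 30.0, np.nan],
    }
)

# Modify df directly instead of creating a filled copy.
df.fillna(value=0, inplace=True)
print(df)
```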

AI/ML

Trending AI/ML Article Identified & Digested via Granola by Ramsey Elbasheer; a Machine-Driven RSS Bot



via WordPress https://ramseyelbasheer.io/2021/04/28/handling-missing-values-in-pandas/