How to Use Conditional Formatting in Pandas to Enhance Data Visualization - KDnuggets

Image by Author | DALLE-3 & Canva

While pandas is mainly used for data manipulation and analysis, it can also provide basic data visualization capabilities. However, plain dataframes can make the information look cluttered and overwhelming. So, what can be done to make it better? If you’ve worked with Excel before, you know that you can highlight important values with different colors, font styles, etc. The idea of using these styles and colors is to communicate the information effectively. You can do similar work with pandas dataframes too, using conditional formatting and the Styler object.

In this article, we will see what conditional formatting is and how to use it to enhance your data readability.

Conditional Formatting

Conditional formatting is a feature in pandas that allows you to format cells based on some criteria. You can easily highlight outliers, visualize trends, or emphasize important data points using it. The Styler object in pandas provides a convenient way to apply conditional formatting. Before covering the examples, let’s take a quick look at how the Styler object works.

What’s the Styler Object & How Does It Work?

You can control the visual representation of the dataframe by using the df.style property. This property returns a Styler object, which is responsible for styling the dataframe. The Styler object allows you to manipulate the CSS properties of the dataframe to create a visually appealing and informative display. The generic syntax is as follows:

df.style.<method>(<arguments>)

Where <method> is the specific formatting function you want to apply, and <arguments> are the parameters required by that function. Styler methods return a Styler object that renders the formatted dataframe without altering the original one. There are two approaches to using conditional formatting with the Styler object:

Built-in Styles: Apply quick formatting styles to your dataframe
Custom Stylization: Create your own formatting rules for the Styler object and pass them through one of the following methods (Styler.applymap: element-wise or Styler.apply: column-/row-/table-wise). Note that Styler.applymap is deprecated since pandas 2.1 in favor of Styler.map, which behaves the same way.

Now, we will cover some examples of both approaches to help you enhance the visualization of your data.
 
Examples: Built-in Styles

Let’s create a dummy stock price dataset with columns for Date, Cost Price, Satisfaction Score, and Sales Amount to demonstrate the examples below:

import pandas as pd
import numpy as np

data = {'Date': ['2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08', '2024-03-09', '2024-03-10'],
        'Cost Price': [100, 120, 110, 1500, 1600, 1550],
        'Satisfaction Score': [90, 80, 70, 95, 85, 75],
        'Sales Amount': [1000, 800, 1200, 900, 1100, None]}

df = pd.DataFrame(data)
df

 
Output:
 

Original Unformatted Dataframe
 
1. Highlighting Maximum and Minimum Values
We can use the highlight_max and highlight_min functions to highlight the maximum and minimum values in a column or row. To highlight per column, set axis=0 like this:

# Highlighting Maximum and Minimum Values
df.style.highlight_max(color="green", axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount']).highlight_min(color="red", axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount'])

 
Output:
 

Max & Min Values
 
2. Applying Color Gradients
Color gradients are an effective way to visualize the values in your data. In this case, we will apply the gradient to satisfaction scores using the colormap set to 'viridis'. This is a type of color coding that ranges from purple (low values) to yellow (high values). Note that background_gradient requires matplotlib to be installed. Here is how you can do this:

# Applying Color Gradients
df.style.background_gradient(cmap='viridis', subset=['Satisfaction Score'])

 
Output:
 

Colormap - viridis
 
3. Highlighting Null or Missing Values
When we have large datasets, it becomes difficult to identify null or missing values. You can use conditional formatting with the built-in df.style.highlight_null function for this purpose. For example, in this case, the sales amount of the sixth entry is missing. You can highlight this information like this:

# Highlighting Null or Missing Values
df.style.highlight_null('yellow', subset=['Sales Amount'])

 
Output:
 

Highlighting Missing Values
 
Examples: Custom Stylization Using apply() & applymap()

1.  Conditional Formatting for Outliers
Suppose that we have a housing dataset with prices, and we want to highlight the houses with outlier prices (i.e., prices that are significantly higher or lower than those in the other neighborhoods). This can be done as follows:

import pandas as pd
import numpy as np

# House prices dataset
df = pd.DataFrame({
   'Neighborhood': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7'],
   'Price': [50, 300, 360, 390, 420, 450, 1000],
})

# Calculate Q1 (25th percentile), Q3 (75th percentile) and Interquartile Range (IQR)
q1 = df['Price'].quantile(0.25)
q3 = df['Price'].quantile(0.75)
iqr = q3 - q1

# Bounds for outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Custom function to highlight outliers
def highlight_outliers(val):
   if val < lower_bound or val > upper_bound:
      return 'background-color: yellow; font-weight: bold; color: black'
   else:
      return ''

# On pandas 2.1+, Styler.map is the non-deprecated name for Styler.applymap
df.style.applymap(highlight_outliers, subset=['Price'])


 
Output:
 

Highlighting Outliers
 
2. Highlighting Trends
Suppose you run a company and record your sales daily. To analyze the trends, you want to highlight the days when your daily sales increase by 5% or more. You can achieve this using a custom function and the apply method in pandas. Here’s how:

import pandas as pd

# Dataset of Company's Sales
data = {'date': ['2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13', '2024-02-14'],
        'sales': [100, 105, 110, 115, 125]}

df = pd.DataFrame(data)

# Daily percentage change (the first row is NaN because it has no previous day)
df['pct_change'] = df['sales'].pct_change() * 100

# Highlight the whole row if sales increased by 5% or more
def highlight_trend(row):
    return ['background-color: green; border: 2px solid black; font-weight: bold' if row['pct_change'] >= 5 else '' for _ in row]

df.style.apply(highlight_trend, axis=1)

 
Output:
 

 
3. Highlighting Correlated Columns
Correlated columns are important because they show relationships between different variables. For example, if we have a dataset containing age, income, and spending habits and our analysis shows a high correlation (close to 1) between age and income, then it suggests that older people generally have higher incomes. Highlighting correlated columns helps to visually identify these relationships. This approach becomes extremely helpful as the dimensionality of your data increases. Let's explore an example to better understand this concept:

import pandas as pd

# Dataset of people
data = {
    'age': [30, 35, 40, 45, 50],
    'income': [60000, 66000, 70000, 75000, 100000],
    'spending': [10000, 15000, 20000, 18000, 12000]
}

df = pd.DataFrame(data)

# Calculate the correlation matrix
corr_matrix = df.corr()

# Highlight highly correlated columns
def highlight_corr(val):
    if val != 1.0 and abs(val) > 0.5:   # Exclude self-correlation
        return 'background-color: blue; text-decoration: underline'
    else:
        return ''

# On pandas 2.1+, Styler.map is the non-deprecated name for Styler.applymap
corr_matrix.style.applymap(highlight_corr)

 
Output:
 

Correlated Columns
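
For wide datasets, it can help to list the strongly correlated pairs programmatically alongside the styled matrix. The sketch below uses the same age, income, and spending values. The helper name strong_pairs and its threshold parameter are assumptions introduced for illustration. Iterating over column pairs visits each pair once and skips the diagonal, so no self-correlation check is needed.

```python
from itertools import combinations

import pandas as pd

people = pd.DataFrame({
    "age": [30, 35, 40, 45, 50],
    "income": [60000, 66000, 70000, 75000, 100000],
    "spending": [10000, 15000, 20000, 18000, 12000],
})
corr = people.corr()

def strong_pairs(corr_matrix, threshold=0.5):
    # Each unordered pair of distinct columns is checked exactly once
    return [
        (a, b)
        for a, b in combinations(corr_matrix.columns, 2)
        if abs(corr_matrix.loc[a, b]) > threshold
    ]

pairs = strong_pairs(corr)
```

With this data, age and income are the only pair above the 0.5 threshold.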
 
Wrapping Up
 
These are just some of the examples I showed as a starter to up your data visualization game. You can apply similar techniques to various other problems to enhance the data visualization, such as highlighting duplicate rows, grouping into categories and selecting different formatting for each category, or highlighting peak values. Additionally, there are many other CSS options you can explore in the official documentation. You can even define different properties on hover, like magnifying text or changing color. Check out the "Fun Stuff" section for more cool ideas. This article is part of my Pandas series, so if you enjoyed this, there's plenty more to explore. Head over to my author page for more tips, tricks, and tutorials.
 
 
Kanwal Mehreen Kanwal is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.