Python Pandas Library: Syntax, Usage, and Examples

pandas is an open-source Python library for data analysis and manipulation. It is built on top of NumPy and provides two core data structures, Series and DataFrame, which let you load, process, and analyze structured data. pandas is widely used in data science, machine learning, finance, statistics, and any other field that works with tabular data.

If you're dealing with CSVs, Excel files, SQL databases, or data returned by APIs, pandas usually lets you replace hand-written loops and parsing code with a few method calls.


What Is Python Pandas?

Pandas is a data analysis and manipulation tool designed for fast performance and ease of use. The library introduces two key data structures:

  • Series: A one-dimensional labeled array (like a column).
  • DataFrame: A two-dimensional labeled data structure, similar to a spreadsheet or SQL table.

pandas lets you filter, aggregate, join, pivot, reshape, and export datasets with just a few lines of code. It also integrates well with other Python libraries like Matplotlib, Seaborn, and Scikit-learn.


How to Install Pandas in Python

Before using the pandas Python package, you need to install it. You can do this using pip:

pip install pandas

If you’re using the Anaconda distribution, pandas is already included. In a minimal conda environment, such as one created with Miniconda, you can install it via conda:

conda install pandas

After installation, you’re ready to import and start using it.


How to Import Pandas in Python

You can import pandas using the standard alias pd, which is commonly used in the Python ecosystem:

import pandas as pd

Using pd as an alias allows concise syntax throughout your code. For example:

df = pd.read_csv("data.csv")

This line reads a CSV file into a pandas DataFrame, one of the most common tasks in data analysis.


Creating a Pandas DataFrame in Python

You can create a DataFrame from dictionaries, lists of lists, NumPy arrays, or even other DataFrames.

From a Dictionary

data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
}
df = pd.DataFrame(data)
print(df)

From a List of Lists

data = [
    ["Alice", 25],
    ["Bob", 30]
]
df = pd.DataFrame(data, columns=["Name", "Age"])

The DataFrame constructor is flexible. It also accepts optional arguments such as index, which sets custom row labels, and dtype, which forces a data type.
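
As a minimal sketch of those extra options, the example below sets custom row labels and then casts one column to an explicit type. The labels "a", "b", "c" and the int32 type are arbitrary choices for illustration, not anything the tutorial prescribes.

```python
import pandas as pd

# Sample data reused from the dictionary example above.
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
}

# Use custom row labels instead of the default 0, 1, 2 index.
df = pd.DataFrame(data, index=["a", "b", "c"])

# Cast a single column to an explicit dtype (int32 is an arbitrary choice).
df = df.astype({"Age": "int32"})

print(df.dtypes)
print(df.loc["b", "Name"])  # Look up a value by row label and column name
```

With labeled rows, df.loc selects by label, while df.iloc still selects by position.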


Exploring and Analyzing Data

Once you've created or loaded a DataFrame, you can quickly inspect your dataset:

df.head()         # First 5 rows
df.tail(3)        # Last 3 rows
df.info()         # Column types and non-null values
df.describe()     # Summary statistics for numeric columns

These built-in methods help you understand the structure and quality of your data before applying transformations.
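
Here is a small, self-contained sketch of those inspection calls on the three-row sample data from earlier. The variable names first_two and summary are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

first_two = df.head(2)   # head() accepts a row count, just like tail()
summary = df.describe()  # By default, only numeric columns are summarized

print(df.shape)                     # (rows, columns)
print(summary.loc["mean", "Age"])   # Mean of the Age column
df.info()                           # Prints column types and non-null counts
```

Note that describe() returns a DataFrame of its own, so you can index into it like any other.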


Common Operations in Pandas Python

Pandas supports a wide range of operations to manipulate and transform your data.

Filtering Rows

df[df["Age"] > 30]

Sorting Data

df.sort_values(by="Age", ascending=False)

Adding a Column

df["Country"] = ["USA", "Canada", "UK"]

Deleting a Column

df.drop("Country", axis=1, inplace=True)

Filtering and sorting return a new DataFrame and leave df unchanged, so assign the result to a variable if you want to keep it. These operations are easy to pick up as a beginner, and the same syntax carries over to much larger datasets.
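
The sketch below runs the four operations end to end on the sample data. It uses assign() and drop(columns=...) so that every step returns a new DataFrame. That is one possible style, shown here as an alternative to inplace=True, and the threshold of 25 is an arbitrary value.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

# Filtering: keep rows where the condition is True.
older = df[df["Age"] > 25]

# Sorting: highest Age first.
by_age = df.sort_values(by="Age", ascending=False)

# Adding a column without modifying df itself.
with_country = df.assign(Country=["USA", "Canada", "UK"])

# Deleting a column, again returning a new DataFrame.
without_country = with_country.drop(columns="Country")

print(older)
print(by_age)
print(without_country.columns.tolist())
```

Because none of these calls change df, you can rerun any step without side effects.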


Working with Missing Data

Handling missing data is crucial in real-world datasets. Pandas provides multiple ways to deal with it:

df.dropna()            # Remove rows with missing values
df.fillna(0)           # Replace missing values with 0
df.isnull().sum()      # Count missing values in each column

You can also forward-fill or backward-fill missing entries:

df.ffill()             # Forward-fill: copy the last valid value downward
df.bfill()             # Backward-fill: copy the next valid value upward
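
To see what each approach does, here is a minimal sketch on a made-up Score column with two gaps. It uses the dedicated ffill() and bfill() methods for filling, since recent pandas releases deprecate passing method= to fillna().

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing values.
df = pd.DataFrame({"Score": [1.0, np.nan, 3.0, np.nan]})

missing = df.isnull().sum()   # Missing values per column
dropped = df.dropna()         # Rows with any missing value removed
zero_filled = df.fillna(0)    # Missing values replaced with 0
forward = df.ffill()          # Gaps filled with the previous valid value
backward = df.bfill()         # Gaps filled with the next valid value

print(missing)
print(forward)
print(backward)
```

The last row stays missing after bfill() because there is no later value to copy from.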

Reading and Writing Files with Pandas

Pandas makes file input/output operations simple and fast.

Reading Data

  • CSV: pd.read_csv("file.csv")
  • Excel: pd.read_excel("file.xlsx")
  • JSON: pd.read_json("file.json")
  • SQL: pd.read_sql(query, connection)

Writing Data

  • To CSV: df.to_csv("output.csv", index=False)
  • To Excel: df.to_excel("output.xlsx")
  • To JSON: df.to_json("output.json")

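Switching between formats takes one call in each direction, which is one of the most useful features of pandas. Keep in mind that reading and writing Excel files relies on an optional engine, such as openpyxl, that is installed separately.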
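
As a sketch of a write-then-read round trip, the example below uses an in-memory text buffer instead of a file on disk, so it runs without creating "output.csv". The buffer is an assumption made for the demo. With a real file you would pass a path instead.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Write the DataFrame as CSV text into an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, index=False)   # index=False leaves out the row labels
csv_text = buffer.getvalue()

# Read the same CSV text back into a new DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```

Without index=False, the row labels would be written as an extra unnamed column and come back as data when you read the file again.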


Grouping and Aggregating Data

Use groupby() to perform operations like sum, count, mean, or median on subsets of your data:

df.groupby("Department")["Salary"].mean()

You can group by multiple columns and apply several aggregations at once:

df.groupby(["Department", "Gender"])["Salary"].agg(["mean", "max"])

Grouping is essential in summarizing large datasets.
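
The following sketch runs these groupby calls on a small, made-up salary table. The last call uses named aggregation to label the output columns. The names avg_salary and headcount are hypothetical.

```python
import pandas as pd

# Hypothetical employee data.
df = pd.DataFrame({
    "Department": ["Sales", "Sales", "IT", "IT"],
    "Gender": ["F", "M", "F", "M"],
    "Salary": [50000, 60000, 70000, 90000],
})

# One aggregation per group.
mean_by_dept = df.groupby("Department")["Salary"].mean()

# Several aggregations at once: one output column per function.
stats = df.groupby("Department")["Salary"].agg(["mean", "max"])

# Named aggregation: choose the output column names yourself.
named = df.groupby("Department").agg(
    avg_salary=("Salary", "mean"),
    headcount=("Salary", "size"),
)

print(mean_by_dept)
print(stats)
print(named)
```

The group keys become the index of the result. Call reset_index() if you would rather have them as regular columns.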


Merging and Joining DataFrames

Pandas makes it easy to combine data from different sources:

pd.merge(df1, df2, on="id", how="inner")

You can also concatenate DataFrames:

pd.concat([df1, df2])

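merge() works like a SQL join, and its how argument accepts "inner", "left", "right", "outer", or "cross". concat() stacks DataFrames on top of each other, or side by side with axis=1. Both are useful in data pipelines.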
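
To make the difference between join types concrete, here is a sketch with two small, made-up tables that share an id column. The names and cities are invented for the demo.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Paris", "Oslo", "Rome"]})

# Inner join: only ids present in both tables (2 and 3).
inner = pd.merge(df1, df2, on="id", how="inner")

# Left join: every row of df1; city is missing where df2 has no match.
left = pd.merge(df1, df2, on="id", how="left")

# Concatenate: stack rows and renumber the index from 0.
stacked = pd.concat([df1, df1], ignore_index=True)

print(inner)
print(left)
print(stacked)
```

Without ignore_index=True, concat() keeps the original row labels, so the stacked result would contain duplicate index values.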


Working with Dates and Times

Pandas includes a full set of tools for datetime handling:

df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

Resampling time series data by day, month, or year is straightforward:

df.resample("ME", on="Date").mean(numeric_only=True)   # "ME" = month end; pandas older than 2.2 uses "M"
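
Here is a minimal end-to-end sketch with made-up sales data. It resamples with the "MS" (month start) alias, chosen only because it behaves the same across pandas versions. The Sales column and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: dates arrive as plain strings.
df = pd.DataFrame({
    "Date": ["2024-01-05", "2024-01-20", "2024-02-10"],
    "Sales": [100, 200, 50],
})

# Convert the strings to real datetime values.
df["Date"] = pd.to_datetime(df["Date"])

# Extract date parts through the .dt accessor.
df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month

# Total sales per calendar month, labeled by the first day of the month.
monthly = df.resample("MS", on="Date")["Sales"].sum()

print(df)
print(monthly)
```

Selecting the Sales column before aggregating keeps non-numeric columns out of the calculation.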

Applying Functions and Lambda Expressions

You can apply custom functions row-wise or column-wise:

df["New_Column"] = df["Age"].apply(lambda x: x * 2)

For row-wise logic:

df.apply(lambda row: row["Age"] + row["Salary"], axis=1)

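This lets you express logic that no built-in method covers. Keep in mind that apply() with axis=1 calls your function once per row in Python, so it is much slower than vectorized column arithmetic on large datasets. Prefer vectorized operations whenever they can express the same calculation.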
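
The sketch below computes the same sum twice, once with a row-wise apply() and once with plain column arithmetic, to show that the two are interchangeable for simple calculations. The Salary values are made up.

```python
import pandas as pd

df = pd.DataFrame({
    "Age": [25, 30, 35],
    "Salary": [50000, 60000, 70000],
})

# Element-wise apply on a single column.
df["Double_Age"] = df["Age"].apply(lambda x: x * 2)

# Row-wise apply: the lambda receives one row at a time.
row_wise = df.apply(lambda row: row["Age"] + row["Salary"], axis=1)

# Vectorized equivalent: operates on whole columns at once.
vectorized = df["Age"] + df["Salary"]

print(row_wise.tolist())
print(vectorized.tolist())
```

Both versions produce the same values, but the vectorized form avoids a Python-level function call for every row.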


Using Pandas with NumPy and Matplotlib

You can seamlessly integrate pandas DataFrames with NumPy functions:

import numpy as np
df["Log_Age"] = np.log(df["Age"])

To visualize data:

import matplotlib.pyplot as plt
df["Age"].hist()
plt.show()

pandas also provides DataFrame.plot(), a wrapper around Matplotlib, for quick visual checks.
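
As a further sketch of the NumPy integration, the example below applies NumPy functions directly to a column and converts a column to a plain array. The Group column and its labels are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25, 30, 35]})

# NumPy functions accept a Series and return a Series with the same index.
df["Log_Age"] = np.log(df["Age"])

# np.where builds a column from a condition: value if True, else the other.
df["Group"] = np.where(df["Age"] >= 30, "30+", "under 30")

# Convert a column to a plain NumPy array when a library expects one.
ages = df["Age"].to_numpy()

print(df)
print(ages)
```

This sketch leaves out plotting so that it runs without Matplotlib installed.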


Best Practices for Using Python Pandas

  • Always import pandas as pd.
  • Use vectorized operations instead of loops.
  • Avoid chaining operations when readability suffers.
  • Handle missing values explicitly.
  • Use descriptive column names and comments for clarity.

These habits make your pandas code more maintainable and less prone to bugs.


Summary

pandas gives you the tools to load, explore, manipulate, and export data efficiently. It simplifies many tasks that would otherwise require verbose loops and custom logic. You can create a DataFrame from multiple data sources, clean it, transform it, group it, and export it, all with readable and consistent syntax.

To start using pandas, install it via pip or conda and import it with import pandas as pd. Once set up, pandas becomes an essential companion for anyone working with data.
