How to Use Pandas in Python

What you’ll build or solve

You’ll load real data into a Pandas DataFrame, inspect it, clean it lightly, and answer simple questions with filters and groupings.

When this approach works best

This approach works best when you:

  • Work with tabular data like CSV exports from spreadsheets, payments, analytics, or logs.
  • Need to filter and summarize data fast, like “top countries by signups” or “average order value by month”.
  • Want a repeatable script that turns raw data into a cleaned report.

Avoid this approach when:

  • You only have a handful of values. A plain list or dictionary may be simpler.
  • Your file is huge and does not fit in memory. You may need chunked processing or a database.
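If the file is too large for memory, Pandas can still help by reading it in chunks. A minimal sketch (the file name `big_data.csv` and the tiny chunk size are placeholders for illustration; the block writes its own small file so it runs as-is):

```python
import pandas as pd

# Create a small example file so the sketch is self-contained;
# in practice you would already have a large CSV on disk.
pd.DataFrame({"country": ["US", "CA", "US", "GB"],
              "signups": [10, 5, 7, 3]}).to_csv("big_data.csv", index=False)

# chunksize makes read_csv yield DataFrames of N rows at a time,
# so the whole file never has to fit in memory at once.
total_signups = 0
for chunk in pd.read_csv("big_data.csv", chunksize=2):
    total_signups += chunk["signups"].sum()

print(total_signups)  # combined total across all chunks
```

For anything beyond running totals, a database or a chunk-aware tool is usually the better fit.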

Prerequisites

  • Python installed
  • Pandas installed (python -m pip install pandas)
  • You know what variables are
  • You know what lists and dictionaries are
  • You know what a CSV file is

Step-by-step instructions

1) Import Pandas and load a CSV into a DataFrame

Most Pandas work starts by reading a file into a DataFrame.

import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())

What to look for:

head() prints the first few rows so you can confirm the file loaded and columns look right.

If your CSV uses a different delimiter:

df = pd.read_csv("data.csv", sep=";")

2) Inspect columns, types, and missing values

Before you filter or calculate, check the shape, types, and nulls.

print(df.shape)         # (rows, columns)
print(df.columns)       # column names
print(df.dtypes)        # types per column
print(df.isna().sum())  # missing values per column

For a compact summary (info() prints its report directly, so it does not need to be wrapped in print()):

df.info()

3) Select columns and filter rows

Select columns by passing a list of column names.

subset = df[["country", "signups", "source"]]
print(subset.head())
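Note the bracket count: single brackets return a Series (one column), double brackets return a DataFrame. A quick sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "CA"], "signups": [10, 5]})

series = df["country"]     # single brackets -> Series
frame = df[["country"]]    # double brackets -> one-column DataFrame

print(type(series).__name__)
print(type(frame).__name__)
```

This matters later: methods like agg() and to_csv() behave slightly differently on a Series versus a DataFrame.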

Filter rows with a boolean condition.

us = df[df["country"] == "US"]
print(us.head())

Combine conditions with & and wrap each condition in parentheses.

paid_us = df[(df["country"] == "US") & (df["plan"] == "paid")]
print(paid_us.head())
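The same pattern extends to OR and NOT: | combines conditions with OR, ~ negates one, and each condition still needs its own parentheses. A sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "CA", "GB", "US"],
    "plan": ["paid", "free", "paid", "free"],
})

# | is OR, ~ is NOT; parentheses around each condition are required.
us_or_ca = df[(df["country"] == "US") | (df["country"] == "CA")]
not_paid = df[~(df["plan"] == "paid")]

print(len(us_or_ca))  # rows matching US or CA
print(len(not_paid))  # rows that are not paid
```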

4) Convert types and handle missing values

Convert columns when the type does not match how you want to use the data.

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

If you want to treat missing numeric values as zero:

df["revenue"] = df["revenue"].fillna(0)

What to look for:

  • errors="coerce" turns bad values into NaN instead of crashing.
  • fillna(0) replaces missing values with zero.
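If dropping incomplete rows fits your data better than filling them, dropna() is the alternative. A sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"revenue": [100.0, None, 50.0],
                   "signups": [10, 5, None]})

# Drop rows where revenue is missing; subset limits the check to one column.
has_revenue = df.dropna(subset=["revenue"])

# Drop rows with a missing value in any column.
complete = df.dropna()

print(len(has_revenue))
print(len(complete))
```

Whether to fill or drop depends on the question: filling with zero is fine for totals, but it distorts averages.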

5) Create new columns with calculations

Build new columns using existing columns.

df["revenue_per_signup"] = df["revenue"] / df["signups"]
print(df[["revenue", "signups", "revenue_per_signup"]].head())

If signups can be zero:

df["revenue_per_signup"] = (
    df["revenue"] / df["signups"].replace(0, pd.NA)
)

What to look for:

replace(0, pd.NA) avoids dividing by zero.


6) Group and summarize data

Grouping helps you get totals or averages by category.

Total signups by country:

by_country = df.groupby("country", as_index=False)["signups"].sum()
print(by_country.sort_values("signups", ascending=False).head(10))

Multiple metrics at once:

summary = df.groupby("source", as_index=False).agg(
    signups=("signups", "sum"),
    revenue=("revenue", "sum"),
    avg_revenue=("revenue", "mean"),
)

print(summary.sort_values("revenue", ascending=False))

What to look for:

Each line uses (column_name, aggregation_function) to define what to calculate.
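Grouping by month, like the "average order value by month" example from the start, works the same way once the date column is parsed. A sketch with a tiny inline DataFrame for illustration (the column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-20", "2024-02-03"],
    "revenue": [100.0, 50.0, 80.0],
})

# Parse the dates, then derive a calendar-month key to group on.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df["month"] = df["date"].dt.to_period("M")

monthly = df.groupby("month", as_index=False).agg(
    revenue=("revenue", "sum"),
    avg_revenue=("revenue", "mean"),
)

print(monthly)
```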


7) Save results back to CSV

Once you have a filtered DataFrame or summary table, write it out.

by_country.to_csv("signups_by_country.csv", index=False)

What to look for:

index=False prevents Pandas from adding an extra index column.


Examples you can copy

1) Basic “load and inspect” script

import pandas as pd

df = pd.read_csv("data.csv")

print("rows, cols:", df.shape)
print("columns:", list(df.columns))
print(df.head(5))
print("missing values:")
print(df.isna().sum())

2) Filter rows and keep only needed columns

import pandas as pd

df = pd.read_csv("data.csv")

filtered = df[df["country"].isin(["US", "CA", "GB"])]
filtered = filtered[["date", "country", "source", "signups", "revenue"]]

filtered.to_csv("filtered.csv", index=False)

3) Simple report: top sources by revenue

import pandas as pd

df = pd.read_csv("data.csv")

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce").fillna(0)
df["signups"] = pd.to_numeric(df["signups"], errors="coerce").fillna(0)

report = df.groupby("source", as_index=False).agg(
    revenue=("revenue", "sum"),
    signups=("signups", "sum"),
)

report["revenue_per_signup"] = (
    report["revenue"] / report["signups"].replace(0, pd.NA)
)

report = report.sort_values("revenue", ascending=False)
report.to_csv("report_by_source.csv", index=False)

print(report.head(10))

Common mistakes and how to fix them

Mistake 1: Using parentheses instead of brackets

You might write:

print(df("country"))

Why it breaks:

Parentheses call the DataFrame like a function, which raises TypeError: 'DataFrame' object is not callable. Column access uses square brackets.

Correct approach:

print(df["country"])
print(df[["country", "source"]])

Mistake 2: Using and / or instead of & / |

You might write:

paid_us = df[(df["country"] == "US") and (df["plan"] == "paid")]

Why it breaks:

Python's and tries to collapse each whole column (a Series) into a single True/False value, which Pandas refuses with "The truth value of a Series is ambiguous." The element-wise operators & and | compare row by row instead.

Correct approach:

paid_us = df[(df["country"] == "US") & (df["plan"] == "paid")]

Mistake 3: Getting a KeyError due to column name mismatch

You might write:

df["Country"]

Why it breaks:

Column names are case-sensitive.

Correct approach:

print(df.columns)
df["country"]
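If the mismatches keep recurring (stray spaces, mixed casing), you can normalize all column names once after loading. A sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({" Country ": ["US"], "Sign Ups": [10]})

# Strip whitespace, lowercase, and replace spaces with underscores
# so every later lookup uses one predictable style.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(df.columns))  # ['country', 'sign_ups']
```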

Troubleshooting

If you see ModuleNotFoundError: No module named 'pandas', install it:

python -m pip install pandas

If you see garbled characters, the file probably uses a different encoding than Pandas guessed (UTF-8 is the default), so pass one explicitly:

df = pd.read_csv("data.csv", encoding="utf-8-sig")  # or "latin-1"

If numbers load as text (object type), convert them:

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")

If your filter gives “truth value of a Series is ambiguous,” you likely used and/or. Use &/| and parentheses.

If your output CSV has an extra first column, add index=False to to_csv().


Quick recap

  • Load data with pd.read_csv() into a DataFrame.
  • Inspect with head(), info(), and isna().sum().
  • Select columns with df[["col1", "col2"]] and filter rows with boolean masks.
  • Convert types with pd.to_numeric() and pd.to_datetime(), use fillna() when needed.
  • Create new columns with math on columns and handle divide-by-zero if needed.
  • Summarize with groupby().agg() and save with to_csv(index=False).