How to Use Pandas in Python
What you’ll build or solve
You’ll load real data into a Pandas DataFrame, inspect it, clean it lightly, and answer simple questions with filters and groupings.
When this approach works best
This approach works best when you:
Learn Python on Mimo
- Work with tabular data like CSV exports from spreadsheets, payments, analytics, or logs.
- Need to filter and summarize data fast, like “top countries by signups” or “average order value by month”.
- Want a repeatable script that turns raw data into a cleaned report.
Avoid this approach when:
- You only have a handful of values. A plain list or dictionary may be simpler.
- Your file is huge and does not fit in memory. You may need chunked processing or a database.
Prerequisites
- Python installed
- Pandas installed (
python -m pip install pandas) - You know what variables are
- You know what lists and dictionaries are
- You know what a CSV file is
Step-by-step instructions
1) Import Pandas and load a CSV into a DataFrame
Most Pandas work starts by reading a file into a DataFrame.
Bash
importpandasaspd
df=pd.read_csv("data.csv")
print(df.head())
What to look for:
head() prints the first few rows so you can confirm the file loaded and columns look right.
If your CSV uses a different delimiter:
df=pd.read_csv("data.csv",sep=";")
2) Inspect columns, types, and missing values
Before you filter or calculate, check the shape, types, and nulls.
Python
print(df.shape)# (rows, columns)
print(df.columns)# column names
print(df.dtypes)# types per column
print(df.isna().sum())# missing values per column
For a compact summary:
print(df.info())
3) Select columns and filter rows
Select columns by passing a list of column names.
LUA
subset=df[["country","signups","source"]]
print(subset.head())
Filter rows with a boolean condition.
Bash
us=df[df["country"]=="US"]
print(us.head())
Combine conditions with & and wrap each condition in parentheses.
Bash
paid_us=df[(df["country"]=="US")& (df["plan"]=="paid")]
print(paid_us.head())
4) Convert types and handle missing values
Convert columns when the type does not match how you want to use the data.
Bash
df["revenue"]=pd.to_numeric(df["revenue"],errors="coerce")
df["date"]=pd.to_datetime(df["date"],errors="coerce")
If you want to treat missing numeric values as zero:
df["revenue"]=df["revenue"].fillna(0)
What to look for:
errors="coerce"turns bad values intoNaNinstead of crashing.fillna(0)replaces missing values with zero.
5) Create new columns with calculations
Build new columns using existing columns.
Bash
df["revenue_per_signup"]=df["revenue"]/df["signups"]
print(df[["revenue","signups","revenue_per_signup"]].head())
If signups can be zero:
Bash
df["revenue_per_signup"]= (
df["revenue"]/df["signups"].replace(0,pd.NA)
)
What to look for:
replace(0, pd.NA) avoids dividing by zero.
6) Group and summarize data
Grouping helps you get totals or averages by category.
Total signups by country:
by_country=df.groupby("country",as_index=False)["signups"].sum()
print(by_country.sort_values("signups",ascending=False).head(10))
Multiple metrics at once:
summary=df.groupby("source",as_index=False).agg(
signups=("signups","sum"),
revenue=("revenue","sum"),
avg_revenue=("revenue","mean"),
)
print(summary.sort_values("revenue",ascending=False))
What to look for:
Each line uses (column_name, aggregation_function) to define what to calculate.
7) Save results back to CSV
Once you have a filtered DataFrame or summary table, write it out.
by_country.to_csv("signups_by_country.csv",index=False)
What to look for:
index=False prevents Pandas from adding an extra index column.
Examples you can copy
1) Basic “load and inspect” script
importpandasaspd
df=pd.read_csv("data.csv")
print("rows, cols:",df.shape)
print("columns:",list(df.columns))
print(df.head(5))
print("missing values:")
print(df.isna().sum())
2) Filter rows and keep only needed columns
importpandasaspd
df=pd.read_csv("data.csv")
filtered=df[df["country"].isin(["US","CA","GB"])]
filtered=filtered[["date","country","source","signups","revenue"]]
filtered.to_csv("filtered.csv",index=False)
3) Simple report: top sources by revenue
importpandasaspd
df=pd.read_csv("data.csv")
df["revenue"]=pd.to_numeric(df["revenue"],errors="coerce").fillna(0)
df["signups"]=pd.to_numeric(df["signups"],errors="coerce").fillna(0)
report=df.groupby("source",as_index=False).agg(
revenue=("revenue","sum"),
signups=("signups","sum"),
)
report["revenue_per_signup"]= (
report["revenue"]/report["signups"].replace(0,pd.NA)
)
report=report.sort_values("revenue",ascending=False)
report.to_csv("report_by_source.csv",index=False)
print(report.head(10))
Common mistakes and how to fix them
Mistake 1: Using parentheses instead of brackets
You might write:
Bash
print(df("country"))
Why it breaks:
Use square brackets df["column"], not parentheses.
Correct approach:
LUA
print(df["country"])
print(df[["country","source"]])
Mistake 2: Using and / or instead of & / |
You might write:
Bash
paid_us=df[(df["country"]=="US")and (df["plan"]=="paid")]
Why it breaks:
and tries to evaluate the whole column as one True/False value.
Correct approach:
Bash
paid_us=df[(df["country"]=="US")& (df["plan"]=="paid")]
Mistake 3: Getting a KeyError due to column name mismatch
You might write:
Bash
df["Country"]
Why it breaks:
Column names are case-sensitive.
Correct approach:
print(df.columns)
df["country"]
Troubleshooting
If you see ModuleNotFoundError: No module named 'pandas', install it:
python-m pip install pandas
If you see garbled characters, pass an encoding:
df=pd.read_csv("data.csv",encoding="utf-8")
If numbers load as text (object type), convert them:
Bash
df["revenue"]=pd.to_numeric(df["revenue"],errors="coerce")
If your filter gives “truth value of a Series is ambiguous,” you likely used and/or. Use &/| and parentheses.
If your output CSV has an extra first column, add index=False to to_csv().
Quick recap
- Load data with
pd.read_csv()into a DataFrame. - Inspect with
head(),info(), andisna().sum(). - Select columns with
df[["col1", "col2"]]and filter rows with boolean masks. - Convert types with
pd.to_numeric()andpd.to_datetime(), usefillna()when needed. - Create new columns with math on columns and handle divide-by-zero if needed.
- Summarize with
groupby().agg()and save withto_csv(index=False).
Join 35M+ people learning for free on Mimo
4.8 out of 5 across 1M+ reviews
Check us out on Apple AppStore, Google Play Store, and Trustpilot