How to Use Pandas in Python

What you’ll build or solve

You’ll load real data into a Pandas DataFrame, inspect it, clean it lightly, and answer simple questions with filters and groupings.

When this approach works best

This approach works best when you:

  • Work with tabular data like CSV exports from spreadsheets, payments, analytics, or logs.
  • Need to filter and summarize data fast, like “top countries by signups” or “average order value by month”.
  • Want a repeatable script that turns raw data into a cleaned report.

Avoid this approach when:

  • You only have a handful of values. A plain list or dictionary may be simpler.
  • Your file is huge and does not fit in memory. You may need chunked processing or a database.
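If the file is too large for memory, Pandas can still help by reading it in chunks. A minimal sketch (the file name `big_data.csv` and the tiny chunk size are placeholders for illustration; the block writes its own small file so it runs as-is):

```python
import pandas as pd

# Create a small example file so the sketch is self-contained;
# in practice you would already have a large CSV on disk.
pd.DataFrame({"country": ["US", "CA", "US", "GB"],
              "signups": [10, 5, 7, 3]}).to_csv("big_data.csv", index=False)

# chunksize makes read_csv yield DataFrames of N rows at a time,
# so the whole file never has to fit in memory at once.
total_signups = 0
for chunk in pd.read_csv("big_data.csv", chunksize=2):
    total_signups += chunk["signups"].sum()

print(total_signups)  # combined total across all chunks
```

For anything beyond running totals, a database or a chunk-aware tool is usually the better fit.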

Prerequisites

  • Python installed
  • Pandas installed (python -m pip install pandas)
  • You know what variables are
  • You know what lists and dictionaries are
  • You know what a CSV file is

Step-by-step instructions

1) Import Pandas and load a CSV into a DataFrame

Most Pandas work starts by reading a file into a DataFrame.

import pandas as pd

df = pd.read_csv("data.csv")
print(df.head())

What to look for:

head() prints the first few rows so you can confirm the file loaded and columns look right.

If your CSV uses a different delimiter:

df = pd.read_csv("data.csv", sep=";")

2) Inspect columns, types, and missing values

Before you filter or calculate, check the shape, types, and nulls.

print(df.shape)         # (rows, columns)
print(df.columns)       # column names
print(df.dtypes)        # types per column
print(df.isna().sum())  # missing values per column

For a compact summary (info() prints its report directly, so it does not need to be wrapped in print()):

df.info()

3) Select columns and filter rows

Select columns by passing a list of column names.

subset = df[["country", "signups", "source"]]
print(subset.head())
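Note the bracket count: single brackets return a Series (one column), double brackets return a DataFrame. A quick sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "CA"], "signups": [10, 5]})

series = df["country"]     # single brackets -> Series
frame = df[["country"]]    # double brackets -> one-column DataFrame

print(type(series).__name__)
print(type(frame).__name__)
```

This matters later: methods like agg() and to_csv() behave slightly differently on a Series versus a DataFrame.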

Filter rows with a boolean condition.

us = df[df["country"] == "US"]
print(us.head())

Combine conditions with & and wrap each condition in parentheses.

paid_us = df[(df["country"] == "US") & (df["plan"] == "paid")]
print(paid_us.head())
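The same pattern extends to OR and NOT: | combines conditions with OR, ~ negates one, and each condition still needs its own parentheses. A sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "CA", "GB", "US"],
    "plan": ["paid", "free", "paid", "free"],
})

# | is OR, ~ is NOT; parentheses around each condition are required.
us_or_ca = df[(df["country"] == "US") | (df["country"] == "CA")]
not_paid = df[~(df["plan"] == "paid")]

print(len(us_or_ca))  # rows matching US or CA
print(len(not_paid))  # rows that are not paid
```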

4) Convert types and handle missing values

Convert columns when the type does not match how you want to use the data.

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

If you want to treat missing numeric values as zero:

df["revenue"] = df["revenue"].fillna(0)

What to look for:

  • errors="coerce" turns bad values into NaN instead of crashing.
  • fillna(0) replaces missing values with zero.
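If dropping incomplete rows fits your data better than filling them, dropna() is the alternative. A sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({"revenue": [100.0, None, 50.0],
                   "signups": [10, 5, None]})

# Drop rows where revenue is missing; subset limits the check to one column.
has_revenue = df.dropna(subset=["revenue"])

# Drop rows with a missing value in any column.
complete = df.dropna()

print(len(has_revenue))
print(len(complete))
```

Whether to fill or drop depends on the question: filling with zero is fine for totals, but it distorts averages.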

5) Create new columns with calculations

Build new columns using existing columns.

df["revenue_per_signup"] = df["revenue"] / df["signups"]
print(df[["revenue", "signups", "revenue_per_signup"]].head())

If signups can be zero:

df["revenue_per_signup"] = (
    df["revenue"] / df["signups"].replace(0, pd.NA)
)

What to look for:

replace(0, pd.NA) avoids dividing by zero.


6) Group and summarize data

Grouping helps you get totals or averages by category.

Total signups by country:

by_country = df.groupby("country", as_index=False)["signups"].sum()
print(by_country.sort_values("signups", ascending=False).head(10))

Multiple metrics at once:

summary = df.groupby("source", as_index=False).agg(
    signups=("signups", "sum"),
    revenue=("revenue", "sum"),
    avg_revenue=("revenue", "mean"),
)

print(summary.sort_values("revenue", ascending=False))

What to look for:

Each line uses (column_name, aggregation_function) to define what to calculate.
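Grouping by month, like the "average order value by month" example from the start, works the same way once the date column is parsed. A sketch with a tiny inline DataFrame for illustration (the column names are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2024-01-05", "2024-01-20", "2024-02-03"],
    "revenue": [100.0, 50.0, 80.0],
})

# Parse the dates, then derive a calendar-month key to group on.
df["date"] = pd.to_datetime(df["date"], errors="coerce")
df["month"] = df["date"].dt.to_period("M")

monthly = df.groupby("month", as_index=False).agg(
    revenue=("revenue", "sum"),
    avg_revenue=("revenue", "mean"),
)

print(monthly)
```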


7) Save results back to CSV

Once you have a filtered DataFrame or summary table, write it out.

by_country.to_csv("signups_by_country.csv", index=False)

What to look for:

index=False prevents Pandas from adding an extra index column.


Examples you can copy

1) Basic “load and inspect” script

import pandas as pd

df = pd.read_csv("data.csv")

print("rows, cols:", df.shape)
print("columns:", list(df.columns))
print(df.head(5))
print("missing values:")
print(df.isna().sum())

2) Filter rows and keep only needed columns

import pandas as pd

df = pd.read_csv("data.csv")

filtered = df[df["country"].isin(["US", "CA", "GB"])]
filtered = filtered[["date", "country", "source", "signups", "revenue"]]

filtered.to_csv("filtered.csv", index=False)

3) Simple report: top sources by revenue

import pandas as pd

df = pd.read_csv("data.csv")

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce").fillna(0)
df["signups"] = pd.to_numeric(df["signups"], errors="coerce").fillna(0)

report = df.groupby("source", as_index=False).agg(
    revenue=("revenue", "sum"),
    signups=("signups", "sum"),
)

report["revenue_per_signup"] = (
    report["revenue"] / report["signups"].replace(0, pd.NA)
)

report = report.sort_values("revenue", ascending=False)
report.to_csv("report_by_source.csv", index=False)

print(report.head(10))

Common mistakes and how to fix them

Mistake 1: Using parentheses instead of brackets

You might write:

print(df("country"))

Why it breaks:

Parentheses call the DataFrame like a function, which raises TypeError: 'DataFrame' object is not callable. Column access uses square brackets.

Correct approach:

print(df["country"])
print(df[["country", "source"]])

Mistake 2: Using and / or instead of & / |

You might write:

paid_us = df[(df["country"] == "US") and (df["plan"] == "paid")]

Why it breaks:

Python's and tries to collapse each whole column (a Series) into a single True/False value, which Pandas refuses with "The truth value of a Series is ambiguous." The element-wise operators & and | compare row by row instead.

Correct approach:

paid_us = df[(df["country"] == "US") & (df["plan"] == "paid")]

Mistake 3: Getting a KeyError due to column name mismatch

You might write:

df["Country"]

Why it breaks:

Column names are case-sensitive.

Correct approach:

print(df.columns)
df["country"]
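If the mismatches keep recurring (stray spaces, mixed casing), you can normalize all column names once after loading. A sketch with a tiny inline DataFrame for illustration:

```python
import pandas as pd

df = pd.DataFrame({" Country ": ["US"], "Sign Ups": [10]})

# Strip whitespace, lowercase, and replace spaces with underscores
# so every later lookup uses one predictable style.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

print(list(df.columns))  # ['country', 'sign_ups']
```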

Troubleshooting

If you see ModuleNotFoundError: No module named 'pandas', install it:

python -m pip install pandas

If you see garbled characters, the file probably uses a different encoding than Pandas guessed (UTF-8 is the default), so pass one explicitly:

df = pd.read_csv("data.csv", encoding="utf-8-sig")  # or "latin-1"

If numbers load as text (object type), convert them:

df["revenue"] = pd.to_numeric(df["revenue"], errors="coerce")

If your filter gives “truth value of a Series is ambiguous,” you likely used and/or. Use &/| and parentheses.

If your output CSV has an extra first column, add index=False to to_csv().


Quick recap

  • Load data with pd.read_csv() into a DataFrame.
  • Inspect with head(), info(), and isna().sum().
  • Select columns with df[["col1", "col2"]] and filter rows with boolean masks.
  • Convert types with pd.to_numeric() and pd.to_datetime(), use fillna() when needed.
  • Create new columns with math on columns and handle divide-by-zero if needed.
  • Summarize with groupby().agg() and save with to_csv(index=False).