How to Use Regex in Python

What you’ll build or solve

You’ll use regular expressions in Python to find, extract, and replace text reliably.

When this approach works best

Regex works best when you:

  • Validate text that must follow a clear pattern, like EU-42 or a date like 2026-02-17.
  • Extract parts of a string, like getting the username and domain from an email.
  • Clean or transform text, like removing extra whitespace or stripping non-digit characters from a phone number.

Avoid regex when simple string methods cover the job. For basic cases, split(), replace(), and startswith() are easier to read.
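
As a rough illustration of the point above, the sketch below handles three fixed-text tasks with plain string methods and no regex. The sample strings and variable names are made up for this example.

```python
line = "user=mina,id=42,plan=pro"

fields = line.split(",")                  # split on a fixed delimiter
has_user = line.startswith("user=")       # check a fixed prefix
slashed = "2026-02-17".replace("-", "/")  # swap one fixed substring

print(fields)    # ['user=mina', 'id=42', 'plan=pro']
print(has_user)  # True
print(slashed)   # 2026/02/17
```

Reach for regex once the delimiter, prefix, or substring stops being fixed text and becomes a pattern.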

Prerequisites

  • Python installed
  • You know what a string is

Step-by-step instructions

1) Search for a pattern in text

Use re.search() when you want to find a match anywhere in the string.

import re

text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)

if m:
    print(m.group())

What to look for:

Use a raw string like r"\d+" so Python passes backslashes to the regex engine unchanged. re.search() returns None when nothing matches, so check the result before calling group().
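
One way to handle the None case is to wrap the search in a small helper. This is a minimal sketch; find_order_code is a hypothetical name chosen for this example, not part of the re module.

```python
import re

def find_order_code(text):
    """Return the first EU-<digits> code in text, or None if there is none."""
    m = re.search(r"EU-\d+", text)
    return m.group() if m else None

print(find_order_code("Order: EU-42 shipped"))  # EU-42
print(find_order_code("Order: pending"))        # None

# A match object also records where the match was found.
m = re.search(r"EU-\d+", "Order: EU-42 shipped")
if m:
    print(m.span())  # (7, 12)
```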


2) Extract parts with capturing groups

Use parentheses () to capture pieces of a match. Then read them with group(1), group(2), or groups().

import re

email = "mina@example.com"
m = re.search(r"^(.+)@(.+)$", email)

if m:
    username = m.group(1)
    domain = m.group(2)
    print(username, domain)

What to look for:

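Add ^ and $ when you want the whole string to match, not just a substring inside it. Note that $ also matches just before a trailing newline, so use re.fullmatch() when the match must cover the entire string exactly.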
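
The difference between anchoring and re.fullmatch() shows up when the input ends with a newline. The sketch below is illustrative only: the stricter [^@\s]+ pattern is an assumption made for this example, not a complete email validator.

```python
import re

# Assumed pattern for this sketch: no "@" or whitespace on either side.
pattern = r"([^@\s]+)@([^@\s]+)"
with_newline = "mina@example.com\n"

# Anchored search still succeeds because "$" can match before the final newline.
loose = re.search(r"^" + pattern + r"$", with_newline)
# fullmatch() requires the pattern to consume every character, newline included.
strict = re.fullmatch(pattern, with_newline)
print(bool(loose), bool(strict))  # True False

# groups() returns all captured pieces as one tuple.
m = re.fullmatch(pattern, "mina@example.com")
if m:
    print(m.groups())  # ('mina', 'example.com')
```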


3) Replace matches with re.sub()

Use re.sub() to replace every match of a pattern.

import re

raw = "Call me at +382 67-123-456"
digits_only = re.sub(r"\D", "", raw)
print(digits_only)  # 38267123456

Option A: Replace using captured groups

This is useful when you want to reorder parts.

import re

date = "2026-02-17"
swapped = re.sub(r"^(\d{4})-(\d{2})-(\d{2})$", r"\3/\2/\1", date)
print(swapped)  # 17/02/2026

What to look for:

In the replacement string, \1, \2, and \3 refer to captured groups. Use a raw string for the replacement too, like r"\1".
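
Numbered backreferences become ambiguous when a digit follows them. The sketch below shows the \g<...> form, which re.sub() also accepts, first with a group number and then with named groups. The group names year, month, and day are chosen for this example.

```python
import re

# r"\g<1>0" means "group 1, then a literal 0"; r"\10" would be read as group 10.
padded = re.sub(r"(\d)", r"\g<1>0", "a1b2")
print(padded)  # a10b20

# Named groups make a reordering replacement easier to read.
date = "2026-02-17"
swapped = re.sub(
    r"^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})$",
    r"\g<day>/\g<month>/\g<year>",
    date,
)
print(swapped)  # 17/02/2026
```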


4) Change matching behavior with flags

Flags modify how matching works, like making it case-insensitive or letting anchors work per line.

import re

text = "AdminUser"
print(bool(re.search(r"^admin", text, re.IGNORECASE)))

What to look for:

Use re.IGNORECASE for case-insensitive matching. Use re.MULTILINE when ^ and $ should match per line, not just the whole string.
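
Flags can be combined with the | operator. The following sketch, using a made-up three-line string, shows how re.MULTILINE changes what ^ matches.

```python
import re

text = "admin: mina\nuser: ivan\nAdmin: lea"

# Without MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^admin", text, re.IGNORECASE)
print(start_only)  # ['admin']

# With MULTILINE, ^ also matches right after each newline.
per_line = re.findall(r"^admin", text, re.IGNORECASE | re.MULTILINE)
print(per_line)  # ['admin', 'Admin']
```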


Examples you can copy

Example 1: Check if a string contains only digits

import re

s = "123045"
print(bool(re.fullmatch(r"\d+", s)))

Example 2: Validate a simple ID like EU-42

import re

code = "EU-42"
valid = bool(re.fullmatch(r"EU-\d+", code))
print(valid)

Example 3: Extract all hashtags from a post

import re

post = "Loving #Python and #regex today!"
tags = re.findall(r"#\w+", post)
print(tags)

Example 4: Normalize repeated whitespace to a single space

import re

s = "Hello     from   Boston\n\nNice to meet you"
normalized = re.sub(r"\s+", " ", s).strip()
print(normalized)

Example 5: Parse key=value pairs from a log line

import re

line = "user=mina id=42 plan=pro"
pairs = re.findall(r"(\w+)=(\S+)", line)

data = dict(pairs)
print(data)  # {'user': 'mina', 'id': '42', 'plan': 'pro'}

Example 6: Find and redact emails in text

import re

text = "Contact mina@example.com or ivan@test.org for details."
redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[email hidden]", text)
print(redacted)

Example 7: Compile a pattern you reuse in a loop

import re

pattern = re.compile(r"EU-\d+", re.IGNORECASE)

codes = ["EU-42", "eu-7", "US-99"]
matches = [c for c in codes if pattern.fullmatch(c)]
print(matches)

Common mistakes and how to fix them

Mistake 1: Forgetting the raw string prefix and breaking escapes

What you might do

import re

m = re.search("\bcat\b", "a cat b")
print(m)  # None

Why it breaks

\b becomes a backspace character in a normal Python string. The pattern then looks for backspace characters around "cat" instead of word boundaries, so the search returns None.

Fix

import re

m = re.search(r"\bcat\b", "a cat b")
print(bool(m))
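
To see the difference between the two literals directly, you can compare them before they ever reach the regex engine. This is a small sketch; plain and raw are names chosen for this example.

```python
import re

plain = "\bcat\b"   # normal string: each \b is one backspace character
raw = r"\bcat\b"    # raw string: backslash and "b" stay as two characters

print(plain == "\x08cat\x08")  # True
print(len(plain), len(raw))    # 5 7

print(re.search(plain, "a cat b"))        # None
print(bool(re.search(raw, "a cat b")))    # True
```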

Mistake 2: Using match() when you need search()

What you might do

import re

text = "Order: EU-42 shipped"
m = re.match(r"EU-\d+", text)
print(m)  # None

Why it breaks

re.match() only checks from the start of the string. Here the text starts with "Order:", so nothing matches and m is None.

Fix

import re

text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)
print(m.group() if m else "no match")

Mistake 3: Writing a greedy pattern that grabs too much

What you might do

import re

text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*</b>", text)
print(m.group())  # <b>one</b> <b>two</b>

Why it breaks

.* is greedy. It matches as much as possible, so the match runs from the first <b> to the last </b> and swallows both elements.

Fix

Use a non-greedy quantifier *?:

import re

text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*?</b>", text)
print(m.group())  # <b>one</b>

Troubleshooting

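If every search returns None, print the input with repr() to reveal hidden whitespace or newlines, then test the pattern on a smaller string. Also confirm that you called search() rather than match(), since match() only matches at the start of the string.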

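If your pattern works in a regex tester but not in Python, check that the pattern is a raw string: write r"\d", not "\d". Testers take the pattern literally, while a normal Python string processes backslash escapes first.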

If matches look too large, you may be using greedy quantifiers like .* or .+. Try .*? or be more specific.
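
As one way of "being more specific", a negated character class can replace the dot entirely. The sketch below compares the three options on the same input; whether [^<]* fits depends on your data, since it assumes the content contains no nested tags.

```python
import re

text = "<b>one</b> <b>two</b>"

greedy = re.findall(r"<b>(.*)</b>", text)
lazy = re.findall(r"<b>(.*?)</b>", text)
specific = re.findall(r"<b>([^<]*)</b>", text)  # content may not contain "<"

print(greedy)    # ['one</b> <b>two']
print(lazy)      # ['one', 'two']
print(specific)  # ['one', 'two']
```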

If ^ and $ do not work per line, pass re.MULTILINE.

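If accented letters or non-English characters fail with \w, check how the pattern is compiled. For str patterns in Python 3, \w matches Unicode letters, digits, and underscore by default. With re.ASCII, or with bytes patterns, it matches only [a-zA-Z0-9_]. When you need a specific alphabet, spell out the character class.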
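
A short sketch of how \w treats an accented letter in a Python 3 str pattern, with and without re.ASCII. The sample word is chosen for this example.

```python
import re

word = "café"

unicode_ok = re.fullmatch(r"\w+", word) is not None
ascii_ok = re.fullmatch(r"\w+", word, re.ASCII) is not None
explicit = re.findall(r"[A-Za-z0-9_]+", word)

print(unicode_ok)  # True: \w is Unicode-aware for str patterns
print(ascii_ok)    # False: re.ASCII restricts \w to [a-zA-Z0-9_]
print(explicit)    # ['caf']
```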

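If matching is slow, look for nested or overlapping quantifiers such as (.*)* or (\w+)+. They can backtrack heavily on input that almost matches. Simplify them or make each part of the pattern more specific.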
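
The sketch below shows one possible rewrite of a nested quantifier. Both patterns are made up for this example and are intended to accept the same strings: words separated by single whitespace characters. The inputs are kept short on purpose, because the nested version can take much longer on long strings that fail near the end.

```python
import re

# Nested quantifiers: \w+ inside (...)+ lets the engine split one word many ways.
nested = re.compile(r"^(\w+\s?)+$")
# Flatter rewrite: each character can be matched in only one way.
flat = re.compile(r"^\w+(?:\s\w+)*\s?$")

results = []
for s in ["one two three", "one two three!"]:
    results.append((bool(nested.search(s)), bool(flat.search(s))))

print(results)  # [(True, True), (False, False)]
```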


Quick recap

  • Use re.search() to find a pattern anywhere in text.
  • Add capturing groups () to extract parts with group() or groups().
  • Use re.sub() to replace matches, optionally with backreferences like \1.
  • Anchor with ^ and $, or use re.fullmatch(), when the whole string must match.
  • Use flags like re.IGNORECASE and re.MULTILINE.