How to Use Regex in Python

What you’ll build or solve

You’ll use regular expressions in Python to find, extract, and replace text reliably.

When this approach works best

Regex works best when you:

  • Validate text that must follow a clear pattern, like EU-42 or a date like 2026-02-17.
  • Extract parts of a string, like getting the username and domain from an email.
  • Clean or transform text, like removing extra whitespace or stripping non-digit characters from a phone number.

Avoid regex when simple string methods cover the job. For example, split(), replace(), and startswith() stay easier to read for basic cases.
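
As a rough illustration of that trade-off, the sketch below handles three simple tasks with plain string methods only. The sample line and its field names are made up for this example.

```python
# Plain string methods cover these cases without any regex.
line = "user=mina;id=42;plan=pro"  # hypothetical sample input

fields = line.split(";")                      # split on a fixed separator
starts_with_user = line.startswith("user=")   # check a fixed prefix
renamed = line.replace("plan=pro", "plan=free")  # swap a fixed substring

print(fields)            # ['user=mina', 'id=42', 'plan=pro']
print(starts_with_user)  # True
print(renamed)           # user=mina;id=42;plan=free
```

Once the separator or prefix stops being fixed text, a pattern usually becomes the simpler option.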

Prerequisites

  • Python installed
  • You know what a string is

Step-by-step instructions

1) Search for a pattern in text

Use re.search() when you want to find a match anywhere in the string.

import re

text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)

# m is a match object, or None when nothing matched
if m:
    print(m.group())  # EU-42

What to look for:

Use a raw string like r"\d+" so backslashes work the way you expect. re.search() returns None when nothing matches.
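
One way to handle the None case is to wrap the search in a small helper. This is a minimal sketch; find_order_code is a hypothetical name chosen for this example, not part of the re module.

```python
import re

def find_order_code(text):
    """Return the first EU-<digits> code in text, or None if there is none."""
    m = re.search(r"EU-\d+", text)
    # Calling m.group() on None would raise AttributeError, so check first.
    return m.group() if m else None

print(find_order_code("Order: EU-42 shipped"))  # EU-42
print(find_order_code("Order: pending"))        # None
```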


2) Extract parts with capturing groups

Use parentheses () to capture pieces of a match. Then read them with group(1), group(2), or groups().

import re

email = "mina@example.com"
m = re.search(r"^(.+)@(.+)$", email)

if m:
    username = m.group(1)  # text before the @
    domain = m.group(2)    # text after the @
    print(username, domain)  # mina example.com

What to look for:

Add ^ and $ when you want the whole string to match, not just a substring inside it.
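
The sketch below shows groups() returning all captures at once, and how anchors change the result on a string that only contains the pattern. The sample strings are assumptions made for illustration.

```python
import re

# groups() returns every captured group as a tuple.
m = re.search(r"^(.+)@(.+)$", "mina@example.com")
parts = m.groups() if m else ()
print(parts)  # ('mina', 'example.com')

# Without anchors the pattern matches a substring; with them it must span the whole string.
sample = "ref EU-42 final"
unanchored = re.search(r"EU-\d+", sample)
anchored = re.search(r"^EU-\d+$", sample)
print(bool(unanchored), bool(anchored))  # True False
```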


3) Replace matches with re.sub()

Use re.sub() to replace every match of a pattern.

import re

raw = "Call me at +382 67-123-456"
# \D matches any non-digit character
digits_only = re.sub(r"\D", "", raw)
print(digits_only)  # 38267123456

Option A: Replace using captured groups

This is useful when you want to reorder parts.

import re

date = "2026-02-17"
# Reorder year-month-day into day/month/year
swapped = re.sub(r"^(\d{4})-(\d{2})-(\d{2})$", r"\3/\2/\1", date)
print(swapped)  # 17/02/2026

What to look for:

In the replacement string, \1, \2, and \3 refer to captured groups. Use a raw string for the replacement too, like r"\1".
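
To see why the raw string matters in the replacement, the sketch below runs the same substitution with and without the r prefix. Without it, Python reads "\3" as an octal escape (a control character) before re.sub() ever sees a backreference.

```python
import re

date = "2026-02-17"
pattern = r"^(\d{4})-(\d{2})-(\d{2})$"

with_raw = re.sub(pattern, r"\3/\2/\1", date)    # backreferences reach re.sub()
without_raw = re.sub(pattern, "\3/\2/\1", date)  # control characters, not groups

print(with_raw)           # 17/02/2026
print(repr(without_raw))  # '\x03/\x02/\x01'
```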


4) Change matching behavior with flags

Flags modify how matching works, like making it case-insensitive or letting anchors work per line.

import re

text = "AdminUser"
print(bool(re.search(r"^admin", text, re.IGNORECASE)))  # True

What to look for:

Use re.IGNORECASE for case-insensitive matching. Use re.MULTILINE when ^ and $ should match per line, not just the whole string.
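
Here is a small sketch of re.MULTILINE in action. The log text is invented for this example.

```python
import re

log = "info: started\nerror: disk full\ninfo: done"  # hypothetical log text

# Without the flag, ^ only matches at the very start of the string.
without_flag = re.findall(r"^error: .*$", log)
# With re.MULTILINE, ^ and $ also match at the start and end of each line.
with_flag = re.findall(r"^error: .*$", log, re.MULTILINE)

print(without_flag)  # []
print(with_flag)     # ['error: disk full']
```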


Examples you can copy

Example 1: Check if a string contains only digits

import re

s = "123045"
# fullmatch() succeeds only if the entire string matches
print(bool(re.fullmatch(r"\d+", s)))  # True

Example 2: Validate a simple ID like EU-42

import re

code = "EU-42"
valid = bool(re.fullmatch(r"EU-\d+", code))
print(valid)  # True

Example 3: Extract all hashtags from a post

import re

post = "Loving #Python and #regex today!"
# findall() returns every non-overlapping match as a list
tags = re.findall(r"#\w+", post)
print(tags)  # ['#Python', '#regex']

Example 4: Normalize repeated whitespace to a single space

import re

s = "Hello     from   Boston\n\nNice to meet you"
# \s+ matches runs of spaces, tabs, and newlines
normalized = re.sub(r"\s+", " ", s).strip()
print(normalized)  # Hello from Boston Nice to meet you

Example 5: Parse key=value pairs from a log line

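import re

line = "user=mina id=42 plan=pro"
# With two groups, findall() returns a list of (key, value) tuples
pairs = re.findall(r"(\w+)=([^\s]+)", line)

data = {key: value for key, value in pairs}
print(data)  # {'user': 'mina', 'id': '42', 'plan': 'pro'}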

Example 6: Find and redact emails in text

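import re

text = "Contact mina@example.com or ivan@test.org for details."
# A deliberately simple email pattern, good enough for redaction
redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[email hidden]", text)
print(redacted)  # Contact [email hidden] or [email hidden] for details.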

Example 7: Compile a pattern you reuse in a loop

import re

# Compile once, then reuse the pattern object
pattern = re.compile(r"EU-\d+", re.IGNORECASE)

codes = ["EU-42", "eu-7", "US-99"]
matches = [c for c in codes if pattern.fullmatch(c)]
print(matches)  # ['EU-42', 'eu-7']

Common mistakes and how to fix them

Mistake 1: Forgetting the raw string prefix and breaking escapes

What you might do

import re

# No r prefix: Python turns \b into a backspace character
m = re.search("\bcat\b", "a cat b")
print(m)  # None

Why it breaks

\b becomes a backspace character in a normal Python string. The pattern is not what you think.

Fix

import re

# Raw string: \b reaches the regex engine as a word boundary
m = re.search(r"\bcat\b", "a cat b")
print(bool(m))  # True

Mistake 2: Using match() when you need search()

What you might do

import re

text = "Order: EU-42 shipped"
m = re.match(r"EU-\d+", text)
print(m)  # None

Why it breaks

re.match() only checks from the start of the string.

Fix

import re

text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)
print(m.group() if m else "no match")  # EU-42

Mistake 3: Writing a greedy pattern that grabs too much

What you might do

import re

text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*</b>", text)
print(m.group())  # <b>one</b> <b>two</b>

Why it breaks

.* is greedy. It matches as much as possible.

Fix

Use a non-greedy quantifier *?:

import re

text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*?</b>", text)
print(m.group())  # <b>one</b>

Troubleshooting

If you see None for every match, print the input and test the pattern on a smaller string first. Confirm you used search() rather than match() when the match can start anywhere in the string.

If your pattern works in a regex tester but not in Python, check for missing raw strings like "\d" instead of r"\d".

If matches look too large, you may be using greedy quantifiers like .* or .+. Try .*? or be more specific.
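
The sketch below compares the three options on the same input: a greedy pattern, a non-greedy one, and a more specific character class. The [^<]* class is one possible way to "be more specific" and assumes the tag content never contains a < character.

```python
import re

text = "<b>one</b> <b>two</b>"

greedy = re.findall(r"<b>.*</b>", text)       # runs to the last </b>
lazy = re.findall(r"<b>.*?</b>", text)        # stops at the first </b>
specific = re.findall(r"<b>[^<]*</b>", text)  # cannot cross a < at all

print(greedy)    # ['<b>one</b> <b>two</b>']
print(lazy)      # ['<b>one</b>', '<b>two</b>']
print(specific)  # ['<b>one</b>', '<b>two</b>']
```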

If ^ and $ do not work per line, pass re.MULTILINE.

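If accented letters or non-English characters fail with \w, check whether you are matching bytes or passing re.ASCII. For str patterns in Python 3, \w matches Unicode letters, digits, and underscore by default. With bytes patterns or re.ASCII, it matches only [a-zA-Z0-9_]. Consider an explicit character class for your use case.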
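
A quick way to check how \w treats a given word is to test it under a few settings. This is a minimal sketch for Python 3 str patterns; the sample word is an assumption chosen for illustration.

```python
import re

word = "caf\u00e9"  # "café" with a precomposed é

unicode_default = bool(re.fullmatch(r"\w+", word))      # Unicode-aware by default
ascii_only = bool(re.fullmatch(r"\w+", word, re.ASCII)) # é is not in [a-zA-Z0-9_]
explicit_class = bool(re.fullmatch(r"[A-Za-z]+", word)) # explicit ASCII letters only

print(unicode_default)  # True
print(ascii_only)       # False
print(explicit_class)   # False
```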

If performance is slow, simplify nested .* patterns and avoid patterns that can backtrack heavily.


Quick recap

  • Use re.search() to find a pattern anywhere in text.
  • Add groups () to extract parts with group() or groups().
  • Use re.sub() to replace matches, optionally with backreferences like \1.
  • Anchor with ^ and $ when the whole string must match.
  • Use flags like re.IGNORECASE and re.MULTILINE.