How to Use Regex in Python
What you’ll build or solve
You’ll use regular expressions in Python to find, extract, and replace text reliably.
When this approach works best
Regex works best when you:
- Validate text that must follow a clear pattern, like EU-42 or a date like 2026-02-17.
- Extract parts of a string, like getting the username and domain from an email.
- Clean or transform text, like removing extra whitespace or stripping non-digit characters from a phone number.
Avoid regex when simple string methods cover the job. For example, split(), replace(), and startswith() stay easier to read for basic cases.
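As a rough illustration of that trade-off, the sketch below handles a fixed prefix and separator with string methods and reaches for regex only when the pattern varies. The filename is a made-up value for the example.

```python
import re

filename = "report_2026.csv"  # hypothetical input for illustration

# Fixed prefixes and separators: plain string methods stay readable.
has_prefix = filename.startswith("report_")
dashed = filename.replace("_", "-")
parts = filename.split(".")

# A pattern that varies ("any four digits in a row") is where regex helps.
m = re.search(r"\d{4}", filename)
year = m.group() if m else None

print(has_prefix, dashed, parts, year)
```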
Prerequisites
- Python installed
- You know what a string is
Step-by-step instructions
1) Search for a pattern in text
Use re.search() when you want to find a match anywhere in the string.
Python
import re
text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)
if m:
    print(m.group())
What to look for:
Use a raw string like r"\d+" so backslashes work the way you expect. re.search() returns None when nothing matches.
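Both points can be checked quickly. This is a minimal sketch with a made-up input that contains no EU code, so the search comes back empty.

```python
import re

# In a normal string, "\b" is one backspace character.
# In a raw string, r"\b" keeps the backslash and the letter as two characters.
plain_len = len("\b")
raw_len = len(r"\b")
print(plain_len, raw_len)  # 1 2

# re.search() returns None when nothing matches, so check before calling .group().
m = re.search(r"EU-\d+", "Order: US-99 shipped")
result = m.group() if m else "no match"
print(result)  # no match
```

Calling m.group() without the check would raise AttributeError here, because None has no group() method.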
2) Extract parts with capturing groups
Use parentheses () to capture pieces of a match. Then read them with group(1), group(2), or groups().
Python
import re
email = "mina@example.com"
m = re.search(r"^(.+)@(.+)$", email)
if m:
    username = m.group(1)
    domain = m.group(2)
    print(username, domain)
What to look for:
Add ^ and $ when you want the whole string to match, not just a substring inside it.
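To see what the anchors change, the sketch below runs the same date pattern with and without them. It also shows re.fullmatch(), which appears later in this guide, as a stricter alternative. The sample strings are made up for the example.

```python
import re

loose_pattern = r"\d{4}-\d{2}-\d{2}"
anchored_pattern = r"^\d{4}-\d{2}-\d{2}$"

text = "Due 2026-02-17 at noon"

# Without anchors, a date anywhere inside the text counts as a match.
loose = bool(re.search(loose_pattern, text))          # True
# With anchors, the whole string has to be the date.
anchored = bool(re.search(anchored_pattern, text))    # False

# $ also matches just before a trailing newline; fullmatch() does not allow that.
anchored_newline = bool(re.search(anchored_pattern, "2026-02-17\n"))  # True
full_newline = bool(re.fullmatch(loose_pattern, "2026-02-17\n"))      # False

print(loose, anchored, anchored_newline, full_newline)
```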
3) Replace matches with re.sub()
Use re.sub() to replace every match of a pattern.
Python
import re
raw = "Call me at +382 67-123-456"
digits_only = re.sub(r"\D", "", raw)
print(digits_only)
Option A: Replace using captured groups
This is useful when you want to reorder parts.
Python
import re
date = "2026-02-17"
swapped = re.sub(r"^(\d{4})-(\d{2})-(\d{2})$", r"\3/\2/\1", date)
print(swapped)
What to look for:
In the replacement string, \1, \2, and \3 refer to captured groups. Use a raw string for the replacement too, like r"\1".
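One case where \1 is not enough: when a digit has to follow the group directly. A minimal sketch, assuming you want to turn EU-42 into EU-042. Writing r"\10" would be read as a reference to group 10, so the \g<1> form marks where the group number ends.

```python
import re

code = "EU-42"

# \g<1> is the unambiguous form of \1; the literal 0 after it is plain text.
padded = re.sub(r"^([A-Z]+-)(\d+)$", r"\g<1>0\2", code)
print(padded)  # EU-042
```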
4) Change matching behavior with flags
Flags modify how matching works, like making it case-insensitive or letting anchors work per line.
Python
import re
text = "AdminUser"
print(bool(re.search(r"^admin", text, re.IGNORECASE)))
What to look for:
Use re.IGNORECASE for case-insensitive matching. Use re.MULTILINE when ^ and $ should match per line, not just the whole string.
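The example above only covers re.IGNORECASE, so here is a small sketch of re.MULTILINE on a made-up, four-line log string. It also shows that flags can be combined with the | operator.

```python
import re

log = "info: started\nerror: disk full\ninfo: done\nerror: timeout"

# Without the flag, ^ only matches at the very start of the string.
without_flag = re.findall(r"^error: (.+)$", log)

# With re.MULTILINE, ^ and $ match at the start and end of every line.
with_flag = re.findall(r"^error: (.+)$", log, re.MULTILINE)

# Combine flags with | when you need more than one.
both_flags = re.findall(r"^ERROR: (.+)$", log, re.MULTILINE | re.IGNORECASE)

print(without_flag)  # []
print(with_flag)     # ['disk full', 'timeout']
print(both_flags)    # ['disk full', 'timeout']
```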
Examples you can copy
Example 1: Check if a string contains only digits
Python
import re
s = "123045"
print(bool(re.fullmatch(r"\d+", s)))
Example 2: Validate a simple ID like EU-42
Python
import re
code = "EU-42"
valid = bool(re.fullmatch(r"EU-\d+", code))
print(valid)
Example 3: Extract all hashtags from a post
Python
import re
post = "Loving #Python and #regex today!"
tags = re.findall(r"#\w+", post)
print(tags)
Example 4: Normalize repeated whitespace to a single space
Python
import re
s = "Hello from Boston\n\nNice to meet you"
normalized = re.sub(r"\s+", " ", s).strip()
print(normalized)
Example 5: Parse key=value pairs from a log line
Python
import re
line = "user=mina id=42 plan=pro"
pairs = re.findall(r"(\w+)=([^\s]+)", line)
data = {key: value for key, value in pairs}
print(data)
Example 6: Find and redact emails in text
Python
import re
text = "Contact mina@example.com or ivan@test.org for details."
redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[email hidden]", text)
print(redacted)
Example 7: Compile a pattern you reuse in a loop
Python
import re
pattern = re.compile(r"EU-\d+", re.IGNORECASE)
codes = ["EU-42", "eu-7", "US-99"]
matches = [c for c in codes if pattern.fullmatch(c)]
print(matches)
Common mistakes and how to fix them
Mistake 1: Forgetting the raw string prefix and breaking escapes
What you might do
Python
import re
m = re.search("\bcat\b", "a cat b")
print(m)
Why it breaks
\b becomes a backspace character in a normal Python string. The regex then looks for literal backspace characters around cat instead of word boundaries, so the search returns None.
Fix
Python
import re
m = re.search(r"\bcat\b", "a cat b")
print(bool(m))
Mistake 2: Using match() when you need search()
What you might do
Python
import re
text = "Order: EU-42 shipped"
m = re.match(r"EU-\d+", text)
print(m)
Why it breaks
re.match() only matches at the start of the string. This text starts with "Order:", so the result is None.
Fix
Python
import re
text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)
print(m.group() if m else "no match")
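For comparison, this sketch runs one pattern through match(), search(), and fullmatch() to show where each one requires the match to sit. The sample text is made up for the example.

```python
import re

pattern = r"EU-\d+"
text = "EU-42 shipped"

at_start = re.match(pattern, text)               # matches: the code is at index 0
anywhere = re.search(pattern, "Order: " + text)  # matches: search scans the whole string
whole = re.fullmatch(pattern, text)              # None: " shipped" is left over

print(at_start.group() if at_start else None)  # EU-42
print(anywhere.group() if anywhere else None)  # EU-42
print(whole)                                   # None
```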
Mistake 3: Writing a greedy pattern that grabs too much
What you might do
Python
import re
text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*</b>", text)
print(m.group())
Why it breaks
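.* is greedy. It matches as much as possible, so the match runs from the first <b> to the last </b> and returns the whole string instead of <b>one</b>.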
Fix
Use a non-greedy quantifier *?:
Python
import re
text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*?</b>", text)
print(m.group())
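Another way to avoid over-matching is to make the pattern more specific instead of non-greedy. A minimal sketch, assuming the tags never contain nested markup:

```python
import re

text = "<b>one</b> <b>two</b>"

# [^<]* matches any run of characters that are not "<", so it cannot cross into the next tag.
items = re.findall(r"<b>([^<]*)</b>", text)
print(items)  # ['one', 'two']
```

Because the character class excludes "<", the engine has less to backtrack over than with .*?, and the intent is visible in the pattern itself.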
Troubleshooting
If you see None for every match, print the input and test the pattern on a smaller string first. If the pattern can appear anywhere in the string, confirm that you called search() rather than match().
If your pattern works in a regex tester but not in Python, check for missing raw strings like "\d" instead of r"\d".
If matches look too large, you may be using greedy quantifiers like .* or .+. Try .*? or be more specific.
If ^ and $ do not work per line, pass re.MULTILINE.
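If accented letters or non-English characters fail with \w, check how the pattern is being applied. On str patterns in Python 3, \w matches Unicode letters and digits plus underscore by default. With the re.ASCII flag, or on bytes patterns, it only matches [a-zA-Z0-9_]. Consider a clearer character class for your use case.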
If matching is slow, look for nested quantifiers such as (.*)* or (\w+)+, which can backtrack heavily on input that almost matches, and replace them with more specific patterns.
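The \w behavior mentioned above is easy to check. This sketch uses a made-up two-word string with accented letters, written with escapes so the characters are unambiguous.

```python
import re

text = "caf\u00e9 na\u00efve"  # "café naïve"

# Default for str patterns: \w covers Unicode letters, so the words stay whole.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split the words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words)  # ['café', 'naïve']
print(ascii_words)    # ['caf', 'na', 've']
```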
Quick recap
- Use re.search() to find a pattern anywhere in text.
- Add groups () to extract parts with group() or groups().
- Use re.sub() to replace matches, optionally with backreferences like \1.
- Anchor with ^ and $ when the whole string must match.
- Use flags like re.IGNORECASE and re.MULTILINE.