How to Use Regex in Python
What you’ll build or solve
You’ll use regular expressions in Python to find, extract, and replace text reliably.
When this approach works best
Regex works best when you:
- Validate text that must follow a clear pattern, like EU-42 or a date like 2026-02-17.
- Extract parts of a string, like getting the username and domain from an email.
- Clean or transform text, like removing extra whitespace or stripping non-digit characters from a phone number.
Avoid regex when simple string methods cover the job. For basic cases, split(), replace(), and startswith() are easier to read.
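To see where that line falls, here is a minimal sketch that checks the same ID format both ways. The helper names has_eu_prefix and is_eu_code are hypothetical, made up for this illustration.

```python
import re

def has_eu_prefix(code: str) -> bool:
    # A fixed prefix check needs no regex.
    return code.startswith("EU-")

def is_eu_code(code: str) -> bool:
    # Regex earns its place once the rest must follow a pattern (digits only here).
    return re.fullmatch(r"EU-\d+", code) is not None

print(has_eu_prefix("EU-abc"))  # True: startswith() cannot check what follows
print(is_eu_code("EU-abc"))     # False
print(is_eu_code("EU-42"))      # True
```

If a plain method answers the question, prefer it; reach for the regex when the structure after the prefix matters.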
Prerequisites
- Python installed
- You know what a string is
Step-by-step instructions
1) Search for a pattern in text
Use re.search() when you want to find a match anywhere in the string.
Python
import re

text = "Order: EU-42 shipped"
# \d+ matches one or more digits after the literal "EU-"
m = re.search(r"EU-\d+", text)
if m:
    print(m.group())  # EU-42
What to look for:
Use a raw string like r"\d+" so backslashes reach the regex engine unchanged. re.search() returns None when nothing matches, so check the result before calling group().
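A match object also records where the match sits in the string. This sketch reuses the order text from step 1 and reads the position with start(), end(), and span(); the second search uses a made-up US- pattern to show the None case.

```python
import re

text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)

if m:
    print(m.group())           # EU-42
    print(m.start(), m.end())  # 7 12
    print(m.span())            # (7, 12)

# A pattern with no match gives None, so check before calling group().
missing = re.search(r"US-\d+", text)
print(missing)  # None
```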
2) Extract parts with capturing groups
Use parentheses () to capture pieces of a match. Then read them with group(1), group(2), or groups().
Python
import re

email = "mina@example.com"
# ^ and $ anchor the whole string; each (.+) captures one part
m = re.search(r"^(.+)@(.+)$", email)
if m:
    username = m.group(1)
    domain = m.group(2)
    print(username, domain)  # mina example.com
What to look for:
Add ^ and $ when you want the whole string to match, not just a substring inside it.
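If group numbers get hard to track, Python also supports named groups with the (?P<name>...) syntax. This sketch rewrites the email example with two illustrative group names, user and domain, and reads them by name or all at once with groupdict().

```python
import re

email = "mina@example.com"
# (?P<name>...) names a group so you can read it by name later.
m = re.search(r"^(?P<user>.+)@(?P<domain>.+)$", email)

if m:
    print(m.group("user"))    # mina
    print(m.group("domain"))  # example.com
    print(m.groupdict())      # {'user': 'mina', 'domain': 'example.com'}
```

Named groups are still numbered too, so m.group(1) keeps working.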
3) Replace matches with re.sub()
Use re.sub() to replace every match of a pattern.
Python
import re

raw = "Call me at +382 67-123-456"
# \D matches any non-digit character; each one is replaced with ""
digits_only = re.sub(r"\D", "", raw)
print(digits_only)  # 38267123456
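re.sub() replaces every match by default. As a sketch using the same phone string, the count argument limits how many matches are replaced, and re.subn() returns the new string together with the number of replacements it made.

```python
import re

raw = "Call me at +382 67-123-456"

# count=1 replaces only the first hyphen.
first_only = re.sub(r"-", " ", raw, count=1)
print(first_only)  # Call me at +382 67 123-456

# re.subn() returns (new_string, number_of_replacements).
digits_only, removed = re.subn(r"\D", "", raw)
print(digits_only, removed)  # 38267123456 15
```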
Option A: Replace using captured groups
This is useful when you want to reorder parts.
Python
import re

date = "2026-02-17"
# Capture year, month, and day, then write them back as day/month/year
swapped = re.sub(r"^(\d{4})-(\d{2})-(\d{2})$", r"\3/\2/\1", date)
print(swapped)  # 17/02/2026
What to look for:
In the replacement string, \1, \2, and \3 refer to captured groups. Use a raw string for the replacement too, like r"\1".
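The replacement can also be a function instead of a string. re.sub() calls it once per match with the match object and uses the returned string. In this sketch, pad_code is a hypothetical helper that zero-pads the number in IDs like EU-7; the three-digit width is an arbitrary choice.

```python
import re

def pad_code(m):
    # m is the match object; group(1) holds the captured digits.
    number = int(m.group(1))
    return f"EU-{number:03d}"

codes = "EU-7, EU-42, EU-123"
padded = re.sub(r"EU-(\d+)", pad_code, codes)
print(padded)  # EU-007, EU-042, EU-123
```

A function is handy when the new text depends on a computation, which a plain backreference like \1 cannot do.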
4) Change matching behavior with flags
Flags modify how matching works, like making it case-insensitive or letting anchors work per line.
Python
import re

text = "AdminUser"
print(bool(re.search(r"^admin", text, re.IGNORECASE)))  # True
What to look for:
Use re.IGNORECASE for case-insensitive matching. Use re.MULTILINE when ^ and $ should match per line, not just the whole string.
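Here is a minimal sketch of re.MULTILINE on a made-up multi-line log. It also shows that you can combine flags with |.

```python
import re

log = "INFO start\nERROR disk full\nINFO done\nERROR timeout"

# Without re.MULTILINE, ^ only matches at the very start of the string.
print(re.findall(r"^ERROR.*", log))  # []

# With re.MULTILINE, ^ also matches right after each newline.
errors = re.findall(r"^ERROR.*", log, re.MULTILINE)
print(errors)  # ['ERROR disk full', 'ERROR timeout']

# Flags combine with |.
mixed = "error one\nERROR two"
both = re.findall(r"^error \w+", mixed, re.IGNORECASE | re.MULTILINE)
print(both)  # ['error one', 'ERROR two']
```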
Examples you can copy
Example 1: Check if a string contains only digits
Python
import re

s = "123045"
print(bool(re.fullmatch(r"\d+", s)))  # True
Example 2: Validate a simple ID like EU-42
Python
import re

code = "EU-42"
valid = bool(re.fullmatch(r"EU-\d+", code))
print(valid)  # True
Example 3: Extract all hashtags from a post
Python
import re

post = "Loving #Python and #regex today!"
tags = re.findall(r"#\w+", post)
print(tags)  # ['#Python', '#regex']
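findall() returns only the matched text. When you also need positions, re.finditer() yields match objects instead. This sketch uses the same post, and also shows that when the pattern has one capturing group, findall() returns just that group.

```python
import re

post = "Loving #Python and #regex today!"

# finditer() yields match objects, so each tag comes with its position.
positions = [(m.group(), m.start()) for m in re.finditer(r"#\w+", post)]
print(positions)  # [('#Python', 7), ('#regex', 19)]

# With one capturing group, findall() returns only the group's text.
names = re.findall(r"#(\w+)", post)
print(names)  # ['Python', 'regex']
```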
Example 4: Normalize repeated whitespace to a single space
Python
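import re

s = "Hello from Boston\n\nNice to meet you"
# \s+ matches any run of spaces, tabs, or newlines; strip() trims the ends
normalized = re.sub(r"\s+", " ", s).strip()
print(normalized)  # Hello from Boston Nice to meet you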
Example 5: Parse key=value pairs from a log line
Python
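import re

line = "user=mina id=42 plan=pro"
# Two groups, so findall() returns a list of (key, value) tuples
pairs = re.findall(r"(\w+)=([^\s]+)", line)
data = {key: value for key, value in pairs}
print(data)  # {'user': 'mina', 'id': '42', 'plan': 'pro'}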
Example 6: Find and redact emails in text
Python
import re

text = "Contact mina@example.com or ivan@test.org for details."
redacted = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[email hidden]", text)
print(redacted)  # Contact [email hidden] or [email hidden] for details.
Example 7: Compile a pattern you reuse in a loop
Python
import re

# Compile once, then reuse the pattern object inside the loop
pattern = re.compile(r"EU-\d+", re.IGNORECASE)
codes = ["EU-42", "eu-7", "US-99"]
matches = [c for c in codes if pattern.fullmatch(c)]
print(matches)  # ['EU-42', 'eu-7']
Common mistakes and how to fix them
Mistake 1: Forgetting the raw string prefix and breaking escapes
What you might do
Python
import re

m = re.search("\bcat\b", "a cat b")
print(m)  # None
Why it breaks
In a normal Python string, \b is the backspace character, so the regex engine looks for literal backspaces around "cat" instead of word boundaries, and the search returns None.
Fix
Python
import re

m = re.search(r"\bcat\b", "a cat b")
print(bool(m))  # True
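You can see the difference directly by inspecting the strings themselves. This sketch also shows that doubling the backslash works, though the raw string is easier to read.

```python
import re

# In a normal string, "\b" is one character: backspace (code point 8).
print(len("\b"), ord("\b"))  # 1 8

# In a raw string, r"\b" keeps two characters: a backslash and "b".
print(len(r"\b"))  # 2

# Doubling the backslash also works; the raw string is easier to read.
escaped = re.search("\\bcat\\b", "a cat b")
raw = re.search(r"\bcat\b", "a cat b")
print(escaped.group(), raw.group())  # cat cat
```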
Mistake 2: Using match() when you need search()
What you might do
Python
import re

text = "Order: EU-42 shipped"
m = re.match(r"EU-\d+", text)
print(m)  # None
Why it breaks
re.match() only tries to match at the start of the string. This text starts with "Order:", so there is no match.
Fix
Python
import re

text = "Order: EU-42 shipped"
m = re.search(r"EU-\d+", text)
print(m.group() if m else "no match")  # EU-42
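To compare the three functions side by side, here is a small sketch on a made-up string where the ID sits at the start. re.fullmatch() is the strictest: the whole string must match.

```python
import re

pattern = r"EU-\d+"
text = "EU-42 shipped"

print(bool(re.match(pattern, text)))         # True: the ID is at the start
print(bool(re.search(pattern, text)))        # True: found anywhere
print(bool(re.fullmatch(pattern, text)))     # False: " shipped" is left over
print(bool(re.fullmatch(pattern, "EU-42")))  # True: the whole string matches
```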
Mistake 3: Writing a greedy pattern that grabs too much
What you might do
Python
import re

text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*</b>", text)
print(m.group())  # <b>one</b> <b>two</b>
Why it breaks
.* is greedy: it matches as much as possible, so the match runs from the first <b> to the last </b>.
Fix
Use a non-greedy quantifier *?:
Python
import re

text = "<b>one</b> <b>two</b>"
m = re.search(r"<b>.*?</b>", text)
print(m.group())  # <b>one</b>
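With several tags in the text, findall() and a non-greedy pattern return each one. A narrower character class such as [^<]* is another option, since it cannot run past the next <. This sketch only suits small, predictable snippets like this one.

```python
import re

text = "<b>one</b> <b>two</b>"

# Non-greedy .*? stops at the first </b> after each <b>.
lazy = re.findall(r"<b>.*?</b>", text)
print(lazy)  # ['<b>one</b>', '<b>two</b>']

# [^<]* cannot cross a "<", and the group keeps only the inner text.
inner = re.findall(r"<b>([^<]*)</b>", text)
print(inner)  # ['one', 'two']
```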
Troubleshooting
If you see None for every match, print the input and test the pattern on a smaller string first. Also confirm you used search() rather than match(), since match() only checks the start of the string.
If your pattern works in a regex tester but not in Python, check for missing raw strings, like "\bcat\b" instead of r"\bcat\b".
If matches look too large, you may be using greedy quantifiers like .* or .+. Try .*? or a narrower character class.
If ^ and $ do not work per line, pass re.MULTILINE.
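If accented letters or other non-English characters fail with \w, check your flags. For str patterns, \w matches Unicode letters, digits, and the underscore by default, but with re.ASCII it matches only [a-zA-Z0-9_]. If you need a different set, write an explicit character class for your use case.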
If performance is slow, simplify nested .* patterns and avoid nested quantifiers like (a+)+, which can backtrack heavily on inputs that almost match.
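The \w behavior in the troubleshooting notes is easy to check. This sketch writes é as the escape \u00e9 so it is unambiguously one precomposed character; the name itself is made up.

```python
import re

name = "Jos\u00e9_2"  # "José_2" with a precomposed é

# Default for str patterns: \w matches Unicode letters such as é.
unicode_ok = re.fullmatch(r"\w+", name) is not None
# re.ASCII limits \w to [a-zA-Z0-9_], so é no longer matches.
ascii_ok = re.fullmatch(r"\w+", name, re.ASCII) is not None

print(unicode_ok, ascii_ok)  # True False
```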
Quick recap
- Use re.search() to find a pattern anywhere in text.
- Add groups () to extract parts with group() or groups().
- Use re.sub() to replace matches, optionally with backreferences like \1.
- Anchor with ^ and $ when the whole string must match.
- Use flags like re.IGNORECASE and re.MULTILINE.