Complete Guide - Processing Text Efficiently with re.sub()
In Python, the re.sub() function is a powerful tool for text replacement. Its basic syntax is:
import re
result = re.sub(pattern, replacement, string, count=0, flags=0)
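The optional count argument in the signature above is easiest to see on a tiny input. The following is a minimal sketch with a made-up sample string; re.subn() is a related standard-library function that is not part of the signature shown, included here only because it reports how many replacements happened.

```python
import re

text = "a-b-c-d"

# count=0 (the default) replaces every match.
all_replaced = re.sub(r"-", "+", text)

# count=2 stops after the first two matches, scanning left to right.
first_two = re.sub(r"-", "+", text, count=2)

# re.subn() does the same substitution and also returns the number of replacements.
new_text, n = re.subn(r"-", "+", text)

print(all_replaced)  # a+b+c+d
print(first_two)     # a+b+c-d
print(new_text, n)   # a+b+c+d 3
```

Passing count and flags by keyword, as here, keeps the call readable and avoids mixing the two up.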
Example 1: Basic text replacement
import re
text = "Python is an excellent programming language. I love Python!"
result = re.sub(r"Python", "JavaScript", text)
print(result)
# Output: JavaScript is an excellent programming language. I love JavaScript!
# Case-insensitive replacement with flags=re.IGNORECASE
text = "Python is great. Do you like python?"
result = re.sub(r"python", "JavaScript", text, flags=re.IGNORECASE)
print(result)
# Output: JavaScript is great. Do you like JavaScript?
# A function can serve as the replacement; it receives each match object
def to_upper(match):
    return match.group(0).upper()
text = "make this important. and this too!"
result = re.sub(r"important|too", to_upper, text)
print(result)
# Output: make this IMPORTANT. and this TOO!
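A replacement function can also compute a new value from the matched text rather than only changing its case. The sketch below uses a hypothetical helper named double and a made-up sentence to show the idea.

```python
import re

def double(match):
    # match.group(0) is the full matched text, here a run of digits.
    return str(int(match.group(0)) * 2)

text = "3 apples and 12 oranges"
result = re.sub(r"\d+", double, text)
print(result)  # 6 apples and 24 oranges
```

The function must return a string, which is why the computed number is passed through str().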
# Collapse runs of whitespace into a single space
text = "This   text  has    too   many     spaces."
result = re.sub(r"\s+", " ", text)
print(result)
# Output: This text has too many spaces.
# Remove leading and trailing whitespace
text = " This has leading and trailing spaces. "
result = re.sub(r"^\s+|\s+$", "", text)
print(f"'{result}'")
# Output: 'This has leading and trailing spaces.'
# Remove all whitespace, including tabs and newlines
text = "This text has\tspaces and\nnew lines."
result = re.sub(r"\s+", "", text)
print(result)
# Output: Thistexthasspacesandnewlines.
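For the whitespace clean-ups above, plain string methods are often enough. This is a small comparison sketch on a made-up string, not a claim that the two approaches agree on every exotic Unicode whitespace character.

```python
import re

text = "  This   text\thas \n extra   whitespace.  "

# Regex version: collapse whitespace runs, then trim both ends.
with_regex = re.sub(r"\s+", " ", text).strip()

# String-method version: split() with no arguments splits on runs of
# whitespace and drops empty pieces, so join() rebuilds a clean string.
with_split = " ".join(text.split())

print(with_regex)  # This text has extra whitespace.
print(with_split)  # This text has extra whitespace.
```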
text = "  This   has\n  extra   spaces  \n  between   words.  "
# Collapse runs of spaces and tabs but keep newlines
result = re.sub(r"[^\S\n]+", " ", text)
# Strip leading and trailing whitespace from every line
result = re.sub(r"^\s+|\s+$", "", result, flags=re.MULTILINE)
print(result)
# Output:
# This has
# extra spaces
# between words.
def clean_input(user_input):
    # Remove special characters first, so they leave no double spaces behind
    cleaned = re.sub(r"[^\w\s]", "", user_input)
    # Collapse runs of whitespace into a single space
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Strip leading and trailing whitespace
    return cleaned.strip()
user_text = "  Hello,   World!  This is   some $ text!  "
print(clean_input(user_text))
# Output: Hello World This is some text
def format_phone_number(phone):
    # Remove every non-digit character
    cleaned = re.sub(r"\D", "", phone)
    # Format as (123) 456-7890, dropping an optional leading country code 1;
    # input with any other digit count is returned as bare digits
    formatted = re.sub(r"^1?(\d{3})(\d{3})(\d{4})$", r"(\1) \2-\3", cleaned)
    return formatted
print(format_phone_number("555-123-4567"))  # (555) 123-4567
print(format_phone_number("1 (800) 555-1234"))  # (800) 555-1234
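The numbered backreferences \1, \2 and \3 used in format_phone_number can become hard to read as a pattern grows. A possible alternative is named groups, referenced in the replacement as \g<name>. The date pattern and sample sentence below are assumptions made up for illustration.

```python
import re

# Each (?P<name>...) group can be referenced by name in the replacement.
date_pattern = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})")

text = "Released on 2025-07-23."
result = date_pattern.sub(r"\g<day>/\g<month>/\g<year>", text)
print(result)  # Released on 23/07/2025.
```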
html_text = "<p> This is <b>bold</b> text! </p>"
# Remove HTML tags
text_only = re.sub(r"<.*?>", "", html_text)
# Collapse extra whitespace
cleaned_text = re.sub(r"\s+", " ", text_only).strip()
print(cleaned_text)
# Output: This is bold text!
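The tag-stripping regex above works on simple markup, but it can misfire on input such as a > inside an attribute value. For less predictable HTML, a parser is generally more robust. The sketch below is one possible alternative built on the standard-library html.parser module; the class name TextExtractor and the helper strip_tags are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called by the parser for each chunk of text outside of tags.
        self.parts.append(data)

def strip_tags(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    # Join the chunks, then normalize whitespace with split()/join().
    return " ".join("".join(parser.parts).split())

print(strip_tags("<p> This is <b>bold</b> text! </p>"))
# This is bold text!
```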
For frequently used patterns, compile them once with re.compile() to improve efficiency:
space_pattern = re.compile(r"\s+")
result = space_pattern.sub(" ", text)
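A compiled pattern pays off mainly when the same expression is applied many times. Here is a minimal sketch, assuming a hypothetical helper normalize and a made-up list of lines.

```python
import re

# Compile once at module level, then reuse the pattern object.
SPACE_PATTERN = re.compile(r"\s+")

def normalize(lines):
    # Apply the same compiled pattern to every line.
    return [SPACE_PATTERN.sub(" ", line).strip() for line in lines]

lines = ["a   b", "  c\td  ", "e\n\nf"]
print(normalize(lines))  # ['a b', 'c d', 'e f']
```

The re module also caches recently compiled patterns internally, so the gain over calling re.sub() directly is usually modest; the named pattern object mostly helps readability and reuse.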
Use the non-greedy quantifier .*? to avoid matching too much:
# Greedy match
re.sub(r"<.*>", "", "<div>content</div><p>more</p>")
# Output: ""
# Non-greedy match
re.sub(r"<.*?>", "", "<div>content</div><p>more</p>")
# Output: "contentmore"
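One caveat with .*? is that the dot does not match a newline by default, so a tag that spans two lines is missed. A negated character class, or the re.DOTALL flag, is a common way around this. The sample markup below is made up for illustration.

```python
import re

html = "<div\nclass='box'>content</div><p>more</p>"

# The dot cannot cross the newline, so the opening <div ...> tag survives.
non_greedy = re.sub(r"<.*?>", "", html)

# [^>] matches any character except ">", including newlines.
char_class = re.sub(r"<[^>]*>", "", html)

# re.DOTALL lets the dot match newlines as well.
dotall = re.sub(r"<.*?>", "", html, flags=re.DOTALL)

print(repr(non_greedy))  # the first tag is still present
print(char_class)        # contentmore
print(dotall)            # contentmore
```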
When the text to match contains regex special characters, escape it with re.escape():
search_term = "file.txt"
safe_pattern = re.escape(search_term)
result = re.sub(safe_pattern, "document.txt", "Find file.txt here")
# result is "Find document.txt here"
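To see why escaping matters, compare the escaped and unescaped patterns on text where the unescaped dot matches something unintended. The helper replace_literal below is a hypothetical name; it also passes the replacement through a function so that backslashes in it are not interpreted as escape sequences.

```python
import re

text = "Find file_txt and file.txt here"

# Unescaped: "." matches any character, so "file_txt" is replaced too.
unescaped = re.sub("file.txt", "document.txt", text)

def replace_literal(text, search_term, replacement):
    # re.escape() neutralizes metacharacters in the search term.
    # A function replacement is used as-is, with no backslash processing.
    return re.sub(re.escape(search_term), lambda m: replacement, text)

escaped = replace_literal(text, "file.txt", "document.txt")

print(unescaped)  # Find document.txt and document.txt here
print(escaped)    # Find file_txt and document.txt here
print(replace_literal("path: X", "X", r"C:\new"))  # path: C:\new
```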
For simple replacements, string methods may be faster:
# Simple replacement - the string method is faster
text.replace("old", "new")
# Complex patterns - use regular expressions
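The complex-pattern case is left without code in the snippet above. One hypothetical illustration of where a regular expression earns its keep is whole-word replacement, which str.replace() cannot express; the sample sentence is made up.

```python
import re

text = "old gold is old"

# str.replace() has no notion of word boundaries, so "gold" is altered too.
plain = text.replace("old", "new")

# \b anchors the match to word boundaries, leaving "gold" untouched.
whole_word = re.sub(r"\bold\b", "new", text)

print(plain)       # new gnew is new
print(whole_word)  # new gold is new
```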
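Python Regular Expression Replacement Tutorial © 2023 | Focused on text-processing techniques
This article was published by MuXiaoNan on 2025-07-23 on 吾爱品聚. If you have any questions, please contact us.
Article link: https://521pj.cn/20256309.html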