Python Regular Expressions: A Complete Guide to Matching and Replacing Text and Whitespace | Python String Processing Tutorial

Regular Expression Replacement Basics

In Python, re.sub() is the main tool for regex-based text replacement. Its basic signature is:

import re

result = re.sub(pattern, replacement, string, count=0, flags=0)

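• pattern: the regular expression to match
• replacement: the replacement string, or a function that receives each match object and returns the replacement text
• string: the original string to process
• count: the maximum number of replacements (0, the default, means replace every match)
• flags: regex flags such as re.IGNORECASE (pass count and flags as keyword arguments; passing them positionally is deprecated as of Python 3.13)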
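A short sketch of two parameters from the list above: count limits how many matches are replaced, and a string replacement can refer back to capture groups with \1, \2 and so on. The sample strings are made up for illustration.

```python
import re

text = "one two three two one"

# count=1 replaces only the first match, scanning left to right
first_only = re.sub(r"two", "2", text, count=1)
print(first_only)  # one 2 three two one

# \1 and \2 in the replacement refer to the pattern's capture groups
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello
```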

Text Replacement Examples

Example 1: Basic text replacement

import re

text = "Python is an excellent programming language. I love Python!"
result = re.sub(r"Python", "JavaScript", text)

print(result)
# Output: JavaScript is an excellent programming language. I love JavaScript!

Example 2: Case-insensitive replacement
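A plain pattern like r"Python" also matches inside longer words. A sketch using the \b word-boundary anchor, with a made-up sentence, shows the difference:

```python
import re

text = "CPython is the reference implementation of Python."

# Without \b, the "Python" inside "CPython" is replaced as well
naive = re.sub(r"Python", "JavaScript", text)
# \b restricts the match to whole words, so only the standalone word changes
bounded = re.sub(r"\bPython\b", "JavaScript", text)

print(naive)    # CJavaScript is the reference implementation of JavaScript.
print(bounded)  # CPython is the reference implementation of JavaScript.
```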

text = "Python is great. Do you like python?"
result = re.sub(r"python", "JavaScript", text, flags=re.IGNORECASE)

print(result)
# Output: JavaScript is great. Do you like JavaScript?

Example 3: Using a function for more complex replacements
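The same case-insensitive behavior can also be written as an inline flag at the start of the pattern, which is handy when the pattern is stored as a string elsewhere. A minimal sketch:

```python
import re

text = "Python is great. Do you like python?"

# (?i) at the start of the pattern has the same effect as flags=re.IGNORECASE
result = re.sub(r"(?i)python", "JavaScript", text)
print(result)  # JavaScript is great. Do you like JavaScript?
```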

def to_upper(match):
    return match.group(0).upper()

text = "make this important. and this too!"
result = re.sub(r"important|too", to_upper, text)

print(result)
# Output: make this IMPORTANT. and this TOO!

Whitespace Handling Techniques

Example 4: Collapse runs of whitespace into a single space
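A replacement function can also look each match up in a dictionary, which avoids chaining one re.sub() call per word. A sketch, where the abbreviations mapping and the expand() helper are hypothetical names used only for illustration:

```python
import re

# Hypothetical mapping used only for this illustration
abbreviations = {"btw": "by the way", "imo": "in my opinion"}

def expand(match):
    word = match.group(0)
    # Fall back to the original word when it is not in the mapping
    return abbreviations.get(word.lower(), word)

text = "BTW, imo regex is useful."
result = re.sub(r"\b\w+\b", expand, text)
print(result)  # by the way, in my opinion regex is useful.
```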

text = "This   text    has    too    many     spaces."
result = re.sub(r"\s+", " ", text)

print(result)
# Output: This text has too many spaces.

Example 5: Remove leading and trailing whitespace
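Note that \s matches newlines and tabs as well as spaces, so r"\s+" also joins separate lines. A sketch with a made-up two-line string, restricting the class to spaces and tabs instead:

```python
import re

text = "line one   has gaps\nline two\t\ttoo"

# \s+ also matches the newline, so the two lines are merged into one
merged = re.sub(r"\s+", " ", text)
# [ \t]+ only targets spaces and tabs, so the line break survives
kept = re.sub(r"[ \t]+", " ", text)

print(repr(merged))  # 'line one has gaps line two too'
print(repr(kept))    # 'line one has gaps\nline two too'
```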

text = "   This has leading and trailing spaces.   "
result = re.sub(r"^\s+|\s+$", "", text)

print(f"'{result}'")
# Output: 'This has leading and trailing spaces.'

Example 6: Remove all whitespace (including tabs and newlines)
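For plain trimming, the built-in str.strip() gives the same result on this input without a regex; the regex form is mainly useful when you need only one side or a custom character class. A quick check:

```python
import re

text = "   This has leading and trailing spaces.   "

# On this input, the regex from Example 5 and str.strip() agree
assert re.sub(r"^\s+|\s+$", "", text) == text.strip()

# Using only one half of the alternation trims just the left side (like lstrip())
left_trimmed = re.sub(r"^\s+", "", text)
print(repr(left_trimmed))  # 'This has leading and trailing spaces.   '
```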

text = "This text has\tspaces and\nnew lines."
result = re.sub(r"\s+", "", text)

print(result)
# Output: Thistexthasspacesandnewlines.

Example 7: Clean up text while preserving newlines
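Removing all whitespace can also be done without a regex: str.split() with no argument splits on any run of whitespace, and joining the pieces with an empty string drops it. A sketch on the same input:

```python
text = "This text has\tspaces and\nnew lines."

# split() with no argument splits on runs of any whitespace (spaces, tabs, newlines)
no_ws = "".join(text.split())
print(no_ws)  # Thistexthasspacesandnewlines.
```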

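text = "  This  has\n  extra  spaces  \n  between  words.  "
# Collapse runs of whitespace other than newlines ([^\S\n] = any whitespace except \n)
result = re.sub(r"[^\S\n]+", " ", text)
# Strip spaces at the start and end of each line; using [^\S\n] instead of \s
# keeps the newlines themselves (and any blank lines) intact
result = re.sub(r"^[^\S\n]+|[^\S\n]+$", "", result, flags=re.MULTILINE)

print(repr(result))
# Output: 'This has\nextra spaces\nbetween words.'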
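Once each line has been stripped as in Example 7, runs of blank lines can be reduced to a single blank line. A sketch, assuming the lines already contain no leftover spaces (three or more consecutive newlines means two or more blank lines):

```python
import re

text = "First paragraph.\n\n\n\nSecond paragraph.\n\n\nThird."

# Replace 3+ consecutive newlines with exactly two, leaving one blank line
result = re.sub(r"\n{3,}", "\n\n", text)
print(repr(result))  # 'First paragraph.\n\nSecond paragraph.\n\nThird.'
```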

Practical Applications

Scenario 1: Cleaning user input

def clean_input(user_input):
    # Remove special characters first, so the gaps they leave get collapsed below
    cleaned = re.sub(r"[^\w\s]", "", user_input)
    # Collapse runs of whitespace into a single space
    cleaned = re.sub(r"\s+", " ", cleaned)
    # Trim leading and trailing whitespace
    cleaned = cleaned.strip()
    return cleaned

user_text = "  Hello,   World! This is some $ text!   "
print(clean_input(user_text))
# Output: Hello World This is some text
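In Python 3, str patterns are Unicode-aware by default, so the \w in [^\w\s] also matches Chinese characters and only punctuation such as full-width commas is removed. A sketch with a made-up mixed-language string, plus the re.ASCII flag for comparison:

```python
import re

text = "你好，  世界！ Hello, world!"

# \w matches CJK ideographs in str patterns, so only punctuation is removed
cleaned = re.sub(r"[^\w\s]", "", text)
cleaned = re.sub(r"\s+", " ", cleaned).strip()
print(cleaned)  # 你好 世界 Hello world

# With re.ASCII, \w means [a-zA-Z0-9_], so the Chinese characters are stripped too
ascii_only = re.sub(r"[^\w\s]", "", text, flags=re.ASCII)
print(ascii_only.strip())  # Hello world
```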

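Scenario 2: Formatting phone numbers

def format_phone_number(phone):
    # Remove every non-digit character
    digits = re.sub(r"\D", "", phone)
    # Drop a leading country code 1 when exactly 10 digits follow it
    digits = re.sub(r"^1(?=\d{10}$)", "", digits)
    # Format exactly 10 digits as (123) 456-7890; other lengths come back as bare digits
    return re.sub(r"^(\d{3})(\d{3})(\d{4})$", r"(\1) \2-\3", digits)

print(format_phone_number("555-123-4567"))      # (555) 123-4567
print(format_phone_number("1 (800) 555-1234"))  # (800) 555-1234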
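Named groups can make a replacement template like the one above easier to read; in the replacement string, \g<name> refers to a named group. A sketch of the same 10-digit formatting:

```python
import re

# (?P<name>...) names a group; \g<name> inserts it in the replacement
pattern = re.compile(r"^(?P<area>\d{3})(?P<prefix>\d{3})(?P<line>\d{4})$")

formatted = pattern.sub(r"(\g<area>) \g<prefix>-\g<line>", "5551234567")
print(formatted)  # (555) 123-4567
```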

Scenario 3: Processing HTML text

html_text = "<p>  This   is <b>bold</b> text!  </p>"
# Strip HTML tags (non-greedy, so each match stops at the first >)
text_only = re.sub(r"<.*?>", "", html_text)
# Collapse extra whitespace and trim the ends
cleaned_text = re.sub(r"\s+", " ", text_only).strip()

print(cleaned_text)
# Output: This is bold text!
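Stripping tags leaves HTML entities such as &nbsp; and &amp; untouched. A sketch using the standard library's html.unescape() after tag removal (unescaping first would turn an escaped &lt;b&gt; in the text into a real-looking tag that then gets deleted). The decoded &nbsp; is U+00A0, which \s matches in str patterns, so the whitespace step still collapses it. A regex is not a full HTML parser, so treat this as suitable only for simple snippets:

```python
import html
import re

html_text = "<p>Fish&nbsp;&amp;&nbsp;chips   <b>today</b></p>"

text_only = re.sub(r"<.*?>", "", html_text)  # drop the tags
text_only = html.unescape(text_only)         # &amp; -> &, &nbsp; -> U+00A0
cleaned = re.sub(r"\s+", " ", text_only).strip()
print(cleaned)  # Fish & chips today
```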

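Best Practices and Caveats

1. Compile frequently used patterns

For patterns used repeatedly, compile them once with re.compile() and reuse the pattern object. The re module also caches recently used patterns internally, so the speedup is usually modest; the main gains are skipping that cache lookup and keeping the pattern defined in one place: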

space_pattern = re.compile(r"\s+")
result = space_pattern.sub(" ", text)
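Compiled pattern objects expose the same methods as the module-level functions. For example, subn() returns both the new string and the number of substitutions made, which is useful for logging how much cleanup happened. A minimal sketch:

```python
import re

space_pattern = re.compile(r"\s+")

# subn() returns a (new_string, number_of_substitutions) tuple
new_text, n = space_pattern.subn(" ", "a   b\t\tc")
print(new_text, n)  # a b c 2
```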

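2. Watch out for greedy matching

Greedy quantifiers such as .* match as much as possible; the non-greedy form .*? stops at the first point where the rest of the pattern can match:

# Greedy: .* runs from the first < to the last >, swallowing everything
re.sub(r"<.*>", "", "<div>content</div><p>more</p>")
# Output: ""

# Non-greedy: each match ends at the nearest >
re.sub(r"<.*?>", "", "<div>content</div><p>more</p>")
# Output: "contentmore"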
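Another common way to avoid over-matching is a negated character class. Because [^>] can never match ">", each match stops at the end of one tag even though the quantifier is greedy. A sketch on the same input:

```python
import re

markup = "<div>content</div><p>more</p>"

# [^>]* cannot run past a ">", so each match covers exactly one tag
result = re.sub(r"<[^>]*>", "", markup)
print(result)  # contentmore
```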

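3. Escaping special characters

When the search text contains regex metacharacters (such as . * + ? ( )), escape it with re.escape(). Unescaped, the . in "file.txt" would match any character, so "fileXtxt" would be replaced as well: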

search_term = "file.txt"
safe_pattern = re.escape(search_term)
result = re.sub(safe_pattern, "document.txt", "Find file.txt here")
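re.escape() is meant for patterns, not replacements. In a string replacement, backslash escapes are processed, so a Windows path like C:\temp gets a tab where \t was. A sketch of the issue with made-up values, and of returning the text from a function, which inserts it literally (doubling each backslash is another option):

```python
import re

template = "path=PATH"
new_path = r"C:\temp"

# As a replacement string, "\t" is interpreted as a tab character
as_template = re.sub(r"PATH", new_path, template)
print("\t" in as_template)  # True

# Text returned from a function is inserted literally, backslashes included
literal = re.sub(r"PATH", lambda m: new_path, template)
print(literal)  # path=C:\temp
```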

4. Performance considerations

For plain literal replacements, str.replace() is simpler and typically faster than a regex:

# Simple literal replacement: use the string method
text.replace("old", "new")

# Complex patterns: use a regular expression
re.sub(r"\s+", " ", text)
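To check the performance claim on your own data, a rough timeit sketch like the following can help. The sample text is made up, and the timings depend on the machine and Python version, so no specific numbers are implied:

```python
import re
import timeit

text = "old value, old habits, old news " * 100
pattern = re.compile(r"old")

# For a literal pattern, both approaches produce the same string
assert text.replace("old", "new") == pattern.sub("new", text)

# Rough timing only: 1000 calls each
t_str = timeit.timeit(lambda: text.replace("old", "new"), number=1000)
t_re = timeit.timeit(lambda: pattern.sub("new", text), number=1000)
print(f"str.replace: {t_str:.4f}s  re.sub: {t_re:.4f}s")
```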
