Python Download Module Tutorial: A Practical Guide to requests and urllib - Python File Download Tips
- 2025-08-14
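The Complete Guide to Python Download Modules: Downloading Files Efficiently
Author: Python Technical Expert
Last updated: October 15, 2023
Why use a dedicated download module?
File downloads are a common requirement in day-to-day development, but they involve several concerns: network errors, large files, progress reporting, and performance. Python offers several ways to download files. This tutorial covers the two most practical ones in depth: requests and urllib.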
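Contents
- Downloading files with the requests library
- Basic downloads with the urllib module
- Adding a download progress bar
- Downloading large files in chunks
- Handling download exceptions and errors
- Setting request headers and parameters
- Optimizing with concurrent downloads
- Hands-on: an image downloader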
1. Downloading files with the requests library
requests is one of the most popular HTTP libraries for Python and is easy to install: pip install requests
Basic download example
import requests


def download_file(url, save_path):
    # response.content holds the whole body in memory, so this suits small files
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f"File saved to: {save_path}")
    else:
        print(f"Download failed, status code: {response.status_code}")


# Usage example
download_file('https://example.com/image.jpg', 'downloaded_image.jpg')
Streaming large files
def download_large_file(url, save_path):
    # stream=True defers the body so it can be written chunk by chunk
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(save_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Large file downloaded: {save_path}")
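The streaming loop above can be separated from the network call, which makes it easier to test without a server. The sketch below is one possible way to do that. `write_chunks` is a hypothetical helper (it is not part of requests) that writes any iterable of byte chunks to an open binary file and returns the number of bytes written.

```python
def write_chunks(chunks, fileobj):
    """Write an iterable of byte chunks to fileobj; return total bytes written.

    Hypothetical helper for illustration. Only one chunk is held in memory
    at a time, which is the point of streaming a large download.
    """
    total = 0
    for chunk in chunks:
        if not chunk:  # skip empty chunks
            continue
        fileobj.write(chunk)
        total += len(chunk)
    return total


# Possible usage with requests (not executed here):
# with requests.get(url, stream=True, timeout=10) as r, open(save_path, 'wb') as f:
#     r.raise_for_status()
#     written = write_chunks(r.iter_content(chunk_size=8192), f)
```

Returning the byte count gives the caller something to log or to compare against the expected size, if the server reported one.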
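2. Downloading files with the urllib module
urllib is part of the Python standard library, so it needs no extra installation and suits basic download needs.
Basic download method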
from urllib import request


def download_with_urllib(url, save_path):
    try:
        request.urlretrieve(url, save_path)
        print(f"Download succeeded: {save_path}")
    except Exception as e:
        print(f"Download failed: {e}")


# Usage example
download_with_urllib('https://example.com/document.pdf', 'downloaded_document.pdf')
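Catching a bare Exception, as the example above does, hides the difference between a server error and a failure to fetch the resource at all. A minimal sketch of finer-grained handling follows, assuming it is acceptable to return a status tuple instead of printing. `download_checked` is a hypothetical name. `HTTPError` is a subclass of `URLError`, so it has to be caught first.

```python
import shutil
import urllib.error
import urllib.request


def download_checked(url, save_path, timeout=10):
    """Download url to save_path; return (ok, message) instead of raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            with open(save_path, "wb") as f:
                # Copies the response in blocks rather than reading it all at once
                shutil.copyfileobj(resp, f)
    except urllib.error.HTTPError as e:
        # The server answered, but with an error status such as 404 or 500
        return False, f"HTTP error {e.code}: {e.reason}"
    except urllib.error.URLError as e:
        # No usable response: DNS failure, refused connection, missing file, ...
        return False, f"could not fetch: {e.reason}"
    except (OSError, ValueError) as e:
        # Timeouts, local write errors, or a malformed URL
        return False, f"download failed: {e}"
    return True, f"saved to {save_path}"
```

The output file is only opened after the request succeeds, so a failed request does not leave an empty file behind.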
Adding progress display
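def progress_callback(count, block_size, total_size):
    # total_size is -1 (or 0) when the server does not report a length
    if total_size <= 0:
        print(f"Downloaded: {count * block_size} bytes", end='\r')
        return
    # The last block is usually partial, so clamp the value at 100
    percent = min(100, int(count * block_size * 100 / total_size))
    print(f"Download progress: {percent}%", end='\r')


def download_with_progress(url, save_path):
    request.urlretrieve(url, save_path, reporthook=progress_callback)
    print("\nDownload complete!")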
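A percentage can also be drawn as a text progress bar. The sketch below is one way to do it, with the formatting kept in a pure function so it can be checked without a network. `format_progress` and `bar_reporthook` are hypothetical names, and the bar width of 20 characters is an arbitrary choice. When the total size is unknown (urlretrieve reports it as -1), the function falls back to a plain byte count.

```python
def format_progress(downloaded, total_size, width=20):
    """Return a one-line text progress bar, or a byte count if size is unknown."""
    if total_size <= 0:
        # No Content-Length was reported, so a percentage cannot be computed
        return f"{downloaded} bytes"
    # Clamp at 1.0 because count * block_size can overshoot on the last block
    fraction = min(1.0, downloaded / total_size)
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {int(fraction * 100)}%"


def bar_reporthook(count, block_size, total_size):
    """Adapter with the reporthook signature expected by urlretrieve."""
    print(format_progress(count * block_size, total_size), end="\r")
```

`bar_reporthook` can be passed as `reporthook=bar_reporthook` to `request.urlretrieve` in place of a percentage-only callback.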
3. Advanced download techniques
Custom request headers
import requests

url = 'https://example.com/image.jpg'  # placeholder URL
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Referer': 'https://example.com/'
}
response = requests.get(url, headers=headers, timeout=10)
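Query parameters are set in a similar way. With requests they can be passed through the `params` argument of `requests.get`. With the standard library they have to be encoded into the URL by hand. The following is a minimal sketch of the urllib counterpart. The endpoint, the parameter names, and the `build_request` helper are assumptions made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params, headers):
    """Build a urllib Request with a query string and custom headers."""
    query = urlencode(params)
    full_url = f"{base_url}?{query}" if query else base_url
    return Request(full_url, headers=headers)


# Hypothetical endpoint and parameter names, for illustration only
req = build_request(
    "https://example.com/download",
    {"file": "report.pdf", "version": 2},
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Referer": "https://example.com/",
    },
)
print(req.full_url)
# To send it: urllib.request.urlopen(req, timeout=10)
```

Building the request object separately from sending it makes it possible to inspect the final URL and headers before any network traffic happens.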
Retrying on errors
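import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
# Retry up to 3 times, with a growing delay between attempts
retries = Retry(total=3, backoff_factor=0.1)
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))

# Use the session (not the requests module) so the retry policy applies:
# response = session.get(url, timeout=10)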
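The Retry adapter only applies to requests sessions. For downloads made with urllib, a similar effect can be approximated with a hand-written loop. The sketch below is exactly that: a plain retry loop with exponential backoff, not the urllib3 Retry mechanism. `retry_call` is a hypothetical helper, and the default delay values are arbitrary.

```python
import time


def retry_call(func, attempts=3, base_delay=0.1, exceptions=(OSError,)):
    """Call func() until it succeeds or the attempts run out.

    Waits base_delay, then 2x, then 4x, ... between attempts and
    re-raises the last exception if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the error
            time.sleep(base_delay * (2 ** attempt))


# Possible usage (not executed here); URLError is a subclass of OSError:
# retry_call(lambda: urllib.request.urlretrieve(url, save_path))
```

Only exceptions listed in `exceptions` trigger a retry, so programming errors such as a TypeError still surface immediately.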
Speeding up downloads with multiple threads
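import threading

import requests


def download_chunk(url, start, end, filename):
    # Ask the server for one byte range and write it at the matching offset
    headers = {'Range': f'bytes={start}-{end}'}
    response = requests.get(url, headers=headers, timeout=30)
    if response.status_code != 206:
        raise RuntimeError(f"Server ignored the Range request: {response.status_code}")
    with open(filename, "r+b") as f:
        f.seek(start)
        f.write(response.content)


def parallel_download(url, num_threads=4):
    # requests.head does not follow redirects unless asked to
    response = requests.head(url, allow_redirects=True, timeout=10)
    file_size = int(response.headers.get('content-length', 0))
    if file_size <= 0 or response.headers.get('accept-ranges') != 'bytes':
        raise RuntimeError("Server does not report a size or byte-range support")
    chunk_size = file_size // num_threads
    # Pre-allocate the target file so each thread can seek into it
    with open("downloaded_file", "wb") as f:
        f.truncate(file_size)
    threads = []
    for i in range(num_threads):
        start = i * chunk_size
        end = start + chunk_size - 1 if i < num_threads - 1 else file_size - 1
        thread = threading.Thread(
            target=download_chunk,
            args=(url, start, end, "downloaded_file")
        )
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()
    print("Multithreaded download complete!")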
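The range arithmetic in the code above is easy to get wrong by one byte, because HTTP byte ranges are inclusive at both ends. The sketch below isolates that calculation in a small function that can be checked on its own. `split_ranges` is a hypothetical helper. It follows the same scheme as above, with the last range absorbing the remainder of the division.

```python
def split_ranges(file_size, num_parts):
    """Return inclusive (start, end) byte ranges that cover file_size bytes."""
    if file_size <= 0 or num_parts <= 0:
        return []
    num_parts = min(num_parts, file_size)  # never create empty ranges
    chunk_size = file_size // num_parts
    ranges = []
    for i in range(num_parts):
        start = i * chunk_size
        # The last range absorbs the remainder of the division
        end = start + chunk_size - 1 if i < num_parts - 1 else file_size - 1
        ranges.append((start, end))
    return ranges


for start, end in split_ranges(10, 4):
    print(f"Range: bytes={start}-{end}")
```

For a 10-byte file split four ways, the ranges are 0-1, 2-3, 4-5 and 6-9: together they cover every byte exactly once, and the final range is longer than the others.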
4. Hands-on: an image downloader
Putting the pieces together, we can build a full-featured image downloader:
import os
import time
from urllib.parse import urlparse

import requests


def download_image(url, folder="images"):
    os.makedirs(folder, exist_ok=True)
    try:
        response = requests.get(url, stream=True, timeout=10)
        response.raise_for_status()
        # Derive the filename from the URL path
        parsed = urlparse(url)
        filename = os.path.basename(parsed.path)
        if not filename:
            filename = f"image_{int(time.time())}.jpg"
        save_path = os.path.join(folder, filename)
        # Download and show progress
        file_size = int(response.headers.get('content-length', 0))
        downloaded = 0
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                downloaded += len(chunk)
                f.write(chunk)
                progress = int(downloaded * 100 / file_size) if file_size > 0 else 0
                print(f"Download progress: {progress}%", end='\r')
        print(f"\nImage saved to: {save_path}")
        return True
    except Exception as e:
        print(f"Download failed: {e}")
        return False


# Usage example
download_image("https://example.com/sample.jpg")
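Features:
- Creates the target directory automatically
- Derives the filename from the URL path, with a timestamp-based fallback
- Shows progress in real time
- Handles exceptions and sets a timeout
- Streams the download to save memory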
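Filename extraction is the part of the downloader most sensitive to unusual URLs, such as percent-encoded names, query strings, or a path that ends in a slash. The sketch below shows one possible standalone version of that step. `filename_from_url` is a hypothetical helper, and the fallback name follows the same timestamp pattern used in the downloader above.

```python
import posixpath
import time
from urllib.parse import unquote, urlparse


def filename_from_url(url, default=None):
    """Derive a local filename from the path component of a URL."""
    # urlparse drops the query string; unquote turns %20 back into a space
    path = unquote(urlparse(url).path)
    name = posixpath.basename(path)
    if not name or name in (".", ".."):
        # No usable name in the URL: fall back to a timestamped default
        name = default or f"image_{int(time.time())}.jpg"
    return name
```

Decoding before taking the basename means an encoded slash in the URL cannot smuggle a directory component into the saved filename.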
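Master the core techniques of downloading files in Python
This tutorial covered the key techniques for downloading files in Python:
- Using the requests library
- The urllib standard library
- Implementing a progress display
- Handling large files
- Retrying on errors
- Speeding up downloads with multiple threads
Applied to real projects, these techniques let you build efficient and reliable file download features.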
This article was published by XiangHui on 2025-08-14 on 吾爱品聚. If you have any questions, please contact us.
Article link: https://521pj.cn/20258135.html