

The Complete Guide to Python Download Modules: Downloading Files Efficiently with requests and urllib

Author: Python Technical Expert | Last updated: October 15, 2023

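Why use a dedicated download module?

File downloads are a common requirement in day-to-day development, but they raise several concerns: network errors, large files, progress reporting, and performance. Python offers several ways to download files; this tutorial covers the two most practical ones, requests and urllib.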

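Contents

  • Downloading files with the requests library
  • Basic downloads with the urllib module
  • Adding a download progress bar
  • Chunked downloads for large files
  • Handling download exceptions and errors
  • Setting request headers and parameters
  • Speeding up downloads with concurrency
  • Hands-on: an image downloader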

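1. Downloading files with the requests library

requests is one of the most widely used HTTP libraries for Python and is easy to install: pip install requests

Basic download example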

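import requests

def download_file(url, save_path):
    # A timeout keeps the call from hanging on an unresponsive server
    response = requests.get(url, timeout=10)
    if response.status_code == 200:
        # response.content holds the whole body in memory; fine for small files
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f"File saved to: {save_path}")
    else:
        print(f"Download failed, status code: {response.status_code}")

# Usage example
download_file('https://example.com/image.jpg', 'downloaded_image.jpg')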

Streaming large file downloads

def download_large_file(url, save_path):
    # stream=True defers reading the body so it can be written chunk by chunk
    with requests.get(url, stream=True, timeout=10) as r:
        r.raise_for_status()
        with open(save_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    print(f"Large file download complete: {save_path}")
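When streaming a large file, it can be handy to compute a checksum in the same pass, so the result can be compared against a hash published by the file's provider. The sketch below assumes a hypothetical helper, save_chunks_with_sha256, that accepts any iterable of byte chunks (for example r.iter_content(chunk_size=8192) from the function above). It is an illustration, not part of the requests API.

```python
import hashlib


def save_chunks_with_sha256(chunks, save_path):
    """Write byte chunks to save_path and return (bytes_written, sha256_hex).

    `chunks` is any iterable of bytes objects. This helper is hypothetical;
    with requests you would pass r.iter_content(chunk_size=8192).
    """
    digest = hashlib.sha256()
    written = 0
    with open(save_path, "wb") as f:
        for chunk in chunks:
            if not chunk:  # skip empty chunks
                continue
            f.write(chunk)
            digest.update(chunk)  # hash exactly the bytes that were written
            written += len(chunk)
    return written, digest.hexdigest()
```

Because the helper only needs an iterable of bytes, it can be tried without any network access by passing a plain list of byte strings.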

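2. Downloading files with the urllib module

urllib ships with the Python standard library, so it needs no extra installation and suits basic download needs.

Basic download method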

from urllib import request

def download_with_urllib(url, save_path):
    try:
        # urlretrieve fetches the URL and writes it straight to save_path
        request.urlretrieve(url, save_path)
        print(f"Download succeeded: {save_path}")
    except Exception as e:
        print(f"Download failed: {e}")

# Usage example
download_with_urllib('https://example.com/document.pdf', 'downloaded_document.pdf')
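The Python documentation lists urlretrieve under a legacy interface that may be deprecated in the future. An alternative that stays within the standard library is to combine urlopen with shutil.copyfileobj. The function name download_with_urlopen and the 10-second default timeout below are assumptions made for this sketch.

```python
import os
import shutil
from urllib.request import urlopen


def download_with_urlopen(url, save_path, timeout=10):
    """Stream the response body to save_path and return the file size in bytes."""
    # urlopen raises URLError/HTTPError on failure; the caller decides how to handle it.
    with urlopen(url, timeout=timeout) as response, open(save_path, "wb") as f:
        shutil.copyfileobj(response, f)  # copies in chunks, not all at once
    return os.path.getsize(save_path)
```

Since urlopen also accepts file:// URLs, the function can be exercised locally against a file on disk before pointing it at a real server.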

Adding progress display

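def progress_callback(count, block_size, total_size):
    # total_size is not positive when the server does not report a size
    if total_size <= 0:
        print(f"Downloaded: {count * block_size} bytes", end='\r')
        return
    # The last block can overshoot the real size, so cap the value at 100
    percent = min(100, int(count * block_size * 100 / total_size))
    print(f"Download progress: {percent}%", end='\r')

def download_with_progress(url, save_path):
    request.urlretrieve(url, save_path, reporthook=progress_callback)
    print("\nDownload complete!")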
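A progress line is often easier to read when byte counts are shown in human-readable units alongside the percentage. The sketch below is one possible formatter; human_size and format_progress are hypothetical helper names, and the choice of binary units (KiB, MiB, GiB) is an assumption.

```python
def human_size(num_bytes):
    """Format a byte count with binary units, capped at GiB."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def format_progress(downloaded, total_size):
    """Return a progress string; omit the percentage when the total is unknown."""
    if total_size > 0:
        percent = min(100, downloaded * 100 // total_size)
        return f"{human_size(downloaded)} / {human_size(total_size)} ({percent}%)"
    return f"{human_size(downloaded)} downloaded"


def progress_hook(count, block_size, total_size):
    # Same three-argument shape as urlretrieve's reporthook callback.
    print(format_progress(count * block_size, total_size), end="\r")
```

progress_hook can be passed as reporthook to urlretrieve, while format_progress can be reused in any loop that tracks a downloaded byte count.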

3. Advanced download techniques

Custom request headers

import requests

url = 'https://example.com/image.jpg'  # placeholder URL
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Referer': 'https://example.com/'
}
response = requests.get(url, headers=headers, timeout=10)
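The same headers can be attached with the standard library by building a urllib.request.Request object. The sketch below only constructs the request and does not send it; the URL is a placeholder. One detail to be aware of is that urllib stores header names in capitalized form, so 'User-Agent' is looked up as 'User-agent'.

```python
from urllib.request import Request

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://example.com/",
}

# Building the Request does not touch the network; pass it to urlopen() to send it.
req = Request("https://example.com/image.jpg", headers=headers)

# urllib normalizes header names with str.capitalize(), hence "User-agent".
user_agent = req.get_header("User-agent")
```

To actually fetch the resource, pass req to urllib.request.urlopen in place of a plain URL string.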

Retrying on errors

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry  # import from urllib3 directly, not requests.packages

session = requests.Session()
# Retry up to 3 times, backing off between attempts
retries = Retry(total=3, backoff_factor=0.1)
session.mount('http://', HTTPAdapter(max_retries=retries))
session.mount('https://', HTTPAdapter(max_retries=retries))
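To make the idea behind retries with backoff concrete, here is a minimal hand-written sketch that works with any callable. It is not how urllib3's Retry is implemented, and its delay schedule is not guaranteed to match Retry's. The name retry_call and its parameters are hypothetical.

```python
import time


def retry_call(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` calls have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    `sleep` is injectable so the delays can be observed without waiting.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:  # in real code, catch only the errors worth retrying
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * (2 ** attempt))
```

A call such as retry_call(lambda: download_large_file(url, path)) would wrap one of the earlier functions, provided that function raises on failure rather than printing the error.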

Speeding up downloads with multiple threads

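import threading
import requests

def download_chunk(url, start, end, filename):
    # Request one inclusive byte range of the file
    headers = {'Range': f'bytes={start}-{end}'}
    response = requests.get(url, headers=headers, timeout=30)
    # 206 Partial Content confirms the server honored the Range header
    if response.status_code != 206:
        raise RuntimeError(f"Range request not honored: {response.status_code}")
    with open(filename, "r+b") as f:
        f.seek(start)
        f.write(response.content)

def parallel_download(url, num_threads=4):
    # requests.head does not follow redirects unless asked to
    response = requests.head(url, allow_redirects=True, timeout=10)
    response.raise_for_status()
    file_size = int(response.headers.get('content-length', 0))
    if file_size <= 0:
        raise ValueError("Server did not report a Content-Length")
    chunk_size = file_size // num_threads
    
    # Pre-allocate the output file so each thread can write at its own offset
    with open("downloaded_file", "wb") as f:
        f.truncate(file_size)
    
    threads = []
    for i in range(num_threads):
        start = i * chunk_size
        # The last thread also takes the remainder
        end = start + chunk_size - 1 if i < num_threads - 1 else file_size - 1
        thread = threading.Thread(
            target=download_chunk, 
            args=(url, start, end, "downloaded_file")
        )
        thread.start()
        threads.append(thread)
    
    # Note: an exception raised inside a worker thread is not re-raised here
    for thread in threads:
        thread.join()
        
    print("Multithreaded download finished!")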
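The range arithmetic in parallel_download is the part most prone to off-by-one mistakes, so it can help to isolate it in a small pure function that is easy to check by hand. split_ranges below is a hypothetical helper that mirrors the same logic (equal chunks, with the last range absorbing the remainder) and adds a guard for files smaller than the number of parts.

```python
def split_ranges(file_size, parts):
    """Return inclusive (start, end) byte ranges covering 0 .. file_size - 1."""
    if file_size <= 0 or parts <= 0:
        return []
    parts = min(parts, file_size)  # never create empty ranges
    chunk = file_size // parts
    ranges = []
    for i in range(parts):
        start = i * chunk
        # The last range runs to the final byte and absorbs the remainder.
        end = start + chunk - 1 if i < parts - 1 else file_size - 1
        ranges.append((start, end))
    return ranges


# Range header values for a 10-byte file split three ways.
range_headers = [f"bytes={s}-{e}" for s, e in split_ranges(10, 3)]
```

For a 10-byte file split into three parts this yields bytes=0-2, bytes=3-5 and bytes=6-9; HTTP byte ranges are inclusive at both ends, which is why each end is start + chunk - 1.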

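4. Hands-on: an image downloader

Putting the pieces together, here is an image downloader that handles folders, filenames, progress, and errors: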

import os
import time  # needed for the timestamped fallback filename
from urllib.parse import urlparse

import requests

def download_image(url, folder="images"):
    # Create the target folder if it does not exist yet
    os.makedirs(folder, exist_ok=True)
    
    try:
        response = requests.get(url, stream=True, timeout=10)
        response.raise_for_status()
        
        # Derive the filename from the URL path
        parsed = urlparse(url)
        filename = os.path.basename(parsed.path)
        if not filename:
            # Fall back to a timestamped name when the URL has no filename
            filename = f"image_{int(time.time())}.jpg"
        
        save_path = os.path.join(folder, filename)
        
        # Download and report progress
        file_size = int(response.headers.get('content-length', 0))
        downloaded = 0
        
        with open(save_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                downloaded += len(chunk)
                f.write(chunk)
                progress = int(downloaded * 100 / file_size) if file_size > 0 else 0
                print(f"Download progress: {progress}%", end='\r')
        
        print(f"\nImage saved to: {save_path}")
        return True
    
    except Exception as e:
        print(f"Download failed: {e}")
        return False

# Usage example
download_image("https://example.com/sample.jpg")

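Features:

  • Creates the save directory automatically
  • Extracts the filename from the URL
  • Shows progress in real time
  • Handles exceptions and sets a timeout
  • Streams the download to save memory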
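The filename step in download_image can be pulled out into a small helper that also decodes percent-escapes and ignores the query string. filename_from_url and its default value are hypothetical names chosen for this sketch; it illustrates the idea and is not a complete sanitizer for untrusted URLs.

```python
import os
from urllib.parse import unquote, urlparse


def filename_from_url(url, default="download.bin"):
    """Return the base filename from the URL path, or `default` if there is none."""
    # urlparse separates the path from the query string; unquote decodes %20 etc.
    path = unquote(urlparse(url).path)
    name = os.path.basename(path)
    # Reject names that are empty or only refer to a directory.
    if name in ("", ".", ".."):
        return default
    return name
```

For example, a URL ending in /a/sample.jpg?size=large gives sample.jpg, while a URL with an empty path falls back to the default name.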

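Key takeaways for downloading files in Python

This tutorial covered the main techniques for downloading files in Python:

  • Using the requests library
  • Using the urllib standard library
  • Implementing progress display
  • Handling large files
  • Retrying on errors
  • Speeding up downloads with multiple threads

Applied in real projects, these techniques make it possible to build efficient and reliable file download features.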
