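Lab Objectives

This lab will help you understand the transferability of adversarial examples and let you try launching an attack in a black-box setting by using a surrogate model.

Learning Objectives

After completing this lab, you will be able to:

Understand the concept of adversarial transferability and why it occurs
Generate adversarial examples with one model and test their effect on other models
Observe transfer success rates between different model architectures
Apply an ensemble attack to improve transferability
Recognize the real-world threat posed by black-box attacks

Prerequisites

Environment Requirements

Python 3.8+
PyTorch 1.10+
torchvision (several pretrained models are required)
matplotlib
numpy

It is recommended that you complete Labs 3.1 and 3.2 before this lab.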

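Lab Content

Lab 3.3: Black-Box Transfer Attack

Objectives

- Understand the principle behind adversarial transferability
- Generate adversarial examples on a surrogate model and test their effect on target models
- Observe transfer success rates between different models

Environment

- Python 3.8+
- PyTorch
- torchvision (several pretrained models)

Estimated time: 30 minutes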

---

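Core Concept Review

Transfer attack: use a white-box method to generate adversarial examples on a local surrogate model, then test whether they also fool an unknown target model whose architecture, weights, and gradients the attacker cannot access.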
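To see why transfer can work at all, here is a minimal NumPy sketch that uses hypothetical hand-picked linear classifiers in place of ResNet, VGG, and DenseNet. For a linear score w·x the input gradient is simply w. A one-step sign attack crafted on the surrogate therefore changes another model's score by -eps * (w_target · sign(w_surrogate)), ignoring clipping, and it transfers when the two weight vectors point in similar directions. The weights, epsilon, and function names are illustrative assumptions and are not part of the lab.

```python
import numpy as np

# Toy binary linear classifiers: score = w . x, predict class 1 if score > 0.
# The weight vectors are hand-picked for illustration (hypothetical, not trained).
w_surrogate = np.array([1.0, 2.0, -1.0, 0.5])
w_similar = np.array([0.8, 1.5, -1.2, 0.7])    # roughly aligned with the surrogate
w_different = np.array([0.5, -1.0, 0.5, 1.5])  # poorly aligned with the surrogate

def predict_linear(w, x):
    """Return class 1 if the linear score is positive, else class 0."""
    return int(w @ x > 0)

def fgsm_linear(w, x, eps):
    """One-step L-inf attack that pushes the class-1 score down.

    For score = w . x the gradient w.r.t. x is w, so the
    score-decreasing step is -eps * sign(w), clipped to [0, 1].
    """
    return np.clip(x - eps * np.sign(w), 0.0, 1.0)

def cosine(a, b):
    """Cosine similarity between two weight vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

x = np.full(4, 0.5)
x_adv = fgsm_linear(w_surrogate, x, eps=0.3)  # white-box on the surrogate only

for name, w in [("surrogate", w_surrogate), ("similar", w_similar), ("different", w_different)]:
    print(f"{name:<10} clean={predict_linear(w, x)} adv={predict_linear(w, x_adv)} "
          f"cos(w, w_surrogate)={cosine(w, w_surrogate):+.2f}")
```

In this toy setup the attack flips both the surrogate and the aligned model. The poorly aligned model keeps its prediction, which mirrors the intuition that transfer is easier between similar models.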

Part 1: Setup

In [ ]:

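# Import the required libraries
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
from torchvision import models, transforms

# Fonts that can render CJK characters (only needed if plot labels contain Chinese text)
plt.rcParams['font.sans-serif'] = ['SimHei', 'Arial Unicode MS']
plt.rcParams['axes.unicode_minus'] = False

# ImageNet normalization parameters (applied inside the prediction/attack code, so images stay in [0, 1])
normalize = transforms.Normalize(
    mean=[0.485, 0.456, 0.406],
    std=[0.229, 0.224, 0.225]
)

print("Setup complete!")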
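The normalization is applied inside the code that calls the model, so every epsilon in this lab is measured in [0, 1] pixel space rather than in normalized space. The following small sketch shows what that means numerically. It assumes the std values above, and the helper names are hypothetical.

```python
# Per-channel std copied from the normalize() call above.
STD = (0.229, 0.224, 0.225)

def to_8bit_units(eps):
    """Express a [0, 1]-scale budget in 0..255 pixel levels."""
    return eps * 255

def normalized_shift(eps, std):
    """Size of a pixel-space shift eps after the transform (x - mean) / std."""
    # The mean cancels: ((x + eps) - mean) / std - (x - mean) / std = eps / std
    return eps / std

epsilon = 0.03
print(f"epsilon = {epsilon} is about {to_8bit_units(epsilon):.2f} / 255 pixel levels")
for channel, s in enumerate(STD):
    print(f"channel {channel}: shift in normalized space = {normalized_shift(epsilon, s):.4f}")
```

So epsilon = 0.03 corresponds to roughly 7 to 8 gray levels per pixel, and each channel's normalized input moves by about 0.13.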

In [ ]:

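# Load several pretrained models
# One model serves as the "surrogate"; the others serve as "targets"
# Note: on newer torchvision releases `pretrained=True` is deprecated in favor of the `weights=` argument and may print a warning

print("Loading models (this may take a while)...")

# Surrogate model: used to generate the adversarial examples
surrogate_model = models.resnet18(pretrained=True)
surrogate_model.eval()
print("✓ Surrogate model (ResNet18) loaded")

# Target models: we test whether the adversarial examples transfer to them
target_models = {
    'ResNet34': models.resnet34(pretrained=True),
    'VGG16': models.vgg16(pretrained=True),
    'DenseNet121': models.densenet121(pretrained=True),
}

for name, model in target_models.items():
    model.eval()
    print(f"✓ Target model ({name}) loaded")

print("\nAll models loaded!")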

In [ ]:

# Create a synthetic test image: a dark disc inside a light ring on a noisy gray background
def create_test_image():
    np.random.seed(42)
    img = np.random.rand(224, 224, 3) * 0.3 + 0.35
    center_x, center_y = 112, 112
    # Distance of every pixel (row i, column j) from the center, computed in one vectorized step
    ii, jj = np.ogrid[:224, :224]
    dist = np.sqrt((ii - center_x)**2 + (jj - center_y)**2)
    img[dist < 60] = [0.1, 0.1, 0.1]
    img[(dist >= 60) & (dist < 80)] = [0.9, 0.9, 0.9]
    return torch.tensor(img, dtype=torch.float32).permute(2, 0, 1)

def predict(model, img_tensor):
    """Return (predicted class index, softmax confidence) for a 3xHxW image in [0, 1]."""
    input_tensor = normalize(img_tensor).unsqueeze(0)
    with torch.no_grad():
        output = model(input_tensor)
    probs = torch.softmax(output, dim=1)
    pred_class = output.argmax(dim=1).item()
    confidence = probs[0, pred_class].item()
    return pred_class, confidence

# Create the test image
# It is synthetic, so there is no ground-truth label: each model's own prediction
# on the clean image serves as its reference label
original_image = create_test_image()

# Predictions of every model on the original image
print("Predictions on the original image:")
print("-" * 40)
surrogate_pred, surrogate_conf = predict(surrogate_model, original_image)
print(f"Surrogate (ResNet18): class {surrogate_pred}, confidence {surrogate_conf:.2%}")

for name, model in target_models.items():
    pred, conf = predict(model, original_image)
    print(f"Target ({name}): class {pred}, confidence {conf:.2%}")

Part 2: Generating Adversarial Examples on the Surrogate Model

In [ ]:

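# PGD attack function
def pgd_attack(model, image, label, epsilon, alpha, num_steps):
    """Run an untargeted L-infinity PGD attack on `model`.

    image: 3xHxW tensor in [0, 1] (not normalized); label: class index to move away from.
    Returns the adversarial image as a 3xHxW tensor in [0, 1].
    """
    x_orig = image.unsqueeze(0)
    target = torch.tensor([label])
    loss_fn = nn.CrossEntropyLoss()
    adv_image = x_orig.clone()
    
    for step in range(num_steps):
        adv_image.requires_grad_(True)
        # Normalize inside the loop so that epsilon is measured in [0, 1] pixel space
        normalized = normalize(adv_image.squeeze(0)).unsqueeze(0)
        output = model(normalized)
        loss = loss_fn(output, target)
        model.zero_grad()
        loss.backward()
        
        # Gradient ascent on the loss along the sign of the input gradient
        gradient = adv_image.grad.detach()
        adv_image = adv_image.detach() + alpha * gradient.sign()
        
        # Project back onto the epsilon-ball around the original image, then onto [0, 1]
        perturbation = torch.clamp(adv_image - x_orig, -epsilon, epsilon)
        adv_image = torch.clamp(x_orig + perturbation, 0, 1)
    
    return adv_image.squeeze(0).detach()

print("PGD attack function defined!")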
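The two clamp calls at the end of each PGD iteration implement its projection step. The sketch below isolates one iteration in NumPy so that it can be checked by hand. `project_linf` and `pgd_step` are hypothetical helper names, and the gradient is passed in directly instead of being computed by a model.

```python
import numpy as np

def project_linf(x_adv, x_orig, eps):
    """Project x_adv onto the L-inf ball of radius eps around x_orig,
    then onto the valid pixel range [0, 1] (same order as pgd_attack)."""
    delta = np.clip(x_adv - x_orig, -eps, eps)
    return np.clip(x_orig + delta, 0.0, 1.0)

def pgd_step(x_adv, x_orig, grad, eps, alpha):
    """One untargeted PGD update: step along sign(grad), then project."""
    return project_linf(x_adv + alpha * np.sign(grad), x_orig, eps)

# Hand-checkable example: alpha is deliberately larger than eps,
# so the projection has to pull every coordinate back.
x_orig = np.array([0.0, 0.5, 0.99])
grad = np.array([-1.0, 2.0, 3.0])
print(pgd_step(x_orig.copy(), x_orig, grad, eps=0.03, alpha=0.05))
```

Each coordinate is clipped independently. The second clip only moves a value toward the interval that already contains x_orig, so the result stays within epsilon of the original image and inside [0, 1] at the same time.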

In [ ]:

# [Exercise 1] Generate an adversarial example on the surrogate model
# Hint: call pgd_attack with surrogate_model as the model under attack
# Reference answer: adversarial_image = pgd_attack(surrogate_model, original_image, surrogate_pred, epsilon=0.03, alpha=0.01, num_steps=20)

epsilon = 0.03
alpha = 0.01
num_steps = 20

# Generate the adversarial example on the surrogate model
adversarial_image = ___________________

# Check the attack on the surrogate model itself (white-box)
adv_pred, adv_conf = predict(surrogate_model, adversarial_image)
print("Attack result on the surrogate model:")
print(f"  Original:      class {surrogate_pred}, confidence {surrogate_conf:.2%}")
print(f"  After attack:  class {adv_pred}, confidence {adv_conf:.2%}")
print(f"  Attack {'succeeded ✓' if adv_pred != surrogate_pred else 'failed ✗'}")

Part 3: Testing Transferability

In [ ]:

# [Exercise 2] Test how well the adversarial example transfers to the target models
# Hint: feed the adversarial example generated on the surrogate into each target model and check whether the attack also succeeds

print("Transfer attack results:")
print("=" * 60)
print(f"{'Model':<15} {'Clean pred':<12} {'Adv pred':<12} {'Transfer'}")
print("-" * 60)

transfer_results = {}

for name, model in target_models.items():
    # Prediction on the original image
    orig_pred, orig_conf = predict(model, original_image)
    
    # [Exercise 2] Prediction on the adversarial example
    # Reference answer: adv_pred, adv_conf = predict(model, adversarial_image)
    adv_pred, adv_conf = ___________________
    
    # Transfer counts as successful if the predicted class changes (untargeted criterion)
    success = orig_pred != adv_pred
    transfer_results[name] = success
    
    status = "✓ success" if success else "✗ failed"
    print(f"{name:<15} class {orig_pred:<6} class {adv_pred:<6} {status}")

# Overall transfer success rate
success_count = sum(transfer_results.values())
total_count = len(transfer_results)
print("-" * 60)
print(f"Transfer success rate: {success_count}/{total_count} ({success_count/total_count*100:.1f}%)")
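With a single test image, each target model contributes either 0% or 100%, so the overall rate above can only take four values. The sketch below computes a per-model rate averaged over many images. It assumes you have collected one target model's clean and adversarial predictions into arrays, and the helper name is hypothetical.

```python
import numpy as np

def transfer_success_rate(clean_preds, adv_preds):
    """Fraction of images whose predicted class changes under attack (untargeted criterion).

    clean_preds, adv_preds: one target model's predicted class indices
    on the clean and adversarial versions of the same N images.
    """
    clean_preds = np.asarray(clean_preds)
    adv_preds = np.asarray(adv_preds)
    if clean_preds.shape != adv_preds.shape:
        raise ValueError("clean_preds and adv_preds must have the same shape")
    return float(np.mean(clean_preds != adv_preds))

# Made-up predictions for four images, only to show the call (not experimental results)
print(transfer_success_rate([3, 7, 7, 1], [3, 2, 9, 1]))  # 0.5
```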

In [ ]:

# Visualize the transfer results
fig, axes = plt.subplots(2, 3, figsize=(12, 8))

# Row 1: original image, adversarial example, perturbation
axes[0, 0].imshow(original_image.permute(1, 2, 0).numpy())
axes[0, 0].set_title("Original image")
axes[0, 0].axis('off')

axes[0, 1].imshow(adversarial_image.permute(1, 2, 0).numpy())
axes[0, 1].set_title("Adversarial example\n(generated on ResNet18)")
axes[0, 1].axis('off')

# Perturbation, min-max rescaled to [0, 1] so that it is visible
perturbation = adversarial_image - original_image
perturbation_vis = (perturbation - perturbation.min()) / (perturbation.max() - perturbation.min())
axes[0, 2].imshow(perturbation_vis.permute(1, 2, 0).numpy())
axes[0, 2].set_title("Adversarial perturbation (rescaled)")
axes[0, 2].axis('off')

# Row 2: results on each target model
for idx, (name, model) in enumerate(target_models.items()):
    orig_pred, _ = predict(model, original_image)
    adv_pred, adv_conf = predict(model, adversarial_image)
    
    axes[1, idx].imshow(adversarial_image.permute(1, 2, 0).numpy())
    color = 'red' if orig_pred != adv_pred else 'black'
    status = "transferred" if orig_pred != adv_pred else "not transferred"
    axes[1, idx].set_title(f"{name}\n{orig_pred}→{adv_pred} ({status})", color=color)
    axes[1, idx].axis('off')

plt.suptitle("Black-box transfer attack results", fontsize=14)
plt.tight_layout()
plt.show()
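The min-max rescaling above stretches whatever range the perturbation happens to have onto [0, 1]. Mid-gray is therefore not guaranteed to mean "no perturbation", and brightness is not comparable across epsilon values. The sketch below shows an alternative that uses a hypothetical helper. It maps [-epsilon, epsilon] linearly onto [0, 1], so a zero perturbation always renders as 0.5.

```python
import numpy as np

def perturbation_to_image(delta, eps):
    """Map a perturbation in [-eps, eps] to [0, 1] for display.

    Zero maps to mid-gray 0.5; values outside the budget are clipped.
    """
    return np.clip(0.5 + delta / (2.0 * eps), 0.0, 1.0)

# Example on a few scalar perturbation values with eps = 0.03
print(perturbation_to_image(np.array([-0.03, 0.0, 0.03]), 0.03))
```

In the notebook it could replace the min-max version, for example `axes[0, 2].imshow(perturbation_to_image(perturbation.permute(1, 2, 0).numpy(), epsilon))`.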

Part 4: Techniques for Improving Transferability

In [ ]:

# [Exercise 3] Use a larger perturbation budget to improve transferability
# Hint: a larger epsilon usually raises the transfer success rate, but the perturbation also becomes more visible

epsilon_values = [0.01, 0.03, 0.05, 0.1]

print("Effect of perturbation size on transferability:")
print("=" * 70)

for eps in epsilon_values:
    # [Exercise 3] Generate an adversarial example with the current epsilon
    # Reference answer: adv = pgd_attack(surrogate_model, original_image, surrogate_pred, eps, 0.01, 20)
    adv = ___________________
    
    # Test transfer to every target model
    success_count = 0
    for name, model in target_models.items():
        orig_pred, _ = predict(model, original_image)
        adv_pred, _ = predict(model, adv)
        if orig_pred != adv_pred:
            success_count += 1
    
    transfer_rate = success_count / len(target_models) * 100
    print(f"ε = {eps:.2f}: transfer success rate = {transfer_rate:.1f}% ({success_count}/{len(target_models)})")
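In the sweep above, alpha stays at 0.01 for every epsilon. With 20 steps the attack can move up to 0.2 per pixel. At epsilon = 0.01 a single step already reaches the boundary, while at epsilon = 0.1 about ten consistent steps are needed. One common heuristic, an assumption here and not part of the lab, scales the step size with the budget, for example alpha = 2.5 * epsilon / num_steps. A sketch with a hypothetical helper:

```python
def pgd_schedule(eps, num_steps, ratio=2.5):
    """Return (alpha, maximum total travel per pixel) for a PGD run.

    ratio=2.5 is an assumed heuristic: the total travel (2.5 * eps) exceeds
    the budget, so the boundary can be reached even if some steps partly cancel.
    """
    alpha = ratio * eps / num_steps
    return alpha, alpha * num_steps

num_steps = 20
for eps in [0.01, 0.03, 0.05, 0.1]:
    alpha, travel = pgd_schedule(eps, num_steps)
    print(f"eps = {eps:.2f}: alpha = {alpha:.5f}, max travel = {travel:.3f} "
          f"(fixed alpha=0.01 allows {0.01 * num_steps:.2f})")
```

To try it, pass the computed alpha as the fifth argument of `pgd_attack` inside the epsilon loop instead of the fixed 0.01.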

In [ ]:

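# Multi-model ensemble attack (an advanced technique for improving transferability)
def ensemble_attack(model_list, image, labels, epsilon, alpha, num_steps):
    """
    Ensemble attack: use the gradients of several models at once.
    A perturbation that fools several models simultaneously is more likely
    to transfer to other, unseen models.
    (The parameter is named model_list so that it does not shadow torchvision's `models`.)
    """
    x_orig = image.unsqueeze(0)
    loss_fn = nn.CrossEntropyLoss()
    adv_image = x_orig.clone()
    
    for step in range(num_steps):
        adv_image.requires_grad_(True)
        total_loss = 0
        
        # Sum the losses of all ensemble members (normalize once, reuse for every model)
        normalized = normalize(adv_image.squeeze(0)).unsqueeze(0)
        for model, label in zip(model_list, labels):
            output = model(normalized)
            total_loss = total_loss + loss_fn(output, torch.tensor([label]))
        
        # Backpropagate the average loss
        avg_loss = total_loss / len(model_list)
        for model in model_list:
            model.zero_grad()
        avg_loss.backward()
        
        # Same sign step and projection as in pgd_attack
        gradient = adv_image.grad.detach()
        adv_image = adv_image.detach() + alpha * gradient.sign()
        perturbation = torch.clamp(adv_image - x_orig, -epsilon, epsilon)
        adv_image = torch.clamp(x_orig + perturbation, 0, 1)
    
    return adv_image.squeeze(0).detach()

# Ensemble of two models. ResNet34 is now part of the attack, so it is a white-box
# model for the ensemble; only VGG16 and DenseNet121 remain true black-box targets.
ensemble_models = [surrogate_model, target_models['ResNet34']]
ensemble_labels = [surrogate_pred, predict(target_models['ResNet34'], original_image)[0]]

ensemble_adv = ensemble_attack(ensemble_models, original_image, ensemble_labels, 0.03, 0.01, 20)

print("Transfer results: single-model attack vs. ensemble attack")
print("-" * 50)

for name, model in target_models.items():
    orig_pred, _ = predict(model, original_image)
    
    single_adv_pred, _ = predict(model, adversarial_image)
    ensemble_adv_pred, _ = predict(model, ensemble_adv)
    
    single_status = "✓" if orig_pred != single_adv_pred else "✗"
    ensemble_status = "✓" if orig_pred != ensemble_adv_pred else "✗"
    # Models that were part of the ensemble do not give a transfer result
    note = " (in ensemble, white-box)" if any(model is m for m in ensemble_models) else ""
    
    print(f"{name}: single-model attack {single_status}, ensemble attack {ensemble_status}{note}")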
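For a clean measurement of the ensemble effect, the evaluation model must stay out of the ensemble. In the comparison above, ResNet34 is an ensemble member. The toy NumPy sketch below follows that rule, using hypothetical hand-picked linear models and a logistic loss for label 1. Averaging the loss gradients of two surrogates yields a step that fools a held-out model, while attacking a single surrogate does not. The numbers are chosen by hand to illustrate the idea; they are not measured results.

```python
import numpy as np

# Hypothetical hand-picked linear models: score = w . x + b, class 1 if score > 0.
surrogates = [
    (np.array([1.0, -0.1, 1.0, -0.1]), -0.5),
    (np.array([-0.1, 1.0, -0.1, 1.0]), -0.5),
]
held_out_target = (np.array([1.0, 1.0, 1.0, 1.0]), -1.5)  # never used to craft the attack

def predict_binary(model, x):
    """Return class 1 if w . x + b > 0, else class 0."""
    w, b = model
    return int(w @ x + b > 0)

def logistic_loss_grad(model, x):
    """Gradient w.r.t. x of log(1 + exp(-(w . x + b))), the loss for label 1."""
    w, b = model
    score = w @ x + b
    return -w / (1.0 + np.exp(score))  # equals -sigmoid(-score) * w

def fgsm_ensemble(model_list, x, eps):
    """One sign-gradient step on the loss averaged over model_list, clipped to [0, 1]."""
    grad = np.mean([logistic_loss_grad(m, x) for m in model_list], axis=0)
    return np.clip(x + eps * np.sign(grad), 0.0, 1.0)

x = np.full(4, 0.5)
x_single = fgsm_ensemble(surrogates[:1], x, eps=0.3)  # attack one surrogate only
x_ensemble = fgsm_ensemble(surrogates, x, eps=0.3)    # attack both surrogates

for label, x_adv in [("single", x_single), ("ensemble", x_ensemble)]:
    fooled = [predict_binary(m, x_adv) == 0 for m in surrogates]
    print(f"{label:<9} fools surrogates: {fooled}, "
          f"fools held-out target: {predict_binary(held_out_target, x_adv) == 0}")
```

The single-model step follows sign(w1), whose components partly cancel against the target's weights. The averaged gradient points in a direction shared by both surrogates, which is also the direction the held-out model is sensitive to.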

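Lab Summary

Observations

Answer the following questions:

1. How successful was the transfer attack? What transfer success rate did the adversarial example generated on the surrogate model achieve against the target models?

2. Which factors affect transferability? How do perturbation size and architectural similarity between models affect the transfer success rate?

3. Did the ensemble attack help? Did using an ensemble of models improve transferability? (Count only VGG16 and DenseNet121 as black-box targets here, because ResNet34 was part of the ensemble.)

Core Concept Review

- Transferability: an adversarial example generated on one model can also fool other models
- Black-box attack: requires no knowledge of the target model's architecture or weights
- Improving transferability: larger perturbations, multi-model ensembles, input diversity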
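Input diversity is listed above but not implemented in this lab. In DI-FGSM (Xie et al., CVPR 2019), a random resize-and-pad is applied to the input at each attack iteration, so the perturbation does not overfit to one exact pixel alignment of the surrogate. The NumPy sketch below covers only the transform. It assumes nearest-neighbor resizing, zero padding, square images, and hypothetical parameter names. In an actual PyTorch attack the transform would be applied to `adv_image` before the forward pass in every step, and it would need to be differentiable, for example built from `torch.nn.functional.interpolate` and `torch.nn.functional.pad`.

```python
import numpy as np

def input_diversity(img, out_size, rng, p=0.5):
    """Random resize-and-pad transform in the spirit of input diversity.

    img: C x H x W array (assumed square), out_size > H.
    With probability p, resize (nearest neighbor) to a random side length
    in [H, out_size) and zero-pad at a random offset to out_size x out_size;
    otherwise return img unchanged.
    """
    c, h, w = img.shape
    if rng.random() >= p:
        return img
    rnd = int(rng.integers(h, out_size))   # new side length
    rows = np.arange(rnd) * h // rnd       # nearest-neighbor source rows
    cols = np.arange(rnd) * w // rnd       # nearest-neighbor source columns
    resized = img[:, rows[:, None], cols[None, :]]
    top = int(rng.integers(0, out_size - rnd + 1))
    left = int(rng.integers(0, out_size - rnd + 1))
    out = np.zeros((c, out_size, out_size), dtype=img.dtype)
    out[:, top:top + rnd, left:left + rnd] = resized
    return out

rng = np.random.default_rng(0)
demo = rng.random((3, 224, 224))
print(input_diversity(demo, 254, rng, p=1.0).shape)  # (3, 254, 254)
```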

---

Next lab: Lab 3.4, Text Adversarial Attacks

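Completion Check

After completing this lab, you should have:

Run transfer attack experiments across different model architectures (such as ResNet, VGG, and DenseNet)
Observed the transfer success rates of adversarial examples between different models
Implemented an ensemble attack and compared its transfer success rate with that of the single-model attack
Understood why adversarial examples transfer across models
Recognized that black-box attacks are feasible in real-world settings

Further Thinking

Why can an adversarial example crafted for one model fool other models with different architectures?
In your results, which pairs of models showed higher transfer success rates, and why?
Why does crafting adversarial examples with an ensemble of models improve transferability?
In a real black-box attack, how would an attacker choose a surrogate model?
As a defender, what measures would you take now that you know about transfer attacks?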
