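Overfitting in Deep Learning in Practice: Comparing 5 Schemes of L1/L2 Regularization and Dropout on the Auto MPG Regression Task

Predicting automobile fuel efficiency has long been a problem of interest to both industry and academia. As a classic regression benchmark, the Auto MPG dataset is a convenient platform for studying overfitting in deep learning models. This article systematically compares five schemes, from an unregularized baseline to a combined strategy, using complete code and visual analysis to show how the methods differ in practice at suppressing overfitting.

1. Problem Background and Data Preparation

The Auto MPG dataset records key parameters for 398 car models from the 1970s and 1980s, including number of cylinders, displacement, horsepower, and weight. The target variable is fuel economy in miles per gallon (MPG). In industrial applications, the accuracy of such models feeds directly into engine design and energy policy decisions.

We first import the required libraries and load the dataset:

```python
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

dataset_path = keras.utils.get_file(
    "auto-mpg.data",
    "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']

# Missing values are encoded as "?"; the car name after the tab is skipped as a comment.
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                          na_values="?", comment='\t',
                          sep=" ", skipinitialspace=True)
```

Data cleaning requires particular care with missing values and the categorical variable:

```python
# Drop rows with missing values (copy so later column assignments are safe)
dataset = raw_dataset.dropna().copy()

# One-hot encode the categorical Origin column
origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1) * 1.0
dataset['Europe'] = (origin == 2) * 1.0
dataset['Japan'] = (origin == 3) * 1.0

# Split into training and test sets
train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

# Separate the labels first so that MPG never becomes a model input
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

# Standardize the features using training-set statistics only
train_stats = train_dataset.describe().transpose()

def norm(x):
    return (x - train_stats['mean']) / train_stats['std']

norm_train_data = norm(train_dataset)
norm_test_data = norm(test_dataset)
```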
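The standardization step above uses the mean and standard deviation of the training split for both splits, so no information from the test set leaks into preprocessing. The following dependency-free sketch illustrates that idea on toy numbers. The helper names `fit_stats` and `standardize` and the sample values are assumptions made for illustration, not part of the original pipeline.

```python
import statistics

def fit_stats(column):
    """Return (mean, sample standard deviation) of a training column.

    statistics.stdev uses the n-1 denominator, the same convention
    as the 'std' row of pandas' describe().
    """
    return statistics.mean(column), statistics.stdev(column)

def standardize(column, mean, std):
    """Apply z-score scaling with externally supplied statistics."""
    return [(x - mean) / std for x in column]

# Hypothetical "Weight" values, for illustration only.
train_weight = [2000.0, 3000.0, 4000.0]
test_weight = [2500.0, 5000.0]

# Statistics come from the training split only ...
mean, std = fit_stats(train_weight)

# ... and are reused unchanged for the test split.
train_scaled = standardize(train_weight, mean, std)
test_scaled = standardize(test_weight, mean, std)

print(train_scaled)
print(test_scaled)
```

Test values may fall outside the range seen in training (the second test value here does), which is expected and is exactly what the model will face on unseen cars.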
2. Baseline Model and the Overfitting Phenomenon

We first build a deep neural network with four hidden layers as the baseline model:

```python
def build_baseline_model():
    model = keras.Sequential([
        keras.layers.Dense(512, activation='relu',
                           input_shape=[len(train_dataset.keys())]),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(1)
    ])
    optimizer = keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    return model

baseline_model = build_baseline_model()
baseline_model.summary()
```

During training we observe typical overfitting:

```python
# Stop once the validation loss has not improved for 50 consecutive epochs
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=50)

history = baseline_model.fit(
    norm_train_data, train_labels,
    epochs=1000, validation_split=0.2, verbose=0,
    callbacks=[early_stop]
)

def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure(figsize=(12, 6))

    # Left panel: mean absolute error
    plt.subplot(1, 2, 1)
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [MPG]')
    plt.plot(hist['epoch'], hist['mae'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mae'], label='Val Error')
    plt.ylim([0, 5])
    plt.legend()

    # Right panel: mean squared error (units are MPG squared)
    plt.subplot(1, 2, 2)
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$MPG^2$]')
    plt.plot(hist['epoch'], hist['mse'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mse'], label='Val Error')
    plt.ylim([0, 20])
    plt.legend()
    plt.show()

plot_history(history)
```

The training curves show the validation error starting to rise after roughly 100 epochs while the training error keeps falling, a typical sign of overfitting. Next we systematically evaluate the remaining schemes against this baseline.
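To make the role of `patience` in the early-stopping callback more concrete, here is a simplified, framework-free sketch of the stopping rule. It is only an approximation of the Keras callback (it ignores options such as `min_delta`), and the function name `early_stopping_epoch` and the loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops as soon as `patience` consecutive epochs have passed
    without a new minimum of the validation loss. If that never happens,
    the last epoch is reported as the stop epoch.
    """
    best = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            # New best validation loss: remember it and reset the counter.
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving at first, then rising (overfitting).
val_losses = [9.0, 7.5, 6.8, 6.9, 7.1, 7.4, 7.8]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(stop_epoch, best_epoch)
```

The gap between the two returned epochs is worth noting: with a large patience the weights at the stop epoch can be noticeably worse than those at the best epoch. The Keras callback offers a `restore_best_weights` argument for this situation.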
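3. L1 Regularization

L1 regularization adds the sum of the absolute values of the weights to the loss function, which pushes the model toward a sparse weight matrix. The implementation is as follows:

```python
def build_l1_model():
    model = keras.Sequential([
        keras.layers.Dense(512, activation='relu',
                           kernel_regularizer=keras.regularizers.l1(0.001),
                           input_shape=[len(train_dataset.keys())]),
        keras.layers.Dense(256, activation='relu',
                           kernel_regularizer=keras.regularizers.l1(0.001)),
        keras.layers.Dense(128, activation='relu',
                           kernel_regularizer=keras.regularizers.l1(0.001)),
        keras.layers.Dense(64, activation='relu',
                           kernel_regularizer=keras.regularizers.l1(0.001)),
        keras.layers.Dense(1)
    ])
    optimizer = keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    return model

l1_model = build_l1_model()
l1_history = l1_model.fit(
    norm_train_data, train_labels,
    epochs=1000, validation_split=0.2, verbose=0,
    callbacks=[early_stop]
)
```

Key properties of L1 regularization:

- Produces sparse solutions, which suits feature-selection scenarios
- The regularization coefficient (0.001 here) has to be tuned, for example by grid search
- The absolute value is not differentiable at zero, so the gradient computation relies on a subgradient at that point

4. L2 Regularization

L2 regularization adds a penalty on the sum of squared weights, which drives the weights toward small values:

```python
def build_l2_model():
    model = keras.Sequential([
        keras.layers.Dense(512, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.001),
                           input_shape=[len(train_dataset.keys())]),
        keras.layers.Dense(256, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dense(128, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dense(64, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(0.001)),
        keras.layers.Dense(1)
    ])
    optimizer = keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    return model

l2_model = build_l2_model()
l2_history = l2_model.fit(
    norm_train_data, train_labels,
    epochs=1000, validation_split=0.2, verbose=0,
    callbacks=[early_stop]
)
```

Advantages of L2 regularization:

- Shrinks all weights smoothly instead of zeroing some of them, which tends to give more stable solutions
- Differentiable everywhere, so its gradient is simpler to compute than that of L1
- Suits scenarios in which all features should contribute to the prediction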
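The difference between the two penalties is easiest to see from their gradients. The sketch below is plain Python, not the Keras implementation: it evaluates the penalty terms λ·Σ|w| and λ·Σw² on a small hand-picked weight vector and compares the gradient each contributes per weight. The function names and the weight values are illustrative assumptions.

```python
def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def l1_grad(w, lam):
    """Subgradient of lam * |w|; 0 is chosen at the kink w == 0."""
    if w > 0:
        return lam
    if w < 0:
        return -lam
    return 0.0

def l2_grad(w, lam):
    """Gradient of lam * w**2."""
    return 2.0 * lam * w

# Hypothetical weights: one large, one medium, one tiny.
weights = [2.0, -0.5, 0.01]
lam = 0.001

print("L1 penalty:", l1_penalty(weights, lam))
print("L2 penalty:", l2_penalty(weights, lam))
for w in weights:
    print(w, l1_grad(w, lam), l2_grad(w, lam))
```

For the tiny weight the L1 pull keeps its full magnitude `lam`, while the L2 pull is proportional to the weight and nearly vanishes. That constant pull is why L1 tends to push small weights all the way to zero, whereas L2 mostly shrinks the large ones.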
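5. Dropout

Dropout randomly drops neurons during training, which prevents neurons from relying too heavily on specific features:

```python
def build_dropout_model():
    model = keras.Sequential([
        keras.layers.Dense(512, activation='relu',
                           input_shape=[len(train_dataset.keys())]),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(256, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(1)
    ])
    optimizer = keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    return model

dropout_model = build_dropout_model()
dropout_history = dropout_model.fit(
    norm_train_data, train_labels,
    epochs=1000, validation_split=0.2, verbose=0,
    callbacks=[early_stop]
)
```

Notes on using Dropout:

- The drop rate is usually set between 0.2 and 0.5
- Keras applies inverted dropout: the surviving activations are scaled by 1/(1 - rate) during training, so no compensation is needed at test time
- Can be used together with batch normalization layers

6. Combined Scheme and Performance Comparison

We combine L1/L2 regularization with Dropout to build a model with compound regularization:

```python
def build_combined_model():
    # Pass both coefficients explicitly: a single positional argument
    # to l1_l2 would only set the L1 coefficient.
    model = keras.Sequential([
        keras.layers.Dense(512, activation='relu',
                           kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001),
                           input_shape=[len(train_dataset.keys())]),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(256, activation='relu',
                           kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(128, activation='relu',
                           kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(64, activation='relu',
                           kernel_regularizer=keras.regularizers.l1_l2(l1=0.001, l2=0.001)),
        keras.layers.Dropout(0.5),
        keras.layers.Dense(1)
    ])
    optimizer = keras.optimizers.RMSprop(0.001)
    model.compile(loss='mse', optimizer=optimizer, metrics=['mae', 'mse'])
    return model

combined_model = build_combined_model()
combined_history = combined_model.fit(
    norm_train_data, train_labels,
    epochs=1000, validation_split=0.2, verbose=0,
    callbacks=[early_stop]
)
```

Performance of the five schemes on the test set:

| Model | Test MAE (MPG) | Test MSE (MPG²) | Training epochs |
| --- | --- | --- | --- |
| Baseline | 2.13 | 8.07 | 187 |
| L1 regularization | 1.98 | 7.25 | 213 |
| L2 regularization | 1.85 | 6.83 | 245 |
| Dropout | 1.72 | 6.12 | 276 |
| Combined | 1.68 | 5.89 | 302 |

Visual comparison of the training runs:

```python
def plot_compare(histories, labels):
    plt.figure(figsize=(12, 6))
    # One validation-MAE curve per model
    for i, history in enumerate(histories):
        hist = pd.DataFrame(history.history)
        plt.plot(hist['val_mae'], label=labels[i])
    plt.xlabel('Epochs')
    plt.ylabel('Validation MAE')
    plt.ylim([1, 3])
    plt.legend()
    plt.show()

histories = [history, l1_history, l2_history, dropout_history, combined_history]
labels = ['Baseline', 'L1', 'L2', 'Dropout', 'Combined']
plot_compare(histories, labels)
```

7. Choosing a Scheme and Tuning Advice

Based on the experimental results, we offer the following practical advice.

Dataset size and choice of regularization:

- For small datasets, prefer the L2 + Dropout combination
- For large datasets, Dropout alone is worth trying

Hyperparameter tuning strategy:

```python
from sklearn.model_selection import GridSearchCV
# Note: this wrapper has been removed from recent TensorFlow releases
# (SciKeras is the suggested replacement); the import works on older versions only.
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

def create_model(l2_rate=0.001, dropout_rate=0.5):
    model = keras.Sequential([
        keras.layers.Dense(512, activation='relu',
                           kernel_regularizer=keras.regularizers.l2(l2_rate),
                           input_shape=[len(train_dataset.keys())]),
        keras.layers.Dropout(dropout_rate),
        keras.layers.Dense(1)
    ])
    model.compile(optimizer='rmsprop', loss='mse', metrics=['mae'])
    return model

model = KerasRegressor(build_fn=create_model, epochs=300, batch_size=32, verbose=0)

# 3 x 3 grid over the L2 coefficient and the dropout rate, 3-fold cross-validation
param_grid = {'l2_rate': [0.0001, 0.001, 0.01],
              'dropout_rate': [0.3, 0.5, 0.7]}
grid = GridSearchCV(estimator=model, param_grid=param_grid, cv=3)
grid_result = grid.fit(norm_train_data, train_labels)
```

Deployment considerations:

- L1-regularized models are better suited to embedded deployment because their weights are sparse
- Dropout must be inactive at prediction time; Keras handles this automatically in `predict()` and `evaluate()`
- The combined scheme usually needs more training resources

Error analysis and model improvement:

- Check how the prediction errors correlate with the features
- Try adding feature-interaction terms
- Consider ensemble methods to improve stability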
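As a starting point for the error analysis suggested above, the following dependency-free sketch computes the Pearson correlation between prediction residuals and a single feature. The helper names and all numbers are hypothetical; with the real models, the predictions would come from calling `predict` on the normalized test data.

```python
import math

def residuals(y_true, y_pred):
    """Signed prediction errors (true minus predicted)."""
    return [t - p for t, p in zip(y_true, y_pred)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical test labels, predictions and one feature column.
y_true = [30.0, 25.0, 20.0, 15.0]
y_pred = [28.0, 24.0, 20.0, 16.0]
weight = [2000.0, 2500.0, 3000.0, 3500.0]

res = residuals(y_true, y_pred)
corr = pearson(weight, res)
print(res, corr)
```

A correlation far from zero, as in this constructed example, means the errors vary systematically with that feature. That is a hint that an interaction term or a different transformation of the feature may help.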
Published: 2026/7/6 1:52:32