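Write a program that identifies the cultural tone of fashion copy and automatically sorts it into four styles: Guofeng (国风, Chinese style), French (法式), Japanese (日系), and American (美式).

This article builds a fashion-copy tone classifier in Python. It uses NLP techniques to assign each piece of copy to one of the four styles and presents the analysis from a neutral standpoint.

I. Practical Application Scenario

In the course "Fashion Industry and Brand Innovation" (《时尚产业与品牌创新》), consistency of brand tone is a core topic. It shows up as follows:

- Guofeng copy favors poetry, imagery, and negative space, e.g. "一袭青衣染就江南烟雨", "新中式美学淡雅如诗".
- French copy favors romance, languor, and refinement, e.g. "Parisian chic, effortless elegance", "法式慵懒自带高级感".
- Japanese copy favors restraint, functionality, and atmosphere, e.g. "less is more 的日式哲学", "干净利落的东京街头".
- American copy favors confidence, directness, and impact, e.g. "Own the room", "Bold, unapologetic, iconic".

The core questions a brand faces: "I have 100,000 pieces of historical copy. Which of them match the brand tone? Is the copy written for the new product Guofeng or Japanese? Can AI classify it automatically?"

II. Pain Points

- Tone is mostly judged by manual review, which is slow and highly subjective.
- Copy libraries that mix Chinese, English, French, and Japanese lack a unified classification framework.
- There is no explainable basis for a decision: the question "why was this piece labeled French?" cannot be answered.

⇒ Build a multilingual copy classifier in Python based on keyword weights + TF-IDF + Naive Bayes that outputs both the classification result and the evidence behind it.

III. Core Logic

1. Feature engineering

For each style, extract high-frequency feature words plus language-style markers.

| Style | Core keywords | Language style features |
| --- | --- | --- |
| Guofeng (国风) | 诗意、留白、水墨、禅意、青瓷、烟雨、新中式 | Many four-character phrases, neat parallelism, dense imagery |
| French (法式) | 浪漫、慵懒、chic、effortless、巴黎、左岸 | Many French loanwords, long sentences, stacked adjectives |
| Japanese (日系) | 极简、克制、wabi-sabi、留白、功能性、东京 | Many short sentences, frequent particles (の/に/で), a sense of quiet |
| American (美式) | bold、iconic、fearless、statement、trendsetter | Many exclamation marks, many capitalized words, short and punchy rhythm |

2. Algorithm choice: Naive Bayes

P(style | copy) ∝ P(style) × Π P(word | style)
predicted style = argmax P(style | copy)

Advantages:

- Stable in multi-class settings and fast to train.
- Can report how much each feature word contributes to the decision (explainability).
- Performs well on short texts (copy is typically 10-50 words).

3. Mixed-language handling strategy

Chinese copy → jieba segmentation → feature extraction
English copy → whitespace tokenization → lowercase normalization
French copy → whitespace tokenization → lowercase normalization
Japanese copy → nagisa segmentation → feature extraction
All languages: stop-word removal → TF-IDF vectorization → Naive Bayes classification

The listing in section IV does not depend on jieba or nagisa; its tokenizer is a simplified stand-in built on regular expressions only.

IV. Modular Code

text_style_classifier.py

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
text_style_classifier.py
Automatic classifier for the cultural tone of fashion copy.
Recognizes four styles: Guofeng / French / Japanese / American.

Dependencies: numpy, pandas, matplotlib, scikit-learn
Install: pip install numpy pandas matplotlib scikit-learn
"""

import re
from collections import Counter
from typing import Dict, List

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.pipeline import Pipeline

# Chinese font settings for matplotlib
rcParams["font.sans-serif"] = ["Noto Sans CJK SC", "SimHei", "Microsoft YaHei"]
rcParams["axes.unicode_minus"] = False


# ──────────────────────────────────────────────
# 1.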
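# Style dictionary module
# ──────────────────────────────────────────────
class StyleDictionary:
    """
    Keyword dictionary for the cultural tone of fashion copy.
    Built from each culture's aesthetic traits and the language habits
    of fashion communication.
    """

    # Guofeng keywords
    GUOFENG = [
        "诗意", "意境", "留白", "水墨", "青瓷", "烟雨", "江南", "新中式",
        "禅意", "东方", "古典", "雅致", "清雅", "国潮", "汉服", "旗袍",
        "刺绣", "织锦", "敦煌", "飞天", "山水", "花鸟", "梅兰竹菊",
        "淡雅", "温润", "古韵", "风雅", "锦绣", "丝竹", "墨色",
        "poetic", "oriental", "chinoiserie", "new chinese",
    ]

    # French keywords (duplicate entries from the original list removed)
    FRENCH = [
        "浪漫", "慵懒", "chic", "effortless", "巴黎", "左岸", "马赛",
        "法式", "优雅", "精致", "复古", "女人味", "随性",
        "romantic", "parisian", "bohème", "élégant", "sophistiqué",
        "avant-garde", "couture", "très", "très chic", "je ne sais quoi",
        "art de vivre", "savoir-faire",
    ]

    # Japanese keywords
    JAPANESE = [
        "极简", "克制", "wabi", "sabi", "留白", "功能性", "东京",
        "日式", "安静", "朴素", "侘寂", "枯山水", "禅", "静寂",
        "minimal", "tokyo", "japanese", "kimono", "zen", "serenity",
        "transient", "imperfect", "aesthetic", "harmony", "muji",
        "less is more", "clean", "neat",
    ]

    # American keywords
    AMERICAN = [
        "bold", "iconic", "fearless", "statement", "trendsetter",
        "confident", "fierce", "unapologetic", "legendary", "timeless",
        "streetwear", "hip-hop", "urban", "grunge", "preppy",
        "all-american", "classic", "vintage", "retro", "denim",
        "自信", "大胆", "无畏", "标志性", "街头", "经典",
    ]

    # Stop words shared by Chinese and English text
    STOPWORDS = {
        # Chinese stop words
        "的", "了", "在", "是", "我", "有", "和", "就", "不", "人",
        "都", "一", "一个", "上", "也", "很", "到", "说", "要", "去",
        "你", "会", "着", "没有", "看", "好", "自己", "这",
        # English stop words
        "the", "a", "an", "and", "or", "but", "in", "on", "at", "to",
        "for", "of", "with", "by", "from", "is", "are", "was", "were",
        "be", "been", "have", "has", "had", "do", "does", "did", "will",
        "would", "could", "should", "may", "might", "must", "shall",
        "this", "that", "these", "those", "it", "its", "they", "them",
        "we", "you", "he", "she", "his", "her", "our", "your", "their",
        "not", "no", "so", "up", "out", "if", "about", "who", "get",
        "which", "go", "me", "when", "make", "can", "like", "time",
        "just", "very", "now", "new", "because", "people", "such",
        "only", "way", "thing", "every", "after", "between", "under",
        "before", "never", "always", "something", "anything", "everything",
    }

    @classmethod
    def get_all_dictionaries(cls) -> dict:
        """Return the keyword lists of all four styles."""
        return {
            "国风": cls.GUOFENG,
            "法式": cls.FRENCH,
            "日系": cls.JAPANESE,
            "美式": cls.AMERICAN,
        }


# ──────────────────────────────────────────────
# 2.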
# Synthetic data generation module
# ──────────────────────────────────────────────
class SyntheticTextGenerator:
    """
    Generate a simulated fashion-copy dataset.
    Every piece of copy carries an explicit style label.
    """

    TEMPLATES = {
        "国风": [
            "一袭{adj}染就{object}{poetic_end}",
            "{object}之上{adj}如诗意境悠远",
            "新中式美学{adj}中见真章{object}间的微妙平衡",
            "淡雅如{object}{adj}而不张扬东方韵味自在其中",
            "以{object}为画布{adj}为笔墨书写当代国风篇章",
            "青瓷般的{adj}烟雨江南的{object}一袭在身便有了诗意",
            "古韵今风{adj}与{object}的完美邂逅",
            "禅意入衣{object}间尽显东方{adj}",
            "水墨晕染的{object}{adj}得恰到好处",
            "温润如玉{adj}似水这件{object}藏着江南的秘密",
        ],
        "法式": [
            "Parisian {adj_en}, effortless {object_en} for the modern soul",
            "L'art de vivre: {adj_en} {object_en} that whispers French elegance",
            "Chic à la française: {adj_en} {object_en}, timeless and bold",
            "Savoir-faire meets modern {object_en} — {adj_en}, always",
            "Left Bank vibes: {adj_en} {object_en} with a touch of bohème",
            "Très chic {object_en}, {adj_en} and unapologetically French",
            "浪漫如巴黎{object}{adj}中透着法式慵懒",
            "法式优雅{adj}的{object}effortless chic 的最高境界",
            "马赛的阳光洒在{object}上{adj}得刚刚好",
            "左岸咖啡馆里的{object}{adj}得让人心动",
        ],
        "日系": [
            "{object}の美しさは、{adj}さにある",
            "東京の{object}、{adj}な美意識",
            "Less is more: {adj_en} {object_en} with Japanese precision",
            "Wabi-sabi の心で作られた{object}、{adj}で静かな美",
            "枯山水のように、{adj}で{object}に宿る禅",
            "Minimalist {object_en}: {adj_en}, clean, and distinctly Tokyo",
            "侘寂の{object}、{adj}で不完全な美",
            "日式{object}、{adj}の中に見える機能美",
            "静寂の{object}、{adj}で温かい",
            "Clean lines, {adj_en} {object_en} — the Japanese way",
        ],
        "美式": [
            "OWN THE ROOM in this {adj_en} {object_en}!",
            "Bold. Fearless. "
            "{adj_en} {object_en} for the iconic you.",
            "Street-ready {object_en} that's {adj_en}, unapologetic",
            "Make a statement: {adj_en} {object_en} pure confidence",
            "All-American {object_en}: {adj_en}, classic, and absolutely legendary",
            "THIS is how you do {object_en} — {adj_en} and unforgettable!",
            "大胆{object}{adj}到骨子里穿上就是街头王者",
            "无畏的{object}{adj}且标志性定义你的风格",
            "自信从{object}开始{adj}得不费吹灰之力",
            "Iconic {object_en}: {adj_en}, fierce, trendsetting",
        ],
    }

    # Filler vocabularies
    ADJECTIVES_CN = {
        "国风": ["淡雅", "清雅", "温润", "古韵", "雅致", "素净", "清冷",
                 "灵动", "飘逸", "婉约", "隽永", "空灵"],
        "法式": ["慵懒", "浪漫", "精致", "优雅", "随性", "迷人", "细腻",
                 "妩媚", "洒脱", "从容", "梦幻"],
        "日系": ["克制", "安静", "朴素", "干净", "纯粹", "素雅", "淡然",
                 "内敛", "简洁", "静谧", "温润"],
        "美式": ["大胆", "自信", "无畏", "标志性", "经典", "传奇", "震撼",
                 "鲜明", "独特", "耀眼", "霸气"],
    }
    OBJECTS_CN = {
        "国风": ["裙裾", "衣襟", "袖口", "盘扣", "刺绣", "丝帛", "青衫",
                 "罗裙", "云肩", "襦裙", "披帛", "马面裙"],
        "法式": ["连衣裙", "套装", "风衣", "衬衫", "半裙", "针织衫", "外套",
                 "吊带裙", "阔腿裤", "西装"],
        "日系": ["衬衫", "裤装", "外套", "连衣裙", "针织", "制服", "风衣",
                 "裙装", "马甲", "工装裤"],
        "美式": ["牛仔裤", "T恤", "夹克", "球鞋", "卫衣", "短裙", "西装",
                 "大衣", "背心", "工装"],
    }
    # No English fillers are defined for 国风; generate() falls back to defaults.
    ADJECTIVES_EN = {
        "法式": ["romantic", "elegant", "chic", "effortless", "timeless",
                 "sophisticated", "bohème", "iconic", "bold", "graceful"],
        "日系": ["minimal", "serene", "tranquil", "pure", "clean", "quiet",
                 "refined", "subtle", "austere", "harmonious"],
        "美式": ["bold", "fierce", "iconic", "fearless", "legendary",
                 "unapologetic", "timeless", "confident", "statement", "powerful"],
    }
    OBJECTS_EN = {
        "法式": ["dress", "ensemble", "trench", "blouse", "skirt", "sweater",
                 "coat", "slip dress", "wide-leg pants", "suit"],
        "日系": ["shirt", "trousers", "jacket", "dress", "knit", "uniform",
                 "trench", "skirt", "vest", "cargo pants"],
        "美式": ["jeans", "tee", "jacket", "sneakers", "hoodie", "mini skirt",
                 "suit", "coat", "tank", "workwear"],
    }

    @classmethod
    def generate(cls, n_per_class: int = 200, seed: int = 42) -> pd.DataFrame:
        """
        Generate a labeled copy dataset.
        Returns a DataFrame with columns [text, style, style_id].
        """
        np.random.seed(seed)
        rows = []
        style_id_map = {"国风": 0, "法式": 1, "日系": 2, "美式": 3}
        for style, templates in cls.TEMPLATES.items():
            for _ in range(n_per_class):
                template = np.random.choice(templates)
                # Chinese fillers
                adj = np.random.choice(cls.ADJECTIVES_CN[style])
                obj = np.random.choice(cls.OBJECTS_CN[style])
                # English fillers (with a default for styles that define none)
                adj_en = np.random.choice(cls.ADJECTIVES_EN.get(style, ["beautiful"]))
                obj_en = np.random.choice(cls.OBJECTS_EN.get(style, ["dress"]))
                # Poetic endings
                poetic_ends = ["诗意盎然", "韵味悠长", "意境天成", "风雅自来",
                               "美不胜收",
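                               "恰到好处"]
                poetic_end = np.random.choice(poetic_ends)
                # str.format ignores keyword arguments a template does not use
                text = template.format(
                    adj=adj, object=obj,
                    adj_en=adj_en, object_en=obj_en,
                    poetic_end=poetic_end,
                )
                rows.append({
                    "text": text,
                    "style": style,
                    "style_id": style_id_map[style],
                })
        df = pd.DataFrame(rows)
        # Shuffle the rows
        df = df.sample(frac=1, random_state=seed).reset_index(drop=True)
        return df


# ──────────────────────────────────────────────
# 3. Text preprocessing module
# ──────────────────────────────────────────────
class TextPreprocessor:
    """Text preprocessing: cleaning + tokenization + stop-word removal."""

    # Chinese characters (CJK Unified Ideographs)
    CN_PATTERN = re.compile(r"[\u4e00-\u9fff]")
    # Japanese kana (hiragana and katakana)
    JP_PATTERN = re.compile(r"[\u3040-\u309f\u30a0-\u30ff]")
    # Latin letters
    EN_PATTERN = re.compile(r"[a-zA-Z]")

    @classmethod
    def detect_language(cls, text: str) -> str:
        """Detect the dominant language of a text by character counts."""
        cn_count = len(cls.CN_PATTERN.findall(text))
        jp_count = len(cls.JP_PATTERN.findall(text))
        en_count = len(cls.EN_PATTERN.findall(text))
        total = cn_count + jp_count + en_count
        if total == 0:
            return "unknown"
        if cn_count / total > 0.3:
            return "zh"
        elif jp_count / total > 0.3:
            return "jp"
        elif en_count / total > 0.5:
            return "en"
        else:
            return "mixed"

    @classmethod
    def tokenize(cls, text: str) -> List[str]:
        """
        Simple tokenizer with no external dependencies.
        Chinese: runs of characters plus character bigrams
        Japanese: split on whitespace and common particles
        English: lowercase words
        """
        tokens = []
        # Chinese: take each run of Chinese characters plus its bigrams.
        # Runs are matched with "+" because CN_PATTERN matches one character
        # at a time, which would never yield a bigram.
        cn_words = re.findall(r"[\u4e00-\u9fff]+", text)
        for word in cn_words:
            tokens.append(word)
            if len(word) >= 2:
                for i in range(len(word) - 1):
                    tokens.append(word[i:i + 2])
        # Japanese: split on whitespace and the particles の/に/で/が/を/は
        jp_words = re.split(r"[\sのにでがをは]", text)
        tokens.extend([w for w in jp_words if len(w) > 1])
        # English: extract whole lowercase words
        en_words = re.findall(r"[a-z]+", text.lower())
        tokens.extend([w for w in en_words if len(w) > 2])
        # Remove stop words and single-character tokens
        stopwords = StyleDictionary.STOPWORDS
        tokens = [t for t in tokens if t not in stopwords and len(t) > 1]
        return tokens

    @classmethod
    def preprocess(cls, texts: pd.Series) -> pd.Series:
        """Batch preprocessing: tokenize and re-join with spaces for TF-IDF."""
        return texts.apply(lambda t: " ".join(cls.tokenize(t)))


# ──────────────────────────────────────────────
# 4.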
# Classifier training and evaluation module
# ──────────────────────────────────────────────
class StyleClassifier:
    """Tone classifier for fashion copy."""

    STYLE_NAMES = {0: "国风", 1: "法式", 2: "日系", 3: "美式"}
    STYLE_COLORS = {0: "#E74C3C", 1: "#3498DB", 2: "#2ECC71", 3: "#F39C12"}

    def __init__(self):
        self.vectorizer = TfidfVectorizer(
            max_features=2000,
            ngram_range=(1, 2),
            min_df=2,
            max_df=0.8,
        )
        self.classifier = MultinomialNB(alpha=0.1)
        self.pipeline = Pipeline([
            ("tfidf", self.vectorizer),
            ("nb", self.classifier),
        ])
        self.is_trained = False

    def train(self, X_train: pd.Series, y_train: pd.Series):
        """Train the classifier."""
        X_processed = TextPreprocessor.preprocess(X_train)
        self.pipeline.fit(X_processed, y_train)
        self.is_trained = True
        # Record the most probable features of each class on the training set
        feature_names = self.vectorizer.get_feature_names_out()
        log_prob = self.classifier.feature_log_prob_
        self.feature_importance = {}
        for i, style_id in enumerate(self.classifier.classes_):
            style_name = self.STYLE_NAMES.get(style_id, str(style_id))
            # Take the 20 features with the highest log probability
            top_indices = np.argsort(log_prob[i])[-20:]
            self.feature_importance[style_name] = [
                (feature_names[idx], round(float(np.exp(log_prob[i][idx])), 4))
                for idx in reversed(top_indices)
            ]

    def predict(self, texts: pd.Series) -> np.ndarray:
        """Predict the style class."""
        if not self.is_trained:
            raise RuntimeError("Classifier is not trained; call train() first")
        X_processed = TextPreprocessor.preprocess(texts)
        return self.pipeline.predict(X_processed)

    def predict_proba(self, texts: pd.Series) -> np.ndarray:
        """Predict the probability of each class."""
        if not self.is_trained:
            raise RuntimeError("Classifier is not trained")
        X_processed = TextPreprocessor.preprocess(texts)
        return self.pipeline.predict_proba(X_processed)

    def evaluate(self, X_test: pd.Series, y_test: pd.Series) -> Dict:
        """Evaluate model performance."""
        y_pred = self.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        report = classification_report(
            y_test, y_pred,
            target_names=["国风", "法式", "日系", "美式"],
            output_dict=True,
        )
        cm = confusion_matrix(y_test, y_pred)
        return {
            "accuracy": accuracy,
            "report": report,
            "confusion_matrix": cm,
            "y_pred": y_pred,
        }

    def analyze_misclassifications(self,
                                   X_test: pd.Series,
                                   y_test: pd.Series,
                                   texts: pd.Series) -> pd.DataFrame:
        """Collect the misclassified samples (texts must share y_test's index)."""
        y_pred = self.predict(X_test)
        misclassified = texts[y_test != y_pred].reset_index(drop=True)
        true_labels = y_test[y_test !=
                             y_pred].reset_index(drop=True)
        pred_labels = pd.Series(y_pred[(y_test != y_pred).to_numpy()]).reset_index(drop=True)
        rows = []
        for i in range(len(misclassified)):
            rows.append({
                "text": misclassified.iloc[i],
                "true_style": self.STYLE_NAMES.get(true_labels.iloc[i], "?"),
                "pred_style": self.STYLE_NAMES.get(pred_labels.iloc[i], "?"),
            })
        return pd.DataFrame(rows)


# ──────────────────────────────────────────────
# 5. Visualization dashboard module
# ──────────────────────────────────────────────
class Dashboard:
    """Visualization dashboard for the classification results."""

    STYLE_COLORS = {"国风": "#E74C3C", "法式": "#3498DB",
                    "日系": "#2ECC71", "美式": "#F39C12"}

    @classmethod
    def plot_dashboard(cls,
                       classifier: StyleClassifier,
                       eval_result: Dict,
                       test_texts: pd.Series,
                       train_df: pd.DataFrame,
                       misclass_df: pd.DataFrame,
                       filename: str = "style_classification_dashboard.png"):
        fig = plt.figure(figsize=(22, 18))
        fig.suptitle("Fashion Copy Tone Classification: Analysis Dashboard",
                     fontsize=20, fontweight="bold", y=0.99)
        # ── Panel 1: confusion matrix ──
        ax1 = fig.add_subplot(2, 3, 1)
        cls._plot_confusion_matrix(ax1, eval_result["confusion_matrix"])
        # ── Panel 2: per-class performance ──
        ax2 = fig.add_subplot(2, 3, 2)
        cls._plot_per_class_accuracy(ax2, eval_result["report"])
        # ── Panel 3: feature importance (horizontal bars in place of a word cloud) ──
        ax3 = fig.add_subplot(2, 3, 3)
        cls._plot_feature_importance(ax3, classifier.feature_importance)
        # ── Panel 4: language distribution ──
        ax4 = fig.add_subplot(2, 3, 4)
        cls._plot_language_distribution(ax4, train_df)
        # ── Panel 5: misclassification analysis ──
        ax5 = fig.add_subplot(2, 3, 5)
        cls._plot_misclassification(ax5, misclass_df)
        # ── Panel 6: distribution of predicted probabilities ──
        ax6 = fig.add_subplot(2, 3, 6)
        cls._plot_probability_distribution(ax6, classifier, test_texts)
        plt.tight_layout(rect=[0, 0, 1, 0.96])
        plt.savefig(filename, dpi=150, bbox_inches="tight")
        plt.show()
        print(f"[INFO] Dashboard saved: {filename}")

    @classmethod
    def _plot_confusion_matrix(cls, ax, cm: np.ndarray):
        """Confusion matrix heat map."""
        im = ax.imshow(cm, cmap="YlOrRd", aspect="auto")
        labels = ["国风", "法式", "日系", "美式"]
        ax.set_xticks(range(4))
        ax.set_yticks(range(4))
        ax.set_xticklabels(labels, fontsize=9)
        ax.set_yticklabels(labels, fontsize=9)
        ax.set_xlabel("Predicted style")
        ax.set_ylabel("True style")
        ax.set_title("Confusion matrix", fontsize=13, fontweight="bold")
        for i in range(4):
            for j in range(4):
                # White text on dark cells, black text on light cells
                color = "white" if cm[i, j] > cm.max() * 0.5 else "black"
                ax.text(j, i, str(cm[i,
                                     j]), ha="center", va="center",
                        color=color, fontsize=11, fontweight="bold")
        plt.colorbar(im, ax=ax, shrink=0.8)

    @classmethod
    def _plot_per_class_accuracy(cls, ax, report: Dict):
        """Precision / recall / F1 for each style."""
        styles = ["国风", "法式", "日系", "美式"]
        metrics = ["precision", "recall", "f1-score"]
        x = np.arange(len(styles))
        width = 0.25
        for i, metric in enumerate(metrics):
            values = [report[s][metric] for s in styles]
            ax.bar(x + i * width, values, width,
                   label=metric, color=["#3498db", "#e74c3c", "#2ecc71"][i])
        # Center the tick under the middle bar of each group
        ax.set_xticks(x + width)
        ax.set_xticklabels(styles, fontsize=9)
        ax.set_ylabel("Score")
        ax.set_title("Per-style classification performance", fontsize=13, fontweight="bold")
        ax.legend(fontsize=8)
        ax.grid(axis="y", alpha=0.3)
        ax.set_ylim(0, 1.15)

    @classmethod
    def _plot_feature_importance(cls, ax, feat_imp: Dict):
        """Top feature words of each style."""
        styles = list(feat_imp.keys())[:4]
        n_words = 10
        y_pos = np.arange(n_words)
        colors = ["#E74C3C", "#3498DB", "#2ECC71", "#F39C12"]
        for i, (style, words) in enumerate(feat_imp.items()):
            if i >= 4:
                break
            top_words = words[:n_words]
            scores = [w[1] for w in top_words][::-1]
            labels = [w[0] for w in top_words][::-1]
            # Stack the four styles as separate bands along the y axis
            offset = (i - 1.5) * (n_words + 0.5)
            ax.barh(y_pos + offset, scores, height=0.7,
                    color=colors[i], alpha=0.7, label=style)
        ax.set_yticks(y_pos)
        all_labels = []
        for style in styles:
            all_labels.extend([w[0] for w in feat_imp[style][:n_
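The scoring rule from section III.2, P(style | copy) ∝ P(style) × Π P(word | style), together with the "evidence behind the decision" requirement from section II, can be illustrated without scikit-learn. The following is a minimal standard-library sketch, not the article's implementation: `TOY_CORPUS`, `train_nb`, and `explain` are hypothetical names, the eight tokenized samples are made up for illustration, and it uses raw word counts with Laplace smoothing (`alpha=1.0`) rather than the TF-IDF features and `alpha=0.1` of the listing above.

```python
import math
from collections import Counter, defaultdict

# Hypothetical, pre-tokenized toy corpus (not the article's dataset).
TOY_CORPUS = [
    (["水墨", "留白", "诗意"], "国风"),
    (["烟雨", "江南", "诗意"], "国风"),
    (["chic", "effortless", "巴黎"], "法式"),
    (["浪漫", "chic", "左岸"], "法式"),
    (["极简", "克制", "tokyo"], "日系"),
    (["minimal", "克制", "留白"], "日系"),
    (["bold", "iconic", "fearless"], "美式"),
    (["bold", "statement", "iconic"], "美式"),
]


def train_nb(corpus, alpha=1.0):
    """Estimate log P(style) and log P(word | style) with Laplace smoothing."""
    class_docs = Counter(label for _, label in corpus)
    word_counts = defaultdict(Counter)
    for tokens, label in corpus:
        word_counts[label].update(tokens)
    vocab = sorted({w for counts in word_counts.values() for w in counts})
    log_prior = {c: math.log(n / len(corpus)) for c, n in class_docs.items()}
    log_like = {}
    for c in class_docs:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_like[c] = {
            w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_like


def explain(tokens, log_prior, log_like):
    """Return the best style, all log scores, and per-word evidence margins.

    A word's margin is log P(word | best) minus the largest
    log P(word | other style); a positive margin supports the decision.
    Words outside the training vocabulary are ignored.
    """
    vocab = next(iter(log_like.values()))
    known = [w for w in tokens if w in vocab]
    scores = {
        c: log_prior[c] + sum(log_like[c][w] for w in known) for c in log_prior
    }
    best = max(scores, key=scores.get)
    margins = {
        w: log_like[best][w]
        - max(log_like[c][w] for c in log_like if c != best)
        for w in known
    }
    return best, scores, margins


log_prior, log_like = train_nb(TOY_CORPUS)
best, scores, margins = explain(["留白", "诗意"], log_prior, log_like)
print(best)
for word, margin in margins.items():
    print(word, round(margin, 3))
```

In this toy setup "留白" occurs equally often in the 国风 and 日系 samples, so its margin is zero and the decision rests on "诗意". That mirrors the keyword table in section III.1, where 留白 is listed for both styles, and it is the kind of per-word evidence that can answer "why was this piece labeled this way?".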