Fashion Florence: Fine-Tuning Florence-2 for Structured Fashion Attribute Extraction
1️⃣ One-Sentence Summary
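This paper fine-tunes the Florence-2 vision-language model with LoRA to build a lightweight system that extracts structured JSON data (category, color, material, and related attributes) directly from clothing photos. At a smaller parameter scale, it exceeds the extraction accuracy of larger models such as GPT-4o-mini, and it has been integrated into an open-source outfit recommendation system.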
We present Fashion Florence, a Florence-2 vision-language model fine-tuned with LoRA to extract structured fashion attributes from clothing images. Given a single photograph, the model generates a JSON object containing category, color, material, style tags, and occasion tags. This structured output can be consumed directly by downstream recommendation and retrieval systems. Fine-tuning data is derived from the iMaterialist Fashion dataset (228 labels), whose fine-grained annotations we collapse into a compact 6-category, 16-color, 19-style schema via rule-based label engineering. We apply LoRA (r=16, alpha=32) to all decoder linear layers and train for 3 epochs on 3,688 examples. On a held-out test set of 461 images, Fashion Florence achieves 94.6% category accuracy and 63.0% material accuracy, compared to 89.3% / 43.3% for GPT-4o-mini and 87.4% for Gemini 2.5 Flash. Style tag F1 reaches 0.753, versus 0.612 for Gemini and 0.398 for GPT-4o-mini. Fashion Florence produces valid JSON in 99.8% of outputs, has 0.77B parameters, and runs on a single GPU at zero marginal inference cost. The model is deployed as a Hugging Face Space and integrated into Loom, an open-source outfit recommendation system.
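The metrics quoted in the abstract (valid-JSON rate, category accuracy, style tag F1) could be computed along the following lines. This is a minimal sketch, not the authors' evaluation code. The JSON key names (`category`, `color`, `material`, `style_tags`, `occasion_tags`), the helper names, and the choice to average F1 per example and to score invalid outputs as zero are all assumptions, since the abstract does not specify them.

```python
import json

# Assumed key names: the abstract lists these attributes but not the exact JSON keys.
REQUIRED_KEYS = {"category", "color", "material", "style_tags", "occasion_tags"}


def parse_prediction(raw):
    """Return the parsed dict if `raw` is a well-formed prediction, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(obj, dict) or not REQUIRED_KEYS.issubset(obj):
        return None
    tags = obj["style_tags"]
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        return None
    return obj


def tag_f1(predicted, gold):
    """F1 between two tag collections for a single example, compared as sets."""
    pred_set, gold_set = set(predicted), set(gold)
    if not pred_set and not gold_set:
        return 1.0  # nothing to predict and nothing predicted
    overlap = len(pred_set & gold_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_set)
    recall = overlap / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def evaluate(raw_outputs, gold_labels):
    """Compute valid-JSON rate, category accuracy, and mean per-example style F1.

    Assumption: an output that fails to parse counts as a wrong category
    and contributes an F1 of 0.
    """
    n = len(gold_labels)
    valid = 0
    correct = 0
    f1_total = 0.0
    for raw, gold in zip(raw_outputs, gold_labels):
        pred = parse_prediction(raw)
        if pred is None:
            continue
        valid += 1
        if pred["category"] == gold["category"]:
            correct += 1
        f1_total += tag_f1(pred["style_tags"], gold["style_tags"])
    return {
        "valid_json_rate": valid / n,
        "category_accuracy": correct / n,
        "style_f1": f1_total / n,
    }
```

Calling `evaluate` with a list of raw model output strings and a parallel list of gold label dicts returns the three rates as floats. A micro- or macro-averaged F1 over the 19 style labels would be an equally plausible reading of the reported figure and would give different numbers.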
Source: arXiv: 2605.09827