qwen3-4b-structured-output-20260102test_lora
This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (QLoRA, 4-bit base).
The adapter is optimized for structured output generation, including format conversion and information extraction tasks that require high output consistency.
⚠️ This repository contains LoRA adapter weights only (PEFT). The base model must be loaded separately.
Model Overview
- Base model: unsloth/Qwen3-4B-Instruct-2507
- Fine-tuning method: QLoRA (via Unsloth)
- Adapter type: LoRA (PEFT)
- Maximum sequence length: 4096
- Training objective: Improve assistant-side generation quality for structured and schema-like outputs.
Trainable Parameters
- Trainable parameters (LoRA only): ~33M
- Base model parameters: ~4B
- Trainable ratio: ~0.8%
Only LoRA adapter weights are trained; the base model remains frozen. This design ensures:
- efficient fine-tuning,
- low storage footprint,
- and safe reuse of the original base model.
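The reported ratio can be checked with PEFT's built-in parameter report. The snippet below is a minimal sketch; the adapter is loaded as trainable here only so that its weights are counted:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(
    base,
    "daichira/qwen3-4b-structured-output-20260102test_lora",
    is_trainable=True,  # mark adapter weights as trainable so they are counted
)

# Reports LoRA parameter count, total parameter count, and the trainable percentage
model.print_trainable_parameters()
```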
Training Data
- Dataset: daichira/structeval-t-sft-v4-massive
- Data format: OpenAI-style `messages` (chat conversations)
- Filtering rule: Samples without a non-empty assistant turn are removed.
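The filtering rule can be expressed as a simple predicate over the `messages` list; the helper below is a hypothetical sketch of that check, not the exact preprocessing code:

```python
def has_nonempty_assistant_turn(example):
    """Keep only samples whose conversation contains a non-empty assistant turn."""
    return any(
        m.get("role") == "assistant" and (m.get("content") or "").strip()
        for m in example["messages"]
    )

# e.g. dataset = dataset.filter(has_nonempty_assistant_turn)
```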
Supervision strategy (assistant-only loss)
This model is trained using assistant-only supervision:
- Conversations are rendered using the model's chat template.
- Tokens corresponding to system / user instructions are masked (`label = -100`).
- Loss is computed only on assistant response tokens.
- The prompt–response boundary is estimated by comparing:
- tokenized full conversation text, and
- tokenized prefix text ending immediately before assistant generation, using a maximum suffix–prefix overlap heuristic.
- Samples where the boundary cannot be reliably determined are fully masked and do not contribute to training loss.
This focuses learning on output correctness and structural fidelity.
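A minimal sketch of this masking scheme is shown below. It assumes each sample ends with a single assistant turn, and it simplifies the boundary search to the tokenized-prefix length rather than the full suffix–prefix overlap heuristic described above:

```python
def build_assistant_only_labels(tokenizer, messages, max_len=4096):
    """Return input_ids/labels with loss restricted to the final assistant turn."""
    # Full conversation rendered with the model's chat template
    full_text = tokenizer.apply_chat_template(messages, tokenize=False)
    # Prefix ending immediately before assistant generation
    prefix_text = tokenizer.apply_chat_template(
        messages[:-1], tokenize=False, add_generation_prompt=True
    )

    full_ids = tokenizer(full_text, add_special_tokens=False,
                         truncation=True, max_length=max_len)["input_ids"]
    prefix_ids = tokenizer(prefix_text, add_special_tokens=False,
                           truncation=True, max_length=max_len)["input_ids"]

    # Simplified boundary estimate: number of prefix tokens. The actual training
    # code resolves tokenization mismatches at the junction with a maximum
    # suffix-prefix overlap heuristic.
    boundary = len(prefix_ids)

    labels = list(full_ids)
    if boundary >= len(full_ids):
        # Boundary could not be determined reliably: fully mask the sample
        labels = [-100] * len(full_ids)
    else:
        labels[:boundary] = [-100] * boundary  # mask system/user tokens
    return {"input_ids": full_ids, "labels": labels}
```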
Training Configuration (Key Parameters)
- Epochs: 0.2
- Learning rate: 2e-4
- Per-device batch size: 2
- Gradient accumulation steps: 8 → Effective global batch size: 16
- Warmup ratio: 0.03
- Weight decay: 0.01
- Optimizer: AdamW (8-bit, via Unsloth)
- Evaluation: Enabled (shuffle + slice split; no `map` used)
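These settings map roughly onto the following `transformers.TrainingArguments`. This is a sketch of equivalent settings rather than the exact Unsloth training script, and the output directory is a placeholder:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",              # placeholder path
    num_train_epochs=0.2,
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,     # 2 x 8 = effective global batch size 16 (single device)
    warmup_ratio=0.03,
    weight_decay=0.01,
    optim="adamw_8bit",                # 8-bit AdamW
    eval_strategy="steps",             # evaluation enabled on the held-out slice
    seed=42,
)
```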
LoRA Configuration
- Rank (r): 16
- Alpha: 32
- Dropout: 0.0
- Target modules: Attention and MLP projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`)
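Expressed as a `peft.LoraConfig`, the adapter settings look roughly like this (a sketch; the adapter was actually created through Unsloth's training wrapper):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```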
Reproducibility
To ensure reproducibility, the following constraints are enforced:
- Random seed: 42
- Max sequence length: 4096
- Data splitting: Shuffle + index slicing (no `datasets.map`)
- Loss masking: Deterministic assistant-only masking
- Frameworks:
- Transformers (>= 4.57)
- PEFT
- Unsloth
- PyTorch (CUDA)
Exact numerical results may vary slightly depending on GPU type and low-level CUDA kernels, but training behavior and trends should remain consistent.
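For reference, the shuffle-and-slice split can be reproduced with plain index selection on a 🤗 `datasets.Dataset` instead of `datasets.map`. This is a sketch: the eval fraction below is a hypothetical placeholder, and the split name is assumed to be `train`:

```python
from datasets import load_dataset

dataset = load_dataset("daichira/structeval-t-sft-v4-massive", split="train")

# Deterministic shuffle followed by index slicing; no datasets.map involved
dataset = dataset.shuffle(seed=42)
eval_size = int(0.1 * len(dataset))                       # hypothetical eval fraction
eval_ds = dataset.select(range(eval_size))
train_ds = dataset.select(range(eval_size, len(dataset)))
```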
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/qwen3-4b-structured-output-20260102test_lora"

# Load the tokenizer and the frozen base model
tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```
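A minimal generation example follows; the prompt and decoding settings are illustrative only:

```python
messages = [
    {"role": "user",
     "content": "Extract the name and age from: 'Alice is 30 years old.' Return JSON."},
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated assistant tokens
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```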
Intended Use
- Structured text generation (JSON, schema-aligned outputs)
- Format conversion and normalization
- Information extraction pipelines
- Baseline adapter for structured-output–focused LLM experiments
Limitations
- Output validity is not guaranteed for underspecified prompts.
- Optimized for assistant output quality rather than dialogue modeling.
- General-purpose reasoning performance is not the primary target.
License
- Other (inherits the license of the base model: unsloth/Qwen3-4B-Instruct-2507)
Acknowledgements
- Qwen / Alibaba for the base model
- Unsloth for efficient QLoRA training
- Hugging Face Transformers & PEFT ecosystem