qwen3-4b-structured-output-20260102test_lora

This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (QLoRA, 4-bit base).

The adapter is optimized for structured output generation, including format conversion and information extraction tasks that require high output consistency.

⚠️ This repository contains LoRA adapter weights only (PEFT). The base model must be loaded separately.


Model Overview

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Fine-tuning method: QLoRA (via Unsloth)
  • Adapter type: LoRA (PEFT)
  • Maximum sequence length: 4096
  • Training objective: Improve assistant-side generation quality for structured and schema-like outputs.

Trainable Parameters

  • Trainable parameters (LoRA only): ~33M
  • Base model parameters: ~4B
  • Trainable ratio: ~0.8%

Only the LoRA adapter weights are trained; the base model remains frozen. This design enables:

  • efficient fine-tuning,
  • low storage footprint,
  • and safe reuse of the original base model.
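
The split can be verified by attaching the adapter in trainable mode and asking PEFT to report parameter counts (a quick check, not part of the training script; is_trainable=True is only needed so the LoRA weights are counted as trainable):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(
    base,
    "daichira/qwen3-4b-structured-output-20260102test_lora",
    is_trainable=True,  # load the LoRA weights as trainable so they are counted
)

# Reports trainable (LoRA) vs. total parameters, roughly 33M vs. 4B (~0.8%).
model.print_trainable_parameters()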

Training Data

Supervision strategy (assistant-only loss)

This model is trained using assistant-only supervision:

  • Conversations are rendered using the model's chat template.
  • Tokens corresponding to system / user instructions are masked (label = -100).
  • Loss is computed only on assistant response tokens.
  • The prompt–response boundary is estimated by comparing:
    • tokenized full conversation text, and
    • tokenized prefix text ending immediately before assistant generation, using a maximum suffix–prefix overlap heuristic.
  • Samples where the boundary cannot be reliably determined are fully masked and do not contribute to training loss.

This focuses learning on output correctness and structural fidelity.
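
A minimal sketch of this masking step, assuming a tokenizer with a chat template and a messages list whose last turn is the assistant response (the helper name and the simplified longest-common-prefix boundary search are illustrative, not the exact training code):

def build_assistant_only_labels(tokenizer, messages):
    """Return (input_ids, labels) with loss restricted to the assistant response."""
    # Render and tokenize the full conversation via the chat template.
    full_text = tokenizer.apply_chat_template(messages, tokenize=False)
    full_ids = tokenizer(full_text, add_special_tokens=False)["input_ids"]

    # Render and tokenize the prefix ending right before assistant generation.
    prefix_text = tokenizer.apply_chat_template(
        messages[:-1], tokenize=False, add_generation_prompt=True
    )
    prefix_ids = tokenizer(prefix_text, add_special_tokens=False)["input_ids"]

    # Estimate the prompt/response boundary: longest run of tokens shared by the
    # prefix tokenization and the start of the full tokenization (a simplified
    # version of the suffix–prefix overlap heuristic described above).
    boundary = 0
    for a, b in zip(full_ids, prefix_ids):
        if a != b:
            break
        boundary += 1

    labels = list(full_ids)
    if boundary == 0 or boundary >= len(full_ids):
        # Boundary could not be determined reliably: mask the whole sample.
        labels = [-100] * len(full_ids)
    else:
        # Mask system/user tokens; keep loss only on assistant tokens.
        labels[:boundary] = [-100] * boundary

    return full_ids, labels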


Training Configuration (Key Parameters)

  • Epochs: 0.2
  • Learning rate: 2e-4
  • Per-device batch size: 2
  • Gradient accumulation steps: 8 → Effective global batch size: 16
  • Warmup ratio: 0.03
  • Weight decay: 0.01
  • Optimizer: AdamW (8-bit, via Unsloth)
  • Evaluation: Enabled (shuffle + slice split; no map used)
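
Expressed as Hugging Face TrainingArguments, these settings look roughly like the following (a sketch; the output path, eval cadence, and trainer wiring are assumptions):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",                  # placeholder
    num_train_epochs=0.2,
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,         # effective global batch size: 2 x 8 = 16
    warmup_ratio=0.03,
    weight_decay=0.01,
    optim="adamw_8bit",                    # 8-bit AdamW (bitsandbytes)
    eval_strategy="steps",                 # evaluation enabled
    seed=42,
)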

LoRA Configuration

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.0
  • Target modules: Attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
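
In plain PEFT terms (the actual run configures the same values through Unsloth's wrapper), this corresponds to:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
)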

Reproducibility

To ensure reproducibility, the following constraints are enforced:

  • Random seed: 42
  • Max sequence length: 4096
  • Data splitting: Shuffle + index slicing (no datasets.map)
  • Loss masking: Deterministic assistant-only masking
  • Frameworks:
    • Transformers (>= 4.57)
    • PEFT
    • Unsloth
    • PyTorch (CUDA)

Exact numerical results may vary slightly depending on GPU type and low-level CUDA kernels, but training behavior and trends should remain consistent.
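
The shuffle-and-slice split can be reproduced with plain datasets indexing (a sketch; the data source and eval fraction are placeholders):

from datasets import load_dataset

SEED = 42
dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder source

# Deterministic split: shuffle with a fixed seed, then slice by index.
# No datasets.map is involved in the split itself.
dataset = dataset.shuffle(seed=SEED)
eval_size = int(0.05 * len(dataset))  # placeholder eval fraction
eval_ds = dataset.select(range(eval_size))
train_ds = dataset.select(range(eval_size, len(dataset)))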


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/qwen3-4b-structured-output-20260102test_lora"

# Load the tokenizer and the frozen base model.
tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
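
Generation then follows the standard chat-template flow; the extraction prompt below is only an illustration:

messages = [
    {
        "role": "user",
        "content": "Extract the name and age from: 'Alice is 30 years old.' "
                   "Return JSON with keys \"name\" and \"age\".",
    }
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated assistant tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))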

Intended Use

  • Structured text generation (JSON, schema-aligned outputs)
  • Format conversion and normalization
  • Information extraction pipelines
  • Baseline adapter for structured-output–focused LLM experiments

Limitations

  • Output validity is not guaranteed for underspecified prompts.
  • Optimized for assistant output quality rather than dialogue modeling.
  • General-purpose reasoning performance is not the primary target.

License


Acknowledgements

  • Qwen / Alibaba for the base model
  • Unsloth for efficient QLoRA training
  • Hugging Face Transformers & PEFT ecosystem