qwen3-4b-structured-output-20260102test_lora

This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using Unsloth (QLoRA, 4-bit base).

The adapter is optimized for structured output generation, including format conversion and information extraction tasks that require high output consistency.

⚠️ This repository contains LoRA adapter weights only (PEFT). The base model must be loaded separately.


Model Overview

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Fine-tuning method: QLoRA (via Unsloth)
  • Adapter type: LoRA (PEFT)
  • Maximum sequence length: 4096
  • Training objective: Improve assistant-side generation quality for structured and schema-like outputs.

Trainable Parameters

  • Trainable parameters (LoRA only): ~33M
  • Base model parameters: ~4B
  • Trainable ratio: ~0.8%

Only the LoRA adapter weights are trained; the base model remains frozen. This design enables:

  • efficient fine-tuning,
  • low storage footprint,
  • and safe reuse of the original base model.
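
The split can be verified by attaching the adapter in trainable mode and asking PEFT to report parameter counts (a quick check, not part of the training script; is_trainable=True is only needed so the LoRA weights are counted as trainable):

from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen3-4B-Instruct-2507")
model = PeftModel.from_pretrained(
    base,
    "daichira/qwen3-4b-structured-output-20260102test_lora",
    is_trainable=True,  # load the LoRA weights as trainable so they are counted
)

# Reports trainable (LoRA) vs. total parameters, roughly 33M vs. 4B (~0.8%).
model.print_trainable_parameters()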

Training Data

Supervision strategy (assistant-only loss)

This model is trained using assistant-only supervision:

  • Conversations are rendered using the model's chat template.
  • Tokens corresponding to system / user instructions are masked (label = -100).
  • Loss is computed only on assistant response tokens.
  • The prompt–response boundary is estimated by comparing:
    • tokenized full conversation text, and
    • tokenized prefix text ending immediately before assistant generation, using a maximum suffix–prefix overlap heuristic.
  • Samples where the boundary cannot be reliably determined are fully masked and do not contribute to training loss.

This focuses learning on output correctness and structural fidelity.
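
A minimal sketch of this masking step, assuming a tokenizer with a chat template and a messages list whose last turn is the assistant response (the helper name and the simplified longest-common-prefix boundary search are illustrative, not the exact training code):

def build_assistant_only_labels(tokenizer, messages):
    """Return (input_ids, labels) with loss restricted to the assistant response."""
    # Render and tokenize the full conversation via the chat template.
    full_text = tokenizer.apply_chat_template(messages, tokenize=False)
    full_ids = tokenizer(full_text, add_special_tokens=False)["input_ids"]

    # Render and tokenize the prefix ending right before assistant generation.
    prefix_text = tokenizer.apply_chat_template(
        messages[:-1], tokenize=False, add_generation_prompt=True
    )
    prefix_ids = tokenizer(prefix_text, add_special_tokens=False)["input_ids"]

    # Estimate the prompt/response boundary: longest run of tokens shared by the
    # prefix tokenization and the start of the full tokenization (a simplified
    # version of the suffix–prefix overlap heuristic described above).
    boundary = 0
    for a, b in zip(full_ids, prefix_ids):
        if a != b:
            break
        boundary += 1

    labels = list(full_ids)
    if boundary == 0 or boundary >= len(full_ids):
        # Boundary could not be determined reliably: mask the whole sample.
        labels = [-100] * len(full_ids)
    else:
        # Mask system/user tokens; keep loss only on assistant tokens.
        labels[:boundary] = [-100] * boundary

    return full_ids, labels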


Training Configuration (Key Parameters)

  • Epochs: 0.2
  • Learning rate: 2e-4
  • Per-device batch size: 2
  • Gradient accumulation steps: 8 → Effective global batch size: 16
  • Warmup ratio: 0.03
  • Weight decay: 0.01
  • Optimizer: AdamW (8-bit, via Unsloth)
  • Evaluation: Enabled (shuffle + slice split; no map used)
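
Expressed as Hugging Face TrainingArguments, these settings look roughly like the following (a sketch; the output path, eval cadence, and trainer wiring are assumptions):

from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",                  # placeholder
    num_train_epochs=0.2,
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,         # effective global batch size: 2 x 8 = 16
    warmup_ratio=0.03,
    weight_decay=0.01,
    optim="adamw_8bit",                    # 8-bit AdamW (bitsandbytes)
    eval_strategy="steps",                 # evaluation enabled
    seed=42,
)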

LoRA Configuration

  • Rank (r): 16
  • Alpha: 32
  • Dropout: 0.0
  • Target modules: Attention and MLP projection layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
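
In plain PEFT terms (the actual run configures the same values through Unsloth's wrapper), this corresponds to:

from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
)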

Reproducibility

To ensure reproducibility, the following constraints are enforced:

  • Random seed: 42
  • Max sequence length: 4096
  • Data splitting: Shuffle + index slicing (no datasets.map)
  • Loss masking: Deterministic assistant-only masking
  • Frameworks:
    • Transformers (>= 4.57)
    • PEFT
    • Unsloth
    • PyTorch (CUDA)

Exact numerical results may vary slightly depending on GPU type and low-level CUDA kernels, but training behavior and trends should remain consistent.
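
The shuffle-and-slice split can be reproduced with plain datasets indexing (a sketch; the data source and eval fraction are placeholders):

from datasets import load_dataset

SEED = 42
dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder source

# Deterministic split: shuffle with a fixed seed, then slice by index.
# No datasets.map is involved in the split itself.
dataset = dataset.shuffle(seed=SEED)
eval_size = int(0.05 * len(dataset))  # placeholder eval fraction
eval_ds = dataset.select(range(eval_size))
train_ds = dataset.select(range(eval_size, len(dataset)))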


Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base_id = "unsloth/Qwen3-4B-Instruct-2507"
adapter_id = "daichira/qwen3-4b-structured-output-20260102test_lora"

# Load the tokenizer and the frozen base model.
tokenizer = AutoTokenizer.from_pretrained(base_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
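
Generation then follows the standard chat-template flow; the extraction prompt below is only an illustration:

messages = [
    {
        "role": "user",
        "content": "Extract the name and age from: 'Alice is 30 years old.' "
                   "Return JSON with keys \"name\" and \"age\".",
    }
]

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated assistant tokens.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))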

Intended Use

  • Structured text generation (JSON, schema-aligned outputs)
  • Format conversion and normalization
  • Information extraction pipelines
  • Baseline adapter for structured-output–focused LLM experiments

Limitations

  • Output validity is not guaranteed for underspecified prompts.
  • Optimized for assistant output quality rather than dialogue modeling.
  • General-purpose reasoning performance is not the primary target.

License


Acknowledgements

  • Qwen / Alibaba for the base model
  • Unsloth for efficient QLoRA training
  • Hugging Face Transformers & PEFT ecosystem