# Vision Transformer (ViT) Fine-Tuned Model
This repository contains a fine-tuned version of google/vit-large-patch16-224, optimized for a custom image classification task.
## Model Overview
- Base model: `google/vit-large-patch16-224`
- Architecture: Vision Transformer (ViT)
- Patch size: 16×16
- Image resolution: 224×224
- Frameworks: PyTorch, Hugging Face Transformers
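
As a quick check, these architecture details can be read directly from the checkpoint's configuration. This is a minimal sketch that assumes network access to the Hub; the repository id `rakib730/output-models` is the one used in the usage examples below:

```python
from transformers import AutoConfig

# Load only the configuration of the fine-tuned checkpoint (no weights needed).
config = AutoConfig.from_pretrained("rakib730/output-models")

# These values should match the ViT-Large base architecture described above.
print(config.model_type)   # "vit"
print(config.patch_size)   # 16
print(config.image_size)   # 224
print(config.hidden_size)  # 1024 for ViT-Large
```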
## Performance
| Metric                  | Value             |
|-------------------------|-------------------|
| Final Validation Loss   | 0.3268            |
| Lowest Validation Loss  | 0.2548 (Epoch 18) |
Training loss and validation loss trends indicate good convergence with slight overfitting after ~30 epochs.
## Training Configuration
| Hyperparameter     | Value                                                                        |
|--------------------|------------------------------------------------------------------------------|
| Learning rate      | 2e-5                                                                         |
| Train batch size   | 20                                                                           |
| Eval batch size    | 8                                                                            |
| Optimizer          | AdamW (betas=(0.9, 0.999), eps=1e-8)                                         |
| LR scheduler       | Linear                                                                       |
| Epochs             | 40                                                                           |
| Seed               | 42                                                                           |
| Framework versions | Transformers 4.52.4, PyTorch 2.6.0+cu124, Datasets 3.6.0, Tokenizers 0.21.2  |
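
For reference, the table above roughly corresponds to the following `TrainingArguments` setup. This is a minimal sketch, not the exact training script; the `output_dir` name is an assumption, and `load_best_model_at_end` is a suggested addition (not listed in the table) that would retain the lowest-loss checkpoint from epoch 18:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="output-models",        # assumed name, chosen to match the repository id
    learning_rate=2e-5,
    per_device_train_batch_size=20,
    per_device_eval_batch_size=8,
    num_train_epochs=40,
    lr_scheduler_type="linear",
    seed=42,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,       # not in the table: would keep the epoch-18 (best) checkpoint
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
# The Trainer's default optimizer is AdamW with betas=(0.9, 0.999) and eps=1e-8,
# matching the optimizer settings listed above.
```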
## Training Results
| Epoch | Step | Validation Loss |
|-------|------|-----------------|
| 1     | 24   | 0.5601          |
| 5     | 120  | 0.3421          |
| 10    | 240  | 0.2901          |
| 14    | 336  | 0.2737          |
| 18    | 432  | 0.2548          |
| 40    | 960  | 0.3268          |
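
At a train batch size of 20, the 24 optimization steps logged per epoch suggest a training set of roughly 480 images.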
## Intended Uses
- Image classification on datasets with characteristics similar to the training dataset.
- Fine-tuning for domain-specific classification tasks (see the sketch below).
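
If your target task uses a different label set than this checkpoint, a common approach is to reload the backbone with a fresh classification head. The following is a minimal sketch; the three labels are hypothetical and only for illustration:

```python
from transformers import AutoModelForImageClassification

# Hypothetical target labels, for illustration only.
id2label = {0: "cat", 1: "dog", 2: "bird"}
label2id = {v: k for k, v in id2label.items()}

model = AutoModelForImageClassification.from_pretrained(
    "rakib730/output-models",
    num_labels=len(id2label),
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,  # replaces the old head with a new, randomly initialized one
)
# The model can now be fine-tuned on the new dataset with the usual Trainer setup.
```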
## Limitations
- Trained on a custom dataset; it may not generalize well to unrelated domains without additional fine-tuning.
- No guarantees are made regarding fairness, bias, or ethical implications; the training data has not been analyzed for these aspects.
## How to Use
You can use this model in two main ways:
### 1. Using the High-Level `pipeline` API
```python
from transformers import pipeline

# Load the fine-tuned checkpoint as an image-classification pipeline.
pipe = pipeline("image-classification", model="rakib730/output-models")

# The pipeline accepts a local path, a PIL image, or an image URL.
result = pipe("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png")
print(result)
```
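
The call returns a list of dictionaries with `label` and `score` keys, ordered from most to least likely class.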
### 2. Using the Processor and Model Directly
```python
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import requests
import torch

# Load the image processor and the fine-tuned model.
processor = AutoImageProcessor.from_pretrained("rakib730/output-models")
model = AutoModelForImageClassification.from_pretrained("rakib730/output-models")

# Download an example image and convert it to RGB.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/parrots.png"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Preprocess the image and run a forward pass without tracking gradients.
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the class with the highest logit and map it back to its label.
logits = outputs.logits
predicted_class_id = logits.argmax(-1).item()
print("Predicted class:", model.config.id2label[predicted_class_id])
```