# mT5-Small (Maltese News Headlines)
This model is a fine-tuned version of google/mt5-small on the MLRS/maltese_news_headlines dataset. It achieves the following results on the test set:
- Loss: 2.0476
- ChrF:
  - Score: 32.1775
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- ROUGE:
  - ROUGE-1: 0.3078
  - ROUGE-2: 0.1667
  - ROUGE-L: 0.2809
  - ROUGE-Lsum: 0.2808
- Gen Len: 26.6689
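For reference, scores of this kind can be computed with the Hugging Face `evaluate` library. The sketch below is a minimal example, not the card's evaluation script; the headline strings are placeholders rather than dataset samples:

```python
import evaluate

# Load the two metrics reported above.
chrf = evaluate.load("chrf")
rouge = evaluate.load("rouge")

predictions = ["a generated headline"]  # model outputs (placeholder)
references = ["a reference headline"]   # gold headlines (placeholder)

# char_order=6, word_order=0, beta=2 match the ChrF settings reported above.
print(chrf.compute(predictions=predictions,
                   references=[[r] for r in references],
                   char_order=6, word_order=0, beta=2))
print(rouge.compute(predictions=predictions, references=references))
```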
## Intended uses & limitations
The model is fine-tuned for a specific task (Maltese news headline generation) and should only be used for the same or a similar task. Any limitations present in the base model are inherited.
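A minimal inference sketch using the Transformers `pipeline` API (the input text and the `max_new_tokens` value are illustrative assumptions, not settings prescribed by this card):

```python
from transformers import pipeline

# Load the fine-tuned checkpoint for text-to-text generation.
generator = pipeline(
    "text2text-generation",
    model="MLRS/mt5-small_maltese-news-headlines",
)

# Generate a headline from the body of a Maltese news article.
article = "..."  # replace with the article text
headline = generator(article, max_new_tokens=64)[0]["generated_text"]
print(headline)
```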
## Training procedure
The model was fine-tuned using a customised script.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 200.0
- early_stopping_patience: 20
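The customised script is not reproduced here; as a rough sketch, the hyperparameters above map onto the standard `Seq2SeqTrainingArguments` as follows (the `output_dir` and the early-stopping metric are assumptions, and the actual script may differ):

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Sketch only: the actual customised training script may differ in details.
training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_maltese-news-headlines",  # assumed name
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",             # Adafactor, no extra optimizer args
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,   # required for early stopping
    metric_for_best_model="loss",  # assumed; validation loss is tracked below
    predict_with_generate=True,
)

# Stop if the tracked metric does not improve for 20 evaluations (epochs here).
callbacks = [EarlyStoppingCallback(early_stopping_patience=20)]
```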
### Training results
| Training Loss | Epoch | Step | Validation Loss | ChrF Score | ChrF Char Order | ChrF Word Order | ChrF Beta | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3.4416 | 1.0 | 556 | 2.3096 | 30.5960 | 6 | 0 | 2 | 0.2743 | 0.1409 | 0.2492 | 0.2490 | 36.1606 |
| 2.4043 | 2.0 | 1112 | 2.1213 | 28.3827 | 6 | 0 | 2 | 0.2671 | 0.1374 | 0.2455 | 0.2456 | 28.0633 |
| 2.1593 | 3.0 | 1668 | 2.0488 | 31.1100 | 6 | 0 | 2 | 0.2971 | 0.1577 | 0.2720 | 0.2721 | 28.6908 |
| 1.9768 | 4.0 | 2224 | 2.0320 | 31.7961 | 6 | 0 | 2 | 0.2957 | 0.1559 | 0.2701 | 0.2701 | 31.3063 |
| 1.824 | 5.0 | 2780 | 1.9928 | 32.1849 | 6 | 0 | 2 | 0.3024 | 0.1630 | 0.2771 | 0.2770 | 32.5491 |
| 1.7178 | 6.0 | 3336 | 1.9852 | 31.7565 | 6 | 0 | 2 | 0.3037 | 0.1645 | 0.2792 | 0.2790 | 34.9514 |
| 1.5956 | 7.0 | 3892 | 2.0045 | 33.2805 | 6 | 0 | 2 | 0.3134 | 0.1726 | 0.2879 | 0.2878 | 28.2378 |
| 1.521 | 8.0 | 4448 | 2.0457 | 33.1853 | 6 | 0 | 2 | 0.3194 | 0.1784 | 0.2945 | 0.2943 | 26.8732 |
| 1.3417 | 9.0 | 5004 | 2.0346 | 32.6798 | 6 | 0 | 2 | 0.3084 | 0.1702 | 0.2857 | 0.2856 | 29.0178 |
| 1.2508 | 10.0 | 5560 | 2.0682 | 32.9948 | 6 | 0 | 2 | 0.3158 | 0.1729 | 0.2904 | 0.2903 | 26.7743 |
| 1.1808 | 11.0 | 6116 | 2.1129 | 32.5438 | 6 | 0 | 2 | 0.3144 | 0.1732 | 0.2885 | 0.2885 | 25.1255 |
| 1.1192 | 12.0 | 6672 | 2.1251 | 32.1996 | 6 | 0 | 2 | 0.3057 | 0.1672 | 0.2816 | 0.2817 | 26.9772 |
| 1.0642 | 13.0 | 7228 | 2.1905 | 33.0654 | 6 | 0 | 2 | 0.3145 | 0.1747 | 0.2915 | 0.2914 | 26.4312 |
| 1.0012 | 14.0 | 7784 | 2.2181 | 32.7913 | 6 | 0 | 2 | 0.3137 | 0.1724 | 0.2893 | 0.2893 | 25.9110 |
| 0.9486 | 15.0 | 8340 | 2.2740 | 32.8274 | 6 | 0 | 2 | 0.3112 | 0.1713 | 0.2858 | 0.2856 | 27.1084 |
| 0.8971 | 16.0 | 8896 | 2.3449 | 32.8081 | 6 | 0 | 2 | 0.3151 | 0.1742 | 0.2899 | 0.2898 | 26.5010 |
| 0.8501 | 17.0 | 9452 | 2.3809 | 32.5864 | 6 | 0 | 2 | 0.3121 | 0.1701 | 0.2873 | 0.2874 | 26.3136 |
| 0.7626 | 18.0 | 10008 | 2.3940 | 32.5931 | 6 | 0 | 2 | 0.3077 | 0.1676 | 0.2826 | 0.2825 | 27.4178 |
| 0.708 | 19.0 | 10564 | 2.5105 | 32.3410 | 6 | 0 | 2 | 0.3148 | 0.1728 | 0.2897 | 0.2896 | 25.2108 |
| 0.6698 | 20.0 | 11120 | 2.5310 | 32.7337 | 6 | 0 | 2 | 0.3141 | 0.1718 | 0.2885 | 0.2886 | 26.5286 |
| 0.6335 | 21.0 | 11676 | 2.5854 | 32.6948 | 6 | 0 | 2 | 0.3136 | 0.1722 | 0.2888 | 0.2891 | 25.3147 |
| 0.6 | 22.0 | 12232 | 2.6345 | 31.8629 | 6 | 0 | 2 | 0.3076 | 0.1681 | 0.2839 | 0.2841 | 24.3055 |
| 0.5678 | 23.0 | 12788 | 2.7192 | 32.6265 | 6 | 0 | 2 | 0.3075 | 0.1674 | 0.2839 | 0.2840 | 25.9609 |
| 0.537 | 24.0 | 13344 | 2.7482 | 32.0641 | 6 | 0 | 2 | 0.3071 | 0.1660 | 0.2825 | 0.2826 | 26.3948 |
| 0.5063 | 25.0 | 13900 | 2.7691 | 32.7935 | 6 | 0 | 2 | 0.3113 | 0.1676 | 0.2852 | 0.2850 | 26.6690 |
| 0.4846 | 26.0 | 14456 | 2.8247 | 31.9925 | 6 | 0 | 2 | 0.3045 | 0.1657 | 0.2803 | 0.2803 | 25.9360 |
| 0.43 | 27.0 | 15012 | 2.8367 | 31.8532 | 6 | 0 | 2 | 0.3033 | 0.1611 | 0.2769 | 0.2767 | 26.7688 |
| 0.4008 | 28.0 | 15568 | 2.9464 | 31.7433 | 6 | 0 | 2 | 0.3037 | 0.1650 | 0.2792 | 0.2793 | 26.1735 |
### Framework versions
- Transformers 4.48.2
- PyTorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
## License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
## Citation
This work was first presented in [MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP](https://aclanthology.org/2025.findings-acl.1053/). Cite it as follows:
```bibtex
@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
```
## Evaluation results
- ChrF on MLRS/maltese_news_headlines (MELABench Leaderboard): 33.190
- ROUGE-L on MLRS/maltese_news_headlines (MELABench Leaderboard): 0.290
