mT5-Small (Maltese News Headlines)

This model is a fine-tuned version of google/mt5-small on the MLRS/maltese_news_headlines dataset. It achieves the following results on the test set:

  • Loss: 2.0476
  • Chrf:
    • Score: 32.1775
    • Char Order: 6
    • Word Order: 0
    • Beta: 2
  • Rouge:
    • Rouge1: 0.3078
    • Rouge2: 0.1667
    • Rougel: 0.2809
    • Rougelsum: 0.2808
  • Gen Len: 26.6689
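
A minimal sketch of how scores in this format can be computed with the `evaluate` library is shown below; the exact evaluation script is not included in this card, so the default chrF settings (char order 6, word order 0, beta 2) and the example strings are assumptions.

```python
# Hedged sketch: computing ChrF- and ROUGE-style scores with `evaluate`.
import evaluate

chrf = evaluate.load("chrf")    # defaults match the settings above: char_order=6, word_order=0, beta=2
rouge = evaluate.load("rouge")

# Hypothetical predictions/references; in practice these would come from the
# MLRS/maltese_news_headlines test split.
predictions = ["Titlu ġġenerat mill-mudell"]
references = ["Titlu ta' referenza mid-dataset"]

print(chrf.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))
```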

Intended uses & limitations

The model is fine-tuned on a specific task and should only be used for the same or a similar task. Any limitations present in the base model are inherited.
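
As a usage illustration (not part of the original card), a headline for a Maltese news article can be generated with the standard `transformers` seq2seq API; the generation settings shown are assumptions, not necessarily those used for the reported scores.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "MLRS/mt5-small_maltese-news-headlines"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

article = "..."  # body text of a Maltese news article
inputs = tokenizer(article, return_tensors="pt", truncation=True)

# Beam search and length limit are illustrative choices.
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```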

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 200.0
  • early_stopping_patience: 20
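
A hedged sketch of how these hyperparameters could be expressed with `Seq2SeqTrainingArguments` follows; the actual customised training script is not published in this card, so the output directory, evaluation/saving strategy, and early-stopping setup are assumptions.

```python
from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_maltese-news-headlines",  # assumed path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",          # assumed: evaluate once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,    # needed so early stopping restores the best checkpoint
    predict_with_generate=True,
)

# The early-stopping patience of 20 epochs would be applied as a callback when
# these arguments are passed to a Seq2SeqTrainer with the tokenised dataset.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```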

Training results

| Training Loss | Epoch | Step | Validation Loss | Chrf Score | Chrf Char Order | Chrf Word Order | Chrf Beta | Rouge1 | Rouge2 | Rougel | Rougelsum | Gen Len |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 3.4416 | 1.0 | 556 | 2.3096 | 30.5960 | 6 | 0 | 2 | 0.2743 | 0.1409 | 0.2492 | 0.2490 | 36.1606 |
| 2.4043 | 2.0 | 1112 | 2.1213 | 28.3827 | 6 | 0 | 2 | 0.2671 | 0.1374 | 0.2455 | 0.2456 | 28.0633 |
| 2.1593 | 3.0 | 1668 | 2.0488 | 31.1100 | 6 | 0 | 2 | 0.2971 | 0.1577 | 0.2720 | 0.2721 | 28.6908 |
| 1.9768 | 4.0 | 2224 | 2.0320 | 31.7961 | 6 | 0 | 2 | 0.2957 | 0.1559 | 0.2701 | 0.2701 | 31.3063 |
| 1.824 | 5.0 | 2780 | 1.9928 | 32.1849 | 6 | 0 | 2 | 0.3024 | 0.1630 | 0.2771 | 0.2770 | 32.5491 |
| 1.7178 | 6.0 | 3336 | 1.9852 | 31.7565 | 6 | 0 | 2 | 0.3037 | 0.1645 | 0.2792 | 0.2790 | 34.9514 |
| 1.5956 | 7.0 | 3892 | 2.0045 | 33.2805 | 6 | 0 | 2 | 0.3134 | 0.1726 | 0.2879 | 0.2878 | 28.2378 |
| 1.521 | 8.0 | 4448 | 2.0457 | 33.1853 | 6 | 0 | 2 | 0.3194 | 0.1784 | 0.2945 | 0.2943 | 26.8732 |
| 1.3417 | 9.0 | 5004 | 2.0346 | 32.6798 | 6 | 0 | 2 | 0.3084 | 0.1702 | 0.2857 | 0.2856 | 29.0178 |
| 1.2508 | 10.0 | 5560 | 2.0682 | 32.9948 | 6 | 0 | 2 | 0.3158 | 0.1729 | 0.2904 | 0.2903 | 26.7743 |
| 1.1808 | 11.0 | 6116 | 2.1129 | 32.5438 | 6 | 0 | 2 | 0.3144 | 0.1732 | 0.2885 | 0.2885 | 25.1255 |
| 1.1192 | 12.0 | 6672 | 2.1251 | 32.1996 | 6 | 0 | 2 | 0.3057 | 0.1672 | 0.2816 | 0.2817 | 26.9772 |
| 1.0642 | 13.0 | 7228 | 2.1905 | 33.0654 | 6 | 0 | 2 | 0.3145 | 0.1747 | 0.2915 | 0.2914 | 26.4312 |
| 1.0012 | 14.0 | 7784 | 2.2181 | 32.7913 | 6 | 0 | 2 | 0.3137 | 0.1724 | 0.2893 | 0.2893 | 25.9110 |
| 0.9486 | 15.0 | 8340 | 2.2740 | 32.8274 | 6 | 0 | 2 | 0.3112 | 0.1713 | 0.2858 | 0.2856 | 27.1084 |
| 0.8971 | 16.0 | 8896 | 2.3449 | 32.8081 | 6 | 0 | 2 | 0.3151 | 0.1742 | 0.2899 | 0.2898 | 26.5010 |
| 0.8501 | 17.0 | 9452 | 2.3809 | 32.5864 | 6 | 0 | 2 | 0.3121 | 0.1701 | 0.2873 | 0.2874 | 26.3136 |
| 0.7626 | 18.0 | 10008 | 2.3940 | 32.5931 | 6 | 0 | 2 | 0.3077 | 0.1676 | 0.2826 | 0.2825 | 27.4178 |
| 0.708 | 19.0 | 10564 | 2.5105 | 32.3410 | 6 | 0 | 2 | 0.3148 | 0.1728 | 0.2897 | 0.2896 | 25.2108 |
| 0.6698 | 20.0 | 11120 | 2.5310 | 32.7337 | 6 | 0 | 2 | 0.3141 | 0.1718 | 0.2885 | 0.2886 | 26.5286 |
| 0.6335 | 21.0 | 11676 | 2.5854 | 32.6948 | 6 | 0 | 2 | 0.3136 | 0.1722 | 0.2888 | 0.2891 | 25.3147 |
| 0.6 | 22.0 | 12232 | 2.6345 | 31.8629 | 6 | 0 | 2 | 0.3076 | 0.1681 | 0.2839 | 0.2841 | 24.3055 |
| 0.5678 | 23.0 | 12788 | 2.7192 | 32.6265 | 6 | 0 | 2 | 0.3075 | 0.1674 | 0.2839 | 0.2840 | 25.9609 |
| 0.537 | 24.0 | 13344 | 2.7482 | 32.0641 | 6 | 0 | 2 | 0.3071 | 0.1660 | 0.2825 | 0.2826 | 26.3948 |
| 0.5063 | 25.0 | 13900 | 2.7691 | 32.7935 | 6 | 0 | 2 | 0.3113 | 0.1676 | 0.2852 | 0.2850 | 26.6690 |
| 0.4846 | 26.0 | 14456 | 2.8247 | 31.9925 | 6 | 0 | 2 | 0.3045 | 0.1657 | 0.2803 | 0.2803 | 25.9360 |
| 0.43 | 27.0 | 15012 | 2.8367 | 31.8532 | 6 | 0 | 2 | 0.3033 | 0.1611 | 0.2769 | 0.2767 | 26.7688 |
| 0.4008 | 28.0 | 15568 | 2.9464 | 31.7433 | 6 | 0 | 2 | 0.3037 | 0.1650 | 0.2792 | 0.2793 | 26.1735 |

Framework versions

  • Transformers 4.48.2
  • Pytorch 2.4.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}