How to use Amirhossein75/multi-label-emotion-classification-reddit-comments-roberta with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="Amirhossein75/multi-label-emotion-classification-reddit-comments-roberta")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("Amirhossein75/multi-label-emotion-classification-reddit-comments-roberta")
model = AutoModelForSequenceClassification.from_pretrained("Amirhossein75/multi-label-emotion-classification-reddit-comments-roberta")
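By default the text-classification pipeline returns only the top class; for multi-label use you typically request all per-label scores (e.g. with top_k=None) and keep those above a decision threshold. A minimal sketch of that thresholding step (the label names and the 0.5 cutoff here are illustrative; the repo tunes its own threshold on validation data):

```python
import math

def decode_multilabel(logits, label_names, threshold=0.5):
    """Sigmoid each logit and keep labels whose probability clears the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [(name, round(p, 4)) for name, p in zip(label_names, probs) if p >= threshold]

# Illustrative logits for a 4-label slice of the taxonomy
print(decode_multilabel([2.0, -3.0, 0.1, -1.0], ["joy", "sadness", "anger", "neutral"]))
```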
Model Card for Multi‑Label Emotion Classification on Reddit Comments
This repository contains training and inference code for multi‑label emotion classification of Reddit comments using the GoEmotions dataset (27 emotions + neutral) with a RoBERTa‑base encoder. It includes a configuration‑driven training script, evaluation, decision‑threshold tuning, and a lightweight inference entrypoint.
Repository: https://github.com/amirhossein-yousefi/multi-label-emotion-classification-reddit-comments
Model Details
Model Description
This project fine‑tunes a Transformer encoder for multi‑label emotion detection on Reddit comments. The default configuration uses roberta-base, binary cross‑entropy loss (optionally focal loss), and grid‑search threshold tuning on the validation set.
- Developed by: GitHub @amirhossein-yousefi
- Model type: Multi‑label text classification (Transformer encoder)
- Language(s) (NLP): English
- License: No explicit license file was found in the repository; treat as “all rights reserved” unless the author adds a license.
- Finetuned from model: roberta-base
Model Sources
- Repository: https://github.com/amirhossein-yousefi/multi-label-emotion-classification-reddit-comments
- Paper [dataset]: GoEmotions: A Dataset of Fine‑Grained Emotions (Demszky et al., 2020)
Uses
Direct Use
- Tagging short English texts (e.g., social posts, comments) with multiple emotions from the GoEmotions taxonomy (e.g., joy, sadness, anger, admiration, gratitude, etc.).
- Exploratory analytics and visualization of emotion distributions in corpora similar to Reddit.
Downstream Use
- Fine‑tuning or domain adaptation to platforms beyond Reddit (forums, support tickets, app reviews).
- Serving as a baseline component in moderation pipelines or empathetic response systems (with careful human oversight).
Out‑of‑Scope Use
- Medical, psychological, or diagnostic use; mental‑health inference.
- High‑stakes decisions (employment, lending, safety) without rigorous, domain‑specific validation.
- Non‑English or heavily code‑switched text without additional training/testing.
Bias, Risks, and Limitations
- Dataset origin: GoEmotions is built from Reddit comments; models may inherit Reddit‑specific discourse, slang, and toxicity patterns and may underperform on other domains.
- Annotation noise: Third‑party analyses have raised concerns about mislabels in GoEmotions; treat labels as imperfect and consider human review for critical use cases.
- Multi‑label uncertainty: Threshold choice materially affects precision/recall trade‑offs. The repo tunes the threshold on validation data; you should recalibrate for your domain.
Recommendations
- Calibrate thresholds on in‑domain validation data (the repo grid‑searches 0.05–0.95).
- Report per‑label metrics, especially for minority emotions.
- Consider bias audits and human‑in‑the‑loop review before deployment.
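The global threshold sweep recommended above can be sketched as follows (illustrative arrays and a hypothetical helper name; the repo's own tuning code may differ):

```python
import numpy as np
from sklearn.metrics import f1_score

def tune_threshold(val_probs, val_labels, lo=0.05, hi=0.95, step=0.01):
    """Grid-search a single global decision threshold by validation micro-F1."""
    best_t, best_f1 = lo, -1.0
    for t in np.arange(lo, hi + 1e-9, step):
        preds = (val_probs >= t).astype(int)
        score = f1_score(val_labels, preds, average="micro", zero_division=0)
        if score > best_f1:
            best_t, best_f1 = float(t), score
    return best_t, best_f1

# Toy validation set: 3 samples x 4 labels
probs = np.array([[0.9, 0.2, 0.6, 0.1],
                  [0.3, 0.8, 0.4, 0.2],
                  [0.7, 0.1, 0.2, 0.9]])
labels = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 1]])
t, best = tune_threshold(probs, labels)
```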
How to Get Started with the Model
Environment
- Python ≥ 3.13
- Install dependencies:
pip install -r requirements.txt
Train
The Makefile provides a default train target; equivalently, run:
python -m emoclass.train --config configs/base.yaml
Inference
After training (or pointing to a trained directory), run:
python -m emoclass.inference --model_dir outputs/goemotions_roberta --text "I love this!" "This is awful."
Training Details
Training Data
- Dataset: GoEmotions (27 emotions + neutral). The default config uses the simplified variant.
- Text column: text
- Labels column: labels
- Max sequence length: 192
Training Procedure
Preprocessing
- Standard Transformer tokenization for roberta-base.
- Multi‑hot label encoding for emotions.
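The multi-hot encoding step can be sketched as below (GoEmotions stores each example's labels as a list of class indices; 28 classes corresponds to the 27 emotions plus neutral):

```python
def multi_hot(label_indices, num_classes=28):
    """Convert a list of class indices into a multi-hot float vector."""
    vec = [0.0] * num_classes
    for i in label_indices:
        vec[i] = 1.0
    return vec

# An example annotated with two emotions, e.g. class indices 0 and 17,
# becomes a 28-dim vector with ones at those positions.
encoded = multi_hot([0, 17])
```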
Training Hyperparameters
- Base model: roberta-base
- Batch size: 16 (train), 32 (eval)
- Learning rate: 2e‑5
- Epochs: 5
- Weight decay: 0.01
- Warmup ratio: 0.06
- Gradient accumulation: 1
- Precision: bf16/fp16 if available
- Loss: Binary Cross‑Entropy (optionally focal loss with γ=2.0, α=0.25)
- Threshold tuning: grid 0.05 → 0.95 (step 0.01); best val micro‑F1 ≈ 0.84
- LoRA/PEFT: available in config (default off)
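The optional focal loss generalizes BCE by down-weighting easy examples; a minimal per-element sketch using the listed γ=2.0 and α=0.25 (illustrative only, not the repo's implementation):

```python
import math

def focal_bce(logit, target, gamma=2.0, alpha=0.25):
    """Per-element binary focal loss on a raw logit.

    Reduces to alpha-weighted BCE when gamma == 0; gamma > 0
    down-weights well-classified (easy) examples.
    """
    p = 1.0 / (1.0 + math.exp(-logit))           # sigmoid probability
    pt = p if target == 1 else 1.0 - p           # probability of the true class
    a = alpha if target == 1 else 1.0 - alpha    # class-balance weight
    return -a * (1.0 - pt) ** gamma * math.log(pt)

# A confident correct prediction incurs far less loss than a confident wrong one:
easy = focal_bce(4.0, 1)
hard = focal_bce(-4.0, 1)
```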
Speeds, Sizes, Times
- See results.txt for an example run's timing and throughput logs.
Evaluation
Testing Data, Factors & Metrics
- Test split: GoEmotions simplified test.
- Metrics: micro/macro/sample F1, micro/macro Average Precision (AP), micro/macro ROC‑AUC.
Results (example run)
- Threshold (val‑tuned): 0.84
- F1 (micro): 0.5284
- F1 (macro): 0.4995
- F1 (samples): 0.5301
- AP (micro): 0.5352
- AP (macro): 0.5087
- ROC‑AUC (micro): 0.9517
- ROC‑AUC (macro): 0.9310
(See results.txt for the full log and any updates.)
Model Examination
- Inspect per‑label thresholds and confusion patterns; minority emotions (e.g., grief, pride, nervousness) often suffer lower F1 and need more tuning or class‑balancing strategies.
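Per-label threshold inspection can be sketched like this (a hypothetical helper on toy arrays; in practice you would run it on your validation split):

```python
import numpy as np
from sklearn.metrics import f1_score

def per_label_thresholds(val_probs, val_labels, grid=None):
    """Pick a separate decision threshold for each label by that label's validation F1."""
    if grid is None:
        grid = np.arange(0.05, 0.95, 0.01)
    thresholds = []
    for j in range(val_labels.shape[1]):
        scores = [f1_score(val_labels[:, j], (val_probs[:, j] >= t).astype(int),
                           zero_division=0) for t in grid]
        thresholds.append(float(grid[int(np.argmax(scores))]))
    return thresholds

# Toy example: label 0 is easy, label 1 needs a lower threshold
probs = np.array([[0.90, 0.10], [0.60, 0.30], [0.20, 0.25]])
labels = np.array([[1, 0], [1, 1], [0, 1]])
ths = per_label_thresholds(probs, labels)
```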
Environmental Impact
- Not measured. If desired, log GPU type, hours, region, and estimate emissions using the ML CO2 calculator.
Technical Specifications
Model Architecture and Objective
- Transformer encoder (roberta-base) fine‑tuned with a sigmoid multi‑label head and BCE (or focal) loss.
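A small NumPy sketch of this objective (random stand-ins for the encoder output and head weights; not the repo's code): each of the 28 labels gets its own logit, an independent sigmoid, and a binary cross-entropy term.

```python
import numpy as np

rng = np.random.default_rng(0)
num_labels, hidden = 28, 768

# Stand-ins for the encoder's pooled output and a linear classification head
pooled = rng.normal(size=(4, hidden))
W = rng.normal(scale=0.02, size=(hidden, num_labels))
b = np.zeros(num_labels)

logits = pooled @ W + b                     # one logit per emotion
probs = 1.0 / (1.0 + np.exp(-logits))       # independent sigmoid per label

targets = np.zeros((4, num_labels))
targets[0, 0] = 1.0                         # e.g. sample 0 carries class 0
# Binary cross-entropy averaged over all (sample, label) decisions
eps = 1e-12
bce = -np.mean(targets * np.log(probs + eps) + (1 - targets) * np.log(1 - probs + eps))
```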
Compute Infrastructure
- Frameworks: transformers, datasets, accelerate, evaluate, scikit-learn, optional peft.
- Hardware/software specifics are user‑dependent.
Citation
GoEmotions (dataset/paper):
Demszky, D., Movshovitz-Attias, D., Ko, J., Cowen, A., Nemade, G., & Ravi, S. (2020). GoEmotions: A Dataset of Fine‑Grained Emotions. ACL 2020. https://arxiv.org/abs/2005.00547
BibTeX:
@inproceedings{demszky2020goemotions,
title={GoEmotions: A Dataset of Fine-Grained Emotions},
author={Demszky, Dorottya and Movshovitz-Attias, Dana and Ko, Jeongwoo and Cowen, Alan and Nemade, Gaurav and Ravi, Sujith},
booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
year={2020}
}
Glossary
- AP: Average Precision (area under precision–recall curve).
- AUC: Area under ROC curve.
- Micro/Macro F1: Micro aggregates over all labels; macro averages per‑label F1.
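A quick illustration of the micro/macro distinction (toy arrays; with one well-predicted and one poorly-predicted label, the macro average is dragged down harder than the micro average):

```python
import numpy as np
from sklearn.metrics import f1_score

# Two labels: label 0 predicted perfectly, label 1 never predicted.
y_true = np.array([[1, 1], [1, 1], [0, 1], [1, 0]])
y_pred = np.array([[1, 0], [1, 0], [0, 0], [1, 0]])

micro = f1_score(y_true, y_pred, average="micro", zero_division=0)  # pools all decisions
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)  # averages per-label F1
```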
More Information
- The configuration file at configs/base.yaml documents tweakable knobs (loss type, LoRA, precision, etc.).
- Artifacts are saved under outputs/ by default.
Model Card Authors
- Original code: @amirhossein-yousefi
- Model card: generated programmatically for documentation purposes.
Model Card Contact
- Open an issue in the GitHub repository.