---
language:
- en
- zh
license: apache-2.0
pipeline_tag: audio-classification
tags:
- Language Identification
- LID
- Audio Classification
- VoxLingua107
- audio
- automatic-speech-recognition
- asr
---

<div align="center">
<h1>
    FireRedASR2S - FireRedLID
    <br>
    A SOTA Industrial-Grade Spoken Language Identification System
</h1>
</div>

[[Paper]](https://huggingface.co/papers/2603.10420)
[[Code]](https://github.com/FireRedTeam/FireRedASR2S)
[[Blog]](https://fireredteam.github.io/demos/firered_asr/)
[[Demo]](https://huggingface.co/spaces/FireRedTeam/FireRedASR)

FireRedLID is the Spoken Language Identification (LID) module of **FireRedASR2S**, a state-of-the-art (SOTA), industrial-grade, all-in-one ASR system. FireRedLID covers 100+ languages and 20+ Chinese dialects/accents, and achieves 97.18% utterance-level accuracy on the FLEURS benchmark, outperforming Whisper and SpeechBrain-LID.

This model was introduced in the paper [FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System](https://huggingface.co/papers/2603.10420).

## 🔥 News
- [2026.02.12] We release FireRedASR2S (FireRedASR2-AED, FireRedVAD, FireRedLID, and FireRedPunc) with model weights and inference code.

## Evaluation
### FireRedLID
Metric: Utterance-level LID Accuracy (%). Higher is better.

|Testset \ Model|Languages|FireRedLID|Whisper|SpeechBrain|Dolphin|
|:-----------------:|:---------:|:---------:|:-----:|:---------:|:-----:|
|FLEURS test|82 languages|**97.18**|79.41|92.91|-|
|CommonVoice test|74 languages|**92.07**|80.81|78.75|-|
|KeSpeech + MagicData|20+ Chinese dialects/accents|**88.47**|-|-|69.01|

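Utterance-level LID accuracy is simply the fraction of test utterances whose predicted language label matches the reference label. A minimal sketch of the metric, using made-up labels rather than actual benchmark data:

```python
def lid_accuracy(refs, hyps):
    """Utterance-level LID accuracy (%): share of utterances where the
    predicted language label exactly matches the reference label."""
    assert len(refs) == len(hyps), "refs and hyps must be paired per utterance"
    correct = sum(r == h for r, h in zip(refs, hyps))
    return 100.0 * correct / len(refs)

# Toy example (hypothetical labels, not benchmark data): 3 of 4 correct.
refs = ["zh", "en", "de", "fr"]
hyps = ["zh", "en", "de", "es"]
print(f"{lid_accuracy(refs, hyps):.2f}")  # 75.00
```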
## Sample Usage

To use this module independently, first clone the [GitHub repository](https://github.com/FireRedTeam/FireRedASR2S) and install the dependencies.

### Python API Usage
```python
from fireredasr2s.fireredlid import FireRedLid, FireRedLidConfig

batch_uttid = ["hello_zh", "hello_en"]
batch_wav_path = ["assets/hello_zh.wav", "assets/hello_en.wav"]

config = FireRedLidConfig(use_gpu=True, use_half=False)
model = FireRedLid.from_pretrained("FireRedTeam/FireRedLID", config)

results = model.process(batch_uttid, batch_wav_path)
print(results)
# [{'uttid': 'hello_zh', 'lang': 'zh mandarin', 'confidence': 0.996, 'dur_s': 2.32, 'rtf': '0.0741', 'wav': 'assets/hello_zh.wav'}, {'uttid': 'hello_en', 'lang': 'en', 'confidence': 0.996, 'dur_s': 2.24, 'rtf': '0.0741', 'wav': 'assets/hello_en.wav'}]
```
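In downstream pipelines it is common to keep only predictions above a confidence threshold. A minimal sketch of such post-processing, using a `results` list shaped like the example output above (the `confident_labels` helper and its threshold are illustrative, not part of the FireRedASR2S API):

```python
# Example `results`, copied from the sample output above.
results = [
    {"uttid": "hello_zh", "lang": "zh mandarin", "confidence": 0.996,
     "dur_s": 2.32, "rtf": "0.0741", "wav": "assets/hello_zh.wav"},
    {"uttid": "hello_en", "lang": "en", "confidence": 0.996,
     "dur_s": 2.24, "rtf": "0.0741", "wav": "assets/hello_en.wav"},
]

def confident_labels(results, threshold=0.9):
    """Map uttid -> predicted language, keeping only predictions whose
    confidence is at least `threshold`."""
    return {r["uttid"]: r["lang"] for r in results if r["confidence"] >= threshold}

print(confident_labels(results))  # {'hello_zh': 'zh mandarin', 'hello_en': 'en'}
```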

## Citation
```bibtex
@article{xu2026fireredasr2s,
  title={FireRedASR2S: A State-of-the-Art Industrial-Grade All-in-One Automatic Speech Recognition System},
  author={Xu, Kaituo and Jia, Yan and Huang, Kai and Chen, Junjie and Li, Wenpeng and Liu, Kun and Xie, Feng-Long and Tang, Xu and Hu, Yao},
  journal={arXiv preprint arXiv:2603.10420},
  year={2026}
}
```