Hugging Face
Deqing Fu
PRO
deqing
12
21
9
13 followers · 19 following
https://deqingfu.github.io
DeqingFu
Recent Activity
Updated a model 25 days ago: deqing/convergent-llama-300M-muon-6digit-addition_6digit_custom6
Upvoted a paper 27 days ago: Value-Aware Stochastic KV Cache Eviction for Reasoning Models
Submitted a paper 27 days ago: Value-Aware Stochastic KV Cache Eviction for Reasoning Models
deqing's models (158)
deqing/convergent-llama-300M-adamw-swap_numbers • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-adamw-isolate • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-adamw-unigram • Text Generation • 0.3B • Updated Mar 29 • 9
deqing/convergent-mamba2-300M-muon-original • Text Generation • 0.3B • Updated Mar 29 • 10
deqing/llama-window-4-old • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/llama-window-2-old • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-muon-unk_number • Text Generation • 0.3B • Updated Mar 29 • 5
deqing/convergent-llama-300M-muon-swap_numbers • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/llama-isolate-old • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-muon-fivegram • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-muon-permute • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-muon-bigram • Text Generation • 0.3B • Updated Mar 29 • 7
deqing/convergent-llama-300M-muon-unigram • Text Generation • 0.3B • Updated Mar 29 • 18
deqing/mamba2-300M-v5-mamba2 • Text Generation • 0.3B • Updated Mar 29 • 19
deqing/lstm-12layer-v5 • 0.2B • Updated Mar 29 • 5
deqing/llama-300M-v5-original • Text Generation • 0.3B • Updated Mar 27 • 8
deqing/llama-300M-v5-unk_number • Text Generation • 0.3B • Updated Mar 26 • 7
deqing/llama-300M-v5-addition_3digit_adamw • 0.3B • Updated Mar 25 • 2
deqing/llama-300M-v5-addition_3digit • 0.3B • Updated Mar 25 • 2
deqing/llama-300M-v5-addition • Text Generation • 0.3B • Updated Mar 25 • 8
deqing/llama-300M-v5-addition_adamw • Text Generation • 0.3B • Updated Mar 24 • 7
deqing/llama-300M-v5-addition_adamw-old • 0.3B • Updated Mar 22 • 1
deqing/llama-300M-v5-addition_3digit-old • 0.3B • Updated Mar 22 • 2
deqing/llama-300M-v5-adamw-addition_3digit_adamw-old • 0.3B • Updated Mar 22 • 2
deqing/llama-300M-v5-original-random_init_sft • Updated Mar 21
deqing/llama-300M-v5-isolate_sft • Updated Mar 21
deqing/llama-300M-v5-swap_numbers_sft • Updated Mar 21
deqing/llama-300M-v5-addition-old • 0.3B • Updated Mar 21 • 1
deqing/llama-300M-v5-original_sft • Updated Mar 20
deqing/llama-300M-v5-bigram • Text Generation • 0.3B • Updated Mar 20 • 7