Collections
Discover the best community collections!
Collections including paper arXiv:2201.11903

- deepseek-ai/DeepSeek-R1 (Text Generation • 685B • Updated • 436k • 12.9k)
- Qwen/Qwen2.5-Coder-32B-Instruct (Text Generation • 33B • Updated • 253k • 1.96k)
- google/gemma-2-27b-it (Text Generation • 27B • Updated • 567k • 556)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Paper • 2201.11903 • Published • 15)

- Language Models are Few-Shot Learners (Paper • 2005.14165 • Published • 18)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper • 1810.04805 • Published • 25)
- Attention Is All You Need (Paper • 1706.03762 • Published • 108)
- Lookahead Anchoring: Preserving Character Identity in Audio-Driven Human Animation (Paper • 2510.23581 • Published • 41)

- DeBERTa: Decoding-enhanced BERT with Disentangled Attention (Paper • 2006.03654 • Published • 3)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper • 1810.04805 • Published • 25)
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (Paper • 1907.11692 • Published • 9)
- Language Models are Few-Shot Learners (Paper • 2005.14165 • Published • 18)

- ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent (Paper • 2312.10003 • Published • 44)
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Paper • 2005.11401 • Published • 14)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Paper • 2201.11903 • Published • 15)
- Attention Is All You Need (Paper • 1706.03762 • Published • 108)

- Internal Consistency and Self-Feedback in Large Language Models: A Survey (Paper • 2407.14507 • Published • 46)
- Large Language Models are Zero-Shot Reasoners (Paper • 2205.11916 • Published • 3)
- Let's Verify Step by Step (Paper • 2305.20050 • Published • 11)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Paper • 2201.11903 • Published • 15)

- Less is More: Recursive Reasoning with Tiny Networks (Paper • 2510.04871 • Published • 501)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (Paper • 2201.11903 • Published • 15)
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters (Paper • 2408.03314 • Published • 63)
- Universal Language Model Fine-tuning for Text Classification (Paper • 1801.06146 • Published • 8)

- Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks (Paper • 2412.15605 • Published • 2)
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (Paper • 2310.11511 • Published • 78)
- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity (Paper • 2403.14403 • Published • 7)
- Compressed Chain of Thought: Efficient Reasoning Through Dense Representations (Paper • 2412.13171 • Published • 35)

- Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning (Paper • 2211.04325 • Published • 1)
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Paper • 1810.04805 • Published • 25)
- On the Opportunities and Risks of Foundation Models (Paper • 2108.07258 • Published • 2)
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks (Paper • 2204.07705 • Published • 2)

- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models (Paper • 2304.09842 • Published • 2)
- ReAct: Synergizing Reasoning and Acting in Language Models (Paper • 2210.03629 • Published • 31)
- Gorilla: Large Language Model Connected with Massive APIs (Paper • 2305.15334 • Published • 5)
- Reflexion: Language Agents with Verbal Reinforcement Learning (Paper • 2303.11366 • Published • 5)