How to use jrahn/gpt3_125M_edu_hermes with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="jrahn/gpt3_125M_edu_hermes")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("jrahn/gpt3_125M_edu_hermes")
model = AutoModelForCausalLM.from_pretrained("jrahn/gpt3_125M_edu_hermes")
How to use jrahn/gpt3_125M_edu_hermes with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jrahn/gpt3_125M_edu_hermes"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "jrahn/gpt3_125M_edu_hermes",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, use Docker Model Runner:
docker model run hf.co/jrahn/gpt3_125M_edu_hermes
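The same completions request can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the vLLM server started above is listening on localhost:8000; the helper names `build_completion_request` and `complete` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "jrahn/gpt3_125M_edu_hermes"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the first generated completion text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` returns the generated continuation. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead.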
How to use jrahn/gpt3_125M_edu_hermes with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "jrahn/gpt3_125M_edu_hermes" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "jrahn/gpt3_125M_edu_hermes",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "jrahn/gpt3_125M_edu_hermes" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "jrahn/gpt3_125M_edu_hermes",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use jrahn/gpt3_125M_edu_hermes with Docker Model Runner:
docker model run hf.co/jrahn/gpt3_125M_edu_hermes
Compare training on Fineweb-Edu 10B only vs. Fineweb-Edu 10B interleaved with OpenHermes 2.5.
Use the code below to get started with the model.
from transformers import pipeline
p = pipeline("text-generation", "jrahn/gpt3_125M_edu_hermes")
# instruction following
p("<|im_start|>user\nTeach me to fish.<|im_end|>\n<|im_start|>assistant\n", max_length=128)
# [{'generated_text': '<|im_start|>user\nTeach me to fish.<|im_end|>\n<|im_start|>assistant\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\n'}]
# text completion
p("In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. ", max_length=128)
# [{'generated_text': 'In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. \nThe researchers were able to identify the unicorns by their unique language. The researchers found that the unicorns spoke a language that is similar to the language of the Andes Mountains.\nThe researchers also found that the unicorns spoke a language that is similar to the language of the Andes Mountains. This is the first time that the researchers have been able to identify the language of the Andes Mountains.'}]
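The instruction-following prompt above uses ChatML-style `<|im_start|>` and `<|im_end|>` markers. As a rough sketch, a small helper can assemble that string from a list of messages. The function name `build_chatml_prompt` is hypothetical, and the exact template is inferred from the example prompt rather than from a documented chat template for this model.

```python
def build_chatml_prompt(messages):
    """Format a list of {"role", "content"} dicts as a ChatML-style prompt."""
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Open an assistant turn so the model generates the reply.
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = build_chatml_prompt([{"role": "user", "content": "Teach me to fish."}])
print(repr(prompt))
```

The resulting `prompt` matches the string passed to the pipeline in the instruction-following example, so it can be used as `p(prompt, max_length=128)`.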
Datasets used: Fineweb-Edu 10B + OpenHermes 2.5
Dataset proportions:
Params: 125M -> 250MB / checkpoint
Tokens: ~10B (10,287,579,136)
Total training time: ~12hrs
Hardware: 2x RTX4090
MFU: 70% (266,000 tok/s)
HellaSwag: 30.5
GPT3 125M, Causal Language Modeling
2x RTX4090
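The training figures above can be cross-checked with simple arithmetic. The sketch below takes the token count, throughput, and parameter count from the card. The 2 bytes per parameter (16-bit weights) is an assumption that happens to reproduce the stated 250MB checkpoint size.

```python
TOKENS = 10_287_579_136      # from the card
THROUGHPUT_TOK_S = 266_000   # from the card
PARAMS = 125_000_000         # from the card
BYTES_PER_PARAM = 2          # assumption: 16-bit weights

# Time needed to process all tokens at the stated throughput.
pure_compute_hours = TOKENS / THROUGHPUT_TOK_S / 3600

# Size of one checkpoint holding only the model weights.
checkpoint_mb = PARAMS * BYTES_PER_PARAM / 1e6

print(f"{pure_compute_hours:.1f} h of pure compute")
print(f"{checkpoint_mb:.0f} MB per checkpoint")
```

At the stated throughput this gives about 10.7 hours of compute, which is in line with the ~12hrs total if the remainder went to evaluation, checkpointing, or warm-up (an assumption, not something the card states).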