Instructions to use jrahn/gpt3_125M_edu_hermes with libraries and local apps.

Libraries

How to use jrahn/gpt3_125M_edu_hermes with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="jrahn/gpt3_125M_edu_hermes")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jrahn/gpt3_125M_edu_hermes")
model = AutoModelForCausalLM.from_pretrained("jrahn/gpt3_125M_edu_hermes")

Local Apps

vLLM

How to use jrahn/gpt3_125M_edu_hermes with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "jrahn/gpt3_125M_edu_hermes"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jrahn/gpt3_125M_edu_hermes",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
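
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the server started above is listening on localhost:8000. The helper names `build_completion_request` and `send` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_completion_request(prompt, base_url="http://localhost:8000",
                             model="jrahn/gpt3_125M_edu_hermes",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the parsed JSON reply (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_completion_request("Once upon a time,")
# With the server running, the generated text could be read like this:
# print(send(request)["choices"][0]["text"])
```

For the SGLang server described further down, the same helper should work with `base_url="http://localhost:30000"`.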

SGLang

How to use jrahn/gpt3_125M_edu_hermes with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "jrahn/gpt3_125M_edu_hermes" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jrahn/gpt3_125M_edu_hermes",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "jrahn/gpt3_125M_edu_hermes" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "jrahn/gpt3_125M_edu_hermes",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use jrahn/gpt3_125M_edu_hermes with Docker Model Runner:
```
docker model run hf.co/jrahn/gpt3_125M_edu_hermes
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

Model Card for llm.c GPT3_125M

Instruction Pretraining: Fineweb-edu 10B interleaved with OpenHermes 2.5

Compare training on fineweb-edu 10b only vs. interleaved

Model Details

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import pipeline
p = pipeline("text-generation", "jrahn/gpt3_125M_edu_hermes")

# instruction following
p("<|im_start|>user\nTeach me to fish.<|im_end|>\n<|im_start|>assistant\n", max_length=128)
# [{'generated_text': '<|im_start|>user\nTeach me to fish.<|im_end|>\n<|im_start|>assistant\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\nTeach me to fish.\n\n'}]

# text completion
p("In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. ", max_length=128)
# [{'generated_text': 'In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English. \nThe researchers were able to identify the unicorns by their unique language. The researchers found that the unicorns spoke a language that is similar to the language of the Andes Mountains.\nThe researchers also found that the unicorns spoke a language that is similar to the language of the Andes Mountains. This is the first time that the researchers have been able to identify the language of the Andes Mountains.'}]
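
The instruction-following prompt above uses ChatML tags with the assistant turn left open. A small helper can assemble that string; `build_chatml_prompt` is a hypothetical name used here for illustration only.

```python
def build_chatml_prompt(user_message):
    """Wrap a user message in ChatML tags and open the assistant turn."""
    return (
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_chatml_prompt("Teach me to fish.")
# `prompt` can then be passed to the pipeline: p(prompt, max_length=128)
```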

Training Details

Training Data

Datasets used: Fineweb-Edu 10B + OpenHermes 2.5

Dataset proportions:

Part 1: FWE 4,836,050 + OH 100,000 (2.03%) = 4,936,050
Part 2: FWE 4,336,051 + OH 400,000 (8.45%) = 4,736,051
Part 3: FWE 500,000 + OH 501,551 (50.08%) = 1,001,551
Total documents: 10,669,024
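
The percentages in the list above are the OpenHermes share of each part. The short sketch below recomputes them from the listed document counts; the variable and function names are illustrative. Summing the three parts as listed gives 10,673,652, which is slightly different from the stated total, and the source does not say which figure is exact.

```python
# (Fineweb-Edu documents, OpenHermes documents) per part, copied from the list above
parts = {
    "Part 1": (4_836_050, 100_000),
    "Part 2": (4_336_051, 400_000),
    "Part 3": (500_000, 501_551),
}


def openhermes_share(fwe_docs, oh_docs):
    """Percentage of OpenHermes documents within one part."""
    return 100.0 * oh_docs / (fwe_docs + oh_docs)


shares = {
    name: round(openhermes_share(fwe, oh), 2)
    for name, (fwe, oh) in parts.items()
}
sum_of_parts = sum(fwe + oh for fwe, oh in parts.values())

for name, share in shares.items():
    print(f"{name}: {share:.2f}% OpenHermes")
print(f"Sum of listed parts: {sum_of_parts:,}")
```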

Training Procedure

Preprocessing

Fineweb-Edu: none; the "text" feature is used as-is
OpenHermes 2.5: the ChatML prompt template is applied to "conversations" to create the "text" feature
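
The OpenHermes 2.5 step could look roughly like the sketch below. It assumes the ShareGPT-style layout in which each turn is a dict with a "from" key ("system", "human" or "gpt") and a "value" key. The role mapping, the function name `to_chatml` and the sample record are assumptions for illustration; the author's actual script may differ.

```python
# Assumed mapping from ShareGPT-style speaker names to ChatML roles
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def to_chatml(conversations):
    """Render a list of turns as a single ChatML-formatted string."""
    turns = []
    for turn in conversations:
        role = ROLE_MAP[turn["from"]]
        turns.append(f"<|im_start|>{role}\n{turn['value']}<|im_end|>\n")
    return "".join(turns)


# Made-up record, only to show the shape of the transformation
example = {
    "conversations": [
        {"from": "human", "value": "Teach me to fish."},
        {"from": "gpt", "value": "Start with a simple rod and reel."},
    ]
}
example["text"] = to_chatml(example["conversations"])
print(example["text"])
```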

Training Hyperparameters

Training regime:
bf16 precision
context length: 2048 tokens
per-device batch size: 16 sequences; global batch size: 524,288 tokens; gradient accumulation: 16
ZeRO stage 1
learning rate 6e-4, cosine schedule, 700 warmup steps
For more details, see the run script.
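
The batch settings can be related with a little arithmetic. The sketch assumes the global batch size is counted in tokens and that the number of accumulation steps is the global batch divided by the tokens processed per micro-step; the function name is illustrative. Under that assumption, the stated value of 16 corresponds to counting a single device, while dividing across two data-parallel devices would give 8. The source does not say which convention applies.

```python
def grad_accum_steps(global_batch_tokens, per_device_batch, context_length, num_devices):
    """Micro-steps to accumulate before one optimizer step."""
    tokens_per_micro_step = per_device_batch * context_length * num_devices
    if global_batch_tokens % tokens_per_micro_step != 0:
        raise ValueError("global batch must be a multiple of the micro-step size")
    return global_batch_tokens // tokens_per_micro_step


steps_one_device = grad_accum_steps(524_288, 16, 2048, num_devices=1)
steps_two_devices = grad_accum_steps(524_288, 16, 2048, num_devices=2)
print(steps_one_device, steps_two_devices)
```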

Speeds, Sizes, Times

Parameters: 125M (about 250 MB per checkpoint)
Tokens: ~10B (10,287,579,136)
Total training time: ~12 hours
Hardware: 2x RTX4090
MFU: 70% (266,000 tokens/s)
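
These figures can be cross-checked against the global batch size of 524,288 given under the hyperparameters. The sketch assumes that batch size is counted in tokens and that a bf16 parameter takes 2 bytes. The throughput-only estimate comes out somewhat below the stated ~12 hours; the gap may be overhead such as evaluation and checkpointing, but that is a guess.

```python
total_tokens = 10_287_579_136
global_batch_tokens = 524_288   # assumed to be counted in tokens
throughput_tok_per_s = 266_000
params = 125_000_000
bytes_per_param = 2             # bf16

# Number of optimizer steps needed to consume all tokens
optimizer_steps = total_tokens // global_batch_tokens

# Time spent if the reported throughput were sustained for the whole run
throughput_only_hours = total_tokens / throughput_tok_per_s / 3600

# Size of the raw weights in one checkpoint
checkpoint_mb = params * bytes_per_param / 1e6

print(optimizer_steps, round(throughput_only_hours, 1), checkpoint_mb)
```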

Evaluation

Results

HellaSwag: 30.5

For more details, see main.log.

Technical Specifications

Model Architecture and Objective

GPT-3 125M, causal language modeling

Compute Infrastructure

Hardware

2x RTX4090

Software

llm.c

Safetensors

Model size: 0.1B params
Tensor type: BF16
