How to use BEE-spoke-data/TinyLlama-3T-1.1bee with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="BEE-spoke-data/TinyLlama-3T-1.1bee")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/TinyLlama-3T-1.1bee")
model = AutoModelForCausalLM.from_pretrained("BEE-spoke-data/TinyLlama-3T-1.1bee")
```

How to use BEE-spoke-data/TinyLlama-3T-1.1bee with vLLM:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "BEE-spoke-data/TinyLlama-3T-1.1bee"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BEE-spoke-data/TinyLlama-3T-1.1bee",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
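The curl call above can also be issued from Python. The sketch below uses only the standard library (`urllib.request`) to build the same POST request against the OpenAI-compatible `/v1/completions` endpoint. The helper name `build_completion_request` is hypothetical, and the default `base_url` assumes the vLLM server is listening on port 8000 as in the curl example; pass `http://localhost:30000` for the SGLang server instead.

```python
import json
import urllib.request

MODEL_ID = "BEE-spoke-data/TinyLlama-3T-1.1bee"


def build_completion_request(
    prompt: str,
    base_url: str = "http://localhost:8000",  # assumed vLLM default from the curl example
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("Once upon a time,")
# Sending it requires a running server (e.g. started with `vllm serve`):
#   with urllib.request.urlopen(req, timeout=60) as resp:
#       print(json.load(resp)["choices"][0]["text"])
```

Building the request separately from sending it keeps the payload easy to inspect and reuse against either server.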
How to use BEE-spoke-data/TinyLlama-3T-1.1bee with SGLang:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "BEE-spoke-data/TinyLlama-3T-1.1bee" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "BEE-spoke-data/TinyLlama-3T-1.1bee",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Alternatively, run the SGLang server from the `lmsysorg/sglang` Docker image and query it with the same curl command shown above:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "BEE-spoke-data/TinyLlama-3T-1.1bee" \
    --host 0.0.0.0 \
    --port 30000
```

How to use BEE-spoke-data/TinyLlama-3T-1.1bee with Docker Model Runner:
```shell
docker model run hf.co/BEE-spoke-data/TinyLlama-3T-1.1bee
```
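The vLLM and SGLang servers above answer `/v1/completions` in the OpenAI completions response format, where each generated continuation sits under `choices[i]["text"]`. The sketch below pulls those strings out of a parsed response. The helper name `extract_completion_text` is illustrative, and the sample body is a hand-written, hypothetical response, not real output from this model.

```python
import json
from typing import Any


def extract_completion_text(response: dict[str, Any]) -> list[str]:
    """Return the generated text of every choice, ordered by its `index` field."""
    choices = sorted(response.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"] for choice in choices]


# Hand-written sample in the OpenAI completions format (hypothetical values).
sample_body = """{
  "id": "cmpl-example",
  "object": "text_completion",
  "model": "BEE-spoke-data/TinyLlama-3T-1.1bee",
  "choices": [
    {"index": 0, "text": " there was a hive of bees.", "finish_reason": "stop"}
  ]
}"""

texts = extract_completion_text(json.loads(sample_body))
print(texts[0])
```

A `finish_reason` of `"length"` in a real response would indicate that generation stopped at `max_tokens` rather than at a natural end.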
A grand successor to the original. This one has the following improvements:
This model is a fine-tuned version of TinyLlama-1.1b-3T on the BEE-spoke-data/bees-internal dataset.
It achieves the following results on the evaluation set:
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 2.4432 | 0.19 | 50 | 2.3850 | 0.5033 |
| 2.3655 | 0.39 | 100 | 2.3124 | 0.5129 |
| 2.374 | 0.58 | 150 | 2.2588 | 0.5215 |
| 2.3558 | 0.78 | 200 | 2.2132 | 0.5291 |
| 2.2677 | 0.97 | 250 | 2.1828 | 0.5348 |
| 2.0701 | 1.17 | 300 | 2.1788 | 0.5373 |
| 2.0766 | 1.36 | 350 | 2.1673 | 0.5398 |
| 2.0669 | 1.56 | 400 | 2.1651 | 0.5402 |
| 2.0314 | 1.75 | 450 | 2.1641 | 0.5406 |
| 2.0281 | 1.95 | 500 | 2.1639 | 0.5407 |
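Assuming the reported losses are mean per-token cross-entropy in nats (the usual convention for causal language-model training with Hugging Face Transformers), validation loss converts to perplexity via $\mathrm{PPL} = e^{\mathcal{L}}$. The sketch below applies that conversion to the validation-loss column of the table; under this assumption, the drop from 2.3850 to 2.1639 corresponds to perplexity falling from roughly 10.9 to roughly 8.7.

```python
import math

# (step, validation loss) pairs copied from the training-results table above.
VALIDATION_LOSS = [
    (50, 2.3850), (100, 2.3124), (150, 2.2588), (200, 2.2132), (250, 2.1828),
    (300, 2.1788), (350, 2.1673), (400, 2.1651), (450, 2.1641), (500, 2.1639),
]


def perplexity(mean_ce_loss_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_ce_loss_nats)


ppl_by_step = {step: perplexity(loss) for step, loss in VALIDATION_LOSS}
for step, ppl in ppl_by_step.items():
    print(f"step {step:>3}: perplexity {ppl:.2f}")
```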
Detailed results are available on the Open LLM Leaderboard.
| Metric | Value |
|---|---|
| Avg. | 36.46 |
| AI2 Reasoning Challenge (25-shot) | 33.79 |
| HellaSwag (10-shot) | 60.29 |
| MMLU (5-shot) | 25.86 |
| TruthfulQA (0-shot) | 38.13 |
| Winogrande (5-shot) | 60.22 |
| GSM8k (5-shot) | 0.45 |
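The "Avg." row matches the arithmetic mean of the six benchmark scores. The short check below recomputes it from the table values.

```python
from statistics import mean

# Benchmark scores copied from the table above.
SCORES = {
    "AI2 Reasoning Challenge (25-shot)": 33.79,
    "HellaSwag (10-shot)": 60.29,
    "MMLU (5-shot)": 25.86,
    "TruthfulQA (0-shot)": 38.13,
    "Winogrande (5-shot)": 60.22,
    "GSM8k (5-shot)": 0.45,
}

avg = mean(SCORES.values())
print(f"Avg. = {avg:.2f}")  # prints: Avg. = 36.46
```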