Tags: Text Generation · Transformers · Safetensors · llama · research · code · mathematics · reasoning · multilingual · long-context · custom_code · text-generation-inference
Instructions to use DeepXR/Helion-V2.5-Rnd with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DeepXR/Helion-V2.5-Rnd with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="DeepXR/Helion-V2.5-Rnd", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("DeepXR/Helion-V2.5-Rnd", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("DeepXR/Helion-V2.5-Rnd", trust_remote_code=True)
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use DeepXR/Helion-V2.5-Rnd with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "DeepXR/Helion-V2.5-Rnd"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DeepXR/Helion-V2.5-Rnd",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:
docker model run hf.co/DeepXR/Helion-V2.5-Rnd
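The same completions call can be made from Python with only the standard library. A minimal sketch, assuming the vLLM server above is already running on localhost:8000; the helper names (`build_completion_request`, `complete`) are illustrative, not part of vLLM:

```python
import json
import urllib.request

def build_completion_request(prompt: str) -> dict:
    # Same payload as the curl example above.
    return {
        "model": "DeepXR/Helion-V2.5-Rnd",
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }

def complete(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # POST the JSON payload to the OpenAI-compatible /v1/completions route.
    data = json.dumps(build_completion_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses carry the generated text in choices[0]["text"].
    return body["choices"][0]["text"]

# complete("Once upon a time,")  # requires the server to be running
```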
- SGLang
How to use DeepXR/Helion-V2.5-Rnd with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "DeepXR/Helion-V2.5-Rnd" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DeepXR/Helion-V2.5-Rnd",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "DeepXR/Helion-V2.5-Rnd" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "DeepXR/Helion-V2.5-Rnd",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use DeepXR/Helion-V2.5-Rnd with Docker Model Runner:
```shell
docker model run hf.co/DeepXR/Helion-V2.5-Rnd
```
Update model_config.yaml (#6)
Commit 8551ec1adc52c649623c4c23f0388237b5e84a5e
Co-authored-by: Alex Gall <AlexGall@users.noreply.huggingface.co>
model_config.yaml (+2 −16)

```diff
@@ -33,7 +33,6 @@ model:
 attention_bias: false
 attention_dropout: 0.0
 mlp_bias: false
-
 tokenizer:
   type: "sentencepiece"
   model_max_length: 131072
@@ -42,15 +41,6 @@ model:
 chat_template: "{% for message in messages %}{{ '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>\n' }}{% endfor %}{{ '<|im_start|>assistant\n' }}"

 training:
-  base_model: "meta-llama/Meta-Llama-3.1-70B"
-  training_data:
-    - "scientific_papers"
-    - "code_repositories"
-    - "mathematical_proofs"
-    - "conversational_data"
-    - "multilingual_corpus"
-    - "technical_documentation"
-  total_tokens: "2.5T"
   training_steps: 150000
   warmup_steps: 2000
   learning_rate: 2.0e-5
@@ -70,14 +60,10 @@ model:

 quantization:
   bits: 16
+  precision: "float16"
   supported_formats:
     - "fp16"
-
-    - "int8"
-    - "int4"
-    - "awq"
-    - "gptq"
-    - "gguf"
+  note: "Model is provided in full FP16 precision without quantization"

 inference:
   default_parameters:
```
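The `chat_template` kept in the config is a Jinja template that produces a ChatML-style prompt. As a sanity check it can be rendered standalone with the `jinja2` package; a sketch, where `render_prompt` is an illustrative helper (transformers normally applies this template via `tokenizer.apply_chat_template`):

```python
from jinja2 import Template

# Chat template string copied from model_config.yaml (ChatML-style markers).
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n' + message['content'] + '<|im_end|>\\n' }}"
    "{% endfor %}"
    "{{ '<|im_start|>assistant\\n' }}"
)

def render_prompt(messages):
    # Render the template the way tokenizer.apply_chat_template would,
    # ending with an open assistant turn for the model to complete.
    return Template(CHAT_TEMPLATE).render(messages=messages)

print(render_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]))
```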