Instructions for using RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with libraries, inference providers, notebooks, and local apps.

Libraries

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with Transformers:

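# Load one GGUF file directly (requires the `gguf` package); Transformers dequantizes the weights on load
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf"
gguf_file = "ORLM-LLaMA-3-8B.Q4_K_M.gguf"  # any file name from the quantization table

tokenizer = AutoTokenizer.from_pretrained(repo_id, gguf_file=gguf_file)
model = AutoModelForCausalLM.from_pretrained(repo_id, gguf_file=gguf_file, dtype="auto")

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)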

llama-cpp-python

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf",
	filename="ORLM-LLaMA-3-8B.IQ3_M.gguf",
)

output = llm(
	"Once upon a time,",
	max_tokens=512,
	echo=True
)
print(output)

Local Apps

llama.cpp

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

Use Docker

docker model run hf.co/RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

vLLM

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

SGLang

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Ollama
How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with Ollama:
```
ollama run hf.co/RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M
```

Unsloth Studio

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf to start chatting

Docker Model Runner
How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with Docker Model Runner:
```
docker model run hf.co/RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M
```

Lemonade

How to use RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf:Q4_K_M

Run and chat with the model

lemonade run user.CardinalOperations_-_ORLM-LLaMA-3-8B-gguf-Q4_K_M

List all available models

lemonade list

ORLM-LLaMA-3-8B: A Large Language Model for Optimization Modeling

This repository contains quantized model weights for ORLM-LLaMA-3-8B, a fine-tuned version of Meta's Llama 3 8B model trained specifically to generate optimization models and the corresponding Python code. The model is presented in the paper ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling, which reports state-of-the-art performance on the NL4OPT, MAMO, and IndustryOR benchmarks.

The official repo can be found here: https://github.com/Cardinal-Operations/ORLM

Quantization made by Richard Erkhov.

ORLM-LLaMA-3-8B - GGUF

Model creator: https://huggingface.co/CardinalOperations/
Original model: https://huggingface.co/CardinalOperations/ORLM-LLaMA-3-8B/

Name	Quant method	Size
ORLM-LLaMA-3-8B.Q2_K.gguf	Q2_K	2.96GB
ORLM-LLaMA-3-8B.IQ3_XS.gguf	IQ3_XS	3.28GB
ORLM-LLaMA-3-8B.IQ3_S.gguf	IQ3_S	3.43GB
ORLM-LLaMA-3-8B.Q3_K_S.gguf	Q3_K_S	3.41GB
ORLM-LLaMA-3-8B.IQ3_M.gguf	IQ3_M	3.52GB
ORLM-LLaMA-3-8B.Q3_K.gguf	Q3_K	3.74GB
ORLM-LLaMA-3-8B.Q3_K_M.gguf	Q3_K_M	3.74GB
ORLM-LLaMA-3-8B.Q3_K_L.gguf	Q3_K_L	4.03GB
ORLM-LLaMA-3-8B.IQ4_XS.gguf	IQ4_XS	4.18GB
ORLM-LLaMA-3-8B.Q4_0.gguf	Q4_0	4.34GB
ORLM-LLaMA-3-8B.IQ4_NL.gguf	IQ4_NL	4.38GB
ORLM-LLaMA-3-8B.Q4_K_S.gguf	Q4_K_S	4.37GB
ORLM-LLaMA-3-8B.Q4_K.gguf	Q4_K	4.58GB
ORLM-LLaMA-3-8B.Q4_K_M.gguf	Q4_K_M	4.58GB
ORLM-LLaMA-3-8B.Q4_1.gguf	Q4_1	4.78GB
ORLM-LLaMA-3-8B.Q5_0.gguf	Q5_0	5.21GB
ORLM-LLaMA-3-8B.Q5_K_S.gguf	Q5_K_S	5.21GB
ORLM-LLaMA-3-8B.Q5_K.gguf	Q5_K	5.34GB
ORLM-LLaMA-3-8B.Q5_K_M.gguf	Q5_K_M	5.34GB
ORLM-LLaMA-3-8B.Q5_1.gguf	Q5_1	5.65GB
ORLM-LLaMA-3-8B.Q6_K.gguf	Q6_K	6.14GB
ORLM-LLaMA-3-8B.Q8_0.gguf	Q8_0	7.95GB
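
The table above can also be used programmatically, for example to pick the largest quantization that fits a size budget. The following is a minimal sketch, assuming that file size on disk is a rough proxy for the memory needed (actual usage is higher once context buffers are allocated). `pick_quant` is a hypothetical helper, not part of any library, and the sizes are copied from the table.

```python
# File sizes in GB, copied from the quantization table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.96, "IQ3_XS": 3.28, "IQ3_S": 3.43, "Q3_K_S": 3.41,
    "IQ3_M": 3.52, "Q3_K": 3.74, "Q3_K_M": 3.74, "Q3_K_L": 4.03,
    "IQ4_XS": 4.18, "Q4_0": 4.34, "IQ4_NL": 4.38, "Q4_K_S": 4.37,
    "Q4_K": 4.58, "Q4_K_M": 4.58, "Q4_1": 4.78, "Q5_0": 5.21,
    "Q5_K_S": 5.21, "Q5_K": 5.34, "Q5_K_M": 5.34, "Q5_1": 5.65,
    "Q6_K": 6.14, "Q8_0": 7.95,
}


def pick_quant(budget_gb):
    """Return the file name of the largest quant that fits budget_gb, or None.

    Hypothetical helper: it compares file sizes only and ignores runtime overhead.
    """
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= budget_gb}
    if not fitting:
        return None
    best = max(fitting, key=fitting.get)
    return f"ORLM-LLaMA-3-8B.{best}.gguf"


print(pick_quant(5.0))
```

The returned name can be passed as `filename` to `Llama.from_pretrained` in the llama-cpp-python snippet earlier in this card.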

Original model description:

license: llama3

See our paper at https://arxiv.org/abs/2405.17743

GitHub repo: https://github.com/Cardinal-Operations/ORLM

Model Details

LLaMA-3-8B-ORLM is built on the Meta LLaMA-3-8B model and fully fine-tuned on the OR-Instruct data. More training details are available at https://arxiv.org/abs/2405.17743

Model Usage

Prompting Template:

Below is an operations research question. Build a mathematical model and corresponding python code using `coptpy` that appropriately addresses the question.

# Question:
{Question}

# Response:

Replace {Question} with any natural-language OR question.
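
The template can be filled in and the generated code pulled back out with a few lines of Python. This is a minimal sketch rather than an official utility: `build_prompt` and `extract_python_code` are hypothetical helper names, the trailing newline after `# Response:` is an assumption, and the extraction assumes the completion contains a fenced `python` block as in the example in this card.

```python
import re

# Prompt template copied from this card; the final newline is an assumption.
PROMPT_TEMPLATE = (
    "Below is an operations research question. Build a mathematical model and "
    "corresponding python code using `coptpy` that appropriately addresses the question.\n"
    "\n"
    "# Question:\n"
    "{Question}\n"
    "\n"
    "# Response:\n"
)

FENCE = "`" * 3  # three backticks, built this way to keep this listing easy to embed


def build_prompt(question):
    """Insert a natural-language OR question into the template."""
    # str.replace is used instead of str.format so braces in the question are safe.
    return PROMPT_TEMPLATE.replace("{Question}", question.strip())


def extract_python_code(completion):
    """Return the body of the first fenced python block in a completion, or None."""
    pattern = FENCE + r"python\n(.*?)" + FENCE
    match = re.search(pattern, completion, flags=re.DOTALL)
    return match.group(1) if match else None


print(build_prompt("Minimize x subject to x >= 1."))
```

The prompt string can be passed to any of the runtimes listed earlier, for example as the first argument of the `llm(...)` call in the llama-cpp-python snippet.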

Examples

--------------------------OR question wrapped in the template--------------------------
Below is an operations research question. Build a mathematical model and corresponding python code using `coptpy` that appropriately addresses the question.

# Question:
A lab has 1000 units of medicinal ingredients to make two pills, a large pill and a small pill. A large pill requires 3 units of medicinal ingredients and 2 units of filler. A small pill requires 2 units of medicinal ingredients and 1 unit of filler. The lab has to make at least 100 large pills. However, since small pills are more popular at least 60% of the total number of pills must be small. How many of each should be made to minimize the total number of filler material needed?

# Response:
--------------------------top-1 completion output by LLaMA-3-8B-ORLM under greedy decoding--------------------------
## Mathematical Model:
To solve the lab's pill production problem, we need to construct a linear programming model to minimize the total number of filler material needed while satisfying the production requirements for the pills and the proportion constraint. Here is the model formulation:

### Decision Variables:
- \(x\): Number of large pills to be made.
- \(y\): Number of small pills to be made.

### Objective Function:
- Minimize the total number of filler material needed: \[Z = 2x + y\]

### Constraints:
1. Medicinal ingredient usage constraint: \[3x + 2y \leq 1000\]
2. Large pill production requirement: \[x \geq 100\]
3. Proportion constraint for small pills: \[y \geq 0.6(x + y)\]
4. Non-negativity constraint: \[x, y \geq 0\]

### Integer Constraints:
- \(x, y\) must be integers, as the number of pills cannot be fractional.

Using this model, we can apply linear programming techniques to find the optimal number of large and small pills to be made, minimizing the total number of filler material needed while satisfying the production requirements for the pills and the proportion constraint. This will provide the lab with an economically efficient pill production strategy.

## Python Code Solution Using `coptpy`:
Here is a Python script using the `coptpy` library to solve the problem:

```python
import coptpy as cp
from coptpy import COPT

# Create a COPT environment
env = cp.Envr()

# Create the model
model = env.createModel("PillProduction")

# Add decision variables
x = model.addVar(vtype=COPT.INTEGER, name="x")  # Number of large pills
y = model.addVar(vtype=COPT.INTEGER, name="y")  # Number of small pills

# Objective function: Minimize the total number of filler material needed
model.setObjective(2 * x + y, sense=COPT.MINIMIZE)

# Constraints
model.addConstr(3 * x + 2 * y <= 1000, name="MaterialLimit")
model.addConstr(x >= 100, name="LargePillRequirement")
model.addConstr(y >= 0.6 * (x + y), name="SmallPillProportion")

# Solve the model
model.solve()

# Output results
if model.status == COPT.OPTIMAL:
    print("Minimum total number of filler material needed: {:.0f} units".format(model.objval))
    print("Number of large pills to be made: {:.0f}".format(x.x))
    print("Number of small pills to be made: {:.0f}".format(y.x))
else:
    print("No optimal solution found.")
```

In this script, we first create a `COPT` environment and model. Then, we add two integer decision variables `x` and `y`, representing the number of large and small pills to be made, respectively.

Next, we set the objective function to minimize the total number of filler material needed, which is `2 * x + y`.

Then, we add three constraints:
1. Medicinal ingredient usage constraint: `3 * x + 2 * y <= 1000`
2. Large pill production requirement: `x >= 100`
3. Proportion constraint for small pills: `y >= 0.6 * (x + y)`

Finally, we solve the model and output the results of the optimal solution, including the minimum total number of filler material needed and the number of large and small pills to be made.

This script provides a complete example of using the `coptpy` library to solve the lab's pill production problem, while satisfying all the constraints mentioned in the problem.
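
The model in the example above is small enough to check by exhaustive search, without a solver. The following is a minimal sketch in plain Python, independent of `coptpy`. The proportion constraint y >= 0.6(x + y) is rewritten as 2y >= 3x to stay in integer arithmetic, and `solve_by_enumeration` is a hypothetical name introduced for illustration.

```python
def solve_by_enumeration():
    """Enumerate all integer (x, y) that satisfy the example's constraints.

    Returns (filler, x, y) for the feasible point with the least filler.
    """
    best = None
    for x in range(100, 1000 // 3 + 1):              # x >= 100 and 3x <= 1000
        for y in range(0, (1000 - 3 * x) // 2 + 1):  # 3x + 2y <= 1000
            if 2 * y < 3 * x:                        # violates y >= 0.6 (x + y)
                continue
            filler = 2 * x + y                       # objective: total filler units
            if best is None or filler < best[0]:
                best = (filler, x, y)
    return best


print(solve_by_enumeration())  # (350, 100, 150)
```

Under these constraints the search returns 100 large pills and 150 small pills, using 350 units of filler, which gives a reference value for comparing against the solver output.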

Performances

Below is the comparison of performance on the NL4OPT, MAMO, and IndustryOR benchmarks. Values marked with a ^* are directly copied from original papers, with blanks where data were not reported. The highest results are highlighted in bold.

Method	NL4OPT	MAMO EasyLP	MAMO ComplexLP	IndustryOR	Micro Avg	Macro Avg
Methods based on PLMs
`tag-BART`	47.9%^*	-	-	-	-	-
Methods based on GPT-3.5
`Standard`	42.4%^*	-	-	-	-	-
`Reflexion`	50.7%^*	-	-	-	-	-
`Chain-of-Experts`	58.9%^*	-	-	-	-	-
Methods based on GPT-4
`Standard`	47.3%^*	66.5%^*	14.6%^*	28.0%	50.2%	39.1%
`Reflexion`	53.0%^*	-	-	-	-	-
`Chain-of-Experts`	64.2%^*	-	-	-	-	-
`OptiMUS`	78.8%^*	-	-	-	-	-
ORLMs based on open-source LLMs
`ORLM-Mistral-7B`	84.4%	81.4%	32.0%	27.0%	68.8%	56.2%
`ORLM-Deepseek-Math-7B-Base`	**86.5%**	82.2%	**37.9%**	33.0%	71.2%	59.9%
`ORLM-LLaMA-3-8B`	85.7%	**82.3%**	37.4%	**38.0%**	**71.4%**	**60.8%**
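
The Macro Avg column can be cross-checked from the four benchmark columns. This sketch assumes that Macro Avg is the unweighted mean of the NL4OPT, MAMO EasyLP, MAMO ComplexLP, and IndustryOR scores. The card does not state the definition, and the Micro Avg cannot be recomputed here because the benchmark sizes are not given. The numbers are copied from the table rows that report all four benchmarks.

```python
# Per-benchmark accuracies (%) and the reported Macro Avg, copied from the table above.
ROWS = {
    "GPT-4 Standard": ([47.3, 66.5, 14.6, 28.0], 39.1),
    "ORLM-Mistral-7B": ([84.4, 81.4, 32.0, 27.0], 56.2),
    "ORLM-Deepseek-Math-7B-Base": ([86.5, 82.2, 37.9, 33.0], 59.9),
    "ORLM-LLaMA-3-8B": ([85.7, 82.3, 37.4, 38.0], 60.8),
}


def macro_average(scores):
    """Unweighted mean over benchmarks (assumed definition of Macro Avg)."""
    return sum(scores) / len(scores)


for name, (scores, reported) in ROWS.items():
    print(f"{name}: computed {macro_average(scores):.2f}, reported {reported}")
```

Under this assumption the computed values stay within about 0.05 points of the reported ones. The small gap is consistent with the reported averages having been computed from unrounded scores.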

Citation

@article{tang2024orlm,
  title={ORLM: A Customizable Framework in Training Large Models for Automated Optimization Modeling},
  author={Tang, Zhengyang and Huang, Chenyu and Zheng, Xin and Hu, Shixi and Wang, Zizhuo and Ge, Dongdong and Wang, Benyou},
  journal={arXiv preprint arXiv:2405.17743},
  year={2024}
}

@article{llama3modelcard,
  title={Llama 3 Model Card},
  author={AI@Meta},
  year={2024},
  url = {https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md}
}

License

The use of this model is governed by the META LLAMA 3 COMMUNITY LICENSE AGREEMENT.

GGUF
Model size: 8B params
Architecture: llama
Quantization bit widths: 2-bit, 3-bit, 4-bit, 5-bit, 6-bit, 8-bit

Paper for RichardErkhov/CardinalOperations_-_ORLM-LLaMA-3-8B-gguf

ORLM: Training Large Language Models for Optimization Modeling

Paper • 2405.17743 • Published May 28, 2024