Instructions to use digitous/Alpacino30b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use digitous/Alpacino30b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="digitous/Alpacino30b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("digitous/Alpacino30b")
model = AutoModelForCausalLM.from_pretrained("digitous/Alpacino30b")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use digitous/Alpacino30b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "digitous/Alpacino30b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "digitous/Alpacino30b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker:
```shell
docker model run hf.co/digitous/Alpacino30b
```
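Since the server above exposes an OpenAI-compatible API, you can also build the same request in Python instead of curl. This is a minimal sketch that only constructs the JSON body; sending it (e.g. with `requests.post`) assumes the vLLM server is already running on localhost:8000.

```python
import json

# Request body mirroring the curl example above
# (OpenAI-compatible /v1/completions endpoint).
payload = {
    "model": "digitous/Alpacino30b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5,
}
body = json.dumps(payload)

# POST `body` to http://localhost:8000/v1/completions with the header
# Content-Type: application/json, e.g.:
#   requests.post("http://localhost:8000/v1/completions",
#                 data=body, headers={"Content-Type": "application/json"})
print(body)
```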
- SGLang
How to use digitous/Alpacino30b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "digitous/Alpacino30b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "digitous/Alpacino30b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images:
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "digitous/Alpacino30b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "digitous/Alpacino30b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use digitous/Alpacino30b with Docker Model Runner:
docker model run hf.co/digitous/Alpacino30b
How HHH?
The Alpaca dataset contains instructions that result in models behaving "As an AI" and sometimes refusing certain instructions. I'm wondering if, or how much, this negatively affects Alpacino's generation capabilities. Have you noticed anything in this regard?
I've also heard that 30b models can sometimes produce lower quality work than smaller models like 13b when it comes to narrative over time; I'm also just curious if you've experienced anything like that.
Thanks for doing all the hard work to make this possible!!
No problem, I enjoy the process! So -- I had my initial reservations about using Alpaca for this model; however, ChanSung's LoRA does not seem to exhibit this "as an AI model" behavior despite the very well done training on its dataset. I'm not completely sure why. Even raw Alpaca30b using ChanSung's LoRA seems not to care, so I decided to use that as the foundation of the instruct module of the model merge (I think about 50% a model with his LoRA applied, plus Story+COT). I am honestly surprised by what it allows. That being said, you raise another valid concern: larger-param llama models can produce lower quality work. As counterintuitive as it sounds, I have hit roadblocks with some 30b models painting themselves into a narrative corner, over-reliant on nearly repeating previous information instead of applying it and running with generating new information. For this I adjust the repetition penalty and raise the temperature of the model inference a bit to circumvent the eventual repetition. It usually works; it's not perfect, but overall the larger size does seem to add a better understanding of how things work and how to run a narrative.
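The repetition-penalty and temperature adjustment described above can be sketched as a set of generation kwargs for the Transformers pipeline. The specific values here are illustrative starting points I've assumed, not tuned recommendations from the author:

```python
# Hypothetical sampling settings to curb the repetition loops described above.
# These map to standard Transformers `generate()` parameters.
gen_kwargs = {
    "max_new_tokens": 512,
    "do_sample": True,           # enable sampling so temperature takes effect
    "temperature": 0.9,          # raised a bit to break out of near-repetition
    "repetition_penalty": 1.15,  # values > 1.0 penalize already-generated tokens
    "top_p": 0.95,
}

# With the Transformers pipeline shown earlier, this would be used as:
#   pipe("Once upon a time,", **gen_kwargs)
print(gen_kwargs)
```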
I also find that when using instructions, it can help to put in persistent memory or context "tell the story as if you are a/an X" -- as in, if it's a horror story, "as if you are a masterful horror writer and a fanatic for horror fiction," or if it's something spicy, "write it as if you are into X," etc. I find the best approach is being creative with instruction prompting while keeping the instructions just loose enough not to corner the model into repetitive behavior.
Brilliant, thank you for the response. I think there's not enough information about how lower-parameter-count models compare to higher ones. I'm content with your answer; closing!