RonanMcGovern committed on Dec 4, 2023

Commit

be631f4

1 Parent(s): 50dfeeb

Update README.md

Files changed (1)

README.md +85 -10

README.md CHANGED

@@ -1,14 +1,4 @@
 ---
-extra_gated_heading: Access Llama 2 on Hugging Face
-extra_gated_description: >-
-  This is a form to enable access to Llama 2 on Hugging Face after you have been
-  granted access from Meta. Please visit the [Meta website](https://ai.meta.com/resources/models-and-libraries/llama-downloads) and accept our
-  license terms and acceptable use policy before submitting this form. Requests
-  will be processed in 1-2 days.
-extra_gated_prompt: "**Your Hugging Face account email address MUST match the email you provide on the Meta website, or your request will not be approved.**"
-extra_gated_button_content: Submit
-extra_gated_fields:
-  I agree to share my name, email address and username with Meta and confirm that I have already been granted download access on the Meta website: checkbox
 language:
 - en
 pipeline_tag: text-generation
@@ -21,6 +11,91 @@ tags:
 - llama
 - llama-2
 ---
+# Function Calling Fine-tuned Llama 2 Chat
+The model is suitable for commercial use and is licensed under the Llama 2 Community License.
+Check out other fine-tuned function calling models [here](https://trelis.com/function-calling/).
+## Inference Scripts
+Out-of-the-box inference scripts are available for purchase [here](https://trelis.com/enterprise-server-api-and-inference-guide/).
+## Prompt Format
+```
+B_FUNC, E_FUNC = "You have access to the following functions. Use them if required:\n\n", "\n\n"
+B_INST, E_INST = "[INST] ", " [/INST]" #Llama style
+prompt = f"{B_INST}{B_FUNC}{functionList.strip()}{E_FUNC}{B_INST}{user_prompt.strip()}{E_INST}\n\n"
+```
+## Sample Prompt and Response:
+```
+[INST] You have access to the following functions. Use them if required:
+[
+    {
+        "type": "function",
+        "function": {
+            "name": "get_big_stocks",
+            "description": "Get the names of the largest N stocks by market cap",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "number": {
+                        "type": "integer",
+                        "description": "The number of largest stocks to get the names of, e.g. 25"
+                    },
+                    "region": {
+                        "type": "string",
+                        "description": "The region to consider, can be \"US\" or \"World\"."
+                    }
+                },
+                "required": [
+                    "number"
+                ]
+            }
+        }
+    },
+    {
+        "type": "function",
+        "function": {
+            "name": "get_stock_price",
+            "description": "Get the stock price of an array of stocks",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "names": {
+                        "type": "array",
+                        "items": {
+                            "type": "string"
+                        },
+                        "description": "An array of stocks"
+                    }
+                },
+                "required": [
+                    "names"
+                ]
+            }
+        }
+    }
+]
+[INST] Get the names of the five largest stocks in the US by market cap [/INST]
+{
+    "name": "get_big_stocks",
+    "arguments": {
+        "number": 5,
+        "region": "US"
+    }
+}</s>
+```
+# Dataset
+See [Trelis/function_calling_v3](https://huggingface.co/datasets/Trelis/function_calling_v3).
+~~~
+The original repo card follows below.
+~~~
 # **Llama 2**
 Llama 2 is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. This is the repository for the 7B fine-tuned model, optimized for dialogue use cases and converted for the Hugging Face Transformers format. Links to other models can be found in the index at the bottom.
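
The prompt format and sample response added in this commit can be exercised with a short Python sketch. It assumes the layout shown in the sample prompt, where the function block is itself opened by `[INST] `, and it assumes a function call comes back as a bare JSON object optionally followed by `</s>`. The helper names `build_prompt` and `parse_function_call` are hypothetical and are not part of the README or of any Trelis script.

```python
import json

# Markers copied from the "Prompt Format" section of the README.
B_FUNC, E_FUNC = "You have access to the following functions. Use them if required:\n\n", "\n\n"
B_INST, E_INST = "[INST] ", " [/INST]"

# Function metadata in the same shape as the README sample (shortened to one entry).
FUNCTIONS = [
    {
        "type": "function",
        "function": {
            "name": "get_big_stocks",
            "description": "Get the names of the largest N stocks by market cap",
            "parameters": {
                "type": "object",
                "properties": {
                    "number": {"type": "integer", "description": "How many stocks to return"},
                    "region": {"type": "string", "description": "\"US\" or \"World\""},
                },
                "required": ["number"],
            },
        },
    }
]


def build_prompt(functions, user_prompt):
    """Assemble a prompt following the sample layout (an assumption, see lead-in)."""
    function_list = json.dumps(functions, indent=4)
    return (
        f"{B_INST}{B_FUNC}{function_list.strip()}{E_FUNC}"
        f"{B_INST}{user_prompt.strip()}{E_INST}\n\n"
    )


def parse_function_call(response_text):
    """Return (name, arguments) for a JSON function call, or None for plain text."""
    text = response_text.strip()
    if text.endswith("</s>"):
        # Drop the end-of-sequence marker shown in the sample response.
        text = text[: -len("</s>")].strip()
    try:
        call = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or "name" not in call:
        return None
    return call["name"], call.get("arguments", {})


if __name__ == "__main__":
    print(build_prompt(FUNCTIONS, "Get the names of the five largest stocks in the US by market cap"))
    print(parse_function_call('{"name": "get_big_stocks", "arguments": {"number": 5, "region": "US"}}</s>'))
```

Returning `None` for anything that is not a JSON object with a `name` key lets the caller treat the output as an ordinary chat reply instead of raising an error. Whether the model ever mixes prose and JSON in one response is not stated in the README, so this sketch does not attempt to handle that case.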