How to use ponoma16/CodeKobzar13B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ponoma16/CodeKobzar13B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("ponoma16/CodeKobzar13B")
model = AutoModelForCausalLM.from_pretrained("ponoma16/CodeKobzar13B")
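The snippets above only load the model. As a quick illustration, here is what a generation call with the pipeline might look like, using the USER/ASSISTANT prompt template and the sampling values recommended further down this card (the question itself is our example):

# Illustrative only: generate with the pipeline using the card's prompt template.
prompt = "USER: Яке місто в Україні називають найромантичнішим? ASSISTANT:"  # "Which city in Ukraine is called the most romantic?"
output = pipe(prompt, max_new_tokens=150, do_sample=True, temperature=0.8, top_p=0.95)
print(output[0]["generated_text"])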
How to use ponoma16/CodeKobzar13B with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "ponoma16/CodeKobzar13B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ponoma16/CodeKobzar13B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
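Because the vLLM server speaks the OpenAI-compatible API, you can also query it from Python with the openai client instead of curl; a minimal sketch (the client setup is our assumption, the endpoint and parameters mirror the curl call above):

from openai import OpenAI

# Point the OpenAI client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM does not check the key
completion = client.completions.create(
    model="ponoma16/CodeKobzar13B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)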
How to use ponoma16/CodeKobzar13B with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "ponoma16/CodeKobzar13B" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "ponoma16/CodeKobzar13B",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Or run the SGLang server with Docker instead of pip:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "ponoma16/CodeKobzar13B" \
--host 0.0.0.0 \
--port 30000
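The SGLang server can likewise be called from Python; a small sketch using the requests library, mirroring the curl call above (our example, not part of the original card):

import requests

# POST the same payload as the curl example above to the local SGLang server.
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "ponoma16/CodeKobzar13B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])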
How to use ponoma16/CodeKobzar13B with Docker Model Runner:
docker model run hf.co/ponoma16/CodeKobzar13B
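Docker Model Runner also exposes an OpenAI-compatible API. A hedged sketch with the openai client, assuming host-side TCP access is enabled on Docker's documented default port 12434 (verify the port, path, and model name for your setup):

from openai import OpenAI

# Assumes Docker Model Runner's TCP endpoint is enabled on its default port 12434.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="not-needed")
completion = client.completions.create(
    model="hf.co/ponoma16/CodeKobzar13B",  # models pulled from the Hub keep the hf.co/ prefix
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)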
CodeKobzar13B is a generative model trained on Ukrainian Wikipedia data and Ukrainian language rules. It has knowledge of Ukrainian history, language, literature, and culture.
This model is based on vicuna-13b-v1.5.
Use the following prompt template:
USER: {input} ASSISTANT:
We recommend the following generation settings:
Temperature: 0.8
Top-p: 0.95
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_path = "ponoma16/CodeKobzar13B"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16,
    load_in_8bit=True,  # requires the bitsandbytes package
    device_map="auto",
)
model.eval()

prompt = "Яке місто в Україні називають найромантичнішим?"  # "Which city in Ukraine is called the most romantic?"
PROMPT_TEMPLATE = """USER: {prompt} ASSISTANT: """

# Wrap the question in the prompt template before tokenizing.
input_ids = tokenizer(
    PROMPT_TEMPLATE.format(prompt=prompt),
    return_tensors="pt",
    truncation=True,
).input_ids.cuda()

outputs = model.generate(
    input_ids=input_ids,
    do_sample=True,
    top_p=0.95,
    max_new_tokens=150,
    temperature=0.5,
)

prediction = tokenizer.batch_decode(outputs.cpu().numpy(), skip_special_tokens=True)[0]
print(prediction)
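Note that batch_decode returns the full sequence, prompt included. If you want only the model's answer, you can split on the template's ASSISTANT marker (a small convenience sketch, not part of the original example):

# Keep only the assistant's reply by splitting on the template marker.
answer = prediction.split("ASSISTANT:")[-1].strip()
print(answer)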
If you have any inquiries, please feel free to raise an issue or reach out to us via email at mariiaponomarenko10@gmail.com or benjamin.ye@me.com. We're here to assist you!