ShapeLLM model

This repository contains the ShapeLLM-13B model presented in ShapeLLM: Universal 3D Object Understanding for Embodied Interaction.

Install

Clone this repository and navigate to the ShapeLLM folder

git clone https://github.com/qizekun/ShapeLLM.git
cd ShapeLLM

Install Package

conda create -n shapellm python=3.10 -y
conda activate shapellm
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Install additional packages for training

pip install -e ".[train]"
pip install flash-attn --no-build-isolation

Install PointNet++

pip install "git+https://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
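
After the steps above, a quick import check can catch a failed build early. The following is a minimal sketch, not part of the repository: the helper `missing_modules` is hypothetical, and the module names are assumptions (`llava` from the CLI entry point used later in this README, `pointnet2_ops` from the pip command above, and `flash_attn` as the import name of the flash-attn package, which is only needed for training).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names are assumptions; adjust them to what your install provides.
    required = ["llava", "pointnet2_ops", "flash_attn"]
    missing = missing_modules(required)
    print("missing modules:", missing if missing else "none")
```

Run it inside the activated `shapellm` environment; an empty result suggests the packages are at least importable, not that they were built correctly for your GPU.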

ShapeLLM

Model Weights

Please check out our Model Zoo for all public ShapeLLM checkpoints.

Demo

CLI Inference

Chat about point clouds using the CLI. It also supports multi-GPU inference as well as 4-bit and 8-bit quantized inference.

python -m llava.serve.cli \
    --model-path qizekun/ShapeLLM_13B_general_v1.0 \
    --pts-file assets/instrument.npy
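
If you want to launch the CLI from a script, for example to switch between several point cloud files, the command above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_cli_command` is a hypothetical helper, and it uses only the two flags shown above (`--model-path` and `--pts-file`); no other flags are assumed.

```python
import shlex
import sys


def build_cli_command(model_path, pts_file, python=sys.executable):
    """Return the argument list for one llava.serve.cli chat session."""
    return [
        python, "-m", "llava.serve.cli",
        "--model-path", model_path,
        "--pts-file", pts_file,
    ]


if __name__ == "__main__":
    # Only assets/instrument.npy is named in this README; extend the list as needed.
    for pts in ["assets/instrument.npy"]:
        cmd = build_cli_command("qizekun/ShapeLLM_13B_general_v1.0", pts)
        # Print the command; pass `cmd` to subprocess.run(cmd) to start the chat.
        print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when a file path contains spaces.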

Training

Consistent with LLaVA, we adopt a two-stage training approach. In the first stage, we fine-tune only the projector for semantic alignment. In the second stage, we conduct full fine-tuning using instruction-following data.

Download the data following DATA and organize it as follows in ./playground/data/shapellm/:

│playground/data/shapellm/
├── cap3d_objaverse_785k.json
├── cap3d_objaverse_sft_45k.json
├── gapartnet_sft_27k_openai.json
├── gapartnet_pcs
│   ├── Box_100129_0_0.npy
│   └── ...
└── cap3d_pcs
    ├── 00000054c36d44a2a483bdbff31d8edf.pt
    └── ...

Furthermore, ShapeLLM uses the Large version of ReCon++ as the point encoder. Download the ReCon++ weights and save them to ./checkpoints/recon/large.pth.

│checkpoints/recon/
└── large.pth
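
Before starting a long training run, it can help to confirm that the two layouts above are in place. The sketch below is an illustration only: `missing_entries` is a hypothetical helper, the paths are copied from the trees above, and it checks only that the named files and directories exist, not that their contents are complete.

```python
from pathlib import Path

# Entries taken from the two directory trees above.
EXPECTED_FILES = [
    "playground/data/shapellm/cap3d_objaverse_785k.json",
    "playground/data/shapellm/cap3d_objaverse_sft_45k.json",
    "playground/data/shapellm/gapartnet_sft_27k_openai.json",
    "checkpoints/recon/large.pth",
]
EXPECTED_DIRS = [
    "playground/data/shapellm/gapartnet_pcs",
    "playground/data/shapellm/cap3d_pcs",
]


def missing_entries(repo_root="."):
    """Return the expected files and directories that are absent under repo_root."""
    root = Path(repo_root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_DIRS if not (root / p).is_dir()]
    return missing


if __name__ == "__main__":
    problems = missing_entries(".")
    for path in problems:
        print(f"missing: {path}")
    if not problems:
        print("data and checkpoint layout looks complete")
```

Run it from the repository root so that the relative paths resolve against the ShapeLLM folder.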

1. Feature Alignment Stage

sh scripts/pretrain.sh

2. Visual Instruction Tuning Stage

sh scripts/finetune.sh

The training takes around 14 hours for ShapeLLM-13B on 8x A100 (80G). It takes around 7 hours for ShapeLLM-7B.
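
Because the second stage depends on the projector trained in the first, the two scripts need to run in order, and a failed first stage should stop the run. A minimal sketch of such a wrapper follows; `run_stages` and `STAGES` are hypothetical names, and only the two script paths come from this README.

```python
import subprocess

# The two stage scripts named above, in the order they must run.
STAGES = [
    ("feature alignment", ["sh", "scripts/pretrain.sh"]),
    ("visual instruction tuning", ["sh", "scripts/finetune.sh"]),
]


def run_stages(stages, run=subprocess.run):
    """Run each stage in order; return the names of the stages that completed.

    Stops at the first stage whose command exits with a non-zero status.
    """
    completed = []
    for name, cmd in stages:
        result = run(cmd)
        if result.returncode != 0:
            print(f"stage '{name}' failed with exit code {result.returncode}")
            break
        completed.append(name)
    return completed
```

Calling `run_stages(STAGES)` from the repository root runs both stages back to back. The `run` parameter exists so the wrapper can be exercised with a stand-in function before committing GPU time.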

Zero-shot Understanding on 3D MM-Vet

To evaluate 3D MLLMs on integrated capabilities and embodied interaction capabilities, run the script:

sh scripts/eval/mmvet.sh

Use GPT-4 to calculate the 3D MM-Vet score:

sh scripts/eval/eval_mmvet.sh

Visual Grounding on GAPartNet

To evaluate the performance of ShapeLLM on the GAPartNet dataset, run the script:

sh scripts/eval/gapartnet_ref.sh

Calculate the generative 3D visual grounding accuracy:

sh scripts/eval/eval_gapartnet.sh
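
This README does not describe how the script scores a prediction. As a generic illustration of how a 3D grounding accuracy can be computed, the sketch below counts a predicted box as correct when its intersection over union (IoU) with the target box reaches a threshold. Everything here is an assumption made for illustration: the axis-aligned box format `(xmin, ymin, zmin, xmax, ymax, zmax)`, the 0.25 threshold, and the function names. The actual evaluation script may use a different box representation and metric.

```python
def box_volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax)."""
    dx, dy, dz = (max(0.0, box[i + 3] - box[i]) for i in range(3))
    return dx * dy * dz


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    # The overlap region runs from the larger minimum to the smaller maximum.
    inter_box = tuple(max(a[i], b[i]) for i in range(3)) + tuple(
        min(a[i + 3], b[i + 3]) for i in range(3)
    )
    inter = box_volume(inter_box)  # zero when the boxes do not overlap
    union = box_volume(a) + box_volume(b) - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(predictions, targets, threshold=0.25):
    """Fraction of predicted boxes whose IoU with the target reaches threshold."""
    if not targets:
        return 0.0
    hits = sum(iou_3d(p, t) >= threshold for p, t in zip(predictions, targets))
    return hits / len(targets)
```

For example, two unit cubes offset by half a unit along one axis overlap in a volume of 0.5 and have a union of 1.5, giving an IoU of 1/3, which passes a 0.25 threshold but not a 0.5 one.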

Paper

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction (ECCV 2024), arXiv:2402.17766, published Feb 27, 2024.