---
datasets:
- GetSoloTech/FoodStack
language:
- en
base_model:
- lerobot/smolvla_base
library_name: transformers
tags:
- Robotics
- Lerobot
- Food
- PickPlace
- VLA
- SmolVLA
- PhysicalAI
---

### SmolVLA Fine-Tuned for Food Stacking

**Summary**: This is a fine-tuned version of `lerobot/smolvla_base` for stacking food objects (e.g., burgers, sandwiches). It was fine-tuned on the `GetSoloTech/FoodStack` dataset using the LeRobot framework.

### Model details
- **Base model**: `lerobot/smolvla_base`
- **Task**: Vision-Language-Action control for manipulation (stacking)
- **Domain**: Food item stacking (burger, sandwich, etc.)
- **Params**: ~450M (SmolVLA)
- **Library**: LeRobot (`lerobot`)

### Quick start
Install LeRobot with SmolVLA extras:

```bash
git clone https://github.com/huggingface/lerobot.git
cd lerobot
pip install -e ".[smolvla]"
```

Load the policy from this repo and run inference:

```python
# Older LeRobot releases use this path; newer ones may expose the same class at
# lerobot.policies.smolvla.modeling_smolvla.
from lerobot.common.policies.smolvla.modeling_smolvla import SmolVLAPolicy

# Replace with your actual model ID on the Hub
model_id = "GetSoloTech/SmolVLA-FoodStack"
policy = SmolVLAPolicy.from_pretrained(model_id)
policy.eval()  # inference mode

# Example placeholders for the observation and instruction.
# The key names are assumptions: they must match the input features in the
# policy config (LeRobot datasets typically use "observation.images.<camera>").
batch = {
    "observation.images.<camera>": ...,  # BGR/RGB frame or processed observation per your setup
    "observation.state": ...,  # optional proprio/scene state if used
    "task": "Stack the burger: bun, patty, cheese, lettuce, bun.",
}

# Depending on your pipeline, you may wrap this in your control loop.
# select_action takes a single batch dict; the language instruction goes in "task".
action = policy.select_action(batch)

# Send the action to your robot controller
# send_action_to_robot(action)
```

For end-to-end examples (policy loops, camera/robot IO), see the LeRobot docs and examples.

Notes:
- If you fine-tune further, tune batch size, number of training steps, and augmentation to your hardware and dataset split.
- Ensure that observation preprocessing at inference matches the preprocessing used at training time.

### Limitations
- Specializes in food stacking; may not generalize to unseen objects or layouts.
- Sensitive to perception domain shift (lighting, textures, camera intrinsics).
- Requires correct observation normalization consistent with training.

### Dataset
- **Training data**: `GetSoloTech/FoodStack`

### Resources and references
- SmolVLA base: `https://huggingface.co/lerobot/smolvla_base`
- SmolVLA overview: `https://smolvla.net/index_en.html`
- LeRobot: `https://github.com/huggingface/lerobot`
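
### Sketch: checking observation normalization

The Limitations section notes that observations must be normalized consistently with training. The following is a minimal, framework-independent sketch of that idea, not LeRobot's own implementation. The statistics `TRAIN_MEAN` and `TRAIN_STD`, the helper names `normalize` and `out_of_range`, and the threshold `max_abs_z` are all hypothetical; in practice the statistics would come from the dataset the policy was trained on.

```python
from typing import List, Sequence

# Hypothetical per-dimension statistics; real values would come from the training dataset.
TRAIN_MEAN = [0.0, 10.0]
TRAIN_STD = [1.0, 2.0]


def normalize(values: Sequence[float], mean: Sequence[float],
              std: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Standardize each dimension with the training-time mean and std."""
    if not (len(values) == len(mean) == len(std)):
        raise ValueError("values, mean, and std must have the same length")
    return [(v - m) / (s + eps) for v, m, s in zip(values, mean, std)]


def out_of_range(values: Sequence[float], mean: Sequence[float],
                 std: Sequence[float], max_abs_z: float = 6.0) -> List[int]:
    """Return indices of dimensions whose z-score magnitude exceeds max_abs_z."""
    z_scores = normalize(values, mean, std)
    return [i for i, z in enumerate(z_scores) if abs(z) > max_abs_z]


# A state close to the training distribution raises no flags.
ok_state = [0.5, 14.0]
flagged_ok = out_of_range(ok_state, TRAIN_MEAN, TRAIN_STD)

# A state in different units (or from a miscalibrated sensor) is flagged.
bad_state = [0.5, 100.0]
flagged_bad = out_of_range(bad_state, TRAIN_MEAN, TRAIN_STD)
```

A check like this can run once before the control loop starts: a flagged dimension usually points to a unit mismatch or a preprocessing difference rather than a genuinely unusual robot state.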