---
title: ChinaTravel
emoji: 🐢
colorFrom: blue
colorTo: gray
sdk: gradio
sdk_version: 5.34.0
app_file: app.py
pinned: false
license: cc-by-nc-sa-4.0
---

# ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning

The official codebase for our NeurIPS'25 (Datasets and Benchmarks Track) submission "ChinaTravel: An Open-Ended Benchmark for Language Agents in Chinese Travel Planning".

[Dataset (Huggingface)](https://huggingface.co/datasets/LAMDA-NeSy/chinatravel_neurips25submission)

## Quick Start

### Setup

1. Create a conda environment and install dependencies:

   ```bash
   conda create -n chinatravel python=3.9
   conda activate chinatravel
   pip install -r requirements.txt
   ```

2. Download the database and unzip it to the `chinatravel/environment/` directory (Download Links: [Google Drive](https://drive.google.com/drive/folders/1bJ7jA5cfExO_NKxKfi9qgcxEbkYeSdAU?usp=drive_link), [NJU Drive](https://box.nju.edu.cn/d/dd83e5a4a9e242ed8eb4/)).

3. Download the necessary models or tokenizers (e.g. the [deepseek tokenizer](https://cdn.deepseek.com/api-docs/deepseek_v3_tokenizer.zip)) to `./chinatravel/local_llm`. You need to create the folder first.

### Running

We support deepseek (the official API from deepseek), gpt-4o (chatgpt-4o-latest), glm4-plus, and local inference with Qwen, Mistral, and Llama.

```bash
export OPENAI_API_KEY=""

# Act, ReAct0, and ReAct agents
python run_exp.py --splits easy --agent Act --llm gpt-4o
# Replace "Act" with "ReAct0" or "ReAct" for other pure neural agents

# LLM-modulo agent with 10 refine_steps
python run_exp.py --splits medium --agent LLM-modulo --llm gpt-4o --refine_steps 10

# LLMNeSy agent with oracle translation
python run_exp.py --splits human --agent LLMNeSy --llm deepseek --oracle_translation

# LLMNeSy agent
python run_exp.py --splits human1000 --agent LLMNeSy --llm deepseek
```

Note:

1. Please download the weights of the open-source model to `./chinatravel/open_source_llm` and modify the corresponding model path in `./chinatravel/agent/llms.py`. This step is only necessary when using a locally deployed open-source model.
2. We implemented the following agents:
   1. `Act`: zero-shot Act agent
   2. `ReAct0`: zero-shot ReAct agent
   3. `ReAct`: one-shot ReAct agent
   4. `LLM-modulo`: LLM-modulo agent
   5. `LLMNeSy`: Neuro-Symbolic agent
3. We retain the DSL annotations of "Human1000" as private information to prevent performance fraud or unfair comparisons. Researchers are encouraged to submit their results to us for evaluation on Human1000.
4. To skip queries that have already been completed, add the parameter `--skip 1`.

### Evaluation

```bash
python eval_exp.py --splits human --method LLMNeSy_deepseek_oracletranslation
python eval_exp.py --splits human --method LLMNeSy_deepseek
```

## Docs

[Environment](chinatravel/environment/readme.md)

[Constraints](chinatravel/symbol_verification/readme.md)
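
To queue several of the runs from the Running section without retyping each command, a small helper along these lines may be convenient. This is a minimal sketch and not part of the codebase: `build_run_command` is a hypothetical name, and it only assembles the flags documented above (`--splits`, `--agent`, `--llm`, `--refine_steps`, `--oracle_translation`, `--skip 1`). It prints the commands rather than executing them.

```python
import shlex
from typing import List, Optional


def build_run_command(
    split: str,
    agent: str,
    llm: str,
    refine_steps: Optional[int] = None,
    oracle_translation: bool = False,
    skip_completed: bool = False,
) -> List[str]:
    """Assemble an argv list for run_exp.py (hypothetical helper).

    Only flags that appear in this README are emitted; any other
    option of run_exp.py is outside the scope of this sketch.
    """
    cmd = ["python", "run_exp.py", "--splits", split, "--agent", agent, "--llm", llm]
    if refine_steps is not None:
        # Shown in the README for the LLM-modulo agent.
        cmd += ["--refine_steps", str(refine_steps)]
    if oracle_translation:
        cmd.append("--oracle_translation")
    if skip_completed:
        # Skip queries that have already been completed.
        cmd += ["--skip", "1"]
    return cmd


if __name__ == "__main__":
    # Print one command per pure neural agent on the easy split.
    for agent_name in ("Act", "ReAct0", "ReAct"):
        command = build_run_command("easy", agent_name, "gpt-4o", skip_completed=True)
        print(shlex.join(command))
```

To actually launch a run, each returned list can be passed to `subprocess.run(command, check=True)` from the repository root, assuming the environment from the Setup section is active.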