Large Language Model

[AAAI 2026] Reasoning with Exploration: An Entropy Perspective

In this work, we revisit entropy, a signal of exploration in RL, and examine its relationship to exploratory reasoning in LMs.

Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Xin Zhao, Zhenliang Zhang, Furu Wei

[NeurIPS 2025 Workshop] ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making

We propose ValuePilot, a two-phase value-driven decision-making framework comprising a dataset generation toolkit (DGT) and a decision-making module (DMM) trained on the generated data.

Yitong Luo, Hou Hei Lam, Ziang Chen, Zhenliang Zhang, Xue Feng

[EMNLP 2025] On Domain-Adaptive Post-Training for Multimodal Large Language Models

This paper systematically investigates domain adaptation of MLLMs via post-training, focusing on data synthesis, training pipelines, and task evaluation.

Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang