Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction

Zhuohang Li 1,2,3,* ,
Liqun Huang 3 ,
Wei Xu 3 ,
Zhengming Zhu 3 ,
Nie Lin 3,4,* ,
Xiao Ma 3 ,
Xinjun Sheng 1,2,† ,
Ruoshi Wen 3,†

1State Key Laboratory of Mechanical System and Vibration, School of Mechanical Engineering, Shanghai Jiao Tong University

2Shanghai Key Laboratory of Intelligent Robotics, Meta Robotics Institute, Shanghai Jiao Tong University, Shanghai 200240, China

3ByteDance Seed

4The University of Tokyo

*Work done at ByteDance Seed. †Corresponding authors.

Abstract

Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human takeover data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the takeover moment, which causes abrupt robot-hand configuration changes, or “gesture jumps”. We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with direct teleoperation takeover, HandITL reduces takeover jitter by 99.8% and preserves robust post-takeover manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect intervention data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.

Figure: Overview of HandITL.

Method

Seamless Human Intervention

HandITL enables operators to correct a running VLA policy without fully interrupting autonomous execution. At each control step, the executed command is obtained by blending the policy action with the human corrective action:

$$\mathbf{a}_{t}^{\mathrm{exec}} = \alpha\,\mathbf{a}_{t}^{\pi} + \beta\,\mathbf{a}_{t}^{h}$$

where $\alpha$ and $\beta$ control the relative authority of the policy and the human operator. This formulation supports both full takeover, where the human directly recovers the robot from severe failure states, and copilot shared control, where the human provides lightweight corrections while preserving the policy’s ongoing intent.
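A minimal sketch of this blending in Python is shown below; the action dimension, the specific $(\alpha, \beta)$ values, and treating both gains as fixed scalars are illustrative assumptions rather than details from the paper:

```python
import numpy as np

def blend_action(policy_action, human_action, alpha, beta):
    """Per-step blended command: a_exec = alpha * a_pi + beta * a_h."""
    return alpha * np.asarray(policy_action) + beta * np.asarray(human_action)

a_pi = np.zeros(24)       # policy-predicted hand command (24-DoF, illustrative)
a_h = 0.1 * np.ones(24)   # human corrective command from the retargeting pipeline

# Copilot shared control: the policy keeps most of the authority.
a_copilot = blend_action(a_pi, a_h, alpha=0.8, beta=0.2)

# Full takeover: the human command fully overrides the policy.
a_takeover = blend_action(a_pi, a_h, alpha=0.0, beta=1.0)
```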

Figure: Architecture of the seamless intervention method.

Optimization-Based Relative Hand Retargeting

A key challenge in dexterous intervention is that the human hand pose and robot hand pose are usually not aligned at the takeover moment. Directly mapping the human pose to the robot can therefore cause sudden command jumps and break stable grasps.

To address this, HandITL tracks relative fingertip motion instead of absolute hand pose. Starting from the intervention timestamp, the robot hand follows the change of the human wrist-to-fingertip key vectors while preserving its own initial grasp configuration. This allows the operator to naturally express corrective finger motions without first forcing the robot hand to match the human hand posture.

The retargeting objective combines four terms: global hand-shape tracking, precision grasping, structural safety, and temporal regularization. Together, these terms enable smooth, safe, and contact-preserving dexterous correction during intervention.
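The exact term definitions are not reproduced on this page; one plausible instantiation of the objective, with the weights $w_p, w_s, w_r$ and the per-term forms as our own illustrative assumptions, is

$$\min_{\mathbf{q}_t}\;\sum_{i}\Bigl\|\mathbf{v}_i(\mathbf{q}_t)-\bigl(\mathbf{v}_i(\mathbf{q}_{t_0})+\Delta\mathbf{v}_i^{h}(t)\bigr)\Bigr\|^2 + w_p\,E_{\mathrm{prec}}(\mathbf{q}_t) + w_s\,E_{\mathrm{safe}}(\mathbf{q}_t) + w_r\,\bigl\|\mathbf{q}_t-\mathbf{q}_{t-1}\bigr\|^2$$

where $\mathbf{q}_t$ is the robot hand configuration, $\mathbf{v}_i(\mathbf{q})$ the robot wrist-to-fingertip key vector for finger $i$, $\mathbf{q}_{t_0}$ the grasp configuration at the intervention timestamp, $\Delta\mathbf{v}_i^{h}(t)$ the change of the corresponding human key vector since takeover, $E_{\mathrm{prec}}$ and $E_{\mathrm{safe}}$ the precision-grasping and structural-safety terms, and the final term the temporal regularizer.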

Velocity-Based Shared Arm Control

For arm correction, HandITL converts VR controller motions into residual end-effector twists. These residuals are smoothed with an exponential moving average and added to the policy-predicted arm motion. Because the correction is velocity-based, the residual naturally decays to zero when the operator stops moving, avoiding persistent offsets and removing the need for a manually defined neutral position.
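A sketch of this residual pathway, assuming a 6-D twist representation and an illustrative smoothing factor (neither is specified above):

```python
import numpy as np

class ResidualTwistFilter:
    """Exponential moving average over the operator's residual end-effector twist."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing
        self.residual = np.zeros(6)  # [vx, vy, vz, wx, wy, wz]

    def update(self, controller_twist):
        # EMA smoothing; the residual decays toward zero once the
        # controller stops moving (controller_twist -> 0).
        self.residual = (self.smoothing * self.residual
                         + (1.0 - self.smoothing) * np.asarray(controller_twist))
        return self.residual

def shared_arm_command(policy_twist, controller_twist, filt):
    """Add the smoothed human residual to the policy-predicted arm motion."""
    return np.asarray(policy_twist) + filt.update(controller_twist)

filt = ResidualTwistFilter()
cmd = shared_arm_command(np.zeros(6), np.array([0.05, 0.0, 0.0, 0.0, 0.0, 0.0]), filt)
```

Because the correction lives in velocity space, zero controller motion yields a command identical to the policy’s own prediction, which is what removes the need for a neutral reference pose.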

Experiments

We evaluate HandITL from three perspectives: intervention smoothness, post-takeover manipulation capability, and long-horizon policy improvement. The experiments are conducted on real bimanual dexterous manipulation tasks involving tool use, fine-grained grasping, and multi-stage task execution.

Takeover Command Discontinuity

We first measure whether different intervention methods introduce abrupt hand-command changes at the takeover moment. Direct teleoperation switching often causes large gesture jumps, leading to tool drops or unstable grasps. In contrast, HandITL preserves the robot’s current grasp and only applies relative corrective motion.

Figure: Takeover command discontinuity on the Drill (top) and Bread Clip (bottom) tasks.

On the Bread Clip task, HandITL reduces the mean command discontinuity from approximately $4.38 \times 10^{-2}$ to $6.8 \times 10^{-5}$, achieving a 99.8% reduction. On the Drill task, it similarly maintains stable trigger contact during intervention, avoiding failures caused by sudden finger-command changes.
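For reference, one simple way to instrument such a metric (the paper’s exact definition is not given here, so the norm and indexing below are assumptions):

```python
import numpy as np

def takeover_discontinuity(commands, takeover_step):
    """L2 norm of the one-step hand-command change at the takeover index.

    commands: (T, D) array of commanded hand configurations over time.
    """
    jump = commands[takeover_step] - commands[takeover_step - 1]
    return float(np.linalg.norm(jump))
```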

Post-Takeover Manipulation Capability

We further test whether the robot can still perform precise manipulation after takeover. Compared with direct teleoperation and differential baselines, HandITL achieves smoother finger control, fewer grasp failures, and better cross-operator consistency.

Figure: Post-takeover manipulation capability on the Pick Up and Place the Parts and Pick Up the Drill tasks.

Figure: Grasping postures and failure modes.

On the Pick Up and Place task, HandITL achieves the fastest mean completion time (42.8 s), improves efficiency by 19.1%, and reduces grasp failures by 87.5% compared with teleoperation. These results show that relative retargeting not only makes intervention smoother, but also preserves practical dexterous manipulation capability.

Long-Horizon Policy Evaluation

Figure: Execution sequences of the three long-horizon bimanual dexterous manipulation tasks.

Finally, we study whether intervention data can improve downstream VLA policy performance. Starting from a base policy fine-tuned on a 20-hour teleoperation dataset, we compare additional post-training with pure teleoperation data, full-takeover data, and copilot intervention data. All additional post-training datasets are 1 hour in duration and use the same training schedule.

Figure: Average normalized sub-goal completion scores across three long-horizon tasks.

Comparing the five policies reveals three critical insights regarding long-horizon performance:

  1. Limited Improvements from Pure Teleoperation: Simply adding more pure teleoperation data brings limited and inconsistent gains. Both pure-teleoperation variants (Teleop_old and Teleop_new) show only marginal improvements, and their effects vary across tasks. This suggests that additional off-policy demonstrations do not sufficiently cover the rollout states where compounding errors occur, especially in late phases that require precise contact-rich manipulation.
  2. Effectiveness of Intervention Data: In contrast, policies fine-tuned with intervention data achieve higher average normalized completion scores. Since Copilot and Full Takeover data are collected when the deployed policy actually requires human correction, they provide targeted supervision for recovering from out-of-distribution states. This makes intervention data more effective than standard teleoperation demonstrations for improving long-horizon robustness.
  3. Copilot vs. Full Takeover: Among the two intervention strategies, Copilot generally yields the strongest overall performance. Unlike Full Takeover, which may introduce larger action deviations during authority switching, Copilot preserves the base policy’s ongoing behavior while injecting local corrections. The resulting data stays closer to the policy’s rollout distribution, leading to more stable downstream improvements.

Citation

  @misc{li2026handintheloopimprovingdexterousvla,
      title={Hand-in-the-Loop: Improving Dexterous VLA via Seamless Interventional Correction}, 
      author={Zhuohang Li and Liqun Huang and Wei Xu and Zhengming Zhu and Nie Lin and Xiao Ma and Xinjun Sheng and Ruoshi Wen},
      year={2026},
      eprint={2605.15157},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2605.15157}, 
}