Intent-Aware Policy Optimization Algorithms for Human-Agent Cooperative Tasks
Keywords:
human-agent cooperation, intent-aware reinforcement learning, policy optimization, task efficiency, cooperation efficiency, multi-agent systems, human satisfaction
Abstract
Cooperative tasks between humans and agents are becoming increasingly common in areas such as collaborative robotics, industrial automation, and rescue operations. Reinforcement learning techniques usually train agents' policies independently of human intentions, which leads to unintended behavior, redundant task handling, and reduced collaboration efficiency. In this paper, we present Intent-Aware Policy Optimization (IAPO), a framework in which real-time human intention prediction and adaptive multi-agent reinforcement learning work together to improve cooperation and task performance. IAPO consists of three modules: the Intent Recognition Module, the Cooperative Task Scheduler, and the Policy Optimization Module. The first two modules generate task priorities from the predicted human intentions, while the third optimizes the agents' policies using the proposed intent-aware reward function. We evaluated the framework in simulated dynamic environments in which multiple agents and humans performed collaborative tasks. Four metrics, namely task completion rate (TCR), cooperation efficiency (CE), policy convergence time (PCT), and human satisfaction index (HSI), were used to compare our approach with baseline approaches. Our framework outperforms the baselines, achieving a TCR of 93%, a CE of 88%, a PCT of 120 iterations, and an HSI of 4.5. These results indicate that integrating human intention into the policy optimization process can improve both the quantitative results and the qualitative cooperation between humans and agents. The framework is flexible, explainable, and adaptable in practice. Future work should apply it to a wider range of human-robot scenarios, including online learning.
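The abstract does not give the exact form of the intent-aware reward, so the following is only a minimal sketch of how the three modules could fit together, under loudly stated assumptions. It hypothetically assumes that the Intent Recognition Module outputs a probability distribution over tasks (a softmax over per-task scores), that the Cooperative Task Scheduler turns this into agent priorities as 1 − P(human pursues task), and that the intent-aware reward adds a λ-weighted priority bonus to the task reward. All function names and the `lam` parameter are illustrative and are not the IAPO implementation. In place of the paper's multi-agent reinforcement learning, a single one-step softmax policy over tasks is optimized with exact policy-gradient ascent.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def intent_distribution(intent_scores: Sequence[float]) -> List[float]:
    """Hypothetical Intent Recognition Module output: P(human pursues task i)."""
    return softmax(intent_scores)


def task_priorities(p_human: Sequence[float]) -> List[float]:
    """Hypothetical scheduler rule: tasks the human is likely handling get low
    agent priority, which discourages redundant task handling."""
    return [1.0 - p for p in p_human]


def intent_aware_reward(task_rewards: Sequence[float],
                        priorities: Sequence[float],
                        lam: float = 1.0) -> List[float]:
    """Hypothetical shaped reward per task: r_task(i) + lam * priority(i)."""
    return [r + lam * w for r, w in zip(task_rewards, priorities)]


def optimize_policy(rewards: Sequence[float],
                    steps: int = 500,
                    lr: float = 0.5) -> List[float]:
    """Exact policy-gradient ascent for a one-step softmax policy over tasks.

    With pi = softmax(theta) and J = sum_i pi_i * r_i, the gradient is
    dJ/dtheta_j = pi_j * (r_j - J), i.e. REINFORCE in expectation with the
    expected reward J as the baseline.
    """
    theta = [0.0] * len(rewards)
    for _ in range(steps):
        pi = softmax(theta)
        expected = sum(p * r for p, r in zip(pi, rewards))
        theta = [t + lr * p * (r - expected)
                 for t, p, r in zip(theta, pi, rewards)]
    return softmax(theta)


if __name__ == "__main__":
    # Illustrative scores: the human is most likely working on task 0.
    p_human = intent_distribution([2.0, 0.5, -1.0])
    prio = task_priorities(p_human)
    rewards = intent_aware_reward([1.0, 1.0, 1.0], prio, lam=1.0)
    pi = optimize_policy(rewards)
    print("P(human on task):", [round(x, 3) for x in p_human])
    print("agent policy:   ", [round(x, 3) for x in pi])
```

With equal task rewards, the shaped reward steers the agent's policy toward the task the human is least likely to pursue. This is one simple way an intent-aware reward could reduce redundant task handling; the paper's actual reward may weigh intent differently.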




