Abstract To reduce the large errors that arise in reinforcement learning when the reward function is unclear, this paper studies
and implements two imitation learning algorithms: the behavior cloning algorithm and the data aggregation algorithm. The algorithm
flow is modeled with an activity diagram, the relationships between classes with a class diagram, and the core interaction
process with a sequence diagram. Based on the experimental results, the paper compares the advantages and disadvantages
of the two algorithms. It finds that behavior cloning, which is trained offline, avoids interaction with the real environment,
but its accumulated errors lead to incorrect results. The data aggregation algorithm must interact with the environment online,
selecting the states for its observations according to the current policy, which addresses the problem of
error accumulation.
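
The contrast between the two training procedures can be illustrated with a minimal sketch. It is not the paper's implementation: the one-dimensional environment, the linear expert, and the helper names (`expert_action`, `step`, `fit_linear`, `rollout`) are all hypothetical, and the expert/learner mixing schedule used in the published data aggregation algorithm (commonly known as DAgger) is omitted for brevity. Because the assumed expert is exactly linear, both learners recover it here; the point is only where the training states come from.

```python
import random

def expert_action(state):
    # Hypothetical expert: steer the state back toward the origin.
    return -0.5 * state

def step(state, action, rng):
    # Hypothetical 1-D dynamics with a small amount of noise.
    return state + action + rng.gauss(0.0, 0.1)

def fit_linear(dataset):
    # Least-squares fit of action = w * state (no intercept).
    num = sum(s * a for s, a in dataset)
    den = sum(s * s for s, _ in dataset)
    return num / den if den > 0 else 0.0

def rollout(policy, rng, horizon=20):
    # Return the states visited while `policy` drives the environment.
    state, states = rng.uniform(-1.0, 1.0), []
    for _ in range(horizon):
        states.append(state)
        state = step(state, policy(state), rng)
    return states

def behavior_cloning(rng, episodes=5):
    # Offline: label only states visited by the expert, then fit once.
    data = [(s, expert_action(s))
            for _ in range(episodes)
            for s in rollout(expert_action, rng)]
    return fit_linear(data), data

def dagger(rng, iterations=5):
    # Online: roll out the current learner, ask the expert to label the
    # states the learner actually visited, aggregate, and refit.
    data = [(s, expert_action(s)) for s in rollout(expert_action, rng)]
    w = fit_linear(data)
    for _ in range(iterations):
        visited = rollout(lambda s: w * s, rng)
        data += [(s, expert_action(s)) for s in visited]
        w = fit_linear(data)
    return w, data
```

Calling `behavior_cloning(random.Random(0))` or `dagger(random.Random(0))` returns the fitted weight together with the dataset used. In the second function the dataset grows with states drawn from the learner's own trajectories, which is the mechanism the abstract credits with countering error accumulation.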