Doctoral thesis
Open access
English

Counterfactual Interactive Learning: designing proactive artificial agents that learn from the mistakes of other decision makers

Contributors: Blonde, Lionel
Imprimatur date: 2022-01-18
Defense date: 2021-12-21
Abstract

Modelling a reward signal that conveys the right incentive to the agent is tedious in terms of the engineering required. Imitation learning bypasses this time-consuming hurdle by enticing the agent to mimic an expert instead of trying to maximize a potentially ill-designed reward signal. We first build on the strengths of off-policy learning to design a novel adversarial imitation approach that addresses the high sample complexity suffered by Generative Adversarial Imitation Learning, the state-of-the-art imitation approach. Second, we show that forcing the adversarially learned reward function to be locally Lipschitz-continuous is a sine qua non condition for the method to perform well. We complement this empirical evidence with several theoretical guarantees. Finally, we introduce the concept of dataset-grounded optimality inductive bias for offline agents. By carefully orchestrating such priors in our generalization of importance-weighted regression, we can achieve better results, while remaining agnostic with respect to the quality of the dataset.
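Two of the abstract's technical ingredients can be made concrete with short sketches. The locally Lipschitz-continuous reward constraint is commonly enforced with a gradient penalty on the learned reward (the adversarial discriminator); the PyTorch-style sketch below assumes a `reward_net` mapping observations to scalar scores and uses a WGAN-GP-style penalty on interpolates between expert and agent samples, which may differ in detail from the thesis's exact formulation.

```python
import torch

def gradient_penalty(reward_net, expert_obs, agent_obs, k=1.0):
    # Random interpolates between expert and agent observations.
    eps = torch.rand(expert_obs.size(0), 1, device=expert_obs.device)
    interp = (eps * expert_obs + (1.0 - eps) * agent_obs).requires_grad_(True)
    scores = reward_net(interp)
    # Gradient of the reward with respect to its input at the interpolates.
    grads = torch.autograd.grad(scores.sum(), interp, create_graph=True)[0]
    # Penalize deviation of the gradient norm from the target constant k,
    # softly encouraging k-Lipschitz-continuity around the data manifold.
    return ((grads.norm(2, dim=-1) - k) ** 2).mean()
```

Similarly, importance-weighted regression for offline agents can be illustrated by advantage-weighted behavioural cloning, where dataset actions are cloned with weights that grow with their estimated advantage. This is a generic sketch of the family of methods, not the thesis's specific generalization; `policy.log_prob`, the `advantages` estimates, the temperature `beta`, and the clip `w_max` are assumed names introduced for illustration.

```python
import torch

def advantage_weighted_loss(policy, obs, act, advantages, beta=1.0, w_max=20.0):
    # Exponentiated-advantage weights; clipping keeps them bounded so the
    # estimator remains well-behaved regardless of dataset quality.
    weights = torch.clamp(torch.exp(advantages / beta), max=w_max)
    # Weighted behavioural cloning: maximize the log-likelihood of dataset
    # actions, with better-than-average actions weighted more heavily.
    return -(weights.detach() * policy.log_prob(obs, act)).mean()
```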

Citation (ISO format)
BLONDE, Lionel. Counterfactual Interactive Learning: designing proactive artificial agents that learn from the mistakes of other decision makers. 2022. doi: 10.13097/archive-ouverte/unige:158585
Main files (1)
Thesis
Access level: Public
Identifiers
DOI: 10.13097/archive-ouverte/unige:158585

Technical information

Creation: 02/01/2022 2:09:00 PM
First validation: 02/01/2022 2:09:00 PM
Update time: 05/15/2023 12:35:25 PM
Status update: 05/15/2023 12:35:25 PM
Last indexation: 02/01/2024 7:41:38 AM
All rights reserved by Archive ouverte UNIGE and the University of Geneva.