Doctoral thesis
OA Policy
English

Counterfactual Interactive Learning: designing proactive artificial agents that learn from the mistakes of other decision makers

Contributors: Blonde, Lionel (ORCID)
Imprimatur date: 2022-01-18
Defense date: 2021-12-21
Abstract

Modelling a reward function that conveys the right incentives to an agent demands considerable engineering effort. Imitation learning bypasses this time-consuming hurdle by enticing the agent to mimic an expert instead of maximizing a potentially ill-designed reward signal. We first build on the strengths of off-policy learning to design a novel adversarial imitation approach that addresses the high sample complexity suffered by Generative Adversarial Imitation Learning, the state-of-the-art imitation method. Second, we show that forcing the adversarially learned reward function to be locally Lipschitz-continuous is a sine qua non condition for the method to perform well, and we complement this empirical evidence with several theoretical guarantees. Finally, we introduce the concept of a dataset-grounded optimality inductive bias for offline agents. By carefully orchestrating such priors in our generalization of importance-weighted regression, we achieve better results while remaining agnostic to the quality of the dataset.
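The abstract's second contribution hinges on constraining the learned reward's local gradient norm. As a minimal illustrative sketch (not the thesis's exact regularizer), a gradient-penalty term in the spirit of WGAN-GP can enforce this: it estimates the gradient norm of a reward function at sampled states and penalizes deviation from a target norm. The names `gradient_penalty` and `reward_fn` are hypothetical, and the finite-difference estimate stands in for automatic differentiation.

```python
import numpy as np

def gradient_penalty(reward_fn, states, eps=1e-5, target=1.0):
    """Penalize deviation of ||grad_s r(s)|| from a target norm.

    Uses central finite differences along each coordinate axis to
    estimate the gradient of reward_fn at each sampled state; a
    one-sided variant (penalizing only norms above the target) is
    also common in the literature.
    """
    norms = []
    for s in states:
        g = np.array([(reward_fn(s + eps * e) - reward_fn(s - eps * e)) / (2 * eps)
                      for e in np.eye(len(s))])
        norms.append(np.linalg.norm(g))
    return np.mean((np.array(norms) - target) ** 2)

# A linear reward 2*s[0] has gradient (2, 0) everywhere, norm 2,
# so the penalty at the target norm 1 is (2 - 1)^2 = 1.
linear_reward = lambda s: 2.0 * s[0]
gp = gradient_penalty(linear_reward, [np.zeros(2)])
```

In practice this penalty would be added, with a coefficient, to the adversarial discriminator's training loss, softly enforcing the Lipschitz condition near the data distribution.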

Citation (ISO format)
BLONDE, Lionel. Counterfactual Interactive Learning: designing proactive artificial agents that learn from the mistakes of other decision makers. Doctoral Thesis, 2022. doi: 10.13097/archive-ouverte/unige:158585
Main files (1): Thesis (public access)

All rights reserved by Archive ouverte UNIGE and the University of Geneva.