Yuan Hong Liao recently graduated from NTHU, where he majored in Electrical Engineering, advised by Prof. Min Sun. After his training at NTHU, he went to USC as a visiting student, where he worked with Prof. Joseph J. Lim. His research spans reinforcement learning, computer vision, and natural language processing. Aside from the research experiences above, he also dedicates himself to open-source projects such as @openai/baselines and @andrewliao11/Deep-Reinforcement-Learning-Survey.
For more details on his research background, please refer to here (last updated Oct. 2017). For his curriculum vitae, please see here (last updated Feb. 2018). If you're interested in his research, see his Google Scholar profile.
We propose a novel adversarial training procedure to leverage unpaired data in the target domain. Two critic networks are introduced to guide the captioner: a domain critic and a multi-modal critic. The domain critic assesses whether the generated sentences are indistinguishable from sentences in the target domain. The multi-modal critic assesses whether an image and its generated sentence form a valid pair. During training, the critics and the captioner act as adversaries -- the captioner aims to generate indistinguishable sentences, whereas the critics aim to distinguish them. During inference, we further propose a novel critic-based planning method to select high-quality sentences without additional supervision (e.g., tags).
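At a high level, the adversarial objective can be sketched as follows. This is a toy illustration rather than the paper's implementation: `p_domain` and `p_pair` are assumed stand-ins for the two critics' probability outputs, the captioner's loss rewards fooling both critics, and each critic is trained with a standard binary cross-entropy objective.

```python
import math

def critic_loss(p_real, p_fake):
    """Binary cross-entropy for a critic: push real examples
    toward probability 1 and generated (fake) ones toward 0."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def captioner_loss(p_domain, p_pair):
    """The captioner is rewarded when both critics are fooled.

    p_domain -- domain critic's probability that the generated
                sentence comes from the target domain
    p_pair   -- multi-modal critic's probability that the
                (image, sentence) pair is valid
    """
    return -(math.log(p_domain) + math.log(p_pair))

# A caption that fools both critics incurs a lower loss than one
# that fools neither.
good = captioner_loss(p_domain=0.9, p_pair=0.8)
bad = captioner_loss(p_domain=0.2, p_pair=0.1)
assert good < bad
```

The minimax structure mirrors a GAN: gradient steps alternate between lowering the critics' losses and lowering the captioner's loss.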
We introduce two tactics to attack agents trained by deep reinforcement learning algorithms using adversarial examples: (1) Strategically-timed attack: the adversary aims at minimizing the agent's reward by attacking the agent only at a small subset of time steps in an episode. Limiting the attack to this subset helps the attack evade detection. We propose a novel method to determine when an adversarial example should be crafted and applied. (2) Enchanting attack: the adversary aims at luring the agent to a designated target state. This is achieved by combining a generative model and a planning algorithm: the generative model predicts future states, while the planning algorithm generates a preferred sequence of actions for luring the agent. A sequence of adversarial examples is then crafted to make the agent take the preferred sequence of actions.
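One plausible way to time such an attack, as a minimal sketch (the preference-gap heuristic and the threshold below are illustrative assumptions, not necessarily the paper's exact criterion): perturb only at time steps where the policy strongly prefers one action, since that is when a perturbation is most likely to change the agent's behavior.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def should_attack(action_logits, threshold=0.5):
    """Attack only when the policy strongly prefers one action:
    the gap between the most- and least-preferred action
    probabilities serves as a simple preference measure."""
    probs = softmax(action_logits)
    return max(probs) - min(probs) > threshold

# Near-uniform preferences: perturbing now barely matters.
assert not should_attack([0.1, 0.0, 0.1])
# Strong preference: a well-timed perturbation can flip the action.
assert should_attack([5.0, 0.0, 0.0])
```

Keeping the number of attacked steps small in this way is what makes the attack harder to detect than perturbing every frame.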
We propose a scalable approach to learning video-based question answering (QA): answering free-form natural language questions about the contents of a video. Our approach automatically harvests a large number of videos and descriptions freely available online. A large number of candidate QA pairs are then automatically generated from the descriptions rather than manually annotated.
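To make the idea of harvesting QA pairs from descriptions concrete, here is a toy sketch. The fill-in-the-blank scheme and the stopword list are illustrative assumptions, not the actual question-generation pipeline:

```python
def generate_candidate_qa(description,
                          stopwords=frozenset({"a", "an", "the", "is", "on", "in"})):
    """Turn a video description into fill-in-the-blank QA candidates
    by blanking out each content word in turn."""
    words = description.split()
    pairs = []
    for i, word in enumerate(words):
        if word.lower() in stopwords:
            continue  # skip function words; blank only content words
        question = " ".join(words[:i] + ["_____"] + words[i + 1:])
        pairs.append((question, word))
    return pairs

pairs = generate_candidate_qa("a dog catches the frisbee")
# e.g. ("a _____ catches the frisbee", "dog")
```

Because the pairs come from freely harvested descriptions, the training set scales with the amount of video found online instead of with annotation effort.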