In order to perform activities from demonstrations or descriptions, agents need to distill the essence of the given activity and adapt it to new environments. In this work, we address the problem of environment-aware program generation. Given a visual demonstration or a description of an activity, we generate program sketches representing its essential instructions and propose a model, ResActGraph, to transform these sketches into full programs representing the actions needed to perform the activity under the new environment's constraints.

Activities as programs

We focus on learning human activities composed of sequences of actions and object interactions. We represent each activity as a program: a sequence of instructions specifying the atomic actions to be executed to carry out the activity. Programs can serve as an intermediate representation for agents, providing an interpretable trace of the plan they should follow. They are less ambiguous than natural-language descriptions and can therefore be parsed and executed more easily.
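As a concrete illustration, a short "watch TV" activity can be written as a program with one atomic instruction per step. The `[Action] <object> (id)` syntax below follows the convention used by VirtualHome programs, but the specific steps and object ids here are our own illustrative sketch:

```python
import re

# "Watch TV" written as a program: one atomic instruction per step.
program = [
    "[Walk] <livingroom> (1)",
    "[Walk] <television> (2)",
    "[SwitchOn] <television> (2)",
    "[Walk] <sofa> (3)",
    "[Sit] <sofa> (3)",
    "[Watch] <television> (2)",
]

INSTRUCTION = re.compile(r"\[(\w+)\]\s*<(\w+)>\s*\((\d+)\)")

def parse(line):
    """Split one instruction into an (action, object, id) triple."""
    action, obj, idx = INSTRUCTION.match(line).groups()
    return action, obj, int(idx)

print(parse(program[2]))  # ('SwitchOn', 'television', 2)
```

Because each instruction is this structured, an agent (or a simulator) can parse and execute the program step by step without resolving the ambiguity inherent in free-form descriptions.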

Why environment-aware

Just as an activity can be done in many ways, many programs can represent the same activity. However, for an agent to execute one of them, the program must satisfy the constraints of the presented environment. For example, to sit on a sofa crowded with objects, the program should first include the actions that remove those objects.
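The sofa example can be made concrete: the same sketch ("sit on the sofa") expands into different programs depending on what is currently on the sofa. The helper below is purely illustrative, with a hand-written expansion rule and made-up object ids; in our work this environment-conditioned expansion is what ResActGraph learns from data:

```python
# Illustrative only: the same sketch, "sit on the sofa", expanded into
# different environment-aware programs depending on the sofa's contents.
def expand_sit_on_sofa(objects_on_sofa):
    program = []
    # Environmental constraint: the sofa must be clear before sitting,
    # so removal actions are inserted for every object found on it.
    for i, obj in enumerate(objects_on_sofa, start=10):
        program += [
            f"[Grab] <{obj}> ({i})",
            f"[Put] <{obj}> ({i}) <table> (1)",
        ]
    program.append("[Sit] <sofa> (2)")
    return program

print(expand_sit_on_sofa([]))                  # ['[Sit] <sofa> (2)']
print(expand_sit_on_sofa(["pillow", "book"]))  # removal actions come first
```

An empty sofa yields the one-step program, while a crowded sofa yields grab-and-put actions before the sit instruction, which is exactly the kind of environment dependence a sketch alone cannot capture.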

Method: ResActGraph

In order to generate environment-aware programs, we first translate the demonstration or description into a compact representation called an activity sketch. Next, we represent the target environment as a graph whose nodes encode the environment's objects and their properties. The activity sketch and the environment graph are then used to induce environment-aware programs. To help the model better understand the environment, we track and update the environment graph over time in a way similar to residual learning.
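The residual-style graph update can be sketched as follows. Node states are refined after each step by adding an update term rather than overwriting them, i.e. h_t = h_{t-1} + f(h_{t-1}); the propagation function f below (one linear message-passing step over an adjacency matrix, with random weights) is a stand-in for the model's learned update, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, dim = 4, 8
# Object-relation adjacency of a toy environment graph (symmetric).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
W = rng.normal(scale=0.1, size=(dim, dim))  # message weights (random here)
h = rng.normal(size=(num_nodes, dim))       # initial node states

def residual_update(h, A, W):
    messages = A @ h @ W          # aggregate neighbours' transformed states
    return h + np.tanh(messages)  # residual connection preserves old state

h_next = residual_update(h, A, W)
print(h_next.shape)  # (4, 8)
```

The residual form means the previous environment state is always carried forward, and the network only has to model how each action changes the graph.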

Code and Simulator

To reproduce the full experimental results, we provide the official implementation of ResActGraph and a link to the simulator, VirtualHome.

Code | VirtualHome | Python API


Watch TV

Description: Sit down and watch TV while finding your cute cat on the sofa.

Environment: Cat on sofa, TV off.

Wash dishes with dishwasher

Description: Grab a plate on the table and turn on the dishwasher.

Environment: Plate on the table, dish soap on the dishwasher

Read book

Description: Walk to the home office and grab a book to read.

Environment: Book in the living room

Experimental results

We provide all the learned weights here.

Model | LCS | F1-relation | F1-state | F1 | Executability | Parsability
--- | --- | --- | --- | --- | --- | ---
Nearest Neighbors | 0.127 | 0.019 | 0.288 | 0.041 | - | -
Unaries | 0.372 | 0.160 | 0.142 | 0.159 | 24.8% | 75.3%
Graph | 0.400 | 0.171 | 0.171 | 0.172 | 23.1% | 82.2%
FCActGraph | 0.469 | 0.261 | 0.273 | 0.263 | 33.7% | 88.6%
GRUActGraph | 0.508 | 0.410 | 0.408 | 0.411 | 48.9% | 87.9%
ResActGraph (proposed method) | 0.516 | 0.410 | 0.420 | 0.413 | 49.3% | 85.3%


@InProceedings{Liao_2019_CVPR,
  author = {Liao, Yuan-Hong and Puig, Xavier and Boben, Marko and Torralba, Antonio and Fidler, Sanja},
  title = {Synthesizing Environment-Aware Activities via Activity Sketches},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2019}
}

Reach out to Yuan-Hong Liao for questions, suggestions, and feedback.