Programming by Demonstration for the Web Browser
This project explores programming by demonstration (PBD) in the web browser. We are designing the stack of components needed to build PBD applications that access and manipulate web data, from a deterministic replayer to a tool that synthesizes relational queries from sample outputs.
Ringer: Deterministic Replay of User Actions
One of the crucial building blocks of most end-user programming systems is the ability to deterministically replay a sequence of user actions. On the web, this means replaying events such as clicks, menu selections, and form-filling actions. Although deterministic replay may sound trivial, knowing when to dispatch each event during replay raises several challenges. For a recorder, the execution is not fully observable. (What event did the user wait for before clicking? In what order were the browser and user events interleaved?) For a replayer, the execution is not fully controllable. (The browser does not provide the ability to reorder events arbitrarily.) And the web page itself may differ each time it is reloaded, whether because of redesigns, obfuscation, or updated data. Deterministic browser replay is impossible in the general case, but we have distilled the several sources of non-determinism into an easy-to-understand framework in which we can (i) describe assumptions under which deterministic replay is guaranteed; and (ii) design successful heuristics for situations that fall outside the replayable boundaries. Our record and replay algorithms are implemented in a Chrome extension, Ringer, whose source code can be found here.
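To make the "when to dispatch" problem concrete, here is a minimal sketch in Python. It is not Ringer's algorithm or API. It assumes each recorded event carries a guessed precondition (the hypothetical `ready` field) and that the replayer can only advance the page (the hypothetical `tick` callback) until that precondition holds. The page is simulated as a plain dictionary, and every name in the block is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for browser page state; a real replayer would inspect the DOM.
Page = Dict[str, str]


@dataclass
class RecordedEvent:
    name: str
    ready: Callable[[Page], bool]  # guessed precondition: what the user waited for
    apply: Callable[[Page], None]  # effect of dispatching the event on the page


def replay(trace: List[RecordedEvent], page: Page,
           tick: Callable[[Page], None], max_ticks: int = 50) -> List[str]:
    """Dispatch events in recorded order, waiting for each precondition first."""
    dispatched = []
    for event in trace:
        ticks = 0
        while not event.ready(page):
            if ticks >= max_ticks:
                raise TimeoutError(f"precondition for {event.name!r} never held")
            tick(page)  # let the simulated page make progress
            ticks += 1
        event.apply(page)
        dispatched.append(event.name)
    return dispatched


def make_demo():
    """Simulated page whose search results appear only after three ticks."""
    page: Page = {"clock": "0"}

    def tick(p: Page) -> None:
        n = int(p["clock"]) + 1
        p["clock"] = str(n)
        if n == 3:
            p["results"] = "loaded"

    trace = [
        RecordedEvent("type-query", lambda p: True,
                      lambda p: p.update(query="ringer")),
        RecordedEvent("click-result", lambda p: "results" in p,
                      lambda p: p.update(clicked="yes")),
    ]
    return trace, page, tick
```

In this toy setting, dispatching `click-result` immediately would act on a page without results. Waiting on the precondition sidesteps that, but only if the recorder guessed the right precondition, which is exactly the observability problem described above.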
WebCombine: Synthesis of Web-scraping Scripts from Demonstrations
As more and more data appears on the web, programmers and non-programmers alike are increasingly interested in web harvesting. Although many scraping languages have been developed to help programmers with these tasks, most end-user scraping tools have restricted users to scraping datasets in which all cells of a given row appear on a single list page, with no user interaction required. These techniques are very robust and effective, but unfortunately this model cannot accommodate even a task like scraping friends' cell numbers from Facebook, since getting from the list of friends to a given friend's cell number means clicking through to a profile page, then clicking an "About" link, and only then scraping. Building on top of our record and replay system, Ringer, we have produced WebCombine, a PBD web scraping system that allows users to scrape data hidden behind complicated user interactions. Users demonstrate how to collect the first row of the dataset, and the tool collects the rest. By demonstrating lists (introducing for loops) and recording interactions (adding to loop bodies), WebCombine users can build and run quite complicated scraping programs without writing a single line of code. The WebCombine source code and a video demo are available here.
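The loop structure described above can be sketched as follows. This is an illustration only, not WebCombine's implementation. It assumes a toy in-memory site (the hypothetical `SITE` dictionary, with made-up names and numbers) in place of real pages, a demonstrated list that becomes the `for` loop, and a recorded first-row interaction (the hypothetical `BODY` steps) that becomes the loop body.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical toy site: a list page plus nested pages reached by clicking.
SITE: Dict[str, Any] = {
    "friends": ["Ann", "Bo"],
    "pages": {
        "Ann": {"profile": {"About": {"cell": "555-0100"}}},
        "Bo": {"profile": {"About": {"cell": "555-0101"}}},
    },
}

# Interaction recorded while the user collected the first row:
# open the profile, click "About", then scrape the cell number.
Step = Tuple[str, str]  # (action, target)
BODY: List[Step] = [("click", "profile"), ("click", "About"), ("scrape", "cell")]


def run_body(site: Dict[str, Any], item: str, body: List[Step]) -> List[str]:
    """Replay the recorded steps for one list item and return one output row."""
    node = site["pages"][item]
    row = [item]
    for action, target in body:
        if action == "click":
            node = node[target]       # navigate one level deeper
        elif action == "scrape":
            row.append(node[target])  # extract a cell for this row
        else:
            raise ValueError(f"unknown action: {action}")
    return row


def scrape(site: Dict[str, Any], list_name: str, body: List[Step]) -> List[List[str]]:
    """The demonstrated list becomes the loop; the recording becomes its body."""
    return [run_body(site, item, body) for item in site[list_name]]
```

Calling `scrape(SITE, "friends", BODY)` yields one row per friend, with the cell number reached only after the two clicks. A real tool must also work out which page elements correspond to the "same" link on each friend's page, which this sketch sidesteps by using dictionary keys.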
Quicksilver: Automatic Synthesis of Relational Queries
Relational datasets have become so widespread that even end users such as secretaries and teachers frequently interact with them. However, finding the right query to retrieve target data from complex databases can be very difficult for end users. Many of these users resort to voicing their confusion on Excel help forums, hoping for experts' guidance, but they often receive little help and wait days for useful responses. Quicksilver is a programming-by-demonstration system that derives queries from a small set of sample output rows. (See more details here.) It is designed to be easy and intuitive for users who are not familiar with database theory.
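As a rough illustration of deriving a query from sample output rows, the sketch below enumerates simple candidate queries over a single table and keeps the tightest one whose output contains every sample. This naive enumerate-and-check search is an assumption made for illustration and should not be read as Quicksilver's algorithm. The table, its column names, and the restriction to one projection plus at most one equality filter are all hypothetical.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]
Filter = Optional[Tuple[str, str]]          # (column, value), or None for no filter
Query = Tuple[Tuple[str, ...], Filter]      # (projected columns, filter)

# Hypothetical single-table database.
TABLE: List[Row] = [
    {"name": "Ann", "dept": "Math", "room": "101"},
    {"name": "Bo", "dept": "Math", "room": "102"},
    {"name": "Cy", "dept": "Art", "room": "201"},
]


def run_query(table: List[Row], query: Query) -> List[Tuple[str, ...]]:
    """Evaluate a projection with an optional equality filter."""
    columns, flt = query
    return [tuple(r[c] for c in columns) for r in table
            if flt is None or r[flt[0]] == flt[1]]


def synthesize(table: List[Row],
               samples: Sequence[Tuple[str, ...]]) -> Optional[Query]:
    """Return the candidate with the fewest output rows that covers all samples."""
    columns = list(table[0].keys())
    filters: List[Filter] = [None]
    for c in columns:
        filters += [(c, v) for v in sorted({r[c] for r in table})]

    candidates = []
    for projection in permutations(columns, len(samples[0])):
        for flt in filters:
            output = run_query(table, (projection, flt))
            if all(s in output for s in samples):
                candidates.append(((projection, flt), len(output)))

    if not candidates:
        return None
    # Prefer the tightest query: the one that returns the fewest extra rows.
    return min(candidates, key=lambda c: c[1])[0]
```

Given the sample rows `("Ann", "101")` and `("Bo", "102")`, the search settles on projecting `name` and `room` with the filter `dept = "Math"`, since that query returns exactly the two samples while the unfiltered projection also returns a third row. Preferring the smallest output is one possible ranking heuristic among many.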