Date/Time: Friday, September 18th, 2pm (Brasília Time) Where: https://meet.google.com/cxf-zrso-uxv
Human language is ambiguous, with intended meanings recovered via pragmatic reasoning in context. Such reliance on context is essential for the efficiency of human communication. Programming languages, in stark contrast, are defined by unambiguous grammars. In this work, we aim to make programming languages more concise by allowing programmers to utilize a controlled level of ambiguity. Specifically, we allow single-character abbreviations for common keywords and identifiers. Our system first proposes a set of strings that can be abbreviated by the user. Using only 100 abbreviations, we observe that a large dataset of Python code can be compressed by 15%, a number that can be improved even further by specializing the abbreviations to a particular code base. We then use a contextualized sequence-to-sequence model to rank potential expansions of inputs that include abbreviations. In an offline reconstruction task our model achieves accuracies ranging from 92% to 99%, depending on the programming language and user settings. The model is small enough to run on a commodity CPU in real-time, and to fine-tune to a new code base in a few CPU hours. We evaluate the usability of our system in a user study, integrating it in Microsoft VSCode, a popular code text editor. We observe that our system performs well and is complementary to traditional autocomplete features.