Open Mechanized Foundations for JavaScript Regular Expressions

Category

Contribute

Institutions

EPFL

Data type

4D STEM data

Field

Materials Science

Researchers

Barrière, Aurèle

Abstract

One of the main forms of ORD in the programming languages research community is two-sided mechanized language specifications (the definition in a proof assistant of the semantics of a language). These mechanized specifications have many benefits: they can be extracted to an executable reference implementation, and used by both implementers (to verify compilers, interpreters and optimizations) and by users (to guarantee the correctness of programs in that programming language).

We propose to contribute the first open, two-sided, mechanized specification of JavaScript regular expressions (regexes). The lack of such mechanization is harming the research community: previous work has mechanized other parts of the JavaScript language but not regexes, and as a consequence researchers use paper-only semantics for JavaScript regexes. These paper semantics are neither executable nor reusable and often incorrect. We will translate to Coq the part of the open ECMAScript standard that describes JavaScript regexes; extract this mechanization to a reference implementation in OCaml; and validate our mechanization with Coq proofs.

Our project will provide a solid foundation for JavaScript regex research to build upon our Coq mechanization, including proving the correctness of regex optimizations, detecting regexes with security issues (ReDOS), or proving the correctness of entire regex engines. Our project will allow the open JavaScript community to test their regex engines.

Scroll to Top