Reinforcement Learning for Guiding the E Theorem Prover
DOI:
https://doi.org/10.32473/flairs.36.133334Keywords:
ATP, eprover, RLAbstract
Automated Theorem Proving (ATP) systems search for a
proof in a rapidly growing space of possibilities. Heuristics
have a profound impact on search, and ATP systems make
heavy use of heuristics. This work uses reinforcement learn-
ing to learn a metaheuristic that decides which heuristic to use
at each step of a proof search in the E ATP system. Proximal
policy optimization is used to dynamically select a heuristic
from a fixed set, based on the current state of E. The approach
is evaluated on its ability to reduce the number of inference
steps used in successful proof searches, as an indicator of in-
telligent search.
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2023 Jack McKeown, Geoff Sutcliffe

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.