Reinforcement Learning for Guiding the E Theorem Prover

Authors

DOI:

https://doi.org/10.32473/flairs.36.133334

Keywords:

ATP, eprover, RL

Abstract

Automated Theorem Proving (ATP) systems search for a
proof in a rapidly growing space of possibilities. Heuristics
have a profound impact on search, and ATP systems make
heavy use of heuristics. This work uses reinforcement learn-
ing to learn a metaheuristic that decides which heuristic to use
at each step of a proof search in the E ATP system. Proximal
policy optimization is used to dynamically select a heuristic
from a fixed set, based on the current state of E. The approach
is evaluated on its ability to reduce the number of inference
steps used in successful proof searches, as an indicator of in-
telligent search.

Downloads

Published

08-05-2023

How to Cite

McKeown, J., & Sutcliffe, G. (2023). Reinforcement Learning for Guiding the E Theorem Prover. The International FLAIRS Conference Proceedings, 36(1). https://doi.org/10.32473/flairs.36.133334

Issue

Section

Main Track Proceedings