Feasibility of Tiny Recursion Models for the Traveling Salesman Problem
Learned Insertion and 2-opt Refinement
DOI:
https://doi.org/10.32473/flairs.39.1.141582Abstract
Tiny Recursion Models (TRMs) have recently shown strong performance on hard, highly structured reasoning tasks by repeatedly refining an internal solution representation with a small shared network. In this paper we investigate whether the same recursive refinement principle can be transferred from puzzle-style prediction to combinatorial optimization. We focus on the Euclidean Traveling Salesman Problem (TSP) and identify a key obstacle for naive TRM formulations: unconstrained iterative rewrites easily violate feasibility (duplicate or missing cities) and can collapse into short subtours that are difficult to escape during refinement. To address this, we cast TSP solving as iterative constraint satisfaction in which every intermediate state is a valid single-cycle tour. We train two supervised TRM policies from synthetic mid-trajectory data generated by classical teachers: (i) a constructive insertion policy that selects a city and an insertion edge in the current partial tour, and (ii) a learned 2-opt policy that proposes improving swaps and predicts a STOP action. Experiments on small random Euclidean instances show stable, non-degenerate refinement dynamics: the learned 2-opt policy both improves tours and learns to terminate, and the full insertion + 2-opt pipeline consistently outperforms an insertion-only ablation. While optimality gaps remain nontrivial in this proof-of-concept regime, performance matches the general range of lightweight baselines.
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2026 Alexander V Mantzaris

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.