OncoMark

A Two-Stage Gated Framework for Cancer Hallmark Detection from Biomedical Text

DOI:

https://doi.org/10.32473/flairs.39.1.141834

Abstract

Automatic identification of cancer hallmarks from biomedical text is critical for large-scale literature mining but remains challenging due to extreme class imbalance, implicit biological language, and the prevalence of sentences expressing no hallmark. Most existing methods address this task using single-stage multi-class or multi-label models, which conflate hallmark presence detection with hallmark type prediction and often suffer from high false-positive rates. We introduce OncoMark, a two-stage gated framework that explicitly decouples these decisions. A binary gate first determines whether a sentence expresses any cancer hallmark. Only hallmark-positive sentences are then passed to expert models that perform either multi-label or multi-class hallmark classification. The multi-label expert incorporates label-wise attention pooling and an asymmetric loss to address severe imbalance, with threshold calibration optimised end-to-end. Experiments on the BigBio Hallmarks of Cancer dataset show that OncoMark consistently outperforms strong baselines, substantially reducing false positives and improving performance on rare hallmark categories. Code is publicly available at https://github.com/Adrikahaha/OncoMark.
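
The two-stage routing described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the OncoMark repository: the function names (`gated_predict`, `toy_gate`, `toy_expert`), the keyword-based scorers, the label strings, and all threshold values are hypothetical stand-ins for trained models and calibrated thresholds. The sketch only shows the control flow, in which the expert is consulted solely for sentences the binary gate accepts.

```python
from typing import Callable, Dict, List, Optional

Scores = Dict[str, float]


def gated_predict(
    sentence: str,
    gate_score: Callable[[str], float],
    expert_scores: Callable[[str], Scores],
    gate_threshold: float = 0.5,          # assumed value, not from the paper
    label_thresholds: Optional[Scores] = None,
    default_threshold: float = 0.5,       # assumed value, not from the paper
) -> List[str]:
    """Return predicted hallmark labels, or [] if the gate rejects the sentence."""
    # Stage 1: binary gate decides whether any hallmark is expressed.
    if gate_score(sentence) < gate_threshold:
        return []
    # Stage 2: multi-label expert scores each hallmark; per-label thresholds
    # decide which labels are emitted.
    thresholds = label_thresholds or {}
    scores = expert_scores(sentence)
    return sorted(
        label
        for label, p in scores.items()
        if p >= thresholds.get(label, default_threshold)
    )


# Toy stand-ins for trained models (purely illustrative).
def toy_gate(sentence: str) -> float:
    keywords = ("apoptosis", "angiogenesis", "metastasis")
    return 0.9 if any(k in sentence.lower() for k in keywords) else 0.1


def toy_expert(sentence: str) -> Scores:
    s = sentence.lower()
    return {
        "resisting cell death": 0.8 if "apoptosis" in s else 0.05,
        "inducing angiogenesis": 0.4 if "angiogenesis" in s else 0.05,
    }


if __name__ == "__main__":
    text = "Tumor cells evade apoptosis and induce angiogenesis."
    # A lower threshold for one label mimics per-label calibration.
    print(gated_predict(text, toy_gate, toy_expert,
                        label_thresholds={"inducing angiogenesis": 0.3}))
    print(gated_predict("Patients were recruited from two hospitals.",
                        toy_gate, toy_expert))
```

Because the gate returns early, a sentence it rejects can never produce a hallmark label, which is one way a gated design could reduce false positives relative to a single-stage classifier.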

Published

06-05-2026

How to Cite

Zafor, A., Dip, S. A., & Zhang, L. (2026). OncoMark: A Two-Stage Gated Framework for Cancer Hallmark Detection from Biomedical Text. The International FLAIRS Conference Proceedings, 39(1). https://doi.org/10.32473/flairs.39.1.141834

Section

Special Track: AI in Healthcare Informatics