Dig Here! Extracting and Using Knowledge from Financial Audit Reports





Financial audits establish trust in an organization’s governance and processes, but they are time-consuming and knowledge-intensive. To make financial audits more effective, we address the task of generating audit suggestions that help auditors focus their investigations. Specifically, we present NLP techniques that extract hidden knowledge from a corpus of past financial audit reports of many companies and use it to generate audit suggestions. The extracted knowledge consists of a set of automatically identified sentences containing adverse remarks, the financial variables mentioned in each sentence, and automatically assigned XBRL categories for those variables; XBRL is a standardized taxonomy in the financial domain. In the absence of suitable labeled data, we adopt a weak supervision approach: we design a set of high-precision linguistic rules to identify adverse remark sentences, use them to create automatically labeled training data, and train BERT-based and other classifiers to identify such sentences. We then present novel techniques, each either unsupervised or zero-shot, that assign zero, one, or more XBRL categories to any given adverse remark sentence. We evaluate the proposed approaches against competitive baselines on a large corpus of real-life financial statements and audit reports. Given a company’s financial statements that have already been identified as suspicious, and a subset of the financial variables in them that contribute to the suspiciousness, we match these variables against the extracted knowledge base and identify aligned adverse remarks that help the auditor focus on specific directions for further investigation.
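The pipeline described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue patterns, the variable lexicon, the category labels, and all function names are hypothetical, chosen only to show how rule-based weak labeling and knowledge-base matching might fit together. The classifier training step and the unsupervised or zero-shot XBRL assignment are omitted; a simple lexicon lookup stands in for the latter.

```python
import re
from dataclasses import dataclass

# Hypothetical high-precision cue patterns. The actual linguistic rules
# are not given in the abstract; these are illustrative assumptions.
ADVERSE_PATTERNS = [
    re.compile(r"\b(?:has|have|had)\s+not\s+(?:been\s+)?"
               r"(?:provided|disclosed|reconciled|paid|deposited)\b", re.I),
    re.compile(r"\b(?:material\s+weakness|overstated|understated)\b", re.I),
    re.compile(r"\bwe\s+are\s+unable\s+to\b", re.I),
]

# Hypothetical lexicon mapping financial-variable mentions to category
# labels. The labels are placeholders, not real XBRL element names.
VARIABLE_CATEGORIES = {
    "inventory": "Inventories",
    "inventories": "Inventories",
    "trade receivables": "TradeReceivables",
    "statutory dues": "StatutoryDues",
}


def weak_label(sentence: str) -> bool:
    """Return True if any cue pattern fires (weak 'adverse remark' label)."""
    return any(p.search(sentence) for p in ADVERSE_PATTERNS)


def find_categories(sentence: str) -> list[str]:
    """Return the sorted categories of all lexicon terms found in the sentence."""
    lowered = sentence.lower()
    found = {
        category
        for term, category in VARIABLE_CATEGORIES.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }
    return sorted(found)


@dataclass
class Remark:
    sentence: str
    categories: list[str]


def build_knowledge_base(sentences: list[str]) -> list[Remark]:
    """Keep weakly labeled adverse sentences together with their categories."""
    return [Remark(s, find_categories(s)) for s in sentences if weak_label(s)]


def suggest(kb: list[Remark], suspicious_categories: list[str]) -> list[str]:
    """Return adverse remarks that share a category with the suspicious variables."""
    wanted = set(suspicious_categories)
    return [r.sentence for r in kb if wanted & set(r.categories)]
```

In this sketch, a sentence such as "The company has not reconciled its inventory records." would be weakly labeled as adverse and filed under the placeholder category `Inventories`, so it would be returned when a suspicious statement flags that category. In the approach described in the abstract, the rule output serves as training data for classifiers rather than as the final labels.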




How to Cite

Pawar, S., Apte, M., Pawde, A., Vaishampayan, S., Palshikar, G., & Shinde, A. (2023). Dig Here! Extracting and Using Knowledge from Financial Audit Reports. The International FLAIRS Conference Proceedings, 36(1). https://doi.org/10.32473/flairs.36.133265



Special Track: Applied Natural Language Processing