3.1. Methodological Framework
The preliminary stage of this study implements the methodological framework illustrated in Figure 1. The process begins with the raw financial dataset, which contains a diverse range of features. From this original data, we identify the financial variables relevant to the analysis. A feature selection process then determines the most significant variables, which reduces dimensionality and ensures that only the most impactful variables are incorporated into the models.
Feature selection yields a refined set of financial variables. These variables are used in several classification algorithms: (1) Logistic Regression, which is designed for binary or multinomial classification; (2) KNN (K-Nearest Neighbors), a non-parametric method that classifies based on proximity to neighbors; (3) SVM (Support Vector Machine), a supervised learning model that identifies the hyperplane that best separates the classes; and (4) Decision Tree, a tree-structured algorithm used for both classification and regression by making decisions based on feature splits. Additionally, Multiple Linear Regression is employed to predict continuous outcomes using the selected variables. The outputs of these algorithms are then compared on four performance metrics: recall (sensitivity), accuracy, precision, and F1-Score.
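The four performance metrics can be computed directly from confusion-matrix counts. The following is a minimal sketch, assuming binary labels in which 1 marks a fraudulent observation and 0 a non-fraudulent one; the function name `classification_metrics` and the sample labels are illustrative and are not taken from the study.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for binary labels (1 = fraud)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-fraud correctly passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-fraud wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Computing all four metrics from the same counts makes the comparison across classifiers consistent, since each model is scored on identical definitions.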
3.2. Data
The study employs purposive sampling to select samples based on the following criteria: (1) inclusion of companies listed on the Indonesia Stock Exchange (IDX) during the period from 2015 to 2023, ensuring that the selected firms are actively traded; (2) exclusion of financial sector companies to focus exclusively on non-financial sector firms, as financial institutions possess distinct accounting practices and regulatory requirements; and (3) inclusion of companies with complete and accessible financial statements from 31 December 2015 to 31 December 2023, to ensure a robust and comprehensive dataset. The deliberate exclusion of financial sector companies, including banks and insurance firms, is grounded in the recognition that these entities operate under unique accounting standards and regulatory frameworks, which introduce complexities that may confound the analysis of financial statement fraud [30]. By focusing on non-financial sector companies, this study aims to provide a more targeted examination of financial statement fraud, thereby offering valuable insights into fraud detection mechanisms within the Indonesian business environment.
The initial dataset comprised 8190 firm-year observations of companies listed on the IDX. Financial institutions such as banks and insurance companies were excluded because their unique accounting practices and regulatory requirements could confound the analysis. This exclusion removed 1010 firm-year observations, leaving 7180 firm-year observations. Firm-year observations with incomplete financial data were then excluded to ensure the integrity and accuracy of the analysis, which removed a further 4807 observations. The final refined dataset consisted of 2373 firm-year observations, representing 593 non-financial sector companies across various industries: 28 in healthcare, 86 in basic materials, 24 in transportation and logistics, 226 in technology, 104 in consumer non-cyclical, 46 in industrials, 67 in energy, 109 in consumer cyclical, 44 in infrastructure, and 59 in properties and real estate.
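The sample attrition described above can be traced step by step. The short sketch below uses only the observation counts reported in the text; the variable names are illustrative.

```python
# Firm-year observation counts as reported in the text.
initial_obs = 8190           # all IDX-listed firm-years
financial_sector_obs = 1010  # banks, insurers, and other financial firms
incomplete_obs = 4807        # firm-years with incomplete financial data

# Step 1: drop financial sector firm-years.
after_sector_filter = initial_obs - financial_sector_obs

# Step 2: drop firm-years with incomplete data.
final_obs = after_sector_filter - incomplete_obs

print(after_sector_filter)  # 7180
print(final_obs)            # 2373
```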
3.3. Financial Ratio Variables
The accounting data used in this study are drawn from the balance sheet and include total debt (TD), total equity (TE), total assets (TAs), long-term debt (LTD), fixed assets (FAs), accounts payable (AP), inventory (Inv), current assets (CAs), current liabilities (CLs), cash, cash equivalents, and short-term investments (STIs). These metrics offer a comprehensive overview of the company’s financial position. TD, TE, and LTD are crucial for evaluating leverage and financial risk, while TAs and FAs assess the company’s scale and resource base. AP, Inv, CAs, and CLs are important for understanding short-term liquidity (STL) and operational efficiency. Lastly, cash, cash equivalents, and STIs provide insights into the company’s liquidity and cash management strategies. Analyzing financial statement fraud in a developing economy such as Indonesia offers the potential to uncover unique challenges and patterns that may differ from those observed in developed economies, given the varying economic factors that impact fraud detection methods.
The Current Ratio (Equation (1)) could signal potential fraud if it shows an unexplained increase, possibly due to inflated current assets or inventory manipulations [27]. Similarly, the Quick Ratio, computed as (cash + cash equivalents + short-term investments + accounts receivable) divided by current liabilities (Equation (2)), might indicate fraudulent activity if it demonstrates unusual increases, potentially stemming from inventory misstatements or misrepresented liabilities [18]. Activity ratios, including Accounts Receivable Turnover and Inventory Turnover, are also critical. For instance, the Accounts Receivable Turnover, expressed as annual net sales divided by average accounts receivable (Equation (3)), aids in identifying anomalies like fictitious sales or delayed bad debt recognition [24]. Additionally, other significant metrics such as days outstanding in accounts receivable (Equation (4)) and the Average Age of Inventory can reveal inefficiencies or potential fraud [31]. Similarly, Inventory Turnover, calculated as the Cost of Goods Sold divided by average inventory (Equation (5)), can raise red flags if there are unexpected increases, indicating potential revenue recognition issues or inventory manipulations [26].
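The liquidity and activity ratios described in this paragraph can be sketched as follows. The ratio of (cash + cash equivalents + short-term investments + accounts receivable) to current liabilities, Accounts Receivable Turnover, and Inventory Turnover follow the definitions given in the text; the Current Ratio (current assets over current liabilities) and days outstanding (365 divided by receivables turnover) use standard textbook forms, which are assumptions here because the original equations are not reproduced. All field names and figures are hypothetical.

```python
DAYS_PER_YEAR = 365  # assumption: a 365-day year for day-count ratios


def safe_div(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero or missing."""
    return numerator / denominator if denominator else None


def liquidity_activity_ratios(fs):
    """Compute liquidity and activity ratios from a dict of statement items."""
    avg_ar = (fs["ar_begin"] + fs["ar_end"]) / 2     # average accounts receivable
    avg_inv = (fs["inv_begin"] + fs["inv_end"]) / 2  # average inventory
    quick_assets = (fs["cash"] + fs["cash_equivalents"]
                    + fs["short_term_investments"] + fs["ar_end"])
    ar_turnover = safe_div(fs["net_sales"], avg_ar)
    return {
        "current_ratio": safe_div(fs["current_assets"], fs["current_liabilities"]),
        "quick_ratio": safe_div(quick_assets, fs["current_liabilities"]),
        "ar_turnover": ar_turnover,
        "days_outstanding_ar": safe_div(DAYS_PER_YEAR, ar_turnover),
        "inventory_turnover": safe_div(fs["cogs"], avg_inv),
    }


# Hypothetical firm-year, for illustration only.
firm_year = {
    "cash": 100, "cash_equivalents": 50, "short_term_investments": 50,
    "ar_begin": 180, "ar_end": 220,
    "inv_begin": 280, "inv_end": 320,
    "current_assets": 1000, "current_liabilities": 500,
    "net_sales": 2000, "cogs": 1200,
}
liquidity_ratios = liquidity_activity_ratios(firm_year)
print(liquidity_ratios)
```

Returning `None` for a zero denominator keeps undefined ratios distinguishable from genuine zeros, which matters when the ratios are later used as model features.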
The Average Age of Inventory, or inventory days, measures how long it takes for a company to sell its entire inventory (Equation (6)). A higher value can indicate slower sales, potential issues with obsolete stock, or overstocking. In fraud detection, an unexplained increase in this metric could raise concerns about inventory valuation or manipulation [31]. The Days Payables Outstanding (DPO) represents the average time a company takes to pay its suppliers (Equation (7)). A sudden increase in DPO could suggest intentional payment delays, inflating cash balances, and distorting financial ratios, signaling potential manipulation [24]. Total asset turnover evaluates how efficiently a company uses its assets to generate revenue (Equation (8)). Significant changes, not explained by operational shifts, may suggest fraudulent activities affecting revenues or asset values [32]. Fixed Asset Turnover looks at the company’s use of fixed assets to generate sales (Equation (9)). A declining ratio might indicate underutilization or potential overvaluation of assets, signaling a need for further investigation [27]. Finally, the intangible asset turnover ratio assesses how effectively a company uses its intangible assets, like patents and trademarks, to generate revenue (Equation (10)). Unexplained changes may indicate shifts in the value or utilization of these assets, warranting closer scrutiny [31].
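Because several of these indicators matter mainly when they change without explanation, one way to operationalize them is to compare consecutive years. The sketch below assumes standard textbook definitions (inventory days = 365 × average inventory / COGS; DPO = 365 × average accounts payable / COGS; turnover = net sales / average asset base), since the original equations are not reproduced here. The 25% threshold, the function names, and the figures are hypothetical choices, not values from the study.

```python
DAYS_PER_YEAR = 365  # assumption: a 365-day year


def efficiency_ratios(year):
    """Compute day-count and turnover ratios; assumes non-zero denominators."""
    return {
        "inventory_days": DAYS_PER_YEAR * year["avg_inventory"] / year["cogs"],
        "dpo": DAYS_PER_YEAR * year["avg_accounts_payable"] / year["cogs"],
        "total_asset_turnover": year["net_sales"] / year["avg_total_assets"],
        "fixed_asset_turnover": year["net_sales"] / year["avg_fixed_assets"],
    }


def flag_large_changes(previous, current, threshold=0.25):
    """Return ratios whose relative year-over-year change exceeds the threshold."""
    flagged = {}
    for name, prev_value in previous.items():
        if prev_value == 0:
            continue  # relative change is undefined for a zero base
        change = (current[name] - prev_value) / abs(prev_value)
        if abs(change) > threshold:
            flagged[name] = change
    return flagged


# Hypothetical figures for two consecutive years.
year_1 = {"cogs": 1000, "avg_inventory": 200, "avg_accounts_payable": 100,
          "net_sales": 2000, "avg_total_assets": 4000, "avg_fixed_assets": 1000}
year_2 = {"cogs": 1000, "avg_inventory": 300, "avg_accounts_payable": 110,
          "net_sales": 2000, "avg_total_assets": 4000, "avg_fixed_assets": 1000}

flagged = flag_large_changes(efficiency_ratios(year_1), efficiency_ratios(year_2))
print(flagged)
```

In this example only inventory days moves by more than the threshold; a flag of this kind is a prompt for further investigation, not evidence of manipulation on its own.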
The related party sales ratio evaluates the percentage of sales made to entities or individuals connected to the company, such as affiliates or key personnel (Equation (11)). A high ratio may raise concerns about the independence of these transactions and potential conflicts of interest [24]. The Long-Term Debt to Equity ratio (Equation (12)) is vital in identifying financial distress or efforts to conceal issues through increased borrowing [27]. The Debt to Assets ratio further evaluates a company’s financial leverage and risk to identify fraudulent financial statements (Equation (13)). The Equity to Assets ratio assesses the proportion of a company’s assets funded by equity (Equation (14)). A decreasing Equity to Assets ratio could indicate increased financial leverage and a growing dependence on debt for financing. The Times Interest Earned ratio measures a company’s capacity to meet its interest payments (Equation (15)).
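The related party and leverage ratios in this paragraph can be sketched in the same way. The definitions below are standard forms and are assumptions, since the original equations are not reproduced here; in particular, Times Interest Earned is taken as earnings before interest and taxes (EBIT) divided by interest expense. Field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LeverageInputs:
    """Statement items needed for the leverage ratios (hypothetical field names)."""
    related_party_sales: float
    net_sales: float
    long_term_debt: float
    total_debt: float
    total_equity: float
    total_assets: float
    ebit: float
    interest_expense: float


def leverage_ratios(x: LeverageInputs) -> dict:
    """Compute related party and leverage ratios; assumes non-zero denominators."""
    return {
        "related_party_sales_ratio": x.related_party_sales / x.net_sales,
        "ltd_to_equity": x.long_term_debt / x.total_equity,
        "debt_to_assets": x.total_debt / x.total_assets,
        "equity_to_assets": x.total_equity / x.total_assets,
        "times_interest_earned": x.ebit / x.interest_expense,
    }


# Hypothetical firm-year, for illustration only.
inputs = LeverageInputs(
    related_party_sales=200, net_sales=2000,
    long_term_debt=300, total_debt=600,
    total_equity=400, total_assets=1000,
    ebit=150, interest_expense=30,
)
leverage = leverage_ratios(inputs)
print(leverage)
```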
Profitability ratios, such as the Gross Profit Margin (GPM; Equation (16)), are widely used in financial analysis. The GPM, expressed as a percentage (%), represents the portion of revenue that exceeds the Cost of Goods Sold (COGS) and serves as a key indicator of a company’s profitability from its core operations. Research has shown that the Operating Profit Margin ratio (Equation (17)) is effective in identifying fraudulent financial statements [32], and several studies have used it as an indicator for detecting such statements [27]. A decreasing Operating Profit Margin may suggest difficulties in managing operating expenses or potential irregularities in revenue recognition, signaling possible financial manipulation. Similarly, the Net Income Ratio is utilized for fraud detection [18]. A declining Net Income Ratio could indicate difficulties in managing expenses or potential problems with recognizing revenue, prompting further investigation into possible manipulation of financial statements. Return on Equity provides valuable insights into financial health and operational efficiency (Equation (19)). The Return on Assets ratio is also used to detect fraudulent financial statements (Equation (20)). Anomalies in these ratios may raise red flags for revenue or expense manipulations, necessitating a thorough investigation into potential financial statement fraud. Collectively, these ratios form a comprehensive framework for identifying irregularities and possible fraud in financial statements, each playing a unique role in the detection process.
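The profitability ratios can be illustrated with a short sketch. The Gross Profit Margin follows the description in the text (the share of revenue exceeding COGS, as a percentage); the remaining definitions (operating income, or net income, divided by revenue, equity, or total assets) are standard forms assumed here because the original equations are not reproduced. Names and figures are hypothetical.

```python
def pct(numerator, denominator):
    """Return numerator / denominator as a percentage; assumes a non-zero denominator."""
    return 100 * numerator / denominator


def profitability_ratios(revenue, cogs, operating_income, net_income,
                         total_equity, total_assets):
    """Compute profitability ratios, all expressed as percentages."""
    return {
        "gross_profit_margin": pct(revenue - cogs, revenue),
        "operating_profit_margin": pct(operating_income, revenue),
        "net_income_ratio": pct(net_income, revenue),
        "return_on_equity": pct(net_income, total_equity),
        "return_on_assets": pct(net_income, total_assets),
    }


# Hypothetical firm-year, for illustration only.
profitability = profitability_ratios(
    revenue=2000, cogs=1200, operating_income=300,
    net_income=180, total_equity=900, total_assets=2400,
)
print(profitability)
```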
3.5. Multiple Linear Regression
Multiple Linear Regression (MLR) is a statistical method widely used in detecting financial statement fraud, particularly for predicting a continuous outcome such as the severity of financial discrepancies. Unlike logistic regression, which handles binary classification, MLR provides quantitative estimates and insights into the relationships between independent variables and the outcome. Studies such as [8,9] have employed MLR, using dependent variables like earnings management and F-scores. The interpretability of MLR aids financial analysts in understanding complex patterns of fraudulent activities within financial statements. The general form of the multiple linear regression equation is given in Equation (28):
Y = β0 + β1X1 + β2X2 + … + βkXk + ε (28)
where the terms are defined as follows:
Y = the dependent variable;
β0 = intercept;
β1, β2, …, βk = the coefficients that represent the change in Y for a one-unit change in the corresponding independent variable X1, X2, …, Xk;
X1, X2, …, Xk = the independent variables;
ε = the error term, representing the unobserved factors that affect Y but are not included in the model.
The equation expresses the linear relationship between the dependent variable Y and the independent variables X1, X2, …, Xk. The goal in multiple linear regression is to estimate the intercept β0 and the coefficients β1, β2, …, βk that best fit the observed data.
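To make the estimation step concrete, the sketch below fits the intercept and coefficients by ordinary least squares (OLS), solving the normal equations (X'X)b = X'y with Gauss-Jordan elimination. OLS is an assumption here, as the text does not name the estimation method; the function names `fit_mlr` and `predict` and the data are hypothetical.

```python
def fit_mlr(rows, y):
    """Estimate [b0, b1, ..., bk] by OLS via the normal equations (X'X) b = X'y."""
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    n_coef = len(X[0])

    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n_coef)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(n_coef)
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n_coef):
        pivot_row = max(range(col, n_coef), key=lambda r: abs(A[r][col]))
        if abs(A[pivot_row][col]) < 1e-12:
            raise ValueError("X'X is singular; predictors may be collinear")
        A[col], A[pivot_row] = A[pivot_row], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(n_coef):
            if r != col:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]

    # The last column now holds the solution.
    return [A[i][n_coef] for i in range(n_coef)]


def predict(coefs, row):
    """Return b0 + b1*x1 + ... + bk*xk for one observation."""
    return coefs[0] + sum(b * x for b, x in zip(coefs[1:], row))


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
coefs = fit_mlr(rows, y)
print([round(c, 6) for c in coefs])
```

Because the sample data contain no error term, the fitted values recover the generating coefficients up to rounding; with real financial data, ε is non-zero and the estimates minimize the sum of squared residuals instead.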