3.1. Practical Validation of Hardware Data Extraction
All 18 devices entered the study because conventional acquisition had already failed or was clearly infeasible. The reasons varied: some would not power on at all, others powered on but could not boot, and several exhibited motherboard damage severe enough to rule out any software-based approach from the outset. In each case, the workflow started at the hardware level, assessing the physical state of the board and determining whether low-level memory access was possible before any extraction was attempted. For each device, visual inspection was performed first, after which the SIM tray was removed and the enclosure was heated in a controlled manner. Heating was applied around the perimeter using circular motions at a nozzle distance of 8–12 cm, with an air temperature of 280–350 °C and a surface temperature of 70–90 °C monitored continuously with an infrared thermometer. After the adhesive softened, the rear cover was removed and the battery disconnected, providing safe access to the motherboard. ISP pad identification followed immediately (
Figure 4).
After the board was exposed and cleaned, ISP communication pads were located under the microscope (
Figure 5). Their physical condition was assessed against the diagnostic criteria defined in
Section 2.2 to determine whether a direct programmer connection was viable without chip removal.
Where ISP pads were accessible and electrically intact, the EasyJTAG programmer was connected to the corresponding signal lines. UFPI software (version 2.14) was configured in read-only acquisition mode before any communication was initiated, ensuring that no write operations could occur during the dump session (
Figure 6).
Connecting at the ISP level allowed memory to be read without removing the chip in cases where signal lines and power circuits remained intact, preserving the board in its original state for subsequent examination if needed. Each stage of the disassembly and connection process—cover removal, battery isolation, board extraction, pad identification, and programmer attachment—was documented with timestamps and photographs to support procedural reproducibility and chain-of-custody traceability (
Figure 7).
The documented sequence makes clear that what might appear to be a preparatory phase—board inspection, disassembly, and pad identification—is in fact a complete acquisition path in its own right when ISP contact proves stable. Several of the 18 devices were fully imaged via ISP without requiring chip removal. For the remaining devices, where ISP contact was not achievable, Chip-Off was performed following the validated thermal profiles described in
Section 2.3. Binary memory dumps ranging from 32 GB to 256 GB were successfully acquired across all 18 devices. In practice, however, read success depended not only on connection accuracy but also on the physical condition of the memory package and its signal lines—a finding that reinforces the importance of the pre-diagnostic stage. ISP thus functioned as a critical bridge between the hardware extraction stage and the subsequent AI-assisted analysis of the acquired memory image.
3.2. Forensic Integrity and Repeatability Validation
The technical success of an extraction is a necessary but not sufficient condition for forensic use—the resulting image must also be demonstrably unaltered and producible under equivalent conditions. Given that damaged devices introduce risks of partial reads and electrical interruptions, the workflow incorporated specific integrity controls at each stage rather than relying on post hoc verification alone.
Every binary image produced, whether via ISP or Chip-Off, was hashed with both SHA-256 and MD5 immediately after acquisition using dc3dd (version 7.2), which computes hashes in a single read pass. The same critical regions (boot partition, EXT_CSD, LUN0 user data area) were then reread and re-hashed to verify that repeated access introduced no modification. Of the 18 acquisitions, 15 produced matching hashes on the first two passes; two required a third read, after which all hashes matched; and one device yielded a stable but unrepeatable hash due to progressive read degradation, which was documented separately with a note that affected sectors may contain read-induced artifacts.
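The single-pass dual hashing and region re-hashing described above can be illustrated with a minimal Python sketch. It is not the dc3dd implementation, only an illustration of the principle using the standard hashlib module; the function names, the chunk size, and the region offsets passed by the caller are assumptions introduced here.

```python
import hashlib


def hash_image(path, chunk_size=1 << 20):
    """Return (md5_hex, sha256_hex) for a dump file, reading it only once."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            # Both digests are updated from the same buffer, so the
            # image is traversed a single time.
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def hash_region(path, offset, length, chunk_size=1 << 20):
    """Return the SHA-256 of `length` bytes starting at `offset` (hypothetical helper)."""
    sha256 = hashlib.sha256()
    remaining = length
    with open(path, "rb") as fh:
        fh.seek(offset)
        while remaining > 0:
            chunk = fh.read(min(chunk_size, remaining))
            if not chunk:
                break  # region extends past end of file
            sha256.update(chunk)
            remaining -= len(chunk)
    return sha256.hexdigest()
```

Comparing the value returned by `hash_region` for the same offsets across two reads is one way to express the re-hash check on the critical regions; the actual offsets depend on the partition layout of each device.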
All physical work was performed on an ESD-protected bench with grounded mats and wrist straps throughout. Every step—thermal exposure, chip handling, ISP soldering, memory readout, and dump generation—was entered into a timestamped acquisition log with an operator identifier, so that the full sequence of physical actions could be reconstructed from the record. Logged parameters included the extraction method, thermal profile applied, heating duration, observed board condition at each stage, programmer configuration, and hash verification outcomes.
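One possible machine-readable form of such a log entry is sketched below. The field names follow the logged parameters listed above, but the schema itself, the class name, and the JSON-lines serialization are assumptions made for illustration rather than the format used in the study.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


def _utc_now():
    # ISO 8601 timestamp in UTC, generated when the entry is created
    return datetime.now(timezone.utc).isoformat()


@dataclass
class AcquisitionLogEntry:
    """Hypothetical schema for one step of the acquisition log."""
    operator_id: str
    step: str                  # e.g. "thermal exposure", "memory readout"
    extraction_method: str     # "ISP" or "Chip-Off"
    thermal_profile: str
    heating_duration_s: float
    board_condition: str
    programmer_config: str
    hash_outcome: str
    timestamp_utc: str = field(default_factory=_utc_now)


def to_json_line(entry):
    """Serialize one entry as a single JSON line with stable key order."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Appending one such line per step yields a record from which the sequence of physical actions can be replayed in order.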
For eight of the 18 devices where hardware access remained stable after the initial acquisition, a repeat read was performed at least 48 h later under identical connection conditions. In all eight cases, the SHA-256 hash of the repeat dump matched that of the original, confirming that the ISP-based acquisition path does not alter memory content between sessions under stable hardware conditions. Chip-Off cases involving reballing were classified as non-repeatable assessments per the terminology of Cuomo et al. [
2], given the irreversible physical changes involved.
The integrity-control procedures described above were designed in alignment with NIST SP 800–101 Rev. 1 [
3] and ISO/IEC 27037:2012 [
17], both of which require contemporaneous logging, hash-based verification, and write-blocking where technically feasible. For ISP connections via UFPI, the software was configured in read-only mode before any communication was initiated, satisfying the write-blocking requirement at the software level.
3.7. AI-Assisted Artifact Localization Performance
After forming a binary memory image, the AI-assisted module was used to locate areas potentially containing forensic artifacts. The prototype was evaluated at the level of binary classification of memory windows into two classes: artifact-bearing and background regions. On the held-out test set (15,000 windows from three devices unseen during training), the localization module achieved accuracy = 0.91, precision = 0.89, recall = 0.87, F1-score = 0.88, and ROC-AUC = 0.94 (
Figure 12). It should be noted that these metrics reflect performance on the synthetic test partition and do not directly represent expected performance on real damaged-device dumps, which may exhibit different noise profiles, encryption effects, and corruption patterns. These results should be interpreted as proof-of-concept indicators rather than operational performance guarantees.
This capability matters most for damaged Android devices, where raw dumps may contain large amounts of irrelevant data, fragmented structures, and partially degraded storage areas. AI-assisted localization can therefore serve as a practical post-processing layer that reduces the amount of manual analysis and increases the likelihood of detecting significant digital artifacts early in the forensic examination.
Candidate windows identified by the localization classifier were passed to the prioritization module, which ranked them by estimated forensic relevance so that examiners could begin with the most likely evidence-bearing regions. Ranking performance on the test set was as follows: Recall@5 = 0.76, Recall@10 = 0.88, MRR = 0.81, nDCG@10 = 0.84, and Top-1 relevance rate = 0.72. A Recall@10 of 0.88 means that 88% of the most forensically relevant windows appeared within the first ten candidates presented to the examiner—substantially reducing the volume of material that requires active review (
Figure 13).
The practical impact on examination efficiency was notable, though the figures below are indicative estimates derived from three examiner sessions and should not be interpreted as statistically validated population-level effects. Compared with the unassisted review of the same binary images, the AI-guided workflow reduced the memory volume requiring manual inspection by 78%, cut total expert review time by 63%, and increased the number of relevant artifacts identified by 31%. These gains reflect a combination of two effects: the localization stage eliminates large background regions from the review queue, while the prioritization stage ensures that the remaining candidates are ordered by relevance rather than memory address. The time to first forensically relevant artifact dropped from 42 min to 14 min (
Figure 14).
A note on measurement methodology: the efficiency figures above (78%, 63%, 31%, 42 min vs. 14 min) were obtained by comparing the AI-assisted workflow against an unassisted baseline on the same set of binary images. In the baseline condition, a certified forensic examiner conducted sequential manual review of each dump from memory address 0x00 without any pre-filtering or prioritization. In the AI-assisted condition, the examiner reviewed only the candidate windows surfaced by the localization and prioritization modules in ranked order. Measurements were repeated across three independent examiner sessions per condition and the reported values represent the mean across sessions; standard deviations were ±4.1% for search-space reduction, ±5.3% for time reduction, and ±3.8% for artifact yield improvement. Given the small number of examiner sessions, no inferential statistical test is reported; the figures should be interpreted as indicative efficiency estimates rather than population-level effect sizes.
The AI layer should be understood as an analytical accelerator, not a substitute for hardware acquisition or expert judgment. Its contribution is to make the transition from a raw binary image to interpretable evidence faster and more targeted, compressing the portion of the examination that would otherwise consist of scanning large volumes of undifferentiated memory content.
3.8. AI Module—Design and Training
The AI-assisted component of the proposed workflow was implemented as a two-stage pipeline: a binary localization classifier that identifies artifact-bearing memory windows within a raw dump, followed by a ranking module that orders candidate windows by estimated forensic relevance. Both stages operate directly on byte-level representations of fixed-size memory windows extracted from the binary images produced by ISP or Chip-Off acquisition. The full pipeline was trained and evaluated on a synthetic dataset (proof-of-concept validation; performance on real damaged-device dumps may differ) constructed to reflect the statistical properties of real Android memory dumps, including realistic class imbalance, diversity of artifact signature types, and background noise patterns characteristic of erased, zero-filled, and wear-leveled memory regions.
Architecture. The localization classifier is a one-dimensional convolutional neural network (1D-CNN) with four processing blocks. Each block consists of a Conv1d layer, batch normalization, ReLU activation, and max-pooling with stride 2, producing a progressive reduction in the sequence dimension while increasing feature depth. The four convolutional layers use filter counts of 32, 64, 64, and 64 with kernel sizes of 5, 3, 3, and 3, respectively, and same-padding to preserve sequence length before pooling. After the final pooling operation, the feature map is flattened and passed through two fully connected layers (64 units and 1 unit), with a dropout layer (p = 0.3) between them. The output is a single logit mapped to a class probability via sigmoid activation. The total number of trainable parameters is 221,441, making the model lightweight enough for deployment on standard forensic workstations without GPU acceleration. Input windows are 256 bytes, normalized to [0, 1] by dividing raw byte values by 255.
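The input preparation and the reduction of the sequence dimension can be traced with a short sketch in plain Python. It assumes non-overlapping windows and a pooling kernel of 2 (the text specifies only the stride), and it follows the filter counts as stated; the function names are introduced here for illustration. It does not implement the network itself.

```python
WINDOW_SIZE = 256
FILTERS = (32, 64, 64, 64)  # filter counts as stated in the text


def iter_windows(dump, size=WINDOW_SIZE):
    """Yield consecutive non-overlapping windows (assumption: no overlap).

    A trailing partial window is dropped.
    """
    for start in range(0, len(dump) - size + 1, size):
        yield dump[start:start + size]


def normalize(window):
    """Map raw byte values 0..255 to floats in [0, 1]."""
    return [b / 255 for b in window]


def trace_shapes(length=WINDOW_SIZE, filters=FILTERS, pool=2):
    """Return (channels, length) after each conv + pool block."""
    shapes = []
    for channels in filters:
        # Same-padding leaves the length unchanged through the convolution;
        # pooling with stride `pool` then divides it.
        length //= pool
        shapes.append((channels, length))
    return shapes


def flattened_size(length=WINDOW_SIZE, filters=FILTERS, pool=2):
    """Number of features entering the first fully connected layer."""
    channels, final_length = trace_shapes(length, filters, pool)[-1]
    return channels * final_length
```

Under these assumptions a 256-byte window shrinks to length 16 after four pooling steps, so the first fully connected layer receives 64 × 16 = 1024 features.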
Training objective and class imbalance. The model was trained using binary cross-entropy loss with a positive-class weight equal to (1 − AR)/AR ≈ 11.1, where AR = 0.083 is the empirically observed artifact ratio in raw Android memory dumps. This weighting compensates for the severe class imbalance characteristic of real forensic scenarios, in which artifact-bearing windows represent a small minority of the total dump space. The optimizer was Adam with an initial learning rate of 1 × 10⁻³ and default momentum parameters (β₁ = 0.9, β₂ = 0.999). Training proceeded for a maximum of 20 epochs with early stopping based on validation F1-score, with a patience of four epochs. The best checkpoint was selected at epoch 5, where validation F1 reached 0.9975.
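The effect of the positive-class weight can be made concrete with a small sketch of weighted binary cross-entropy on a raw logit. The formulation below, in which the weight multiplies only the positive-class term, is one common convention and is assumed here; the function names are illustrative. With AR = 0.083 as printed, the ratio evaluates to roughly 11.05, close to the reported 11.1.

```python
import math


def positive_class_weight(artifact_ratio):
    """Weight (1 - AR) / AR applied to the positive-class loss term."""
    return (1.0 - artifact_ratio) / artifact_ratio


def _softplus(x):
    # Numerically stable log(1 + exp(x))
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_bce_with_logit(logit, label, pos_weight):
    """Binary cross-entropy on a raw logit, positives scaled by pos_weight.

    Uses log(sigmoid(x)) = -softplus(-x) and
    log(1 - sigmoid(x)) = -softplus(x) to avoid overflow.
    """
    log_p = -_softplus(-logit)
    log_not_p = -_softplus(logit)
    return -(pos_weight * label * log_p + (1 - label) * log_not_p)
```

At a logit of 0 (predicted probability 0.5), a missed artifact window costs about eleven times as much as a misclassified background window, which is what pushes the classifier away from the trivial all-background solution.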
Dataset construction and ground-truth protocol. The corpus comprised 90,000 synthetic 256-byte windows (60,000 training, 15,000 validation, and 15,000 test) generated to mirror the byte-level statistics of raw memory images acquired from the 18 experimental Android devices. Positive (artifact-bearing) windows were synthesized by embedding verified forensic signatures into randomly initialized byte sequences: SQLite format 3 headers (30% of positives), JPEG magic bytes 0xFF 0xD8 0xFF (25%), PNG signatures (15%), vCard BEGIN markers (15%), and SMS PDU headers (15%). Each positive window additionally contained a structured low-entropy region of 20–60 bytes in the printable ASCII range, simulating the text-like content typically found adjacent to forensic signatures in memory. Negative (background) windows were generated in three categories, namely uniformly random byte sequences (60%), zero-filled pages (20%), and fully erased pages (0xFF, 20%), reflecting the three dominant background patterns observed in physical memory dumps from damaged Android devices. Class proportions were fixed at 8.3% positive and 91.7% negative, consistent with the ratio observed during manual annotation of the real acquisition dataset.
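A generator following this description might look like the sketch below. Several details are assumptions not given in the text: the signature is placed at a random offset with the ASCII region directly after it, the SMS PDU entry is an illustrative placeholder (PDUs have no fixed magic number), and all function names are hypothetical.

```python
import random

WINDOW_SIZE = 256

# (signature bytes, share of positive windows)
SIGNATURES = [
    (b"SQLite format 3\x00", 0.30),
    (b"\xff\xd8\xff", 0.25),          # JPEG
    (b"\x89PNG\r\n\x1a\n", 0.15),     # PNG
    (b"BEGIN:VCARD", 0.15),
    (b"\x07\x91", 0.15),              # placeholder SMS PDU prefix (assumption)
]


def _random_bytes(rng, n):
    return bytes(rng.getrandbits(8) for _ in range(n))


def make_positive(rng):
    """Random window with one signature plus 20-60 printable ASCII bytes."""
    window = bytearray(_random_bytes(rng, WINDOW_SIZE))
    sigs, weights = zip(*SIGNATURES)
    sig = rng.choices(sigs, weights=weights, k=1)[0]
    text = bytes(rng.randint(0x20, 0x7E) for _ in range(rng.randint(20, 60)))
    payload = sig + text  # text placed right after the signature (assumption)
    start = rng.randint(0, WINDOW_SIZE - len(payload))
    window[start:start + len(payload)] = payload
    return bytes(window)


def make_negative(rng):
    """Background window: random (60%), zero-filled (20%), or erased 0xFF (20%)."""
    kind = rng.choices(("random", "zero", "erased"), weights=(0.6, 0.2, 0.2), k=1)[0]
    if kind == "zero":
        return bytes(WINDOW_SIZE)
    if kind == "erased":
        return b"\xff" * WINDOW_SIZE
    return _random_bytes(rng, WINDOW_SIZE)


def make_dataset(n, positive_ratio=0.083, seed=0):
    """Return a shuffled list of (window, label) with a fixed positive share."""
    rng = random.Random(seed)
    n_pos = round(n * positive_ratio)
    samples = [(make_positive(rng), 1) for _ in range(n_pos)]
    samples += [(make_negative(rng), 0) for _ in range(n - n_pos)]
    rng.shuffle(samples)
    return samples
```

Fixing the seed makes the corpus reproducible, which is useful when the same partitions must be regenerated for a baseline comparison.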
Device-level train/validation/test split and overfitting controls. To prevent data leakage, all dataset partitioning was performed at the device level rather than at the window level. Windows from 12 devices were used for training (60,000 windows), windows from three devices for validation (15,000 windows), and windows from the remaining three devices for testing (15,000 windows). This device-level stratification ensures that no two partitions share windows from the same memory image, eliminating within-device correlation as a source of inflated evaluation metrics. Overfitting was controlled through three complementary mechanisms: dropout regularization (p = 0.3) in the fully connected head, early stopping based on validation F1-score (patience = four epochs), and monitoring of the train–validation F1 gap throughout training. The final gap between training F1 (0.9950) and validation F1 (0.9975) at the selected checkpoint was 0.0025, indicating no meaningful overfitting.
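Partitioning by device rather than by window can be sketched as follows. The device identifiers, the random assignment of devices to partitions, and the function names are assumptions for illustration; the text does not state how the 12/3/3 assignment was chosen.

```python
import random


def split_devices(device_ids, n_train=12, n_val=3, n_test=3, seed=0):
    """Assign whole devices to train/val/test so that no device is shared."""
    ids = sorted(device_ids)
    if len(ids) != n_train + n_val + n_test:
        raise ValueError("device count does not match the requested split")
    random.Random(seed).shuffle(ids)
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }


def partition_windows(windows, split):
    """Route (device_id, window, label) records to their device's partition."""
    lookup = {dev: name for name, devs in split.items() for dev in devs}
    parts = {name: [] for name in split}
    for device_id, window, label in windows:
        parts[lookup[device_id]].append((window, label))
    return parts
```

Because the lookup is keyed on the device identifier, every window from a given memory image lands in exactly one partition, which is the property that rules out within-device leakage.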
Test-set performance and baseline comparison. On the held-out test set (three devices, 15,000 windows), the 1D-CNN achieved accuracy 0.91, precision 0.89, recall 0.87, F1-score 0.88, and ROC-AUC 0.94 (confusion matrix: TN = 13,621, FP = 134, FN = 162, TP = 1083). To contextualize these results, two baselines were evaluated on the same test partition. A heuristic signature-matching baseline—applying the same forensic byte patterns used during data generation without any learned component—achieved F1 = 0.64 and AUC = 0.71. A random forest classifier trained on hand-crafted byte-level features (byte-value histogram, Shannon entropy, and bigram frequency) achieved F1 = 0.77 and AUC = 0.85. The 1D-CNN outperformed both baselines by a substantial margin, demonstrating that learned convolutional feature representations provide significantly better discrimination of artifact-bearing memory regions than either pattern matching or manually engineered features.
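The relationship between the confusion-matrix counts and the summary metrics can be checked with a few lines of Python; the function name is introduced here for illustration. The four counts sum to 15,000 and reproduce the reported precision, recall, and F1-score after rounding. The accuracy they imply is about 0.98, so the reported value of 0.91 may rest on a computation that the text does not describe.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive standard binary-classification metrics from confusion counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "total": total,
        "positive_share": (tp + fn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": (tp + tn) / total,
    }


# Counts as printed in the text
metrics = classification_metrics(tn=13621, fp=134, fn=162, tp=1083)
```

The positive share recovered from the counts, 1245 of 15,000 windows, matches the 8.3% class proportion fixed during dataset construction.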
Ranking module and relevance definition. Windows classified as positive by the 1D-CNN are subsequently passed to a ranking module that orders them by estimated forensic priority to reduce the expert’s review burden. The ranking task is defined as follows: given the set of positively classified windows within a single memory dump, rank them such that windows containing high-value recoverable structures appear at the top of the list. Relevance grades were assigned on a three-point ordinal scale: 0 (background or noise), 1 (low-value partial artifact fragment), and 2 (high-value complete or near-complete recoverable structure, such as an SQLite record, image file, or messaging content, directly usable as evidentiary material). Ground-truth relevance labels were assigned by the authors during the annotation phase. The ranking model operates on a feature vector combining the 1D-CNN’s penultimate-layer activations (64 dimensions) with byte-level entropy and n-gram frequency statistics computed over each window. Ranking performance on the test set was as follows: Recall@5 = 0.76, Recall@10 = 0.88, MRR = 0.81, nDCG@10 = 0.84, and Top-1 relevance rate = 0.72, consistent with the values reported in
Section 3.7.
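For readers unfamiliar with the ranking metrics, a minimal sketch of their computation for a single dump is given below. Two choices are assumptions, since the text does not specify them: only grade-2 windows count as relevant for Recall@k and the reciprocal rank, and nDCG uses the exponential gain 2^grade − 1 with a log2 position discount. The function names are illustrative.

```python
import math


def recall_at_k(ranked_grades, k, threshold=2):
    """Share of all windows with grade >= threshold found in the top k."""
    total = sum(1 for g in ranked_grades if g >= threshold)
    if total == 0:
        return 0.0
    hits = sum(1 for g in ranked_grades[:k] if g >= threshold)
    return hits / total


def reciprocal_rank(ranked_grades, threshold=2):
    """1 / rank of the first relevant window, or 0.0 if none is present."""
    for rank, grade in enumerate(ranked_grades, start=1):
        if grade >= threshold:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_grades, k):
    """Normalized discounted cumulative gain over the top k positions."""
    def dcg(grades):
        return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(grades))

    ideal = dcg(sorted(ranked_grades, reverse=True)[:k])
    return dcg(ranked_grades[:k]) / ideal if ideal > 0 else 0.0
```

Here `ranked_grades` is the list of ground-truth grades (0, 1, or 2) in the order produced by the ranking module; MRR is the mean of `reciprocal_rank` over all dumps in the test set.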
3.9. Forensic Integrity and Chain of Custody
Evidentiary reliability of a forensic memory dump depends not only on successful hardware acquisition but equally on the demonstrable integrity of the acquired data from the moment of extraction through all subsequent processing steps. In the present study, a structured chain-of-custody protocol was applied to all 18 devices and their corresponding memory images [
18].
Hash verification. Immediately upon completion of each acquisition session, both MD5 and SHA-256 hash values were computed over the full binary image using dc3dd (version 7.2), a forensic hashing tool that computes hashes in a single read pass without buffering to disk. Hash values were recorded in a signed acquisition log alongside the timestamp, operator identifier, programmer model and serial number, and UFPI chip identification string. For devices where two or more read passes were required due to intermittent connection stability, hash values from each pass were compared before proceeding; dumps with non-matching hashes across passes were flagged for reinspection and, where hardware conditions permitted, a third read was performed to resolve the discrepancy. Of the 18 acquisitions, 15 produced matching MD5/SHA-256 hashes on the first two read passes; two required a third read, after which all three hashes matched; and one device yielded a stable but unrepeatable hash due to progressive read degradation. This last case was documented separately with a note that the dump may contain read-induced artifacts in degraded sectors.
Operational logging and contamination control. Each device was handled exclusively on an ESD-safe antistatic mat with grounded wrist straps. All physical operations (disassembly, thermal processing, chip mounting, and electrical connection) were performed under direct observation and recorded in a timestamped handwritten case log that was subsequently digitized and stored alongside the memory image. Workstation connections used write-blocked USB interfaces (Tableau T35u) wherever applicable; for ISP connections via the EasyJTAG programmer, the UFPI software was configured in read-only acquisition mode before initiating the dump to prevent any write operations to the memory. After each device’s acquisition was completed, the work surface was inspected and cleaned to prevent cross-device contamination by solder residue or flux.
Test–retest repeatability. For eight of the 18 devices for which hardware access remained stable after the initial acquisition, a repeat read was performed at least 48 h after the first acquisition under identical connection conditions to assess test–retest repeatability. In all eight cases, the SHA-256 hash of the repeat dump matched that of the original acquisition, demonstrating that the proposed ISP and Chip-Off workflow does not alter the content of the memory between acquisitions under stable hardware conditions. This result supports the classification of the ISP-based path as a repeatable technical assessment in the terminology of Cuomo et al. [
19], while acknowledging that Chip-Off procedures involving reballing or substrate repair introduce irreversible physical changes and therefore constitute non-repeatable assessments that require especially rigorous pre- and post-acquisition documentation.
3.10. Legal Admissibility Considerations
The procedural validity of forensic evidence derived from hardware-level acquisition depends heavily on its alignment with recognized international standards and, where applicable, jurisdiction-specific legal requirements. The workflow described in this study was designed with reference to NIST Special Publication 800–101 Revision 1 [
3] and the principles of ISO/IEC 27037:2012 [
20], both of which provide guidance on the identification, collection, acquisition, and preservation of digital evidence. The chain-of-custody procedures described in
Section 3.9 directly address the documentation requirements specified in these standards, including contemporaneous logging of each processing step, hash-based integrity verification, and use of write-blocking where technically feasible.
Within the Republic of Kazakhstan, the legal framework governing the admissibility of digital evidence is primarily established by the Code of Criminal Procedure (Law of the Republic of Kazakhstan № 231-V, as amended), which requires that electronic evidence be collected and preserved in a manner that guarantees its authenticity and integrity. The hash-verified, logged acquisition workflow employed in this study satisfies these requirements. In cross-border or multi-jurisdictional investigations involving European or United States authorities, additional considerations apply: EU Directive 2016/680 (Law Enforcement Directive) and the US Federal Rules of Evidence (Rule 901(b)(9)) impose comparable, though not identical, standards for demonstrating authenticity of electronically stored information. Forensic practitioners applying the proposed workflow in such contexts should additionally document the chain of software tool versions (including UFPI firmware version and dc3dd version), ensure that all tools used are validated against known test images, and obtain written case authorization before initiating any physical intervention.
An important limitation regarding legal admissibility concerns hardware encryption. Of the 18 devices examined, full-disk encryption (FDE) or file-based encryption (FBE) was present on at least 11 devices, as inferred from Android version and manufacturer specifications (Android 7.0 introduced FBE, and devices launching with Android 10 or later are required to use it). In practice, this means that while the proposed workflow successfully acquires a forensically sound raw image, the evidentiary value of the acquired content may be substantially constrained by encryption in the absence of a lawful key extraction mechanism, exploit-assisted decryption (which introduces its own procedural documentation requirements), or user-provided credentials [
21]. This limitation is consistent with observations in the related literature [
21,
22] and underscores the necessity of treating Chip-Off acquisition as a necessary but not always sufficient condition for full evidence recovery from modern Android devices.