The code base follows an object-oriented architecture with a strategy-based design: each algorithm for identifying optimal bipartitions is implemented as a class that inherits from a common interface. This makes it possible to build implementations with specific functionality and compare them directly.
4.5.1. Method 1: GPU-Accelerated Geometric Approach
This implementation evaluates the quality of candidate partitions directly, by analyzing transition costs between states, without completely reconstructing the system dynamics. It does so through a cost table computed in parallel across multiple threads, which significantly reduces execution times. In addition, two strategies were designed and applied to explore possible bipartitions from the initial state, and their performance was evaluated with EMD-based metrics. The implementation rests on a geometric-topological reformulation of the state space in which binary states are mapped to the vertices of an n-dimensional hypercube. Rather than enumerating combinations exhaustively, as traditional methods do, this approach exploits the geometric structure inherent in binary systems. The core algorithm (see Algorithm 5) is encapsulated in the Geometry class and uses bottom-up dynamic programming to avoid redundant computation.
Macroalgorithm Geometry.aplicar_estrategia()
| Algorithm 5. GeometricStrategy_GPU_Accelerated |
| Prepare subsystem using condition, scope, and mechanism masks |
| Convert binary masks to active index lists: |
| scope_indices ← indices of ‘1’ bits in scope |
| mechanism_indices ← indices of ‘1’ bits in mechanism |
| Define initial state of mechanism: |
| Default: ‘1’ + ‘0’ * (n − 1) |
| Convert to index (big endian) |
| |
| Build cost table (bottom-up) for each scope variable: |
| Use multithreading to parallelize computation per variable |
| FOR each variable: |
| Calculate Hamming distances from initial state |
| Use dynamic programming by levels (from lower to higher distance) |
| Store costs in ‘cost_matrix’ |
| Choose search strategy: |
| IF total_vars ≤ 6 → USE exhaustive search (‘exhaustive_search’) |
| IF total_vars > 6 → USE heuristic strategies (‘strategy1’, ‘strategy2’, ‘cost_based’) |
| FOR each candidate partition: |
| Execute subsystem bipartition |
| Calculate marginal distribution |
| Calculate EMD against original distribution |
| IF best so far, STORE as ‘best_partition’ |
| Build and RETURN ‘Solution’ object with best partition found |
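The first steps of Algorithm 5 (mask conversion and the default initial state) can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `active_indices`, `default_initial_state`, and `state_to_index` are hypothetical, and "big endian" is assumed to mean that the leftmost character is the most significant bit.

```python
def active_indices(mask: str) -> list[int]:
    """Return the positions of the '1' bits in a binary mask string."""
    return [pos for pos, bit in enumerate(mask) if bit == "1"]


def default_initial_state(n: int) -> str:
    """Default mechanism state from Algorithm 5: '1' followed by n - 1 zeros."""
    return "1" + "0" * (n - 1)


def state_to_index(state: str) -> int:
    """Big-endian conversion (assumed): leftmost character is the most significant bit."""
    return int(state, 2)


# Example masks are made up for illustration.
scope_indices = active_indices("1011")
mechanism_indices = active_indices("0110")
initial_index = state_to_index(default_initial_state(4))
```

With these example masks, `scope_indices` is `[0, 2, 3]`, `mechanism_indices` is `[1, 2]`, and the default four-variable state `1000` maps to index 8.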
Computational Framework and Optimization Strategies
- Core Algorithmic Principles:
The software architecture demonstrates a modular design philosophy centered on performance optimization and scalability. The implementation leverages three fundamental computational strategies: (1) geometric cost table construction through dynamic programming, (2) parallel computation exploiting independence between variable calculations, and (3) adaptive heuristic approaches for large-scale system analysis.
- Key Optimization Approaches:
(a) Algorithmic optimizations:
- Demand-Driven Calculation: Avoids complete partition construction for large systems.
- ThreadPoolExecutor Parallelization: Parallel cost table computation.
- Bottom-Up Dynamic Programming: Intermediate result storage to eliminate redundant calculations.
- Symmetry Exploitation: Leverages complementary state relationships to reduce the search space. For instance, pairs of states in which one binary state is the complement of the other are considered together. Additionally, transitions that differ from the initial state by only one bit are prioritized, further decreasing the number of required comparisons.
- Adaptive Heuristic Strategies: Employs multiple partition identification strategies based on cost and structural criteria.
Cost table construction is one of the most computationally expensive steps of the algorithm. Because the cost computations for each variable are completely independent of one another, ThreadPoolExecutor can run them simultaneously across multiple CPU cores instead of waiting for one variable to finish before starting the next. Within each variable, the bottom-up dynamic programming approach stores intermediate results to eliminate redundant calculations.
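The per-variable parallelism described above can be sketched with the standard library. This is an illustrative sketch only: the function names are hypothetical, and the recurrence inside `cost_column` (a factor of 2^-d applied to the value difference plus the costs of the predecessors one level closer to the initial state) is a placeholder assumption, since the exact cost expression is not reproduced in this section.

```python
from concurrent.futures import ThreadPoolExecutor


def hamming_levels(initial: int, n_bits: int) -> list[list[int]]:
    """Group all n_bits-wide states by Hamming distance from `initial`."""
    levels = [[] for _ in range(n_bits + 1)]
    for state in range(1 << n_bits):
        levels[bin(state ^ initial).count("1")].append(state)
    return levels


def cost_column(values: list[float], initial: int, n_bits: int) -> list[float]:
    """Bottom-up cost column for one variable (placeholder recurrence, assumed)."""
    cost = [0.0] * (1 << n_bits)
    levels = hamming_levels(initial, n_bits)
    for d in range(1, n_bits + 1):          # lower to higher Hamming distance
        gamma = 2.0 ** -d
        for state in levels[d]:
            diff = state ^ initial
            # Predecessors sit one level closer to `initial`; already computed.
            prev = sum(cost[state ^ (1 << b)]
                       for b in range(n_bits) if (diff >> b) & 1)
            cost[state] = gamma * (abs(values[initial] - values[state]) + prev)
    return cost


def build_cost_table(per_variable_values, initial, n_bits, max_workers=4):
    """Compute one independent cost column per variable in a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda v: cost_column(v, initial, n_bits),
                             per_variable_values))
```

Note that in CPython, threads yield real parallel speedup mainly when the per-variable work releases the global interpreter lock (for example inside NumPy or Numba-compiled kernels); a pure-Python body like the one above only illustrates the structure.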
(b) Hardware acceleration optimizations:
- GPU parallelization: Utilizes graphics processing units for computationally intensive geometric operations, including hypercube traversal and cost matrix calculations.
- Memory optimization: Implements efficient data structures and caching mechanisms for rapid access to geometric state information.
- Vectorized operations: Exploits SIMD capabilities for parallel processing of state transitions and cost computations.
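As an illustration of the vectorized style mentioned above, the Hamming distance from the initial state to every vertex of the hypercube can be computed with array-wide bitwise operations. This sketch assumes NumPy and integer-encoded states; the function name `hamming_from` is hypothetical and not taken from the implementation.

```python
import numpy as np


def hamming_from(initial: int, n_bits: int) -> np.ndarray:
    """Hamming distance from `initial` to every n_bits-wide state, vectorized."""
    states = np.arange(1 << n_bits, dtype=np.int64)
    diff = states ^ initial               # bits that differ from the initial state
    dist = np.zeros_like(diff)
    for b in range(n_bits):               # n_bits array-wide passes, no per-state loop
        dist += (diff >> b) & 1
    return dist
```

For example, `hamming_from(0b100, 3)` returns the distances of states 000 through 111 from 100; grouping states by these values gives the levels used by the bottom-up cost table.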
- Performance Architecture:
The framework achieves computational efficiency through synergistic optimization layers combining algorithmic design principles with hardware-specific acceleration techniques. This multi-tier approach enables practical analysis of large-scale systems while maintaining theoretical rigor and mathematical correctness.
Performance Analysis
(a) Performance Metrics. The results obtained (see Table 9) show that:
- For small systems (10 variables), Geometric was on average 104 times faster than PyPhi.
- For intermediate systems (15A and 15B), the speedup was more modest (~2×), as both methods were still computationally manageable.
- For large systems (20 variables), Geometric achieved an extreme speedup of over 1500× compared to QNodes, demonstrating its scalability.
Demonstrated Scalability:
Systems up to 23 nodes: Successful processing ($2^{23}$ = 8,388,608 states)
Execution time: <15 s for 23 variables
Memory consumption: Stable even with increasing complexity
Complexity-Based Analysis:
≤6 variables: Optimal behavior, exhaustive search
7–12 variables: Maintained performance with heuristics, slight precision loss
≥13 variables: Only heuristics viable, good approximation maintained
Hardware Configuration:
CPU: AMD Ryzen 5 7535H
GPU: NVIDIA GeForce RTX 2050
Updated GPU drivers for complete CUDA support
Comparative Analysis with PyPhi:
The experimental evaluation demonstrates exceptional performance characteristics across multiple system scales. For ten-variable systems, the implementation achieves a relative error of 0.00043% while providing approximately 104× speedup compared to PyPhi, with execution times under one second (see Figure 11). The performance for fifteen-variable systems shows interesting variation, with 15A systems achieving 0.00488% error and 1.9× speedup, while 15B systems exhibit 952.755% error and 1.69× speedup, indicating sensitivity to specific system characteristics. Twenty-variable systems demonstrate remarkable efficiency with 0.18611% error and 1531× speedup, completing execution in under ten seconds (see Figure 12).
The scalability analysis reveals particularly impressive capabilities, with successful processing of systems up to 23 nodes, representing $2^{23}$ = 8,388,608 possible states. Execution times remain below 15 s for 23-variable systems, while memory consumption remains stable across increasing complexity levels. The complexity-dependent performance analysis shows optimal behavior for systems with six or fewer variables using exhaustive search, maintained performance with slight precision loss for 7–12 variable systems using heuristics, and viable heuristic-only processing for systems with 13 or more variables while preserving good approximation quality.
The geometric-topological reformulation represents a fundamental conceptual advance, mapping binary states to hypercube vertices to dramatically reduce search space complexity. The direct quality evaluation approach analyzes transition costs without requiring complete system dynamics reconstruction, providing computational efficiency while maintaining accuracy.
The GPU acceleration with CUDA applies graphics processing units to IIT calculations, which may be a useful paradigm for computational neuroscience applications. NCube optimization through JIT compilation and vectorization improves performance scaling. The adaptive heuristic strategies are activated according to system complexity, so that the search method matches the problem scale. Applying geometric dynamic programming in a hypercube setting is a novel algorithmic contribution, and handling binary data with bit masks and bitwise operators provides significant performance improvements.
(b) Efficiency Function. The GPU-accelerated Geometry Strategy combines two levels of parallelization: CPU-level parallelization (multi-core processing) and GPU-level parallelization (massively parallel processing with CUDA/Numba). To derive a global efficiency function f(N) that captures how well the implementation leverages parallelism, we need to analyze the contribution of each component.
Total Computational Work (Without Parallelization):
The general computational problem, without parallelization, has a complexity of:
Ttotal ∼ where
: Number of purview variables.
: Number of mechanism variables.
: Number of partitions evaluated under heuristic methods.
and Exponential sizes representing dominant contributions from states and partitions.
CPU Parallelization: In the reported implementation, the $N_a$ variables are processed simultaneously using $P_{CPU}$ threads (processor cores). This reduces the effective time for cost generation:
GPU Acceleration (CUDA/Numba): The efficiency of the GPU ($C_{GPU} < 1$) comes from massive parallel processing. Numba CUDA divides matrix operations (such as cost_table or EMD) into thousands of simultaneous threads. The cost reduction is represented as a multiplicative factor:
Final Efficiency: The overall efficiency, combining CPU and GPU, can be described as:
For small systems ($N \le 6$), the $K \cdot 2^{N}$ term dominates.
For large systems (N > 6) the term is dominant. Here, GPU parallelization drastically reduces the impact of the term.
The parallelized implementation does not reduce the theoretical complexity order (the terms remain exponential), but in practice it introduces highly favorable constant factors through $P_{CPU}$ and $C_{GPU}$, enabling the execution of systems that would be intractable under sequential approaches.
(c) Interpretation of Accuracy Metrics
(1) Cases with exact solutions: The algorithm exhibits high accuracy in small systems (≤6 variables combined between mechanism and purview), where it is feasible to perform an exhaustive search of all possible partitions. In such cases, the strategy evaluates all bipartition combinations and selects the one that effectively minimizes the marginal distribution distance (measured using EMD), yielding optimal or near-optimal solutions.
(2) Situations causing deviations: In larger systems, where the total number of variables exceeds the threshold (more than 6 variables), heuristic strategies such as strategy1_partitions, strategy2_partitions, and cost_based_partitions are employed. These approximations reduce computational complexity but introduce a loss of accuracy. Therefore, deviations arise due to:
The non-exhaustive nature of heuristic strategies.
The simplification involved in selecting partitions based on local cost measures, without evaluating the entire space of possible combinations.
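The difference between the exact and heuristic regimes can be made concrete by enumerating bipartitions. The sketch below is a simplified illustration: it treats the combined mechanism and purview variables as one flat set, and the names `all_bipartitions`, `choose_search`, and `EXHAUSTIVE_LIMIT` are hypothetical. Only the threshold of 6 comes from the text.

```python
from itertools import combinations

EXHAUSTIVE_LIMIT = 6  # threshold stated in the text


def all_bipartitions(variables):
    """Yield each unordered split into two non-empty parts exactly once.

    Assumes the variables are distinct.
    """
    variables = list(variables)
    first, rest = variables[0], variables[1:]
    # Pinning `first` to part_a avoids emitting every split twice.
    for r in range(len(rest) + 1):
        for extra in combinations(rest, r):
            part_a = (first, *extra)
            part_b = tuple(v for v in rest if v not in extra)
            if part_b:                      # skip the trivial split
                yield part_a, part_b


def choose_search(total_vars: int) -> str:
    """Pick the regime described in the text from the variable count."""
    return "exhaustive_search" if total_vars <= EXHAUSTIVE_LIMIT else "heuristic"
```

Under this simplification a set of n variables has 2^(n−1) − 1 non-trivial bipartitions, so six variables give 31 candidates while twenty give more than half a million, which is why exhaustive evaluation is reserved for small systems.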
(d) CPU-Only Performance Analysis
To address diverse research environments, we evaluated Method 1 performance on CPU-only hardware configurations. While GPU acceleration provides optimal performance, the implementation maintains significant advantages over traditional methods even without GPU support:
- CPU-only speedup: 15–25× improvement over PyPhi for systems ≤15 variables.
- Scalability limitation: CPU-only processing reaches practical limits around 18–20 variables.
- Memory efficiency: CPU implementation maintains stable memory usage through optimized data structures.
- Fallback strategy: The implementation automatically detects GPU availability and gracefully degrades to CPU-only mode when necessary.
For research teams without GPU resources, Method 1 still provides substantial performance improvements, though Method 2 (Dynamic Programming Reformulation) may offer better cost-effectiveness for medium-scale systems (10–20 variables) in CPU-only environments.
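The fallback behavior listed above could be implemented along the following lines. This is a sketch, not the project's actual detection code: `select_backend` is a hypothetical name, and the only library call used is `numba.cuda.is_available()`.

```python
def select_backend() -> str:
    """Return "gpu" when Numba can see a usable CUDA device, otherwise "cpu"."""
    try:
        from numba import cuda  # optional dependency; may be missing
    except Exception:
        return "cpu"
    try:
        return "gpu" if cuda.is_available() else "cpu"
    except Exception:
        # Driver or toolkit problems are treated the same as "no GPU".
        return "cpu"


backend = select_backend()
```

Keeping the check in one function lets the rest of the pipeline branch on a plain string and keeps the Numba import optional for CPU-only installations.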
4.5.2. Method 2: Dynamic Programming Reformulation Approach
The implementation fundamentally centers around the development of the FIND_MIP algorithm, which represents a revolutionary approach to solving the minimum information partition problem through dynamic programming techniques. The core innovation lies in a critical reformulation of the transition cost calculation model that transforms a computationally intractable recursive problem into one amenable to dynamic programming optimization. The original model (see Equation (5))
where k represents immediate neighbors of i in an optimal path toward j, presented significant computational challenges because its recursive dependency structure prevented effective memoization. The implemented reformulation is given as Equation (10):
where k represents immediate neighbors of j in an optimal path from i, fundamentally alters the dependency relationships by ensuring that all transition costs T(i,k) originate from the same initial state i, thereby enabling comprehensive reuse of computed values.
This reformulation allows us, on one hand, to eliminate recursion and, on the other hand, to avoid computations that are used only once. The idea for this proposed modification to the model arose from a graphical analysis aimed at observing the behavior of the original model.
Here we see that, for a specific example where the transition cost from an initial state 000 to a final state 111 is to be calculated, the costs used are those from the immediate neighbors of the initial state to the final state, without considering the cost of transitioning from the initial state to those neighbors. This implies the need for specific computations for each transition. In this case, the transition costs needed would be those for 100 → 111, 010 → 111, and 001 → 111 (marked in red in Figure 13), none of which will be reused. The reformulation proposes replacing these specific computations with:
In Figure 14, we observe that the transition costs to be summed for the transition from the initial state 000 to the final state 111 are now those from the initial state to the neighbors of the final state, without considering the cost of transitioning from these neighbors to the final state. As a result, all calculated transition costs originate from the initial state 000 and lead toward all other states, and each computed transition cost is later used to identify candidate partitions. This proposal is better illustrated in the macro algorithm presented below (see Algorithm 6).
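The change in dependencies for the 000 → 111 example can be listed explicitly. The sketch below only enumerates which transition costs each formulation needs; it does not compute the costs themselves, and the function names are hypothetical.

```python
def flip(state: str, pos: int) -> str:
    """Return `state` with the bit at `pos` inverted."""
    return state[:pos] + ("1" if state[pos] == "0" else "0") + state[pos + 1:]


def original_dependencies(i: str, j: str) -> list[tuple[str, str]]:
    """Original model: costs T(k, j), with k a neighbor of i one step closer to j."""
    return [(flip(i, p), j) for p in range(len(i)) if i[p] != j[p]]


def reformulated_dependencies(i: str, j: str) -> list[tuple[str, str]]:
    """Reformulated model: costs T(i, k), with k a neighbor of j one step closer to i."""
    return [(i, flip(j, p)) for p in range(len(i)) if i[p] != j[p]]
```

For `i = "000"` and `j = "111"`, the original model needs the pairs (100, 111), (010, 111), and (001, 111), whereas the reformulated model needs (000, 011), (000, 101), and (000, 110). Every pair in the second list starts at 000, so those entries are shared with every other target state reached from the same initial state.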
Macro Algorithm
| Algorithm 6: FIND_MIP(initial_state, final_state) |
| // n = number of bits (variables) |
| INITIALIZE transition_table ← empty |
| paths[0] ← {initial_state} |
| // traverse Hamming distance levels d = 1…n |
| FOR d = 1 TO n DO |
| paths[d] ← ∅ |
| FOR each state ∈ paths[d-1] DO |
| FOR i = 0 TO n-1 WHERE state[i] ≠ final_state[i] DO |
| new_state ← state with bit i “flipped” toward final_state |
| IF new_state NOT IN paths[d] THEN |
| paths[d].add(new_state) |
| CALCULATE_COST(initial_state, new_state) |
| END IF |
| END FOR |
| END FOR |
| END FOR |
The FIND_MIP algorithm (see Algorithm 6) uses Hamming distance-based exploration to construct the solution space incrementally, in three phases. Initialization creates the transition table and sets paths[0] to the initial state, where paths[d] is the set of states at Hamming distance d from the initial state along paths toward the final state. The exploration phase iterates over Hamming distance levels from 1 to n: for each level d, the algorithm takes the states in paths[d-1] and generates new states by flipping one bit that still differs from the final state. Because every such flip reduces the Hamming distance to the final state, this directed bit-flipping prunes the search space by eliminating paths that diverge from the objective. The cost calculation phase invokes CALCULATE_COST for each new state, applying the reformulated model while reusing previously computed transition costs stored in the memoization table. The transition cost table is therefore filled level by level, because the transition cost from the initial state to a state at level d requires the cost values of transitions from the initial state to certain states at level d − 1.
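The level-by-level traversal of Algorithm 6 translates almost directly into Python. The sketch below is an assumption-laden illustration: states are tuples of bits, the cost function is passed in as a callback because the reformulated cost expression is not reproduced here, and `hamming_cost` is a made-up stand-in used only to make the example runnable.

```python
def find_mip_levels(initial_state, final_state, calculate_cost):
    """Fill the transition table one Hamming level at a time (Algorithm 6 sketch)."""
    initial = tuple(initial_state)
    final = tuple(final_state)
    n = len(initial)
    transition_table = {}
    paths = [{initial}]                      # paths[0] holds only the initial state
    for d in range(1, n + 1):
        level = set()
        for state in paths[d - 1]:
            for i in range(n):
                if state[i] != final[i]:
                    # Flip bit i toward the final state.
                    new_state = state[:i] + (final[i],) + state[i + 1:]
                    if new_state not in level:
                        level.add(new_state)
                        # The callback may read lower-level entries from the table.
                        transition_table[(initial, new_state)] = calculate_cost(
                            initial, new_state, transition_table)
        paths.append(level)
    return paths, transition_table


def hamming_cost(i, j, table):
    """Placeholder cost (assumed): number of differing bits, ignores `table`."""
    return sum(a != b for a, b in zip(i, j))


paths, table = find_mip_levels((0, 0, 0), (1, 1, 1), hamming_cost)
```

For the 000 → 111 example the levels contain 1, 3, 3, and 1 states, and the table ends up with seven entries, all keyed on the same initial state.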
Software modules development
Modular Architecture:
- SIA Base Class (System Information Architecture): Common interface and shared methods
- Central System Model: Management of n-dimensional NCube structures
- Geometric Class: Specific implementation with specialized methods:
  - aplicar_estrategia(): Main coordinator
  - find_mip(initial_state, final_state): Central algorithm
  - calcular_costos_nivel(): Hamming level processing
  - calcular_costo(): Reformulated cost function
  - identificar_particiones_optimas(): Candidate search
The implementation integrates seamlessly within a modular architecture designed to accommodate multiple strategy implementations while maintaining architectural consistency and reusability. The foundational architecture relies on a base SIA (System Information Architecture) class that provides common interface methods and shared functionality across different strategies, including essential operations like sia_preparar_subsistema for system initialization. The geometric strategy implementation extends this base through the specialized Geometric class, which implements the core method aplicar_estrategia that orchestrates the entire optimization process.
The modular design encompasses several specialized components that work synergistically to deliver the complete solution.
The find_mip() function is divided into three stages: the first calculates the cost table, the second identifies candidate partitions, and the third evaluates the candidates and selects the best partition. This function serves as the primary coordinator, encapsulating the cost calculation logic and partition search methodology. Supporting functions include calcular_costos_nivel(), which manages the systematic population of the transition table level by level according to Hamming distance, and calcular_costo(), which implements the reformulated cost function while utilizing memoization for computational efficiency.
The identificar_particiones_optimas() function handles the generation and evaluation of candidate partitions, incorporating symmetry exploitation and heuristic optimization to reduce computational overhead. This modular approach enables seamless integration with existing system components while providing clear separation of concerns and maintainability.
Interfaces and Data Structures
Optimized Data Structures:
- tabla_transiciones: Dictionary with key (initial_state, current_state) and value T(i,j)
- caminos: List of sets organized by Hamming distance
- memoria_particiones: Candidate storage with format ((present), (future)) → (loss, time)
The implementation employs carefully designed data structures optimized for both memory efficiency and computational performance in the context of dynamic programming operations. The central tabla_transiciones serves as a comprehensive memoization table, implemented as a dictionary or hash map structure with keys represented as tuples of state pairs (estado_inicial, estado_final_tuple) and values containing the computed transition costs T(i,j). This structure is fundamental to the dynamic programming approach, enabling constant-time lookup of previously computed transition costs and eliminating redundant calculations.
The caminos data structure represents a sophisticated level-based organization system implemented as a list of sets, where each caminos[d] contains all states at Hamming distance d from the initial state. This structure supports efficient membership testing through nuevo not in caminos[d] operations and facilitates the systematic exploration of the state space in a breadth-first manner according to Hamming distance. The implementation leverages n-dimensional NCube structures managed by the central System model to represent and manipulate state values and variable associations. State representation utilizes tuple or binary array formats to encode system configurations efficiently, while partitions are represented through sets of present and future variables that define the bipartition structure.
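The three structures described above map naturally onto built-in Python containers. The following sketch shows one possible shape for them; the type aliases are assumptions, and every numeric value is made up purely for illustration.

```python
State = tuple[int, ...]            # assumed encoding: one bit per variable

# (initial_state, current_state) -> transition cost T(i, j)
tabla_transiciones: dict[tuple[State, State], float] = {}

# caminos[d] holds the states at Hamming distance d from the initial state
caminos: list[set[State]] = []

# ((present), (future)) -> (loss, time)
memoria_particiones: dict[tuple[tuple[int, ...], tuple[int, ...]],
                          tuple[float, float]] = {}

inicial: State = (0, 0, 0)
caminos.append({inicial})
caminos.append({(1, 0, 0), (0, 1, 0), (0, 0, 1)})

tabla_transiciones[(inicial, (1, 0, 0))] = 0.5          # made-up cost
memoria_particiones[((0,), (1, 2))] = (0.25, 0.003)     # made-up (loss, time)

nuevo: State = (1, 0, 0)
ya_visto = nuevo in caminos[1]     # average O(1) set membership test
```

Using hashable tuples for states is what allows them to serve both as set members in `caminos` and as dictionary keys in `tabla_transiciones`.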
Optimizations Implemented
- Recursive Reformulation: Transformation of dependency T(k,j) → T(i,k)
- Level-based Dynamic Programming: Systematic bottom-up construction
- Efficient Memoization: Reusable transition table
- Symmetry Exploitation: Search up to intermediate Hamming level
- Directed Generation: Heuristic exploration toward final_state
The implementation incorporates multiple layers of optimization that collectively transform the computational complexity from exponential recursion to manageable dynamic programming. The fundamental optimization lies in the algorithmic reformulation that enables memoization by ensuring all subproblems T(i,k) originate from the same initial state i, making their solutions reusable across different computational contexts. The systematic Hamming distance level traversal guarantees that dependency relationships for dynamic programming are satisfied automatically, as all required subproblem solutions are computed and stored before they are needed in subsequent calculations.
Symmetry exploitation represents another significant optimization, where the partition candidate search is limited to intermediate Hamming distance levels based on the inherent symmetry properties of the bipartition problem. This approach reduces the search space without compromising the exhaustiveness of partition type consideration. The directed bit-flipping strategy focuses computational effort exclusively on state transitions that progress toward the target final state, effectively pruning the exploration tree by eliminating divergent paths. The level-by-level construction of the transition table ensures optimal ordering of dependencies, maintaining the integrity of the dynamic programming approach while maximizing computational efficiency through strategic reuse of intermediate results.
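One way to see why the search can stop at the intermediate Hamming level is to read each state as a marker of which variables fall on one side of the cut. This reading is an assumption made for illustration, and `candidate_bipartitions` is a hypothetical helper: a state and its bitwise complement then describe the same unordered bipartition, so levels beyond ⌊n/2⌋ add nothing new.

```python
from itertools import combinations


def candidate_bipartitions(n: int):
    """Yield each unordered bipartition of n variables once, using only
    Hamming levels 1..n//2 (a state and its complement give the same split)."""
    everything = frozenset(range(n))
    seen = set()
    for d in range(1, n // 2 + 1):
        for ones in combinations(range(n), d):
            part_a = frozenset(ones)
            part_b = everything - part_a
            key = frozenset((part_a, part_b))
            if key not in seen:            # only triggers at level n/2 for even n
                seen.add(key)
                yield tuple(sorted(part_a)), tuple(sorted(part_b))
```

For n = 4 this yields 7 bipartitions and for n = 5 it yields 15, which equals 2^(n−1) − 1 in both cases, so no partition type is lost by halting at the intermediate level.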
Performance Analysis
(a) Performance Metrics: The results obtained are shown in Table 10.
(b) Demonstrated Scalability:
- Systems up to 20 nodes: Successful processing where PyPhi fails due to memory limitations.
- Memory efficiency: Stable consumption through memoization and dynamic programming.
- Execution time scaling: Maintains reasonable processing times even for large systems.
(c) Comparative Analysis:
- Precision: Exceptional accuracy with 96–100% hit rates and 0% relative error across all test cases.
- Performance vs. PyPhi: Outstanding speedups ranging from 1.73× to 326.83×, with the most dramatic improvements on 10-node systems.
- Performance vs. QNodes: Competitive performance with 8.32× speedup against QNodes.
- Structural Fidelity: Maximum structural distances of 0–0.5 indicate partitions that are identical or structurally very similar to optimal solutions.
(d) Φ Value Validation: Critical to our validation is confirming that Φ values calculated using GeoMIP-identified MIPs match those from PyPhi (gold standard). Our comprehensive Φ validation includes:
- Direct Φ comparison: For each identified bipartition, we compute the complete Φ value using both GeoMIP and PyPhi methodologies to ensure consistency in consciousness quantification.
- Cost function accuracy: Verification that our cost function application preserves the information-theoretic foundations required for accurate Φ calculation.
The comprehensive experimental evaluation demonstrates exceptional performance characteristics across multiple system scales, with systematic comparison against established benchmarks including PyPhi and QNodes implementations. The evaluation employs rigorous metrics including hit rate (percentage of exact bipartition matches), maximum relative error in Φ values, maximum structural distance (measured as Jaccard distance), and relative speedup. For systems ranging from 3 to 15 nodes, the implementation consistently achieves hit rates between 96% and 100%, with maximum relative errors of 0% across all test cases, indicating perfect precision in identifying optimal partition values (see Table 10).
The performance analysis reveals remarkable speedup achievements, particularly evident in the comparison with PyPhi where systems of 10 nodes demonstrate 326.83× acceleration (see Figure 15), while 15-node systems achieve speedups ranging from 164.33× to 173.46×. The implementation maintains consistent structural accuracy with maximum structural distances remaining at 0 for most test cases and reaching only 0.5 in exceptional cases, demonstrating the algorithm’s ability to identify partitions that are either identical or structurally very similar to optimal solutions.
For larger systems of 20 nodes where PyPhi becomes computationally intractable, as previously mentioned, QNodes is used as the reference. In Figure 16, it can be observed that the Geometric strategy shows a significant reduction in execution time compared to QNodes for the vast majority of the 50 evaluated subsystems. The implementation achieves an 8.32× speedup compared to the QNodes strategy.
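The evaluation metrics named above can be written down in a few lines. This is a sketch under assumptions: how the study encodes a partition as a set before taking the Jaccard distance is not specified here, so the functions simply operate on generic Python sets and sequences, and all function names are hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def hit_rate(found, reference) -> float:
    """Percentage of cases where the found bipartition equals the reference."""
    hits = sum(f == r for f, r in zip(found, reference))
    return 100.0 * hits / len(reference)


def relative_error(value: float, reference: float) -> float:
    """Relative error in percent with respect to a non-zero reference value."""
    return 100.0 * abs(value - reference) / abs(reference)


def speedup(reference_seconds: float, method_seconds: float) -> float:
    """How many times faster the method is than the reference."""
    return reference_seconds / method_seconds
```

Under these definitions, a structural distance of 0 means the two element sets coincide, and a distance of 0.5 means they share half of their combined elements.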
(e) Efficiency Function.
FIND_MIP Algorithm with Dynamic Programming:
Key reformulation:
Reformulated model:
Phase 1: Level-by-level table construction
- External loop: n iterations (Hamming distance levels)
- States per level d:
- Cost per state: O(n) for neighbor calculation in CALCULATE_COST
Total:
Phase 2: Partition search
- Systematic exploration up to intermediate level: ⌊n/2⌋
States evaluated:
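As a worked illustration of the counts involved in both phases, assume that the initial and final states differ in all n bits (as in the 000 → 111 example), so that level d contains every state reachable by flipping d distinct bits. The following is a sketch under that assumption, not a reproduction of the original expressions:

```latex
% States at Hamming level d (assumption: initial and final states differ in all n bits)
|\text{paths}[d]| = \binom{n}{d}

% Phase 1: O(n) work per state, summed over levels 1..n
T_{\text{phase 1}} = \sum_{d=1}^{n} \binom{n}{d}\, O(n) = O\!\left(n\,(2^{n}-1)\right) = O\!\left(n\,2^{n}\right)

% Phase 2: states visited up to the intermediate level
\sum_{d=1}^{\lfloor n/2 \rfloor} \binom{n}{d} =
\begin{cases}
2^{n-1} - 1, & n \text{ odd},\\[4pt]
2^{n-1} - 1 + \tfrac{1}{2}\binom{n}{n/2}, & n \text{ even},
\end{cases}
\qquad \text{which is } O(2^{n}) \text{ in both cases.}
```

For n = 3 the Phase 2 sum is 3, and for n = 4 it is 4 + 6 = 10, matching the closed forms above. Halting at the intermediate level therefore roughly halves the number of states examined without changing the exponential order.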