Article
Peer-Review Record

Implementing Mathematics of Arrays in Modern Fortran: Efficiency and Efficacy

Software 2024, 3(4), 534-548; https://doi.org/10.3390/software3040026
by Arjen Markus 1,*,† and Lenore Mullin 2,†
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 26 October 2024 / Revised: 23 November 2024 / Accepted: 26 November 2024 / Published: 30 November 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors explore the potential of combining the Mathematics of Arrays (MoA) formal "approach" with a modern Fortran implementation for optimizing array-based algorithms, particularly in the context of high-performance computing (HPC) and artificial intelligence (AI). To my understanding, the manuscript presents interesting findings:

* A novel approach to implementing the MoA catenation operation in Fortran without incurring the overhead of data copying. The authors introduce a new class (moa_view_type) that leverages Fortran pointers to create a combined "view" of multiple arrays, preserving the efficiency of MoA while adhering to Fortran's syntax and data structures. Experiments demonstrate that this method is significantly faster than Fortran's default array concatenation, especially for large arrays.

* The usage analysis reveals that the moa_view_type approach maintains a consistent and predictable memory footprint even when concatenating numerous large arrays. In contrast, using Fortran's standard array extension mechanism resulted in unexpectedly high memory consumption, exceeding the anticipated allocation by a factor of 3 to 4.6, depending on the compiler used.

* The paper provides a comprehensive mapping of MoA operations to corresponding Fortran features, including intrinsic functions, array syntax, and pointer manipulation. This mapping highlights the extent to which Fortran inherently supports MoA concepts, facilitating the translation of MoA algorithms into efficient Fortran code.

* The authors argue that by expressing algorithms using MoA principles and the associated notations, Fortran compilers can leverage this high-level representation to perform more advanced optimizations. This could include eliminating intermediate results, exploiting parallelism opportunities, and selecting optimal implementations based on target hardware characteristics.

Moreover, it is valuable that the authors also point out some limitations of the proposed MoA implementation:

* While the moa_view_type excels in catenation efficiency, experiments show that accessing individual array elements using this method is 2-3 times slower than with plain Fortran arrays. This performance gap is attributed to factors such as range checking and the need to locate the appropriate array segment for each element access.

* The authors point out that despite the conceptual advantages of MoA, current tool support for automating the conversion of MoA-based algorithms into optimized code (ψ-Reduction) is lacking. The reliance on manual translation and the absence of MoA-aware compilers hinder the practical adoption of MoA in Fortran.

* While Fortran offers compatibility with many MoA operations, certain functionalities, such as dimension lifting, require workarounds using pointers to avoid data copying. The authors point out the limitations of Fortran's reshape() function for dimension lifting due to its reliance on data copying, which conflicts with MoA's view-based philosophy. This work expresses a need for Fortran language extensions, like index expansion, to streamline the implementation of such MoA operations.

Based on the above considerations, I think that the paper makes a compelling case for the potential synergy between MoA and Fortran in optimizing array-based algorithms. The novel implementation of the moa_view_type class demonstrates the feasibility of achieving MoA's efficiency within Fortran's framework. However, the performance overhead in array element access and the limited tool support for MoA currently hinder its widespread applicability. The paper's insights into the mapping between MoA operations and Fortran features, combined with the exploration of potential compiler-level optimizations, lay a strong foundation for future research and development in this domain. The authors' call for MoA-aware compilers and the expansion of Fortran's intrinsic function library with a complete set of MoA operations pave the way for realizing the full potential of the promising MoA/Fortran combination.
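[Editor's note] To make the first point of the report more concrete, the following is a minimal Fortran sketch of a pointer-based view over two arrays. It is illustrative only: the type name moa_view_type is taken from the paper, but the components and the procedures (chunk_ptr, catenate, elem) and the restriction to two one-dimensional real arrays are assumptions made for this example and do not reproduce the authors' actual implementation.

module moa_view_sketch
    implicit none

    ! One "slot" of the view: a pointer to an existing array (no data is copied)
    type :: chunk_ptr
        real, pointer :: data(:) => null()
    end type chunk_ptr

    type :: moa_view_type
        type(chunk_ptr), allocatable :: chunk(:)
    contains
        procedure :: catenate => view_catenate
        procedure :: elem     => view_elem
    end type moa_view_type

contains

    ! Build a view over two existing arrays by storing pointers to them.
    ! The intent of a and b is deliberately left unspecified so that the view
    ! may keep pointers to them; the actual arguments must have the target attribute.
    subroutine view_catenate( this, a, b )
        class(moa_view_type), intent(inout) :: this
        real, target                        :: a(:), b(:)

        allocate( this%chunk(2) )
        this%chunk(1)%data => a
        this%chunk(2)%data => b
    end subroutine view_catenate

    ! Return element i of the combined view by locating the chunk it falls in
    function view_elem( this, i ) result(value)
        class(moa_view_type), intent(in) :: this
        integer, intent(in)              :: i
        real                             :: value

        integer :: j, offset

        offset = 0
        do j = 1,size(this%chunk)
            if ( i <= offset + size(this%chunk(j)%data) ) then
                value = this%chunk(j)%data(i-offset)
                return
            endif
            offset = offset + size(this%chunk(j)%data)
        enddo

        error stop "moa_view_type: index out of range"
    end function view_elem

end module moa_view_sketch

With a type like this, catenating two arrays amounts to a pair of pointer assignments, independent of the array sizes, while every element access pays for the search over the chunks. That is the trade-off summarised in the report above: fast catenation, slower element access.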

 

Author Response

Comments 1:

The authors explore the potential of combining the Mathematics of Arrays (MoA) formal "approach" with a modern Fortran implementation for optimizing array-based algorithms, particularly in the context of high-performance computing (HPC) and artificial intelligence (AI). To my understanding, the manuscript presents interesting findings:

* A novel approach to implementing the MoA catenation operation in Fortran without incurring the overhead of data copying. The authors introduce a new class (moa_view_type) that leverages Fortran pointers to create a combined "view" of multiple arrays, preserving the efficiency of MoA while adhering to Fortran's syntax and data structures. Experiments demonstrate that this method is significantly faster than Fortran's default array concatenation, especially for large arrays.

* The usage analysis reveals that the moa_view_type approach maintains a consistent and predictable memory footprint even when concatenating numerous large arrays. In contrast, using Fortran's standard array extension mechanism resulted in unexpectedly high memory consumption, exceeding the anticipated allocation by a factor of 3 to 4.6, depending on the compiler used.

* The paper provides a comprehensive mapping of MoA operations to corresponding Fortran features, including intrinsic functions, array syntax, and pointer manipulation. This mapping highlights the extent to which Fortran inherently supports MoA concepts, facilitating the translation of MoA algorithms into efficient Fortran code.

* The authors argue that by expressing algorithms using MoA principles and the associated notations, Fortran compilers can leverage this high-level representation to perform more advanced optimizations. This could include eliminating intermediate results, exploiting parallelism opportunities, and selecting optimal implementations based on target hardware characteristics.

Moreover, it is valuable that the authors also point out some limitations of the proposed MoA implementation:

* While the moa_view_type excels in catenation efficiency, experiments show that accessing individual array elements using this method is 2-3 times slower than with plain Fortran arrays. This performance gap is attributed to factors such as range checking and the need to locate the appropriate array segment for each element access.

* The authors point out that despite the conceptual advantages of MoA, current tool support for automating the conversion of MoA-based algorithms into optimized code (ψ-Reduction) is lacking. The reliance on manual translation and the absence of MoA-aware compilers hinder the practical adoption of MoA in Fortran.

* While Fortran offers compatibility with many MoA operations, certain functionalities, such as dimension lifting, require workarounds using pointers to avoid data copying. The authors point out the limitations of Fortran's reshape() function for dimension lifting due to its reliance on data copying, which conflicts with MoA's view-based philosophy. This work expresses a need for Fortran language extensions, like index expansion, to streamline the implementation of such MoA operations.

Based on the above considerations, I think that the paper makes a compelling case for the potential synergy between MoA and Fortran in optimizing array-based algorithms. The novel implementation of the moa_view_type class demonstrates the feasibility of achieving MoA's efficiency within Fortran's framework. However, the performance overhead in array element access and the limited tool support for MoA currently hinder its widespread applicability. The paper's insights into the mapping between MoA operations and Fortran features, combined with the exploration of potential compiler-level optimizations, lay a strong foundation for future research and development in this domain. The authors' call for MoA-aware compilers and the expansion of Fortran's intrinsic function library with a complete set of MoA operations pave the way for realizing the full potential of the promising MoA/Fortran combination.

Reply 1:

We thank the reviewer for the clear and kind comments. Since none of the comments suggested a change to the manuscript, we have made none.

Reviewer 2 Report

Comments and Suggestions for Authors

 

The manuscript explores the integration of the Mathematics of Arrays (MoA) with Modern Fortran language specifications, presenting how this can lead to optimized array operations without intermediates. This area is certainly of interest to the numerical simulation and HPC community.

Minor Comments:

Section 3.2, L282: It would be great if the authors could add a brief comment on whether/how the CPU cache memory would influence the results shown in Figure 2.
Section 3.3, L337: The total time for the catenation experiments is more or less the same for the largest data sizes. The authors could add a comment on why the timings differ for different chunk sizes at the smallest total size, and set expectations on when there would be a difference in total timing.

Section 4, L342: (a) A comparison with other programming languages, such as Julia or C++17/23 with specialized libraries that have array-handling capabilities, could provide additional context. (b) Adding more practical use cases or examples where this approach outperforms current methodologies in HPC or AI applications could enhance the paper's impact.

Author Response

The manuscript explores the integration of the Mathematics of Arrays (MoA) with Modern Fortran language specifications, presenting how this can lead to optimized array operations without intermediates. This area is certainly of interest to the numerical simulation and HPC community.

Minor Comments:

Comment 1:

Section 3.2, L282: It would be great if the authors could add a brief comment on whether/how the CPU cache memory would influence the results shown in Figure 2.

Reply:

We have added the following text:

As the memory management of current-day computers is quite complex, the details of that management, in combination with the access patterns in the program, are likely to influence the form of both curves. Our experiments did not explore how to prefetch addresses into caches through compiler flags and/or pragmas; we would certainly get better performance if we used them and the compiler complied. Moreover, it would be better if the compiler employed "dimension lifting", so that, based on the cache size, the matrix size and the memory addresses, prefetching and buffering would be used. There are many optimizations that MoA can explore, based on the speeds and sizes of the various levels of the memory hierarchy, by breaking up loops. The size of the chunks would be part of a partitioning chosen to match the cache sizes, with prefetching up and down the memory hierarchy. Thus, a loop could be broken into two or more loops, based on where the addresses are.
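[Editor's note] As a rough illustration of the loop restructuring alluded to in this added text, the fragment below rewrites a single loop over n elements as a nested loop over blocks, so that each inner pass works on a cache-sized portion of the data. This is only a hand-written sketch of the "dimension lifting" idea; the block size of 1024 and the simple update inside the loop are arbitrary assumptions for the example, not values or code from the paper.

program dimension_lifting_sketch
    implicit none
    integer, parameter :: n = 100000, block_size = 1024   ! block size: assumed, would be tuned to the cache
    real               :: x(n), y(n)
    integer            :: jb, j

    call random_number( x )
    y = 0.0

    ! The single loop "do j = 1,n" is lifted into an outer loop over blocks
    ! and an inner loop within a block, so each inner loop touches a
    ! cache-sized portion of x and y
    do jb = 1,n,block_size
        do j = jb,min( jb + block_size - 1, n )
            y(j) = 2.0 * x(j) + y(j)
        enddo
    enddo

    print *, 'checksum:', sum( y )
end program dimension_lifting_sketch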

 

Comment 2:

Section 3.3, L337: The total time for the catenation experiments is more or less the same for the largest data sizes. The authors can add a comment on to why there is a difference in the timings from different chunk sizes for the smallest total size and the set expectations on when there would be difference in total timing.

Reply:

We have added the following text:

As can be seen, the total run time is more or less constant for the two catenation experiments, whereas with the plain arrays the run time grows steadily. For the lower range of the total size there is a difference in timing with the MoA method. In the case of 10 chunks, the individual chunks are fairly large, and thus accessing the array elements with a stride (step in the code fragment) hits the same chunk more frequently than in the case of 100 chunks. If the chunks are large enough, the overhead of finding the new chunk and loading it into working memory becomes small in relation to the total time spent in each chunk.
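[Editor's note] The effect described in this added text can be made plausible with a small back-of-the-envelope program: for a fixed total size and stride, it counts how often a strided traversal has to move on to a new chunk, and how many strided accesses fall within each chunk on average. All numbers in the sketch (total size, stride, numbers of chunks) are assumptions for the illustration, not the values used in the paper's experiments.

program chunk_crossings
    implicit none
    integer, parameter :: total_size = 1000000, step = 7
    integer            :: nchunks, chunk_size, i, accesses
    integer            :: crossings, previous_chunk, current_chunk

    do nchunks = 10,100,90                     ! compare 10 chunks with 100 chunks
        chunk_size     = total_size / nchunks
        crossings      = 0
        accesses       = 0
        previous_chunk = 1

        do i = 1,total_size,step
            accesses      = accesses + 1
            current_chunk = (i - 1) / chunk_size + 1
            if ( current_chunk /= previous_chunk ) then
                crossings      = crossings + 1
                previous_chunk = current_chunk
            endif
        enddo

        print *, nchunks, 'chunks:', crossings, 'chunk switches,', &
                 accesses / nchunks, 'strided accesses per chunk on average'
    enddo
end program chunk_crossings

With fewer, larger chunks the traversal switches chunks far less often relative to the number of element accesses, so the cost of locating the next chunk is amortised over more work, which is the argument made above.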

 

Comment 3:

Section 4, L342: (a) A comparison with other programming languages, such as Julia or C++17/23 with specialized libraries that have array-handling capabilities, could provide additional context. (b) Adding more practical use cases or examples where this approach outperforms current methodologies in HPC or AI applications could enhance the paper's impact.

Reply:

We have added the following text:

The philosophy of Mathematics of Arrays is, of course, not limited to Fortran. Many programming languages emphasize arrays as an important data structure. For instance, for C++ the Boost collection of libraries offers facilities to make working with arrays easier. The Julia programming language, in turn, defines different classes of arrays as part of the base language. Applying the ideas of MoA to these languages would enable a more or less language-agnostic development of useful algorithms.

For several algorithms, matrix multiplication and fast Fourier transforms among others, Grout and Mullin discuss the application of MoA and the adaptation of the actual implementation to specialised hardware. This shows the advantage of an abstract formulation of such algorithms via MoA.

 

Reviewer 3 Report

Comments and Suggestions for Authors

This manuscript offers an excellent exposition of the mathematical foundations of array-based programming languages.
The ideas presented here can guide existing and future programming languages targeting scientific applications and users.
The detailed information on the common operations in the Mathematics of Arrays (MoA) will also prove an indispensable guideline for compiler developers to resolve confusion and ambiguities, particularly in the context of the Fortran programming language.
The proposed ideas in this manuscript can also greatly aid the language standard committees, particularly the Fortran J3/WG5, toward designing and adding the new MoA concepts that are highly desired and requested within the community.
Although some ideas, such as views and windows into arrays instead of copies, are not new, this work's formal presentation of the ideas and relevant benchmarks can be a valuable reference for future discussions.

Suggestions:
+ Section 1.1: FFP must be presented in full form before being used as an abbreviation.
+   Figure 1: I recommend that the axis labels be written in a more readable form, e.g., avoid scientific notation. Instead, use LaTeX's default notation for exponentiation: $10^{-5}$ (the same applies to Figure 2). This will make the information in the figures more readily digestible.
    I wonder how much the computer architecture and memory specifics could impact the benchmarks presented in Figure 1; for example, does the array size where the crossing of the two curves occurs depend on the architecture?
    It may be a good idea to replace this graph with (or add) a new graph where the Y-axis shows the relative speed rather than absolute time because absolute time can be highly processor-dependent.
+   Reference 10 does not seem right for the Fortran 2018 standard. The following would be more appropriate:
    +   ISO/IEC (2018), 'International Standard ISO/IEC 1539-1:2018 Information technology - Programming languages - Fortran - Part 1: Base language', ISO/IEC, Geneva.
    +   @inproceedings{reid2018new,
            title={The new features of Fortran 2018},
            author={Reid, John},
            booktitle={ACM SIGPLAN fortran forum},
            volume={37},
            number={1},
            pages={5--43},
            year={2018},
            organization={ACM New York, NY, USA}
        }
    If the authors wish to keep the current reference 10 as is, then the sentence in the manuscript must be revised to avoid misleading the audience into thinking that reference 10 is an official publication of the Fortran 2018 standard by ISO.

Once the authors resolve or address the above minor issues, I believe this manuscript will be ready for publication in MDPI Software.

Author Response

Comment 1:

+ Section 1.1: FFP must be presented in full form before being used as an abbreviation.

Reply:

Indeed, the abbreviation needed to be explained. The new text reads:

... will allow Fortran to be the formal functional programming system (FFP) envisioned by Backus, with low-level details left to the compiler.

 

Comment 2:
+   Figure 1: I recommend that the axis labels be written in a more readable form, e.g., avoid scientific notation. Instead, use LaTeX's default notation for exponentiation: $10^{-5}$ (the same applies to Figure 2). This will make the information in the figures more readily digestible.

Reply:

We have changed the axis labelling; this makes the figures more readable.

 

Comment 3:
    I wonder how much the computer architecture and memory specifics could impact the benchmarks presented in Figure 1; for example, does the array size where the crossing of the two curves occurs depend on the architecture?

Reply:

The reviewer makes an interesting remark, and this could be the subject of further research. To highlight it in the manuscript, we have added the following text:

As the memory management of current-day computers is quite complex, the details of that management, in combination with the access patterns in the program, are likely to influence the form of both curves. Our experiments did not explore how to prefetch addresses into caches through compiler flags and/or pragmas; we would certainly get better performance if we used them and the compiler complied. Moreover, it would be better if the compiler employed "dimension lifting", so that, based on the cache size, the matrix size and the memory addresses, prefetching and buffering would be used. There are many optimizations that MoA can explore, based on the speeds and sizes of the various levels of the memory hierarchy, by breaking up loops. The size of the chunks would be part of a partitioning chosen to match the cache sizes, with prefetching up and down the memory hierarchy. Thus, a loop could be broken into two or more loops, based on where the addresses are.
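[Editor's note] The final remark of this added text, about breaking a loop into two or more loops based on where the addresses are, can be illustrated with a small hand-written sketch: a single conceptual loop over the catenation of two arrays is written as two loops, each running over one contiguous array only, so that no per-element chunk lookup is needed. The array names and sizes below are assumptions for this example, not code from the paper.

program split_loops_sketch
    implicit none
    integer, parameter :: na = 50000, nb = 30000
    real               :: a(na), b(nb)
    real               :: total
    integer            :: i

    call random_number( a )
    call random_number( b )

    ! One conceptual loop over the catenation of a and b, written as two
    ! loops, each confined to the address range of a single chunk
    total = 0.0
    do i = 1,na
        total = total + a(i)
    enddo
    do i = 1,nb
        total = total + b(i)
    enddo

    print *, 'sum over the catenated data:', total
end program split_loops_sketch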

 

Comment 4:
    It may be a good idea to replace this graph with (or add) a new graph where the Y-axis shows the relative speed rather than absolute time because absolute time can be highly processor-dependent.

Reply:

Yes, such timings are very dependent on the actual machine and the (system) software involved. The suggestion is worthwhile, so we now present the timings relative to the "smallest" case.

 

Comment 5:
+   Reference 10 does not seem right for the Fortran 2018 standard. The following would be more appropriate:
    +   ISO/IEC (2018), 'International Standard ISO/IEC 1539-1:2018 Information technology - Programming languages - Fortran - Part 1: Base language', ISO/IEC, Geneva.
    +   @inproceedings{reid2018new,
            title={The new features of Fortran 2018},
            author={Reid, John},
            booktitle={ACM SIGPLAN fortran forum},
            volume={37},
            number={1},
            pages={5--43},
            year={2018},
            organization={ACM New York, NY, USA}
        }
    If the authors wish to keep the current reference 10 as is, then the sentence in the manuscript must be revised to avoid misleading the audience into thinking that reference 10 is an official publication of the Fortran 2018 standard by ISO.

Reply:

While the book by Milan Curcic is definitely worth reading, it does not serve well as a reference to the standard. We have adopted the suggestions by the reviewer.

 

Comment 6:

Once the authors resolve or address the above minor issues, I believe this manuscript will be ready for publication in MDPI Software.

Reply:

We thank the reviewer for the kind words and the suggestions.
