Machine Learning Techniques in Radio-over-Fiber Systems and Networks

Abstract: Radio-over-fiber (RoF) technology has been widely studied over the past decades as a way to extend wireless communication coverage by leveraging the low loss and broad bandwidth of optical fiber. With the increasing demand for wireless communications, the use of millimeter-wave (mm-wave) frequencies has become a recent trend, and many attempts have been made in the past few years to build high-throughput and robust mm-wave RoF systems. Whilst RoF technology provides many benefits, it suffers from several fundamental limitations due to the analog optical link, including fiber chromatic dispersion and nonlinear impairments. Various approaches have been proposed to address these limitations. In particular, machine learning (ML) algorithms have attracted intensive research attention as a promising candidate for handling the complicated physical layer impairments in RoF systems, especially the nonlinearity during signal modulation, transmission and detection. In this paper, we review recent advancements in ML techniques for RoF systems, especially those which utilize ML models as physical layer signal processors to mitigate various types of impairments and to improve the system performance. In addition, ML algorithms have also been widely adopted for highly efficient RoF network management and resource allocation, such as dynamic bandwidth allocation and network fault detection, and we also review the recent works in these research domains. Finally, several key open questions that need to be addressed in the future, and possible solutions to these questions, are discussed.
In Reference [117], a reinforcement learning scheme has been proposed for dynamic channel selection in a cognitive RoF network, where the best RF channel is selected amongst different frequency bands to minimize the interference and to optimize the network-wide performance. Q-learning has been used as well and the signal-to-interference-plus-noise ratio (SINR) has been selected as the main criterion of the reward. Results have demonstrated the capability of the strategy in avoiding aggregated interference, reducing the network outage probability and increasing the throughput. In Reference [118], the reinforcement learning algorithm has been applied to optimize the placement of the distributed unit (DU) and the central unit (CU) in 5G and beyond-5G fiber-wireless networks to serve diverse services. In this scheme, the combination of the service type, the residual capacity of links and the processing pools has been used as the state, the locations of DU and CN and the optical path have been used as the action and the reward has been designed to reflect the capacity and latency constraints. The Q-learning algorithm has been implemented with a neural network, which consists of two CNNs followed by two fully connected layers. Results have shown that both a large-scale service paradigm and bandwidth resource savings can be achieved. In Reference [119], the BBU placement and routing in a C-RAN network have also been optimized using the reinforcement learning approach to improve the network resource allocation and to reduce the network latency. Reinforcement learning has also been applied in the mm-wave RoF system for real-time interference avoidance [120]. The log-value of the BER difference between different states has been used for the reward and the SARSA reinforcement learning algorithm has been used, which has been shown to be more effective than the Q-learning algorithm in real-time systems.
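The difference between the Q-learning and SARSA update rules mentioned above can be illustrated with a minimal tabular sketch. It is not the implementation of References [117] or [120]: the three-channel environment, the mean SINR values in `MEAN_SINR_DB`, the noise level and all hyper-parameters are hypothetical choices made only for illustration. The state is the channel currently in use, the action is the channel selected next and the reward is the noisy SINR observed on the selected channel.

```python
import numpy as np

# Hypothetical mean SINR (dB) on three RF channels; not taken from any reference.
MEAN_SINR_DB = np.array([5.0, 12.0, 20.0])


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random channel with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))
    return int(np.argmax(q_row))


def train(algorithm="q_learning", steps=20000, alpha=0.1, gamma=0.5,
          epsilon=0.2, seed=0):
    """Tabular Q-learning or SARSA for a toy channel-selection task."""
    rng = np.random.default_rng(seed)
    n = len(MEAN_SINR_DB)
    q = np.zeros((n, n))  # q[state, action]
    state = 0
    action = epsilon_greedy(q[state], epsilon, rng)
    for _ in range(steps):
        # Reward: noisy SINR measured on the channel that was selected.
        reward = MEAN_SINR_DB[action] + rng.normal(0.0, 1.0)
        next_state = action  # the selected channel becomes the current one
        next_action = epsilon_greedy(q[next_state], epsilon, rng)
        if algorithm == "sarsa":
            # On-policy: bootstrap from the action that will actually be taken.
            bootstrap = q[next_state, next_action]
        else:
            # Off-policy Q-learning: bootstrap from the greedy action.
            bootstrap = np.max(q[next_state])
        q[state, action] += alpha * (reward + gamma * bootstrap - q[state, action])
        state, action = next_state, next_action
    return q
```

Calling `train("q_learning")` or `train("sarsa")` returns the learnt Q-table, and `np.argmax(q, axis=1)` gives the channel preferred from each state. The only difference between the two algorithms is the bootstrap term, which is why SARSA accounts for the exploratory actions that a real-time system actually takes.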


Introduction
With the wide availability of high-performance and portable personal electronic devices, such as smart phones and tablets, the demand for ubiquitous wireless communications has grown explosively during the past decades [1,2]. In addition, the need for high-speed wireless communications has also increased substantially, driven by bandwidth-intensive applications such as ultra-high-definition video-on-demand, virtual reality (VR) and augmented reality (AR). To meet these requirements, we have seen rapid development and deployment of wireless communication technologies, such as the wide adoption of small cells and massive multi-input multi-output (MIMO) [3]. In addition, due to the congestion of the lower radio frequency (RF) bands, higher RF spectral regions have been explored, such as the millimeter-wave (mm-wave) region and the terahertz region [4,5]. Given that terahertz communications are still at an early stage of investigation, the mm-wave frequency range is widely studied for wireless communications. Compared with the lower RF band, much broader bandwidth is available.
Figure 1. Radio-over-fiber (RoF) system architecture, including RoF-based backhaul, RoF-based fronthaul, and RoF-based fiber-wireless converged access networks (e.g., passive optical networks).
Whilst the RoF technology has been widely studied, it suffers from several fundamental limitations, which are mainly caused by the fact that the RoF system essentially utilizes an analog optical link. For example, fiber chromatic dispersion, distortion and nonlinear effects all limit the performance of RoF systems. Several impairment mitigation principles and techniques have been proposed and demonstrated to improve the performance of RoF systems, such as the optical single sideband (OSSB) and optical carrier suppression (OCS) schemes from the modulation perspective [7]. The digitized-radio-over-fiber (DRoF) solution, which changes the fiber transmission link from analog to digital, has also been investigated [7]. Although the DRoF scheme can substantially improve the system performance, high-speed and broadband analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) are required.
In addition to these solutions, various electrical domain dispersion and nonlinearity mitigation algorithms have been studied from the signal processing perspective [11]. Whilst these algorithms have been shown to be effective, they are typically designed to handle various impairments separately. However, it is crucial to mitigate different types of impairments jointly rather than separately, because different types of impairments in the same channel may interact with each other. In addition, these conventional signal processing algorithms typically have limited capability in suppressing nonlinear effects, whilst analog RoF systems normally suffer from substantial nonlinearity, including both optical fiber nonlinearity and the nonlinear distortions caused by signal modulation (e.g., inter-modulation distortion) and signal detection (e.g., square-law intensity detection). To address these limitations, the use of machine learning (ML) techniques in RoF systems has been proposed and has become an active research field in the past few years. Different from conventional approaches based on domain experts' knowledge, the ML approach is data-driven: the impairments existing in the system are learnt from the training data and the learnt information is then used to mitigate the impairments and to improve the system performance. Compared with conventional signal processing methods, the ML-based scheme considers and processes all impairments simultaneously and hence, the interactions between different types of impairments are incorporated. In addition, ML-based methods also have outstanding capability in handling nonlinear effects in the system. Therefore, the strong nonlinearity in RoF systems can be suppressed to improve the system performance.
In this paper, we review the recent developments of various ML schemes for RoF systems, especially for mm-wave RoF systems, since this type of system suffers from more severe impairments than RoF systems that operate in lower RF bands.
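The data-driven principle described above can be illustrated with a small numerical sketch. It is a toy example rather than any scheme from the reviewed literature: the link model (two-tap inter-symbol interference followed by a memoryless cubic distortion and additive noise), the PAM-4 signal and all parameter values are assumptions. A least-squares equalizer is trained on known symbols, once with linear features only and once with additional cubic features, so that the second model can learn the dispersion-like memory and the nonlinearity jointly.

```python
import numpy as np


def simulate_link(n_symbols, rng, a3=0.2, noise_std=0.02):
    """Toy link (assumed model): PAM-4 -> two-tap ISI -> cubic distortion -> noise."""
    levels = np.array([-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0])
    s = rng.choice(levels, size=n_symbols)
    u = np.convolve(s, [1.0, 0.3])[:n_symbols]  # linear memory (ISI)
    r = u - a3 * u**3 + rng.normal(0.0, noise_std, n_symbols)
    return s, r


def build_features(r, taps=3, nonlinear=True):
    """Rows hold [1, r[n], r[n-1], ..., r[n-taps+1]] and, optionally, their cubes."""
    n = len(r)
    cols = [r[taps - 1 - k: n - k] for k in range(taps)]
    lin = np.stack(cols, axis=1)
    parts = [np.ones((lin.shape[0], 1)), lin]
    if nonlinear:
        parts.append(lin**3)
    return np.hstack(parts)


def fit_and_score(nonlinear, taps=3, seed=0):
    """Train on one data set, return the mean squared error on a held-out set."""
    rng = np.random.default_rng(seed)
    s_train, r_train = simulate_link(5000, rng)
    s_test, r_test = simulate_link(5000, rng)
    weights, *_ = np.linalg.lstsq(
        build_features(r_train, taps, nonlinear), s_train[taps - 1:], rcond=None
    )
    prediction = build_features(r_test, taps, nonlinear) @ weights
    return float(np.mean((prediction - s_test[taps - 1:]) ** 2))
```

Comparing `fit_and_score(nonlinear=False)` with `fit_and_score(nonlinear=True)` on the same simulated data shows the benefit of letting the model learn the nonlinear and linear impairments together. The neural-network-based schemes reviewed later replace the hand-chosen cubic features with learnt ones.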
In addition to serving as the physical layer signal processor, ML techniques have also been widely considered for efficient network management and resource allocation in optical communications. Compared with other types of optical communication networks, network management and resource allocation are more challenging in RoF-based fiber-wireless converged networks, since multi-dimensional resources, including both RF and optical resources, need to be allocated flexibly and efficiently whilst satisfying various requirements, such as the quality of service (QoS) and latency requirements [1]. In addition, with the rapid deployment of 5G communications, the centralized or cloud radio access network (C-RAN) concept has been proposed and widely studied, where the remote radio heads (RRHs) and the baseband processing units (BBUs) are separated and the BBUs are moved to centralized locations for efficient resource sharing and cost and energy savings. All of these require more advanced algorithms that can allocate network resources adaptively. Conventional algorithms normally use a single or a limited number of traffic or network features and hence, their effectiveness is typically sub-optimal. To overcome this limitation, the use of ML algorithms has been proposed and has attracted intensive interest recently, as ML algorithms are capable of solving complex tasks. Whilst there are still many open questions to answer in ML-based RoF network management and resource allocation, exciting results have been achieved and demonstrated using both supervised learning and reinforcement learning. Therefore, in this paper we also review the recent developments in this field.
There have been a number of recent survey papers in relevant research fields [12][13][14]. In References [12,13], comprehensive reviews of recent applications of ML techniques in optical communication systems and networks have been presented. However, since [12,13] focus on general optical communication systems and networks, the review of ML studies in the RoF field is relatively brief and limited. On the other hand, recent ML applications in RoF systems have been reviewed in Reference [14], which has summarized the use of ML in nonlinearity mitigation, performance monitoring and fault detection of RoF systems and networks. However, as a mini review with a page limit, it covers the recent studies only partially. Building on these previous surveys, this paper provides a focused yet comprehensive review of different aspects of applying ML techniques in RoF systems and networks, as shown in Figure 2. In addition, we also provide our perspectives on the current limitations and possible future directions in this field.
The rest of the paper is organized as follows. In Section 2, we briefly summarize the key physical layer impairments and some recent studies using non-ML methods to suppress the impacts of these impairments in RoF systems. In Section 3, we review the recent development of ML-based signal processing approaches for RoF systems, including both neural network-based and non-neural network-based schemes. In Section 4, we discuss some key open questions and possible future directions in applying ML models to signal processing in the physical layer of RoF systems. In Section 5, we review the applications of ML algorithms to RoF network management and resource allocation, and we summarize the paper in Section 6.

Impairments in RoF Systems
Due to the transmission of analog signals in the optical fiber, the RoF system performance is usually limited by a number of impairments, such as the optical fiber chromatic dispersion, the phase noise and the nonlinearity. Before we review the ML-based approaches as physical layer signal processors, we first give a brief overview of the major types of impairments in RoF systems.

Fiber Chromatic Dispersion-Fading
One key impairment in RoF systems, especially when a higher RF band such as the mm-wave band is used, is the impact of fiber chromatic dispersion [15,16]. Fiber chromatic dispersion refers to the phenomenon that the phase velocity of light in the optical fiber depends on its frequency. Hence, different frequency components of the lightwave signal have different velocities while propagating through the fiber. The chromatic dispersion typically leads to the fading effect in RoF systems. When the RF signal is modulated onto an optical carrier for transmission in the optical fiber, as shown in Figure 3, two sidebands located on either side of the optical carrier are generated and the spacing between the optical carrier frequency and each sideband equals the RF signal frequency. When such an RF modulated optical signal propagates through the optical fiber, the chromatic dispersion leads to different phase delays for the two sidebands. Hence, the RF signal generated upon square-law detection by the photo-detector (PD) is impacted by the different phase delays, and its power fades when the two beat components add out of phase. To overcome this limitation, the OSSB and OCS schemes have been proposed and widely studied [17][18][19][20][21]. In the OSSB scheme, only one RF sideband together with the optical carrier is maintained and hence, the impact of fiber chromatic dispersion is significantly reduced. On the other hand, in the OCS scheme, the optical carrier is suppressed after optical modulation and two RF sidebands with a spacing equal to the RF frequency are utilized. The beating of these two RF sidebands generates the RF signal upon detection by the PD and this scheme is more tolerant to the fiber chromatic dispersion as well.
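The fading effect described above can be quantified with a short sketch. It uses the standard small-signal result for chirp-free double-sideband intensity modulation, in which the detected RF power is proportional to cos²(π D λ² L f² / c). This expression is not given in the text above and is introduced here as an assumption, as are the default dispersion parameter of 17 ps/(nm·km) and the 1550 nm wavelength, which are typical values for standard single-mode fiber.

```python
import numpy as np

C = 299_792_458.0  # speed of light in vacuum (m/s)


def _dispersion_factor(length_km, d_ps_nm_km, wavelength_nm):
    """Return D * lambda^2 * L in SI units (s * m)."""
    d = d_ps_nm_km * 1e-6          # ps/(nm km) -> s/m^2
    lam = wavelength_nm * 1e-9     # nm -> m
    length = length_km * 1e3       # km -> m
    return d * lam**2 * length


def dsb_rf_power(f_rf_hz, length_km, d_ps_nm_km=17.0, wavelength_nm=1550.0):
    """Normalized detected RF power for double-sideband modulation (1 = no fading)."""
    f = np.asarray(f_rf_hz, dtype=float)
    phase = np.pi * _dispersion_factor(length_km, d_ps_nm_km, wavelength_nm) * f**2 / C
    return np.cos(phase) ** 2


def first_null_frequency(length_km, d_ps_nm_km=17.0, wavelength_nm=1550.0):
    """Lowest RF frequency (Hz) at which the two sideband beats cancel completely."""
    return np.sqrt(C / (2.0 * _dispersion_factor(length_km, d_ps_nm_km, wavelength_nm)))
```

Under these assumptions, `first_null_frequency(10.0)` falls at roughly 19 GHz for a 10 km link, and the null moves to lower frequencies as the fiber length increases. This is why the fading penalty is of particular concern for mm-wave RoF systems and why the OSSB and OCS schemes are attractive in that band.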

Nonlinearity
Another key limitation in RoF systems is the nonlinearity, including both the optical fiber nonlinearity and the nonlinearity of optical modulators [22][23][24][25]. The nonlinear response of the optical modulator leads to inter-modulation distortion (IMD), which typically limits the system dynamic range [22,23]. For multiband RoF systems, it also leads to subcarrier intermodulation and data-dependent cross modulation [25]. The optical fiber used for signal transmission is also a nonlinear medium and hence, nonlinear impairments such as self-phase modulation and four-wave mixing are also imposed on the RoF signal after fiber transmission. To mitigate the nonlinear impairments in RoF systems, linearization techniques have been widely studied [23][24][25][26][27][28][29][30][31][32][33][34][35][36]. One widely investigated linearization scheme is feedforward [26,27], for example the improvement of up to 10 dB in the spurious-free dynamic range (SFDR) over a broad bandwidth from 7 GHz to 18 GHz achieved in Reference [27]. However, the feedforward scheme normally leads to high complexity and requires precise adjustment. To solve this issue, other optical linearization techniques have been proposed, such as the dual parallel modulation scheme that has suppressed the IMD by up to 38 dB [28], the mixed-polarization scheme [29,30], the dual electro-absorption modulators scheme [31][32][33] (e.g., IMD suppressed by over 16 dB and SFDR improved by over 8 dB in Reference [32]), the gain modulation scheme that has suppressed the IMD by over 7 dB and improved the dynamic range by about 11 dB [34] and the cascaded modulator and semiconductor optical amplifier (SOA) scheme [35]. The pre-distortion technique has also been proposed and demonstrated [36][37][38][39], where predictable nonlinearities can be compensated. A 6 dB improvement of the IMD together with a 14 dB peak improvement of the dynamic range has been achieved [37].
With these techniques, the nonlinear effects in RoF systems have been suppressed and the modulation depth of the RF signal can be increased to improve the system dynamic range.
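A two-tone test illustrates how a modulator-like nonlinearity generates IMD and how pre-distortion suppresses it. The sketch below is only illustrative: the memoryless cubic model `cubic_link`, the coefficient `a3`, the tone frequencies and the amplitudes are hypothetical, and the pre-distorter is a simple first-order inverse rather than any of the schemes in References [36][37][38][39]. The third-order product at 2f1 - f2 is read directly from the FFT.

```python
import numpy as np

FS = 1024          # sample rate (Hz); with FS samples, each FFT bin is 1 Hz wide
F1, F2 = 100, 110  # hypothetical tone frequencies (Hz), placed exactly on FFT bins


def two_tone(amplitude):
    """Two equal-amplitude tones over one second."""
    t = np.arange(FS) / FS
    return amplitude * (np.cos(2 * np.pi * F1 * t) + np.cos(2 * np.pi * F2 * t))


def cubic_link(x, a3):
    """Toy memoryless compressive nonlinearity standing in for the modulator."""
    return x - a3 * x**3


def predistort(x, a3):
    """First-order inverse of cubic_link: cancels the third-order term only."""
    return x + a3 * x**3


def tone_amplitude(y, freq_hz):
    """Amplitude of the spectral component at an integer frequency in Hz."""
    spectrum = np.fft.rfft(y) / len(y)
    return 2.0 * np.abs(spectrum[freq_hz])


def imd3_levels(amplitude=0.25, a3=0.1):
    """Return the IMD3 amplitude at 2*F1 - F2 without and with pre-distortion."""
    x = two_tone(amplitude)
    plain = tone_amplitude(cubic_link(x, a3), 2 * F1 - F2)
    linearized = tone_amplitude(cubic_link(predistort(x, a3), a3), 2 * F1 - F2)
    return plain, linearized
```

For this cubic model, the uncompensated IMD3 amplitude equals (3/4)·a3·A³ for a tone amplitude A, so it grows three times faster (in dB) than the fundamental. The pre-distorter removes the third-order term and leaves only weaker fifth- and higher-order residues, which is consistent with the statement above that predictable nonlinearities can be compensated.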
In addition to the fiber chromatic dispersion and the nonlinearity caused by the optical fiber and the optical modulator, RoF systems also suffer from other impairments, such as the amplified spontaneous emission (ASE) noise from optical amplifiers (e.g., erbium-doped fiber amplifiers, EDFAs), the phase noise due to the finite linewidth of the laser source [40,41] and the square-law detection property of the optical-to-electrical conversion by the PD [25]. These impairments limit the performance of RoF systems and the limitation becomes more pronounced as the RF carrier frequency increases (e.g., to mm-wave frequencies).

Impairment Compensation Techniques-Digital Signal Processing and DRoF
To compensate for these impairments in RoF systems, digital signal processing (DSP) techniques have attracted intensive attention and have achieved significantly improved system performance [42][43][44][45][46]. One advantage of using DSP for impairment compensation is its flexibility, since the algorithms can be changed adaptively. A large number of DSP algorithms have been proposed and studied for RoF systems, targeting various impairments such as the laser phase noise [42,43], the limited dynamic range caused by the optical modulator nonlinearity [23,44,45] and the fiber chromatic dispersion [46]. In addition to applying DSP at the receiver side to process the signal after transmission, the transmitter side has also been studied via the digital pre-distortion principle, where the signal is pre-processed before transmission [47][48][49][50][51][52]. In Reference [47], the pre-distortion technique has been studied to compensate the laser chirp and the chromatic dispersion of the optical fiber in order to suppress the impact of nonlinearities in RoF systems. In References [48,49], VCSEL-based RoF systems have been studied and digital pre-distortion has been used for linearization. The digital pre-distortion scheme has also been studied when approaching the laser resonance, which is a region with large nonlinearity, and significant performance improvements have been achieved [50].
Significant advancements have been achieved in DSP algorithms and digital pre-distortion techniques and the RoF system performance has been substantially improved. As knowledge-based methods, they have the key advantages of requiring a relatively small amount of training data, relatively low computation cost and explainable outcomes. However, one common limitation is that each signal processing step only targets one (or a few) impairments. Therefore, the interactions amongst various types of impairments are not effectively considered and handled in conventional signal processing techniques. In addition, conventional signal processing schemes also have limited capability in handling the nonlinear effects in RoF systems. These two aspects are the strengths of the ML-based methods to be discussed in the next section.
In addition to the signal processing approaches, the impairments caused by the analog link in RoF systems have also been mitigated using the DRoF principle [53]. The general architectures of analog RoF and DRoF are shown in Figure 4. Here we categorize these two types of RoF systems according to whether the wireless signal is digitized prior to the optical fiber transmission link. Hence, in DRoF systems, the signal before optical up-conversion is a sampled digital signal. This technique has been included in the Common Public Radio Interface (CPRI) standard and it is the most common way of performing RoF at present. The DRoF principle is especially advantageous in suppressing the various types of nonlinearity found in analog RoF systems, as the RF signal is digitized and transmitted in digital format via the optical fiber. Significant developments have been achieved over the past several years in the DRoF scheme [54][55][56]. In Reference [55], an improved nonlinear quantization method has been proposed for a DRoF system transmitting broadcast signals and the transmission rate has been reduced by more than 25% by using a 5-bit quantization resolution. In Reference [56], the transmission rate requirement has also been reduced using an adaptive compression method, where a 4-bit quantization resolution has been used. However, high-speed and broad-bandwidth ADCs/DACs are needed to apply the DRoF principle in higher RF bands, such as the mm-wave frequency band, and hence, the practical application of the DRoF scheme in such scenarios is still challenging and limited.
Photonics 2020, 7, x FOR PEER REVIEW 6 of 31
Recently, the sigma-delta radio-over-fiber (S-DRoF) technology has also been proposed and studied. The S-DRoF technology combines the robustness of DRoF to impairments with the low complexity of analog RoF [57][58][59][60]. A sigma-delta modulator is used in S-DRoF systems and its 1-bit output drives a laser. Hence, digital optical links are used for data transmission. At the receiver side, the RF signal is then recovered, amplified and transmitted via the wireless link. The S-DRoF technology has been compared with the analog RoF technology [61]. Results have shown that when an optical transmitter with low linearity is used, S-DRoF is advantageous compared with analog RoF. However, when a relatively linear transmitter (e.g., a DFB laser) is used, analog RoF and S-DRoF have similar performance.

ML-Based Signal Processing in RoF Systems
Whilst a large number of studies have been carried out and the performance of RoF systems has been improved substantially, a number of challenges remain. The first major challenge is the suppression and mitigation of multiple impairments simultaneously, so that the interactions between different types of impairments are compensated jointly. Since there is typically more than one type of impairment in RoF systems (e.g., phase noise, modulator nonlinearity, fiber nonlinearity, etc.) and these impairments do not affect the transmitted signal individually or separately, the resulting combined impact on the signal is more complicated. The second major challenge is suppressing the nonlinear effects in RoF systems, which is difficult for conventional signal processing techniques. Compared with other types of optical communication systems, the nonlinearity is more important for RoF systems, due to the typical use of an analog optical link and the additional IMD introduced at the transmitter side. To address these challenges, ML-based signal processing schemes have been proposed and investigated.
In this section, we review the recent advancements in this research domain. We first summarize how ML principles are generally applied to optical communication systems in Section 3.1. Then we divide the ML-based signal processing techniques for RoF systems into two categories: non-neural network models and neural network models. We discuss these two categories in Sections 3.2 and 3.3, respectively.

ML Techniques in Optical Communication Systems
ML algorithms have been proposed and studied in different optical communication systems, including short-distance optical access networks, long-distance optical transmission systems, data-center optical interconnects, free-space optical communications, underwater optical communications and fiber-wireless integration [12][62][63][64][65][66][67][68][69]. Depending on the learning scheme, existing ML-based algorithms for optical communication systems can be divided into three main categories: methods based on supervised learning, unsupervised learning and reinforcement learning. Note that these categories have been explored not only for signal processing but also for network management and resource allocation [12]. In this section, we provide a high-level overview and brief comparison of these three categories of ML methods in optical communications.
In general, a machine learning model can be viewed as a complex function, which maps an input data instance to a target output label, for example, an image classification model which tells whether an input image contains a cat or not. The major differences between the above three categories of ML models lie in their settings, the required training data and their potential applications.
In the scheme of supervised learning, the training dataset is usually a set of training instances serving as the inputs, with a label for each instance serving as the target model output. The model aims to learn a mapping function that converts each input instance to its corresponding correct label and thus, the labels in the training data can be viewed as the supervision provided to the model during the training process. After training, the model can be used to predict labels for new data instances. It is widely acknowledged that supervised learning performs well on tasks such as classification and regression. As such, it has been widely applied in optical communications, both for signal processing in the physical layer and for network management and resource allocation in upper layers.
One requirement in supervised learning is that all training data needs to be labelled. However, in some scenarios, it is difficult or expensive to obtain high-quality labelled data. The scheme of unsupervised learning lifts this requirement. The training process only needs the input data instances without knowing their corresponding labels. That is, supervision is not available during the model training. This scheme is particularly useful for detecting data distributions in the training data. A typical task of unsupervised learning is the clustering task, which aims to cluster data points with high similarity together. One application of this scheme is the modulation format recognition in optical communications, for which different types of clustering algorithms have been proposed, such as k-means, expectation maximization and density-based spatial clustering of applications with noise (DBSCAN) [62].
The third category is reinforcement learning. Reinforcement learning usually assumes an environment and an agent that can interact with the environment, for example, a gamer playing a computer game, where the gamer is the agent and the game is the interactive environment. Unlike supervised learning or unsupervised learning, the target task in reinforcement learning usually involves interactions between the agent and the environment, for example, the task of winning a computer game. Thus, a reinforcement learning model usually aims to obtain a policy/strategy to accomplish a task, for example, a strategy for playing the game "Mario." Such a policy can be instantiated as a mapping function which converts the state of an agent (e.g., the position of Mario) to the next action that should be performed by the agent (e.g., go up/right) so that the agent gets closer to accomplishing the task. Intuitively, the learning process of reinforcement learning models is based on trial-and-error: exploring various actions that can be performed by the agent and evaluating the resulting "reward" given by the environment. Considerable effort has been made to allow reinforcement learning models to learn effectively from the rewards and to excel at a target task. Reinforcement learning has been widely applied in network management and resource allocation in optical communications due to its powerful capability of self-learning a policy. For example, reinforcement learning has been applied in optical burst switching networks, where the path selection and wavelength selection can be optimized to minimize the burst loss probability [63]. Reinforcement learning has also been applied in the emerging elastic optical networks (EONs) to enable an autonomic and cognitive network that is capable of efficient routing, service provisioning and modulation and spectrum assignment [64,65].
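The trial-and-error loop described above can be sketched with tabular Q-learning, one widely used reinforcement learning algorithm. The environment below is a hypothetical five-state chain invented purely for illustration (it is not taken from any of the cited works): the agent starts in state 0, can move left or right, and receives a reward of 1 when it reaches state 4. The names `step`, `q_learning`, `N_STATES` and all hyper-parameter values are assumptions.

```python
import random

N_STATES = 5          # hypothetical chain of states 0..4; state 4 is the goal
ACTIONS = (-1, +1)    # action 0 = move left, action 1 = move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=300, alpha=0.5, gamma=0.9, seed=0):
    """Learn a Q-table from purely random exploration (Q-learning is off-policy)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # cap on episode length
            a = rng.randrange(2)                  # trial: pick a random action
            nxt, reward, done = step(state, ACTIONS[a])
            # error: move Q(s, a) towards reward + discounted best future value
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


q_table = q_learning()
# Greedy policy: in every non-goal state, take the action with the larger Q-value.
policy = [ACTIONS[max((0, 1), key=lambda a: q_table[s][a])]
          for s in range(N_STATES - 1)]
```

The resulting `policy` is the state-to-action mapping mentioned in the text. In a network setting, the state, action and reward would instead be quantities such as link occupancy, route choice and loss probability.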

Non-Neural Network-Based Signal Processing Techniques in RoF Systems
As discussed in Section 2, due to the use of analog optical link, the RoF system suffers from various types of impairments and relatively large nonlinearity. The ML-based signal processing techniques have been proposed to solve these limitations in RoF systems, including both backhaul and fronthaul scenarios. Various ML algorithms have been investigated. In this section, we focus on the models that are non-neural network models and discuss how they are utilized for signal processing and combating distortions in the physical layer of RoF systems. The models discussed in this section are summarized in Table 1.
One ML technique that has been investigated to mitigate the nonlinear impairment in RoF systems is the k-nearest neighbors (KNN) based classification method [70,71]. KNN is a non-parametric (i.e., the number of parameters used in the model is not fixed and depends on the training dataset) supervised ML algorithm and when it is used for classification, an input data point is assigned to the class that is most common amongst its k nearest neighbors. For the RoF signal processing application, the "input" is normally the received data after photodetection and the "class" is the symbol decision (e.g., constellation point) after processing. The number of classes depends on the modulation used in the RoF system (e.g., 4 classes for 4-QAM modulation). In KNN, training data points with known corresponding classes are used. For the testing point, as illustrated in Figure 5, the distance between the testing data point to be classified and each training data point is calculated. Typically, the Euclidean distance is used [71]. Based on the calculated distances, the k nearest training data points are determined (k = 4 in the example shown in Figure 5) and the majority "class" of these k points (class 3 in the example) is used as the class of the testing data point. With KNN for signal processing, a 75 GHz mm-wave RoF system over 80 km fiber transmission with 16-QAM discrete multi-tone (DMT) modulation has been demonstrated [70]. Compared with the conventional maximum likelihood detection, the nonlinearity tolerance is increased and the receiver sensitivity is improved by about 0.6 dB with the KNN scheme.
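The KNN decision rule described above can be sketched in a few lines. The snippet is a minimal illustration and not the implementation used in [70,71]: the training samples are made-up (I, Q) values for a 4-QAM constellation, and the function name `knn_classify` is an assumption.

```python
import math
from collections import Counter


def knn_classify(point, train_points, train_labels, k=3):
    """Return the majority class among the k training points nearest to `point`."""
    # Euclidean distance from the test point to every labelled training point
    dists = sorted(
        (math.dist(point, p), label) for p, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    # On a tie, Counter returns the class that was seen first (i.e., the closer one)
    return votes.most_common(1)[0][0]


# Made-up 4-QAM training set: (I, Q) samples after photodetection, classes 0..3
train_points = [
    (1.0, 1.1), (0.9, 0.8), (1.2, 1.0),         # class 0
    (-1.0, 0.9), (-1.1, 1.2), (-0.8, 1.0),      # class 1
    (-1.0, -1.0), (-0.9, -1.2), (-1.2, -0.9),   # class 2
    (1.0, -1.1), (0.8, -0.9), (1.1, -1.0),      # class 3
]
train_labels = [0] * 3 + [1] * 3 + [2] * 3 + [3] * 3

decision = knn_classify((0.7, 0.6), train_points, train_labels, k=3)
```

An odd k is used here to reduce voting ties. Because the decision regions follow the labelled training samples rather than fixed straight boundaries, they can bend with the nonlinear distortion present in the training data.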
In addition to KNN, the support vector machine (SVM) based ML algorithm has also been studied in RoF systems as a nonlinear detector [71][72][73]. The signal processing and detection by the SVM scheme is treated as a classification task, where the SVM uses the statistical learning framework. A binary SVM splits the data points into two groups with the optimum hyperplane, which is typically chosen as the maximum-margin hyperplane. A hyperplane is a subspace of the data space which partitions the data space into two. For example, if the data space has two dimensions (e.g., Figure 5), a hyperplane of this data space is a one-dimensional line. The decision boundary can be either linear or nonlinear and the nonlinear case is especially suited for handling the nonlinearity in analog RoF systems. To support higher order modulations (i.e., more than two groups of data points), the M-ary SVM [74] was used and the number of SVM models required is log₂ N, where N is the number of classes in the modulation format used. The SVM has been shown experimentally to be capable of suppressing both the linear and nonlinear effects in an RoF system with a data rate above 1 Gbps [72], where a larger driving electrical signal can be used to increase the dynamic range and more than 1 dB receiver sensitivity improvement can be achieved. The SVM-based scheme has also been shown to be effective in suppressing the data-dependent cross-modulation when multiple RF signals are transmitted [73].
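One way to arrive at log₂ N binary classifiers is to let each binary SVM decide one bit of the symbol label. The sketch below illustrates this idea for 4-QAM (N = 4, hence 2 SVMs). It is an assumption-laden toy: it uses a linear soft-margin SVM trained by sub-gradient descent on the hinge loss, whereas the cited works rely on nonlinear SVMs, and the bit mapping, data and function names are all hypothetical.

```python
def train_linear_svm(points, signs, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the regularised hinge loss; returns (w, b)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), y in zip(points, signs):
            margin = y * (w[0] * x0 + w[1] * x1 + b)
            w = [wi * (1.0 - lr * lam) for wi in w]   # weight decay (regulariser)
            if margin < 1.0:                           # sample violates the margin
                w[0] += lr * y * x0
                w[1] += lr * y * x1
                b += lr * y
    return w, b


def predict_bit(model, point):
    """Which side of the separating line (hyperplane) the point falls on."""
    w, b = model
    return 1 if w[0] * point[0] + w[1] * point[1] + b > 0 else 0


# Made-up 4-QAM samples; each symbol carries log2(4) = 2 bits (assumed mapping:
# bit 0 follows the sign of I, bit 1 follows the sign of Q).
points = [(1.0, 1.1), (0.9, 0.8), (-1.0, 0.9), (-1.1, 1.2),
          (-1.0, -1.0), (-0.9, -1.2), (1.0, -1.1), (0.8, -0.9)]
bits = [(1, 1), (1, 1), (0, 1), (0, 1), (0, 0), (0, 0), (1, 0), (1, 0)]

# One binary SVM per bit position
models = [
    train_linear_svm(points, [1 if lab[i] else -1 for lab in bits])
    for i in range(2)
]


def decode(point):
    """Combine the per-bit SVM decisions into one symbol label."""
    return tuple(predict_bit(m, point) for m in models)
```

For 16-QAM the same construction would need four binary SVMs instead of fifteen one-vs-one or sixteen one-vs-rest classifiers. Replacing the linear model with a kernel SVM gives the nonlinear decision boundary discussed in the text.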
The k-means ML algorithm has also been proposed and demonstrated for signal processing and detection in RoF systems [75]. The k-means is a widely used unsupervised ML method for clustering tasks, where the algorithm aims to group data points into a pre-defined number of clusters. Hence, the k-means has been applied in RoF systems to group received data points into k clusters according to the order of the modulation format and the centroids of these clusters can be updated and optimized progressively through the training process, as shown in Figure 6. Due to this capability, the k-means algorithm has been shown to be robust against RF phase offset and effective in RF phase recovery. It also has the advantages of relatively low complexity and good adaptability to different modulation formats.
In addition to the k-means, other types of clustering algorithms have also been studied in RoF systems, such as the fuzzy c-means Gustafson-Kessel (FCM-GK) algorithm [76]. Similar to k-means, the FCM-GK algorithm also aims to group the received data points in the constellation diagram to mitigate impairments in the system.
However, unlike the k-means, which mainly handles additive white Gaussian noise (AWGN) and impairments, the FCM-GK algorithm can better handle the non-AWGN noise and impairments existing in RoF systems. Voronoi contours have also been used to realize nonlinear decision boundaries that better process the data points in the corners of decision regions. With the FCM-GK algorithm and Voronoi contours, up to 3.1 dB and 1.4 dB optical signal-to-noise ratio (OSNR) improvement has been demonstrated for the RoF system with 16-QAM and 4+12 phase-shift-keying (PSK), respectively. Compared with k-means, the required OSNR has been reduced by 1 dB and 1.4 dB, respectively.
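To illustrate why progressively updated centroids help with an RF phase offset, the sketch below runs plain k-means (Lloyd's algorithm) on a made-up 4-QAM constellation that has been rotated by 20 degrees. The centroids are initialised at the ideal constellation points and then follow the rotated clusters. The data, the jitter values and the phase estimator are assumptions for illustration and are not taken from [75].

```python
import cmath
import math


def kmeans(samples, centroids, iterations=10):
    """Lloyd's algorithm on complex constellation points; returns updated centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for s in samples:
            # assignment step: attach each sample to its nearest centroid
            nearest = min(range(len(centroids)), key=lambda i: abs(s - centroids[i]))
            clusters[nearest].append(s)
        # update step: move each centroid to the mean of its cluster
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


IDEAL = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]        # ideal 4-QAM points
OFFSET = math.radians(20)                          # assumed RF phase offset
JITTER = [0.1 + 0j, -0.1 + 0.05j, 0 - 0.1j]        # small deterministic "noise"
samples = [p * cmath.exp(1j * OFFSET) + n for p in IDEAL for n in JITTER]

centroids = kmeans(samples, IDEAL)
# Average rotation of the learnt centroids relative to the ideal points
phase_est = sum(cmath.phase(c / p) for c, p in zip(centroids, IDEAL)) / len(IDEAL)
```

Symbol decisions are then made against the learnt centroids instead of the ideal grid, so no separate phase recovery stage is needed in this toy case. The number of clusters k equals the modulation order, which is why the same routine adapts to other formats by changing `IDEAL`.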

Introduction of Neural Network-Based Signal Processing
In recent years, the development of neural network models has taken ML to the next level. The notion of the neural network is biologically inspired and it aims to mimic the structure and function of the human brain. A common neural network structure is a set of connected neurons designed to solve a certain target task. Thanks to the advancement of affordable computation capability in the past years, it has now become possible and cost-effective to build and train neural networks that can solve complex tasks. The success of neural networks has benefitted numerous disciplines, such as computer vision, information retrieval, natural language processing and optical communications [77][78][79].
In optical communication systems, the general application of neural network-based physical layer signal processing is shown in Figure 7a. The architecture of a neural network signal processor based on the fully connected neural network (FCNN) is shown in Figure 7b, which consists of the input layer, the hidden layers and the output layer. We use the FCNN as an example here; other types of neural networks have also been proposed and studied, which will be discussed later in detail. In the FCNN-based signal processing scheme, the input layer of the FCNN takes the received data points from the optical detector and passes them to the hidden layers. There can be one or more hidden layers, each with a number of neurons (i.e., nodes). The typical structure of a neuron is shown in Figure 7c. In the FCNN architecture, each neuron in a layer is connected to all neurons in the previous layer and it combines the outputs of the neurons in the previous layer via learnable weight parameters w_i. Each neuron also has a learnable bias parameter and an activation function. The activation function in neural networks is designed to mimic the mechanisms of human brain neurons and it allows a neuron to have a nonlinear response to the input signal. A number of functions have been widely used as the nonlinear activation function, such as the Sigmoid function, ReLU function, leaky-ReLU function and Tanh function. As shown in Figure 7c, the functionality of each neuron can be considered as a linear combination of the previous layer outputs that is wrapped by a nonlinear function. That is, the conversion performed by a neuron is a combination of linear and nonlinear mapping of its previous layer. Various objective functions can be used at the output layer depending on the target task but the general idea is to design an objective function that is able to quantify the differences between the outputs of a neural network and the desired target outputs. When training data are fed into the neural network, the network updates the weights and biases of each neuron by optimizing the pre-defined objective function. By adjusting the width (i.e., the number of neurons in each layer) and the depth (i.e., the number of hidden layers) of the neural network, the learning capability of the neural network can be varied. This intrinsic structure allows neural networks to capture complex distributions/properties carried by a large amount of data.
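The neuron operation described above, a weighted sum of the previous layer outputs plus a bias, wrapped by a nonlinear activation, can be written out directly. The sketch below only shows the forward pass and an MSE objective; the weights are arbitrary numbers chosen so that the result is easy to check by hand, and the function names are assumptions. Training (updating the weights and biases to reduce the objective) is omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def linear(z):
    return z


def neuron(inputs, weights, bias, activation=sigmoid):
    """y = f(sum_i w_i * x_i + b): linear combination wrapped by a nonlinearity."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def dense_layer(inputs, weight_rows, biases, activation=sigmoid):
    """Fully connected layer: every neuron sees every output of the previous layer."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]


def fcnn_forward(inputs, layers):
    """Pass the inputs through a list of (weight_rows, biases, activation) layers."""
    out = inputs
    for weight_rows, biases, activation in layers:
        out = dense_layer(out, weight_rows, biases, activation)
    return out


def mse(outputs, targets):
    """Mean square error between network outputs and desired target outputs."""
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)


# Tiny 2-2-1 network with hand-picked (untrained) parameters
layers = [
    ([[0.5, -0.25], [0.0, 0.0]], [0.0, 0.0], sigmoid),   # hidden layer, width 2
    ([[2.0, -1.0]], [0.25], linear),                      # output layer, 1 neuron
]
output = fcnn_forward([1.0, 2.0], layers)
loss = mse(output, [1.0])
```

Here the width is the number of rows in each weight matrix and the depth is the number of entries in `layers`, so both can be varied without touching the forward-pass code.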

Summary of Neural Network-Based Signal Processing in RoF Systems
Leveraging these advantages, models based on neural networks have become a promising candidate for physical layer signal processing in RoF systems [80][81][82][83][84][85][86][87][88][89][90][91][92][93][94]. Compared with other types of optical communication systems, because of the RF carrier, the wireless communication link and the analog optical link, the RoF system suffers more from complicated impairments, especially the nonlinear impairments, which are challenging for conventional signal processing schemes. As discussed, the algorithms based on neural networks have demonstrated powerful capabilities in learning complex nonlinear distributions and thus, they have been considered as promising techniques to solve the physical layer limitations in present RoF systems. A common way to leverage these algorithms is to use them as the signal equalizer, decoder and demultiplexer in RoF systems. In this section, we review the recent developments in this field, which are summarized in Table 2. In the ML-based studies, since the training data used typically carry the impacts of multiple limiting factors at the same time, the ML algorithm normally handles multiple limitations simultaneously. Therefore, the receiver sensitivity or BER, which reflects the result of multiple impacting factors, is normally selected as the performance indicator in the literature.

Neural Network Equalizers in RoF Systems
An FCNN with an adaptive activation function as the nonlinear equalizer (NLE) has been proposed and demonstrated in a 60 GHz RoF system [80]. The FCNN has an input layer with 12 neurons, one hidden layer with 16 neurons and an output layer with 1 neuron. The mean square error (MSE) is used as the loss function and a modified Sigmoid function is adopted as the activation function of the neurons. The modified Sigmoid function has a variable coefficient that changes at each training iteration to achieve the steepest descent and hence, faster convergence can be achieved and the number of training iterations required can be reduced. Based on a 60 GHz RoF system with 5 Gbps BPSK modulated data, up to 2 dB receiver sensitivity improvement over the Volterra equalizer has been achieved. Compared with the FCNN NLE with the standard Sigmoid activation function, similar BER performance has been achieved whilst the number of training iterations has been reduced by about 50%, from 4000 to 2000. With the FCNN NLE to mitigate the nonlinearity, similar performance improvement has also been reported in a 16-QAM 40 GHz mm-wave RoF system [81].
The conventional activation functions in the FCNN NLE have two saturation regions and hence, they are better suited for binary coding and modulations. To better handle multi-level modulations that are widely used in RoF systems, the multilevel activation function has been proposed and studied [82], as shown in Figure 8. The multilevel activation function has multiple saturation regions to better handle the nonlinearity in multi-level modulations. To further cope with the in-phase and quadrature-phase components of vector signals and the resulting intra-band cross-modulations (XM), a complex-valued multi-level activation function modified from the Sigmoid function has been proposed. Results have shown that various types of nonlinearity in a 60 GHz RoF system (baseband over fiber) with 16-QAM or 64-QAM modulation, including the XM interference, can be effectively suppressed by the FCNN with the complex-valued multi-level activation function and the MSE loss function. In addition to the nonlinear effects, the proposed FCNN NLE has also been shown to be capable of suppressing the phase rotation impairment.
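The exact multilevel activation function of [82] is not reproduced in this review. As a rough illustration of the idea, one common way to obtain several saturation regions is to sum shifted Sigmoid functions, so that the output plateaus at each constellation level. The sketch below does this for the four levels per dimension of 16-QAM (-3, -1, 1, 3) and applies the function separately to the real and imaginary parts for the complex-valued case. The construction, the `slope` parameter and the function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def multilevel_sigmoid(x, levels=4, slope=10.0):
    """Staircase-like activation with `levels` saturation regions.

    For levels=4 the plateaus sit at -3, -1, 1 and 3, with smooth transitions
    around the thresholds -2, 0 and 2 (a larger `slope` gives sharper steps).
    """
    thresholds = [2.0 * k - (levels - 2) for k in range(levels - 1)]
    return -(levels - 1) + 2.0 * sum(sigmoid(slope * (x - t)) for t in thresholds)


def complex_multilevel_sigmoid(z, levels=4, slope=10.0):
    """Apply the multilevel activation to the I and Q components independently."""
    return complex(
        multilevel_sigmoid(z.real, levels, slope),
        multilevel_sigmoid(z.imag, levels, slope),
    )


# A slightly distorted 16-QAM sample is pulled towards the plateau at 3 - 1j
activated = complex_multilevel_sigmoid(complex(2.9, -1.2))
```

With `levels=2` the function reduces to a scaled ordinary Sigmoid with two saturation regions, which matches the binary case mentioned in the text. Because the function stays differentiable, it can still be trained with gradient-based optimization of the MSE loss.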
In addition to the intra-band XM, the mm-wave RoF system also suffers from inter-band XM under the multi-user scenario in the uplink, which leads to inter-user interference. The FCNN NLE has also been investigated to mitigate the inter-band and intra-band XM simultaneously [83,84]. Both joint FCNN equalization of all users (the input layer, hidden layer and output layer of the FCNN have 20, 24 and 4 neurons, respectively) and individual user FCNN equalization (the input layer, hidden layer and output layer of each FCNN have 5, 12 and 1 neurons, respectively) have been studied, as shown in Figure 9. Results have shown that both schemes can suppress the XM nonlinearity, whilst the joint equalization scheme has better capability in suppressing the inter-user interference. The demonstrated joint equalization scheme using the FCNN NLE provides a promising solution for multi-user RoF systems.
The FCNN-based signal processing has also been applied in wideband and multi-carrier RoF systems to suppress the third-order inter-modulation distortion (IMD3) and XM distortion simultaneously [85]. Results have shown that both the IMD3 and XM distortion of an RoF system with two RF signals can be suppressed by over 20 dB and better generalization against the interfering input RF power can also be achieved.
Another recent study has applied the FCNN-based NLE in the RoF system with the universal filtered multi-carrier (UFMC) waveform [86]. The UFMC has the advantage of low out-of-band emission and is suitable for asynchronous transmission in IoT applications. However, the UFMC-based RoF system suffers from the high peak-to-average power ratio (PAPR) problem, which leads to significant nonlinear distortions and hence, the FCNN provides a promising solution. Compared with the zero-forcing (ZF) equalization, results have shown that the error vector magnitude (EVM) can be reduced by more than 50% using the FCNN NLE. The FCNN NLE has also been applied in the OFDM-based RoF system [87], where the amplitude and phase parts of the signal have been learnt and compensated using two sub-networks.

Neural Network Decoders in RoF Systems
In the studies discussed above, the FCNN is utilized as an equalizer to suppress various types of impairments, especially the nonlinear impairments. In addition to this, the FCNN has also been proposed and studied for both equalization and decoding in one step [88], as shown in Figure 10. In this type of FCNN decoder, the output layer typically uses the cross-entropy loss function and the Softmax function and hence, all possible output values with their corresponding probabilities are computed. By selecting the output value with the highest probability, the decoding of the received signal can be realized, whilst the nonlinear equalization capability of the FCNN is maintained. Results have shown that compared with the Volterra nonlinear equalizer and hard decision, the receiver sensitivity with the FCNN decoder can be improved by 1.6 dB in a 60 Gb/s 8-pulse amplitude modulation (PAM8) RoF system, whilst the computation complexity is similar. A similar approach has also been applied to an OSSB system with Nyquist PAM-4 (NPAM-4) [89], where the linear impairments are compensated by a feedforward equalizer (FFE) and the nonlinear equalization and decoding are realized by the FCNN in one step. Results have shown that compared with the FFE only, the receiver sensitivity can be improved by about 2 dB at 25 Gb/s by the FCNN decoder and the improvement further increases at higher data rates.
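The output stage of such a decoder can be sketched as follows: the Softmax function turns the output-layer activations into one probability per candidate symbol, the symbol with the highest probability is selected, and the cross-entropy loss is used during training. The eight output values below are made up, and the PAM8 level mapping and function names are assumptions rather than details of [88,89].

```python
import math

PAM8_LEVELS = (-7, -5, -3, -1, 1, 3, 5, 7)   # assumed mapping: output index -> level


def softmax(logits):
    """Convert raw output-layer values into probabilities that sum to 1."""
    peak = max(logits)                         # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_index):
    """Training loss: negative log-probability assigned to the correct symbol."""
    return -math.log(probs[true_index])


def decode(logits):
    """Pick the most probable symbol; returns (PAM level, its probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return PAM8_LEVELS[best], probs[best]


# Made-up activations of the 8 output neurons for one received sample
logits = [0.1, 0.2, 0.0, 0.3, 2.5, 0.4, 0.1, 0.0]
symbol, confidence = decode(logits)
loss = cross_entropy(softmax(logits), 4)       # loss if index 4 is the true symbol
```

Since the decision is taken directly on the network output, no separate hard-decision slicer follows the equalizer, which is the one-step equalization and decoding described in the text.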
Photonics 2020, 7, x FOR PEER REVIEW 14 of 31 emission and is suitable for asynchronous transmission in IoT applications. However, UMFC based RoF system suffers from the high peak-to-average power ratio (PAPR) problem, which leads to significant nonlinear distortions and hence, the FCNN provides a promising solution. Compared with the zero-forcing (ZF) equalization, results have shown that the error vector magnitude (EVM) can be reduced by more than 50% using the FCNN NLE. The FCNN NLE has also been applied in the OFDM based RoF system [87], where the amplitude and phase parts of the signal have been learnt and compensated using two sub-networks.

Neural Network Decoders in RoF Systems
In previous studies, the FCNN is utilized as an equalizer to suppress various types of impairments, especially the nonlinear impairments. In addition to this, the FCNN has also been proposed and studied for both equalization and decoding in one step [88], as shown in Figure 10. In this type of FCNN decoder, the output layer typically uses the cross-entropy loss function and the Softmax function and hence, all possible output values with the corresponding probabilities are computed. By selecting the output value with the highest probability, the decoding of the received signal can be realized, whilst the nonlinear equalization capability of the FCNN is maintained. Results have shown that compared with the Volterra nonlinear equalizer and hard decision, the receiver sensitivity with the FCNN decoder can be improved by 1.6 dB in a 60 Gb/s 8-pulse amplitude modulation (PAM8) RoF system, whilst the computational complexity is similar. A similar approach has also been applied to an OSSB system with Nyquist PAM-4 (NPAM-4) [89], where the linear impairments are compensated by a feedforward equalizer (FFE) and the nonlinear equalization and decoding are realized by the FCNN in one step. Results have shown that compared with the FFE only, the receiver sensitivity can be improved by about 2 dB at 25 Gb/s by the FCNN decoder and the improvement further increases at higher data rates. The FCNN based decoder for both equalization and decoding has also been studied in the 5G RoF fronthaul system jointly with probabilistic shaping, which can optimize the probabilistic distribution of the signal constellation [90]. The probabilistic shaping normally results in large PAPR, since it increases the minimum Euclidean distance amongst the constellation points under a given average power. Hence, this can lead to severe nonlinearity in RoF fronthaul systems. The FCNN decoder used consists of 2 hidden layers and an output layer with 8 output neurons, corresponding to the PAM8 modulation format. Results have shown that compared with the least mean square (LMS) equalization scheme, 3.2 dB improvement on the receiver sensitivity can be achieved with the FCNN decoder, which is realized by better suppressing the nonlinear impairment in the system.
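The one-step equalization and decoding principle described above can be illustrated with a minimal NumPy sketch. It is not the implementation of References [88,89,90]: the window length, hidden layer widths, ReLU activation and random weights are assumptions chosen only to show how the Softmax output, the cross-entropy loss and the highest-probability decision fit together for an 8-level (PAM8) output.

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fcnn_decode(x, params):
    """Forward pass of a small fully connected decoder.

    x: (batch, n_taps) windows of received samples.
    params: list of (W, b) pairs; the last pair feeds the Softmax layer.
    Returns the class probabilities and the decided symbol index per window.
    """
    h = x
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)  # ReLU hidden layer (assumed activation)
    W_out, b_out = params[-1]
    probs = softmax(h @ W_out + b_out)
    return probs, probs.argmax(axis=-1)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the transmitted symbol indices.
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))

# Hypothetical sizes: 11-sample input window, two hidden layers, 8 outputs (PAM8).
rng = np.random.default_rng(0)
sizes = [11, 16, 16, 8]
params = [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=(4, 11))
probs, symbols = fcnn_decode(x, params)
```

In a trained decoder the weights would be fitted by minimizing `cross_entropy` over labelled received windows; here they are random, so the decisions carry no meaning beyond showing the data flow.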
In the studies discussed above, the FCNN has been used as either equalizer or decoder. We also summarize the key parameters of the FCNN models used in these studies in Table 3, including the number of hidden layers, the number of neurons in each layer, the nonlinear activation function and the output layer type. It can be seen that the model parameters of the FCNN used in different studies change considerably. This is because these parameters highly depend on the specific task and the RoF system setting, such as the modulation format. In addition, the receiver sensitivity or the OSNR is widely used in previous studies to characterize the performance improvement enabled by the ML schemes. These performance indicators are typically selected since the ML models normally handle multiple impairments simultaneously and these parameters can reflect the result of multiple impacting factors. However, the use of these performance indicators also leaves the net improvement against the nonlinearity in RoF systems unclear. One exception is Reference [85], where the IMD3 improvement has been characterized to directly show the capability of ML models against the nonlinearity.

Other Physical Layer Signal Processing Applications of Neural Networks in RoF Systems
In addition to the NLE and the decoder, the FCNN has also been investigated for the physical layer signal demultiplexing in RoF systems. One such study has focused on the demultiplexing of a 28 GHz mm-wave uplink RoF system with the non-orthogonal multiple access (NOMA) scheme [91]. The NOMA principle has been widely studied for simple multiplexing and demultiplexing in the power domain. However, for RoF systems with centralized BBUs, the latency of NOMA demultiplexing is normally large due to the large number of user signals. The FCNN has been proposed to solve this limitation and results have shown that the latency can be greatly reduced by the parallel demultiplexing of packed signals. The receiver sensitivity has also been slightly improved by the FCNN. In addition, the FCNN has also been proposed and studied for the MIMO demultiplexing in small-cell mm-wave RoF systems [92]. Compared with the conventional matrix inverse-based demultiplexing method, the FCNN based approach has shown improved tolerance to the time discrepancy amongst multiple channels and achieved better coordinated multi-point transmission capacity.
The studies summarized above are all based on the use of FCNN for the signal equalization, decoding and demultiplexing in RoF systems. In addition to FCNNs, other types of neural networks have also been proposed and investigated to compensate the impairments and to improve the performance in RoF systems. Recent studies have proposed the use of convolutional neural network (CNN) and the CNN based signal decoder has been demonstrated in a 60 GHz mm-wave RoF system [93,94]. The CNN has been widely studied and used for image recognition and speech recognition tasks and one unique advantage of the CNN is the parallel internal structure that enables efficient parallel computations. The general principle of the CNN based decoder in the RoF system is shown in Figure 11, which consists of the input layer, the convolutional layers and the output layer. The received data from the RoF system is treated as a one-dimensional vector by the input layer and one-dimensional convolution is then used in the subsequent convolutional layers. The convolution operation is also followed by the nonlinear activation function and the max-pooling in each convolutional layer. In this way, various impairments in the system carried by the received data, including both linear and nonlinear impairments, can be captured and suppressed accordingly. Compared with the FCNN, where each neuron is connected to all neurons in the previous layer, the number of parameters to be learnt is reduced in the CNN based decoder and hence, the computation cost can be reduced, especially during the computation intensive training process. Experimental results have shown that in a 60 GHz mm-wave RoF system, the CNN based decoder has achieved similar BER performance as the FCNN based decoder, whilst the number of training iterations and the size of the training dataset have been reduced by about 50% and over 30%, respectively.
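A single convolutional layer of the kind described above (one-dimensional convolution, nonlinear activation, max-pooling) can be sketched as follows. The window length, kernel size, number of kernels and the ReLU activation are assumptions for illustration and are not taken from References [93,94]. The last two lines compare the number of learnable parameters with a fully connected layer producing the same number of outputs, which is the source of the parameter saving mentioned in the text.

```python
import numpy as np

def conv1d_layer(x, kernels, bias, pool=2):
    """One convolutional layer: valid 1-D convolution, ReLU, max-pooling.

    x: (length, in_channels); kernels: (k, in_channels, out_channels).
    """
    k, _, c_out = kernels.shape
    n_out = x.shape[0] - k + 1
    y = np.empty((n_out, c_out))
    for i in range(n_out):
        # Correlate the current window with every kernel at once.
        y[i] = np.tensordot(x[i:i + k], kernels, axes=([0, 1], [0, 1])) + bias
    y = np.maximum(y, 0.0)                      # nonlinear activation (ReLU assumed)
    n_pool = n_out // pool
    # Non-overlapping max-pooling along the time axis.
    return y[:n_pool * pool].reshape(n_pool, pool, c_out).max(axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 1))                    # hypothetical 64-sample received window
kernels = rng.normal(0, 0.1, size=(5, 1, 8))    # 8 kernels of length 5
out = conv1d_layer(x, kernels, np.zeros(8))

conv_params = kernels.size + 8                  # shared weights plus biases
dense_params = 64 * out.size + out.size         # dense layer with the same output size
```

With these assumed sizes the convolutional layer has 48 learnable parameters, against 15,600 for the equivalent dense layer, because the same short kernels are reused at every position of the input.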
In addition to the CNN based scheme, the binary convolutional neural network (BCNN) has also been investigated for 60 GHz mm-wave RoF systems [93,94]. As shown in Figure 12, the structure of BCNN based decoder is similar to the CNN decoder in general, whilst the convolution computation is replaced with the binary convolution computation using the most significant bit (MSB) multiplications and additions. The nonlinear activation function, batch normalization and max-pooling outputs are also binarized in BCNN. The binarized parameters and outputs of the convolutional layers and the binary convolution operation can further reduce the computation cost for practical implementations in RoF systems.
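The cost saving of a binary convolution can be seen in a short sketch. It assumes that binarization keeps only the sign of each value (the sign bit is the MSB in a two's-complement fixed-point representation) and uses the XNOR and popcount formulation that is common in binarized networks; whether References [93,94] use exactly this formulation is not stated in the text, so the code below is an illustration rather than their method.

```python
import numpy as np

def binarize(v):
    # Keep only the sign: 1 for non-negative values, 0 for negative values.
    return (np.asarray(v) >= 0).astype(np.uint8)

def binary_dot(a_bits, w_bits):
    """Dot product of two +/-1 vectors stored as bits (1 -> +1, 0 -> -1)."""
    agree = np.count_nonzero(a_bits == w_bits)  # XNOR followed by popcount
    return 2 * agree - a_bits.size

def binary_conv1d(x, kernel):
    # Slide the binarized kernel over the binarized input (valid positions only).
    xb, kb = binarize(x), binarize(kernel)
    k = kb.size
    return np.array([binary_dot(xb[i:i + k], kb)
                     for i in range(xb.size - k + 1)])

x = np.array([0.7, -1.2, 0.3, 0.9, -0.1, -0.4])
kernel = np.array([0.5, -0.8, 0.2])
y = binary_conv1d(x, kernel)
```

Each output needs only bit comparisons and a count instead of floating-point multiplications, and the result equals the ordinary correlation of the two sign sequences.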
In all previous studies reviewed so far, the neural network is applied at the receiver side for equalization, decoding or demultiplexing. In addition to the receiver side, the neural network can also be applied to the transmitter side to realize adaptive predistortion in RoF systems [95]. An FCNN has been used as a pre-distorter to compensate the nonlinearity in the RoF system before fiber transmission. Results have shown that the linearization of RoF system can be realized and better performance can be achieved.

Open Questions and Possible Future Directions: ML-Based Signal Processing in RoF Systems
Whilst the ML-based signal processing algorithms for RoF systems have attracted a lot of research attention during the past years and significantly improved transmission performances have been demonstrated, there are still a number of critical questions that need to be addressed in future studies. In this section, we discuss such questions and provide our perspectives.
One key question of ML-based approaches in RoF systems, especially the neural network-based methods, is related to the creation of training data. A notable scenario is when the pseudo-random bit sequences (PRBS) with relatively short pattern lengths are used in training [96]. In this case, there may exist a large artificial gain when the PRBS pattern is learnt by the neural network, whilst the actual system impairments are not effectively captured or compensated. Therefore, truly random numbers or long PRBS patterns need to be used in related studies and careful checks and validations are required when applying the neural network-based signal processors into RoF systems.
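The risk with short PRBS patterns can be made concrete with a small generator. The sketch below uses the PRBS7 polynomial x^7 + x^6 + 1 as an example of a short pattern; the choice of PRBS7 and the helper names are assumptions for illustration and are not taken from Reference [96].

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of PRBS7 (polynomial x^7 + x^6 + 1) with a 7-bit LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # taps at stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits

def period(bits):
    # Smallest shift p for which the sequence coincides with itself.
    for p in range(1, len(bits)):
        if bits[p:] == bits[:len(bits) - p]:
            return p
    return len(bits)

seq = prbs7(4 * 127)
```

The sequence repeats every 127 bits, and each bit is a fixed function of the preceding seven bits. A neural network whose input window spans several symbols can therefore learn to predict the next bit from the pattern itself, which produces a gain that is unrelated to the channel impairments.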
Another important problem that needs to be addressed in this rapidly developing field is the computation cost associated with the use of neural networks. Whilst neural networks have great capabilities in solving complex tasks, they also have a large number of learnable parameters that require a large amount of training data, which results in high computation cost. The implementations of neural network-based signal processors in previous studies have mainly been based on high-end computers with powerful CPUs or graphics processing units (GPUs). However, high-end CPUs and GPUs are usually costly and bulky, which prevents their use in practical applications. In addition, the training of the neural network signal processors may require a large number of epochs and hence, relatively long training time is needed even with high-end CPUs or GPUs. Furthermore, different ML application scenarios in RoF systems also have different computation cost requirements and tolerance. When the ML model is used in a centralized location, such as in the central office, the tolerance of high computation cost is relatively large and the relatively complicated ML algorithms with more advanced capabilities can be used. On the other hand, when the ML model is deployed towards the remote node side, such as in the small cell cases, the computation cost needs to be much lower (the hardware accelerators discussed later can be a promising solution in this case). Therefore, applying neural networks for real-time signal processing in RoF systems requires further study.
One possible solution proposed recently to reduce the computation cost of neural network-based signal processing in RoF systems is the pruning method, which can reduce the complexity of a neural network, especially the FCNN [97]. The principle of the pruning method is shown in Figure 13 and it is based on the fact that there are redundant connections in the FCNN. Therefore, by pruning the connections between neurons, the number of learnable parameters becomes smaller and the computation cost is reduced. The pruning process normally starts with removing the connections that have low weights and then retraining the new neural network. The pruning and retraining can be iteratively carried out to gradually optimize the neural network in the RoF system, whilst maintaining the impairment compensation capability and the system performance.
Figure 13. The pruning principle of FCNN to reduce the computation cost.
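One pruning step of the kind described above can be sketched as magnitude-based pruning followed by a masked retraining update. The pruning fraction, learning rate and layer size below are assumptions for illustration, and the random gradient merely stands in for a real backpropagated gradient; the details of the scheme in Reference [97] may differ.

```python
import numpy as np

def prune_by_magnitude(W, fraction):
    """Zero out the given fraction of weights with the smallest magnitudes.

    Returns the pruned weights and a 0/1 mask of the surviving connections.
    """
    n_prune = int(round(fraction * W.size))
    if n_prune == 0:
        return W.copy(), np.ones_like(W)
    order = np.argsort(np.abs(W), axis=None)    # flat indices, smallest first
    mask = np.ones(W.size)
    mask[order[:n_prune]] = 0.0
    mask = mask.reshape(W.shape)
    return W * mask, mask

def masked_update(W, grad, mask, lr=0.01):
    # Retraining step that keeps the pruned connections at zero.
    return (W - lr * grad) * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(20, 24))                   # e.g. a 20-input, 24-neuron hidden layer
W_pruned, mask = prune_by_magnitude(W, 0.5)
W_next = masked_update(W_pruned, rng.normal(size=W.shape), mask)
```

Repeating `prune_by_magnitude` and a number of `masked_update` steps corresponds to the iterative prune-and-retrain loop, with the pruning fraction raised gradually while the system performance is monitored.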
Another practical issue related with the usage of neural networks as RoF system signal processing units lies in that the neural network is normally trained for a specific system setting and channel condition, such as for a particular signal power level. Therefore, when the condition changes in the RoF system, the neural network signal processor needs to be retrained, which again requires a large number of training data and has high computation cost. This issue is particularly challenging in RoF systems, since the RoF system has the rapid-changing wireless channel that may frequently trigger the need for the retraining of neural network.
One promising solution to this problem is the transfer learning principle [98,99], which can retain the knowledge captured from a previous task and apply it to a new task. In the transfer learning scheme, the parameters of the previously trained neural network are reused as the starting point when the neural network needs to be retrained (e.g., because of a change in the wireless channel condition). Hence, the number of iterations needed in the retraining process can be reduced significantly and the size of the dataset for retraining can also be much smaller.
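The benefit of reusing trained parameters can be illustrated with a deliberately simplified example. The sketch below replaces the neural network with a linear model trained by gradient descent, and models the channel change as a small perturbation of the target weights; both are assumptions made to keep the example short, and they are not the setups of References [98,99].

```python
import numpy as np

def train(X, y, w_init, lr=0.1, tol=1e-6, max_iter=10_000):
    """Gradient descent on the mean squared error; returns weights and iterations used."""
    w = w_init.copy()
    for it in range(max_iter):
        err = X @ w - y
        if np.mean(err ** 2) < tol:
            return w, it
        w -= lr * 2.0 * X.T @ err / len(y)
    return w, max_iter

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
w_old = rng.normal(size=5)                        # hypothetical "old channel" response
w_new = w_old + 0.05 * rng.normal(size=5)         # slightly changed channel

w_src, _ = train(X, X @ w_old, np.zeros(5))       # initial training
_, cold_iters = train(X, X @ w_new, np.zeros(5))  # retraining from scratch
_, warm_iters = train(X, X @ w_new, w_src)        # retraining from the reused parameters
```

Because the reused parameters already lie close to the new optimum, `warm_iters` is smaller than `cold_iters`. The same reasoning motivates initializing a retrained neural network from its previous weights when the channel condition changes only moderately.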
Previous research on neural network-based signal processing in RoF systems has mainly focused on the use of FCNN, CNN and BCNN. In addition to these neural structures, another category of neural network is also promising, which is the recurrent neural network (RNN). The RNN has a recurrent structure and is known to be powerful in capturing information in sequential data. In RNN-based signal processors, the long short-term memory (LSTM) scheme is promising, as it can capture the impairments carried by both directly neighboring symbols and remote symbols. The LSTM based decoder has been demonstrated in optical communications [66], whilst the adaptation to RoF systems requires further investigation. The attention mechanism can be introduced to the LSTM based signal decoder as well [69], to further enhance the capability in mitigating the impairments that mainly affect distant received symbols. In addition to the RNN, other types of modified neural network structures are also worth more detailed study in analog RoF systems, such as the MIMO FCNN with multiple interconnected FCNNs [100] and the self-organizing feature map (SOFM) neural network that has been studied in the digitized RoF transmissions [101].
As discussed above, current neural network-based signal processing schemes proposed and demonstrated in RoF systems have been implemented using high-end CPUs or GPUs, which leads to high computational cost, large form factor and large power consumption. In addition, the processing latency due to the neural network signal processors has not been thoroughly studied either. Whilst the latency is not treated as a critical issue in other research domains where neural networks are being actively adopted, it is important in telecommunication applications. To solve these limitations, neural network hardware accelerators have been proposed. The neural network hardware accelerator can be based on application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs) [102]. The ASIC based neural network hardware accelerators have the advantages of low power consumption and high speed. However, since they are typically application specific, the flexibility across different applications is limited and the resulting cost is normally high. On the other hand, the FPGA based neural network hardware accelerators can be flexibly reconfigured for different target applications, whilst the low power consumption advantage is maintained. In addition, due to the parallel computation capability of FPGAs, realizing low processing latency is also possible. Therefore, the FPGA based neural network hardware accelerator can be a good option in the RoF system applications. Recently, the FPGA based CNN and BCNN hardware accelerators for RoF systems have been proposed and demonstrated [103]. Results have shown that compared with the GPU based implementation, the FPGA neural network hardware accelerator can reduce the power consumption and the processing latency significantly. The trade-off amongst the processing latency, the power consumption and the hardware resources required has also been studied. Whilst the results are promising, the processing latency is still in the range of tens to hundreds of microseconds, which is insufficient for applications with stringent real-time and latency requirements. In addition, only the testing part of the neural network signal processor has been implemented, whilst the more challenging training part needs further studies.
Finally, from the implementation point of view, the neural network based models may also face overfitting or underfitting issues, both of which affect the performance of ML models. The typical trend of overfitting is that the training and validation process achieves high accuracy, whilst the testing has low accuracy. On the other hand, in underfitting, all of training, validation and testing processes tend to obtain low accuracy. In each ML model, the hyperparameters need to be tuned to avoid either overfitting or underfitting.

ML Models for RoF Network Management and Resource Allocation
In Section 3, we have reviewed the recent studies on applying ML-based signal processing schemes in RoF systems. The main goal of these studies is to use the ML-based algorithms to suppress various physical impairments introduced during the RoF signal generation, modulation, transmission and detection. Both neural network- and non-neural network-based algorithms have been investigated. Because multiple impairments are processed simultaneously and nonlinearity can be handled, results have shown that better physical layer signal transmission performance can be achieved in RoF systems.
In addition to using the ML algorithms as the physical layer signal processors, various ML schemes have also been proposed and studied to manage the RoF networks and to allocate resources efficiently. In this section, we review the recent developments in this field, including the application of ML techniques for the RoF system and network performance monitoring, dynamic bandwidth allocation (DBA), fault detection and positioning and reinforcement learning based network management and resource allocation strategies, as summarized in Table 4. Different from the previous work that reviews ML techniques in general optical networks [12], which uses an application-oriented approach, here we adopt the learning-oriented method and categorize relevant studies in the RoF networks field based on the learning method used (i.e., supervised learning or reinforcement learning).

Supervised Learning in RoF Network Management and Resource Allocation
Two types of ML algorithms have been widely studied in RoF network management and resource allocation tasks, namely the supervised learning and the reinforcement learning. Several types of supervised learning algorithms have been applied in RoF networks, such as KNN, SVM and neural networks. In this section, we will focus on the supervised learning approaches in RoF networks. We will review the reinforcement learning based schemes in Section 5.2.
The optical performance monitoring (OPM) is important to adaptively control the network operation and signal transmission and hence, it is highly demanded in fiber-wireless converged networks. One key aspect of OPM is recognizing the signal modulation format and the neural network-based approach has been proposed and studied to satisfy this need [104]. An FCNN has been used and the asynchronous amplitude histograms of detected signals with four different data rates and modulation formats have been used as the input to the neural network. Results have shown that in the presence of impairments (e.g., noise, chromatic dispersion), the modulation format can be recognized with over 99.9% accuracy. The modulation format recognition in RoF networks has also been achieved using the autoencoder neural network scheme [105]. The autoencoder neural network is a special case of FCNN with two parts, as shown in Figure 14. The first part of the autoencoder neural network works as an encoder that compresses the input data into a smaller dimension and the second part works as a decoder that maps the data back to the original dimension. This working principle of the autoencoder neural network provides the capability of extracting features from the received raw data directly and effectively. In a 28 GHz RoF system with chromatic dispersion and amplified spontaneous emission (ASE) noise [105], results have shown that accurate modulation format recognition over six modulation formats can be achieved over an OSNR range of >10 dB.
In addition to the modulation format recognition, neural network schemes have also been widely studied in the converged fiber-wireless access networks, such as the radio over passive optical networks (PON), for efficient dynamic backhaul network resource allocation. In this type of network, dynamic resource allocations to BSs are needed for better robustness and efficiency and the neural network-based schemes have been investigated. In References [106,107], an intelligent agent has been deployed at the BS for monitoring and collecting the data from the environment. Then the future bandwidth requirements have been predicted to request resources in the access network. The supervised learning method is used, where the input data consists of the time information and the output data is the averaged bandwidth demand. Due to the capability of capturing the nonlinearity in the data traffic, the neural network-based approach has shown better performance compared with conventional methods.
The use of ML techniques in converged fiber-wireless networks to reduce the uplink latency and optimize the bandwidth allocation simultaneously has been proposed and studied as well, leading to the predictive DBA schemes [108][109][110][111][112][113][114]. These studies have focused on exploring various types of network features jointly using ML techniques to improve the DBA efficiency and to reduce the uplink latency. In general, the task involved has been considered as a classification or a regression problem. When considered as the classification task, the ML algorithm identifies the class that the current scenario belongs to and then the bandwidth resource is allocated in an optimized and predictive way to reduce the latency in the fiber-wireless networks. On the other hand, when the task is considered as the regression problem, the numeric value corresponding to the current condition is normally predicted, which is the optimum value of bandwidth needed to achieve the minimum latency.
[116]: High precision time synchronization. <100 ns synchronization accuracy achieved.
Q-learning algorithm [117]: Dynamic RF channel selection. Reward criteria: Minimizing SINR.
Q-learning algorithm [118]: Optimize the placement of DU and CU. Reward criteria: Capacity and latency.
Q-learning algorithm [119]: Optimize BBU placement and routing in C-RAN. Reward criteria: Bandwidth and latency.
SARSA learning algorithm [120]: Real-time interference avoidance. Reward criteria: Log-value of BER difference between different states.
Q-learning algorithm [121]: Minimize power consumption. Reward criteria: Network power consumption and transition power consumption.
AC learning algorithm [122]: Maximize the users' satisfaction in QoS. Reward criteria: Average data rate, head-of-line packet delay and average packet loss.
Q-learning algorithm [123]: Routing policy to maximize profit of infrastructure provider. Reward criteria: Revenue generated by a connectivity service.
Q-learning algorithm [124]: Slice admission strategy to maximize profit of infrastructure provider. Reward criteria: Loss induced by the slice request and the maximum potential revenue of a slice request.
Various ML algorithms have been investigated for the DBA and uplink latency reduction in fiber-wireless converged networks, such as the KNN [108], the SVM [110] and the neural network [111][112][113][114]. The basic principle of KNN has been discussed in Section 3.2 and this type of supervised learning method has been used as a classifier for DBA and reducing the uplink latency [108]. In the KNN scheme, the average of the nearest k instances in the past has been used for the resource allocation of the current instance. The SVM has also been applied for the DBA in fiber-wireless networks [110], where the SVM has been used as a traffic classifier with nonlinear hyperplanes to better handle the nonlinearity amongst data traffic.
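The KNN based allocation described above, where the k nearest past instances are averaged, can be sketched in a few lines. The feature choice (hour of day and number of active users), the Euclidean distance and the numbers are hypothetical and serve only to illustrate the principle; they are not taken from Reference [108].

```python
import numpy as np

def knn_predict_bandwidth(history_x, history_bw, query, k=3):
    """Predict the bandwidth demand as the mean over the k nearest past instances.

    history_x: (n, d) past feature vectors; history_bw: (n,) observed demands.
    """
    dist = np.linalg.norm(history_x - query, axis=1)  # Euclidean distance to each instance
    nearest = np.argsort(dist)[:k]
    return float(history_bw[nearest].mean())

# Hypothetical features: (hour of day, number of active users); demand in Mb/s.
history_x = np.array([[9, 10], [10, 12], [11, 11], [20, 40], [21, 42]], dtype=float)
history_bw = np.array([100.0, 120.0, 110.0, 400.0, 420.0])
grant = knn_predict_bandwidth(history_x, history_bw, np.array([10.0, 11.0]), k=3)
```

For the query above the three nearest instances are the morning ones, so the predicted demand is their mean. In practice the features would normally be scaled so that no single feature dominates the distance.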
Compared with the KNN and SVM based methods, recently the neural network-based schemes have attracted even more attention for RoF network management and resource allocation. Neural networks have the capability of capturing complicated information or knowledge carried by the input data and they have been proposed to predict the traffic and the corresponding bandwidth demand in cloud radio access network (C-RAN) RoF fronthaul networks [111][112][113][114]. In Reference [111], an FCNN has been used in a time-division-multiplexing passive optical network (TDM-PON) based C-RAN RoF fronthaul for traffic prediction and resource allocation. The FCNN is trained with optical network unit (ONU) reports collected previously during BBU processing cycle and then the bandwidth demand in the fronthaul is predicted for backhaul resource allocation. In addition to the FCNN, the RNN has also been utilized for traffic prediction and resource allocation in C-RAN fronthaul networks [112,113]. With the LSTM, 30 min predictions in advance have been achieved for the reconfiguration of the optical network and the network throughput has increased by 7% together with 18% reduction in the required processing resource [112]. The LSTM based RoF fronthaul traffic prediction and resource allocation has also been demonstrated in combination with edge computing to further reduce the computation resource requirements [113]. Furthermore, the capsule neural network (CapsNet) principle has been studied as well [114], where the CapsNet has been used to provide traffic classification with high accuracy via the parallel networks structure. Compared with LSTM and CNN based schemes, results have shown that higher traffic classification accuracy and lower latency can be achieved.
In addition to predicting the traffic and optimizing the bandwidth allocation accordingly, neural networks have also been applied to other network management aspects in fiber-wireless networks, such as fault detection and positioning [115]. In Reference [115], a discrete Hopfield neural network (DHNN) has been utilized for the rapid localization of multiple network failures in a network combining RoF transmission, RF wireless and optical wireless communications. The DHNN has optimized the analysis of faults and alarms for improved positioning accuracy and hence, better network performance and service quality have been achieved.
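To give a feel for how a discrete Hopfield network can map a noisy alarm pattern back to a stored signature, a minimal sketch is given below. It is a generic textbook DHNN with Hebbian weights and asynchronous updates, not the scheme of Reference [115]; the encoding of alarms as +1/-1 vectors, the two fault signatures and the function names are assumptions made for illustration.

```python
def train_hopfield(patterns):
    """Hebbian weight matrix for bipolar (+1/-1) patterns, with zero diagonal."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += p[i] * p[j]
    return weights

def recall(weights, state, max_sweeps=10):
    """Asynchronously update neurons until the state stops changing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            # Keep the current value when the local field is exactly zero.
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

# Hypothetical alarm signatures: +1 = alarm raised, -1 = alarm silent.
fault_a = [1, 1, 1, 1, -1, -1, -1, -1]
fault_b = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hopfield([fault_a, fault_b])

observed = list(fault_a)
observed[0] = -1  # one alarm is assumed to be lost before reaching the controller
located = recall(weights, observed)
```

Here the corrupted observation converges back to the stored signature `fault_a`, which is the associative-recall property that makes this type of network attractive for matching incomplete alarm sets to known failures.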

Reinforcement Learning in RoF Network Management and Resource Allocation
In addition to the models based on supervised learning, there also exist models that are based on the reinforcement learning framework for RoF networks. Reinforcement learning is an iterative learning framework. It assumes an agent, an environment that the agent can interact with and a certain task within the environment that we aim to train the agent to excel at. The goal of training is to allow the agent to learn a strategy (policy) for accomplishing or excelling at the task, which can be seen as a decision-making strategy. Given the "state" that the agent is in within the environment, the strategy provides the next action that the agent should perform in order to accomplish the task. The strategy learning process is shown in Figure 15. Each time the agent performs an action, the environment may change the state of the agent. In the meantime, the environment also provides feedback on the past action, a "reward" that usually indicates whether the agent is closer to accomplishing the target task. The agent can then use the reward to determine whether the past action was a good or bad move and update its own strategy for the target task. Note that a reward is not always available after each action, and this is perceived as one of the main challenges in developing reinforcement learning models. Compared with other types of learning, reinforcement learning is defined in an interactive environment and the learning process happens through the interactions between the agent and the environment. Hence, it is perceived as a way of achieving self-optimization/self-learning. With this advantage, the reinforcement learning approach has been proposed and studied for the network management and resource allocation in RoF networks [116–124].
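The agent-environment interaction loop described above can be sketched with tabular Q-learning on a toy channel-selection task. This is a minimal illustration under stated assumptions, not a model from any of the reviewed works: the three-channel environment, the hopping interferer, the +1/-1 reward standing in for an SINR-based reward, and all parameter values are hypothetical.

```python
import random

N_CHANNELS = 3  # toy number of RF channels (assumption)

def step(state, action):
    """Toy environment: `state` is the index of the channel currently interfered.

    The reward is a stand-in for an SINR-based reward: +1 for a clean channel,
    -1 for the interfered one. The interferer then hops to the next channel.
    """
    reward = -1.0 if action == state else 1.0
    next_state = (state + 1) % N_CHANNELS
    return next_state, reward

def train_q_learning(steps=5000, alpha=0.1, gamma=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_CHANNELS for _ in range(N_CHANNELS)]  # q[state][action]
    state = 0
    for _ in range(steps):
        # Epsilon-greedy: explore occasionally, otherwise follow the current policy.
        if rng.random() < epsilon:
            action = rng.randrange(N_CHANNELS)
        else:
            action = max(range(N_CHANNELS), key=lambda a: q[state][a])
        next_state, reward = step(state, action)
        # Q-learning update: bootstrap from the best action in the next state.
        target = reward + gamma * max(q[next_state])
        q[state][action] += alpha * (target - q[state][action])
        state = next_state
    return q

q_table = train_q_learning()
# Greedy policy: for each interfered channel, the channel the agent would select.
policy = [max(range(N_CHANNELS), key=lambda a: q_table[s][a]) for s in range(N_CHANNELS)]
```

After training, the greedy policy avoids the interfered channel in every state, which mirrors the self-optimization behaviour described in the text: the agent is never told which channel is clean, it only observes the reward.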
In Reference [116], a reinforcement learning algorithm has been applied to achieve ultra-high precision time synchronization in the cloud RoF network. The Q-learning algorithm has been used, where the input data includes the current link states, the features of the link and the link type. Q-learning not only uses the most recent result as the reward but also considers the past instances as the long-term reward. Results have shown that <100 ns synchronization accuracy can be achieved using the autonomous reinforcement learning algorithm. In Reference [117], a reinforcement learning scheme has been proposed for the dynamic channel selection in a cognitive RoF network, where the best RF channel has been selected amongst different frequency bands to minimize the interference and to optimize the network-wide performance. Q-learning has been used as well, and the signal-to-interference-plus-noise ratio (SINR) has been selected as the main criterion of the reward. Results have demonstrated the capability of the strategy in avoiding aggregated interference, reducing the network outage probability and increasing the throughput. In Reference [118], the reinforcement learning algorithm has been applied to optimize the placement of the distributed unit (DU) and the central unit (CU) in 5G and beyond-5G fiber-wireless networks to serve diverse services. In the reinforcement learning scheme, the combination of the service type, the residual capacity of links and the processing pools has been used as the state, the locations of the DU and CU and the optical path have been used as the action, and the reward has been designed to reflect the capacity and latency constraints. The Q-learning algorithm has been implemented with a neural network, which consists of two CNNs followed by two fully connected layers. Results have shown that both a large-scale service paradigm and bandwidth resource saving can be achieved.
In Reference [119], the BBU placement and routing in a C-RAN network have also been optimized using the reinforcement learning approach to improve the network resource allocation and to reduce the network latency. Reinforcement learning has also been applied in the mm-wave RoF system for real-time interference avoidance [120]. The log-value of the BER difference between different states has been used as the reward and the SARSA reinforcement learning algorithm has been used, which has been shown to be more effective than the Q-learning algorithm in real-time systems.
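For readers less familiar with the two algorithms, their standard textbook update rules are given below. The notation is an assumption introduced here and does not come from the reviewed works: $s_t$ and $a_t$ are the state and action at step $t$, $r_{t+1}$ is the reward received afterwards, $\alpha$ is the learning rate and $\gamma$ is the discount factor.

```latex
% Q-learning (off-policy): bootstrap from the best action in the next state
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]

% SARSA (on-policy): bootstrap from the action a_{t+1} actually taken next
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
```

The only difference is the bootstrap term: Q-learning evaluates the greedy action in the next state regardless of what the agent does, whereas SARSA evaluates the action that the current policy, including its exploration, actually selects.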
In addition to the synchronization, RF channel selection, BBU, DU and CU placement and real-time interference avoidance applications discussed above, reinforcement learning has also been widely applied to other network resource scheduling and routing optimization tasks. In Reference [121], the resource allocation strategy in C-RAN has been studied using reinforcement learning with the objective of minimizing the power consumption. In Reference [122], a reinforcement learning based scheduling framework has been proposed to maximize the users' satisfaction in terms of the QoS. The actor-critic (AC) reinforcement learning algorithm has been used to achieve higher learning stability. In Reference [123], a reinforcement learning based routing policy has been studied to maximize the profit of an infrastructure provider whilst satisfying the QoS constraints. In Reference [124], a slice admission strategy based on reinforcement learning has been proposed for services with different priorities in 5G flexible RAN networks to maximize the profit of the infrastructure provider. Results have shown that over 23% improvement can be achieved compared with benchmarking deterministic heuristics.

Open Questions and Possible Future Directions
Whilst a large number of ML-based studies have been conducted during the past few years in RoF networks to manage the network and allocate resources more efficiently, there are still a number of open questions that need to be addressed. In this section, we briefly discuss some open questions and possible solutions from our perspective. One key requirement of using ML schemes effectively is the availability of large datasets. Whilst this requirement is not a major issue when ML algorithms are used for the physical layer signal processing in RoF systems, the data acquisition for RoF network management is challenging when taking the overhead, time and energy costs into consideration. Obtaining real-world datasets is even more challenging when the target is related to network failures, as practical networks normally have conservative designs to minimize the possibility of network faults. To address the dataset availability challenge, one possible solution is the transfer learning principle discussed in Section 4 [98,99]. This approach can be useful if historical datasets are available and the network evolves relatively slowly over time. In this case, ML models can be gradually updated with much less training data. Another possible solution is incorporating domain knowledge into the ML models. In this way, the ML algorithms can be trained based on expert knowledge that is already available and the size of the datasets required can be significantly reduced. However, how to combine the domain knowledge effectively into ML models remains a challenge that needs further innovative studies.
Another open question that may restrict the application of current ML based algorithms in practical RoF networks is the theoretical understanding of such schemes. Currently, there are a large number of hyperparameters in the ML-based approaches that need to be manually selected and tuned. This is especially the case for the neural network-based schemes, which have a large number of hyperparameters, such as the dimensions of the neural network and the regularization terms. There is no theory or set of rules that can definitively guide the selection of these hyperparameters. At present, a common practice is to tune the hyperparameters using grid search, starting from a set of parameters suggested by previous experience or related tasks. A rule of thumb is that the performance of an ML model normally varies little when the hyperparameters are close to the optimum and, conversely, changes significantly when the hyperparameters are far from the optimum. Thus, the selection process is usually conducted by first finding a close-to-optimal region quickly and then running a grid search within that region. In this process, the overhead caused by hyperparameter tuning depends on whether a good starting point is selected based on previous experience. The actual cost, however, is difficult to quantify since it is highly dependent on the model selected and the complexity of the task. Therefore, a deep theoretical understanding of these ML algorithms is needed so that they can be effectively applied in practical RoF network management and resource allocation applications.
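The two-stage tuning practice described above (a coarse search to find a promising region, followed by a finer grid search inside it) can be sketched as follows. This is an illustrative sketch only: the hyperparameter names `l2` and `width`, the grids, the helper functions and the `toy_score` function, which stands in for a real validation score, are all assumptions.

```python
from itertools import product

def grid_search(score, grids):
    """Return (best_params, best_score) over the Cartesian product of `grids`."""
    names = list(grids)
    best_params, best_score = None, float("-inf")
    for values in product(*(grids[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score

def refine(center, step, points=5):
    """Build a finer grid of `points` values centred on `center`."""
    half = points // 2
    return [center + (i - half) * step for i in range(points)]

def toy_score(params):
    # Hypothetical validation score that peaks at l2 = 0.03 and width = 48.
    return -((params["l2"] - 0.03) ** 2) * 1e4 - ((params["width"] - 48) ** 2) * 1e-2

# Stage 1: coarse grid suggested by previous experience (assumed values).
coarse = {"l2": [0.001, 0.01, 0.1], "width": [16, 64, 256]}
coarse_best, coarse_score = grid_search(toy_score, coarse)

# Stage 2: finer grid around the best coarse point.
fine = {
    "l2": refine(coarse_best["l2"], 0.004),
    "width": refine(coarse_best["width"], 8),
}
fine_best, fine_score = grid_search(toy_score, fine)
```

Because the fine grid always contains the best coarse point, the second stage can never return a worse score than the first. The number of evaluations, however, grows multiplicatively with each added hyperparameter, which is the tuning overhead referred to in the text.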
In addition, whilst the ML-based algorithms may achieve improvements over the conventional network management schemes, the additional cost involved also needs to be carefully considered. One additional cost is the large training dataset discussed above: without a large training dataset, the performance of ML-based schemes may be worse than that of conventional schemes. Another is the amount of computation required in ML-based schemes, which may lead to additional deployment cost (e.g., computation hardware cost) and power consumption. Therefore, the trade-offs need to be thoroughly investigated to better demonstrate the advantages of applying ML-based algorithms in RoF networks.
One more open question that needs to be addressed is the design of ML-based algorithms for heterogeneous fiber-wireless backhaul and fronthaul networks. It has been widely envisioned that future wireless networks will incorporate various backhaul and fronthaul technologies, such as conventional RF, optical fiber, mm-wave and even terahertz waves and optical wireless signals [125,126]. Each technology has its unique requirements. Therefore, the network management and resource allocation become more challenging, and ML schemes that can provide high efficiency in this type of heterogeneous network are needed.

Conclusions
With the recent rapid development and advancement of ML technology, the application of ML-based algorithms to improve the performance of optical communication systems and networks has been widely studied. Compared with other types of optical communication systems, RoF systems normally suffer from more severe impairments, especially the nonlinear impairments during signal modulation, transmission and detection. Hence, due to their capability of suppressing nonlinearity, ML-based schemes have attracted intensive interest during the past few years for suppressing physical layer impairments in RoF systems. In this paper, we have reviewed the recent studies of various types of ML schemes in this field, including the k-means, KNN, SVM, FCM-GK and neural network-based algorithms. In these studies, the supervised or unsupervised ML algorithms have been mainly used as the equalizer, the decoder or the demultiplexer, and better RoF system performance has been achieved.
In addition to the physical layer, ML techniques have also been widely studied in the network layer for efficient RoF network management and resource allocation. Hence, in this paper, we have also reviewed ML schemes for the network layer, where both supervised learning and reinforcement learning principles have been investigated. With supervised learning, the ML algorithms have been studied for modulation format recognition, traffic classification and prediction for DBA and latency reduction, and network fault detection and positioning. Reinforcement learning, which is capable of learning a policy, has also been studied for automatic network self-optimization. In RoF networks, various reinforcement learning algorithms, such as the Q-learning, AC learning and SARSA learning algorithms, have been proposed and demonstrated. It has been shown that channel selection, interference avoidance, power consumption minimization, network synchronization, QoS satisfaction maximization and infrastructure provider profit optimization strategies can be successfully developed with reinforcement learning principles.
Together with the recent developments, in this paper we have also discussed some key open questions of applying ML techniques in RoF systems and networks that still need to be addressed in future studies, such as the dataset requirement, the training process and hyperparameter selection, and the computation cost issues. We have discussed a few possible solutions to these challenges that, from our perspective, need to be further studied, such as neural network hardware accelerators, transfer learning schemes and the incorporation of domain knowledge into ML algorithms. In spite of these challenges, given the capability of ML schemes, in our opinion they will be an essential part of future fiber-wireless integrated systems and networks.