1. Introduction
Two-dimensional (2D) transition metal dichalcogenides, and in particular monolayer MoS2, offer a direct band gap, atomically sharp interfaces, and excellent electrostatic control at the ultimate thickness limit. This makes them attractive candidates to extend device scaling beyond silicon [1,2,3,4]. Chemical vapor deposition (CVD) enables wafer-scale synthesis of monolayer MoS2 and its integration into circuits. Alternative synthesis routes include molecular beam epitaxy [5,6], atomic layer deposition [7,8], sulfurization of Mo layers [9], reactive sputtering [10], pulsed laser deposition [11,12], Langmuir–Blodgett coating [13], and printing [14]. CVD films, however, often contain grain boundaries, point defects, and substrate-induced disorder. These alter transport and introduce pronounced hysteresis in the field-effect characteristics [4]. Recent work has shown that growth recipes, nucleation promoters, and substrate choice strongly influence the crystalline quality and defect landscape of CVD MoS2 [15]. The consequences for non-conventional computing schemes, however, remain only partially quantified.
In conventional MOSFET logic, threshold-voltage drift and hysteresis are caused by charge trapping at the semiconductor–oxide interface, by oxide traps, and by adsorbates. They are treated as reliability problems that degrade noise margins and bias stability [16]. For CVD MoS2, several studies attribute the dominant contribution to traps at grain boundaries and at the semiconductor–oxide interface. This is consistent with the intrinsic-oxide-trap picture developed for few-layer and thin-film devices [16,17]. The same history dependence that limits digital performance can, however, be harnessed in memristive operation. Gate-tunable MoS2 memtransistors exploit hysteresis to implement non-volatile states, analog weight updates, and rate-dependent responses that are useful for neuromorphic computing [18]. In previous work, we showed that hysteresis in CVD MoS2 memristors can be quantified via an energy-normalized hysteresis area. We also showed that energy-efficient operation windows emerge from an interplay of trap dynamics, bias range, and device geometry [18].
Neuromorphic computing aims to process information using distributed nonlinear dynamics and local memory rather than explicit instruction sequences. Reservoir computing (RC) is a particularly hardware-friendly framework. In RC, a nonlinear dynamical system (the reservoir) is driven by an input signal, and only a linear readout is trained. The reservoir may be implemented in software using recurrent neural networks, or physically using optical, electronic, or mechanical systems with suitable nonlinearity and fading memory. For chaotic time-series prediction, such as the Lorenz-63 system, RC can reconstruct attractors and achieve long prediction horizons. This requires that the reservoir dynamics are matched to the task timescales and remain stable but sufficiently rich [19,20]. Recent work shows that even small reservoirs can reconstruct chaotic attractors reliably and that topology and spectral properties strongly influence long-term prediction performance [21,22].
Physical reservoirs based on memristive devices promise compact and low-power implementations [19,23]. Numerous material systems have been explored for chaotic prediction or classification tasks. These include oxide-based memristors [24,25,26,27,28,29], phase-change devices [30,31], organic synaptic transistors [32,33], and dynamic reservoirs based on further material systems [34,35]. At the same time, 2D-material-based memtransistors are emerging as attractive candidates owing to their scalable growth, tunable interfaces, and compatibility with back-end-of-line thermal budgets [23]. Yet most demonstrations of MoS2-based neuromorphic devices either rely on exfoliated or transferred flakes [36] or focus on synaptic plasticity and classification tasks. The latter lack a quantitative link between growth-induced defects, device hysteresis metrics, and system-level RC performance [37].
Device-focused studies on MoS2 typically report field-effect mobility, subthreshold slope, ON/OFF ratio, and a qualitative measure of hysteresis. They rarely specify the temporal response, fading-memory time constants, or stable operating windows needed to embed these devices into concrete neuromorphic or RC architectures. Conversely, system-level neuromorphic and machine-learning works usually prescribe abstract requirements such as nonlinearity, state dimensionality, and memory depth. They do not translate these into constraints on current densities, gate-voltage ranges, or switching speeds that a fabrication process can realistically deliver. Chaotic systems are most often treated in dimensionless units, which leaves open how to map state variables to physical voltages and currents for a given device technology. As a result, seemingly simple questions are difficult to answer. For example: is a given memtransistor, grown under a particular CVD recipe, suitable as a reservoir node for a specified task, and under which bias conditions?
In this work, we address part of this gap by linking CVD growth, device-level hysteresis, and reservoir computing performance in a single, experimentally realized platform. We grow monolayer MoS2 by atmospheric-pressure CVD on SiO2/Si and fabricate back-gated memtransistors. We then use a single device as the nonlinear element of a time-multiplexed reservoir that performs one-step prediction of the Lorenz-63 x-component.

We first characterize the crystalline quality of the CVD MoS2 using atomic force microscopy (AFM), Raman spectroscopy, and photoluminescence (PL) spectroscopy, confirming monolayer thickness. We then extract key device figures of merit from the transfer and output characteristics. Finally, we implement a Lorenz-63 RC pipeline. In this pipeline, the memtransistor is driven by a masked, time-multiplexed input sequence, and a linear readout is trained by ridge regression to predict the next time step. By systematically varying the gate-voltage window, input scaling, and number of virtual nodes, we map the normalized root-mean-square error (NRMSE) of Lorenz-63 prediction to experimentally accessible device parameters.
We find that intermediate hysteresis and moderate drift, corresponding to a specific range of gate biases and input amplitudes, maximize state richness. They also yield the lowest NRMSE for one-step prediction of the Lorenz-63 x-component. Lower hysteresis reduces nonlinearity, while excessive hysteresis or strong drift degrades memory and leads to unstable trajectories.

These results provide a concrete example of how CVD-grown monolayer MoS2 memtransistors can be engineered and biased to serve as physical reservoir nodes for chaotic time-series prediction. The transistor figures of merit are only moderate (field-effect mobility and ON/OFF ratio, Section 3.2). Nevertheless, trap-mediated fading memory on 1–4 s time scales enables Lorenz-63 one-step prediction under time-multiplexed operation with a lower NRMSE than the non-multiplexed baseline. The platform relies on atmospheric-pressure CVD and a standard solid-state gate stack rather than transferred flakes or electrolyte gating. It is therefore compatible with wafer-scale replication and array-style space multiplexing, which is essential for scalable and cost-effective neuromorphic hardware.
2. Materials and Methods
2.1. CVD Growth of Monolayer MoS2
Monolayer MoS2 was grown by chemical vapor deposition (CVD) in a horizontal two-zone quartz-tube reactor, as shown in Figure 1a. The reactor was operated at atmospheric pressure with nitrogen (N2) as the carrier gas. Degenerately doped Si substrates with a 90 nm thermally grown SiO2 layer served as both the growth substrate and the global back gate in the final devices.
Prior to growth, the substrates underwent a standard solvent cleaning (acetone, isopropanol, isopropanol in an ultrasonic bath, and deionized water), followed by drying with N2. The substrates were then treated in an O2 plasma at 300 W for 10 min to remove residual organic contamination and activate the surface. Immediately after the plasma step, an aqueous KCl solution (0.01 M) was dispensed onto the substrates and spread by spin coating at 4000 rpm. The coated substrates were subsequently dried on a hot plate at 80 °C for 1 min. This KCl layer acts as a seeding promoter for MoS2 domain nucleation on SiO2.
Solid sulfur (S) and molybdenum trioxide (MoO3) powders were used as precursors. A total of 400 mg of S powder was loaded into a graphite crucible placed upstream in the low-temperature zone of the reactor. Meanwhile, 4 mg of MoO3 powder and the KCl-treated SiO2 (90 nm)/Si substrates were placed side by side in the central hot zone of the furnace, so the MoO3 source and the substrates experienced the same nominal temperature profile. The sulfur source was heated indirectly by an external halogen lamp focused on the graphite crucible. It was maintained at approximately 140 °C during growth, providing a stable S vapor flux into the reaction zone.
Before deposition, the reactor was purged with N2 at 1000 sccm for 1 h at room temperature to remove residual air and moisture. For the growth step, the N2 flow was reduced to 500 sccm and kept constant throughout the temperature ramp, hold, and cool-down. The measured temperature profile during CVD is shown in Figure 1b. Starting from room temperature, the furnace was ramped to 800 °C in 18 min. The temperature was then held at 800 °C for 2 min to allow MoS2 nucleation and lateral domain growth on the SiO2 surface. At the end of the hold step, the furnace heating was switched off, and the system cooled naturally to room temperature under continuous N2 flow. Under these conditions, the growth yields predominantly monolayer MoS2 on SiO2, as confirmed by the structural and optical characterization in Section 3.1.
2.2. Memtransistor Fabrication
MoS2 memtransistors were fabricated on the CVD-grown monolayer MoS2 on SiO2 (90 nm)/Si substrates. The process used optical lithography and reactive ion etching (RIE), followed by metal contact definition. All photolithography steps were performed on a maskless aligner MLA150 (Heidelberg Instruments Mikrotechnik GmbH, Heidelberg, Germany).
In the first lithography step, the as-grown MoS2 film was patterned into isolated channel regions. A positive-tone photoresist (AZ 1518) was spin coated onto the substrates and exposed with the MLA150 to define the areas to be preserved. After development, the exposed MoS2 was etched by RIE, using the patterned AZ 1518 as an etch mask. Immediately after the etch, an O2 descum step was carried out in the same chamber. To minimize damage to the remaining MoS2 during resist stripping, the polymerized AZ 1518 was removed in a sequential wet process:
1. immersion in acetone for 10 min;
2. AZ 100 remover at 60 °C for 10 min;
3. a second AZ 100 remover step for 5 min at room temperature.
Finally, the substrates were rinsed in isopropanol and deionized water and dried with nitrogen.
Source and drain contacts were defined in a second photolithography step. A negative-tone photoresist (AZ LNR-003) was spin coated and patterned to open the contact pads and channel regions in the resist. Gold contacts with a thickness of 100 nm were then deposited by electron-beam evaporation (CS400ES, VON ARDENNE GmbH, Dresden, Germany), without additional adhesion or capping layers. After metallization, lift-off was carried out using the PS 3121 photoresist stripper (Intelligent Fluids GmbH, Leipzig, Germany). The stripper selectively removed the remaining AZ LNR-003 together with the overlying metal, leaving well-defined Au source and drain electrodes contacting the MoS2 channels. Channel geometries and additional process parameters are given together with the corresponding device statistics and in the section “Geometrical Definition and Tolerances”.
The device architecture and fabricated layout are illustrated in Appendix D by a schematic cross-section and representative SEM micrographs of the patterned memtransistor structures.
Geometrical Definition and Tolerances
All memtransistors in this work were defined by the same mask layout (no intentional geometry variation) with nominal channel length $L$ and width $W$. Because the channel is lithographically defined, the dominant geometrical uncertainties arise from three sources: mask-to-mask alignment accuracy, resist development bias, and contact-edge definition. We based our estimate on the lithography tool specifications and on optical inspection of representative structures across the chip. For the present device set, the effective dimensional tolerances on $L$ and $W$ are sub-micrometer deviations around the nominal values. The source/drain electrodes were designed with a fixed overlap to the channel region to ensure robust contacting. The back-gate oxide thickness was fixed at 90 nm by the starting substrate.
2.3. Structural and Electrical Characterization of Memtransistors
The as-grown MoS2 films were first examined by atomic force microscopy (AFM) to confirm the surface morphology and layer thickness. AFM measurements were performed on a Dimension V system (Veeco Instruments Inc., Plainview, NY, USA) operated in tapping mode. Height and phase images were recorded over scan areas in the 25 μm range on multiple regions of each sample. The apparent step height between MoS2 domains and the surrounding SiO2 surface was used to verify monolayer thickness. The absence of significant multilayer islands within the device channels was checked by combining height and phase contrast.
Raman and photoluminescence (PL) spectroscopy were used to further confirm the monolayer character and crystalline quality of the MoS2 films. Spectra were acquired with an Alpha300 apyron Raman imaging microscope (WITEC GmbH, Kroppach, Germany) using a 532 nm excitation laser and an excitation power of 0.5 mW at the sample. Raman and PL maps were collected across representative device areas to assess spatial uniformity. Monolayer regions were identified from the relative positions and intensities of the characteristic Raman modes, together with the strong direct-gap PL response.
Electrical measurements of the memtransistors were carried out in a shielded probe station connected to a Keithley 4200-SCS Semiconductor Characterization System. All measurements were performed at room temperature under a dry nitrogen purge. The purge was maintained during the transfer-curve, fading-memory, and time-series experiments to minimize adsorption-related drift and hysteresis caused by ambient humidity and adsorbates. Transfer characteristics ($I_D$–$V_{GS}$) and output characteristics ($I_D$–$V_{DS}$) were recorded for multiple devices. Unless stated otherwise, the drain voltage $V_{DS}$ was held at a fixed value in the millivolt range for transfer measurements. The back-gate voltage was swept quasi-statically over the range relevant for reservoir computing operation.

The accuracy of the applied electrical biases is set by the source-measure units of the Keithley 4200-SCS. The resulting uncertainty in $V_{GS}$ and $V_{DS}$ is negligible compared with the voltage windows used throughout this work. The drain-current readout noise floor is far below the current levels relevant for the reported transfer curves and hysteresis metrics. The dominant experimental uncertainties in extracted quantities therefore arise from sweep discretization and numerical post-processing:
- the hysteresis area $H$ is limited by the $V_{GS}$ step size and by current-noise propagation through the numerical integration;
- the field-effect mobility $\mu_{FE}$ is limited by numerical differentiation and smoothing choices.

Figures of merit were extracted from the $I_D$–$V_{GS}$ curves at the fixed transfer-measurement drain bias. The field-effect mobility $\mu_{FE}$ was estimated in the linear regime using the channel geometry and the gate-oxide capacitance per unit area $C_{ox}$. The threshold voltage $V_{th}$ was obtained by linear extrapolation of the $I_D$–$V_{GS}$ characteristics in the above-threshold region. The ON/OFF current ratio was defined as the ratio of $I_D$ at the chosen ON- and OFF-state gate voltages within the accessible bias window.
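The sketch below illustrates the extraction of the mobility, threshold voltage, and ON/OFF ratio described above. It is a minimal illustration under stated assumptions, not the analysis code used in this work. The helper names (`linear_mobility`, `on_off_ratio`), the fit window, and the synthetic device parameters are all hypothetical. The relative permittivity of 3.9 for thermal SiO2 is also an assumption.

```python
import numpy as np

EPS0 = 8.854e-12  # vacuum permittivity (F/m)


def linear_mobility(v_gs, i_d, v_ds, length, width, c_ox, fit_window):
    """Linear-regime mobility and extrapolated threshold voltage (hypothetical helper).

    A straight line I_D = slope * V_GS + intercept is fitted inside fit_window
    (above threshold); mu_FE = slope * L / (W * C_ox * V_DS), and V_th is the
    gate voltage at which the fitted line crosses I_D = 0.
    """
    v_gs = np.asarray(v_gs, dtype=float)
    i_d = np.asarray(i_d, dtype=float)
    sel = (v_gs >= fit_window[0]) & (v_gs <= fit_window[1])
    slope, intercept = np.polyfit(v_gs[sel], i_d[sel], 1)
    mu_si = slope * length / (width * c_ox * v_ds)  # m^2/(V s)
    v_th = -intercept / slope
    return mu_si * 1e4, v_th  # mobility in cm^2/(V s), threshold in V


def on_off_ratio(v_gs, i_d, v_on, v_off):
    """|I_D(V_on)| / |I_D(V_off)|, linearly interpolated on an ascending V_GS grid."""
    i_abs = np.abs(np.asarray(i_d, dtype=float))
    return float(np.interp(v_on, v_gs, i_abs) / np.interp(v_off, v_gs, i_abs))


# Synthetic forward branch with hypothetical values: 1 cm^2/(V s), V_th = 5 V.
c_ox = 3.9 * EPS0 / 90e-9                # 90 nm SiO2, eps_r = 3.9 assumed
L_ch, W_ch, v_ds = 10e-6, 20e-6, 0.1     # hypothetical geometry (m) and drain bias (V)
v = np.linspace(-10.0, 30.0, 201)        # 0.2 V grid
i = np.where(v > 5.0, 1e-4 * (W_ch / L_ch) * c_ox * (v - 5.0) * v_ds, 1e-12)

mu_est, vth_est = linear_mobility(v, i, v_ds, L_ch, W_ch, c_ox, fit_window=(10.0, 30.0))
ratio = on_off_ratio(v, i, v_on=30.0, v_off=-10.0)
```

The fit window matters in practice. Including points near threshold biases both the slope and the intercept, which is one reason the text treats $\mu_{FE}$ as sensitive to smoothing and differentiation choices.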
2.4. Fading Memory: Definition and Experimental Extraction
Fading memory is the property that the influence of past inputs on the present device state decays with increasing delay. For a driven memtransistor used as a physical reservoir, this means that two input histories that differ only in the distant past produce reservoir states that converge in time. Here, we quantify fading memory directly from electrical measurements by estimating a characteristic decay time, denoted as the fading-memory time constant .
To quantify the intrinsic relaxation dynamics under the operating conditions used for reservoir computing, we estimated characteristic time constants from a drain-current decay trace. The trace was recorded at fixed $V_{GS}$ and $V_{DS}$. The measured $I_D(t)$ decreases gradually over tens of seconds. This is consistent with bias-stress relaxation commonly associated with charge trapping/detrapping in the dielectric and at the MoS2/SiO2 interface. Because a single exponential does not describe the decay well over the full acquisition window, we fit the data with the bi-exponential model of Equation (1):

$$I_D(t) = A_1\, e^{-t/\tau_1} + A_2\, e^{-t/\tau_2} + I_\infty, \qquad (1)$$

where $A_1$ and $A_2$ are amplitudes, $\tau_1$ and $\tau_2$ are relaxation time constants, and $I_\infty$ is the long-time offset current. The fit yields a fast component ($\tau_1$, seconds) and a slow component ($\tau_2$, tens of seconds). We use these time constants as an experimental proxy for the time scales that bound the usable fading memory in time-multiplexed operation. The full set of fitted parameters and uncertainties is provided in Appendix B.
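A least-squares fit of the bi-exponential model $I_D(t) = A_1 e^{-t/\tau_1} + A_2 e^{-t/\tau_2} + I_\infty$ can be sketched with SciPy's `curve_fit`. The sketch makes two choices of its own. It fits in normalized current units for numerical robustness, and it relabels the components afterwards so that $\tau_1 < \tau_2$. The synthetic trace, the starting values, and the helper names are hypothetical. They are not the parameters reported in Appendix B.

```python
import numpy as np
from scipy.optimize import curve_fit


def biexp(t, a1, tau1, a2, tau2, i_inf):
    """Bi-exponential relaxation model: two decaying terms plus a constant offset."""
    return a1 * np.exp(-t / tau1) + a2 * np.exp(-t / tau2) + i_inf


def fit_biexp(t, i_d, p0):
    """Fit the model; return (params, 1-sigma errors) ordered so that tau1 < tau2."""
    t = np.asarray(t, dtype=float)
    i_d = np.asarray(i_d, dtype=float)
    scale = np.max(np.abs(i_d))                         # work in normalized current units
    p0_s = [p0[0] / scale, p0[1], p0[2] / scale, p0[3], p0[4] / scale]
    lower = [-np.inf, 1e-6, -np.inf, 1e-6, -np.inf]     # keep both time constants positive
    popt, pcov = curve_fit(biexp, t, i_d / scale, p0=p0_s, bounds=(lower, np.inf))
    perr = np.sqrt(np.diag(pcov))
    unscale = np.array([scale, 1.0, scale, 1.0, scale])  # amplitudes back to amperes
    popt, perr = popt * unscale, perr * unscale
    if popt[1] > popt[3]:                               # label the faster component as (a1, tau1)
        order = [2, 3, 0, 1, 4]
        popt, perr = popt[order], perr[order]
    return popt, perr


# Synthetic, noiseless decay trace with hypothetical values (A and s).
t = np.linspace(0.0, 60.0, 601)
i_meas = biexp(t, 2e-8, 2.0, 3e-8, 20.0, 1e-7)
params, errs = fit_biexp(t, i_meas, p0=(1e-8, 1.0, 1e-8, 10.0, 0.9e-7))
```

On measured data, the two time constants can be strongly correlated. The diagonal uncertainties returned here should therefore be read together with the full covariance matrix.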
2.5. Reservoir Computing Protocol with Memtransistors
We implemented reservoir computing using single MoS2 memtransistors driven by a scalar chaotic input derived from the Lorenz-63 system. The Lorenz dynamics [38] are defined by

$$\dot{x} = \sigma\,(y - x), \qquad \dot{y} = x\,(\rho - z) - y, \qquad \dot{z} = x y - \beta z,$$

with parameters $\sigma$, $\rho$, and $\beta$. The resulting attractor is shown in Figure 2a. The equations were integrated numerically with the Python ODE solver odeint, using a time step of 0.001 s, to generate a long trajectory on the attractor. After the initial transient was discarded, the $x$ component (denoted X in Figure 2b) was sampled at a uniform time step to form a one-dimensional discrete-time sequence $x_n$, which served as the input signal. The value of this time step was varied between experiments. The integration and sampling parameters were chosen such that the dataset covers many visits to both lobes of the attractor. The full time series and scripts are provided in [39,40].
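For reference, generating and subsampling the Lorenz-X input can be sketched with `scipy.integrate.odeint`. The sketch assumes the classic chaotic parameter set $\sigma = 10$, $\rho = 28$, $\beta = 8/3$ and the initial condition (1, 1, 1). These values, the transient length, and the subsampling stride are illustrative assumptions, not the settings of the published scripts [39,40].

```python
import numpy as np
from scipy.integrate import odeint


def lorenz63(state, t, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 equations (classic parameters assumed)."""
    x, y, z = state
    return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]


def lorenz_x_series(n_samples, stride, dt=0.001, n_transient=10_000, x0=(1.0, 1.0, 1.0)):
    """Integrate on a fine grid, drop the transient, keep every `stride`-th x value."""
    n_steps = n_transient + n_samples * stride
    t = np.arange(n_steps) * dt
    traj = odeint(lorenz63, x0, t)
    x = traj[n_transient:, 0]          # x component after the transient
    return x[::stride][:n_samples]     # uniform subsampling -> discrete-time input


x_seq = lorenz_x_series(n_samples=2000, stride=20)  # effective sampling step 20 * dt
```

Changing `stride` changes the effective sampling interval of the input sequence. This is the quantity the text describes as varied between experiments.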
The raw Lorenz-X values do not match the gate-voltage range required for safe device operation. We therefore applied an affine normalization to map $x_n$ into a chosen gate window. First, the samples were linearly rescaled to the unit interval,

$$\tilde{x}_n = \frac{x_n - x_{\min}}{x_{\max} - x_{\min}},$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the training portion of the sequence. The normalized values were then mapped to the gate range $[V_{\min}, V_{\max}]$ as

$$V_{GS,n} = V_{\min} + \left(V_{\max} - V_{\min}\right)\tilde{x}_n .$$

The specific voltage windows used in the non-time-multiplexed and time-multiplexed regimes (A–H) are summarized in Table A1. Within each regime, the drain-to-source voltage $V_{DS}$ and the sampling interval $\Delta t$ were kept constant during measurements. The back gate was driven by piecewise-constant pulses corresponding to the sequence $V_{GS,n}$.
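The two-step affine mapping from Lorenz-X samples to gate voltages can be written compactly as shown below. The minimum and maximum are taken from the training portion only, so test samples can fall slightly outside the gate window. The optional clipping to the window is our own hypothetical safeguard and is not described in the text. The function name is likewise illustrative.

```python
import numpy as np


def map_to_gate_window(x, n_train, v_min, v_max, clip=False):
    """Affine map of an input sequence to [v_min, v_max] using training-set extrema."""
    x = np.asarray(x, dtype=float)
    x_lo, x_hi = x[:n_train].min(), x[:n_train].max()
    u = (x - x_lo) / (x_hi - x_lo)          # unit-interval rescaling
    v_gs = v_min + (v_max - v_min) * u      # gate-voltage sequence
    if clip:                                # hypothetical device-safety clamp (not in the text)
        v_gs = np.clip(v_gs, v_min, v_max)
    return v_gs


v_seq = map_to_gate_window([-10.0, 0.0, 10.0, 20.0], n_train=3, v_min=-5.0, v_max=15.0)
v_safe = map_to_gate_window([-10.0, 0.0, 10.0, 20.0], n_train=3, v_min=-5.0, v_max=15.0, clip=True)
```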
Non-time-multiplexed (non-TM) operation was realized by directly sampling the memtransistor drain current at the end of each gate pulse. For a given regime, the Lorenz-X sequence was applied as a train of gate steps with dwell time $\Delta t$ at fixed $V_{DS}$. The stabilized current at the end of each dwell interval was recorded as the scalar reservoir state. We investigated four such configurations (A–D), which differ in $V_{DS}$, $\Delta t$, and the chosen gate window. These configurations correspond to different trade-offs between subthreshold sensitivity, channel conduction, and hysteresis. The resulting current sequence was then aligned with the input sequence to form state–target pairs for training and testing.
To increase the effective state dimensionality, we also implemented time-multiplexed (TM) operation with binary masks. In this case, the input sequence was first normalized to a symmetric range around zero, yielding a scalar $s_n$ at a macro-step cadence indexed by $n$. Each macro-step was then expanded into $N$ virtual nodes by driving the gate with the masked waveform

$$V_{GS}(n,k) = V_c + A\, m_k\, s_n, \qquad k = 1, \dots, N,$$

where $V_c$ is the center of a narrow, device-safe gate window and $A$ is the modulation amplitude. The mask $m_k \in \{-1, +1\}$ is a Rademacher mask drawn once with a fixed pseudo-random seed. Within each macro-step of duration $T_m$, the gate was held at each level $V_{GS}(n,k)$ for a dwell time $\delta t$, and the corresponding drain current $I_D(n,k)$ was sampled at the end of the dwell. This produced $N$ virtual nodes per macro-step. In the TM configurations (E–H), the device state for node $k$ was defined as

$$r_{n,k} = \log_{10}\!\left(\left|I_D(n,k)\right| + \epsilon\right),$$

where a small current offset $\epsilon$ (in A) is added to suppress numerical issues near the noise floor. In some regimes, a one-window lag was included by concatenating the present and previous window states, as detailed in Table A1.
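The masked gate waveform and the logarithmic node state can be sketched as follows. The helper names, the seed, and the offset `eps = 1e-12` A are hypothetical placeholders; the offset actually used is not reproduced here. The mask is drawn with NumPy's `default_rng`, which need not be the generator used in the experiments.

```python
import numpy as np


def rademacher_mask(n_nodes, seed=0):
    """Fixed binary mask m_k in {-1, +1}, drawn once from a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=n_nodes)


def masked_gate_waveform(s, mask, v_center, amplitude):
    """V_GS(n, k) = V_c + A * m_k * s_n, flattened in the order it is applied."""
    s = np.asarray(s, dtype=float)
    return (v_center + amplitude * np.outer(s, mask)).ravel()


def node_states(i_d, n_nodes, eps=1e-12):
    """r_{n,k} = log10(|I_D(n, k)| + eps); one row per macro-step (eps is hypothetical)."""
    i_d = np.asarray(i_d, dtype=float).reshape(-1, n_nodes)
    return np.log10(np.abs(i_d) + eps)


mask = rademacher_mask(8, seed=1)
v_wave = masked_gate_waveform([0.5, -1.0], mask, v_center=10.0, amplitude=2.0)
states = node_states(np.full(v_wave.size, 1e-6), n_nodes=8)
```

Drawing the mask once and reusing it for every macro-step matters. Without a fixed mask, a given virtual node would not correspond to the same input transformation across windows, and a linear readout could not exploit it.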
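For each macro-step $n$, the node states were collected into the window vector $\mathbf{r}_n = [r_{n,1}, \dots, r_{n,N}]^{\top}$. The final feature vector for the readout was formed as $\boldsymbol{\phi}_n = \mathbf{r}_n$. In regimes with a one-window lag, it was formed as $\boldsymbol{\phi}_n = [\mathbf{r}_n^{\top}, \mathbf{r}_{n-1}^{\top}]^{\top}$. In both cases, a short washout of initial windows was discarded. The target for each macro-step was the Lorenz-X value at the next macro-step, $y_n = x_{n+1}$, so that the memtransistor reservoir performs one-step-ahead prediction at the macro cadence.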
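Assembling the feature vectors (with or without the one-window lag) and the one-step-ahead targets could look as follows. The function name `build_features` and its `washout` default are hypothetical.

```python
import numpy as np


def build_features(states, x_macro, lag=False, washout=5):
    """Feature vectors phi_n and one-step-ahead targets y_n = x_{n+1}.

    states  : array (n_windows, N) of node states r_{n,k}
    x_macro : Lorenz-X values at the macro-step cadence (same length as states)
    lag     : if True, phi_n = [r_n, r_{n-1}] (one-window lag)
    washout : number of leading feature rows to discard
    """
    states = np.asarray(states, dtype=float)
    x_macro = np.asarray(x_macro, dtype=float)
    if lag:
        phi = np.hstack([states[1:], states[:-1]])   # present and previous window
        idx = np.arange(1, len(states))
    else:
        phi = states
        idx = np.arange(len(states))
    keep = idx < len(x_macro) - 1                    # the last window has no next-step target
    phi, idx = phi[keep], idx[keep]
    return phi[washout:], x_macro[idx + 1][washout:]


demo_states = np.arange(20).reshape(10, 2)
demo_x = np.arange(10.0)
phi_plain, y_plain = build_features(demo_states, demo_x, lag=False, washout=2)
phi_lag, y_lag = build_features(demo_states, demo_x, lag=True, washout=2)
```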
The linear readout was trained by ridge regression on standardized features. The sequence of windowed pairs $(\boldsymbol{\phi}_n, y_n)$ was split chronologically into training and test sets (e.g., a 70/30 split), without shuffling, to respect the temporal structure of the data. On the training set, we computed the per-feature mean $\bar{\phi}_j$ and standard deviation $s_j$, and standardized both training and test features accordingly. A bias term was included by augmenting the standardized feature matrix with an intercept column. The readout weights were obtained by minimizing a Tikhonov-regularized least-squares objective, with the penalty parameter $\lambda$ chosen by grid search on the training set. Performance was evaluated on the held-out test set using the normalized root-mean-square error (NRMSE) and the coefficient of determination $R^2$, defined as

$$\mathrm{NRMSE} = \frac{\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}}{\sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(y_t - \bar{y}\right)^2}}, \qquad R^2 = 1 - \frac{\sum_{t=1}^{T}\left(y_t - \hat{y}_t\right)^2}{\sum_{t=1}^{T}\left(y_t - \bar{y}\right)^2},$$

where $y_t$ and $\hat{y}_t$ denote the true and predicted Lorenz-X values on the test set, $\bar{y}$ is the mean of the true values, and $T$ is the number of test samples. The detailed hyperparameters and the full measurement logs for regimes A–H are provided in Table A1 and in the open dataset and code repository [39,40].
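The readout stage can be sketched in a few lines of NumPy. The sketch standardizes with training-set statistics only, appends an intercept column, and solves the regularized normal equations. It then evaluates NRMSE and $R^2$. Here NRMSE is normalized by the standard deviation of the test targets, so that $R^2 = 1 - \mathrm{NRMSE}^2$. Two choices are our own assumptions. First, the intercept is left unpenalized. Second, $\lambda$ is selected on a held-back chronological tail of the training set, so that the test set is never used for tuning. The demonstration data are synthetic.

```python
import numpy as np


def fit_ridge(phi_train, y_train, lam):
    """Ridge readout on standardized features with an unpenalized intercept."""
    mu = phi_train.mean(axis=0)
    sd = phi_train.std(axis=0)
    sd[sd == 0] = 1.0                                   # guard against constant features
    z = np.hstack([(phi_train - mu) / sd, np.ones((len(phi_train), 1))])
    reg = lam * np.eye(z.shape[1])
    reg[-1, -1] = 0.0                                   # do not shrink the intercept
    w = np.linalg.solve(z.T @ z + reg, z.T @ y_train)
    return mu, sd, w


def predict(phi, mu, sd, w):
    z = np.hstack([(phi - mu) / sd, np.ones((len(phi), 1))])
    return z @ w


def nrmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)) / np.std(y))


def r2(y, y_hat):
    return float(1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2))


def select_lambda(phi_tr, y_tr, grid, val_frac=0.2):
    """Pick lambda by NRMSE on a chronological hold-out at the end of the training set."""
    n_fit = int((1.0 - val_frac) * len(y_tr))
    scores = []
    for lam in grid:
        mu, sd, w = fit_ridge(phi_tr[:n_fit], y_tr[:n_fit], lam)
        scores.append(nrmse(y_tr[n_fit:], predict(phi_tr[n_fit:], mu, sd, w)))
    return grid[int(np.argmin(scores))]


# Synthetic linear task standing in for reservoir features (hypothetical data).
rng = np.random.default_rng(0)
phi = rng.normal(size=(500, 6))
y = phi @ np.array([1.0, -2.0, 0.5, 0.0, 0.0, 3.0]) + 0.01 * rng.normal(size=500)
n_train = int(0.7 * len(y))                             # chronological 70/30 split
lam = select_lambda(phi[:n_train], y[:n_train], grid=[1e-6, 1e-3, 1e-1, 1.0])
mu, sd, w = fit_ridge(phi[:n_train], y[:n_train], lam)
y_hat = predict(phi[n_train:], mu, sd, w)
err, score = nrmse(y[n_train:], y_hat), r2(y[n_train:], y_hat)
```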
In many experimental forecasting scenarios, only a subset of the underlying dynamical state is available to the learning system (partial observation). In this setting, the instantaneous input at a single time step does not uniquely define the state of the task system, and accurate prediction requires access to recent history. Reservoir computing naturally addresses partial observation. Because the reservoir state depends on past inputs, it can act as an implicit delay embedding of the observed signal [41,42]. For chaotic time-series prediction, this perspective clarifies the relevant device requirement. The goal is not to maximize conventional transistor figures of merit. It is to match the nonlinear fading-memory dynamics (memory depth and response timescales) to the sampling interval and the task dynamics.
3. Results
3.1. Crystal Growth and Monolayer Quality
The CVD process described in Section 2.1 yields discrete triangular MoS2 domains. Their average lateral size is large enough to accommodate the transistor channels fully within a single flake. An optical micrograph of a representative chip after growth is shown in Figure 3a. Individual triangular flakes with lateral dimensions of tens of micrometres are visible on the SiO2 surface. The regions for device fabrication were selected such that the entire channel resides inside a single monolayer domain.
The surface morphology and layer thickness were examined by AFM. Figure 3b presents a tapping-mode AFM height image of a typical MoS2 flake, and Figure 3c shows a line profile across the flake edge. The apparent step height between the flake and the surrounding SiO2 lies in the range of 0.7–0.9 nm. This is consistent with monolayer MoS2 on SiO2 when tip convolution and adsorbates are taken into account. Within the channel regions, we did not detect extended areas with multiples of this step height. The active devices are therefore formed predominantly within monolayer flakes rather than multilayer aggregates or overlapping domains.
Raman spectroscopy confirms the monolayer character of the CVD MoS2 and provides a first handle on strain, doping, and crystallinity (Figure 3d). The spectrum, measured with 532 nm excitation at 0.5 mW, exhibits the characteristic in-plane $E^{1}_{2g}$ and out-of-plane $A_{1g}$ modes. Single-Gaussian fits to the two main peaks yield the peak positions and full widths at half maximum (FWHM) listed in Table 1, together with the peak separation $\Delta\omega = \omega(A_{1g}) - \omega(E^{1}_{2g})$. For exfoliated monolayers on SiO2, reported separations typically lie in a narrow range starting at about 19 $\mathrm{cm^{-1}}$, whereas thicker layers show larger separations and broader lines [1,43]. Our slightly reduced separation and modest linewidth broadening therefore point to monolayer MoS2 with a small tensile strain component and moderate n-type doping, but without strong degradation of crystallinity. The $A_{1g}$ intensity is only slightly higher than that of the $E^{1}_{2g}$ mode. This amplitude ratio is comparable to values reported for high-quality monolayers. It is markedly different from heavily oxidized or disordered films, in which strongly broadened features dominate the spectrum [43].
Photoluminescence (PL) spectroscopy further corroborates the monolayer assignment and reveals the excitonic structure (Figure 3e). A three-Gaussian deconvolution of the PL spectrum in the 1.7–2.1 eV range identifies the A trion, A exciton, and B exciton contributions. The fitted peak energies are 1.80 eV for the A trion, 1.84 eV for the A exciton, and 1.96 eV for the B exciton. The corresponding FWHM values are approximately 0.12 eV, 0.09 eV, and 0.12 eV, respectively. The intensity ratios of the A trion and B exciton, normalized to the A exciton amplitude, are listed in Table 1.

Canonical monolayer MoS2 on SiO2 exhibits three features [1,2]:
- a dominant A exciton at ∼1.85–1.90 eV;
- a weaker B exciton around 2.0 eV;
- a trion shoulder whose strength scales with electron density.

Our spectra match this pattern. The strong direct-gap A exciton with a sub-0.1 eV linewidth indicates relatively low inhomogeneous broadening and good crystalline quality. The finite trion contribution is consistent with the moderate n-type doping commonly observed in CVD-grown monolayers [1].
The main Raman and PL parameters extracted from these fits are summarized in Table 1. Compared with reference data for exfoliated and CVD-grown monolayers, our values fall in the range associated with structurally intact, lightly doped monolayer MoS2 rather than strongly strained or defective material [1,2,43]. Together with the AFM and optical data, this indicates that the active regions of the devices are formed within isolated monolayer flakes of well-controlled thickness, strain, and excitonic response. These flakes provide a reproducible materials baseline for the electrical characterization and reservoir computing experiments presented in the following sections.
3.2. Memtransistor Characteristics and Hysteresis
We next quantify the DC characteristics of a representative CVD MoS2 memtransistor and the strength of its hysteresis. Figure 4 shows the electrical response of a back-gated device with channel length $L$ and width $W$ on 90 nm SiO2. All measurements are performed in a nitrogen atmosphere at room temperature. Under different gate biases, the output characteristics $I_D$–$V_{DS}$ span drain currents from the sub-nanoampere range at low gate bias up to the highest currents at the largest gate and drain biases. At low drain bias, the $I_D$–$V_{DS}$ curves are nearly linear, indicating a well-defined small-signal on-state resistance at the highest gate bias. As $V_{DS}$ increases, they gradually evolve towards quasi-saturation, as expected for monolayer MoS2 transistors on SiO2.
The transfer characteristics $I_D$–$V_{GS}$, recorded at two drain biases, show n-type field-effect behavior with a clear turn-on at positive gate voltages. For the forward branch, the drain current increases from its off-state level at the negative end of the gate window to its on-state level at the positive end. The ON/OFF ratio therefore differs depending on whether the forward or the reverse branch is used for normalization. Using the standard linear-regime expression

$$\mu_{FE} = \frac{L}{W\, C_{ox}\, V_{DS}}\,\frac{\partial I_D}{\partial V_{GS}},$$

with $C_{ox} = \varepsilon_0 \varepsilon_r / t_{ox}$ for 90 nm SiO2, we extract the field-effect mobility $\mu_{FE}$. The fit uses the forward $I_D$–$V_{GS}$ branch over the above-threshold gate range at the lower drain bias. Linear extrapolation of the same fit to $I_D = 0$ yields the threshold voltage $V_{th}$. Both values are typical for back-gated CVD monolayer MoS2 devices on 90 nm SiO2. They indicate that the device remains transistor-like despite the presence of hysteresis.
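As a worked example of the capacitance entering the mobility expression, assume the textbook relative permittivity $\varepsilon_r = 3.9$ for thermal SiO2. This value is an assumption, not one quoted in this section.

```latex
C_{ox} = \frac{\varepsilon_0 \varepsilon_r}{t_{ox}}
       = \frac{(8.854\times10^{-12}\,\mathrm{F\,m^{-1}})\,(3.9)}{90\times10^{-9}\,\mathrm{m}}
       \approx 3.84\times10^{-4}\,\mathrm{F\,m^{-2}}
       = 38.4\,\mathrm{nF\,cm^{-2}}
```

Because $\mu_{FE}$ scales as $1/C_{ox}$, any deviation of the actual oxide permittivity or thickness from these nominal values propagates proportionally into the extracted mobility.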
Hysteresis appears as a systematic separation between the up- and down-sweep $I_D$–$V_{GS}$ traces over the full gate window, as shown in Figure 4b. We quantify this memory effect by the gate-sweep hysteresis loop area

$$H = \int_{V_{\min}}^{V_{\max}} \left| I_{\mathrm{fwd}}(V_{GS}) - I_{\mathrm{bwd}}(V_{GS}) \right| \, dV_{GS},$$

where $I_{\mathrm{fwd}}$ and $I_{\mathrm{bwd}}$ denote the currents for the forward and backward sweeps, respectively, and $V_{\min}$ and $V_{\max}$ are the limits of the gate window. We integrated the measured traces numerically on a 0.2 V grid for the device used in the subsequent reservoir computing experiments, obtaining $H$ at both drain biases. The area is dominated by the high-current region at the upper end of the gate window, while the deep-off regime at negative gate voltages contributes negligibly to $H$.
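The loop-area integral can be evaluated with the trapezoidal rule on the common gate grid. The sketch assumes that the backward-sweep current has already been reordered onto the same ascending $V_{GS}$ grid as the forward sweep. The function name and the synthetic traces are hypothetical.

```python
import numpy as np


def hysteresis_area(v, i_fwd, i_bwd):
    """H = integral of |I_fwd - I_bwd| dV via the trapezoidal rule (result in A*V).

    v must be ascending, and i_bwd must be sampled on the same grid as i_fwd
    (reverse the recorded backward sweep first if needed).
    """
    v = np.asarray(v, dtype=float)
    gap = np.abs(np.asarray(i_fwd, dtype=float) - np.asarray(i_bwd, dtype=float))
    return float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(v)))


v_grid = np.linspace(-10.0, 30.0, 201)           # 0.2 V steps, as in the text
i_up = 1e-9 * (v_grid + 10.0)                     # synthetic forward branch
i_down = np.zeros_like(v_grid)                    # synthetic backward branch
h_demo = hysteresis_area(v_grid, i_up, i_down)    # analytic value: 1e-9 * 40**2 / 2 A*V
```

The trapezoidal rule is exact for piecewise-linear gaps. On measured data, the residual error is set by the 0.2 V step and by current noise, matching the uncertainty sources named in Section 2.3.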
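An independent view of the memory window comes from output hysteresis at fixed gate bias (Figure 4d). We use the analogous loop area

$$H_{\mathrm{out}} = \int \left| I_{\mathrm{fwd}}(V_{DS}) - I_{\mathrm{bwd}}(V_{DS}) \right| \, dV_{DS},$$

evaluated over the $V_{DS}$ sweep range, and obtain loop areas for three gate biases, including $V_{GS} = 30$ V. At the lowest of these gate biases, the ratio of the high- to the low-resistance state at a small read bias reaches 42 for one of the two read conditions examined. The ratios decrease at the intermediate gate bias and approach unity at the highest gate bias. Thus, the largest state contrast is available when reading near the subthreshold regime, while the hysteresis becomes almost invisible once the channel is strongly accumulated.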
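The resistance-state ratio at a read bias follows directly from the two output-sweep branches. At a fixed read voltage, $R = V_{\mathrm{read}}/I$, so the resistance ratio equals the inverse current ratio. The following is a minimal sketch with a hypothetical function name and synthetic ohmic branches.

```python
import numpy as np


def state_ratio(v_ds, i_fwd, i_bwd, v_read):
    """R_HRS / R_LRS at v_read from two output-sweep branches on an ascending V_DS grid."""
    i_a = abs(np.interp(v_read, v_ds, i_fwd))
    i_b = abs(np.interp(v_read, v_ds, i_bwd))
    # R = V_read / I at the same read voltage, so R_HRS / R_LRS = I_high / I_low
    return float(max(i_a, i_b) / min(i_a, i_b))


v_out = np.linspace(0.0, 1.0, 11)
ratio_demo = state_ratio(v_out, v_out / 1e6, v_out / 2e7, v_read=0.1)  # 1 MOhm vs 20 MOhm
```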
In summary, the CVD MoS2 memtransistor combines a technologically relevant field-effect mobility, a high ON/OFF ratio, and a moderate but well-defined hysteresis window in both transfer and output characteristics. This balance between transistor performance and memory effect is essential for its later use as a nonlinear, history-dependent element. It also ensures that the device remains representative of CVD-grown MoS2 field-effect transistors more broadly, keeping the focus of the manuscript on growth and device-level behavior. Device-to-device statistics for $H$ and the other figures of merit extracted from the full transfer-curve dataset are summarized in Appendix C (Figure A2).
3.3. Lorenz-63 Prediction Without Time-Multiplexing
We first establish a non-time-multiplexed baseline in which the Lorenz-63 signal directly drives a single MoS2 memtransistor. The scalar Lorenz-X trajectory, preprocessed and mapped to the gate window as described above, is converted into a sequence of gate-to-source voltages $V_{GS,n}$. The drain-to-source voltage $V_{DS}$ is held constant within each regime (Table A1). The drain current $I_D$ thus realizes a scalar, history-dependent nonlinear mapping of the chaotic input. After logarithmic transformation and standardization, we train a linear ridge-regression readout to perform one-step prediction of the Lorenz-X signal. Performance is quantified by NRMSE and $R^2$ on a held-out test set.
Figure 5 presents the measured drain currents (purple dots) together with the applied gate-to-source voltage inputs (yellow lines). In the first pair of regimes (A and B, Figure 5a and Figure 5b, respectively), we probe a wide gate window that covers subthreshold and above-threshold operation. The normalized Lorenz-X is mapped onto this wide $V_{GS}$ window, and the device is biased at a regime-specific $V_{DS}$ (Table A1) and sampled at a regime-specific interval. At these settings the memtransistor output current spans several decades, from the subthreshold noise floor up to the microampere range. The best non-time-multiplexed predictions for these two regimes, obtained for a 60/40 chronological train–test split, are listed in Table A1. These values indicate that the device captures some structure of the Lorenz dynamics but leaves a large fraction of the variance unexplained.
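The readout and evaluation steps (chronological split, linear ridge regression, NRMSE) can be sketched as follows. This is a minimal NumPy version under stated assumptions: NRMSE is normalized by the standard deviation of the test target (one common convention; the normalization used in this work may differ), the ridge penalty `alpha` is a free hyperparameter, and the synthetic, already time-aligned states stand in for measured reservoir features. All helper names are hypothetical.

```python
import numpy as np

def chronological_split(X, y, train_frac):
    """Split without shuffling: the first `train_frac` of the samples form the training set."""
    n_train = int(round(train_frac * len(y)))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]

def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha I)^-1 X^T y (no intercept)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def nrmse(y_true, y_pred):
    """Root-mean-square error divided by the standard deviation of the target."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.std(y_true)

# Toy task: row t of X is the reservoir state used to predict the (already aligned) value at t+1.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([0.5, -1.0, 2.0])   # synthetic, noiseless target
X_tr, X_te, y_tr, y_te = chronological_split(X, y, 0.6)   # 60/40 split, as in regimes A and B
w = ridge_fit(X_tr, y_tr, alpha=1e-6)
err = nrmse(y_te, X_te @ w)
```

With this normalization, predicting the test mean gives NRMSE = 1, which is a useful reference point when reading values in the 0.46–0.61 range.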
To better exploit the steep part of the transfer characteristic, we confine the gate window to the high-transconductance region in regimes C (Figure 5c) and D (Figure 5d) while keeping $V_{DS}$ fixed. Regime C uses a relatively slow sampling interval, whereas regime D samples faster. This focuses the input on the region of highest transconductance and reduces the time available for slow drift between samples. The best results in these regimes, obtained for a 70/30 split, improve on those of regimes A and B (Table A1). Thus, concentrating $V_{GS}$ on the high-gain region and shortening the sampling interval both enhance the useful nonlinearity and effective memory encoded in $I_D$, but the prediction quality remains well below the level typically targeted for high-fidelity Lorenz forecasting.
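One simple way to locate the steep part of a transfer curve is to differentiate it numerically and center the input window on the transconductance maximum. The sketch below uses a hypothetical sigmoidal transfer curve, a crude stand-in that ignores, for example, the exponential subthreshold behavior of a real device. The helper name `highest_gm_window` and all numbers are assumptions; the text does not state that the windows of regimes C and D were selected this way.

```python
import numpy as np

def highest_gm_window(v_gs, i_d, width):
    """Window of the given width centered on the maximum of g_m = dI_D/dV_GS."""
    g_m = np.gradient(i_d, v_gs)          # numerical transconductance on a (possibly uneven) grid
    v_peak = v_gs[np.argmax(g_m)]
    return v_peak - width / 2.0, v_peak + width / 2.0

# Hypothetical sigmoidal transfer curve: 1 pA floor, 1 uA saturation, midpoint at 20 V.
v = np.linspace(0.0, 40.0, 401)
i = 1e-12 + 1e-6 / (1.0 + np.exp(-(v - 20.0) / 1.5))
low, high = highest_gm_window(v, i, width=4.0)
```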
Overall, the non-time-multiplexed experiments confirm that a single CVD memtransistor, driven directly by a Lorenz-derived waveform, can provide a reproducible nonlinear transformation with moderate short-term memory. At the same time, the achievable prediction accuracy in regimes A–D remains limited by the scalar nature of the node, the finite signal-to-noise ratio in the subthreshold range, and drift on the time scales of the measurement. We therefore treat these results as a device-level baseline and, in the next subsection, introduce time-multiplexed operation to increase the effective state dimensionality without modifying the device or growth process.
3.4. Time-Multiplexed Reservoir Operation
We next introduce time-multiplexed operation to increase the effective state dimensionality of the reservoir without changing the underlying device or growth process. Instead of using each Lorenz-X sample once, as in regimes A–D, we expand every macro-step into N short sub-steps of duration $\theta$, so that one macro-step lasts $T = N\theta$. Within each macro-step j, the Lorenz-derived scalar $u_j$ is first mapped into the gate window and then multiplied by a fixed Rademacher mask $m_k \in \{-1, +1\}$, $k = 1, \dots, N$, to generate a masked waveform $V_{GS}(j, k)$. The drain voltage $V_{DS}$ is held constant, and the resulting drain current $I_D(j, k)$ is sampled at the end of each sub-step, yielding N virtual nodes per macro-step.
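The masking step can be illustrated with a short NumPy sketch. It assumes that $u_j$ has been normalized to [−1, 1] and that the mask is applied around the window center, $V_{GS}(j,k) = V_c + \Delta V\, m_k u_j$, so the masked waveform stays inside the gate window. This centering is our assumption and is not necessarily how the measured waveforms were built; the helper names and the numbers are hypothetical.

```python
import numpy as np

def rademacher_mask(n_nodes, seed=0):
    """Fixed random mask m_k in {-1, +1}, drawn once and reused for every macro-step."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=n_nodes)

def time_multiplex(u, mask, v_center, v_half_swing):
    """Expand each macro-step input u_j in [-1, 1] into N sub-step gate voltages.

    Assumed mapping: V_GS(j, k) = v_center + v_half_swing * m_k * u_j,
    which keeps every sub-step inside [v_center - v_half_swing, v_center + v_half_swing].
    """
    return v_center + v_half_swing * np.outer(u, mask)   # shape (n_macro, N)

def virtual_node_states(samples, n_nodes):
    """Reshape a flat, time-ordered record of end-of-sub-step readings into (n_macro, N)."""
    return np.asarray(samples).reshape(-1, n_nodes)

mask = rademacher_mask(16, seed=0)
u = np.linspace(-1.0, 1.0, 5)                 # five normalized macro-step inputs
v = time_multiplex(u, mask, v_center=33.0, v_half_swing=2.0)
# In a measurement, v.ravel() would be applied sub-step by sub-step and I_D read at the end of
# each sub-step; here the voltages themselves stand in for those readings.
states = virtual_node_states(v.ravel(), n_nodes=16)
```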
The left axis in Figure 6 shows typical masked $V_{GS}$ traces (yellow lines), while the right axis shows the corresponding $I_D$ response (purple dots) for the four time-multiplexed regimes E–H. The Lorenz-63 prediction performance for all regimes is summarized in Table A1. All time-multiplexed measurements are performed at a fixed $V_{DS}$ and in a narrow $V_{GS}$ window close to the high-transconductance region of the transfer characteristic. This choice minimizes the influence of deep subthreshold noise and strongly accumulated on-state drift, and it keeps the instantaneous currents in a technologically reasonable range.
Regime E maximizes the number of virtual nodes under these conditions. The chosen N and sub-step duration $\theta$ give a macro-step duration $T = N\theta = 11.2$ s. The raw currents $I_D$ are standardized node-wise (per-node z-score) and used directly as the reservoir state. This configuration yields the largest state vector but also exposes the device to the longest effective memory window. Consistent with that, the test NRMSE is highly split-dependent: it remains high for a 60/40 split but improves for an 80/20 split (Table A1, label E), indicating that long-term drift and low-frequency fluctuations limit the useful fading memory once $T$ exceeds about ten seconds.
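Node-wise z-scoring, as used for regime E, can be written as below. Estimating the per-node mean and standard deviation on the training macro-steps only and reusing them for the test segment is our assumption (it avoids leaking test statistics into the readout); the text does not specify this detail. The helper names and synthetic currents are hypothetical.

```python
import numpy as np

def fit_nodewise_zscore(states_train):
    """Per-node (column-wise) mean and standard deviation from the training macro-steps only."""
    mu = states_train.mean(axis=0)
    sd = states_train.std(axis=0)
    # Guard against constant nodes, which would otherwise cause a division by zero.
    return mu, np.where(sd > 0, sd, 1.0)

def apply_zscore(states, mu, sd):
    return (states - mu) / sd

rng = np.random.default_rng(1)
# Synthetic currents (A) for three virtual nodes with very different scales.
states = rng.normal(loc=[1e-9, 5e-8, 2e-6], scale=[1e-10, 1e-8, 5e-7], size=(200, 3))
mu, sd = fit_nodewise_zscore(states[:120])
z_train = apply_zscore(states[:120], mu, sd)
z_test = apply_zscore(states[120:], mu, sd)
```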
Regime F shortens the macro-step and compresses the current dynamic range. Here, N and $\theta$ are chosen so that $T = N\theta$ is considerably shorter than in regime E, and the number of macro-steps is increased. Instead of the raw current, we feed $\log I_D$ into the readout, which improves numerical conditioning and reduces the weight of rare high-current excursions. Across all chronological splits, the resulting test errors are low and relatively stable; the best result is obtained for the 60/40 split (Table A1, label F). This shows that, once $T$ is comparable to the intrinsic fading-memory time of the device and the state distribution is well conditioned, even a modest number of virtual nodes can support accurate short-horizon Lorenz-63 prediction.
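The effect of the log-current representation on dynamic range can be seen in a minimal sketch. The base-10 logarithm and the 1e-13 A floor that keeps zero or near-zero readings finite are hypothetical choices, not values taken from the measurements.

```python
import numpy as np

def log_current(i_d, floor=1e-13):
    """log10|I_D| with a hypothetical noise floor so that tiny readings stay finite."""
    return np.log10(np.maximum(np.abs(i_d), floor))

i_d = np.array([1e-11, 1e-9, 1e-6])           # synthetic currents spanning five decades
raw_ratio = i_d.max() / i_d.min()             # ratio of largest to smallest current
log_values = log_current(i_d)
log_span = log_values.max() - log_values.min()  # the same spread, expressed in decades
```

In linear units the largest reading dominates any least-squares fit, whereas in log units each decade contributes equally, which is the balancing of on- and off-state contributions referred to in the text.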
Regime G increases the memory window at fixed dimensionality. We keep the number of nodes N of regime F but increase the sub-step duration $\theta$, which lengthens $T = N\theta$, again with the same number of macro-steps. As in regime F, we use $\log I_D$, but now augment the reservoir state with a one-step lag: the readout sees both the current and the previous macro-step states. This explicit lag partially compensates for the limited physical memory of the device. The best test metrics (60/40 split) are listed in Table A1 (label G). The time traces in Figure 6c show that this configuration still samples a broad range of $I_D$ values while remaining less sensitive to slow drift than regime E.
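The one-step lag used in regimes G and H amounts to concatenating each macro-step state with its predecessor. A minimal sketch, assuming the states are stored as a (macro-steps × nodes) array; the first macro-step has no predecessor, so it is dropped and the targets are trimmed to match. The helper name is hypothetical.

```python
import numpy as np

def add_one_step_lag(states, targets):
    """Return features [s_j, s_{j-1}] and the matching targets (first macro-step dropped)."""
    lagged = np.hstack([states[1:], states[:-1]])
    return lagged, targets[1:]

states = np.arange(40, dtype=float).reshape(10, 4)   # 10 macro-steps x 4 virtual nodes
targets = np.arange(10, dtype=float)
features, y = add_one_step_lag(states, targets)       # feature width doubles from 4 to 8
```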
Finally, regime H explores a larger state dimension at an intermediate memory window. We double the number of nodes N and retain both the $\log I_D$ representation and the one-step lag. An explicit bias term (intercept) is included in the linear regression. The best result for regime H (80/20 split, label H in Table A1) is comparable to that of regime F. Relative to the smaller node count, doubling N yields only a modest improvement, consistent with the expectation that, once the reservoir dimension exceeds the effective dimensionality of the task and the physical memory window, additional virtual nodes offer diminishing returns.
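An explicit intercept in ridge regression is often handled by centering the features and target so that the intercept itself is not penalized. The sketch below shows this common variant; whether the intercept in regime H was penalized is not stated, so treat it as one reasonable choice rather than the implementation used here. The helper names and data are hypothetical.

```python
import numpy as np

def ridge_fit_with_intercept(X, y, alpha):
    """Ridge regression with an unpenalized intercept: fit on centered data,
    then recover the intercept b from the feature and target means."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

def ridge_predict(X, w, b):
    return X @ w + b

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 4))
w_true = np.array([1.0, -0.5, 0.25, 2.0])
y = X @ w_true + 3.0                  # synthetic target with a nonzero offset
w, b = ridge_fit_with_intercept(X, y, alpha=1e-8)
```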
The numerical trends in Table A1 are reflected directly in the reconstructed Lorenz-63 trajectories. Figure 7 compares ground-truth and predicted Lorenz-X time traces for all regimes A–H, using the best-performing chronological split in each case. In the non-time-multiplexed configurations A–D (Figure 7a–d), the training segments can be fitted reasonably well, but the test trajectories show visibly damped oscillations and phase errors, consistent with the relatively high NRMSE of 0.46–0.61. In contrast, the time-multiplexed regimes F and H (Figure 7f,h) yield test traces that track both the amplitude and phase of the Lorenz dynamics over several oscillation periods before diverging, in line with their lower errors (NRMSE ≤ 0.10). These qualitative differences support the view that, for the present devices, tuning the effective memory window $T$, the number of virtual nodes N, and the current preprocessing matters more for RC performance than further incremental improvements in conventional FET metrics.
Taken together, regimes E–H demonstrate that time multiplexing allows a single CVD memtransistor to reach NRMSE ≤ 0.10 in its best configurations for short-horizon Lorenz-63 prediction, while keeping all device-level conditions (growth, contacts, and oxide stack) fixed. From a device perspective, the comparison highlights three trends. First, excessively long effective memory windows amplify drift and low-frequency noise and can degrade performance despite large N. Second, logarithmic current scaling is beneficial once the device spans several decades of $I_D$, because it balances on- and off-state contributions. Third, modest architectural tweaks such as lagged states can recover part of the effective memory without changing the physical device. We therefore view time-multiplexed RC here primarily as a characterization protocol for the nonlinear, history-dependent response of CVD-grown memtransistors, rather than as a fully optimized neuromorphic system.
4. Discussion
4.1. Benchmarking Against Existing Hardware Reservoirs
The best time-multiplexed regimes demonstrate that a single back-gated CVD MoS₂ memtransistor can reach one-step Lorenz-63 prediction errors of order 0.1 in NRMSE under realistic device-bias constraints. From Table A1, the optimal non-time-multiplexed configuration (regime D) saturates at NRMSE ≈ 0.46, whereas the best time-multiplexed settings (regimes F and H) reduce the error to NRMSE ≤ 0.10. Thus, time multiplexing and simple preprocessing of $I_D$ improve the prediction accuracy by roughly a factor of five relative to the direct, non-time-multiplexed baseline at the same drain bias, without modifying the device or growth stack.
Table 2 consolidates Lorenz-63 benchmarks from hardware experiments and numerical simulations.
It is instructive to place these values alongside other reservoirs, numerical and hardware, that have tackled Lorenz-type tasks. State-of-the-art numerical reservoirs such as next-generation RC and symmetry-aware RC reach very low errors for Lorenz-63 short-term forecasting and variable inference in simulation, and optimized echo state networks can achieve comparable errors for multi-step autoregressive prediction [21]. Experimental photonic implementations of next-generation RC also report low Lorenz-63 errors, similar to other optical reservoirs that trade device complexity for low error and long valid-prediction times [21]. Among memristive and ferroelectric platforms, dynamic memories and ferroelectric memtransistors have demonstrated Lorenz-type chaotic time-series prediction with low reported errors while operating at low voltages [49,50].
Closer to the present work, polymer electrolyte-gated transistors have recently been used as reservoir nodes for time-series processing, including Lorenz prediction [36]. In that platform, intercalation drives a reversible phase transition that provides strong nonlinearity and long fading memory, and a time-multiplexed single-node reservoir achieved a low Lorenz prediction error [36]. Our back-gated CVD MoS₂ memtransistor reservoir operates in a more conservative regime: $V_{GS}$ is confined to (31,35) V on 90 nm SiO₂ with a fixed $V_{DS}$, and the device is read through a conventional solid-state gate stack without electrolytes, ionic motion, or phase transitions. Under these constraints the best errors (NRMSE ≤ 0.10) are about a factor of two above the polymer-electrolyte benchmark and an order of magnitude above the most optimized photonic or memristive reservoirs, but they are obtained on a standard CVD-grown, back-gated transistor structure.
From a device perspective this comparison highlights three points. First, the absolute RC performance of our platform is limited less by the crystalline quality of the monolayer and more by the choice of gate stack and biasing: dry SiO₂ confines the available nonlinearity and memory to trap-mediated charge storage, whereas ionic or ferroelectric media provide stronger history dependence at lower voltages. Second, within those constraints, the CVD memtransistor still attains NRMSE ≤ 0.10 as a single physical node with a purely linear readout and no network-level optimization or feedback. Third, because both the semiconductor and gate dielectric are wafer-process-compatible, the same device platform can in principle be scaled to multi-node arrays or 3D-integrated stacks without introducing new materials or processing steps.
We therefore view the Lorenz-63 benchmarks in this work not as an attempt to surpass the best reported RC error, but as a quantitative sanity check that links device-level metrics (mobility, hysteresis, and trap-mediated memory on sub- to multi-second scales) to an application-level figure of merit (NRMSE on a standard chaotic prediction task). In combination with the open time-series dataset and RC pipeline released with this work, this provides a reproducible baseline against which future growth, contact engineering, or gate-stack modifications can be judged in terms of their impact on both device characteristics and reservoir-computing performance.
We finally note that the Lorenz-63 benchmarks collected in Table 2 are not strictly equivalent in task difficulty to the configuration used here. Several of the best-performing numerical schemes operate in a “full observation” regime, where all three Lorenz coordinates $(x, y, z)$, or at least two components, are supplied to the model at each time step, and the readout learns the flow map $(x_t, y_t, z_t) \mapsto (x_{t+1}, y_{t+1}, z_{t+1})$ [20,21,44,45]. In our MoS₂ memtransistor implementation, the reservoir is driven by a single scalar input proportional to the x-component only, and the readout is trained to predict $x_{t+1}$ from this one-dimensional observation and its fading memory. From a dynamical-systems perspective, this corresponds to forecasting from a partial observation rather than from the full state; the observation map $(x, y, z) \mapsto x$ is generally many-to-one on the Lorenz attractor, so the reservoir must internally reconstruct the missing coordinates via its finite memory, in line with delay-embedding arguments for attractor reconstruction [21]. This makes the task more sensitive to noise and memory depth than full-state forecasts at the same sampling interval and reservoir size.
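The delay-embedding argument can be made concrete: from x alone, one builds vectors of lagged samples, which for a suitable dimension and lag reconstruct the attractor up to a smooth change of coordinates in the sense of Takens' theorem. The reservoir does this implicitly through its fading memory; the explicit helper below (name, dimension, and lag are illustrative assumptions) only shows what kind of information the readout has to recover.

```python
import numpy as np

def delay_embed(x, dim, lag):
    """Delay vectors [x_t, x_{t-lag}, ..., x_{t-(dim-1)lag}] for every t with a full history."""
    x = np.asarray(x)
    start = (dim - 1) * lag
    cols = [x[start - k * lag : len(x) - k * lag] for k in range(dim)]
    return np.stack(cols, axis=1)

E = delay_embed(np.arange(10), dim=3, lag=2)
# Each row is one reconstructed state built from x only, e.g. E[0] holds [x_4, x_2, x_0].
```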
A second difference concerns the effective prediction horizon per step. Many Lorenz-63 RC studies integrate the equations with a small internal time step (for numerical accuracy) and then train on finely sampled data [20,44]. In that regime, each one-step forecast covers only a short fraction of a Lyapunov time, and nearby trajectories separate relatively weakly between successive samples, which typically leads to lower one-step NRMSE at fixed model capacity. In contrast, our best hardware configuration operates at a coarser sampling interval (in Lorenz time units), so that each prediction step spans a larger fraction of the Lyapunov time. Because prediction errors in chaotic systems grow with elapsed time along the trajectory, a larger sampling interval intrinsically amplifies one-step errors and raises the lowest NRMSE attainable for a given reservoir size. The use of a single scalar input and a longer effective time step means that the NRMSE of the present MoS₂ memtransistor reservoir should be viewed as a conservative benchmark relative to numerical RC results obtained under full-state, finely sampled conditions.
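The effect of the sampling interval can be quantified with the largest Lyapunov exponent of Lorenz-63, λ_max ≈ 0.906 for the standard parameters (σ = 10, ρ = 28, β = 8/3). Over one step Δt a small error grows on average by about exp(λ_max Δt), and one step covers the fraction λ_max Δt of a Lyapunov time. The Δt values below are illustrative only, not those used in this work or in refs. [20,44].

```python
import math

LAMBDA_MAX = 0.906  # approx. largest Lyapunov exponent of Lorenz-63 (sigma=10, rho=28, beta=8/3)

def lyapunov_fraction(dt, lam=LAMBDA_MAX):
    """Fraction of a Lyapunov time (1 / lam) covered by one sampling step dt."""
    return lam * dt

def per_step_growth(dt, lam=LAMBDA_MAX):
    """Average amplification of a small error over one step, exp(lam * dt)."""
    return math.exp(lam * dt)

for dt in (0.01, 0.1):  # illustrative fine and coarse sampling intervals (Lorenz time units)
    print(f"dt={dt}: {lyapunov_fraction(dt):.4f} Lyapunov times, growth x{per_step_growth(dt):.3f}")
```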
4.2. Device Metrics and Design Window for Memtransistor Reservoirs
The experiments above show that, for the present CVD MoS₂ memtransistors, reservoir performance is governed more by how we bias and read the device than by further incremental improvements in classical FET metrics. The representative transistor combines a technologically relevant field-effect mobility, an ON/OFF ratio spanning several decades, and a moderate hysteresis window quantified by gate-sweep loop areas measured under two conditions. Within this envelope, the non-time-multiplexed Lorenz-63 experiments in regimes A–D show that confining $V_{GS}$ to the high-transconductance region and shortening the sampling interval are more important for NRMSE than pushing mobility or ON/OFF higher. From Figure 8, operating in a narrow $V_{GS}$ window at fixed $V_{DS}$ with the shorter sampling interval (regime D) already reduces the error to NRMSE ≈ 0.46, whereas wide gate windows that spend substantial time in deep subthreshold or strongly accumulated regimes perform worse despite similar static device figures of merit.
Time-multiplexed operation refines this picture. Regimes E–H keep the same physical device and gate stack but adjust the effective memory window $T$, the number of virtual nodes N, and the preprocessing of $I_D$. Performance improves markedly when $T$ is made comparable to the intrinsic fading-memory time of the trap-mediated hysteresis, on the order of a few seconds, rather than being pushed to 11.2 s as in regime E. The best Lorenz-63 errors, NRMSE ≤ 0.10 in regimes F and H, occur for $T$ between 1.6 and 3.2 s, N of up to 32, and a narrow $V_{GS}$ window at fixed $V_{DS}$, combined with a $\log I_D$ representation and a simple one-step lag. In this bias range the device operates on the steep, nearly linear part of the transfer characteristic, where small $V_{GS}$ variations are efficiently converted into current changes, while the hysteresis is strong enough to provide short-term memory but not so slow as to drift over the full duration of a macro-step. The stability of the best NRMSE across chronological splits in Table A1 supports the view that we are exploiting reproducible trap dynamics rather than uncontrolled long-term drift.
These observations suggest a practical design window for CVD memtransistor reservoirs. At the device level, it is sufficient to reach a basic target in mobility and ON/OFF ratio; further gains in these parameters are unlikely to translate into proportional reductions in NRMSE unless the hysteresis spectrum and gate stack are also engineered. More critical is to realize a controllable, intermediate hysteresis: large enough that exceeds a few tens near the chosen read bias and that fading memory covers 1–4 s, but not so large that the device fails to wash out past inputs within the reservoir time horizon, defined by the macro-step duration of the time-multiplexed reservoir. On the operation side, the results here point to narrow gate windows around the high-transconductance region, moderate drain biases (a few volts), and time-multiplexing parameters tuned so that $T$ matches the device’s intrinsic memory time. Within this window, the memtransistor behaves as a compact, wafer-process-compatible nonlinear node whose RC performance is sufficient to resolve trends when growth, contact engineering, or trap spectra are varied, while keeping the focus of the platform on scalable device physics rather than on RC optimization.
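The time-multiplexing part of this design window reduces to simple arithmetic on $T = N\theta$. A minimal sketch, assuming a sub-step duration of θ = 0.1 s (an assumption borrowed from the best sampling interval quoted in Section 4.4) and the 1–4 s fading-memory range discussed above; the helper names are hypothetical.

```python
import math

def macro_step_duration(n_nodes, theta_s):
    """Macro-step duration T = N * theta in seconds."""
    return n_nodes * theta_s

def nodes_for_memory_window(t_target_s, theta_s):
    """Smallest N with N * theta >= t_target (a tiny tolerance absorbs float rounding)."""
    return math.ceil(t_target_s / theta_s - 1e-9)

def in_design_window(t_s, t_min_s=1.0, t_max_s=4.0):
    """True if the macro-step duration falls inside the assumed 1-4 s memory range."""
    return t_min_s <= t_s <= t_max_s

theta = 0.1                                    # assumed sub-step duration (s)
n_min = nodes_for_memory_window(1.0, theta)    # shortest T in the 1-4 s range
n_max = nodes_for_memory_window(4.0, theta)    # longest T in the 1-4 s range
```

Under these assumptions, a macro-step of 11.2 s, as in regime E, falls well outside the window, which matches the drift-limited behavior reported for that regime.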
4.3. Stability, Reproducibility, and Scope of Structure–Performance Relations
All results reported here were obtained from a single CVD-grown chip processed in one fabrication run, thereby eliminating run-to-run variation and enabling a focused assessment of device-to-device reproducibility within a fixed process flow. In total, 65 transistor structures (monolayer CVD MoS₂ with a fixed channel length L and width W) were patterned, contacted, and electrically characterized. Across this set, the hysteresis area H exhibits a broad distribution, consistent with device-to-device dispersion in trap populations and local electrostatics even on an optically uniform monolayer film, whereas the maximum transconductance and field-effect mobility show comparatively narrower spreads. To probe task-level reproducibility, we further recorded Lorenz-63 time-series measurements on 17 devices selected from the higher-hysteresis subset, using an identical input mapping and a fixed chronological 70/30 train/test split. With a current representation chosen to mitigate slow baseline drift (consistent with charge trapping and detrapping) as the reservoir state, the one-step prediction error shows low device-to-device scatter (NRMSE range 0.58–0.61), which motivated the selection of an average-performing device, rather than an outlier, for the detailed demonstrations. We note that the absolute NRMSE values in this multi-device dataset remain comparatively high because time multiplexing was not implemented for every device within the available measurement time; nevertheless, combining the same 17 devices as a space-multiplexed reservoir improves the one-step NRMSE, consistent with the expected benefit of increased state dimensionality.
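Space multiplexing, as used for the 17-device reservoir, can be sketched as concatenating per-device states that share a common input and time axis into one wide state matrix, which then feeds the same linear readout. Concatenation followed by a single readout is our assumption about how the devices were combined; the helper name and the synthetic data are hypothetical.

```python
import numpy as np

def space_multiplex(device_states):
    """Stack per-device states (each n_steps or n_steps x n_features) along the feature axis."""
    cols = [np.asarray(s, dtype=float).reshape(len(s), -1) for s in device_states]
    if len({c.shape[0] for c in cols}) != 1:
        raise ValueError("all devices must share the same time axis")
    return np.hstack(cols)

rng = np.random.default_rng(2)
per_device = [rng.normal(size=100) for _ in range(17)]   # one feature per device, 100 steps
S = space_multiplex(per_device)                           # shape (100, 17)
```

Because each column comes from a physically different device, device-to-device dispersion becomes state diversity that the readout can exploit, with no extra measurement time per device.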
Finally, we emphasize the scope of this work regarding structure–performance relationships. Devices were intentionally fabricated with a fixed channel geometry and monolayer thickness, and we therefore do not claim a systematic dependence of fading-memory time constants or prediction accuracy on thickness (1L vs multilayer) or geometry (L, W). These parameters are expected to influence response times through electrostatic scaling, trap occupancy, and RC time constants associated with charge trapping and dielectric relaxation, and they provide clear design parameters for future studies.
4.4. Limitations and Outlook Toward CMOS-Compatible Neuromorphic Hardware
The present implementation has several clear limitations from a neuromorphic-hardware perspective. First, the back-gated geometry with 90 nm SiO₂ requires $V_{GS}$ up to 35–40 V together with a fixed drain bias $V_{DS}$, far above typical CMOS core voltages; this is acceptable here only because we target a device-physics demonstration rather than an integrated circuit. Second, the relevant fading-memory times are on the order of 1–4 s and the sampling interval is at best 0.1 s, so the reservoir operates in a quasi-static regime set by trap dynamics and instrumentation rather than at radiofrequency or GHz bandwidths.
At the same time, the materials stack and fabrication flow are compatible with an eventual transition toward CMOS back-end integration. Moving from a global back gate to patterned top gates with thin high-k dielectrics or ferroelectric layers would reduce the required gate voltage by one to two orders of magnitude while preserving or even enhancing the useful hysteresis. The same CVD growth and contact scheme can, in principle, be adapted to BEOL-compatible thermal budgets, enabling local memtransistor arrays above conventional logic. On the architecture side, the single-node time-multiplexed reservoir demonstrated here can be extended to small multi-node ensembles, co-integrated with CMOS readout and training circuits, and exercised on a broader set of temporal tasks. In this sense, the current work should be viewed as a device-level benchmark that links CVD growth and hysteresis engineering to a standard RC task; future efforts will need to trade some of this simplicity for lower voltages, faster dynamics, and array-level integration to approach truly CMOS-compatible neuromorphic hardware.
5. Conclusions
We have demonstrated that back-gated memtransistors based on CVD-grown monolayer MoS₂ on 90 nm SiO₂ combine conventional transistor performance with sufficient hysteresis and fading memory to support chaotic time-series prediction via reservoir computing. The representative devices exhibit a technologically relevant field-effect mobility, ON/OFF ratios spanning several decades, and a moderate hysteresis window quantified by gate-sweep loop areas measured under two conditions. In a single-node reservoir configuration, direct (non-time-multiplexed) driving of the memtransistor by a Lorenz-63 waveform yields non-trivial one-step prediction with a best NRMSE of about 0.46, while time-multiplexed operation in a narrow high-transconductance window, combined with simple preprocessing and lagged states, improves the error to NRMSE ≤ 0.10.
From a growth and device-engineering perspective, these results indicate that once a basic target in mobility and ON/OFF ratio is reached, further improvements in traditional FET figures of merit alone are unlikely to translate into proportional gains in reservoir performance. Instead, the shape and time scales of the hysteresis window, set by the trap spectrum in the MoS₂/dielectric stack, become the key control parameters: the most useful operating points are those where $V_{GS}$ biases the device on the steep part of the transfer characteristic and where fading memory naturally spans the 1–4 s range probed by the time-multiplexed reservoir. In this sense, the Lorenz-63 task functions as a quantitative probe of trap-mediated dynamics rather than as an ultimate benchmark of prediction accuracy.
Device-to-device variability was quantified at both the DC and the task level. Across the fabrication run, the hysteresis area spans nearly three orders of magnitude, indicating substantial dispersion in trap-mediated memory characteristics. For reservoir computing, however, the 17 measured devices showed NRMSE values within the narrow range of 0.58 to 0.61 despite this broad hysteresis distribution. The absolute NRMSE is comparatively high in this multi-device dataset because time multiplexing was not performed for every device within the available measurement time.
With respect to CMOS compatibility, the present devices still operate at voltages well above typical logic levels because of the global back-gate geometry and thick dielectric. However, the material set and thermal budget are compatible with back-end-of-line integration, and the same CVD process can be combined with patterned top gates and thin high-k or ferroelectric dielectrics to reduce the required gate voltage by one to two orders of magnitude while preserving or enhancing the usable hysteresis. A practical path toward CMOS-compatible architectures is to reduce the operating voltages and increase reservoir dimensionality through device and array engineering rather than through complex mixed-signal circuitry. First, replacing the global back gate with a local top gate (thin high-k dielectric and short gate-to-channel spacing) directly increases the gate capacitance and enables voltage scaling, so that the same input mapping can be implemented with gate swings in the few-volt range. Second, geometrical scaling of the channel (shorter L and optimized channel width) can reduce the required bias and improve transconductance at lower voltages. Third, array-style space multiplexing provides a scalable route to richer reservoir states under realistic measurement time: multiple memtransistors fabricated on the same chip can be driven by a common waveform and read out as parallel nodes, converting device-to-device variability into state diversity and improving prediction accuracy without time multiplexing.
Together, voltage-scaled top-gated devices, modest geometric scaling, and space-multiplexed arrays define an experimentally accessible roadmap from the present back-gated proof of concept toward CMOS-constrained, wafer-scalable reservoir hardware. The open RC protocol and Lorenz-63 benchmark used here provide a reproducible framework against which such materials and device modifications can be evaluated in terms of both conventional transistor metrics and application-level reservoir computing performance.