1. Introduction
The ability to accurately estimate grape yield is important because it allows viticulturists to plan, increase profitability, and improve the quality of the grapes produced. Yield estimation allows viticulturists to implement precision agriculture techniques including crop thinning, variable rate applications, and selective harvesting [1]. Traditionally, yield is estimated using manual processes such as visual inspection and cutting and weighing grapes within a section of the vineyard [2]. However, these manual processes can be time consuming, and the generally low number of samples taken can lead to inaccurate estimations. There is a need for an automated technique to accurately estimate grape yield.
Computer vision techniques have therefore been developed for automatically counting the number of grapes visible in camera images, and high accuracy has been reported [3]. However, one limitation is that these techniques rely on being able to see the grapes. Errors in grape yield estimation occur where grapes are occluded by leaves or other grape bunches [4]. This has been addressed by assuming that a certain percentage of grapes are occluded and compensating using a scaling factor [5]. However, this is not ideal and can lead to errors. Another approach is to remove the leaves that could cause occlusions [6,7]. However, this can be laborious unless specialised machinery is available. In addition, we understand that there are grape varieties, such as Gewürztraminer, where foliage is normally not removed. Occlusion is perhaps the most significant unsolved issue for yield estimation using computer vision.
One solution that has been suggested to address the issue of occlusion is microwave-based yield estimation [8]. High-frequency radio waves are able to propagate through foliage and reflect off the grape clusters behind. However, these devices are expensive and are not close to commercial implementation. In this paper, we explore an alternative technology that has not previously been investigated for this purpose, ultrasound, for imaging through leaves and detecting occluded grape bunches.
Within the field of precision viticulture, several studies have used ultrasound to map the shape of the outer leaf canopy for improved vineyard management. Gil et al. used three ultrasonic sensors to measure the distance to the vine foliage from spray nozzles positioned at different heights [9]. These transducers were positioned vertically in a line (tens of cm apart) and were operated independently, not as an array. The closest distance reported by each sensor was used in real time to control the application flow rate from the nozzles. The benefit of this approach was verified by Llorens et al., who established that an average saving of 58% in application volume was obtainable [10]. In addition to variable rate application, independent scans taken over the growing season have been reported to have the potential to be an effective approach to monitoring vine vigour [11]. However, the effectiveness of these studies was limited by their use of ultrasonic transducers that operated independently, rather than as arrays, to measure the distance to the outer surface of the foliage. These individual transducers had a relatively wide beamwidth, and generally the only information used from the reflected signal was the time of the first echo from the foliage [11]. This results in low-resolution imaging of the outer canopy of the grapevine and can overestimate the canopy volume because of a few outer leaves sticking out [12]. Further work by Llorens et al. compared the same ultrasonic canopy measurements with those from a colocated 2D Light Detection and Ranging (LIDAR) scanner, a common alternative approach [13]. They found that the precise directionality of the laser distance measurements resulted in a significant improvement in canopy surface estimation, albeit at the cost of a more complicated postprocessing procedure [14]. This highlights the utility that ultrasonic sensors with a narrower beamwidth may offer.
Recent work by Palleja et al. utilised four ultrasonic transducers to generate a volumetric estimation of a vine canopy using the signal envelope of multiple echoes [15,16]. In a similar manner to Gil et al. [9], these transducers were arranged vertically in a line, spaced 45 cm apart, and operated independently rather than as an array. However, the transducers employed had a wide beam pattern and therefore poor imaging resolution. In a busy scene, an independent ultrasonic transducer will be sensitive to multiple echoes from objects in a wide field of view. This is beneficial for applications such as a car reversing system, where only the distance to the closest object matters. However, for an imaging system intended to image through leaves, a single ultrasonic transducer will result in poor angular resolution. This makes it hard to detect structure behind the closest leaf; see Figure 1a. Traditionally, one might increase the directionality of ultrasonic transmission by using transducers that operate at high ultrasonic frequencies (several hundred kHz). However, we anticipate that this would come at the expense of reduced penetration through foliage and increased attenuation. These difficulties may explain why no previous studies have been found in the literature that have used ultrasound to image fruit occluded by leaves.
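The frequency and aperture trade-off described above follows from diffraction: for a baffled circular piston of diameter D, the half-angle to the first null of the beam satisfies sin θ = 1.22 λ/D, so a narrower beam requires either a shorter wavelength or a larger aperture. The short Python sketch below evaluates this textbook approximation. The speed of sound (343 m/s), the 10 mm single-transducer diameter, and the 200 mm array aperture are illustrative assumptions rather than values from this work, and treating an array as a filled piston of the same diameter is only a first-order approximation.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 °C (assumed)

def wavelength_mm(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Acoustic wavelength in millimetres."""
    return 1000.0 * c / freq_hz

def first_null_half_angle_deg(freq_hz: float, aperture_m: float,
                              c: float = SPEED_OF_SOUND) -> float:
    """Half-angle to the first null of a baffled circular piston: sin(theta) = 1.22 * lambda / D."""
    ratio = 1.22 * (c / freq_hz) / aperture_m
    if ratio >= 1.0:
        return 90.0  # aperture too small for a null: the beam is effectively very wide
    return math.degrees(math.asin(ratio))

# Hypothetical apertures: a ~10 mm single transducer versus a ~200 mm array
for f in (40e3, 300e3):
    print(f"{f / 1e3:.0f} kHz: lambda = {wavelength_mm(f):.2f} mm, "
          f"10 mm piston: {first_null_half_angle_deg(f, 0.010):.1f} deg, "
          f"200 mm aperture: {first_null_half_angle_deg(f, 0.200):.2f} deg")
```

At 40 kHz a transducer about one wavelength across is barely directional, whereas a large aperture narrows the beam without raising the frequency, which is the motivation for the array approach that follows.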
Arrays of ultrasonic transducers can be used to increase angular resolution [17,18]. Figure 1b shows how an array of ultrasonic transducers can achieve a higher angular resolution than a single transducer. This significantly improves the potential for imaging structure behind the outer leaves. However, no previous study has been found that has used ultrasonic arrays to image any type of foliage, apart from the authors’ work with pasture [19,20].
In this study, we present the first work in which an ultrasonic array has been used to image grapes and foliage. To achieve an adequate angular resolution at lower ultrasonic frequencies (<60 kHz), we have utilised a novel air-coupled ultrasonic array developed by the authors [19,20]. Another issue with using low ultrasonic frequencies is the low depth resolution due to the large wavelengths and ringing of the transducers [21]. This has been addressed in this work using coded waveforms, cross-correlation, and operating away from the transducers’ resonant frequency. Ultrasonic arrays and coded waveforms have not been used before in precision viticulture.
The high spatial and depth resolution of the array allowed the echoes from grapes and leaves to be separated. However, the ultrasonic echoes from leaves and grapes appeared to be identical. This was addressed by making multiple ultrasonic measurements at the same location while lightly agitating the leaves with a fan directed at the measurement area. Since the leaves moved while the heavier grapes remained stationary, the mean and variance of the ultrasonic measurements could be used to identify the grape bunch.
Initially, imaging was performed with the array focused in the far-field. Work was then undertaken to investigate the improvement in imaging resolution using near-field focusing of the array. This included a novel technique to compensate the cross-correlation for near-field defocusing of the transmitted signal. The improved spatial resolution in the resulting volumetric scans will benefit precision viticulture management processes, such as variable rate applications, where an accurate understanding of the vine canopy is vital.
This paper makes the following significant contributions to knowledge. It is the first work to use an air-coupled ultrasonic phased array and coded waveforms for the purpose of analysing vine canopies. It is also the first study to investigate whether ultrasound can be used to image through leaves, to detect fruit located behind leaves, and to differentiate echoes that come from leaves through agitation. In addition, we present a new technique for improving the resolution of array-based cross-correlation for near-field echoes. This approach simulates, in postprocessing, the effect of focusing the array’s transmission at any desired depth. This eliminates the need for the complex electronics otherwise required to focus the array’s transmission at a desired scan depth. Some preliminary results of this work were presented in the conference paper [22].
The paper is organised as follows. Section 2 introduces the ultrasonic array hardware and measurement parameters used in this work. The experimental setup and measurement procedure are described in Section 3. The signal processing applied to the array data for imaging grapes is then presented in Section 4. Section 5 and Section 6 provide results for the array focused in the far-field and near-field, respectively. Finally, in Section 7, we end the paper with some final points and a discussion of future directions.
2. Ultrasonic Array
Figure 2 shows the ultrasonic array that has been used in this work. It was custom designed and built by the authors for precision agriculture requirements. A full description of this array is given in reference [19]. It has optimised spiral arrays of 160 ultrasonic transducers and 204 microphones, which are arranged into rings. The transducers are surface mounted to the front of the array PCB. In contrast, the MEMS microphones (which can operate at ultrasonic frequencies) are surface mounted to the back of the PCB, with holes passing through the PCB to allow the acoustic signal to be measured. The radii of the transducer and microphone rings are given in Table 1.
The microphone array had 12 independent rings of microphones. All the microphones in a ring were connected in parallel, and each ring was captured by one of the 12 simultaneous-sampling Analogue to Digital Converter (ADC) channels of a Data Translation DT9836 module [23] (refer to Figure 12b in reference [19]). A sampling rate of 225 kHz and a resolution of 16 bits were used. Note that since all 12 microphone ring channels were saved to file, it was possible to dynamically change the focus distance of the reception in postprocessing using beamforming.
The transducers used in the array were surface mount air-coupled transducers with a resonance frequency of 40 kHz and a frequency response that dropped from this peak by about 20 dB at 25 kHz and 60 kHz on either side. The measured frequency response can be seen in Figure 3. Although the transmission gain is highest around 40 kHz, the transducers have a tendency to ring at this resonance frequency, which is undesirable if cross-correlation is being used to improve depth resolution. We therefore operate them at frequency ranges on either side of the resonant peak (e.g., 20–35 kHz and 45–60 kHz). The transducers were arranged in 10 rings. The DT9836 board’s two Digital to Analogue Converter (DAC) channels were used to drive the 10 rings (half of the rings for each DAC channel) through two power amplifiers (refer to Figure 12a in reference [19]). These had an output sampling rate of 500 kHz and a resolution of 16 bits and were synchronised with the ADC channels. Data acquisition software was written in MATLAB to transmit the signal and capture the resulting echoes using the DT9836 board.
The same excitation signal (a linear chirp) was applied to all the transducers. Since the array was planar (on a flat PCB), the transmission was therefore effectively far-field beamformed, that is, focused at a point at infinity in front of the array. Near-field focusing was not possible for transmission since we did not have a separate DAC channel controlling each transducer ring.
Figure 4 shows the measured combined transmit/receive beam pattern of the array when the array is focused at infinity. This shows a full beamwidth of 3.3° and a dynamic range of up to 33 dB. Please refer to reference [19] for details on how this beam pattern was obtained. The array had a dead zone of about 500 mm, within which the signal measured by the receiver channels was dominated by the vibrations caused by the ultrasonic transmission. Objects closer than this were hard to detect.
Air-coupled transducers generally achieve a high gain at the expense of ringing at their resonant frequency. As a result, digital codes with sharp transitions, such as Barker codes or Maximum Length Sequences (MLS), can cause ringing and may not be reproduced correctly by the transducers. In contrast, the lack of sharp temporal transitions in a chirp waveform means that it is less prone to exciting ringing of the transducer than some other waveforms.
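The transmit linear chirp signal applied to the transducers can be described by

$$ s(t_n) = w(t_n)\,\sin\!\left(2\pi\left(f_0\,t_n + \frac{B}{2T}\,t_n^{2}\right)\right), $$

where $t_n$ is the time of the $n$th transmit sample, $f_0$ is the start frequency at $t_n = 0$, $B$ is the bandwidth, $T$ is the pulse duration, and $w(t_n)$ is a Hamming window [24].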
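As an illustration of how the Hamming-windowed linear chirp described above supports pulse compression, the Python/NumPy sketch below generates the chirp with the parameters chosen in the next paragraph (45 to 60 kHz over 1.5 ms), embeds a delayed copy in noise, and recovers the range from the peak of the cross-correlation, i.e., a matched filter. The 225 kHz receive rate matches the ADC; the 343 m/s speed of sound, the 1 m reflector, and the noise level are assumptions. The nominal depth resolution after compression uses the common rule of thumb c/(2B).

```python
import numpy as np

C = 343.0                             # assumed speed of sound in air (m/s)
F0, BW, DUR = 45e3, 15e3, 1.5e-3      # start frequency, bandwidth, duration (from the text)
FS_RX = 225e3                         # ADC sampling rate (from the text)

def linear_chirp(fs: float, f0: float = F0, bw: float = BW, dur: float = DUR) -> np.ndarray:
    """Hamming-windowed linear chirp: w(t) * sin(2*pi*(f0*t + bw/(2*dur)*t**2))."""
    t = np.arange(int(round(fs * dur))) / fs
    return np.hamming(t.size) * np.sin(2 * np.pi * (f0 * t + bw / (2 * dur) * t**2))

def matched_filter(rx: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Cross-correlate the recording with the reference; output index k = echo starting at sample k."""
    return np.correlate(rx, ref, mode="valid")

ref = linear_chirp(FS_RX)
true_range_m = 1.0                                   # hypothetical reflector distance
delay = int(round(2 * true_range_m / C * FS_RX))     # round-trip delay in samples
rng = np.random.default_rng(0)
rx = 0.05 * rng.standard_normal(delay + ref.size + 200)
rx[delay:delay + ref.size] += 0.2 * ref              # attenuated echo buried in noise

xc = matched_filter(rx, ref)
est_range_m = np.argmax(np.abs(xc)) / FS_RX * C / 2
depth_res_mm = 1000 * C / (2 * BW)
print(f"estimated range {est_range_m:.4f} m, nominal depth resolution {depth_res_mm:.1f} mm")
```

For a 15 kHz sweep the rule of thumb gives roughly 11 mm, far finer than the millisecond-long pulse itself would allow without compression.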
A chirp excitation signal was chosen because it can be used with cross-correlation to improve the depth resolution. After testing, a linear chirp with a duration of 1.5 ms and a frequency range of 45 to 60 kHz (a bandwidth of 15 kHz) was chosen. This transmitted signal was verified through independent recording using a calibrated microphone (GRAS 46BF-1 1/4 inch). The time and frequency domain representations of the signal can be seen in Figure 5; the small time delay is due to the separation between the transducer and microphone. The 1.5 ms chirp appeared to provide improved cross-correlation resolution compared with a shorter chirp. The frequency range was chosen based on the frequency response of the transducers, which have a usable frequency range between 25 kHz and 60 kHz and a resonant frequency of 40 kHz; see Figure 3. To avoid ringing at this resonant frequency, the chirp used a frequency range from 45 to 60 kHz. It was also felt that this frequency range gave slightly better depth resolution than the 25–35 kHz range due to the smaller wavelength.

The attenuation experienced by the ultrasound as it travels through the air can be calculated using the atmospheric absorption model given by International Standard ISO 9613-1:1993 [25]. It can be shown that at standard atmospheric pressure, a temperature of 20 °C, and 50% relative humidity, this is about 1.98 dB/m at 60 kHz. For practical operation in a vineyard, the width of the rows limits the operating distance to about 1 m. Over this distance, the attenuation is relatively small (about 2 dB each way).
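The 1.98 dB/m figure quoted above can be reproduced with the pure-tone absorption formula of ISO 9613-1. The Python sketch below is our own transcription of that formula (pressure in kPa, temperature in kelvin, relative humidity converted to a molar water-vapour concentration); it is intended as a check on the arithmetic rather than a validated implementation of the standard, so values should be compared against the standard's tables before reuse.

```python
import math

def iso9613_alpha_db_per_m(f_hz: float, temp_c: float = 20.0, rh_percent: float = 50.0,
                           pa_kpa: float = 101.325) -> float:
    """Pure-tone atmospheric absorption coefficient (dB/m), transcribed from ISO 9613-1."""
    pr = 101.325        # reference pressure, kPa
    t0 = 293.15         # reference temperature, K
    t01 = 273.16        # triple-point isotherm temperature, K
    t = temp_c + 273.15
    # Molar concentration of water vapour h (%) from relative humidity
    c_exp = -6.8346 * (t01 / t) ** 1.261 + 4.6151
    h = rh_percent * (10.0 ** c_exp) * (pr / pa_kpa)
    # Relaxation frequencies of oxygen and nitrogen (Hz)
    fr_o = (pa_kpa / pr) * (24.0 + 4.04e4 * h * (0.02 + h) / (0.391 + h))
    fr_n = (pa_kpa / pr) * (t / t0) ** -0.5 * (
        9.0 + 280.0 * h * math.exp(-4.170 * ((t / t0) ** (-1.0 / 3.0) - 1.0)))
    return 8.686 * f_hz**2 * (
        1.84e-11 * (pa_kpa / pr) ** -1 * (t / t0) ** 0.5
        + (t / t0) ** -2.5 * (
            0.01275 * math.exp(-2239.1 / t) / (fr_o + f_hz**2 / fr_o)
            + 0.1068 * math.exp(-3352.0 / t) / (fr_n + f_hz**2 / fr_n)
        )
    )

alpha = iso9613_alpha_db_per_m(60e3)
print(f"alpha(60 kHz) = {alpha:.2f} dB/m; round trip over 1 m = {2 * alpha:.1f} dB")
```

At these conditions the oxygen relaxation term dominates, which is why absorption rises steeply with frequency and favours staying below about 60 kHz.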
3. Experimental Set-Up and Procedure
The ability of the array to detect grapes was evaluated using a 2D Computer Numerical Control (CNC) gantry system. This CNC had a range of motion of 1.4 × 1.4 m and a resolution of 0.025 mm. Ideally, the array would have been mounted to the CNC machine. However, the array was originally designed for operation from a farm vehicle and was too heavy in its current mounting. Instead, a grape vine was mounted directly to the CNC machine. The grapes were fixed to 3 mm rods to reduce their movement while the CNC was moving and to minimise reflections from this support. The vine was mounted to a bamboo pole, and its roots were enclosed in a plastic bag with most of the soil removed to reduce weight. Refer to Figure 6 for a photo of the setup.
Initially, acoustic foam had been placed behind the CNC to dampen echoes from the wall behind it, as shown in Figure 6. However, subsequent measurements were made with the foam removed, and no noticeable difference in measurement performance was observed. This was expected, as the array has a highly directional beam pattern and is therefore very insensitive to reflections from outside its field of view, as shown in Figure 4. This suggests that such precautions would not be necessary in a field environment.
The experimental setup and measurement scan volume are illustrated in Figure 7. The ultrasonic array was positioned facing the CNC machine at a distance of 1100 mm in front of the grapes. Measurements were made over a 460 × 400 mm grid with a spacing between ultrasonic measurements of 20 mm in the x axis and 50 mm in the y axis. This gave 216 measurement points. At each point, the CNC was paused for 3 s to allow the vine and grapes to stop moving before the ultrasonic measurements were made. This measurement procedure was repeated for each of the types of scan described below.
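The 216 points follow from the grid dimensions: 460/20 + 1 = 24 positions in x and 400/50 + 1 = 9 positions in y. The Python sketch below enumerates such a grid. The serpentine visiting order and the grid origin at (0, 0) are assumptions, since the text does not state the raster order, and the reported settling time is only a lower bound on the scan duration because motion and acquisition time are excluded.

```python
import numpy as np

X_SPAN_MM, Y_SPAN_MM = 460, 400      # grid extent (from the text)
DX_MM, DY_MM = 20, 50                # measurement spacing (from the text)
PAUSE_S = 3.0                        # settling pause before each measurement (from the text)

x_mm = np.arange(0, X_SPAN_MM + 1, DX_MM)    # 0, 20, ..., 460 -> 24 positions
y_mm = np.arange(0, Y_SPAN_MM + 1, DY_MM)    # 0, 50, ..., 400 -> 9 positions

# Serpentine raster (assumed): reverse the x direction on alternate rows to avoid long return moves
positions = [(int(x), int(y))
             for row, y in enumerate(y_mm)
             for x in (x_mm if row % 2 == 0 else x_mm[::-1])]

min_pause_time_min = len(positions) * PAUSE_S / 60
print(f"{len(x_mm)} x {len(y_mm)} = {len(positions)} points; "
      f"at least {min_pause_time_min:.1f} min of settling pauses per scan")
```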
It was anticipated that it would be challenging to differentiate echoes from leaves from those of grape bunches. To address this, ultrasonic measurements were also made with a fan lightly agitating the leaves while the heavier grape bunches remained stationary. In the field, this agitation could be achieved using a fan or possibly even naturally occurring wind.
The following sets of ultrasonic measurements were therefore made: (a) grapes only, with no vine present; (b) both grapes and vine, with no fan; and (c) grapes and vine, with the fan operating. The fan was pointed in front of the array and used to lightly agitate the vine leaves. Using a handheld anemometer, the wind speed at the location of the vine foliage was measured to be 2.5 m/s. More work is needed to investigate the relationship between air speed and the resulting agitation performance.
Measurement of Grape Volume Using Photogrammetry
In Section 5 and Section 6, the ultrasonic measurements are processed to provide an estimate of the volume of the grapes. To provide a ground-truth comparison, the volume of the grapes needed to be measured using an alternative technique. A photogrammetry process was therefore used to construct an accurate 3D scan of the grape cluster. This was achieved by using Agisoft Metashape Professional v1.5.2 to process 30 images, captured with a Sony A6300 camera, covering the grape cluster from all sides. The resulting scan can be seen, together with a convex hull approximation, in Figure 8.
We have used a convex hull because it offers a representation closer in likeness to the results of the acoustic scan, in that the concave details of the individual grapes are removed. The convex hull was computed using the convex hull tool in MeshLab 2020.07. The volumes of the 3D scan and convex hull are given in Table 2.
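To illustrate why a convex hull is a closer match to an acoustic scan than the detailed photogrammetry mesh, the sketch below uses SciPy's `scipy.spatial.ConvexHull` (a Qhull wrapper whose `volume` attribute gives the enclosed volume) as a stand-in for the MeshLab tool. The four touching 16 mm "berries" are synthetic; the point is that the hull fills the concave gaps between berries, so its volume exceeds the summed berry volume.

```python
import numpy as np
from scipy.spatial import ConvexHull

def hull_volume(points: np.ndarray) -> float:
    """Volume enclosed by the convex hull of an (N, 3) point cloud, in the points' units cubed."""
    return float(ConvexHull(points).volume)

def sphere_surface_points(centre, radius, n, rng):
    """n random points on the surface of a sphere (a crude synthetic 'grape')."""
    v = rng.standard_normal((n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return np.asarray(centre, dtype=float) + radius * v

rng = np.random.default_rng(1)
radius_mm = 8.0
centres = [(0, 0, 0), (16, 0, 0), (8, 14, 0), (8, 5, 13)]   # four roughly touching berries
cloud = np.vstack([sphere_surface_points(c, radius_mm, 400, rng) for c in centres])

berries_mm3 = len(centres) * 4.0 / 3.0 * np.pi * radius_mm**3
hull_mm3 = hull_volume(cloud)
print(f"summed berry volume {berries_mm3 / 1000:.1f} cm^3, convex hull {hull_mm3 / 1000:.1f} cm^3")
```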
6. Results for Near-Field Focusing of the Array
The results shown thus far present a potential process for the identification of grape clusters in the presence of foliage. However, while promising, they suggest that more resolution and detail of the canopy could be obtained if the acoustic array had a narrower beamwidth. Although the inherent far-field beamwidth of the array is very narrow, the beam still diverges with a full angle of roughly 3.3 degrees. At a distance of 1 m, this equates to a circular cross section around 58 mm in diameter, making it difficult to distinguish between tightly packed objects. Decreasing this beamwidth further would improve the array’s ability to reject sound reflected from nearby objects. Near-field focusing of the array could help improve imaging resolution and hence provide a more accurate representation of the scene, resulting in a better understanding of the true canopy volume.
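The 58 mm figure follows from simple cone geometry, d = 2r tan(θ/2) with θ = 3.3°. The short Python check below assumes the beam spreads from a point at the array face, ignoring the finite aperture, so it is only indicative at short ranges. The ranges are the dead-zone edge, the simulated focus distance, 1 m, and the laboratory stand-off used in this work.

```python
import math

def beam_diameter_mm(full_beamwidth_deg: float, range_mm: float) -> float:
    """Diameter of a cone with the given full opening angle at the given range (point apex assumed)."""
    return 2.0 * range_mm * math.tan(math.radians(full_beamwidth_deg) / 2.0)

for r_mm in (500, 700, 1000, 1100):   # dead-zone edge, simulated focus, 1 m, lab stand-off
    print(f"{r_mm} mm -> {beam_diameter_mm(3.3, r_mm):.1f} mm")
```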
As discussed in Section 4.1, we can achieve near-field focusing of the array using beamforming of the microphone/receiver signals (RX beamforming). With this approach, the microphone receiver array can be focused at a particular distance from the centre of the array, increasing sensitivity at that point and reducing sensitivity to surrounding points. It also minimises distortion of the signal, which improves cross-correlation performance. This focusing can be achieved by calculating the phase difference of arrival at each microphone of a sound wave reflected off an object at the focus distance. A corresponding phase shift is then applied to each microphone channel’s recording. The simulated beam patterns shown in Figure 14 indicate that focusing the array in this way could improve the angular resolution substantially. These beam patterns were generated by simulating the sound propagation from each transducer in the array to a reflector situated at a perpendicular distance of 700 mm from the face of the array. The received signal after processing is compared with the transmitted signal using the maximum cross-correlation, as discussed in Section 4.2. The resulting maximum correlation for each x position is shown in power form, normalised to 0 dB. The simulation shows a significant reduction in −3 dB beamwidth, from 44 mm to 17 mm, and reduced sidelobes when the correct focus obtained with near-field beamforming is used.
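A minimal sketch of near-field RX focusing for an on-axis focal point is given below in Python/NumPy. For each microphone ring of radius r it computes the extra travel time from a reflector at depth z, (√(r² + z²) − z)/c, and removes it with a frequency-domain phase ramp before summing the 12 ring channels (delay-and-sum). The ring radii are placeholders rather than the Table 1 values, the 52 kHz tone burst stands in for a real echo, and off-axis focal points and ring weighting are omitted; it illustrates the principle described above, not the exact processing of Section 4.1.

```python
import numpy as np

C = 343.0     # assumed speed of sound (m/s)
FS = 225e3    # ADC sampling rate (from the text)

def focus_delays(ring_radii_m: np.ndarray, focus_z_m: float, c: float = C) -> np.ndarray:
    """Extra travel time from an on-axis point at depth z to each ring, relative to the centre."""
    return (np.sqrt(ring_radii_m**2 + focus_z_m**2) - focus_z_m) / c

def rx_beamform(channels: np.ndarray, delays_s: np.ndarray, fs: float = FS) -> np.ndarray:
    """Delay-and-sum: advance each ring channel by its delay (FFT phase ramp), then sum."""
    n = channels.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(channels, axis=1)
    # exp(+j*2*pi*f*tau) advances a signal by tau seconds (circularly, so keep echoes away from the edges)
    advance = np.exp(2j * np.pi * freqs[None, :] * delays_s[:, None])
    return np.fft.irfft(spectra * advance, n=n, axis=1).sum(axis=0)

# Synthetic test: an echo from a reflector 0.7 m away on the array axis
t = np.arange(int(4e-3 * FS)) / FS
radii = np.linspace(0.01, 0.12, 12)          # hypothetical ring radii (m), not the Table 1 values
tau = focus_delays(radii, 0.7)

def echo(extra_delay_s):
    """Gaussian-windowed 52 kHz burst centred at 2 ms plus the given extra delay."""
    u = t - 2e-3 - extra_delay_s
    return np.exp(-(u / 1e-4) ** 2) * np.sin(2 * np.pi * 52e3 * u)

channels = np.stack([echo(d) for d in tau])  # outer rings hear the echo slightly later
focused = rx_beamform(channels, tau)
unfocused = channels.sum(axis=0)
print(f"peak |focused| = {np.abs(focused).max():.2f}, peak |unfocused| = {np.abs(unfocused).max():.2f}")
```

Because all 12 ring channels are stored, the same recording can be refocused at any depth simply by recomputing `tau`, which is what makes RX focusing attractive in postprocessing.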
We could extend this process further by applying the same technique to the transmitted signal. Because the transducers also have a significant spatial separation, a waveform transmitted synchronously from each ring of transducers will reach a particular focus distance at slightly different times. This will cause the apparent signal at that point to become distorted. Traditionally, beamforming of the transmission signal (TX beamforming) would be performed before transmission to compensate for these delays and ensure that the signal reaches its target distance undistorted. The simulated result of performing this TX beamforming is shown in Figure 14. It shows a marked improvement in −3 dB beamwidth, from 44 mm to 16 mm, when using both RX and TX beamforming. Furthermore, sidelobe suppression is significantly improved, showing a 13 dB improvement over using RX beamforming alone.
RX beamforming provides a significant improvement to the array’s performance and can be applied to a single recording for all distances in postprocessing. Unfortunately, traditional TX beamforming requires multiple transmissions to cover the entire depth range, which makes it largely impractical for our situation. These separate transmissions would take a considerable amount of time to perform and would increase the complexity of deploying a real-time system for use in vineyards. Furthermore, TX beamforming removes the ability to evaluate different focus distances after measurements are conducted. In some situations, it may be beneficial to change the z-axis resolution to get a more detailed view of the scene. Additionally, TX focusing requires additional hardware in the form of an independent DAC and power amplifier for each transmission ring.
To work around these limitations, in Section 4.3 we introduced a novel technique to compensate the cross-correlation for the distortion that affects a transmitted signal when it is not correctly focused. This has the benefit of being computed after capture during postprocessing, in conjunction with RX beamforming, allowing for optimal results at all distances with a single scan. Equation (9) from Section 4.2 describes how the distorted signal at a desired depth can be calculated and then used to enhance cross-correlation performance. The simulated performance of this technique can be seen in Figure 14. The process results in a substantial improvement over using RX beamforming alone: sidelobes see a further 4 dB of suppression, and the −3 dB beamwidth is reduced from 18 to 16 mm. These improvements translate to finer resolution in the 3D volumetric scans of grape vines. The narrower beamwidth should allow more detail of the vines to be captured, and the reduced sidelobes will reduce susceptibility to multipath interference from nearby foliage and other reflectors.
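Equation (9) is not reproduced in this section, so the Python sketch below is only one plausible reading of the idea. For an on-axis depth z, the field that reaches the reflector from synchronously driven transmitter rings is modelled as a sum of chirp copies, each delayed by that ring's extra path (√(R² + z²) − z)/c, and this distorted waveform, rather than the ideal chirp, is used as the cross-correlation reference for depth z. The transmitter ring radii are placeholders, ring weights are uniform, and the chirp parameters are those given in Section 2. The final check shows that a distorted echo matches its depth-specific reference perfectly but the ideal chirp less well.

```python
import numpy as np

C, FS = 343.0, 225e3   # assumed speed of sound (m/s); ADC sampling rate (Hz)

def linear_chirp(fs=FS, f0=45e3, bw=15e3, dur=1.5e-3):
    """Hamming-windowed linear chirp, 45 to 60 kHz over 1.5 ms by default."""
    t = np.arange(int(round(fs * dur))) / fs
    return np.hamming(t.size) * np.sin(2 * np.pi * (f0 * t + bw / (2 * dur) * t**2))

def delayed(x, delay_s, n_out, fs=FS):
    """Zero-pad x to n_out samples and delay it by delay_s seconds using an FFT phase ramp."""
    f = np.fft.rfftfreq(n_out, d=1.0 / fs)
    return np.fft.irfft(np.fft.rfft(x, n=n_out) * np.exp(-2j * np.pi * f * delay_s), n=n_out)

def defocused_reference(tx_ring_radii_m, depth_m, fs=FS, c=C):
    """Hypothetical per-depth reference: chirps from every TX ring summed as they arrive on-axis at depth_m."""
    s = linear_chirp(fs)
    extra = (np.sqrt(np.asarray(tx_ring_radii_m) ** 2 + depth_m**2) - depth_m) / c
    n_out = s.size + int(np.ceil(extra.max() * fs)) + 16   # room for the latest arrival
    return sum(delayed(s, d, n_out, fs) for d in extra)

def peak_normalised_xcorr(rx, ref):
    """Peak |cross-correlation| normalised by the signal energies (1.0 means a perfect match)."""
    return np.abs(np.correlate(rx, ref, mode="full")).max() / (np.linalg.norm(rx) * np.linalg.norm(ref))

tx_radii = np.linspace(0.02, 0.13, 10)        # hypothetical radii of the 10 TX rings (m)
ref_700 = defocused_reference(tx_radii, 0.7)   # reference for a 700 mm focal depth
echo = ref_700.copy()                          # idealised echo from a reflector at 700 mm
ideal = linear_chirp()
print(f"matched reference: {peak_normalised_xcorr(echo, ref_700):.3f}, "
      f"ideal chirp: {peak_normalised_xcorr(echo, ideal):.3f}")
```

In this reading, one reference would be built per focal depth and paired with RX focusing at the same depth, so a single transmission can still be processed at every depth.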
If we repeat the process used to generate an RMS volume, as discussed in Section 4.4, we can compute a comparable volumetric representation using the improved near-field beamforming technique. The resulting RMS volume shown in Figure 15 is presented as an isosurface with the threshold set to 10% of the maximum RMS. A direct comparison can be made with the unfocused scan seen in Figure 12. There is a significant increase in the level of detail in the 3D volume, which is less globular and more defined. Increasing the observable detail of the structure of the vine canopy could lead to improved vineyard management through more precise knowledge of foliage density and crop loading.
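Section 4.4 is not reproduced here, so the following Python sketch is an assumption about one simple way to turn per-location signals into an RMS volume and a volume estimate: a moving RMS along depth at each grid point, followed by counting the voxels at or above 10% of the maximum RMS. Each voxel measures 20 × 50 mm laterally (the CNC spacing) by c/(2fs) in depth. Counting voxels is a stand-in for the volume enclosed by the rendered isosurface, and the block-shaped reflector is synthetic. Because the lateral voxel size is set by the grid spacing, a finer grid would change such estimates.

```python
import numpy as np

def rms_volume(signals: np.ndarray, win: int = 11) -> np.ndarray:
    """Moving RMS along the last (depth) axis of an (nx, ny, n_depth) array of processed signals."""
    kernel = np.ones(win) / win   # odd length keeps mode="same" centred
    power = np.apply_along_axis(lambda s: np.convolve(s**2, kernel, mode="same"), -1, signals)
    return np.sqrt(power)

def thresholded_volume_mm3(rms: np.ndarray, voxel_mm3: float, frac: float = 0.10) -> float:
    """Approximate enclosed volume by counting voxels at or above frac * max(rms)."""
    return float(np.count_nonzero(rms >= frac * rms.max()) * voxel_mm3)

fs, c = 225e3, 343.0                   # ADC rate (from the text); assumed speed of sound
dz_mm = 1000 * c / (2 * fs)            # depth step per sample after round-trip conversion
voxel_mm3 = 20 * 50 * dz_mm            # x and y grid spacing from the text times depth step

sig = np.zeros((24, 9, 1000))          # synthetic scan on the 24 x 9 measurement grid
sig[10:13, 3:5, 400:500] = 1.0         # a block-shaped synthetic reflector
rms = rms_volume(sig)
vol_cm3 = thresholded_volume_mm3(rms, voxel_mm3) / 1000
print(f"voxels above threshold: {np.count_nonzero(rms >= 0.1 * rms.max())}, volume = {vol_cm3:.1f} cm^3")
```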
The reflections from the leaves can be mitigated in the focused scans using the technique described in Section 5.1: a fan was used to agitate the leaves, and filtering was performed using averaging and variance. Figure 16 shows the resulting isosurface plot. Table 4 compares the grape and foliage volumes obtained using the near-field focused techniques.
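Section 5.1 is not included in this section, so the following Python sketch only illustrates the averaging-and-variance principle under assumed thresholds. Repeated non-negative RMS traces recorded at one location while the fan runs are averaged, and a depth sample is kept as "grape" only if its mean echo is strong and its coefficient of variation (standard deviation over mean) is low. The synthetic data contain a stationary echo and a leaf echo whose position jitters between recordings; the 10% mean threshold and the 0.25 variability limit are hypothetical choices.

```python
import numpy as np

def agitation_stats(repeats: np.ndarray):
    """Mean and variance across repeated traces (axis 0) recorded at one scan location."""
    return repeats.mean(axis=0), repeats.var(axis=0)

def grape_mask(mean: np.ndarray, var: np.ndarray,
               mean_frac: float = 0.10, cv_max: float = 0.25) -> np.ndarray:
    """Hypothetical rule: strong average echo AND low relative variability between repeats."""
    strong = mean >= mean_frac * mean.max()
    cv = np.sqrt(var) / np.maximum(mean, 1e-12)   # coefficient of variation
    return strong & (cv <= cv_max)

rng = np.random.default_rng(2)
n_rep, n_samples = 20, 300
traces = 0.01 * np.abs(rng.standard_normal((n_rep, n_samples)))        # background
traces[:, 100:120] += 1.0 + 0.02 * rng.standard_normal((n_rep, 20))     # stationary grape echo
for k in range(n_rep):                                                  # leaf echo that moves
    shift = rng.integers(-15, 16)
    traces[k, 200 + shift:210 + shift] += 1.0

mean, var = agitation_stats(traces)
mask = grape_mask(mean, var)
kept = np.flatnonzero(mask)
print(f"kept samples {kept.min()} to {kept.max()} ({kept.size} samples)")
```

The moving leaf raises the mean only modestly but produces a large variance, whereas the stationary echo keeps a high mean with little spread, which is the contrast the fan is intended to create.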
7. Conclusions and Future Work
This paper presents a novel approach for the detection of grape clusters which are occluded by foliage using an ultrasonic array. It utilises a low frequency ultrasonic chirp transmitted from a highly directional acoustic array. This is the first time that an ultrasonic phased array has been used to analyse canopy structures and the first time ultrasound has been used to visualise grape clusters. The results show that it is possible for low frequency ultrasound to penetrate through leaves and generate echoes from the grapes behind. In addition, the echoes from grapes and leaves can be distinguished by agitating the leaves using a fan and using the variance of multiple recordings as a filter.
We further demonstrate how increased detail in the acoustic volumes can be achieved through near-field focusing of the array’s reception using beamforming, together with a cross-correlation defocusing correction technique. This significantly reduces the beamwidth and increases the directionality of the array. The increased level of detail has a direct benefit for more accurate canopy estimation and, as a result, improved precision viticulture practices.
Improved spatial and depth resolution would also be expected to reduce the overestimation in volume obtained using the ultrasonic measurement. Table 5 compares the percentage overestimation of the ultrasonically measured volumes in Table 3 and Table 4 relative to the volumes of the 3D photogrammetric scan and its convex hull given in Table 2. Here we can see that the use of near-field focusing techniques with averaging reduced the overestimation in grape volume from 222% to 112% relative to the photogrammetry scan, or from 56% to 2.5% relative to the convex hull of this scan. More work is needed to investigate how these results would vary with volume estimation techniques other than the one used in this work, or with a finer CNC measurement grid spacing.
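The two sets of percentages can be cross-checked. If an ultrasonic volume V exceeds the scan volume by o_s and the hull volume by o_h, then V = V_scan(1 + o_s) = V_hull(1 + o_h), so (1 + o_s)/(1 + o_h) must equal the same hull-to-scan ratio from Table 2 for every ultrasonic result. The small Python check below applies this to the figures quoted above; both pairs imply a ratio of about 2.06 to 2.07, which is consistent.

```python
def overestimation_pct(v_measured: float, v_reference: float) -> float:
    """Percentage by which a measured volume exceeds a reference volume."""
    return 100.0 * (v_measured - v_reference) / v_reference

def implied_hull_to_scan_ratio(over_scan_pct: float, over_hull_pct: float) -> float:
    """V_hull / V_scan implied by one ultrasonic volume's two overestimation figures."""
    return (1.0 + over_scan_pct / 100.0) / (1.0 + over_hull_pct / 100.0)

before = implied_hull_to_scan_ratio(222.0, 56.0)   # before near-field focusing with averaging
after = implied_hull_to_scan_ratio(112.0, 2.5)     # after near-field focusing with averaging
print(f"implied hull/scan ratio: {before:.3f} before, {after:.3f} after")
```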
It is worth noting that while it may be possible to determine true volume estimates using the acoustic techniques mentioned in this paper, the presented numerical volumes should only be considered as relative comparisons of the effect of each stage of the process. The establishment of an accurate relationship between acoustic volume and true cluster volume will require further study with a range of different grape clusters and foliage conditions. However, it should be noted that accurate measurement of the occluded grape volume using ultrasound is not necessarily essential. For example, it could potentially provide improved estimates of the proportion of occluded grapes to enhance yield estimates obtained using other methods such as computer vision techniques.
The process presented in this paper represents a significant improvement over current state-of-the-art ultrasonic methods for vine canopy assessment. The increased achievable detail will have a direct benefit for 3D volume estimation of vine canopies, as well as improving the ability to resolve potential grape clusters. These improvements should enable viticulturists to implement advanced precision viticulture techniques such as crop thinning, precise variable rate applications, and selective harvesting. We also anticipate that the techniques used will have applications beyond viticulture in other areas of horticulture.
Future Work
The lab results presented in this paper are promising. However, field trials are needed to investigate how this system performs in a vineyard environment with different grape varieties. The performance of the system needs to be investigated further with more leaves, grapes in closer proximity to the leaves, and occluding objects such as vine stems, trunks, and trellis materials. Solid obstacles such as trunks and trellis would not be disturbed by the agitation. This could be addressed by using a fusion of ultrasound and computer vision to assist in identifying these objects: traditional computer vision techniques could be used to label visible areas of the ultrasonic scan, such as vine stems, trunks, and other solid objects. This could be extended further to develop an unsupervised machine learning process to directly classify regions of the acoustic recordings.
The effect of neighbouring grape clusters also needs to be investigated, since they are likely to appear as a single larger cluster with the current processing and hardware. Scanning from different directions may improve the ability to see behind solid objects or to differentiate grape bunches that would otherwise be hidden by another cluster. Work is also needed to identify how early in the season this ultrasonic technique can be used to identify grape bunches, and to establish the relationship between the acoustic scan output and the true cluster weight.
Near-field focusing required additional processing overhead compared with far-field beamforming. One approach to address this may be to preclassify regions of the signal that contain significant reflected components and only perform beamforming on these regions. This could be assisted by incorporating a 3D depth camera to provide additional information on where processing should be performed. Additionally, as each measurement location is independent of the others, simple parallelization techniques could substantially reduce processing times. Improved resolution could also be achieved by modifying the hardware so that the transmission could be focused in the near-field.
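Because each grid location is processed independently, the per-location pipeline can be mapped over locations concurrently. The Python sketch below uses `concurrent.futures.ThreadPoolExecutor`, whose `map` returns results in input order; `process_location` is a hypothetical stand-in (only a matched filter) for the full beamforming and correlation chain, and the traces are random placeholders. Threads help when the heavy work happens inside NumPy routines that release the GIL; for Python-heavy steps, `ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def process_location(rx: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the per-location pipeline: here only the matched-filter magnitude."""
    return np.abs(np.correlate(rx, ref, mode="valid"))

def process_scan(recordings, ref, workers=8):
    """Grid locations are independent, so map the pipeline over them concurrently (order preserved)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_location, recordings, [ref] * len(recordings)))

rng = np.random.default_rng(3)
ref = rng.standard_normal(64)
recordings = [rng.standard_normal(1024) for _ in range(216)]   # one synthetic trace per grid point
out = process_scan(recordings, ref)
serial = [process_location(rx, ref) for rx in recordings]
print(len(out), all(np.array_equal(a, b) for a, b in zip(out, serial)))
```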
The array used in this study had a very narrow beamwidth and could only image directly in front of the array. It relied on a highly accurate CNC machine to generate the 3D acoustic scans of the grapes. Although beyond the scope of this project, we believe the hardware can be enhanced further to increase its practical use within a vineyard by reducing the scan time and removing the need for the CNC machine. Accurate tracking of the position of the array without a CNC could be achieved using techniques such as a fusion of differential GPS and optical pose estimation.
If the array were redeveloped for large-scale field trials, different transducers would likely be used, which may have a different optimal transmitted signal. Therefore, it would be beneficial to further investigate the effect of the transmitted waveform on the resulting scan and its resilience to sources of interference such as multipath reflections. Furthermore, given the significant physical differences between grape clusters and vine foliage, it may be possible to identify unique absorption or reflection frequencies for each, potentially making it possible to classify them directly from the recorded waveforms.