The modeling of the occurrence of a rainfall dry spell and wet spell (

Rainfall is an intermittent process that is characterized by the alternation of wet and dry statuses. Indeed, a very simple representation of the rain process consists of an alternating sequence of two opposite conditions (dry and wet), each lasting for a given duration. Although some aspects of this intermittent process can only be observed at small timescales (i.e., hourly or smaller), particularly those concerning patterns of maximum-intensity events, the daily timescale allows the fundamental sequence of dry and wet events to be captured, as they are usually driven by the dynamics of large-scale precipitation systems (Bonsal and Lawford, 1999; Osei et al., 2021; Zhang et al., 2015). The daily scale is also quite appealing since precipitation records over several decades are reliably collected at this frequency, a feature seldom shared by subdaily time series.
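The alternating dry/wet representation described above can be illustrated with a short sketch that splits a daily rainfall record into runs of consecutive wet and dry days. This is only an illustration under stated assumptions: the function name `spell_lengths`, the 1.0 mm wet-day threshold, and the sample series are hypothetical and are not taken from this study.

```python
from itertools import groupby


def spell_lengths(daily_rain_mm, wet_threshold_mm=1.0):
    """Return (wet_spells, dry_spells) as lists of run lengths in days.

    Assumption: a day is "wet" when its rainfall depth is at least
    wet_threshold_mm; the threshold value is hypothetical.
    """
    wet_flags = [depth >= wet_threshold_mm for depth in daily_rain_mm]
    wet_spells, dry_spells = [], []
    # groupby yields maximal runs of identical consecutive flags
    for is_wet, run in groupby(wet_flags):
        length = sum(1 for _ in run)
        (wet_spells if is_wet else dry_spells).append(length)
    return wet_spells, dry_spells


# Hypothetical 12-day record (mm/day)
series = [0.0, 3.2, 5.1, 0.0, 0.0, 0.0, 12.4, 0.0, 1.0, 2.0, 7.5, 0.0]
wet, dry = spell_lengths(series)
print(wet)  # wet spell lengths in days
print(dry)  # dry spell lengths in days
```

Note that the first and last runs of a record are cut off by the observation window, so in practice one may prefer to discard them before fitting a distribution.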

Many approaches have been proposed in the scientific literature to model rainfall intermittence, including Poisson clusters, multifractals, power spectral analyses, Markov chains, and geostatistics (Dey, 2023; Hershfield, 1970; Schleiss and Smith, 2016). At the local scale, a classical approach to address intermittency in rainfall records is to statistically analyze the sequences of rainy days, called wet spells and denoted as

In his pioneering study, Chatfield (1966) analyzed a short series of daily rainfall recorded at a single station in Kew (London) and investigated the ratios between observed frequencies of increasing values of

Recently, Agnese et al. (2014) suggested modeling both

Agnese et al. (2014) showed that both the

Although the modeling of the

The objective of this paper is to verify to what extent the assumption of i.i.d. on renewal times

The HLZ (also known as Lerch) family is a set of discrete probability distributions, whose probability mass function (

Lerch family of probability distributions with the corresponding parameter domains, together with the theoretical means (arithmetic, AM, geometric, GM, and/or harmonic, HM) that match (x) or not (–) with the empirical ones according to the MLE method.

This distribution makes it possible to account for some peculiar characters observed in the

To estimate the parameters of the Lerch distribution, the maximum likelihood estimation (MLE) method was applied. For the general case (three-parameter, Eq. 1), the analytical solution of the MLE method returns arithmetic, geometric, and harmonic expectations equal to the corresponding sample values. If

While the general three-parameter form (ID

To assess the adequacy of the Lerch family distribution in reproducing the observed frequencies, we employed a simulated

We simulated 3000 replicates of the sample by random sampling from the inferred theoretical distribution. The associated

Let a time series of rainfall data be defined as

The interarrival time series

Examples of interarrival times (

From a hydrological point of view, the above-described

According to previous studies (Agnese et al., 2014; Baiamonte et al., 2019), a direct inference on the

According to Eq. (6), the

An alternative, and more general, formulation of

Using similar arguments but with

Equations (7), (10), (11), (12), and (13) show that in the DM the probability distributions of the wet and dry spells and of the wet and dry chains can be directly recovered from the

The IM is based on modeling individually the random variables

Once both the

As in the DM, the probability distribution of

In this paper, to obtain the series of rainfall time variables (

Locations of the six considered stations, together with latitudes and elevations above sea level (© QGIS 2023).

Due to the relevance of rainfall regimes for the distribution of rainy and non-rainy days, in addition to the analysis of the entire year (

The analyzed stations are characterized by different rainfall regimes, as shown in Fig. 3 by the average number of rainy days in each month (panel a) and by the fraction of the yearly-average rainy days falling in each month (panel b), which standardizes the data in panel a. The stations of Trapani (TRA) and Floresta (FLO) represent the typical Mediterranean climate, with a strong seasonality in the rainfall regime (see Fig. 3b) and precipitation concentrated in the S2 season. The two stations differ in the total amount of average annual rainfall, which is low for TRA (420 mm) and high for FLO (1133 mm), a difference that is also revealed by the different number of rainy days per month (Fig. 3a). Torino (TOR) and Ceva (CEV) are characterized by a mid-latitude sublittoranean climate with a high rain frequency in spring (Fig. 3b). CEV also exhibits a secondary peak in autumn, mainly due to the influence of the Tyrrhenian Sea warming in summer. Despite this difference, the two stations are characterized by a similar total annual rainfall (829 and 836 mm, respectively) and number of rainy days (Fig. 3a). Oxford (OXF) is a northern European station with a relatively low average rainfall amount (592 mm) homogeneously distributed throughout the year, whereas Stornoway (STW) has a very high rain frequency and a higher amount throughout the year (1072 mm) due to its location in far northwestern Scotland and the direct effect of the wet fronts from the Atlantic Ocean. Both UK stations have low seasonality compared to the other stations (see Fig. 3b).

Time variability of

For the TRA and FLO stations, seasons S1 and S2 clearly correspond to the low and high frequencies of rain events, respectively (Fig. 3b). A similar pattern can be observed for OXF and STW, although with less marked differences between the two seasons. Due to the considerable length of the data records, sample sizes remain large even for the two seasonal datasets, as summarized by the data in Table 2. The sample size is less than 500 only for

Sample sizes of the time variables for the six stations and for the three periods Year, S1 (from April to September), and S2 (from October to March).

It is noteworthy that the splitting of the two seasons of CEV and TOR was done differently in a previous paper (Baiamonte et al., 2019). However, in this paper the same splitting into two 6-month seasons is used for the sake of the homogeneity of the present analysis (Fig. 3a and b).

As is well known in the literature, the presence of a trend in the datasets can affect the assumptions made for

However, a known limitation of the MK test is the increased probability of finding trends in the presence of a significant autocorrelation in the data (Hamed and Rao, 1998). In such a case, the variance of the MK test statistic depends on the true unknown autocorrelation structure, and it is typically larger (lower) if positive (negative) autocorrelation occurs with respect to the case of independent data. Therefore, in the presence of autocorrelation, a correction is needed, as the critical values of the classical MK test would lead to incorrect results. Hamed and Rao (1998) proposed an approximation of the true variance of the MK test statistic in the case of autocorrelated data.

Let us recall that a key difference between the DM and the IM, as introduced in Sect. 2.2, is related to modeling the

The presence of a trend on the recorded

Values of Kendall's

To verify the memoryless property of

For the three periods

The main statistics of all the rainfall time variables,

It is interesting to observe that the STW station shows the highest

Box plots of the statistics of time variables

Ratios between observed cumulative frequencies

An in-depth analysis of the relationship between spells and chains can be made from the data reported in Fig. 6, where, for the six stations and for the two seasons S1 (a, c) and S2 (b, d), the ratios of the observed cumulative frequencies

For the time series of the six rain gauges, the Lerch family (as given in Eq. 1) was fitted on the three periods (

Parameters of the Lerch family of probability distributions fitted on

In the case of

For the IM, the geometric distribution (

Concerning the

The assessment of the goodness of fit for the selected distributions (for

The data depicted on the right-hand side of Fig. 7 (i.e., IM) suggest that, when the IM is applied, there is a significant reduction in the number of unsatisfactory fits (red and orange classes), particularly for

Summary of the results of the

Observed frequencies and fitted probability distributions for the six stations according to the DM and for the period “Year”. The variables on the

Observed frequencies and fitted probability distributions for the six stations according to the IM and for the period Year. The variables on the

The plots in Figs. 8 and 9 show the cumulative observed frequencies and the corresponding fitted Lerch family probability distributions for the annual period (

Since the previous results show that the main difference between the two methods concerns the ability to model

Scatterplots of the absolute difference between observed frequencies and fitted probabilities with the DM (

Comparison between the empirical and theoretical quantile

The Lerch family distribution also allows the probability of extremes of the time variables to be predicted. The overall consistency of the latter can be observed in Fig. 11, where the empirical 99th percentiles,

The fitting of the Lerch distribution to the selected six stations extends the studies previously carried out for the stations in Sicily and Piedmont (Agnese et al., 2014; Baiamonte et al., 2019). The adequacy of this distribution in fitting

However, the results obtained for

The need to resort to a more complex distribution than the geometric one to reproduce the probabilistic structure of

On the other hand, the geometric distribution seems to perform poorly on STW regardless of seasonality, which is actually quite limited for this station. The STW station seems to represent a case where the memoryless property is violated, as also confirmed by the inspection of the

Splitting the entire dataset into subperiods seems to improve the quality of the fits for both the DM and the IM. This result is well suited to possible implementations of the methodology in operational applications related to ecohydrology models (e.g., D'Odorico et al., 2000; Petrie and Brunsell, 2011) and stochastic weather generators (e.g., Paek et al., 2023). In these fields, to express the climatic component of weather variables (Semenov et al., 1998), not only does the overall probabilistic structure of rainfall need to be reproduced, but information on a seasonal or even subseasonal (i.e., monthly) scale is also required. Other studies, carried out over regions characterized by a climate with no distinct monsoon seasons, have also highlighted the importance of focusing on either the dry summer seasons or wet winter seasons (Caloiero and Coscarelli, 2020; Paton, 2022; Raymond et al., 2016). Wan et al. (2015) also suggested the need to account for the seasonality to properly reproduce the duration of

Another consequence of the inadequacy of the geometric distribution in describing wet periods is that the daily structure of rainfall needs to be taken into account for modeling processes such as the seasonal dynamics of soil moisture and vegetation. Ratan and Venugopal (2013) carried out an assessment for tropical areas using satellite rainfall data. They found wet spell durations with a peak at 1 d for dry regions, whereas durations of 2–4 d were predominant in humid areas. A similar but reversed pattern was found for dry spells, with typical durations of 1 d in humid areas and 3–4 d in dry areas.

For some cases, the results obtained for the IM suggest that the classical application of the geometric distribution for

Finally, it is worth mentioning that the models proposed in this paper are local, and hence spatial dependency in parameters may need to be accounted for in applications to multiple stations located at shorter distances.

In this paper, daily rainfall data covering a large range of rainfall regimes across Europe (latitudes 38–58° N) have been analyzed to model the frequency distribution of some key rainfall time variables. By using two different methods, the assumption of the renewal property, which implies the geometric distribution of wet spells, has been investigated. First, a direct method (DM), in which the geometric distribution of wet spells is assumed, was applied. Second, this assumption was relaxed by using an indirect method (IM) in which wet spells and dry spells were modeled separately, hence making it possible to account for a non-constant rain probability inside the rainfall cluster.
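The link between the renewal property and the geometric distribution of wet spells can be sketched as a short worked derivation. The notation is an assumption introduced here for illustration and may differ from the symbols used in the equations of this paper: IT denotes the interarrival time between consecutive rainy days, WS the wet spell duration, and p_1 = P(IT = 1).

```latex
% Assumption: interarrival times IT are i.i.d. with p_1 = P(IT = 1).
% A wet spell of k days consists of k-1 unit interarrivals
% followed by one interarrival longer than 1 day.
\begin{align}
P(WS = k) &= p_1^{\,k-1}\,(1 - p_1), \qquad k = 1, 2, \dots \\
P(WS > k) &= p_1^{\,k}, \\
P(WS > k + j \mid WS > k) &= \frac{p_1^{\,k+j}}{p_1^{\,k}} = p_1^{\,j} = P(WS > j).
\end{align}
```

The last line is the memoryless property: under the renewal assumption, the probability that a wet spell continues does not depend on how long it has already lasted. The IM relaxes exactly this constraint.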

As a general rule, the results of comparing the DM and the IM suggest that the Lerch distribution can be successfully used for both interarrival times and dry spells in a wide variety of rainfall regimes, whereas a preliminary analysis of the memory property (e.g., of the

The analysis was extended to include two additional time variables strongly associated with wet and dry spells, referred to as wet and dry chains. These variables extend the concept of wet and dry spells to sequences characterized by an interruption of 1 non-rainy day or 1 rainy day, respectively, as they represent two quantities that may be of interest for practical hydrological applications. The results obtained for the two chains generally reflect the findings obtained for the spells, albeit highlighting additional difficulties in the probabilistic modeling, especially at sites where the sample size may become a limiting factor.

The effects of seasonality on the results were also addressed by splitting the data into two 6-month subperiods. This separation tends to improve the performance of both the DM and the IM, indicating that, at most of the sites, the DM applied to seasonal data remains a suitable and straightforward approach. The results of this study may help in scenario simulations of drought and flood events, considering that probabilistic functions, such as those applied in this work, are at the root of stochastic climate modeling.

Future research will investigate the effects of neighboring locations on parameter values.

The code used for this study can be shared upon request to the main author.

Data used in this analysis and outputs can be shared upon request to the corresponding author.

Conceptualization: GB, CA, CC, EDN, SF, TM. Data curation: GB, CA, SF. Formal analysis: GB, CA, CC, EDN, TM. Investigation: GB, CA, CC, EDN, TM. Methodology: CA. Validation: GB, CA, CC, EDN, TM. Visualization: GB. Writing – original draft preparation: GB, CA. Writing – review and editing: GB, CA, CC, EDN, SF, TM.

The contact author has declared that none of the authors has any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.

The authors would like to thank Nicholas Howden, School of Civil, Aerospace and Mechanical Engineering, University of Bristol, for providing the Oxford rainfall data, INDAM and especially the GNAMPA group for their support, and the anonymous reviewers and the editor for their valuable comments.

This work was supported by the PRIN MIUR 2017SL7ABC_005 WATZON project, PRIN2022PFKP Sunset, and by NODES, which received funding from the MUR-M4C2 1.5 of PNRR with grant agreement no. ECS00000036.

This paper was edited by Soutir Bandyopadhyay and reviewed by two anonymous referees.