Estimating ADWF at sewage treatment plants

May 27, 2021 | 20 mins read

by Water Source

D de Haas, S Ng, N Dahl, D Baulch.

First published in Water e-Journal Vol 6 No 1 2021.

Abstract

Estimating and understanding Average Dry Weather Flow (ADWF) is fundamental to the planning, design, and operation of sewage treatment plants (STPs). This paper reviewed methods for estimation of ADWF, in four general groups: Rainfall-based; Equivalent person (EP) based; Basic statistical (Percentiles); and ‘Novel’. The ‘Novel’ methods identified were: Histogram/ Mode; Antecedent Precipitation Index (API); Ratio of Short Term and Long-Term Moving Averages; K-means Clustering; Diurnal Profile Smoothing; and Kernel Density Estimation. EP-based methods were not considered useful because they shift the uncertainty from rainfall and/or flow data to population and/or loading data. The other methods were tested using datasets for two STPs of similar size (ADWF approximately 1.2 to 1.3 ML/d) in northern New South Wales, one of which is more prone to wet weather inflow/ infiltration (I/I). On balance of simplicity and performance against more complex methods, we recommend the Histogram/ Mode and/or the Percentile methods for routine reporting. For larger and more complex assignments (e.g., design projects, planning studies), it is recommended that one or more of the alternative high-performing methods described in this paper (e.g., Ratio of moving averages; Kernel Density Estimation) be employed for ADWF checks. Relatively large datasets (at least one year of daily flow totals) should be used and the results compared against the estimates from simpler methods.

Introduction

Historically, both locally and internationally, engineers and managers of water utilities have applied various methods to determine the average dry weather flow (ADWF) for sewage treatment plants (STPs). To our knowledge, at least in Australia, there is no single industry standard method to define or determine ADWF. Common methods (typically) apply some form of numerical ‘filter’ to the totalised daily data for STP inflow, based on concurrent rainfall records. Such methods vary in detail (e.g., number of days, rainfall amount etc. applied to the numerical filter). Furthermore, such methods are constrained by the issue of representative rainfall data to apply for STP catchments (e.g., accuracy of rainfall records, nearest geographic factors, spatial distribution of rainfall) and the variable degree to which rainfall affects STP inflows (e.g., prolonged effects of rainfall on infiltration/ inflow (I/I) in some but not all catchment sewer systems).

Industry ‘best practice’ methods for defining and calculating ADWF at STPs were reviewed and collated from various sources, including: existing environmental licenses in Australia (QLD and NSW); international governing bodies (UK Government Environment Agency and Winnipeg Water and Waste Department); and industry practice (e.g., previous approaches used by consultants and alternatives suggested by local councils or industry associations).

The methods reviewed fell into four groups:

1. Rainfall-based
2. EP-based
3. Basic statistical
4. Novel

Rainfall-based methods attempt to determine which days were dry by examining historical daily rainfall records and, in many cases, also looking at rainfall on a given day along with preceding days.

EP-based methods attempt to determine dry weather flow empirically by estimating the number of equivalent persons (EP) within the catchment area and multiplying by an average wastewater production per EP.

Basic statistical methods were those that apply a very simple statistical analysis of flow data. The only example of this found in the literature review was Percentile-based estimations; however, using a histogram or mode calculation would be very similar. Percentile-based methods look at the entire set of flow data, including wet days, and estimate the average dry weather flow by taking a flow percentile (typically between the 20^th and 50^th percentile). The histogram/ mode method looks at the frequency of different flowrates and takes the most commonly occurring flowrate as an estimate for ADWF.
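The two basic statistical estimates described above can be illustrated with a minimal sketch. The flow values, the function names, and the 0.05 ML/d bin width are hypothetical choices made for this example; they are not taken from the study.

```python
import math
from collections import Counter

def percentile(values, pct):
    """Linear-interpolation percentile (pct in 0-100) of a list of daily flows."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no flow data")
    rank = (len(ordered) - 1) * pct / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)

def histogram_mode(values, bin_width=0.05):
    """Centre of the most frequently occupied bin (bin width is an assumed choice)."""
    bins = Counter(math.floor(v / bin_width) for v in values)
    best_bin = max(bins, key=lambda b: (bins[b], -b))  # ties go to the lower bin
    return (best_bin + 0.5) * bin_width

# Hypothetical daily inflow totals (ML/d): mostly dry days plus a few wet days.
flows = [1.21, 1.24, 1.22, 1.26, 1.23, 1.31, 1.22, 2.40, 3.10, 1.80, 1.24, 1.21]
adwf_p20 = percentile(flows, 20)   # percentile-based estimate
adwf_mode = histogram_mode(flows)  # histogram/mode estimate
```

Both estimates use the full record, wet days included, and neither needs rainfall data. The mode estimate depends on the chosen bin width, so in practice it is worth checking a few widths.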

‘Novel’ methods were those that did not fall into one of the other three groups described above. In the literature review, only one method was considered novel, namely a ‘ground-up’ approach for very small flow systems. The average flow for household water-using devices (taps, showers, appliances etc.) is estimated and ADWF is then stochastically estimated based on the frequency and duration of use.

The aim of this paper was to compare the results of using the current industry ‘best practice’ methods for estimating ADWF against five novel estimation methods that seek to improve estimate performance. To our knowledge, the novel methods we selected for testing have never been applied for this purpose to STP flows on a routine basis.

Methodology

Data

Flow data from 2011 to 2020 was supplied by Byron Shire Council (BSC) for two sewage treatment plants (STPs) in northern New South Wales: Ocean Shores STP (OSSTP) and Brunswick Valley STP (BVSTP). The two plants are located within a straight-line distance of 1.7 km from each other.

Additional short time interval flow rate data from 2017 to 2019 was provided for BVSTP for use in the Diurnal Profile Smoothing method, as described below.

Rainfall data was retrieved from Bureau of Meteorology (BOM) records for two nearby weather stations: Mullumbimby and Brunswick Heads Bowling Club (BOM station nos. 058040 and 058103, respectively).

Overview

Investigations began with a preliminary set of estimates using current industry methods. The initial methods tested were:

20^th percentile
30^th percentile
50^th percentile
QLD EPA SEQ Rainfall-based method
One-week Rainfall-based method
Three-week Rainfall-based method.

These informed the initial rating system for discerning good vs. poor estimates. To do this, the above-mentioned methods were separated into three levels of strictness: Least Strict, Moderately Strict, and Strictest. The Strictest estimates were expected to be the highest performing as they eliminated the most data for days influenced by rainfall events.

Six additional methods were proposed - one basic statistical, and five ‘novel’ methods - as follows:

Histogram / Mode (basic statistical method)
Antecedent Precipitation Index (API)
Ratio of Short Term and Long-Term Moving Averages
K-means Clustering
Diurnal Profile Smoothing
Kernel Density Estimation

Each method was evaluated and compared against the results of the Strictest estimates from the above-mentioned rating of the initial methods tested. Based on the highest performing among the initial and proposed additional methods, a ‘true value ADWF’ was adopted for the datasets examined from each of the two STPs.

To assess the ability of individual methods to estimate ADWF, the results for each were compared against the ‘true value ADWF’ for the two STPs and a relative score given. All methods were then compared in a Multi-Criteria Assessment (MCA), which scored the methods semi-quantitatively for Estimate Performance, Data Requirements, Mathematical Complexity, Parameter Complexity, and Robustness.

Description of novel estimation methods

Antecedent Precipitation Index (API) Method

The first novel method is similar to rainfall-based methods. It applies an established modelling term, namely the Antecedent Precipitation Index (API). API is a running day-by-day index of moisture stored within a drainage basin (Ali, et al., 2010). Unlike a simple cumulative rainfall total, API accounts for the progressive drying out of a catchment during periods without rain, so that recent rainfall events carry more weight than earlier ones. Mathematically, API takes the form of:
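One way to write such an index, consistent with the symbols defined below and with a window that covers the present day plus i antecedent days, is sketched here. The offset index j is introduced for illustration only, and giving the present day (j = 0) a weight of 1 is an assumption; other indexing conventions exist. Under this form, with k = 0.9, the weight on the 27^th antecedent day is 0.9^27, or roughly 0.06.

$$\mathrm{API}_t = \sum_{j=0}^{i} k^{j} \, P_{t-j}$$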

Where:

i is the number of antecedent (preceding) days considered

k is the decay constant (d^-1)

P_t is the rainfall during a given day at time, t

t is time (days).

For this study, the values we chose for k and i were 0.9 (Ali et al., 2010; Kohler & Linsley, 1951) and 27 days, respectively. This approach considers rainfall over the 27 antecedent days up to and including a given present day (28 days total). The index places progressively less weighting on rainfall that occurred in earlier antecedent days, culminating in a weighting of about 5% for rainfall measured on the 27^th antecedent day.

We defined a dry day as any day with. Using this definition, ADWF was then calculated as the median of dry day flows.
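A minimal sketch of an API-based dry-day filter follows, assuming the index is a sum of daily rainfall weighted by k raised to the age of each day in the window. The dry-day threshold is passed in as a parameter, and the value of 1.0 mm used in the example is a placeholder assumption, not the value used in this study. The rainfall and flow series are hypothetical.

```python
from statistics import median

def api_series(rain_mm, k=0.9, i=27):
    """API for each day: rain on that day plus up to i preceding days, weighted by k**age."""
    out = []
    for t in range(len(rain_mm)):
        total = 0.0
        for j in range(i + 1):
            if t - j < 0:
                break  # the record starts here; no earlier days to add
            total += (k ** j) * rain_mm[t - j]
        out.append(total)
    return out

def adwf_from_api(flows, rain_mm, threshold_mm, k=0.9, i=27):
    """Median flow over days whose API is below threshold_mm (placeholder threshold)."""
    api = api_series(rain_mm, k, i)
    # Ignore the first i days, where the antecedent window is incomplete.
    dry = [f for t, f in enumerate(flows) if t >= i and api[t] < threshold_mm]
    if not dry:
        raise ValueError("no dry days under this threshold")
    return median(dry)

# Hypothetical record: 30 dry days, one 20 mm rain day, then 10 days without rain.
rain = [0.0] * 30 + [20.0] + [0.0] * 10
flows = [1.2] * 30 + [2.5, 1.9, 1.6, 1.4] + [1.2] * 7
adwf = adwf_from_api(flows, rain, threshold_mm=1.0)
```

In this example the ten days after the rain event are still excluded, because the decayed index remains above the placeholder threshold even though no further rain fell.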

Ratio of short term and long-term moving averages

The second novel method is based on the ratio of short-term to long-term moving averages of daily flows. Based on similar applications for monitoring variance within natural systems, including human fitness (Murray et al., 2016), this method compares the short and long-term averages to determine if flow is changing significantly or relatively stable. A ratio between the short and long-term averages close to unity (1.0) is taken as an indicator of stable flow and, by implication, dry weather flow.

This method has the benefit that it technically does not classify “dry weather” based on rainfall but rather attempts to discern baseline flows. This has the advantage that it can be equally applied in regions with different climates, including those where rainfall occurs relatively frequently, causing I/I to produce on-going contributions to average flow. Similarly, it can be applied in situations where local rainfall records either do not exist or are unreliable. By contrast, rainfall-based methods depend on reliable rainfall data, and it is difficult to frame a single definition of ‘dry weather’ that suits different situations.

Arithmetic moving average

The simplest moving average is the arithmetic mean. In this case the moving average is the sum of the flowrates on day t and the preceding i − 1 days, divided by the number of days considered (i). Mathematically, the formula is:

$$F = \frac{1}{i} \sum_{j=0}^{i-1} F_{t-j}$$

Where:

F is the moving average

i is number of days considered

t is the time (days)

F_t is the flowrate on day t

For this study, we chose i to be 7 days (1 week) for the short-term average, and 28 days (4 weeks) for the long-term average. The ratio of short-term average to long-term average (ϕ) is simply expressed as:

$$\phi = \frac{F_{ST}}{F_{LT}}$$

Where F_ST and F_LT are the short-term and long-term moving averages, respectively.

To classify dry weather days, by trial and error we selected an upper bound for the ratio (ϕ) of 1.025 and a lower bound of 0.976 (the reciprocal of the upper bound).
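The arithmetic version of this classification can be sketched as follows. The function names and the flow series are hypothetical, the windows are assumed to end on the day being classified, and taking the median of the dry-day flows is one of the two options (mean or median) reported in the results.

```python
from statistics import median

def moving_average(flows, t, window):
    """Arithmetic mean of the `window` days ending on day t (inclusive)."""
    return sum(flows[t - window + 1 : t + 1]) / window

def ratio_series(flows, short=7, long=28):
    """Ratio of short- to long-term moving averages; None until `long` days exist."""
    ratios = []
    for t in range(len(flows)):
        if t < long - 1:
            ratios.append(None)
        else:
            ratios.append(moving_average(flows, t, short) / moving_average(flows, t, long))
    return ratios

def dry_days(ratios, upper=1.025):
    """Indices of days whose ratio lies between 1/upper and upper."""
    lower = 1.0 / upper
    return [t for t, r in enumerate(ratios) if r is not None and lower <= r <= upper]

# Hypothetical record (ML/d): stable flow, a three-day wet weather spike, then recovery.
flows = [1.2] * 28 + [3.0, 2.0, 1.5] + [1.2] * 30
ratios = ratio_series(flows)
dry = dry_days(ratios)
adwf = median([flows[t] for t in dry])
```

No rainfall record is used. Days are rejected while the spike sits in either averaging window and the two averages disagree by more than the chosen band.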

Exponentially weighted moving average

The exponentially weighted moving average (EWMA) is a modification to the arithmetic moving average where a diminished weighting is applied to older flowrates, like the API method. The formula for the EWMA is:
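One normalised form consistent with the symbols defined below is sketched here. The offset index j and the division by the sum of the weights are assumptions, chosen so that a constant flow returns itself as its own average; the weights k^j mirror the decay used in the API method.

$$F = \frac{\sum_{j=0}^{i-1} k^{j} \, F_{t-j}}{\sum_{j=0}^{i-1} k^{j}}$$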

Where:

F is the moving average

i is the number of days considered

t is time (days)

F_t is the flowrate on day ‘t’

k is the decay constant.

As before, we chose i = 7 days for the short-term average, and i = 28 days for the long-term average. We chose k to be 0.9, consistent with the API method (see above). As before, to classify dry days we chose an upper bound for the ratio of 1.025 and a lower bound of 0.976.
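A minimal sketch of the exponentially weighted variant is given below, assuming the weights are k raised to the age of each day and that the weighted sum is normalised by the sum of the weights. Both the normalisation and the function names are assumptions made for illustration.

```python
def ewma(flows, t, window, k=0.9):
    """Weighted mean of the `window` days ending on day t; weight k**age, normalised."""
    weights = [k ** j for j in range(window)]
    values = [flows[t - j] for j in range(window)]  # j = 0 is day t itself
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def ewma_ratio(flows, t, short=7, long=28, k=0.9):
    """Ratio of short- to long-term exponentially weighted averages on day t."""
    return ewma(flows, t, short, k) / ewma(flows, t, long, k)

# Hypothetical record: 27 days at 1.0 ML/d followed by one day at 2.0 ML/d.
flows = [1.0] * 27 + [2.0]
recent_weighted = ewma(flows, t=27, window=28)
plain_mean = sum(flows) / len(flows)
```

Because the newest reading carries the largest weight, `recent_weighted` responds more strongly to the final high-flow day than `plain_mean` does.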

Ratio of moving averages with step-change limit

A modification of the Ratio of Moving Averages method was developed to limit the impact on results from large rainfall events. This modification excludes readings where the ratio of averages changes rapidly. For example, after a heavy rainfall event the short-term moving average flow spikes to a high value before returning to a lower flow. As the short-term average recedes after the high flow event, the long-term average flow rises, and the two averages will intersect at a flowrate that is higher than the ADWF. The ratio step-change limit prevents such intersections from being counted as dry days. For the step-change limit, we applied the logic that a given day (at time t) is considered dry if:
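A condition of this kind can be sketched as follows, assuming the limit applies to the absolute day-to-day change in the ratio and is checked in addition to the upper and lower bounds on ϕ. The symbol Δϕ_max for the step-change limit is introduced here for illustration.

$$\left| \phi_t - \phi_{t-1} \right| \le \Delta\phi_{\max}$$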

Where:

ϕ is the ratio of moving averages (arithmetic or exponentially weighted, see above) and t is time (days).

For this study, by trial-and-error we selected a step-change limit of 0.025.
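The step-change logic can be sketched as below, assuming the limit is applied to the absolute change in the ratio from one day to the next and combined with the ratio bounds. The function name and the ratio values are hypothetical.

```python
def dry_days_with_step_limit(ratios, upper=1.025, step_limit=0.025):
    """Days whose ratio is inside the band and has moved by no more than step_limit."""
    lower = 1.0 / upper
    dry = []
    for t in range(1, len(ratios)):
        current, previous = ratios[t], ratios[t - 1]
        if current is None or previous is None:
            continue  # ratio undefined until enough history exists
        in_band = lower <= current <= upper
        stable = abs(current - previous) <= step_limit
        if in_band and stable:
            dry.append(t)
    return dry

# Hypothetical ratios: day 4 falls back inside the band immediately after a spike.
ratios = [None, 1.00, 1.01, 1.30, 1.02, 1.00, 0.99]
dry = dry_days_with_step_limit(ratios)
```

Day 4 is inside the band but is rejected because the ratio has just dropped by 0.28, which is the kind of intersection after a high-flow event that the limit is meant to exclude.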

K-means clustering

The third novel method is a clustering approach that classifies daily total flows into groups, with the aim of isolating dry weather flows from the other groupings. Taken from machine learning, K-means clustering sorts a series of observations into a number (k) of groups called clusters, thereby revealing underlying patterns. The number of clusters is specified by the user, and the algorithm selects centroids such that the distance from each centroid to the points within its cluster is minimised. In the case of ADWF, clusters are selected such that similar daily flowrates are grouped together. A real-life analogue might be a four-cluster grouping such as: dry weather, light rain, heavy rain, and extreme weather. Figure 1 demonstrates the idea of K-means clustering.

Figure 1: Diagram of K-means Clustering (Wikipedia, 2020c)
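For one-dimensional daily flows, the clustering idea can be sketched with a plain implementation of Lloyd's algorithm. The quantile-based starting centroids, the use of two clusters in the example, and the choice of the lowest centroid as the dry weather estimate are all assumptions made for illustration; the flow values are hypothetical.

```python
def kmeans_1d(values, k=4, iterations=100):
    """Lloyd's algorithm on scalar flows; returns (centroids, labels)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the data quantiles.
    centroids = [ordered[int((j + 0.5) * len(ordered) / k)] for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each flow joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, labels

# Hypothetical daily flows (ML/d): six dry days and two wet days.
flows = [1.20, 1.22, 1.25, 1.21, 1.23, 2.80, 3.10, 1.24]
centroids, labels = kmeans_1d(flows, k=2)
adwf = min(centroids)  # assumed: lowest-flow cluster represents dry weather
```

With real data the result depends on the number of clusters chosen, so the lowest cluster should be inspected to confirm that it corresponds to dry weather before its centroid is used.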

Diurnal profile smoothing

Figure 2: Gaussian Kernel Smoothing (Wikipedia, 2020b)

The fourth novel method is an advanced classification method that attempts to discern base flowrates from short time-interval data (intervals of minutes or hours). The Diurnal Profile Smoothing method is predicated on identifying higher flow rates indicative of wet weather and separating these from the dry weather flow pattern of an STP. This is possible by using large datasets of short time-interval flow rates to overlay plots at weekly time spans. Removing outliers (identified wet weather data) and averaging or smoothing the dry weather weekly flow pattern produces an estimate of the underlying base flow.

Once the underlying base flow pattern is known, ADWF can be calculated by integration. Figure 2 shows an example of the curves produced by a Gaussian Kernel Regression (the method of smoothing we have chosen).
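The smoothing and integration steps can be sketched as follows, assuming wet weather readings have already been removed and the remaining readings are folded onto an hour-of-week axis. The Nadaraya-Watson form of Gaussian kernel regression, the 1-hour bandwidth, the function names, and the synthetic readings are assumptions made for illustration.

```python
import math

def kernel_smooth(times_h, rates, grid_h, bandwidth_h=1.0, period_h=168.0):
    """Gaussian kernel regression (Nadaraya-Watson) on a weekly, wrap-around time axis."""
    smoothed = []
    for g in grid_h:
        num = den = 0.0
        for t, r in zip(times_h, rates):
            d = abs(t - g) % period_h
            d = min(d, period_h - d)  # shortest distance around the weekly cycle
            w = math.exp(-0.5 * (d / bandwidth_h) ** 2)
            num += w * r
            den += w
        smoothed.append(num / den)
    return smoothed

def adwf_from_profile(smoothed_rates, step_h):
    """Integrate a uniformly sampled weekly profile (ML/h) and express it per day (ML/d)."""
    weekly_volume = sum(smoothed_rates) * step_h
    days = len(smoothed_rates) * step_h / 24.0
    return weekly_volume / days

# Hypothetical hourly readings (ML/h) over two dry weeks, folded onto hour-of-week.
times_h = [h % 168 for h in range(336)]
rates = [0.05 + 0.02 * math.sin(2 * math.pi * (h % 24) / 24) for h in range(336)]
grid_h = list(range(168))
profile = kernel_smooth(times_h, rates, grid_h)
adwf = adwf_from_profile(profile, step_h=1.0)
```

The bandwidth controls how much of the diurnal shape survives the smoothing. A wider kernel flattens the morning and evening peaks, but on evenly sampled data it leaves the integrated daily volume essentially unchanged.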

Kernel density estimation

The fifth novel method is a modified histogram/ mode approach that attempts to create a continuous distribution from flow data rather than the discrete formulation of a histogram. The translation to average dry weather flow is the same as for a regular histogram, namely that the most common flowrate is likely to be a good estimate of ADWF.

Kernel density estimation produces a continuous distribution by plotting all the data points (flowrates) onto the X-axis and assigning a distribution function, called a kernel, to each point. In the case of Figure 3, a Gaussian distribution has been assigned and centred at each data point (dashed red lines). The final distribution is obtained by summing the values of the individual kernels at continuous x values, resulting in peaks for ranges with many data points and troughs for ranges with few.

Kernel density estimation has two advantages. Firstly, it produces a clearer picture when there are relatively few data points, compared with a histogram. Secondly, it provides weighting to adjacent data points so that a more representative estimate between data points and at extremes is created, which can improve performance with smaller and/or skewed data sets.
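A minimal sketch of a kernel density mode estimate follows, using Gaussian kernels and a grid search for the peak. The bandwidth of 0.03 ML/d, the grid resolution, the function names, and the flow values are hypothetical choices for this example.

```python
import math

def gaussian_kde(values, x, bandwidth):
    """Density at x: the average of Gaussian kernels centred on each data point."""
    norm = 1.0 / (len(values) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

def kde_mode(values, bandwidth=0.03, points=501):
    """Flow at which the estimated density peaks, found on an evenly spaced grid."""
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * n / (points - 1) for n in range(points)]
    return max(grid, key=lambda x: gaussian_kde(values, x, bandwidth))

# Hypothetical daily inflow totals (ML/d): mostly dry days plus a few wet days.
flows = [1.21, 1.24, 1.22, 1.26, 1.23, 1.31, 1.22, 2.40, 3.10, 1.80, 1.24, 1.21]
adwf = kde_mode(flows)
```

Unlike the histogram, the estimate does not depend on where bin edges fall, although the bandwidth plays a comparable role and should be chosen with the scatter of the dry weather flows in mind.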

Figure 3: Histogram vs Kernel Density Estimation (Wikipedia, 2020a)

Results

Basic statistical methods

A summary of the results for the Percentile-based methods is given in Table 1.

Table 1: Summary of Percentile-based Estimates

It is useful to compare the percentile results with the mode and traditional histogram for the same datasets. Using the original flow datasets (without editing, i.e., including potential outliers) and taking the mode as an estimate for the ADWF gives values of 1.29 ML/d and 1.16 ML/d for OSSTP and BVSTP, respectively. As an example, the histogram for OSSTP is shown in Figure 4, where the probability plot and 20^th percentile are also plotted, demonstrating similarity between the methods.

Figure 4: OSSTP Daily Flow Histogram

Rainfall-based

The calculated ADWF was analysed for sensitivity to the number of preceding days considered, and the average rainfall over that period (i.e., a given day and its preceding days). Refer to Table 12 in Supplementary Information for the detailed results. It was concluded that considering at least six preceding days, and allowing up to 7 mm cumulative rainfall over that day and its six preceding days, gave a sufficiently strict definition of a dry day. This amounted to an average rainfall of up to 1 mm/day over seven consecutive days. It enabled an estimate of ADWF to within a margin of 10% of the results produced by the strictest parameters tested. The strictest parameters tested were 27 preceding days and 0 mm/d of average rainfall (i.e., 0 mm rainfall on any given day and in aggregate over 28 days). By way of illustration, the ADWF estimates obtained by allowing 1 mm/d average rainfall over one week (i.e., <7 mm in aggregate over seven consecutive days) or three weeks (i.e., <21 mm in aggregate over 21 days) are both shown in Figure 5 and Figure 6 below for OSSTP and BVSTP, respectively.
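The one-week rainfall filter described above can be sketched as follows. The function names and the rainfall and flow series are hypothetical, and treating a total of exactly 7 mm as dry ("up to 7 mm") is an assumption about the boundary case.

```python
def rainfall_dry_days(rain_mm, preceding_days=6, max_total_mm=7.0):
    """Days where rain on that day plus the preceding days totals max_total_mm or less."""
    dry = []
    for t in range(preceding_days, len(rain_mm)):
        window_total = sum(rain_mm[t - preceding_days : t + 1])
        if window_total <= max_total_mm:
            dry.append(t)
    return dry

def adwf_rainfall(flows, rain_mm, preceding_days=6, max_total_mm=7.0):
    """Mean flow over the days classified as dry by the rainfall filter."""
    days = rainfall_dry_days(rain_mm, preceding_days, max_total_mm)
    if not days:
        raise ValueError("no dry days under these parameters")
    return sum(flows[t] for t in days) / len(days)

# Hypothetical 16-day record with a single 10 mm rain day on day 7.
rain = [0.0] * 7 + [10.0] + [0.0] * 8
flows = [2.0] * 16
flows[6], flows[14], flows[15] = 1.2, 1.3, 1.1
adwf = adwf_rainfall(flows, rain)
```

Setting `preceding_days=20` and `max_total_mm=21.0` gives the three-week variant, and `preceding_days=27` with `max_total_mm=0.0` gives the strictest parameters tested.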

Figure 5: Ocean Shores STP Comparison of Existing Methods

Figure 6: Brunswick Valley STP Comparison of Existing Methods

The Rainfall methods can be compared with the Percentile method. The ADWF estimated from rainfall using the stricter definition of a dry day (<21 mm in aggregate over 21 days) lay between the 20^th and 30^th percentiles, whereas that estimated using the less strict definition (<7 mm in aggregate over 7 days) lay between the 30^th and 50^th percentiles (see Figure 5 and Figure 6). The ADWF estimated from rainfall using a method applied in a recent Queensland STP environmental license (QLD EPA, 2020) was higher than the 50^th percentile (i.e., significantly higher than that from the stricter rainfall definitions applied here - see above).

Novel estimation methods

Antecedent Precipitation Index (API) method

The results for the API method are summarised in Table 2. Like the rainfall method results (see above), the ADWF estimates lay between the 20^th and 30^th percentiles (Table 1).

Table 2: ADWF estimates by API method

Ratio of short term and long-term moving averages

Taking the ADWF as either the mean or median of flows classified as dry by the ratio of moving averages method yields the results in Table 3.

Table 3: ADWF Estimates by Ratio of Moving Averages method

Looking at the ADWF calculated from the mean flows on dry days, the arithmetic moving average and EWMA methods vary by less than 0.05 ML/d without the step-change limit, and by less than 0.03 ML/d with the step-change limit. With the parameters chosen, the two methods for calculating moving averages are almost indistinguishable in terms of ADWF estimated.

However, the step-change limit had a significant effect on the ADWF estimate (from the mean flows on dry days), particularly for BVSTP where it lowered the estimate from 1.39 to 1.23 ML/d (i.e., by a margin of 11%). Figure 7 illustrates the residual effects of wet weather impacts on flow rates for dry days, as defined by the moving average methods for the two STPs. Anecdotally, BVSTP is more prone to high wet weather flows than OSSTP.

Noting the impact of the step-change limit on the ADWF estimated using the moving average methods, we found that, as an alternative, the effect of peak weather events on the calculation could be reduced by replacing the mean flow on dry days with the median flow on dry days. A comparison can be made from the results in Table 3. The median consistently predicts a lower ADWF than the mean and gives very little change in the result when altering or completely removing the step-change limit. With or without the step-change limit, the ADWF estimate changed by 0.02 ML/d or less (a margin of <2%).

Figure 7: Dry Weather Flow as defined by Ratio of Moving Averages (BVSTP and OSSTP)