Research Article

Regression estimation for continuous-time functional data processes with missing at random response

Received 29 Apr 2021, Accepted 04 Mar 2024, Published online: 22 Mar 2024

Abstract

In this paper, we are interested in nonparametric kernel estimation of a generalised regression function based on an incomplete sample $(X_t, Y_t, \zeta_t)_{t\in[0,T]}$ drawn from a continuous-time stationary and ergodic process $(X, Y, \zeta)$. The predictor $X$ is valued in some infinite-dimensional space, whereas the real-valued process $Y$ is observed when the Bernoulli process $\zeta = 1$ and missing whenever $\zeta = 0$. A uniform almost sure consistency rate as well as an evaluation of the conditional bias and the asymptotic mean square error are established. The asymptotic distribution of the estimator is provided, with a discussion on its use in building asymptotic confidence intervals. To illustrate the performance of the proposed estimator, a first simulation compares the efficiency of the discrete-time and continuous-time estimators. A second simulation discusses the selection of the optimal sampling mesh in the continuous-time case. A third simulation builds asymptotic confidence intervals. An application to financial time series studies the performance of the proposed estimator in terms of point and interval prediction of the IBM asset price log-returns. Finally, a second application discusses the use of the initial estimator to impute missing household-level peak electricity demand.

1. Introduction

Let $(E, d)$ be an infinite-dimensional space equipped with a semi-metric $d(\cdot,\cdot)$. Consider $(X_t, Y_t, \zeta_t)_{t\in\mathbb{R}^+}$ a stationary and ergodic continuous-time process valued in the space $\mathcal{X} := E \times \mathbb{R} \times \{0,1\}$, such that each triplet $(X_t, Y_t, \zeta_t)$ has the same probability distribution as the random variable (r.v.) $(X, Y, \zeta)$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $S$ be a compact interval in $\mathbb{R}$ and $\psi$ a measurable function defined on the space $S \times \mathbb{R}$, $(y, Y) \mapsto \psi(Y, y)$, where $y$ is a real variable such that $E(|\psi(Y, y)|) < \infty$. Consider the following regression model: $$\psi(Y, y) = m_\psi(X, y) + \varepsilon, \qquad (1)$$ where $m_\psi(X, y)$ is the conditional expectation of $\psi(Y, y)$ given the r.v. $X$. That is, for any $y \in S$ and a fixed $x \in E$, $E(\psi(Y, y) \mid X = x) = m_\psi(x, y)$. The error term $\varepsilon$ is independent of $X$ and such that $E(\varepsilon \mid X) = 0$ almost surely (a.s.).

Usually, when no data are missing, a sample of the stationary and ergodic process $(X_t, Y_t)_{0\le t\le T}$ is observed. Here, we allow the response variable $Y_t$ to be Missing At Random (MAR) at any time $t$. To check whether an observation is complete or missing, a new variable $\zeta$ is introduced into the model as an indicator of the missing observations. Thus, for any $t \in [0, T]$, $\zeta_t = 1$ if $Y_t$ is observed and $\zeta_t = 0$ if $Y_t$ is missing. We suppose that the Bernoulli random variable $\zeta$ satisfies $$\mathbb{P}(\zeta = 1 \mid X = x, Y = y) = \mathbb{P}(\zeta = 1 \mid X = x) =: p(x).$$ Here, $p(x) > 0$ is the conditional probability of observing the response variable and is usually unknown. This assumption allows one to conclude that $\zeta$ and $Y$ are conditionally independent given $X$. Note that the above assumption says that the response variable does not provide additional information, on top of that given by the explanatory variable, to predict whether an individual will present a missing response.

In this paper, we are interested in the estimation of the regression function $m_\psi(\cdot, y)$ based on the observed data $(X_t, Y_t, \zeta_t)_{0\le t\le T}$. Note that, for any $t \in [0, T]$, $X_t = \{X_t(\omega), \omega \in \Omega\}$ is an element of the space $E$, which means that, for any fixed time $t = t_0$, $X_{t_0}$ is a curve. Specifically, if $E = L^2([0,1])$ is the space of square-integrable functions defined on $[0, 1]$, then the predictor $X_{t_0} := \{X_{t_0}(s) : s \in [0,1]\}$ describes a trajectory in the functional space $E$ observed at the fixed time $t_0$.

In real life, there are several situations where the response variable might be missing at random. For instance, in survey sampling studies non-response is an increasingly common problem, where the missing response reaches rates of 25%–30% or even higher (see, e.g. Sikov 2018). In such cases, the missing data become a real source of bias in survey sampling estimation. Another case where the response may be subject to the MAR phenomenon is household electricity consumption monitoring. Indeed, the real-time collection of intra-day electricity consumption is now possible after the deployment of smart meters at the household level. The transmission of the information from the smart meter to the information system usually goes through WIFI or optical fibre networks, which are significantly dependent on the weather conditions, among other factors. Therefore, a response variable such as the daily total electricity consumption might be subject to a missing at random mechanism due to bad weather conditions (for more details, see Section 5.2). In financial markets, despite modern technology, which allows data to be collected at a very fine time scale, financial data can still be missing. For instance, there are some regular holidays, such as Thanksgiving Day and Christmas, for which stock price data are missing. There are many other technical reasons (such as breakdowns in the devices recording the data, computers' sudden shutdowns, etc.) that make stretches of data missing (see Section 5.1 for more details about this application). For further examples and details about missing at random data, the reader is referred to Chapter 1 in Little and Rubin (2002).

Whenever $(X_i, Y_i, \zeta_i)_{1\le i\le n}$ is an independent and identically distributed (i.i.d.) random sample, several authors have investigated nonparametric and semiparametric estimation of the regression function. In the framework where $X$ is finite-dimensional, one can quote Cheng (1994), Little and Rubin (2002), Nittner (2003), Tsiatis (2006), Liang et al. (2007) and Efromovich (2011). See also Ferraty et al. (2013) when the predictor is infinite-dimensional. However, less attention has been paid to the case of dependent data including an infinite-dimensional covariate, except for Ling et al. (2015), where a local constant estimation of the regression operator for discrete-time ergodic processes was considered.

In the case of continuous-time finite-dimensional processes $(X, Y) \in \mathbb{R}^d \times \mathbb{R}^{d'}$ ($d$ and $d' \ge 1$) satisfying a strong mixing condition, the estimation of the regression function based on completely observed data was considered by several authors; see, for instance, the monograph by Bosq (1998) and the references therein.

Some of these results were extended by Didi and Louani (2014) and Bouzebda and Didi (2017) to stationary and ergodic processes. Chaouch and Laïb (2019) studied the asymptotic mean square error of the kernel regression estimator for a MAR stationary and ergodic process and obtained an explicit upper bound for it.

It is worth noting that, even though a continuous-time functional process framework is considered in this paper, in practice data are often collected according to some sampling scheme and the continuous-time process is discretised. Our results remain valid when considering a discrete-time ergodic stationary process $(X_{t_k}, Y_{t_k}, \zeta_{t_k})_{1\le k\le n}$ sampled from a continuous-time process $\{(X_t, Y_t, \zeta_t)\}_{0\le t\le T}$ with a regular sampling mesh $\delta = T/n$. A discussion of the optimal choice of the sampling mesh in the continuous-time case is then of great practical interest (see Section 3.5 and Simulation 2).

In the setting where $(X_t, Y_t)_{t\in[0,T]} \in E \times \mathbb{R}$ is an $\alpha$-mixing continuous-time process with $Y$ completely observed, Maillot (2008) established the convergence with rates of the regression operator, and a super-optimal mean square convergence rate was obtained in Chesneau and Maillot (2014).

This paper aims to complete and extend the work of Maillot (2008), Chesneau and Maillot (2014) and Ling et al. (2015) at several levels. First, we suppose that the continuous-time process satisfies an ergodicity assumption rather than an $\alpha$-mixing one. Therefore, the dependence structure considered here is more general and covers several processes which do not satisfy the mixing property. Indeed, our results are valid for $\alpha$-mixing and non-$\alpha$-mixing as well as long memory and Bernoulli shift processes (for more details, see the examples used to discuss Assumption A3 below). Moreover, our results are stated and proved without assuming a mixing condition or imposing a particular covariance structure on the process. This is due to the fact that the main technical tools used here are martingale difference devices and sequences of projections on appropriate $\sigma$-fields. Second, we complete and extend the results established in Ling et al. (2015) for discrete-time functional data processes to the continuous-time functional framework.

We estimate a general operator $m_\psi(x, y)$ which includes the conditional mean, the conditional distribution function and conditional quantiles. It is worth noting that such an extension is not obvious, since it requires an appropriate definition of $\sigma$-fields adapted to the continuous-time context. Such an adaptation is crucial when using martingale difference tools to establish asymptotic properties of the estimator. Third, the response variable considered here is affected by the MAR mechanism and therefore is not completely observed as in Maillot (2008). Moreover, in contrast to Maillot (2008) and Chesneau and Maillot (2014), we do not limit our study to mean square convergence, but provide a more exhaustive inference on the regression operator estimator, including pointwise and uniform almost sure convergence rates, identification of the limiting distribution of our estimator, and a method to build confidence intervals. Fourth, a simulation study is carried out to investigate the selection of the 'optimal' sampling mesh, which is one of the most important topics in nonparametric estimation with continuous-time processes.

The rest of this paper is organised as follows. In Section 2, we present the framework adapted to continuous-time ergodic processes and introduce the assumptions needed to establish the asymptotic results. The main asymptotic properties of the estimator are discussed in Section 3. An illustration of the performance of the proposed estimator through simulated data is given in Section 4. Section 5 is devoted to applications of the proposed estimator to real data. Section 6 discusses the application of our theoretical results to continuous-time conditional quantile estimation. Finally, technical proofs are given in the Appendix.

2. Framework and assumptions

To define the framework of our study, we need to introduce some definitions. Let $X = (X_t)_{t\in[0,\infty)}$ be a continuous-time process defined on a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and observed at any time $t \in [0, T]$. For more details about the definition of continuous-time ergodic processes, the reader is referred to Didi and Louani (2014). From now on, we consider $\{\mathcal{F}_t, t \ge 0\}$ the filtration defined on $(\Omega, \mathcal{F})$, that is, $\{\mathcal{F}_t, t \ge 0\}$ is an increasing family of sub-$\sigma$-algebras of $\mathcal{F}$.

For a positive real number $\delta$ such that $n = T/\delta \in \mathbb{N}$ and $j \in \mathbb{N} \cap [1, n]$, consider the $\delta$-partition $(T_j = j\delta)_{1\le j\le n}$ of the interval $[0, T]$. Furthermore, for $t > 0$ and $1 \le j \le n$, we define the following $\sigma$-fields: $$\mathcal{F}_{t-\delta} := \sigma((X_s, Y_s) : 0 \le s < t-\delta), \quad \mathcal{F}_j := \mathcal{F}_{T_j} = \sigma((X_s, Y_s) : 0 \le s < T_j), \quad \mathcal{S}_{t,\delta} := \sigma((X_s, Y_s), (X_r) : 0 \le s < t,\ t \le r \le t+\delta). \qquad (2)$$ If $t < 0$ we take $\mathcal{F}_t$ to be the trivial $\sigma$-field. Note that, for any $\delta > 0$ and $t > 0$, we have $\mathcal{F}_{t-\delta} \subset \mathcal{S}_{t-\delta,\delta} \subset \mathcal{S}_{t,\delta}$. Moreover, for any $j \ge 2$ such that $T_{j-1} \le t \le T_j$, we have $\mathcal{F}_{j-2} \subset \mathcal{F}_{t-\delta} \subset \mathcal{S}_{t,\delta}$.

Let $B(x, u)$ be a ball centred at $x \in E$ with radius $u > 0$. Denote by $D_t := d(x, X_t)$ a nonnegative real-valued continuous-time process, and let $F_x(u) = \mathbb{P}(D_t \le u) := \mathbb{P}(X_t \in B(x, u))$ and $F_x^{\mathcal{F}_{t-\delta}}(u) = \mathbb{P}(X_t \in B(x, u) \mid \mathcal{F}_{t-\delta})$ be the distribution function and the conditional distribution function of $(D_t)_{0\le t\le T}$ given the $\sigma$-field $\mathcal{F}_{t-\delta}$, respectively.

To define an estimator of the regression function adapted to the MAR mechanism, multiply Equation (1) by $\zeta$ to get $\zeta\psi(Y, y) = \zeta m_\psi(X, y) + \zeta\varepsilon$. Taking conditional expectations with respect to $X = x$, one gets $E(\zeta\psi(Y, y) \mid X = x) = m_\psi(x, y)\,E(\zeta \mid X = x)$. Thus we have $$m_\psi(x, y) = \frac{E(\zeta\psi(Y, y) \mid X = x)}{E(\zeta \mid X = x)}.$$ Given a random sample $(X_t, Y_t, \zeta_t)_{0\le t\le T}$, one can therefore define a kernel-type estimator of $m_\psi(x, y)$, say $\hat m_{\psi,T}(x, y)$, adapted to the MAR response framework. Note that if there are missing observations in the response variable, a simple way to estimate $m_\psi(x, y)$ is to consider a kernel smoothing-type estimator which only uses the observed data, in other words, those for which $\zeta_t = 1$. Therefore, one gets $$\hat m_{\psi,T}(x, y) := \begin{cases} \dfrac{\int_0^T \zeta_t\,\psi(Y_t, y)\,\Delta_t(x)\,dt}{\int_0^T \zeta_t\,\Delta_t(x)\,dt}, & \text{if } \int_0^T \zeta_t\,\Delta_t(x)\,dt \ne 0,\\[1ex] \dfrac{1}{T}\displaystyle\int_0^T \zeta_t\,\psi(Y_t, y)\,dt, & \text{otherwise}, \end{cases} \qquad (3)$$ where $\Delta_t(x) = K\big(\tfrac{D_t}{h_T}\big)$, $K(\cdot)$ is a kernel density function and $h_T$ is a smoothing parameter tending to zero as $T$ goes to infinity.
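In practice the path is recorded on a finite grid, so the time integrals in (3) are approximated by Riemann sums. The following minimal sketch (assuming a regular grid of mesh $T/n$ and the quadratic kernel used later in Section 4; all names are illustrative) computes the simplified estimator at a fixed pair $(x, y)$:

```python
import numpy as np

def nw_mar_estimator(psi_Y, zeta, D, h, T):
    """Sketch of the simplified estimator (3) when the path is observed on a
    regular grid of mesh T/n (time integrals become Riemann sums; the
    quadratic kernel below is an illustrative choice).

    psi_Y : values of psi(Y_t, y) on the grid (arbitrary where Y is missing)
    zeta  : 0/1 indicators, 1 where Y_t is observed
    D     : distances d(x, X_t) on the grid
    """
    n = len(D)
    dt = T / n
    K = 0.75 * (1 - (D / h)**2) * (D / h < 1)   # Delta_t(x) = K(D_t / h_T)
    denom = np.sum(zeta * K) * dt               # ~ int zeta_t Delta_t(x) dt
    if denom != 0:
        return np.sum(zeta * psi_Y * K) * dt / denom
    return np.sum(zeta * psi_Y) * dt / T        # fallback branch of (3)
```

The common factor $dt$ cancels in the ratio; it is kept only to mirror the continuous-time expression.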

Remark 2.1

When the sample has missing observations in the response variable, two strategies can be followed to estimate $m_\psi(x, y)$. The first one, $\hat m_{\psi,T}(x, y)$ given in Equation (3), is called the simplified estimator and only uses complete observations. The second approach consists in using the simplified estimator $\hat m_{\psi,T}(x, y)$ to impute the missing values of the response variable $Y_t$ according to the following expression: $$\tilde\psi(Y_t, y) := \zeta_t\,\psi(Y_t, y) + (1 - \zeta_t)\,\hat m_{\psi,T}(X_t, y)$$ (see, e.g. Chu and Cheng 2003 or González-Manteiga and Pérez-González 2004). Consequently, an estimator, say $\tilde m_{\psi,T}(x, y)$, based on imputed data may be defined as follows: $$\tilde m_{\psi,T}(x, y) = \frac{\int_0^T \tilde\psi(Y_t, y)\,\Delta_t(x)\,dt}{\int_0^T \Delta_t(x)\,dt}.$$
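A corresponding sketch of the imputation-based estimator, reusing the `nw_mar_estimator` sketch given above (the pairwise-distance matrix `D_pair` is an assumed input), could read:

```python
import numpy as np

def imputed_estimator(psi_Y, zeta, D_x, D_pair, h, T):
    """Sketch of the imputation-based estimator of Remark 2.1, reusing the
    nw_mar_estimator sketch above (D_pair[t] holds d(X_t, X_s) for all s)."""
    psi_tilde = np.asarray(psi_Y, dtype=float).copy()
    for t in np.where(zeta == 0)[0]:
        # impute psi(Y_t, y) by the simplified estimator at x = X_t
        psi_tilde[t] = nw_mar_estimator(psi_Y, zeta, D_pair[t], h, T)
    K = 0.75 * (1 - (D_x / h)**2) * (D_x / h < 1)
    return np.sum(psi_tilde * K) / np.sum(K)    # tilde m_{psi,T}(x, y)
```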

From now on, we set $Z_1(x) := \int_0^\delta \Delta_t(x)\,dt$ and define the conditional bias as $$B_T(x, y) := \frac{\bar m_{\psi,T,2}(x, y)}{\bar m_{\psi,T,1}(x)} - m_\psi(x, y) =: C_T(x, y) - m_\psi(x, y), \qquad (4)$$ where $\bar m_{\psi,T,1}(x) := \bar m_{\psi,T,1}(x, 1)$ and, for $i = 1, 2$, $$\bar m_{\psi,T,i}(x, y) := \frac{1}{n\,E(Z_1(x))}\int_0^T E\{\zeta_t\,(\psi(Y_t, y))^{i-1}\,\Delta_t(x) \mid \mathcal{F}_{t-\delta}\}\,dt. \qquad (5)$$ Before introducing the assumptions under which we establish our asymptotic results, we add the following notation. Let $o_{a.s.}(u)$ denote a real random function $\ell$ such that $\ell(u)/u$ converges to zero almost surely (a.s.) as $u \to 0$, and denote by $O_{a.s.}(u)$ a real random function $\ell$ such that $\ell(u)/u$ is almost surely bounded.

  1. (Assumptions on the kernel function). Let $K$ be a nonnegative bounded kernel of class $\mathcal{C}^1$ on its support $[0, 1]$ such that $K(1) > 0$. The derivative $K'$ exists on $[0, 1)$ and satisfies $K'(v) < 0$ for all $v \in [0, 1)$ and $|\int_0^1 (K^j)'(v)\,dv| < \infty$ for $j = 1, 2$.

  2. (Assumptions related to the continuous-time functional ergodic processes)

    Let $\alpha_0$ be a nonnegative real number and $x \in E$. Suppose that, for any $0 \le s < t \le T$ such that $t - s \ge \alpha_0$, there exists a nonnegative continuous random function $f_{t,s}(x) := f_{X_t,s}(x)$ a.s. bounded by a deterministic function $b_{s,\alpha_0}(x)$ (see Footnote 1).

    Moreover, let $g_{t,s,x}(\cdot)$ be a random function defined on $\mathbb{R}$, let $f(x)$ be a deterministic nonnegative bounded function and $\phi(\cdot)$ a nonnegative real function tending to zero as its argument tends to 0, and assume that:

    1. $F_x(u) := \mathbb{P}(d(x, X_t) \le u) = \phi(u)\,f(x) + o(\phi(u))$ as $u \to 0$.

    2. For any $0 \le s < t$, $F_x^{\mathcal{F}_s}(u) := \mathbb{P}^{\mathcal{F}_s}(d(x, X_t) \le u) = \mathbb{P}(d(x, X_t) \le u \mid \mathcal{F}_s) = \phi(u)\,f_{t,s}(x) + g_{t,s,x}(u)$, with $g_{t,s,x}(u) = o_{a.s.}(\phi(u))$ as $u \to 0$, $g_{t,s,x}(u)/\phi(u)$ a.s. bounded, and $T^{-1}\int_0^T g_{t,t-\delta,x}(u)\,dt = o_{a.s.}(\phi(u))$ as $T \to \infty$ and $u \to 0$.

    3. For any $x \in E$: $\lim_{T\to\infty} \frac{1}{T}\int_0^T f_{t,t-\delta}(x)\,dt = f(x)$, a.s.

    4. There exists a nondecreasing bounded function $\tau_0 : [0,1] \to [0,1]$ such that, uniformly in $u \in [0,1]$, $\frac{\phi(hu)}{\phi(h)} = \tau_0(u) + o(1)$ as $h \to 0$, and $\int_0^1 K'(v)\,\tau_0(v)\,dv < \infty$.

    5. $T^{-1}\int_0^T b_{t,\alpha_0}(x)\,dt \to D_{\alpha_0}(x) < \infty$ as $T \to \infty$.

  3. (Local smoothness and continuity conditions)

    Suppose that, for any $(y, t) \in S \times [0, T]$ and $r > 0$ such that $t \le r \le t + \delta$:

    1. $E(\psi(Y_r, y) \mid \mathcal{S}_{t,\delta}) = E(\psi(Y_r, y) \mid X_r) = m_\psi(X_r, y)$ a.s.

    2. $E(\zeta_r\,\psi(Y_r, y) \mid \mathcal{S}_{t,\delta}) = E(\zeta_r\,\psi(Y_r, y) \mid X_r)$ a.s. and, for $\kappa \ge 2$, $E(\zeta_r\,|\psi(Y_r, y)|^\kappa \mid \mathcal{S}_{t,\delta}) = E(\zeta_r\,|\psi(Y_r, y)|^\kappa \mid X_r)$ a.s.

    3. There exist $\beta > 0$ and a constant $c > 0$ such that, for any $(x, x') \in E^2$, $|m_\psi(x, y) - m_\psi(x', y)| \le c\,d^\beta(x, x')$.

    4. For any $\kappa_1 \ge 2$, $E(|\psi(Y_r, y)|^{\kappa_1} \mid \mathcal{S}_{t,\delta}) = E(|\psi(Y_r, y)|^{\kappa_1} \mid X_r)$ a.s. The functions $W_{\kappa_1}(x, y) := E(|\psi(Y, y)|^{\kappa_1} \mid X = x)$ and $\bar W_{\kappa_1}(x, y) := E(|\psi(Y, y) - m_\psi(x, y)|^{\kappa_1} \mid X = x)$ are continuous in a neighbourhood of $x$, and $\sup_{x\in C, y\in S} |W_{\kappa_1}(x, y)| < \infty$ a.s.

    5. $E(\zeta_r \mid \mathcal{S}_{t,\delta}) = E(\zeta_r \mid X_r) = p(X_r)$ a.s. and, for any $x \in E$, $\sup_{\{z : d(x,z) \le u\}} |p(z) - p(x)| = o(1)$ a.s. as $u \to 0$.

For $j = 1, 2$, define the following moments, which are independent of $x \in E$: $$M_j = K^j(1) - \int_0^1 (K^j)'(u)\,\tau_0(u)\,du, \qquad (6)$$ where $K^j(\cdot)$ denotes the $j$th power of the kernel $K(\cdot)$ and $(K^j)'$ its first derivative.

2.1. Comments on the assumptions

Condition (A1) is related to the choice of the kernel $K$ and is very usual in nonparametric functional estimation. Note that a Parzen symmetric kernel is not adequate in this context since the random process $D_t = d(x, X_t)$ is positive; therefore we consider $K$ with support $[0, 1]$. This is a natural generalisation of the assumption usually made on the kernel in the multivariate case, where $K$ is supposed to be a spherically symmetric density function. The assumptions $K(1) > 0$ and $K' < 0$ guarantee that $M_1 > 0$ for all limit functions $\tau_0$. In the case of non-smooth processes, $\tau_0$ may be equal to the Dirac $\delta$-function at 1; the condition $K(1) > 0$ is then needed to define the moments $M_j$, which are, in this case, determined by the value $K(1)$.

Conditions (A2)(i)–(ii) reflect the ergodicity property assumed on the continuous-time functional process. They play an important role in studying the asymptotic properties of the estimator. The functions $f_{t,s}$ and $f$ play the same role as the conditional and unconditional densities in the finite-dimensional case, whereas $\phi(u)$ characterises the impact of the radius $u$ on the small ball probability as $u$ goes to 0. Several examples satisfying these conditions are given in Laïb and Louani (2010) for discrete-time functional data processes. Some other examples satisfying this condition are given in Didi and Louani (2014), where the observations $(X_t, Y_t)_{0\le t\le T}$ are sampled from an ergodic continuous-time process taking values in the space $\mathbb{R}^d \times \mathbb{R}$.

Condition (A2)(iii) involves the ergodic nature of the process, where the random function $f_{t,t-\delta}$ belongs to the space of continuous functions. Note that approximating the integral $\int_0^T f_{t,t-\delta}(x)\,dt$ by its Riemann sum, $T^{-1}\int_0^T f_{t,t-\delta}(x)\,dt \approx n^{-1}\sum_{j=1}^n f_{j\delta,(j-1)\delta}(x)$, allows one to easily prove that the sequence $(f_{j\delta,(j-1)\delta}(x))_{j\ge1}$ is stationary and ergodic (see Didi and Louani 2014). (A2)(iv) is a usual condition when dealing with functional data, whereas (A2)(v) is a consequence of the ergodicity assumption.

(A3)(ii) is a Hölder-type assumption that requires a certain smoothness of the regression operator $m_\psi(\cdot, y)$. Such an assumption is commonly used in nonparametric estimation. (A3)(iii) is a smoothness condition on the $\kappa$th centred conditional moments of $\psi(Y, y)$. (A3)(iv) assumes the continuity of the conditional probability of observing a missing response. Finally, note that the moments $(M_j)_{j=1,2}$ are linked to the small ball probability function through $\tau_0$. One can refer to Ferraty et al. (2007) for a discussion on the choice of $\tau_0$, the kernel $K$ and the positivity of $(M_j)_{j=1,2}$.

Discussion of assumptions (A3)(i)–(ii). These hypotheses are Markov-type conditions that characterise the conditional moments of $\psi(Y, y)$. They are satisfied by a general class of processes including $\alpha$-mixing and non-$\alpha$-mixing processes as well as long memory and Bernoulli shift processes. As pointed out in Doukhan and Louhichi (1999), the main attraction of Bernoulli shift processes is that they provide examples of processes that are weakly dependent but not mixing. Following the discussion made in the introduction, we consider below some examples, in both the continuous-time and the discretised settings, where the predictor $X$ is a stationary and ergodic Markovian process that might be $\alpha$-mixing or not and satisfies conditions (A3)(i)–(ii).

First of all, let us recall the following definitions.

Definition 2.1

see Doukhan and Louhichi (1999)

Let $(\epsilon_i)_{i\in\mathbb{Z}}$ be a sequence of independent real-valued r.v.s and $F$ a measurable function defined on $\mathbb{R}^{\mathbb{Z}}$. A Bernoulli shift is a sequence $(U_i)_{i\in\mathbb{Z}}$ defined by $U_i = F(\epsilon_{i-j},\ j \in \mathbb{Z})$.

Definition 2.2

see Doukhan (2018, p. 60)

A centred second-order stationary process $(X_n)$ is called long-range dependent (LRD) if $\sum_{k=0}^\infty r_k^2 < \infty$ and $\sum_{k=0}^\infty |r_k| = \infty$, where $r_k = \mathrm{Cov}(X_k, X_0)$.

Definition 2.3

A process $(B_t^H)_{t\in\mathbb{R}}$ is called a fractional Brownian motion (fBm) with Hurst exponent $H \in (0, 1]$ if it is an almost surely continuous, centred Gaussian process with covariance $$\Gamma_H(s, t) = \mathrm{Cov}(B_t^H, B_s^H) = \tfrac{1}{2}\big(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\big), \qquad s, t \in \mathbb{R}.$$

Definition 2.4

see Lemma 4.2 in Maslowski and Pospíšil (2008)

A strictly stationary centred Gaussian process $(Y_t)_{t\ge0}$ is ergodic if $\lim_{t\to\infty} R(t) := \lim_{t\to\infty} E(Y(0)Y(t)) = 0$.

Example 2.5

Continuous-time long memory processes

Let $\lambda, \sigma > 0$ and consider the Langevin equation with fBm noise $B_t^H$ and initial condition $Z_0$: $$Z_t = Z_0 - \lambda\int_0^t Z_s\,ds + \sigma B_t^H, \qquad t \ge 0. \qquad (7)$$ Then, for each $H \in (0, 1)$, the Gaussian stationary Markovian fractional Ornstein–Uhlenbeck process $(X_t^H)_{t\ge0}$ defined as $$X_t^H := \sigma\int_{-\infty}^t e^{-\lambda(t-u)}\,dB_u^H, \qquad t \ge 0,$$ is the unique (a.s.) solution of (7) with initial condition $Z_0 = X_0^H$ (for more details, see Section 2, p. 5, in Cheridito et al. 2003).

Note that, for $H \in (\tfrac12, 1)$, the auto-covariance function of $(X_t^H)_{t\ge0}$ is similar to that of the increments of $(B_t^H)_{t\ge0}$. Therefore, $X_t^H$ is ergodic (by Definition 2.4) and exhibits long-range dependence, as detailed in Theorem 2.3 and the discussion at the end of page 8 in Cheridito et al. (2003).

Now, to check condition (A3)(i), consider the model $$\psi(Y_t, y) = m_\psi(X_t^H, y) + \epsilon_t,$$ where the $\epsilon_t$'s are centred, i.i.d. and independent of $X_t^H$. Let $\mathcal{S}_{t,\delta}$ be the $\sigma$-field $\sigma((X_s^H, \epsilon_s), (X_r^H) : 0 \le s < t,\ t \le r \le t + \delta)$.

It follows that, for any $r \ge t$, $E[\psi(Y_r, y) \mid \mathcal{S}_{t,\delta}] = E[m_\psi(X_r^H, y) + \epsilon_r \mid \mathcal{S}_{t,\delta}]$.

Since $(X_r^H, \epsilon_r)$ is Markovian, $E[\psi(Y_r, y) \mid \mathcal{S}_{t,\delta}] = m_\psi(X_r^H, y)$ almost surely. Thus condition (A3)(i) is satisfied.
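For readers who wish to experiment with Example 2.5, the following sketch simulates a discretised path of the dynamics (7): fractional Gaussian noise is generated exactly by a Cholesky factorisation of its covariance, and the Langevin equation is advanced by an Euler scheme (an illustrative simplification: the path is started at 0 rather than drawn from the stationary law).

```python
import numpy as np

def fgn(n, H, dt, rng):
    """Exact fractional Gaussian noise (increments of B^H over steps dt)
    via Cholesky factorisation of its covariance (a sketch, O(n^2) memory)."""
    k = np.arange(n)
    g = 0.5 * ((k + 1.0)**(2 * H) - 2.0 * k**(2 * H) + np.abs(k - 1.0)**(2 * H))
    C = g[np.abs(k[:, None] - k[None, :])] * dt**(2 * H)
    return np.linalg.cholesky(C) @ rng.standard_normal(n)

def fou_path(n=1000, H=0.7, lam=1.0, sigma=1.0, dt=0.01, seed=0):
    """Euler scheme for the Langevin equation (7) driven by fBm; its
    stationary solution is the fractional Ornstein-Uhlenbeck process."""
    rng = np.random.default_rng(seed)
    dB = fgn(n, H, dt, rng)
    Z = np.empty(n)
    Z[0] = 0.0
    for k in range(1, n):
        Z[k] = Z[k - 1] - lam * Z[k - 1] * dt + sigma * dB[k - 1]
    return Z
```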

Example 2.6

Discrete-time processes

As discussed above, in real life we do not observe the process continuously at every time $t \in [0, T]$; we rather observe a discretised version of it based on some sampling scheme. The following examples show that Assumption (A3)(i) is also satisfied for discretised processes.

(i) Long-memory discrete-time processes. Let $(\epsilon_t)_{t\in\mathbb{Z}}$ be a white noise process with variance $\sigma^2$, and let $I$ and $B$ be the identity and backshift operators, respectively. Giraitis and Leipus (1995) proved (see Theorem 1, p. 55) that the $k$-factor Gegenbauer process $$\prod_{i=1}^{k}(I - 2\nu_i B + B^2)^{d_i}\,X_t = \epsilon_t,$$ where $0 < d_i < 1/2$ if $|\nu_i| < 1$ or $0 < d_i < 1/4$ if $|\nu_i| = 1$, for $i = 1, \ldots, k$, is long memory, stationary, causal and invertible, and has the moving average representation $X_t = \sum_{j\ge0} \psi_j(d, \nu)\,\epsilon_{t-j}$ with $\sum_{j=0}^\infty \psi_j^2(d, \nu) < \infty$.

On the other hand, Guégan and Ladoucette (2001) showed that, if $(\epsilon_t)_{t\in\mathbb{Z}}$ is a Gaussian process, then the above process is not strong mixing, whereas the moving average representation of $(X_t)$ confirms that it is a stationary Gaussian and ergodic process.

(ii) The stationary solution of the linear Markov AR(1) process $X_i = \frac12 X_{i-1} + \epsilon_i$, where the $(\epsilon_i)$ are independent symmetric Bernoulli random variables taking values $1$ and $-1$, is not $\alpha$-mixing (see Andrews 1984). However, $(X_i)$ is a Markovian stationary and ergodic process.

(iii) Let $(u_i)$ be an i.i.d. sequence uniformly distributed on $\{1, \ldots, 9\}$, and set $X_t := \sum_{i=0}^\infty 10^{-i-1}\,u_{t-i}$, where the sequence $u_t, u_{t-1}, \ldots$ represents the decimals of $X_t$. The process $X = (X_t)$ is stationary and admits the following AR(1) representation: $$X_t = \tfrac{1}{10} X_{t-1} + \tfrac{1}{10} u_t = \tfrac{1}{10} X_{t-1} + \tfrac12 + \epsilon_t,$$ where $\epsilon_t = \tfrac{1}{10} u_t - \tfrac12$ is a strong white noise. This process is not $\alpha$-mixing (see Francq and Zakoïan 2010, Example A.3, p. 349), but it is ergodic.

To check hypothesis (A3)(i) for Examples (i), (ii) and (iii), consider the regression model $\psi(Y_i, y) = m_\psi(X_i, y) + \sigma_\psi(X_i)\,\eta_i$, where $(\eta_i)$ is a white noise process independent of $(X_i)$, and define the $\sigma$-field $\mathcal{G}_i = \sigma((X_1, \eta_1, \zeta_1), \ldots, (X_i, \eta_i, \zeta_i), X_{i+1})$. It is then easy to see that condition (A3)(i) is fulfilled. The discrete-time processes in Examples (i)–(iii) remain valid for the regression model developed in Section 3.5 in the context of sampling schemes.

3. Main results

In this section, we investigate several asymptotic properties of the continuous-time generalised regression estimator. Some particular cases related to specific choices of the function $\psi(\cdot, y)$, including the conditional cumulative distribution function and conditional quantiles, will also be discussed.

3.1. Almost sure consistency rates

3.1.1. Pointwise consistency

The following theorem establishes an almost sure pointwise consistency rate for $\hat m_{\psi,T}(x, y)$.

Theorem 3.1

Pointwise consistency

Assume that (A1)–(A3) hold true and that the following conditions are satisfied: $$\lim_{T\to\infty} T\phi(h_T) = \infty \quad\text{and}\quad \lim_{T\to\infty} \frac{\log T}{T\phi(h_T)} = 0. \qquad (8)$$ Then, for $T$ sufficiently large, we have $$\hat m_{\psi,T}(x, y) - m_\psi(x, y) = O_{a.s.}(h_T^\beta) + O_{a.s.}\Bigg(\sqrt{\frac{\log T}{T\phi(h_T)}}\Bigg). \qquad (9)$$

The proof of Theorem 3.1 is detailed in the supplementary material in Chaouch and Laïb (2023).

Remark 3.1

Theorem 3.1 generalises Theorem 1 of Laïb and Louani (2011), established in the context of discrete-time stationary and ergodic processes, and Theorem 3.4 of Ferraty et al. (2005), stated under a strong mixing assumption with completely observed response and with the support of $y$ reduced to one point. Moreover, the function $\phi(h_T)$ can decrease to zero at an exponential rate as $h_T$ goes to zero; therefore $h_T$ should be chosen to decrease to zero at a logarithmic rate.

3.1.2. Uniform consistency

To establish the uniform consistency, with rate, of the regression operator, we need some additional definitions and assumptions that allow us to express the uniform convergence rate as a function of the entropy number. Let $C$ and $S$ be compact sets in $E$ and $\mathbb{R}$, respectively. Consider, for any $\epsilon > 0$, the $\epsilon$-covering number of the compact set $C$, say $N_\epsilon := N(\epsilon, C, d)$, defined by $$N_\epsilon := \min\{n : \text{there exist } c_1, \ldots, c_n \in C \text{ such that } \forall x \in C \text{ we can find } 1 \le i \le n \text{ for which } d(x, c_i) < \epsilon\}.$$ The number $N_\epsilon$ measures how full the class $C$ is. The finite set of points $c_1, c_2, \ldots, c_{N_\epsilon}$ is called an $\epsilon$-net of $C$ if $C \subset \bigcup_{k=1}^{N_\epsilon} B(c_k, \epsilon)$, where $B(c_k, \epsilon)$ is the ball centred at $c_k$ and of radius $\epsilon$ with respect to the topology induced by the semi-metric $d(\cdot,\cdot)$. The quantity $\varphi_C(\epsilon) = \log(N_\epsilon)$ is called the Kolmogorov $\epsilon$-entropy of the set $C$; it may be seen as a tool to measure the complexity of the subset $C$, in the sense that high entropy means that a large amount of information is needed to describe an element of $C$ with an accuracy $\epsilon$. Several examples of $\varphi_C(\epsilon)$ covering special cases of functional processes are given in Ferraty et al. (2010) and Laïb and Louani (2011).
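For a finite collection of curves, a greedy $\epsilon$-net gives a simple computable upper bound on $N_\epsilon$, and hence on the entropy $\varphi_C(\epsilon) = \log N_\epsilon$. The sketch below (illustrative names and an $L^2$-type semi-metric; the greedy net is not minimal) is one way to get a feel for these quantities:

```python
import numpy as np

def covering_number(curves, eps, d):
    """Greedy epsilon-net for a finite collection of curves: its size upper
    bounds N_eps, hence log(size) upper bounds the Kolmogorov entropy."""
    centers = []
    for x in curves:
        if all(d(x, c) >= eps for c in centers):
            centers.append(x)
    return len(centers)

# illustration with an L2 semi-metric on [0, 1]
rng = np.random.default_rng(0)
s = np.linspace(0.0, 1.0, 100)
curves = [np.cos(z + np.pi * s) for z in rng.normal(size=200)]
d_L2 = lambda x1, x2: np.sqrt(np.sum((x1 - x2)**2) * (s[1] - s[0]))
N_eps = covering_number(curves, eps=0.5, d=d_L2)
print(N_eps, np.log(N_eps))   # entropy estimate phi_C(eps) = log N_eps
```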

(U0)

Assume that (A2) holds uniformly in the following sense:

  1. (A2)(i) and (A2)(ii) hold true with the remaining term $o(\phi(u))$ uniform in $x$.

  2. $\lim_{T\to\infty} \sup_{x\in C} \Big|\frac{1}{T}\int_0^T f_{t,t-\delta}(x)\,dt - f(x)\Big| = 0$ a.s.

  3. $T^{-1}\int_0^T b_{t,\alpha_0}(x)\,dt \to D_{\alpha_0}(x)$ as $T \to \infty$, with $0 < \sup_{x\in C} D_{\alpha_0}(x) < \infty$.

  4. $b_0 < \inf_{x\in C} f(x) \le \sup_{x\in C} f(x) < \infty$ for some nonnegative real number $b_0$.

  5. $\inf_{x\in C} p(x) > b_1$ for some nonnegative real number $b_1$.

(U1)

The kernel function K satisfies the following conditions:

  1. $K$ is a Hölder function of order 1 with constant $a_K$.

  2. There exist two constants $a_2$ and $a_3$ such that $0 < a_2 \le K(x) \le a_3 < \infty$ for any $x \in [0, 1)$.

(U2)

For $\ell = 1, 2$, the sequence of random variables $(\psi^\ell(Y_t, y))_t$ is ergodic and $E(|\psi(Y_0, y)|^\ell) < \infty$.

(U3)

There exist $c_\psi > 0$ and a nonnegative real number $\gamma$ such that, for any $y \in S$, $$\sup_{y' \in [y-u,\,y+u] \cap S} |\psi(Y_t, y) - \psi(Y_t, y')| \le c_\psi\,u^\gamma.$$

(U4)

Let $T_n$ be the integer part of $T$ and suppose that, for $T$ large enough, $$\frac{(\log T)^2}{T\phi(h_T)} < \varphi_C(\epsilon_n) < \frac{T\phi(h_T)}{\log T} \quad\text{with}\quad \epsilon_n = \frac{\log T_n}{T_n}.$$

Conditions in (U0) are standard in this context for obtaining uniform consistency rates. Condition (U1) is usually used when dealing with nonparametric estimation for functional data, and (U2) requires the existence of the moments up to order 2 of $\psi(Y, y)$. (U3) is a regularity condition on the function $\psi(\cdot, y)$ which is necessary to obtain the uniform consistency result over the compact $S$. (U4) allows us to cover the subset $C$ with a finite number of balls and to express the convergence rate in terms of the Kolmogorov entropy of this subset. A similar condition was used in Ferraty et al. (2010), where the authors pointed out that, for a radius not too large, one requires the quantity $\varphi_C(\log T_n/T_n)$ to be neither too small nor too large. This condition seems to satisfy this requirement, since it implies that $\varphi_C(\log T_n/T_n)/(T\phi(h_T))$ goes to 0 for sufficiently large $T$. The examples given in Ferraty et al. (2010) and Laïb and Louani (2011) satisfy (U4).

Theorem 3.2 states the uniform consistency rate of the kernel regression estimator. It generalises Theorem 2 of Ferraty et al. (2010), established in the i.i.d. case, and the corresponding result in Laïb and Louani (2011), established in the context of discrete-time stationary and ergodic processes with completely observed response.

Theorem 3.2

Uniform consistency

Assume (A1), (U0)–(U4) and (A3) hold true. Moreover, suppose the conditions in (8) are satisfied and $$\sum_{n\ge1} n^\gamma \exp\Big\{(1 - \eta)\,\varphi_C\Big(\frac{\log n}{n}\Big)\Big\} < \infty \quad\text{for some } \eta > 0, \text{ where } \gamma \text{ is as in (U3)}. \qquad (10)$$ Then we have $$\sup_{y\in S}\sup_{x\in C} |\hat m_{\psi,T}(x, y) - m_\psi(x, y)| = O_{a.s.}(h_T^\beta) + O_{a.s.}\Bigg(\sqrt{\frac{\varphi_C(\epsilon_T)}{T\phi(h_T)}}\Bigg) \quad\text{as } T \to \infty. \qquad (11)$$

The proof of Theorem 3.2 is detailed in the supplementary material in Chaouch and Laïb (2023).

3.2. Asymptotic conditional bias and risk evaluation

Before evaluating the conditional bias, let us introduce some additional notation and consider the following assumption:

(BC1) Recall that $D_t(x) = d(X_t, x)$ and, for any $t \ge 0$, denote $$E[m_\psi(X_t, y) - m_\psi(x, y) \mid D_t(x), \mathcal{F}_{t-\delta}] = E[m_\psi(X_t, y) - m_\psi(x, y) \mid D_t(x)] =: \Psi_y(D_t(x)).$$ Assume that the function $\Psi_y$ is differentiable at 0 and satisfies $\Psi_y(0) = 0$ and $\Psi_y'(0) \ne 0$ for any $y \in \mathbb{R}$. This condition was introduced in Ferraty et al. (2007) and used by Laïb and Louani (2010) to evaluate the conditional bias. The introduction of $\Psi_y(\cdot)$ allows one to integrate with respect to the real random variable $D_t(x)$ rather than the pair of random variables $(D_t(x), X_t)$, where $X_t$ is a functional continuous random variable.

The following proposition gives an asymptotic expression of the conditional bias term, which generalises Proposition 1 of Laïb and Louani (2010), for the discrete-time estimator, to our setting. Its proof is similar to the one in the discrete-time framework and is therefore omitted.

Proposition 3.3

Conditional Bias

Under assumptions (A1)–(A3), (BC1) and the conditions in (8), we have $$B_T(x, y) = \frac{h_T\,\Psi_y'(0)}{M_1}\Big[K(1) - \int_0^1 (sK(s))'\,\tau_0(s)\,ds + o_{a.s.}(1)\Big] + O_{a.s.}\Bigg(h_T\sqrt{\frac{\log T}{T\phi(h_T)}}\Bigg).$$

The next result gives an explicit expression of the asymptotic quadratic risk associated with the estimator $\hat m_{\psi,T}(x, y)$.

Theorem 3.4

Quadratic risk

Suppose that Assumptions (A1)–(A3) hold true. Then, whenever $p(x) > 0$ and $f(x) > 0$, we have, for a fixed $(x, y) \in E \times \mathbb{R}$, $$\mathrm{MSE}(x, y) := E\big[(\hat m_{\psi,T}(x, y) - m_\psi(x, y))^2\big] = A_1 h_T^2\Bigg[A_1 + O\Bigg(\sqrt{\frac{\log T}{T\phi(h_T)}}\Bigg)\Bigg] + \frac{A_2(x, y)}{T\phi(h_T)},$$ where $$A_1 = \frac{\Psi_y'(0)}{M_1}\Big[K(1) - \int_0^1 (sK(s))'\,\tau_0(s)\,ds + o(1)\Big] \quad\text{and}\quad A_2(x, y) = \frac{4\,\big(W_2(x, y) + (m_\psi(x, y))^2\big)\,M_2}{p(x)\,M_1^2\,f(x)}.$$

Remark 3.2

  1. Note that, for sufficiently large $T$, the expression of the MSE becomes $A_1^2 h_T^2 + A_2(x, y)/(T\phi(h_T))$. This result generalises the one in Chaouch and Laïb (2019), established in the framework of real-valued continuous-time processes. Note, however, that for finite-dimensional continuous-time processes with MAR response, the bias term obtained in Chaouch and Laïb (2019) is of order $h_T^2$, which is smaller than the order $h_T$ given in Proposition 3.3. The increase in the bias term is due to the infinite-dimensional nature of the functional space.

  2. The mean squared error can be used as theoretical guidance to select the 'optimal' bandwidth by minimising the quantity $A_1^2 h_T^2 + A_2(x, y)/(T\phi(h_T))$ with respect to $h_T$. However, $A_1$ and $A_2(x, y)$ depend on some unknown quantities, which should be replaced by their empirically consistent estimators, namely $\Psi'_{y,T}(0)$, $(M_{j,T})_{j=1,2}$, $\tau_{0,T}$, $p_T$, $W_{2,T}$ and $f_T$. Note that $\Psi_y$ may be viewed as a real regression function with response variable $m_\psi(X, y) - m_\psi(x, y)$ and predictor $d(X, x)$. Its derivative at zero may then be estimated by a kernel regression estimate $\Psi'_{y,T}(0)$ after replacing $m_\psi(\cdot, y)$ by its estimator $\hat m_{\psi,T}(\cdot, y)$.

3.3. Asymptotic normality

The following theorem establishes the asymptotic distribution of the estimator.

Theorem 3.5

Assume that conditions (A1)–(A3) are fulfilled. Suppose that, for $\beta$ as defined in (A3)(ii), the following conditions hold true: $$\lim_{T\to\infty} T\phi(h_T) = \infty, \quad h_T^\beta\sqrt{T\phi(h_T)} = o(1) \quad\text{and}\quad h_T^\beta(\log T)^{1/2} = o(1) \quad\text{as } T \to \infty. \qquad (12)$$ Then, for any $(x, y) \in E \times S$ such that $f(x) > 0$, we have $$\sqrt{T\phi(h_T)}\,\big(\hat m_{\psi,T}(x, y) - m_\psi(x, y)\big) \xrightarrow{d} \mathcal{N}(0, \sigma^2(x, y)),$$ where $$\sigma^2(x, y) \le \frac{1}{f(x)}\,\frac{M_2}{M_1^2\,p(x)}\,\bar W_2(x, y) =: \frac{1}{f(x)}\,\tilde V(x, y) \quad\text{as } T \to \infty \qquad (13)$$ and $\bar W_2(x, y) := E[(\psi(Y, y) - m_\psi(x, y))^2 \mid X = x]$.

Note that the statement (13) gives only an upper bound on the asymptotic variance $\sigma^2(x, y)$. The following proposition gives an estimate of $\tilde V(x, y)$ that will be needed to construct confidence intervals for the unknown operator $m_\psi(x, y)$.

Proposition 3.6

Suppose the conditions of Theorem 3.5 hold and $\sigma^2(x, y) > 0$; then $$\hat V_T(x, y) := \frac{M_{2,T}}{M_{1,T}^2}\,\frac{\bar W_{2,T}(x, y)}{T\,F_{x,T}(h_T)\,p_T(x)}, \qquad (14)$$ is a consistent estimator of the asymptotic variance of $\hat m_{\psi,T}(x, y)$. The quantities $M_{1,T}$, $M_{2,T}$, $\bar W_{2,T}$, $p_T(x)$ and $F_{x,T}$ are empirical versions of $M_1$, $M_2$, $\bar W_2$, $p(x)$ and $F_x$, respectively.

$M_{1,T}$ and $M_{2,T}$ are calculated by replacing $\tau_0$, given in (A2)(iv), by its empirical version $$\tau_{0,T}(u) = \frac{F_{x,T}(u\,h_T)}{F_{x,T}(h_T)}, \quad\text{where}\quad F_{x,T}(u) = \frac{1}{T}\int_0^T \mathbb{1}_{\{d(x, X_t) \le u\}}\,dt.$$ On the other hand, $\bar W_{2,T}(x, y)$ and $p_T(x)$ are given by $$\bar W_{2,T}(x, y) = \frac{\int_0^T \zeta_t\,(\psi(Y_t, y))^2\,\Delta_t(x)\,dt}{\int_0^T \zeta_t\,\Delta_t(x)\,dt} - \big(\hat m_{\psi,T}(x, y)\big)^2 \quad\text{and}\quad p_T(x) = \frac{\int_0^T \zeta_t\,\Delta_t(x)\,dt}{\int_0^T \Delta_t(x)\,dt}.$$
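The sketch below assembles these plug-in quantities on a regular time grid, with Riemann sums in place of the integrals and a simple discretisation of the $u$-integral defining $M_{1,T}$ and $M_{2,T}$ (the quadratic kernel and all names are illustrative assumptions):

```python
import numpy as np

def variance_estimate(psi_Y, zeta, D, h, T):
    """Sketch of the plug-in quantities of Proposition 3.6 on a regular grid
    (Riemann sums for the time integrals, quadratic kernel assumed)."""
    K = lambda u: 0.75 * (1 - u**2) * (u < 1)
    Delta = K(D / h)
    F_xT = np.mean(D <= h)                                   # F_{x,T}(h_T)
    p_T = np.sum(zeta * Delta) / np.sum(Delta)               # p_T(x)
    m_hat = np.sum(zeta * psi_Y * Delta) / np.sum(zeta * Delta)
    W2_T = np.sum(zeta * psi_Y**2 * Delta) / np.sum(zeta * Delta) - m_hat**2
    # tau_{0,T}(u) = F_{x,T}(u h_T) / F_{x,T}(h_T) on a grid of u-values
    u = np.linspace(1e-3, 1.0, 500)
    du = u[1] - u[0]
    tau = np.array([np.mean(D <= v * h) for v in u]) / F_xT
    # M_{j,T} = K^j(1) - int_0^1 (K^j)'(u) tau_{0,T}(u) du, j = 1, 2
    M1, M2 = (K(1.0)**j - np.sum(np.gradient(K(u)**j, du) * tau) * du
              for j in (1, 2))
    V_hat = (M2 / M1**2) * W2_T / (T * F_xT * p_T)           # equation (14)
    return V_hat
```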

3.4. Continuous-time confidence intervals

Using the non-decreasing property of the standard Gaussian cumulative distribution function, the estimator $\hat V_T(x, y)$ defined in (14), Proposition 3.6 and Theorem 3.5, the following corollary provides estimated confidence intervals for $m_\psi(x, y)$ at any fixed $x$.

Corollary 3.7

Assume the conditions of Theorem 3.5 are fulfilled and that the conditions in (12) are replaced by $$\lim_{T\to\infty} T\,F_{x,T}(h_T) = \infty \quad\text{and}\quad \lim_{T\to\infty} h_T^\beta\,\sqrt{T\,F_{x,T}(h_T)} = 0. \qquad (15)$$ Then, for any $0 < \alpha < 1$, the $(1-\alpha)$ confidence interval for $m_\psi(x, y)$ is $$I_{1-\alpha}(x, y) = \Big[\hat m_{\psi,T}(x, y) - c_{1-\alpha/2}\sqrt{\hat V_T(x, y)}\,;\ \hat m_{\psi,T}(x, y) + c_{1-\alpha/2}\sqrt{\hat V_T(x, y)}\Big], \qquad (16)$$ where $c_\alpha$ is the $\alpha$th quantile of the standard normal distribution.

These intervals are similar to those given in Remark 2 of Laïb and Louani (2010) for the discrete-time ergodic context with complete data, and to those obtained in Ling et al. (2015) for discrete-time stationary ergodic data with missing at random responses.

3.5. Sampling schemes and computation of the confidence intervals

In the previous section, the process was supposed to be observable over $[0, T]$. However, in practice the data are often collected according to a sampling scheme, since it is difficult to observe a path continuously at any time $t$ over the interval $[0, T]$. Hereafter, we briefly discuss the effect of a sampling scheme on the construction of confidence intervals for the regression function $m_\psi(x, y)$. Assume that the data are sampled, either regularly, irregularly or even randomly, from an underlying continuous-time process at instants $(t_k)_{k=1,\ldots,n}$. For the sake of simplicity, we consider here the case where the instants $(t_k)$ are irregularly spaced, that is, $\inf_{1\le k< n} |t_{k+1} - t_k| = \delta > 0$. Now, for $k \in \{1, \ldots, n\}$, we define the following increasing families of $\sigma$-algebras: $\mathcal{F}_k := \mathcal{F}_{t_k} = \sigma((X_{t_1}, Y_{t_1}), \ldots, (X_{t_k}, Y_{t_k}))$ and $\mathcal{G}_k := \mathcal{G}_{t_k} = \sigma((X_{t_1}, Y_{t_1}), \ldots, (X_{t_k}, Y_{t_k}); X_{t_{k+1}})$. The purpose then consists in estimating $m_\psi(x, y)$ given the discrete-time ergodic stationary process $(X_{t_k}, Y_{t_k}, \zeta_{t_k})_{1\le k\le n}$ sampled from the underlying continuous-time process $\{(X_t, Y_t, \zeta_t)\}_{0\le t\le T}$. In the case of a regular sampling scheme, that is $T = n\delta$, the regression function is $$m_\psi(x, y) := \frac{E(\zeta_{t_k}\,\psi(Y_{t_k}, y) \mid X_{t_k} = x)}{E(\zeta \mid X_{t_k} = x)}, \quad 1 \le k \le n,$$ and its estimator $\hat m_{\psi,T}(x, y)$ defined in (3) becomes $$\hat m_{\psi,n}(x, y) = \frac{\sum_{k=1}^n \zeta_{t_k}\,\psi(Y_{t_k}, y)\,K\big(\frac{d(x, X_{t_k})}{h_n}\big)}{\sum_{k=1}^n \zeta_{t_k}\,K\big(\frac{d(x, X_{t_k})}{h_n}\big)}, \qquad t_k = k\delta\ (1 \le k \le n). \qquad (17)$$ Note that Theorem 3.5 holds for the estimate $\hat m_{\psi,n}(x, y)$ when replacing $T$ by $n\delta$. The limiting law is a Gaussian random variable with mean zero and variance function $\sigma^2(x, y) \le \frac{1}{f(x)}\,\frac{M_2}{M_1^2\,p(x)}\,\bar W_2(x, y)$. Making use of Corollary 3.7 and following similar steps as in Laïb and Louani (2010), it follows that, for any $0 < \alpha < 1$, the $(1-\alpha)$ asymptotic confidence interval for $m_\psi(x, y)$ is $$\hat m_{\psi,n}(x, y) \pm c_{1-\alpha/2}\sqrt{\frac{M_{n,2}}{M_{n,1}^2}\,\frac{\bar W_{n,2}(x, y)}{n\,F_{x,n}(h_n)\,p_n(x)}}, \quad\text{as } n \to \infty, \qquad (18)$$ where $c_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the standard normal distribution.
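As a sketch of how (17) and (18) can be computed from sampled data (assuming the moment estimates $M_{n,1}$, $M_{n,2}$ have already been obtained, e.g. as in the sketch after Proposition 3.6, and that the kernel is quadratic):

```python
import numpy as np
from scipy.stats import norm

def mhat_and_ci(psi_Y, zeta, D, h, M1, M2, alpha=0.05):
    """Sketch of the sampled-data estimator (17) together with the plug-in
    confidence interval (18); M1, M2 stand for M_{n,1}, M_{n,2}."""
    Kw = 0.75 * (1 - (D / h)**2) * (D / h < 1)      # quadratic kernel weights
    m_hat = np.sum(zeta * psi_Y * Kw) / np.sum(zeta * Kw)
    F_xn = np.mean(D <= h)                          # F_{x,n}(h_n)
    p_n = np.sum(zeta * Kw) / np.sum(Kw)            # p_n(x)
    W2 = np.sum(zeta * psi_Y**2 * Kw) / np.sum(zeta * Kw) - m_hat**2
    half = norm.ppf(1 - alpha / 2) * np.sqrt(
        (M2 / M1**2) * W2 / (len(D) * F_xn * p_n))
    return m_hat, (m_hat - half, m_hat + half)
```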

4. Simulation study

This section aims to discuss numerically some aspects related to continuous-time processes that might affect the quality of estimation of the operator $m_\psi(x, y)$. Here we consider $\psi(Y_t, y) = Y_t$, so that $m_\psi(x, y) = m(x) = E(Y_t \mid X_t = x)$ is the conditional expectation of $Y_t$ given $X_t = x$. The first simulation compares the quality of estimation of $m(x)$ based on continuous-time and discrete-time processes. In the second simulation, we discuss the choice of the 'optimal' sampling mesh $\delta$ in the case of continuous-time processes and assess its sensitivity to the missing at random mechanism. Finally, the third simulation discusses the effect of the MAR rate on the coverage rate and length of the estimated confidence intervals.

4.1. Simulation 1: continuous-time versus discrete-time estimators

In this first simulation, we compare the estimation of the regression operator when discrete- and continuous-time processes are considered. We want to know whether considering a continuous-time process may improve the quality of the predictions. We suppose that the functional space is $E = L^2([-1, 1])$, endowed with its natural norm. The generation of the continuous-time processes $(\{X_t(s) : s \in [-1, 1]\}, Y_t)_{t\in[0,T]}$ is obtained through the following steps:

  1. First, we simulate an Ornstein–Uhlenbeck (OU) process $(Z_t)_{t\ge0}$, solution of the following stochastic differential equation: $$dZ_t = 2(5 - Z_t)\,dt + 7\,dW_t, \qquad (19)$$ where $W_t$ denotes a Wiener process. Here, we take $dt = 0.005$.

  2. Let $\Gamma(\cdot)$ be the operator mapping $\mathbb{R}$ into $L^2([-1, 1])$ defined, for any $z \in \mathbb{R}$, as follows: $$\Gamma(z) := (1 + \lfloor z\rfloor - z)\,P_{\mathrm{num}(z)} + (z - \lfloor z\rfloor)\,P_{\mathrm{num}(z)+1},$$ where $P_j$ is the Legendre polynomial of degree $j$, $\mathrm{num}(z) := 1 + 2\lfloor z\rfloor\,\mathrm{sign}(z) - \mathrm{sign}(z)(1 + \mathrm{sign}(z))/2$ and $\lfloor\cdot\rfloor$ denotes the floor function.

  3. We consider that the curves are sampled at 400 equispaced values in $[-1, 1]$ and defined, for any $t \in [0, T]$, as $X_t(s) = \Gamma(Z_t)(s)$, $s \in [-1, 1]$.

  4. To generate the real-valued process $(Y_t)_{t\in[0,T]}$, the following nonlinear functional regression model is considered: $$Y_t = m(X_t) + \varepsilon_t, \qquad (20)$$ where $m(x) := \int_{-1}^1 x^2(s)\,ds$ and $\varepsilon_t = U_t - U_{t-1}$, with $U_t$ a Wiener process independent of $X_t$.

Observe that the OU process $\{Z_t : t \in [0, T]\}$ is a real-valued continuous-time process (since $dt$ tends to zero). The operator $\Gamma(\cdot)$ transforms each observation of the process $Z_t$ into a curve through the Legendre polynomials. In this way, the functional variable $X$ is generated continuously, as is the process $(Z_t)$. Moreover, note that steps 1, 2 and 3 simulate the continuous-time functional process $\{X_t(s) : s \in [-1, 1]\}_{t\in[0,T]}$, whereas in step 4 the real-valued continuous-time process $(Y_t)_{t\in[0,T]}$ is generated; a code sketch of the whole scheme is given below. A sample of 20 simulated curves is displayed in Figure 1 (left) and an example of the real-valued process $(Y_t)$ is given in Figure 1 (right).
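A compact sketch of this generation scheme (assumed simplifications: an Euler–Maruyama discretisation of (19), and unit-variance Gaussian draws standing in for the unit-lag Wiener increments $U_t - U_{t-1}$):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def simulate_sample(T=50.0, dt=0.005, n_s=400, seed=0):
    """Sketch of steps 1-4 of Simulation 1."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    # step 1: OU process dZ = 2(5 - Z)dt + 7 dW, Euler-Maruyama scheme
    Z = np.empty(n)
    Z[0] = 5.0
    for k in range(1, n):
        Z[k] = Z[k - 1] + 2.0 * (5.0 - Z[k - 1]) * dt \
               + 7.0 * np.sqrt(dt) * rng.standard_normal()
    # steps 2-3: X_t(s) = Gamma(Z_t)(s) on 400 equispaced points of [-1, 1]
    s = np.linspace(-1.0, 1.0, n_s)
    def Gamma(z):
        fz, sg = np.floor(z), np.sign(z)
        j = int(1 + 2 * fz * sg - sg * (1 + sg) / 2)     # num(z)
        P = lambda deg: Legendre.basis(int(deg))(s)      # Legendre polynomial
        return (1 + fz - z) * P(j) + (z - fz) * P(j + 1)
    X = np.array([Gamma(z) for z in Z])
    # step 4: Y_t = int_{-1}^{1} X_t^2(s) ds + eps_t, with eps_t standing in
    # for the unit-lag Wiener increments U_t - U_{t-1} (variance one)
    m_X = np.sum(X**2, axis=1) * (s[1] - s[0])
    Y = m_X + rng.standard_normal(n)
    return s, X, Y
```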

Figure 1. Left: A sample of 20 simulated curves $\{X_t(s) : s \in [-1, 1]\}$. Right: A realisation of the process $(Y_t)_{t\in[0,200]}$.

Now, our purpose is to compare, in terms of estimation accuracy, the continuous-time estimator with the discrete-time one for different values of $T = 50, 200, 1000$ and several missing at random rates. It is worth noting that the continuous-time process $(X_t, Y_t)$ is observed at every instant $t = \delta, 2\delta, \ldots, n\delta$, where $\delta = 0.005$ and $n = T/\delta$. However, the discrete-time process is observed only at the instants $t = 1, 2, \ldots, n$.

As in Ferraty et al. (2013) and Ling et al. (2015), we consider that the missing at random mechanism is driven by the following probability distribution: $$p(x) = \mathbb{P}(\zeta_t = 1 \mid X_t = x) = \mathrm{expit}\Big(\int_{-1}^1 x^2(s)\,ds\Big), \qquad (21)$$ where $\mathrm{expit}(u) = e^u/(1 + e^u)$ for $u \in \mathbb{R}$. Now, we specify the tuning parameters on which the estimator given in (3) depends. We choose the quadratic kernel defined as $K(u) = \frac34(1 - u^2)\,\mathbb{1}_{(0,1)}(u)$ and, because the curves are smooth enough, we choose as semi-metric the $L^2$-norm of the second derivatives of the curves, that is, for $t_1 \ne t_2$, $$d(X_{t_1}, X_{t_2}) = \Big(\int_{-1}^1 \big[X_{t_1}^{(2)}(s) - X_{t_2}^{(2)}(s)\big]^2\,ds\Big)^{1/2}. \qquad (22)$$ We used the local cross-validation method on the $\kappa$-nearest neighbours introduced in Ferraty and Vieu (2006, p. 116) to select the optimal bandwidth for both the discrete- and continuous-time regression estimators. The accuracy of the two estimators is evaluated over $M = 500$ replications and measured, at each replication $j = 1, \ldots, M$, by the squared errors $SE_T^j := (\hat m_T^j(x) - m(x))^2$ and $SE_n^j := (\hat m_n^j(x) - m(x))^2$ for the continuous-time and discrete-time estimators, respectively. Observe that the discrete-time estimator of the regression operator is defined as $$\hat m_n(x) := \frac{\sum_{t=1}^n \zeta_t\,Y_t\,\Delta_t(x)}{\sum_{t=1}^n \zeta_t\,\Delta_t(x)}.$$
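The semi-metric (22) and the MAR mechanism (21) can be mimicked as follows (a sketch: second derivatives are approximated by finite differences on the discretisation grid, and all names are illustrative):

```python
import numpy as np

def semimetric_second_deriv(x1, x2, s):
    """Sketch of the semi-metric (22): L2 distance between second
    derivatives, computed here by finite differences on the grid s."""
    d2 = lambda x: np.gradient(np.gradient(x, s), s)
    return np.sqrt(np.sum((d2(x1) - d2(x2))**2) * (s[1] - s[0]))

def mar_mask(X, s, seed=0):
    """Draw the MAR indicators zeta_t of (21), p(x) = expit(int x^2(s) ds)."""
    rng = np.random.default_rng(seed)
    p = 1.0 / (1.0 + np.exp(-np.sum(X**2, axis=1) * (s[1] - s[0])))
    return (rng.uniform(size=len(X)) < p).astype(int)
```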

To get a better idea about the variability of the errors, Table 1 summarises the distribution of the squared errors (multiplied by $10^2$) $(SE_T^j)_{j=1,\ldots,M}$ and $(SE_n^j)_{j=1,\ldots,M}$. It shows that the continuous-time regression estimator is more accurate than the discrete-time one. Moreover, when $T$ increases, the squared errors decrease faster when working with the continuous-time process.

Table 1. Summary statistics of $(SE^j)_{j=1,\ldots,500}$ for the discrete- and continuous-time estimators of the regression function.

4.2. Simulation 2: optimal sampling mesh selection

The purpose of this simulation is to investigate another aspect related to continuous-time processes: the selection of the 'optimal' sampling mesh, one of the most important topics in continuous-time processes.

First of all, we generate a continuous-time functional data process according to the following equation: $$X_t(s) = Z_t\,(1 - \sin(s\pi/3)), \qquad s \in [0, \pi/3] \ \text{and}\ t \in [0, T],$$ where $Z_t$ is an OU process, solution of the stochastic differential equation (19), practically observed at the instants $t = \delta, 2\delta, \ldots$, with $n = 200$ fixed. Here, we take different values of the sampling mesh $\delta$, calculate the corresponding empirical version of the Mean Integrated Square Error ($\mathrm{MISE}(\delta)$) and identify the optimal mesh, say $\delta^*$, that minimises $\mathrm{MISE}(\delta)$. Note that each curve observed at instant $t$ is discretised at 100 equidistant points over the interval $[0, \pi/3]$. The response variable is obtained following the nonlinear functional regression model (20), where the operator $m(\cdot)$ is defined as $$m(X_t) = \Big(\int_0^{\pi/3} X_t'(s)\,ds\Big)^2 \quad\text{and}\quad \varepsilon_t \sim \mathcal{N}(0, 0.075).$$ Moreover, the missing at random mechanism in this simulation is the same as in the first simulation, as per Equation (21). For the tuning parameters used to build the estimator, we considered the quadratic kernel and, given the shape of the true regression operator, which depends on the first derivative of the functional predictor, the Euclidean distance between the first-order derivatives of the curves is adopted as semi-metric. Finally, the bandwidth is selected according to the local cross-validation method based on the $\kappa$-nearest neighbours, as detailed in Ferraty and Vieu (2006, p. 116).

For each value of the sampling mesh $\delta$, the regression operator $m(\cdot)$ is estimated over a grid of 50 different fixed curves and the whole procedure is repeated over $M = 500$ replications. Finally, the empirical MISE is calculated, for each sampling mesh $\delta$, according to the following equation: $$\mathrm{MISE}(\delta) := \frac{1}{M}\sum_{k=1}^M \frac{1}{50}\sum_{j=1}^{50}\big(m(x_j) - \hat m_{n,\delta}^k(x_j)\big)^2.$$ Observe that $\hat m_{n,\delta}^k(\cdot)$, the estimator of $m(\cdot)$ obtained at the $k$th replication, depends on the sampling mesh $\delta$, and so does the MISE.
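Schematically, the $\mathrm{MISE}(\delta)$ computation and the selection of $\delta^*$ can be organised as below, where `fit_and_eval` is a placeholder wrapping the whole data-generation and estimation pipeline described above:

```python
import numpy as np

def empirical_mise(fit_and_eval, m_true, grid_curves, meshes, M=500):
    """Sketch of the MISE(delta) computation of Simulation 2.

    fit_and_eval(x, delta, k) is assumed to regenerate a sample with
    sampling mesh delta (replication k) and return \\hat m_{n,delta}^k(x).
    """
    mise = {}
    for delta in meshes:
        err = 0.0
        for k in range(M):
            err += np.mean([(m_true(x) - fit_and_eval(x, delta, k))**2
                            for x in grid_curves])
        mise[delta] = err / M
    delta_star = min(mise, key=mise.get)   # optimal mesh minimising MISE
    return mise, delta_star
```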

Figure 2 displays the values of $\mathrm{MISE}(\delta)$ obtained for different values of the sampling mesh $\delta$ and missing at random rates of 10%, 50% and 0% (complete data), respectively. One can observe that the higher the missing at random rate, the larger the errors in estimating the regression operator.

Figure 2. The $\mathrm{MISE}(\delta)$ obtained for different values of the sampling mesh $\delta$ and several missing at random rates.

Table 2 reports the optimal sampling mesh $\delta^*$, which minimises $\mathrm{MISE}(\delta)$, for different missing at random rates. It also provides some summary statistics describing the distribution of $\mathrm{MISE}(\delta)$ for several values of $\delta$. One can observe from Table 2 that the higher the missing at random rate, the longer we need to observe the underlying process to collect the $n = 200$ observations required to reasonably estimate the regression operator. Indeed, when the data are complete, the optimal time interval is $T^* = n\delta^* = 200 \times 0.3 = 60$. However, when the MAR rate is 10% (resp. 50%), the optimal time interval is $T^* = 200 \times 0.36 = 72$ (resp. $T^* = 200 \times 0.38 = 76$). Consequently, it can be concluded that when the missing at random mechanism heavily affects the response variable, we need to collect data over a longer period of time. This allows us to gather sufficient information about the dynamics of the underlying continuous-time process and therefore obtain a better estimate of the regression operator.

Table 2. The optimal sampling mesh $\delta^*$ obtained for different MAR rates and some summary statistics of the $\mathrm{MISE}(\delta)$.

4.3. Simulation 3: asymptotic confidence intervals

In this section, we are interested in evaluating the coverage rate, as well as the length, of the asymptotic confidence intervals given in (16). The effect of the sampling mesh on the coverage rate will also be discussed numerically. Since this paper aims to extend the results in Delsol (2009) about confidence intervals to continuous-time functional data, we consider the same simulation framework.

Let $$X_t(s) = \cos(Z_t + \pi(2s - 1)), \qquad s \in [0, 1] \ \text{and}\ t \in [0, T],$$ where $Z_t$ is an OU process, solution of the stochastic differential equation (19), observed at the instants $t = \delta, 2\delta, \ldots$, with $n = 100, 200$ fixed. Here, for comparison purposes, we consider two sampling meshes, $\delta = 0.1, 0.7$. The regression operator is defined as $m(x) = \frac{1}{2\pi}\int_{1/2}^{3/4}(x'(s))^2\,ds$, while the errors $\{\varepsilon_t\}$ are independent centred normal random variables with variance $0.1 \times s_n^2$, where $s_n^2$ is the empirical variance of $\{m(X_1), \ldots, m(X_n)\}$. Because the regression operator is defined as a function of the derivative of the functional random variable, the appropriate semi-metric in this case is based on the first derivative of the curves (see (22)). Moreover, the quadratic kernel is used to perform this simulation. The optimal bandwidth is selected based on the local cross-validation method on the $\kappa$-nearest neighbours. The missing at random rate is simulated according to the conditional probability distribution given in (21).

For a fixed $\alpha \in (0, 1)$, the asymptotic $(1-\alpha)$-confidence intervals for $m(x)$, with $x \in \Xi$, are computed and compared for several values of the sample size $n$ and sampling mesh $\delta$. Here $\Xi := \{x_1, \ldots, x_{n_\Xi}\}$ is a grid of $n_\Xi = 50$ independently simulated curves on which the regression operator is estimated. For every fixed curve $x \in \Xi$, $M = 500$ replications are used to approximate the coverage rate. In this simulation, $1-\alpha = 0.95, 0.90$ were considered.

As expected, Table 3 shows that the average coverage rate varies with the sample size $n$, the sampling mesh and the MAR rate. The larger the sample size and the sampling mesh, and the smaller the MAR rate, the closer the average coverage rate is to $1-\alpha$. Moreover, one can also observe that the length of the asymptotic confidence intervals decreases when the sample size increases and the MAR rate decreases.

Table 3. Average coverage over the grid $\Xi$; the average confidence interval length appears in brackets.

Figure 3 (resp. Figure 4) displays an example of the asymptotic confidence intervals obtained for the 50 curves in the testing sample when $n = 100$, the MAR rate is 0%, 25%, 45%, $\delta = 0.7$ and $1-\alpha = 0.95$ (resp. $1-\alpha = 0.9$). One can observe that the coverage rate decreases as the MAR rate increases. Similar results are obtained when $\delta = 0.1$.

Figure 3. Asymptotic 95% confidence intervals when $\delta = 0.7$ and $n = 100$. The red dot represents the true regression function value at each fixed curve $x \in \Xi$ and the black dot its estimate. The vertical lines represent the confidence intervals.

Figure 4. Asymptotic 90% confidence intervals when $\delta = 0.7$ and $n = 100$. The red dot represents the true regression function value at each fixed curve $x \in \Xi$ and the black dot its estimate. The vertical lines represent the confidence intervals.

5. Applications to real data

5.1. Application 1: prediction of financial asset returns

In financial markets, despite modern technology, which allows data to be collected at a very fine time scale, financial data can still be missing. For instance, there are some regular holidays, such as Thanksgiving Day and Christmas, for which stock price data are missing. There are many other technical reasons (such as breakdowns in the devices recording the data, computers' sudden shutdowns, etc.) that make stretches of data missing.

This section aims to assess the performance of the estimator proposed in this paper on missing at random financial functional time series. The International Business Machines Corporation (IBM) asset price is considered as the response variable and the Standard & Poor's 500 (SP500) stock market index as the predictor. While the IBM asset price is observed at a daily frequency from 24 March 2016 to 28 September 2016, the SP500 is observed every minute during the same period. Note that the daily trading activity lasts for 7 hours, excluding weekends. Since we are interested in stationary processes, a first-order differentiation of the IBM daily asset price and of the SP500 stock market index was applied to make the original time series stationary.

Our sample here can be denoted as follows: $(X_d, Y_d)_{d=1,\ldots,129}$, where the sample size $n = 129$ is the total number of trading days from 24 March 2016 to 28 September 2016 after first differentiation of the original time series, $Y_d = \Delta\mathrm{IBM}_d$ and $X_d = \{X_d(s) := \Delta\mathrm{SP500}_d(s) : 1 \le s \le 420\}$. Originally, the data are completely observed. Therefore, to validate our estimator, we artificially create missing observations. We assume here that the missing data are generated according to the conditional probability distribution given in (21). We split the original sample into training and testing subsets. Our purpose then is to predict the IBM asset price in the testing subset using the regression operator. Three MAR rates, 0% (complete data), 20% and 45%, were considered to test the performance of the estimator in terms of prediction. As in the simulation section, we considered the quadratic kernel, and the bandwidth was selected using the cross-validation method on the $\kappa$-nearest neighbours. For the semi-metric, because the curves are not smooth (as can be seen in Figure 5, right panel), we use the PCA semi-metric, say $d_4^{PCA}(\cdot,\cdot)$, based on the projection on the four eigenfunctions $v_1(\cdot), \ldots, v_4(\cdot)$ associated with the four largest eigenvalues of the empirical covariance operator of the functional predictor $X$: $$d_4^{PCA}(X_{t_1}, X_{t_2}) = \sqrt{\sum_{k=1}^4 \Big(\int_0^{420}\big(X_{t_1}(s) - X_{t_2}(s)\big)\,v_k(s)\,ds\Big)^2}.$$ As a criterion to measure the accuracy of the estimator in predicting the 30 observations of the testing subset, we considered the absolute error $AE_t := |Y_t - \hat m_n(X_t)|$ for $t = 1, \ldots, 30$. Figure 6 displays the distribution of the absolute errors obtained when the MAR rate is 0%, 20% and 45%, respectively. One can clearly observe the effect of the MAR rate on the quality of the prediction: the higher the MAR rate, the lower the quality of the prediction.
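A sketch of the PCA semi-metric used here, with the eigenfunctions approximated by the right singular vectors of the centred data matrix (grid normalisation left implicit; the curves in the usage example are placeholders):

```python
import numpy as np

def make_pca_semimetric(X_train, ds=1.0, q=4):
    """Sketch of the PCA semi-metric d_q^{PCA} built from the empirical
    covariance operator of the training curves."""
    Xc = X_train - X_train.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:q]                                   # v_1, ..., v_q on the grid
    def d(x1, x2):
        scores = (V @ (x1 - x2)) * ds            # int (x1 - x2)(s) v_k(s) ds
        return np.sqrt(np.sum(scores**2))
    return d

# usage: distances between (placeholder) differentiated intraday curves
rng = np.random.default_rng(0)
X = rng.standard_normal((129, 420))
d4 = make_pca_semimetric(X, ds=1.0, q=4)
print(d4(X[0], X[1]))
```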

Figure 5. Left: First-order differentiated IBM asset price. Right: First-order differentiated SP500 intraday (minute frequency) stock market index curves.

Figure 6. The $(AE_t)_{t=1,\ldots,30}$ obtained for different values of the missing at random rate.

Moreover, we build a 95% prediction interval for the IBM asset price in the testing subset. Figure 7 shows that the coverage rate is sensitive to the percentage of MAR data in the training subset.

Figure 7. A 95% prediction interval for the IBM asset price in the testing subset. The red dots represent the true values of the IBM asset price and the black dots their predictions using the regression operator. The vertical lines represent the prediction intervals.

5.2. Application 2: Daily peak electricity demand imputation

By accurately predicting household peak load, utility companies can better balance the overall electricity demand and supply. This information helps in optimising power generation and distribution, ensuring a stable and reliable electricity grid. It also allows them to plan for peak demand periods and avoid potential blackouts or overloading of the grid. Moreover, predicting peak loads empowers consumers with information about their electricity consumption patterns. With this knowledge, households can make informed decisions to manage their energy usage more effectively, reduce electricity bills and contribute to energy conservation efforts. Furthermore, peak load predictions enable demand response programmes, where utility companies offer incentives for consumers to adjust their energy consumption during peak periods, thereby reducing strain on the grid.

For these reasons, electricity companies deployed smart meters to replace the mechanical ones. This new generation of meters makes it possible to record the electricity demand of any household at a very fine time scale and send it to the information system. The transmission of the information from the smart meter to the information system usually goes through WIFI or optical fibre networks, which are significantly dependent on the weather conditions, among several other factors. Therefore, the calculation of the daily peak electricity demand might be subject to a missing at random mechanism due to bad weather conditions.

Figure 8. Daily peak electricity demand of a household containing 10% missing data.

Figure 9. Intraday temperature curves. Coloured curves correspond to the days with missing peak load.

Figure 10. Daily peak electricity demand process. Red dots represent the imputed values of the missing data.

Figure 11. 95% confidence intervals for the imputed values.

Figure 8 displays the daily peak load $\{Y_{t_k}\}_{k=1,\ldots,1009}$ obtained from a household smart meter from 24 September 1996 to 29 June 1999 (leading to a total of $n = 1009$ days). The original data contain 10% missing observations. Here, we assume that the intraday (3-hour frequency) temperature curve $\{X_{t_k}(s) : s = 3, 6, \ldots, 24\}$ explains the missingness mechanism in the daily peak demand. Figure 9 displays the intraday, 3-hour frequency, temperature curves. Our purpose in this application is to impute the missing data in the peak demand process using the initial estimator of the regression operator $\hat m_n(x) := \hat m_{\psi,n}(x, y)$ defined in (17) with $\psi(Y_{t_k}, y) = Y_{t_k}$. Figure 10 displays the imputed peak electricity demand process obtained according to the following formula: $$\tilde Y_{t_k} = \zeta_{t_k}\,Y_{t_k} + (1 - \zeta_{t_k})\,\hat m_n(X_{t_k}).$$ If $Y_{t_k}$ is observed (that is, $\zeta_{t_k} = 1$), then $\tilde Y_{t_k} = Y_{t_k}$; otherwise $Y_{t_k}$ is missing (i.e. $\zeta_{t_k} = 0$) and is imputed by $\hat m_n(X_{t_k})$. The red dots in Figure 10 represent the imputed values of the missing observations in the peak electricity demand process. Figure 11 shows the 95% confidence intervals around the missing values of the peak load.
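The imputation rule itself is a one-liner; in the sketch below `Y`, `zeta` and `m_hat_X` are placeholder arrays standing for the observed peaks, the observation indicators and the fitted values $\hat m_n(X_{t_k})$:

```python
import numpy as np

# placeholder inputs: observed peaks, 0/1 indicators, fitted values from (17)
rng = np.random.default_rng(1)
Y = rng.normal(size=1009)
zeta = (rng.uniform(size=1009) > 0.10).astype(int)    # ~10% missing
m_hat_X = rng.normal(size=1009)
Y_tilde = zeta * Y + (1 - zeta) * m_hat_X             # imputed peak series
```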

6. Discussion of a special case: conditional quantiles

Let $x \in E$ be fixed and $y \in \mathbb{R}$. If $\psi(Y, y) = \mathbb{1}_{]-\infty, y]}(Y)$, the operator $m_\psi(x, y)$ is the conditional cumulative distribution function (df) of $Y$ given $X = x$, namely $F(y \mid x) = \mathbb{P}(Y \le y \mid X = x)$, which may be estimated by $\hat F_T(y \mid x) := \hat m_{\psi,T}(x, y)$. For a given $\alpha \in (0, 1)$, the $\alpha$th-order conditional quantile of the distribution of $Y$ given $X = x$ is defined as $$q_\alpha(x) = \inf\{y \in \mathbb{R} : F(y \mid x) \ge \alpha\}.$$

Notice that, whenever $F(\cdot \mid x)$ is strictly increasing and continuous in a neighbourhood of $q_\alpha(x)$, the function $F(\cdot \mid x)$ has a unique quantile of order $\alpha$ at the point $q_\alpha(x)$, that is, $F(q_\alpha(x) \mid x) = \alpha$. In this case $q_\alpha(x) = F^{-1}(\alpha \mid x) = \inf\{y \in \mathbb{R} : F(y \mid x) \ge \alpha\}$, which may be estimated uniquely by $\hat q_{T,\alpha}(x) = \hat F_T^{-1}(\alpha \mid x)$. Conditional quantiles have been widely studied in the literature when the predictor $X$ is finite-dimensional; see, for instance, Gannoun et al. (2003), and Ferraty et al. (2005) for dependent functional data.
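A sketch of the resulting quantile estimator, obtained by inverting the MAR kernel estimator of the conditional df, i.e. (3) with $\psi(Y, y) = \mathbb{1}_{\{Y \le y\}}$ (quadratic kernel assumed; the generalised inverse is taken over the observed responses):

```python
import numpy as np

def conditional_quantile(Y, zeta, D, h, alpha=0.5):
    """Sketch: invert the MAR kernel estimator of the conditional df."""
    w = zeta * 0.75 * (1 - (D / h)**2) * (D / h < 1)
    ys = np.sort(Y[zeta == 1])
    # \hat F_T(y | x) evaluated at the observed responses
    F_hat = np.array([np.sum(w * (Y <= y)) for y in ys]) / np.sum(w)
    # generalised inverse: smallest y with \hat F_T(y | x) >= alpha
    idx = np.searchsorted(F_hat, alpha)
    return ys[min(idx, len(ys) - 1)]
```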

(a) Almost sure pointwise and uniform convergence

Under the same conditions as Theorem 3.1, the statement (9) still holds for the estimator of the conditional cumulative distribution function $\hat F_T(y \mid x)$. That is, $\hat F_T(y \mid x)$ converges almost surely towards $F(y \mid x)$ at the rate $O(h_T^\beta) + O\big(\sqrt{\log T/(T\phi(h_T))}\big)$.

Consequently, since $F(q_\alpha(x)|x)=\alpha=\hat F_T(\hat q_{T,\alpha}(x)|x)$ and $\hat F_T(\cdot|x)$ is continuous and strictly increasing, we have: $\forall\epsilon>0$, $\exists\eta(\epsilon)>0$ such that, for all $y$, $|\hat F_T(y|x)-\hat F_T(q_\alpha(x)|x)|\le\eta(\epsilon)$ implies $|y-q_\alpha(x)|\le\epsilon$. This yields, for all $\epsilon>0$ and the corresponding $\eta(\epsilon)>0$,
\[
(23)\quad \mathbb{P}\big(|\hat q_{T,\alpha}(x)-q_\alpha(x)|\ge\epsilon\big)\le\mathbb{P}\big(|\hat F_T(\hat q_{T,\alpha}(x)|x)-\hat F_T(q_\alpha(x)|x)|\ge\eta(\epsilon)\big)=\mathbb{P}\big(|F(q_\alpha(x)|x)-\hat F_T(q_\alpha(x)|x)|\ge\eta(\epsilon)\big).
\]
Therefore, the statement (9) still holds for the conditional quantile estimator $\hat q_{T,\alpha}(x)$ whenever the conditions of Theorem 3.1 are satisfied. Ferraty et al. (2005) derived a similar pointwise convergence rate by inverting the estimator of the conditional cumulative distribution function; their result was obtained under a mixing condition, additional assumptions on the joint distribution, and a Lipschitz condition on $F(y|x)$ and its derivatives with respect to $y$.

Regarding the almost sure uniform convergence, observe that, under the conditions of Theorem 3.2, the statement (11) still holds for $\sup_{y\in S}\sup_{x\in C}|\hat F_T(y|x)-F(y|x)|$ when $\psi(Y,y)$ is replaced by $\mathbb{1}_{(-\infty,\,y]}(Y)$. Moreover, assume that, for fixed $x_0\in C$, $F(y|x_0)$ is differentiable at $q_\alpha(x_0)$ with $\frac{\partial}{\partial y}F(y|x_0)\big|_{y=q_\alpha(x_0)}:=g(q_\alpha(x_0)|x_0)>\nu>0$, where $\nu$ is a real number, and that $g(\cdot|x)$ is uniformly continuous for all $x\in C$. Knowing that $\hat F_T(\hat q_{T,\alpha}(x)|x)=F(q_\alpha(x)|x)=\alpha$ and making use of a Taylor expansion of $F(\hat q_{T,\alpha}(x)|x)$ around $q_\alpha(x)$, we can write
\[
(24)\quad F(\hat q_{T,\alpha}(x)|x)-F(q_\alpha(x)|x)=\big(\hat q_{T,\alpha}(x)-q_\alpha(x)\big)\,g\big(q^{*}_{T,\alpha}(x)\,\big|\,x\big),
\]
where $q^{*}_{T,\alpha}(x)$ lies between $q_\alpha(x)$ and $\hat q_{T,\alpha}(x)$. It then follows from (24) that the inequality (23) holds uniformly in $x$ and $y$. Moreover, the fact that $\hat q_{T,\alpha}(x)$ converges a.s. towards $q_\alpha(x)$ as $T$ goes to infinity, combined with the uniform continuity of $g(\cdot|x)$, allows us to write
\[
(25)\quad \sup_{x\in C}\big|\hat q_{T,\alpha}(x)-q_\alpha(x)\big|\,\sup_{x\in C}\big|g(q_\alpha(x)|x)\big|=O_{a.s.}\Big(\sup_{y\in S}\sup_{x\in C}\big|\hat F_T(y|x)-F(y|x)\big|\Big).
\]
Since $g(q_\alpha(x)|x)$ is uniformly bounded from below, we can then claim that the estimator $\hat q_{T,\alpha}(x)$ converges uniformly towards $q_\alpha(x)$ at the same rate given in (11), as $T$ goes to infinity.

(b) Continuous-time confidence intervals

Confidence intervals for the conditional quantile $q_\alpha(x)$ may be obtained according to the following steps. First, considering a Taylor expansion of $\hat F_T(\cdot|x)$ around $q_\alpha(x)$ and making use of the fact that $\hat q_{T,\alpha}(x)$ converges a.s. towards $q_\alpha(x)$ as $T$ goes to infinity, one gets
\[
(26)\quad \hat q_{T,\alpha}(x)-q_\alpha(x)=-\frac{1}{\hat g_T(q_\alpha(x)|x)}\Big(\hat F_T(q_\alpha(x)|x)-F(q_\alpha(x)|x)\Big),
\]
where $\hat g_T(\cdot|x)$ is a consistent estimator of $g(\cdot|x)$. Then, replacing $\psi(Y,y)$ by the indicator function, we get, under the conditions of Corollary 3.7, the following $(1-\alpha)$ confidence interval for $q_\alpha(x)$:
\[
(27)\quad \hat q_{T,\alpha}(x)\pm c_{1-\alpha/2}\,\frac{\sqrt{M_{T,2}}}{M_{T,1}\,\hat g_T(\hat q_{T,\alpha}(x)|x)}\,\sqrt{\frac{\alpha(1-\alpha)}{T\,F_{x,T}(h_T)\,p_T(x)}},\quad\text{as } T\to\infty.
\]
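The interval (27) can be computed directly once its ingredients are estimated. A hedged sketch follows; here `alpha` is the quantile order and `level` the confidence level (the paper uses the same $\alpha$ for both), and the kernel constants $M_{T,1}$, $M_{T,2}$ default to 1, an assumption valid for a box kernel.

```python
import numpy as np
from scipy.stats import norm

def quantile_ci(q_hat, g_hat, alpha, T, F_xT, p_hat, M1=1.0, M2=1.0, level=0.95):
    """Confidence interval for q_alpha(x) in the spirit of (27).

    q_hat: estimated conditional quantile; g_hat: estimate of the conditional
    density g(q_hat(x)|x); F_xT: empirical small-ball probability F_{x,T}(h_T);
    p_hat: estimate of the observation probability p(x).
    """
    c = norm.ppf(1 - (1 - level) / 2)       # standard normal quantile c_{1-alpha/2}
    half = (c * np.sqrt(M2) / (M1 * g_hat)) * np.sqrt(
        alpha * (1 - alpha) / (T * F_xT * p_hat))
    return q_hat - half, q_hat + half
```

The estimate g_hat can be obtained, for example, by smoothing the estimated conditional df in y; any consistent estimator fits the asymptotic argument above.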

Supplemental material

Supplemental material for this article is available online.

Acknowledgments

Open Access funding provided by the Qatar National Library.

We thank the editor, associate editor and the two referees for their valuable and constructive comments which helped improve the manuscript substantially.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 For any $0\le s<t\le T$ such that $t-s\ge\alpha_0$, there exists a non-negative continuous random function $f_{t,s}(\omega,x)$ such that $\sup_{s\ge0,\,\omega\in\Omega}|f_{t,s}(\omega,x)|\le b_{s,\alpha_0}(x)$ a.s., where $b_{s,\alpha_0}(x)$ is a deterministic function.

References

  • Andrews, D.W.K. (1984), ‘Non-strong Mixing Autoregressive Processes’, Journal of Applied Probability, 21, 930–934.
  • Bosq, D. (1998), Nonparametric Statistics for Stochastic Processes: Estimation and Prediction, Lecture Notes in Statistics, Vol. 110 (2nd ed.), New York: Springer-Verlag.
  • Bouzebda, S., and Didi, S. (2017), ‘Asymptotic Results in Additive Regression Model for Strictly Stationary and Ergodic Continuous Time Processes’, Communications in Statistics-Theory and Methods, 46(5), 2454–2493.
  • Chaouch, M., and Laïb, N. (2019), ‘Optimal Asymptotic MSE of Kernel Regression Estimate for Continuous Time Processes with Missing At Random Response’, Statistics and Probability Letters, 154, 108532.
  • Chaouch, M., and Laïb, N. (2023), ‘Supplement to “Regression estimation for continuous time functional data processes with missing at random response”’.
  • Cheng, P.E. (1994), ‘Nonparametric Estimation of Mean Functionals with Data Missing at Random’, Journal of the American Statistical Association, 89, 81–87.
  • Cheridito, P., Kawaguchi, H., and Maejima, M. (2003), ‘Fractional Ornstein–Uhlenbeck Processes’, Electronic Journal of Probability, 8(3), 1–14.
  • Chesneau, C., and Maillot, B. (2014), ‘Superoptimal Rate of Convergence in Nonparametric Estimation for Functional Valued Processes’, International Scholarly Research Notices, 2014, 264217.
  • Chu, C.K., and Cheng, P.E. (1995), ‘Nonparametric Regression Estimation with Missing Data’, Journal of Statistical Planning and Inference, 48, 85–99.
  • de la Peña, V.H., and Giné, E. (1999), Decoupling: From Dependence to Independence, Probability and Its Applications, New York: Springer-Verlag.
  • Delsol, L. (2009), ‘Advances on Asymptotic Normality in Non-parametric Functional Time Series Analysis’, Statistics, 43, 13–33.
  • Didi, S., and Louani, D. (2014), ‘Asymptotic Results for the Regression Function Estimate on Continuous Time Stationary Ergodic Data’, Journal Statistics & Risk Modeling, 31(2), 129–150.
  • Doukhan, P. (2018), Stochastic Models for Time Series, New York: Springer.
  • Doukhan, P., and Louhichi, S. (1999), ‘A New Weak Dependence Condition and Applications to Moment Inequalities’, Stochastic Processes and Their Applications, 84, 313–342.
  • Efromovich, S. (2011), ‘Nonparametric Regression with Responses Missing at Random’, Journal of Statistical Planning and Inference, 141, 3744–3752.
  • Ferraty, F., Laksaci, A., Tadj, A., and Vieu, P. (2010), ‘Rate of Uniform Consistency for Nonparametric Estimates with Functional Variables’, Journal of Statistical Planning and Inference, 140, 335–352.
  • Ferraty, F., Mas, A., and Vieu, P. (2007), ‘Nonparametric Regression on Functional Data: Inference and Practical Aspects’, Australian & New Zealand Journal of Statistics, 49(3), 267–286.
  • Ferraty, F., Rabhi, A., and Vieu, P. (2005), ‘Conditional Quantiles for Dependent Functional Data with Application to the Climatic El Niño Phenomenon’, Sankhyā: The Indian Journal of Statistics, 67(2), 378–398.
  • Ferraty, F., Sued, M., and Vieu, P. (2013), ‘Mean Estimation with Data Missing at Random for Functional Covariables’, Statistics, 47(4), 688–706.
  • Ferraty, F., and Vieu, P. (2006), Nonparametric Modelling for Functional Data, Methods, Theory, Applications and Implementations, London: Springer-Verlag.
  • Francq, C., and Zakoïan, J.M. (2010), GARCH Models: Structure, Statistical Inference and Financial Applications, John Wiley and Sons Ltd.
  • Gannoun, A., Saracco, J., and Yu, K. (2003), ‘Nonparametric Prediction by Conditional Median and Quantiles’, Journal of Statistical Planning and Inference, 117, 207–223.
  • Giraitis, L., and Leipus, R. (1995), ‘A Generalized Fractionally Differencing Approach in Long-memory Modeling’, Lithuanian Mathematical Journal, 35(1), 53–65.
  • González-Manteiga, W., and Pérez-González, A. (2004), ‘Nonparametric Mean Estimation with Missing Data’, Communications in Statistics-Theory and Methods, 33(2), 277–303.
  • Guégan, D., and Ladoucette, S. (2001), ‘Non-mixing Properties of Long Memory Processes’, Comptes Rendus de l'Académie des Sciences – Series I – Mathematics, 333(1), 373–376.
  • Hall, P., and Heyde, C. (1980), Martingale Limit Theory and Its Application, New York: Academic Press.
  • Laïb, N., and Louani, D. (2010), ‘Nonparametric Kernel Regression Estimation for Functional Stationary Ergodic Data: Asymptotic Properties’, Journal of Multivariate Analysis, 101(10), 2266–2281.
  • Laïb, N., and Louani, D. (2011), ‘Rates of Strong Consistencies of the Regression Function Estimator for Functional Stationary Ergodic Data’, Journal of Statistical Planning and Inference, 141(1), 359–372.
  • Liang, H., Wang, S., and Carroll, R.J. (2007), ‘Partially Linear Models with Missing Response Variables and Error-prone Covariates’, Biometrika, 94(1), 185–198.
  • Ling, N., Liang, L., and Vieu, P. (2015), ‘Nonparametric Regression Estimation for Functional Stationary Ergodic Data with Missing At Random’, Journal of Statistical Planning and Inference, 162, 75–87.
  • Little, R.J.A., and Rubin, D.B. (2002), Statistical Analysis with Missing Data (2nd ed.), New York: John Wiley.
  • Maillot, B. (2008), ‘Propriétés Asymptotiques de Quelques Estimateurs Non-paramétriques pour des Variables Vectorielles et Fonctionnelles’, Thèse de Doctorat de l'Université Paris 6.
  • Maslowski, B., and Pospíšil, P. (2008), ‘Ergodicity and Parameter Estimates for Infinite-dimensional Fractional Ornstein–Uhlenbeck Process’, Applied Mathematics and Optimization, 57, 401–429.
  • Nittner, T. (2003), ‘Missing At Random (MAR) in Nonparametric Regression, a Simulation Experiment’, Statistical Methods and Applications, 12, 195–210.
  • Sikov, A. (2018), ‘A Brief Review of Approaches to Non-ignorable Non-response’, International Statistical Review, 86, 415–441.
  • Tsiatis, A. (2006), Semiparametric Theory and Missing Data, New York: Springer.

Appendix. Proofs of main results

In this section, for the sake of simplicity, we write $\hat m_T(x,y)$ and $m(x,y)$ for $\hat m_{\psi,T}(x,y)$ and $m_\psi(x,y)$, respectively, and $\psi_y(Y_t)$ for $\psi(Y_t,y)$. Consider now the following quantities:
\[
(A1)\quad Q_T(x,y):=\big(\hat m_{T,2}(x,y)-\bar m_{T,2}(x,y)\big)-m(x,y)\big(\hat m_{T,1}(x)-\bar m_{T,1}(x)\big),
\]
\[
(A2)\quad R_T(x,y):=-B_T(x,y)\big(\hat m_{T,1}(x)-\bar m_{T,1}(x)\big).
\]
We then have
\[
(A3)\quad \hat m_T(x,y)-m(x,y)=\big(\hat m_T(x,y)-C_T(x,y)\big)+B_T(x,y)=B_T(x,y)+\frac{Q_T(x,y)+R_T(x,y)}{\hat m_{T,1}(x)}.
\]
We start by stating some technical lemmas that will be used later.

Lemma A.1

Assume that assumptions (A1)–(A2) are satisfied. Then, for any $j\ge1$ and $i\in\{1,2\}$, we have

  1. $\dfrac{1}{\phi(h_T)}\,\mathbb{E}\big(\Delta_t^i(x)\,\big|\,\mathcal{F}_{j-2}\big)=M_i\,f_{t,T_{j-2}}(x)+O_{a.s.}\Big(\dfrac{g_{t,T_{j-2},x}(h_T)}{\phi(h_T)}\Big)$,

  2. $\dfrac{1}{\phi(h_T)}\,\mathbb{E}\big(\Delta_t^i(x)\big)=M_i\,f(x)+o(1)$.

Proof.

The proof is similar to that of Lemma 1 in Laïb and Louani (2010).

Lemma A.2

Let $(Z_n)_{n\ge1}$ be a sequence of real martingale differences with respect to the sequence of $\sigma$-fields $(\mathcal F_n=\sigma(Z_1,\dots,Z_n))_{n\ge1}$, where $\sigma(Z_1,\dots,Z_n)$ is the $\sigma$-field generated by the random variables $Z_1,\dots,Z_n$. Set $S_n=\sum_{i=1}^{n}Z_i$. For any $p\ge2$ and any $n\ge1$, assume that there exist some nonnegative constants $C$ and $d_n$ such that $\mathbb{E}(Z_n^p\,|\,\mathcal F_{n-1})\le C^{p-2}\,p!\,d_n^2$ almost surely. Then, for any $\epsilon>0$, we have
\[
\mathbb{P}(|S_n|>\epsilon)\le 2\exp\Big\{-\frac{\epsilon^2}{2(D_n+C\epsilon)}\Big\},
\]
where $D_n=\sum_{i=1}^{n}d_i^2$.

Proof.

See Theorem 8.2.2 of de la Peña and Giné (1999).
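As a quick numerical sanity check (not part of the proof), the sketch below verifies the bound of Lemma A.2 for Rademacher increments, which are bounded martingale differences satisfying the moment condition with $C=1$ and $d_i^2=1$; the reconstructed denominator $2(D_n+C\epsilon)$ is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, eps = 200, 20_000, 25.0

# Rademacher signs: E(Z^p | F) = 1 <= C**(p-2) * p! * d_i**2 with C = d_i = 1,
# so D_n = n and the bound of Lemma A.2 reads 2*exp(-eps**2 / (2*(n + C*eps))).
S = rng.choice([-1.0, 1.0], size=(reps, n)).sum(axis=1)
empirical_tail = np.mean(np.abs(S) > eps)
bound = 2 * np.exp(-eps**2 / (2 * (n + eps)))
print(f"P(|S_n| > {eps}) ~= {empirical_tail:.4f} <= bound {bound:.4f}")
```

With these values the empirical tail sits well below the exponential bound, as expected from a Bernstein-type inequality.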

Proof of Theorem 3.4

From (A3) and Lemma 1.2 in Chaouch and Laïb (2023), we have, for $T$ large enough,
\[
(A4)\quad \mathrm{MSE}(x,y)=\mathbb{E}\big(\hat m_T(x,y)-m(x,y)\big)^2\simeq\mathbb{E}\Big(B_T(x,y)+\frac{Q_T(x,y)+R_T(x,y)}{p(x)}\Big)^2\le\mathbb{E}\big(B_T^2(x,y)\big)+\frac{1}{p^2(x)}\Big[\mathbb{E}\big(Q_T^2(x,y)\big)+\mathbb{E}\big(R_T^2(x,y)\big)\Big],
\]
where the cross products $2\mathbb{E}[B_T(x,y)(Q_T(x,y)+R_T(x,y))]$ and $2\mathbb{E}[Q_T(x,y)R_T(x,y)]$ have been ignored because, by the Cauchy–Schwarz inequality,
\[
\mathbb{E}\big[B_T(x,y)\big(Q_T(x,y)+R_T(x,y)\big)\big]\le\Big(\mathbb{E}\big(B_T^2(x,y)\big)\Big)^{1/2}\Big(\mathbb{E}\big[(Q_T(x,y)+R_T(x,y))^2\big]\Big)^{1/2}\le\max\Big\{\mathbb{E}\big(B_T^2(x,y)\big),\ \mathbb{E}\big[(Q_T(x,y)+R_T(x,y))^2\big]\Big\},
\]
and the same bound applies to the second product. The proof of Theorem 3.4 then follows from Proposition 3.3 and Lemma A.3 below, which provide upper bounds for the expectations of $Q_T^2(x,y)$ and $R_T^2(x,y)$.

Lemma A.3

Assume that (A1)–(A3) hold true. Then we have
\[
(A5)\quad \mathbb{E}\big(Q_T^2(x,y)\big)\le\frac{4\,p(x)\,\big(W_2(x,y)+(m(x,y))^2\big)\,M_2}{T\phi(h_T)\,M_1^2\,f(x)}.
\]

Proof.

Ignoring the cross-product term as above, one may write
\[
\mathbb{E}\big[Q_T^2(x,y)\big]\le\mathbb{E}\big[\hat m_{T,2}(x,y)-\bar m_{T,2}(x,y)\big]^2+m^2(x,y)\,\mathbb{E}\big[\hat m_{T,1}(x)-\bar m_{T,1}(x)\big]^2=:I_{T,1}+m^2(x,y)\,I_{T,2}.
\]
The terms $I_{T,1}$ and $I_{T,2}$ can be handled similarly; let us evaluate the first one. Since $(T_j=j\delta)_{0\le j\le n}$ is a $\delta$-partition of $[0,T]$, we have
\[
(A6)\quad \hat m_{T,2}(x,y)-\bar m_{T,2}(x,y)=\frac{1}{n\,\mathbb{E}(Z_1(x))}\sum_{j=1}^{n}\int_{T_{j-1}}^{T_j}\Big[\zeta_t\psi_y(Y_t)\Delta_t(x)-\mathbb{E}\big\{\zeta_t\psi_y(Y_t)\Delta_t(x)\,\big|\,\mathcal F_{t-\delta}\big\}\Big]\,dt=:\frac{1}{n\,\mathbb{E}(Z_1(x))}\sum_{j=1}^{n}L_{T,j}(x,y).
\]
Since $(L_{T,j}(x,y))_{j\ge1}$ is a sequence of martingale differences with respect to the family $(\mathcal F_{j-1})_{j\ge1}$, we have $\mathbb{E}(L_{T,j}(x,y)\,L_{T,k}(x,y))=0$ for every $j,k\in\{1,\dots,n\}$ such that $j\neq k$. Therefore (ignoring the product term), we have
\[
(A7)\quad I_{T,1}=\mathbb{E}\big[\hat m_{T,2}(x,y)-\bar m_{T,2}(x,y)\big]^2\le\frac{1}{n^2(\mathbb{E}(Z_1(x)))^2}\sum_{j=1}^{n}\mathbb{E}\big(L_{T,j}(x,y)\big)^2.
\]
Using Jensen's inequality and a double conditioning with respect to $\mathcal S_{t-\delta,\delta}$, combined with (A3)(iii)–(iv), $I_{T,1}$ may be bounded as
\[
I_{T,1}\le\frac{4}{n^2(\mathbb{E}(Z_1(x)))^2}\sum_{j=1}^{n}\int_{T_{j-1}}^{T_j}\mathbb{E}\big[\Delta_t^2(x)\,p(X_t)\,W_2(X_t,y)\big]\,dt=\frac{4\,(p(x)+o(1))\,(W_2(x,y)+o(1))\,[M_2 f(x)+o(1)]}{T\phi(h_T)\,[M_1 f(x)+o(1)]^2}.
\]
Similarly, we have
\[
I_{T,2}\le\frac{4\,(p(x)+o(1))\,[M_2 f(x)+o(1)]}{T\phi(h_T)\,[M_1 f(x)+o(1)]^2}.
\]
Therefore
\[
\mathbb{E}\big(Q_T^2(x,y)\big)\le I_{T,1}+m^2(x,y)\,I_{T,2}=\frac{4\,p(x)\,\big(W_2(x,y)+(m(x,y))^2\big)\,M_2}{T\phi(h_T)\,M_1^2\,f(x)}.
\]
Moreover, using the decomposition (A2), Theorem 3.3 and Lemma 1.1 in Chaouch and Laïb (2023), one can see that $\mathbb{E}(R_T^2(x,y))$ is negligible with respect to $\mathbb{E}(Q_T^2(x,y))$. This completes the proof.

Proof of Theorem 3.5

The proof of Theorem 3.5 rests essentially on Lemma A.4 below, which establishes the asymptotic normality of the principal term $Q_T(x,y)$ in (A3). Indeed, we have from (A3) that
\[
(A8)\quad \sqrt{T\phi(h_T)}\,\big(\hat m_T(x,y)-m(x,y)\big)=\sqrt{T\phi(h_T)}\,B_T(x,y)+\frac{\sqrt{T\phi(h_T)}\,Q_T(x,y)+\sqrt{T\phi(h_T)}\,R_T(x,y)}{\hat m_{T,1}(x)}.
\]
Under (A1)–(A3), Lemma 1.2 in Chaouch and Laïb (2023) implies that $\hat m_{T,1}(x)$ converges almost surely to $p(x)$ as $T\to\infty$. Moreover, using Lemma 1.3 in Chaouch and Laïb (2023), we get under (A3)(i)–(ii) combined with conditions (12) that
\[
\sqrt{T\phi(h_T)}\,B_T(x,y)=O_{a.s.}\big(h_T^\beta\sqrt{T\phi(h_T)}\big)=o_{a.s.}(1),
\]
and
\[
\sqrt{T\phi(h_T)}\,R_T(x,y)=O_{a.s.}\Big(\sqrt{T\phi(h_T)}\,h_T^\beta\Big(\frac{\log T}{T\phi(h_T)}\Big)^{1/2}\Big)=O_{a.s.}\big(h_T^\beta(\log T)^{1/2}\big)=o_{a.s.}(1).
\]
The proof is then completed by Lemma A.4 and Slutsky's theorem.

Lemma A.4

Under conditions (A1)–(A3), we have
\[
\sqrt{T\phi(h_T)}\,Q_T(x,y)\xrightarrow{d}N\big(0,\tilde\sigma^2(x,y)\big)\quad\text{as } T\to\infty,\qquad\text{where}\quad\tilde\sigma^2(x,y)=\frac{M_2}{M_1^2\,f(x)}\,p(x)\,\bar W_2(x,y).
\]

Proof of Lemma A.4

We have
\[
(A9)\quad \sqrt{T\phi(h_T)}\,Q_T(x,y)=\sum_{i=1}^{n}\xi_{T,i}(x,y),\quad\text{with}\quad\xi_{T,i}(x,y)=\eta_{T,i}(x,y)-\mathbb{E}\big[\eta_{T,i}(x,y)\,\big|\,\mathcal F_{t-\delta}\big]
\]
and
\[
\eta_{T,i}(x,y)=\frac{\sqrt{T\phi(h_T)}}{n\,\mathbb{E}Z_1}\int_{T_{i-1}}^{T_i}\zeta_t\Delta_t(x)\big[\psi_y(Y_t)-m(x,y)\big]\,dt.
\]
Since, for any $i\ge1$ and $t\in[T_{i-1},T_i]$, $\mathcal F_{i-2}\subset\mathcal F_{t-\delta}\subset\mathcal F_{i-1}$, the variable $\xi_{T,i}(x,y)$ is $\mathcal F_{i-1}$-measurable and $\mathbb{E}(|\xi_{T,i}|)<\infty$ provided that $\mathbb{E}(\zeta_t^2)<\infty$ and $\mathbb{E}(X_t^2)<\infty$. Moreover, for any $1\le i\le n$,
\[
\mathbb{E}(\xi_{T,i}\,|\,\mathcal F_{i-2})=\mathbb{E}\big\{\mathbb{E}[\eta_{T,i}\,|\,\mathcal F_{t-\delta}]\,\big|\,\mathcal F_{i-2}\big\}-\mathbb{E}\big\{\mathbb{E}[\eta_{T,i}\,|\,\mathcal F_{t-\delta}]\,\big|\,\mathcal F_{i-2}\big\}=0\quad\text{a.s.}
\]
Hence $(\xi_{T,i}(x,y))_{i\ge1}$ is a sequence of martingale differences with respect to the $\sigma$-fields $(\mathcal F_{i-1})_{i\ge1}$. To prove the asymptotic normality, it suffices to check the two following conditions (see Corollary 3.1, p. 56, in Hall and Heyde 1980):

(a) $\sum_{i=1}^{n}\mathbb{E}\big[\xi_{T,i}^2(x,y)\,\big|\,\mathcal F_{i-2}\big]\xrightarrow{P}\tilde\sigma^2(x,y)$, and (b) $n\,\mathbb{E}\big[\xi_{T,i}^2(x,y)\,\mathbb{1}_{\{|\xi_{T,i}(x,y)|>\epsilon\}}\big]=o(1)$ for any $\epsilon>0$.

Proof of (a). Observe that
\[
\Big|\sum_{i=1}^{n}\mathbb{E}\big[\eta_{T,i}^2(x,y)\,\big|\,\mathcal F_{i-2}\big]-\sum_{i=1}^{n}\mathbb{E}\big[\xi_{T,i}^2(x,y)\,\big|\,\mathcal F_{i-2}\big]\Big|\le\sum_{i=1}^{n}\big(\mathbb{E}\big[\eta_{T,i}(x,y)\,\big|\,\mathcal F_{i-2}\big]\big)^2.
\]
Using (A1), (A3)(i), (ii) and (iv) together with Lemma A.1, a double conditioning with respect to the $\sigma$-field $\mathcal S_{t-\delta,\delta}$, and the fact that $n\,\mathbb{E}Z_1(x)=O(T\phi(h_T))$, we have
\[
\big|\mathbb{E}(\eta_{T,i}\,|\,\mathcal F_{i-2})\big|=\frac{\sqrt{T\phi(h_T)}}{n\,\mathbb{E}Z_1}\Big|\int_{T_{i-1}}^{T_i}\mathbb{E}\Big(p(X_t)\Delta_t(x)\big[m(X_t,y)-m(x,y)\big]\,\Big|\,\mathcal F_{i-2}\Big)\,dt\Big|\le\frac{\sqrt{T\phi(h_T)}}{n\,\mathbb{E}Z_1}\Big(p(x)+\sup_{u\in B(x,h)}|p(u)-p(x)|\Big)\sup_{u\in B(x,h)}\big|m(u,y)-m(x,y)\big|\,\Big|\int_{T_{i-1}}^{T_i}\mathbb{E}\big(\Delta_t(x)\,\big|\,\mathcal F_{i-2}\big)\,dt\Big|=O\big(h^\beta\big)\,O\Big(\sqrt{\frac{\phi(h_T)}{T}}\int_{T_{i-1}}^{T_i}b_{t,\alpha_0}(x)\,dt\Big).
\]
It then follows from (A2)(iii) and the Cauchy–Schwarz inequality that $\sum_{i=1}^{n}\big(\mathbb{E}[\eta_{T,i}(x,y)\,|\,\mathcal F_{i-2}]\big)^2=O\big(h^{2\beta}\phi(h_T)\big)=o(1)$. Thus it remains to show that $\sum_{i=1}^{n}\mathbb{E}[\eta_{T,i}^2(x,y)\,|\,\mathcal F_{i-2}]\xrightarrow{P}\tilde\sigma^2(x,y)$. Using again the Cauchy–Schwarz inequality, one may write
\[
(A10)\quad \sum_{i=1}^{n}\mathbb{E}\big[\eta_{T,i}^2(x,y)\,\big|\,\mathcal F_{i-2}\big]=\frac{T\phi(h_T)}{(n\,\mathbb{E}Z_1)^2}\sum_{i=1}^{n}\mathbb{E}\Big[\Big(\int_{T_{i-1}}^{T_i}\zeta_t\Delta_t(x)\big[\psi_y(Y_t)-m(x,y)\big]\,dt\Big)^2\,\Big|\,\mathcal F_{i-2}\Big]\le\frac{\delta\,T\phi(h_T)}{(n\,\mathbb{E}Z_1)^2}\sum_{i=1}^{n}\mathbb{E}\Big[\int_{T_{i-1}}^{T_i}\zeta_t^2\Delta_t^2(x)\big[\psi_y(Y_t)-m(X_t,y)\big]^2\,dt\,\Big|\,\mathcal F_{i-2}\Big]+\frac{\delta\,T\phi(h_T)}{(n\,\mathbb{E}Z_1)^2}\sum_{i=1}^{n}\mathbb{E}\Big[\int_{T_{i-1}}^{T_i}\zeta_t\Delta_t^2(x)\big[m(X_t,y)-m(x,y)\big]^2\,dt\,\Big|\,\mathcal F_{i-2}\Big]=:A_n+C_n.
\]
Let us now evaluate the term $A_n$. Conditioning with respect to $\mathcal F_{t-\delta}$ and $\mathcal S_{t-\delta,\delta}$, and making use of conditions (A3)(i), (iii), (iv), the fact that $T=n\delta$ and Lemma A.1, we obtain
\[
(A11)\quad A_n=\frac{\delta\,T\phi(h_T)}{(n\,\mathbb{E}Z_1)^2}\sum_{i=1}^{n}\mathbb{E}\Big[\int_{T_{i-1}}^{T_i}p(X_t)\Delta_t^2(x)\bar W_2(X_t,y)\,dt\,\Big|\,\mathcal F_{i-2}\Big]\le\frac{\delta\,(\delta+o(1))\,p(x)\,\bar W_2(x,y)\,\phi^2(h_T)}{(\mathbb{E}Z_1)^2}\,M_2\Big\{\frac{1}{T}\sum_{i=1}^{n}\int_{T_{i-1}}^{T_i}f_{t,T_{i-2}}(x)\,dt+O_{a.s.}\Big[\frac{1}{T}\sum_{i=1}^{n}\int_{T_{i-1}}^{T_i}\frac{g_{t,T_{i-2},x}(h_T)}{\phi(h_T)}\,dt\Big]\Big\}.
\]
The Riemann sum combined with condition (A2)(iii) gives $\frac{1}{T}\sum_{i=1}^{n}\int_{T_{i-1}}^{T_i}f_{t,T_{i-2}}(x)\,dt\simeq\frac{1}{T}\int_{0}^{T}f_{t,t-\delta}(x)\,dt\to f(x)$ a.s. as $T\to\infty$, while (A2)(ii) gives $\frac{g_{t,T_{i-2},x}(h_T)}{\phi(h_T)}=o(1)$ as $T\to\infty$. Therefore,
\[
(A12)\quad A_n\le\frac{\delta\,(\delta+o(1))\,p(x)\,\bar W_2(x,y)\,\phi^2(h_T)}{\big(\delta\,\phi(h_T)\,M_1 f(x)+o(\phi(h_T))\big)^2}\,M_2\,\big[f(x)+o(1)\big]\longrightarrow\frac{M_2}{M_1^2\,f(x)}\,p(x)\,\bar W_2(x,y)=:\tilde\sigma^2(x,y)\quad\text{as } T\to\infty.
\]
On the other hand, the same arguments combined with the fact that $\sup_{u\in B(x,h)}|m(u,y)-m(x,y)|^2=O(h^{2\beta})$ a.s. give $C_n=o_{a.s.}(1)$.

Proof of (b). Using successively the Hölder, Markov, Jensen and Minkowski inequalities combined with conditions (A3)(iii)–(iv) and Lemma A.1, we get, for any $\epsilon>0$ and fixed real numbers $p>1$ and $q>1$ such that $1/p+1/q=1$,
\[
n\,\mathbb{E}\big[\xi_{T,i}^2(x,y)\,\mathbb{1}_{\{|\xi_{T,i}(x,y)|>\epsilon\}}\big]\le\frac{4n}{(\epsilon/2)^{2q/p}}\,\mathbb{E}\big[|\eta_{T,i}|^{2q}\big]=O\big((T\phi(h_T))^{-\gamma/2}\big)=o(1),
\]
by taking $2q=2+\gamma$ with $0<\gamma<1$, since $T\phi(h_T)$ tends to infinity as $T$ goes to infinity.

Proof of Corollary 3.7

Observe that
\[
(A13)\quad \sqrt{\frac{T\,F_{x,T}(h_T)}{\tilde V_T^2(x,y)}}\,\big(\hat m_T(x,y)-m(x,y)\big)=\sqrt{\frac{F_{x,T}(h_T)}{\phi(h_T)\,f(x)}}\,\sqrt{\frac{f(x)\,\sigma^2(x,y)}{\tilde V_T^2(x,y)}}\;\frac{\sqrt{T\phi(h_T)}}{\sigma(x,y)}\,\big(\hat m_T(x,y)-m(x,y)\big).
\]
It follows from the consistency of $F_{x,T}(h_T)$ and (A2)(i) that $F_{x,T}(h_T)/(\phi(h_T)f(x))$ goes to 1 a.s. as $T$ goes to infinity. By Theorem 3.5, the quantity $\frac{\sqrt{T\phi(h_T)}}{\sigma(x,y)}(\hat m_T(x,y)-m(x,y))$ converges in distribution to $N(0,1)$ as $T\to\infty$. Then, using the non-decreasing property of the standard normal cumulative distribution function $\Psi$, we get, for a given risk $0<\alpha<1$, the $(1-\alpha)$ pseudo-confidence interval
\[
(A14)\quad \frac{\sqrt{T\phi(h_T)}}{\sigma(x,y)}\,\big|\hat m_T(x,y)-m(x,y)\big|\le\Psi^{-1}\Big(1-\frac{\alpha}{2}\Big).
\]
Considering now the statement (13) combined with Proposition 3.6, it holds that
\[
(A15)\quad \lim_{T\to\infty}\frac{\sigma^2(x,y)\,f(x)}{\tilde V_T^2(x,y)}=\lim_{T\to\infty}\frac{f(x)\,V^2(x,y)}{\tilde V_T^2(x,y)}=\lim_{T\to\infty}\frac{\tilde V^2(x,y)}{\tilde V_T^2(x,y)}=1\quad\text{a.s.},
\]
since $\tilde V_T^2(x,y)$ is a consistent estimator of $\tilde V^2(x,y)$. The proof then follows from the statements (A13), (A14) and (A15).