## Nonparametric bayes modelling of count processes

Nonparametric Bayes modelling of count processes
2013 by Antonio Canale and David B. Dunson. Any opinions expressed here are those of the authorsand not those of the Collegio Carlo Alberto.

**Biometrika Advance Access published October 5, 2013**
*Biometrika *(2013),

*pp. *1–16

*Printed in Great Britain*
**Nonparametric Bayes modelling of count processes**
BY ANTONIO CANALE

*Department of Economics and Statistics, University of Torino, Corso Unione Sovietica 218/bis,*
*10134 Torino, Italy*
AND DAVID B. DUNSON

*Department of Statistical Science, Duke University, Box 90251, Durham,*
*North Carolina 27708, U.S.A.*
Data on count processes arise in a variety of applications, including longitudinal, spatial and
imaging studies measuring count responses. The literature on statistical models for dependentcount data is dominated by models built from hierarchical Poisson components. The Poissonassumption is not warranted in many applied contexts, and hierarchical Poisson models makerestrictive assumptions about overdispersion in marginal distributions. In this article we pro-pose a class of nonparametric Bayes count process models, constructed through rounding real-valued underlying processes. The proposed class of models accommodates situations in whichseparate count-valued functional data are observed for each subject under study. Theoreticalresults on large support and posterior consistency are established, and computational algorithms
are developed based on Markov chain Monte Carlo simulation. The methods are evaluated viasimulation and illustrated by application to longitudinal tumour counts and to asthma inhalerusage.

*Some key words*: Count functional data; Generalized linear mixed model; Hierarchical model; Longitudinal data;Poisson model; Spline; Stochastic process.

A stochastic process

*y *= {

*y(s) *:

*s *∈

*S*} is a collection of random variables indexed by

*s *∈

*S*,
where the domain

*S *usually corresponds to a set of times or spatial locations and

*y(s) *is a ran-dom variable observed at a specific time or location

*s*. There is a rich frequentist and Bayesianliterature on stochastic processes, with common choices encompassing Gaussian processes andL´evy processes, such as the Poisson, Wiener, beta or gamma processes. Gaussian processesprovide a convenient and well-studied choice when

*y *:

*S *→ R is a continuous function. In theBayesian literature, there have been substantial computational and theoretical advances for Gaus-sian process models in recent years. For example, and developed improved methods for posterior computation, while ) andstudied asymptotic properties, including posterior consis-tency and rates of convergence. The Gaussian process is appealing in that it provides a priorwhich can be specified to generate functions that are within an arbitrarily small neighbourhood
2013 Biometrika Trust
A. CANALE AND D. B. DUNSON
of any continuous function with positive probability , ), while also beingcomputationally convenient.

Our interest is focused on the case where

*y *:

*S *→

*N *= {0

*, . . , *∞}, so that

*y *is a count-valued
stochastic process over the domain

*S*. There are many applications of such processes, for exam-ple epidemiological studies monitoring a count biomarker or health response of patients overtime and ecological studies recording the number of birds of a given species observed at differ-ent locations. Although there is a rich literature on models for longitudinal and spatial data, mostmodels rely on Poisson hierarchical specifications. For example, consider

*y(s) *∼ Po{

*λ(s)*}, where the Poisson mean

*λ(s) *varies over time according toa latent process. recently developed an integrated nested Laplace approxima-tion to the posterior for a broad class of latent Gaussian structured additive regression mod-els. The observed variables are assumed to belong to an exponential family, with the meansfollowing an additive model having Gaussian and Gaussian process priors on the unknowncomponents.

Although such models have a flexible mean structure, the Poisson assumption is restrictive
in that it limits the variance to equal the mean, and overdispersion is introduced in marginaliz-ing out the latent processes. This leads to a confounding of the dependence structure with thedegree of overdispersion in the marginals, as both are induced through the latent process. Suchmodelling frameworks cannot accommodate correlated count data that are underdispersed, andsubstantial bias can result for non-Poisson overdispersed data. Relying on a hierarchical Faddymodel , developed methods that generalize the Poisson dis-tribution to accommodate under- and overdispersed longitudinal counts. The Faddy distributionallows the current rate of occurrence to depend on the number of events in a previous interval,and when a dispersion parameter is less than zero the rate decreases with each new event, causingunderdispersion. This is a restrictive type of negative feedback dependence, and computation ischallenging, taking several days to implement a single analysis.

In considering models that separate the marginal distribution from the dependence struc-
ture, it is natural to focus on copulas. ) proposed a copulamodel for bivariate counts that incorporates covariates into the marginal model. proposed a copula model for high-dimensional counts, which can potentially allowunderdispersion in the marginals via a Faddy or Conway–Maxwell–Poisson ) model. ) provide a review of copula models for counts. Toour knowledge, copula models that are directly applicable to count stochastic processes havenot yet been developed. proposed a Gaussian copula processmodel to characterize dependence between arbitrarily many random variables independentlyof their marginals. ) proposed a latent stick-breaking process, which isa nonparametric Bayes approach for a stochastic process with an unknown common marginaldistribution modelled via a stick-breaking prior. They considered a spatial count processapplication, with the marginal modelled via a mixture of Poisson distributions and the spa-tial dependence characterized through a latent Gaussian process. This separates the marginaland dependence structures, but the marginal model is restrictive in being characterized as amixture of Poisson distributions, computation is intensive, and count functional data are notaccommodated.

An alternative approach relies on rounding of a stochastic process. For classification, it is
common to threshold Gaussian process regression ; ). rounded a real discrete autoregressive process to induce an integer-valued time series. used rounding of continuous kernel mixture modelsto induce nonparametric models for count distributions. In this article, we instead propose a class

*Nonparametric Bayes count processes*
of stochastic processes that map a real-valued stochastic process

*y*∗ :

*S *→ R to a count stochasticprocess

*y *:

*S *→

*N *.

2. ROUNDED STOCHASTIC PROCESSES
2·1.

*Notation and model formulation*
Let

*y *∈

*C *denote a count-valued stochastic process, with

*S *⊂ R

*p *compact and

*C *being the set
of all

*S *→

*N *functions that satisfy Assumption

*Assumption *1. The stochastic process

*y *:

*S *→

*N *is piecewise constant such that

*S *=

*l (y)*, with

*y(s) *being constant in the interior of each set

*Sl (y) *and having unit incre-
ments at the boundaries

*B(y)*. The boundary points fall within the set having the higher value of
Assumption ensures that for sufficiently small changes in the input, the corresponding
change in the output is small. We are particularly motivated by applications in which counts donot change erratically at nearby times but maintain some degree of similarity. However, Assump-tion does not rule out rapidly changing count processes, as one can have arbitrarily many jumpsin a tiny interval and still satisfy the assumption. In addition, Assumption can easily be relaxed.

We choose a prior

*y *∼

, where

* *is a probability measure over

*(C, B)*, with

*B(C) *being the
Borel

*σ *-algebra of subsets of

*C*. The measure

* *induces the marginal probability mass functions
pr{

*y(s) *=

*j*} =

{

*y *:

*y(s) *=

*j*} =

*π j (s), j *∈

*N , s *∈

*S,*
as well as the joint probability mass functions
pr{

*y(s*1

*) *=

*j*1

*, . . , y(sk) *=

*jk*} =

{

*y *:

*y(s*1

*) *=

*j*1

*, . . , y(sk) *=

*jk*} =

*π j*
1

*, . . , sk )*
where, for

*h *= 1

*, . . , k *and any

*k * 1,

*jh *∈

*N *and

*sh *∈

*S*.

In introducing the Dirichlet process, ) mentioned three appealing characteris-
tics of nonparametric Bayes priors: large support, interpretability, and ease of computation. Ourgoal is to specify a prior

* *that gets as close to this ideal as possible. Starting with the largesupport characteristic, we would like to choose a

* *that allocates positive probability to arbi-trarily small neighbourhoods around any

*y*0 ∈

*C *with respect to an appropriate distance metric,such as

*L*1. To our knowledge, there is no previously defined stochastic process that satisfiesthis large support condition. In the absence of prior knowledge allowing one to assume that

*y*belongs to a prespecified subset of

*C *with probability 1, priors must satisfy the large supportproperty to be coherently Bayesian. Large support is also a necessary condition for the posteriorfor

*y *to concentrate in small neighbourhoods of any true

*y*0 ∈

*C*.

With this in mind, we propose to induce a prior

*y *∼

* *through

*y *=

*h(y*∗

*),*
*y*∗ ∼

∗

*,*
where

*y*∗ :

*S *→ R is a real-valued stochastic process,

*h *is a thresholding operator from

*Y *→

*C*,with

*Y *denoting the set of all

*S *→ R continuous functions, and

∗ is a probability measureover

*(Y, B) *with

*B(Y) *Borel sets. Unlike with count-valued stochastic processes, there is a richliterature on real-valued stochastic processes. For example,

∗ could be chosen to correspond toa Gaussian process or could be induced through various basis or kernel expansions of

*y*∗.

There are various ways in which the thresholding operator

*h *can be defined. For interpretabil-
ity and simplicity, it is appealing to maintain similarity between

*y*∗ and

*y *in applying

*h*, while
A. CANALE AND D. B. DUNSON
Fig. 1. Effect of the thresholding operator

*h *in (a) samples from a Gaussian process (dashedand dotted lines) with mean function

*μ(s) *= 2 + sin

*(s) *+

*s *(solid line) and squared exponentialcovariance function; (b) rounded versions of the realizations in panel (a) (dashed and dotted
lines) together with the induced mean function (solid line).

restricting

*y *to lie in

*C*. Hence we focus on a rounding operator that sets

*y(s) *= 0 if

*y*∗

*(s) < *0and

*y(s) *=

*j *if

*j *− 1

*y*∗

*(s) < j *for

*j *= 1

*, . . , *∞. In other words, negative values will bemapped to zero, which is the closest nonnegative integer, while positive values will be roundedup to the nearest integer. This type of restricted rounding ensures that

*y(s) *will be a nonnega-tive integer. Using a fixed rounding function

*h *in we rely on flexibility of the prior

*y*∗ ∼

∗to induce a flexible prior

*y *∼

. For notational convenience and generality, we let

*y(s) *=

*j *if

*y*∗

*(s) *∈

*A j *= [

*a j , a j*+1

*)*, where

*a*0

*< *· · ·

*< a*∞, and we focus on the case with

*a*0 = −∞ and

*a j *=

*j *− 1 for

*j *= 1

*, . . , *∞.

This construction is particularly suitable for modelling dynamics of count processes close to
zero and, in particular, zero-inflated processes with local dependence in the zeros. Applying themapping

*h *to a latent

*y*∗ that assumes negative values across certain subregions of

*S *will leadto blocks of zeros in the count process

*y*. This incorporates dependence between zero occur-rences and the occurrence of small counts, which seems natural in most applications, such as thelongitudinal tumour count study of §
Figure illustrates the prior through showing, in panel (a), realizations of the underlying
stochastic process and, in panel (b), the resulting count process after applying the rounding oper-ator. The thick solid lines represent the mean functions of the real-valued process and the inducedprocess; the latter is

*E*{

*y(s)*} =

*j *{

*Fs(a j*+1

*) *−

*Fs(a j )*}

*,*
*s (x ) *= −∞

*fs(y*∗

*) *d

*y*∗ and

*fs *is the marginal distribution of

*y*∗

*(s)*.

The covariance structure of the induced count process inherits much of the structure of the
underlying process, as is clear from
cov{

*y(s), y(s* *)*} =

*j k *pr[{

*y*∗

*(s) *∈

*A j , y*∗

*(s* *) *∈

*Ak*}] −

*E*{

*y(s)*}

*E*{

*y(s* *)*}

*,*
*j *=0

*k*=0

*Nonparametric Bayes count processes*
where {

*y*∗

*(s), y*∗

*(s)*} has a bivariate distribution with covariance equal to cov{

*y*∗

*(s), y*∗

*(s)*}. Inthe Supplementary Material we present some plots comparing the covariance of the originalprocess with that of the induced process.

In certain applications, count data can naturally be viewed as arising through integer-valued
rounding of an underlying continuous process. For example, in the longitudinal tumour countstudy described in § as distinguishing individual tumours tends to be difficult, it is natural toposit a continuous time-varying tumour burden, with tumours fusing together and falling off overtime. Moreover, while tumour biologists attempt to make an accurate count when collecting data,measurement errors are unavoidable, and one can naturally take this into account by assuming asmoothly varying continuous tumour burden specific to each animal, with measurement errorsand rounding producing the observed tumour counts. However, even when the existence of anunderlying continuous process does not have a clear motivation in terms of the applied context,our proposed formulation still leads to a highly flexible and computationally convenient model.

2·2.

*Properties*
The mapping function

*h(*·

*) *in is many-to-one, and the inverse mapping

*h*−1

*(y) *will cor-
respond to an uncountable set of infinitely many continuous stochastic processes

*y*∗ such that

*y *=

*h(y*∗

*)*. As an important step in characterizing the support of the induced prior

*y *∼

,
Lemma ensures the existence of at least one continuous stochastic process for each count pro-cess. The proofs of all results in this section are given in the Appendix.

LEMMA 1.

*For every count stochastic process y*0 ∈

*C satisfying Asssumption **there exists at*
*least one continuous y*∗ :

*S *→ R

*such that y*0 =

*h(y*∗

*).*
Defining an

*L*1 neighbourhood around

*y*0 of size

* *as

*η(y*0

*) *=

*y *:

*d*1

*(y*0

*, y) *=

*y*0

*(s) *−

*y(s)* d

*s < ,*
we have the following theorem on the prior support.

THEOREM 1.

*If the prior *∗

*assigns positive probability to L*1

*neighbourhoods of any con-*
*tinuous function y*∗ :

*S *→ R

*, the prior induced through **assigns positive probability to L*1

*neighbourhoods of any y*0 ∈

*C satisfying Assumption *
In addition to showing the large support property of the prior, it is important to verify that the
posterior distribution for

*y *concentrates increasingly around the true process

*y*0 as the samplesize increases. Theorem provides sufficient conditions under which

*L*1 posterior consistencyholds. Assumption provides a space-filling regularity condition on the design.

*Assumption *2. Let

*S *= [0

*, *1]

*p*, and assume that the

*n *values of

*si *arise from an in-fill
design such that we can cover

*S *with

*n L*∞-balls centred around

*s*1

*, . . , sn *of size

*δ*, where2

*δ *∈

*(n*−1

*/p, *
THEOREM 2.

*Let y *∈

*C be a count stochastic process with yi *=

*y(si) for i *= 1

*, . . , n, where*
*s*1

*, . . , sn are as in Assumption **Let y*0 ∈

*C denote the true stochastic process and let y *∼

*.*

Suppose that {

*η(y*0

*)*}

*> *0

*for any and there exist sets *{

*Cn*}∞

*with C*
*n *∈

*C and *{

*C*C
A. CANALE AND D. B. DUNSON

*c*1 exp

*(*−

*c*2

*n), where C*C

*n denotes the complement of Cn and c*1

*, c*2

*are positive constants. Then*
* (y*0

*)* *y*1

*, . . , yn *→ 0

*.*
From Theorems and it follows that the prior proposed in will lead to

*L*1 posterior
consistency under Assumptions and as long as

∗ assigns positive probability to

*L*1 neigh-bourhoods of any continuous function and negligible probability to

*Y*C =
as

*n *increases.

showed that this condition holds if

*Y*C

*n *has a particular form, for

∗
corresponding to orthogonal basis expansions or Gaussian processes with continuously differ-entiable mean function and covariance having the form

*k(s, s* ;

*β) *=

*k*0

*(β* *s *−

*s* *) *where

*s *∈ R,

*k*0

*(s) *is a positive multiple of a density function that is four times continuously differentiable on
R, and a suitable hyperprior is chosen for

*β*.

2·3.

*Posterior computation*
We estimate the count process

*y *at locations

*s(N)*, including observed locations

*s(n) *=

*(s*1

*, . . , sn)*T and additional locations of interest

*sn*+1

*, . . , sN*. Our rounded Gaussian pro-cess has

*y*∗ ∼ GP

*(*0

*, k) *with

*k(s, s* *) *= cov{

*y*∗

*(s), y*∗

*(s* *)*} =

*τ*1 exp

*(*−

*τ*2

*s *−

*s* 2

*)*, where

*τ *−1 ∼
Ga

*(aτ , b ) *is a scale parameter and

*τ p *∼ Ga

*(a , b )*, with

*p *being the dimension of the
domain

*S*. Here,

*τ *−1 is a bandwidth parameter controlling smoothness, and this prior is moti-
vated by the optimality results of although their theory doesnot apply directly in our setting. The resulting joint distribution of

*y*∗

*(n) *= {

*y*∗

*(s*1

*), . . , y*∗

*(sn)*}Tis

*Nn(*0

*, n)*, with

*n *= {

*σi j *} and

*σi j *=

*k(si , s j )*.

Posterior computation can proceed via a Markov chain Monte Carlo algorithm.

*Step *1. Sample

*y*∗

*(n) *from

*Nn(*0

*, n) *truncated to fall in a hyper-rectangle with

*ay * *y*∗

*(s*
*Step *2. Sample

*τ *−1 from conditional posterior distribution Ga

*(a *+

*n/*2

*, b *+

*y*∗T

*τ*
*Step *3. Update

*τ*2 using a Metropolis–Hastings step.

*Step *4. After burn-in, sample

*y*∗

*(sn*+1

*), . . , y*∗

*(sN ) *from the multivariate Gaussian condi-
In Step 1, Gibbs sampling can be used to update each

*y*∗

*(si ) *from its univariate truncated Gaus-
sian conditional density, but in our experience this leads to slow mixing. Instead, we use the slicesampler of ), which samples multivariate Gaussian random variables restricted to arectangular region. In Step 3, the likelihood of

*y(n) *marginalizing out

*y*∗

*(n) *cannot be calculatedanalytically, so we rely on the multivariate normal likelihood of

*y*∗

*(n) *in calculating the accep-tance probability. It is well known that updating

*τ*2 conditionally on a latent Gaussian processcan lead to stickiness, but owing to the fact that our rounding approach minimizes differencesbetween the observed

*y *and the latent

*y*∗, we have not found this to be a major problem. Alterna-tively, one can improve mixing by using the slice-sampling approach of )with some additional complexity.

As with other Gaussian process models, we face a computational bottleneck and numerical
instability as we evaluate

*y*∗ at increasing numbers of locations. Particularly when the process isobserved at close locations and the covariance function favours smooth realizations, one obtainsan ill-conditioned matrix, which can lead to large computational errors that degrade performance.

There is a rich literature proposing solutions to this problem, with the paper of

*Nonparametric Bayes count processes*
being a recent contribution. In a widely used approximation, the function is represented asa linear combination of finitely many basis functions, leading to reduced instability and poten-tially improved Markov chain Monte Carlo mixing. Hence, along with the rounded Gaussianprocess, we implement an alternative that approximates

*y*∗ using penalized splines. Details ofthis approach are provided in the Supplementary Material.

3. SIMULATION STUDY
A simulation study is conducted to assess the performance of the proposed approach. The
approach is implemented using rounded Gaussian processes or P-splines, and it is comparedwith several competitors. The first set of competitors initially treats the count measurementsas continuous, assuming

*h *to be the identity function. The estimated continuous trajectory isthen rounded in a second stage to produce an estimated count process. Such ad hoc two-stageapproaches are simple to implement; we consider two-stage versions of rounded Gaussian pro-cesses and P-splines. A second approach treats the count measurements as ordered categoricalvariates, using the Gaussian process ordinal regression model of ). Thismethod faces complications when applied to counts and sparse ordered categorical data. In par-
ticular, taking

*y(s*
*i ) *∈ {0

*, *1

*, . . , d *} for

*i *= 1

*, . . , n *and defining

*n j *=

*i *=1 1{

*y(si )*=

*j*}, the total
number of observations having value

*j *, poor performance was obtained when any

*n j *was small,with lack of convergence when

*n j *= 0 for any

*j *∈ {0

*, *1

*, . . , d*}. A third approach correspondsto Poisson regression with the mean parameter

*λ(s) *estimated with a spline smoother, as done bydefault by the gam function of the R library MASS. Lastly, we consider a simple interpolatingstep function defined as

*f (s) *=

*y*11

*s<s (s) *+

*j *1

*s j **s<s j*+1
For our method, we considered the posterior median of

*y(s)*. Simulations have been run undera wide variety of settings, leading to qualitatively similar results. We report the results for fourscenarios. Scenario 1 generates count stochastic processes from Po{

*λ(s)*}, with

*λ(s) *= 2 +

*s/*5 +sin

*(s)*. In scenario 2,

*y *is generated by rounding a realization of a Gaussian process plus an errorterm,

*y *=

*h(y*∗

*),*
*y*∗ ∼ GP

*(μ, k) *+

*,*
with mean function

*μ(s) *= 2 + exp

*(s/*5

*)*, squared exponential covariance function

*k(s, s* *) *and

*(s) *consisting of independent draws from

*N(*0

*, *2

*)*. These two cases do not satisfy Assumption since infinitely many discontinuity points can occur. In scenario 3 we generate from a Poissoncount process with rate parameter 1

*/*2, and in scenario 4 we generate from with

* *= 0.

In each case, we generated data on a equispaced grid of 1000 points between 0 and 20. Taking
equispaced subsamples for different levels of sparsity, of sizes

*n *= 25

*, *50

*, *100 and 500, we esti-mate the trajectory on a fine grid for 500 replicates for each scenario and each method. UsingMarkov chain Monte Carlo simulation, we obtained draws from the posterior predictive distribu-tion and used the median as our estimate. Methods are compared based on averaging the meanabsolute deviation between the estimate and the true process across the replicates and grid points.

Table shows that the proposed rounding approaches have the best overall performance. The
Gaussian process ordinal regression model consistently performed worst. As expected, the Pois-son model with nonparametric mean performs well in scenario 1 but poorly in the other cases, par-ticularly when the sample size is not small. The interpolating step function showed consistentlypoor performance except in scenario 3. The two-stage methods perform similarly to the proposed
A. CANALE AND D. B. DUNSON
Table 1.

*Mean absolute deviation and standard deviation (in parentheses) in the simulation study*
RGP, rounded Gaussian process; GP, Gaussian process; RPS, rounded P-spline; PS, P-spline; GPOR, Gaussian processordinal regression; NPP, nonparametric Poisson model; E, empirical interpolating step function.

approaches in scenarios 1 and 2, but have substantially worse performance in scenarios 3 and 4.

The two-stage methods show particularly poor performance when the counts do not take a widerange of values, have values near zero, or tend to have many occurrences of the same value. Inaddition, the approach of rounding in a second stage can have unanticipated consequences interms of inference on functionals, which may be unreliable and biased. Interestingly, the roundedP-splines approach has somewhat better performance than the rounded Gaussian process. Sincerounded P-splines are also faster to implement, taking from 15 seconds for samples of size

*n *= 25to 30 seconds for samples of size

*n *= 500 for 10 000 Markov chain Monte Carlo iterations in eachof the simulated examples, we focus on this approach in the real-data applications. We also com-pared the methods in terms of predictive mean absolute deviation, as well as width and coverageof predictive credible intervals, and again observed better overall performance of the proposed

*Nonparametric Bayes count processes*
approaches, with the competitors having high mean absolute deviation and poor coverage in atleast one of the scenarios. Additional tables summarizing the results for predictive errors andpredictive coverage are presented in the Supplementary Material.

4·1.

*Count functional data*
We have concentrated on the case where a single count process

*y *is observed at locations

*s *=

*(s*1

*, . . , sn)*T. In many applications, however, there are multiple related count processes
{

*yi *:

*i *= 1

*, . . , n*}, with the

*i*th process observed at locations

*si *=

*(si*1

*, . . , sin )*T. We refer to
such data as count functional data. As in other functional data settings, it is of interest toborrow information across the individual functions through use of a hierarchical model. Thiscan be accomplished within our rounded stochastic processes framework by first defining afunctional data model for a collection of underlying continuous functions {

*y*∗ :

*i *= 1

*, . . , n*}
and then letting

*yi *=

*h(y*∗

*) *for

*i *= 1

*, . . , n*. There is a rich literature on appropriate models
for {

*y*∗ :

*i *= 1

*, . . , n*}, ranging from hierarchical Gaussian processes , to
wavelet-based functional mixed models , ).

Let

*yi (s) *denote the count for subject

*i *at time

*s*, let

*yit *=

*yi (sit ) *where

*sit *is the

*t*th observation
time for subject

*i *, and let

*xit *=

*(xit*1

*, . . , xitp)*T be predictors for subject

*i *at the

*t*th observationtime. As a simple model motivated by the longitudinal tumour count and asthma inhaler useapplications described below, we let

*yit *=

*h(y*∗

*),*
*i *+

*b(si t , xi t )*T

*θ *+

*i t ,*
*ξi *∼

*Q, it *∼

*N(*0

*, τ*−1

*),*
where

*ξi *is a subject-specific random effect, the

*b(*·

*) *are basis functions that depend on time andpredictors,

*θ *is a vector of unknown basis coefficients, and

*it *is a residual which allows thecounts to vary erratically from time to time about the smooth subject-specific mean curve. Weuse basis expansions motivated by the success of rounded P-splines in our simulation. To allowthe random effect distribution to be unknown, we choose a Dirichlet process prior ),

*Q *∼ DP

*(α Q*0

*)*, where

*α *is a precision parameter and the base measure

*Q*0 is chosen as

*N (*0

*, ψ) *with

*ψ *∼ Ga

*(aψ , bψ )*. As is commonly done, we fix

*α *= 1. Additionally, we choose
a hyperprior for the residual precision

*p(τ ) *∝

*τ *−1 and for the basis coefficients

*p(θ)*, with thespecific form of

*p(θ) *depending on the context.

4·2.

*Transgenic mouse bioassay*
We first analyse data from a Tg.AC mouse bioassay study of pentaerythritol triacrylate, a
chemical used in many industrial processes. Animals were randomized to a control group or toone of five dose groups each of size 30. The five dose groups are 0·75, 1·5, 3, 6 or 12 mg/kg.

The number of skin papillomas on the back of each mouse was counted weekly for 26 weeks,and it is of interest to compare the groups to see if there is an increase in tumourigenicity relativeto control, while assessing dose-response trend. ) analysed these datathrough a Poisson-gamma frailty model. As remarked earlier, Poisson hierarchical models arequite restrictive, and our aim here is to use the proposed model to improve robustness.

The only predictor for an animal is the dose group

*xi *∈ {1

*, . . , G*}, and we let

*b(sit , xi )*T

*θ *=

*b(sit )*T

*θx *in expression to allow a separate trajectory in time for each dose group. Here
the

*b(s) *are B-spline basis functions, the

*θg *are basis coefficients specific to group

*g*, and

*p(θg * *λ) *∝ exp

*(*−

*λθ*T

*g Pθg/*2

*) *are conditionally independent P-spline priors for each dose group.

The prior is designed to only borrow information across dose groups in estimating the smoothness
A. CANALE AND D. B. DUNSON
Number of tumours
Fig. 2. Estimated cumulative mean tumour burden (lines) and weekly samplemeans (points) for: the control, 0·75 mg/kg dose and 1·5 mg/kg dose groups(solid line and circles); the 3 mg/kg dose group (dashed line and triangles); the6 mg/kg dose group (dotted line and crosses); and the 12 mg/kg dose group (dash-
dotted line and diamonds).

parameter

*λ*, to avoid the possibility of having chemical effects in higher dose groups pull up theestimated tumour response in lower dose groups. To induce a heavy-tailed prior with appealingcomputational properties, we use a multilevel hierarchical prior for

*λ*, with

*λ *∼ Ga

*(ν/*2

*, δν/*2

*)*,

*δ *∼ Ga

*(aδ, bδ) *and

*ψ *∼ Ga

*(aψ, bψ)*. We do not expect to have substantial learning from the dataabout

*δ *or

*ψ*. The computational details are reported in the Supplementary Material.

As a global measure of toxicity, we use the average papilloma burden per group. The two lower
dose groups showed no significant difference from the control group, with the posterior meanof the average tumour burden being less than 0·001 and the 95% credible intervals concentratednear zero. For the higher dose groups, the average tumour burden grows with the dose level.

Mean tumour burden and 95% credible intervals are 0·18 [0·06, 0·39], 9·51 [9·21, 9·80] and12·33 [11·90, 12·72] for the 3, 6 and 12 mg/kg dose groups, respectively. Cumulative tumourburdens along with the dose group-specific empirical means for each week are plotted in Fig.
As a measure of the time-varying increase in papilloma burden, we took the mean burden per
dose group per week and subtracted the average number for the control group. Posterior meansand 95% credible bands are reported in Fig. The two lower dose groups are indistinguishablefrom the control, so that Fig. shows a horizontal line at zero, while the 3, 6 and 12 mg/kgdose groups exhibit clear increases relative to control starting from the 17th, 9th and 8th weeks,respectively. Higher doses of the chemical are associated with higher numbers of skin papillomasand earlier onset of the first tumour. Our modelling approach allows us to estimate the averagetime of onset of the first tumour, which is in the 27th, 14th and 11th weeks for the three higherdose groups. For the other groups, the typical mouse did not develop tumours before the end ofthe study.

Our overall conclusions agree with those of ). The group compari-
son results are also consistent with results obtained from a frequentist generalized linear modelanalysis. We additionally implemented standard frequentist nonparametric tests for comparinggroups based on summaries of the tumour trajectory data, including time of first tumour andmaximum tumour burden per animal. A

*p*-value of less than 0·001 for the Kruskal–Wallis rank

*Nonparametric Bayes count processes*
Mean tumourigenicity effect
Mean tumourigenicity effect
Mean tumourigenicity effect
Mean tumourigenicity effect
Fig. 3. Time-varying chemical exposure posterior mean effect on tumourigenicity (solid line) with95% credible bands (dashed lines) for: (a) the 0·75 mg/kg and 1·5 mg/kg dose groups; (b) the3 mg/kg dose group; (c) the 6 mg/kg dose group; and (d) the 12 mg/kg dose group. The dotted
horizontal line at zero corresponds to no effect of the chemical.

sum test suggests strong evidence against equality among the dose groups in maximum tumourburden per animal. Pairwise Wilcoxon tests were performed to test the equality of the maximumtumour burden between each treated group and the control, with one-sided alternatives of highermaximum burdens in the treated groups; the

*p*-values turned out to be less than 0·01 for the threehigher dose groups and were 1 and 0·09 for the 0·75 and 1·5 mg/kg groups, respectively. Similarresults were obtained by taking the time of development of the first tumour as a summary of thetumour trajectory. As partly illustrated in Fig. which shows the empirical and estimated meantumour burdens for each group, the model provides a good fit to the data.

4·3.

*Asthma inhaler use*
As a second application, we analysed data on daily usage of albuterol asthma inhalers
Daily counts of inhaler use were recorded over a period of 36 to 122days for 48 students previously diagnosed with asthma. The total number of observations was5209. As discussed in ), the data are underdispersed. Let

*yit *denote thenumber of times the

*i *th student used the inhaler on day

*t *. We are interested in the effect ofmorning levels of PM25, small air pollutant particles of less than 25 mm in diameter, on asthmainhaler use. On each day

*t *, a vector

*xt *=

*(xt*1

*, . . , xtp)*T of environmental variables was recorded,including PM25 level; average daily temperature, in degrees Fahrenheit, divided by 100; percent-age humidity; and barometric pressure, in mmHg, divided by 1000. We modify to include
A. CANALE AND D. B. DUNSON
these predictors in an additive model as follows:

*yit *=

*h(y*∗

*),*
*b j (x jt )*T

*θ j *+

*it ,*
where

*ξi *is a random effect modelled as in § {

*b j *} is a B-spline basis, with the

*θ j *being thebasis coefficients, and

*i *∼

*N (*0

*, τ *−1

*R)*, with

*R *being the correlation matrix arising from a first-order autoregressive process with correlation parameter

*ρ*. The prior for each

*θ j *is identical to thatused for

*θg *in § and each predictor is normalized to have zero mean and unit variance priorto analysis. The correlation parameter is given a uniform prior on [−1

*, *1]. The computationaldetails are reported in the Supplementary Material.

We ran our Markov chain Monte Carlo algorithm for 11 000 iterations, discarding a burn-in
of 1000 iterations. Convergence and mixing were diagnosed by monitoring the nonlinear effectsof the different predictors at several values and also by monitoring hyperparameters. The traceplots showed excellent mixing, with an effective sample size of over 9000. Autocorrelation func-tions tended to drop near zero between lag 1 and lag 2. To obtain interpretable summaries of thenonlinear covariate effects on the inhaler use counts, we recorded for each predictor, at a densegrid of

*x jt *values for each sample after burn-in, the conditional expectation of the count for atypical student having

*ξi *=

*μQ*:

*μj(x jt) *=

*E(yit * *x jt, x j* *t *= 0

*, j* =

*j, ξi *=

*μQ, θ, τ, ρ)*
*j x j t ), τ *} −

{

*ak *;

*μ*∗

*j x j t ), τ *}]

*.*
Here

*μQ *is the mean of the random effects distribution

*Q*,

*(*· ;

*μ, τ ) *is the cumulative distribu-tion function of a normal random variable with mean

*μ *and precision

*τ *,

*K *is the 99·99% quantileof

*N *{

*μ*∗

*(x*
*j t ), τ *−1}, and

*j x j t ) *=

*b j (x j t )*T

*θ j *+

*bl(*0

*)*T

*θl *+

*μQ,*
with the other predictors fixed at their mean value. Based on these samples, we calculated pos-terior means and pointwise 95% credible intervals; the results are reported in Fig.
These data were previously analysed by ) using a Faddy distribution with
a log-linear mixed model for the mean,
log

*E(yit * *xt , β, ui , eit ) *=

*x jt β j *+

*ui *+

*eit ,*
where

*ui *is a subject-specific random effect and

*eit *is a residual that follows a first-order autore-gressive process. ) estimated a coefficient of 0·013 for PM25, which is closeto zero, with 95% confidence interval including zero. A Poisson log-linear model analysis yieldeda similar coefficient of 0·014 but with a confidence interval that is 50% wider. Our approach,which is based on a substantially more flexible model that allows for nonlinear effects and anonparametric random effects distribution, produces results that are consistent with these earlieranalyses.

*Nonparametric Bayes count processes*
PM25 (µg m–3/10)
Temperature (°F/100)
Pressure (mmHg/1000)
Fig. 4. Posterior mean (solid line) and 95% pointwise credible bands (dashed lines) for the effect of (a)concentration of PM25 pollutant, (b) average daily temperature, (c) percentage humidity, and (d) barometric
pressure on asthma inhaler use calculated with equation
We have proposed a simple new approach to modelling count stochastic processes, based
on rounding continuous stochastic processes. The general strategy is flexible and allows one touse existing algorithms and code for posterior computation for continuous stochastic processes.

Although rounding of continuous underlying processes is quite common for binary and categor-ical data, such approaches have not, to our knowledge, been applied to induce new families ofcount stochastic processes. Instead, the vast majority of the literature on count processes relieson Poisson processes and hierarchical Poisson constructions, which have well-known limitations.

We have explored some basic properties of rounding, but the primary contribution of this articleis to introduce the idea that rounding can be useful in this context. It is likely that some propertiesof the underlying continuous process, which are well known for Gaussian processes and otherstandard cases, will carry over to the induced count process. However, this issue merits furtherstudy. Other interesting directions to pursue include the modelling of counting processes corre-sponding to nondecreasing count processes via rounding nondecreasing continuous processes by
A. CANALE AND D. B. DUNSON
using monotone splines ; ; , and otherconstructions.

Comments and suggestions of the referees are gratefully acknowledged. Antonio Canale is
also affiliated with the Collegio Carlo Alberto. This research was partially supported by theNational Institute of Environmental Health Sciences of the United States National Institutes ofHealth, the University of Padua, and Regione Piemonte.

SUPPLEMENTARY MATERIAL
Supplementary material available at

*Biometrika *online includes discussion of the covariance
function of § computational details for the rounded P-splines of § an additional tableand figure showing results of the simulation in § and the computational details of the twoapplications discussed in §

*Proof of Lemma *For any count stochastic process

*y*0 satisfying Assumption we can partition the
domain

*S *into mutually disjoint sets

*Sl(y*0

*)*, with

*y*0

*(s) *constant in the interior of each

*Sl(y*0

*) *and havingunit increments at the boundaries. There are clearly infinitely many continuous functions

*y*∗ :

*S *→ R sat-isfying the constraints (i)

*y*∗

*(s) *∈ [

*ay*0

*(s), ay*0

*(s)*+1

*) *for all

*s *∈

*S*, and (ii)

*y*∗

*(s) *=

*ay*0

*(s) *for

*s *∈

*B(y*0

*)*. For allsuch

*y*∗, we have

*y*0 =

*h(y*∗

*)*.

*Proof of Theorem *The theorem is an immediate consequence of Lemma and the following lemma
which ensures that the mapping

*h *maintains

*L*1 neighbourhoods.

LEMMA A1.

*Suppose that y*∗

*and y*∗

*are continuous and bounded by M *∈ R

*and are such that*
*d*1

*(y*∗

*, y*∗

*) *=

∗

*. Let y *=

*h(y*∗

*) and y*
*). Then y *∈

*η*
*), where ζ(*∗;

*y*∗

*) is*
0 =

*h(y*∗

*(y*0

*) for all > ζ (*∗;

*y*∗0

*nondecreasing in *∗

*and satisfies *lim

∗→0

*ζ(*∗;

*y*∗

*) *= 0

*.*
*Proof of Lemma *Take

*S *= [0

*, *1]

*p *without loss of generality. Let {

*Sl(y*0

*, y)*}

*m *be the partition of

*S *induced by {

*Sl(y*0

*)*}

*m*0 and {

*S*
such that

*y(s) *=

*j*
*l (y)*}

*m*1

*l *and

*y*0

*(s) *=

*kl *for all

*s *∈

*Sl (y*0

*, y) *and some

*jl , kl *∈

*N *. Let

*δl(y*0

*, y) *=

*jl *−

*kl* for

*l *= 1

*, . . , m*, and let

*λ(*·

*) *be Lebesgue measure. Define

*ζ(*∗;

*y*∗

*) *= sup

*l *{

*y*0

*, h(y*∗

*)*}

*l *{

*y*0

*, h(y*∗

*)*}]⎭

*.*
*l*:

*δl *
Clearly,

*y *∈

*η(y*0

*) *for all

* > ζ(*∗;

*y*∗

*) *since

*d*1

*(y*0

*, y) *=

*δl(y*0

*, y)λ*{

*Sl(y*0

*, y)*}

*ζ(*∗;

*y*∗

*).*
We show first that lim

∗→0

*ζ(*∗;

*y*0

*) *= 0. What follows holds for all

*y*∗ ∈

*η*∗

*(y*∗

*)*. Consider a general

*y*∗ ∈

*l *{

*y*0

*, h(y*∗

*)*}] is finite,

*ζ(*∗;

*y*0

*) *goes to zero if max

*δl *{

*y*0

*, h(y*∗

*)*} goes to zero.

Define

*M*∗ = max

*y*∗

*(s) *−

*y*∗

*(s)* and let

*s*
*(s)* with

*s*
*M *= arg max

*y*∗

*(s) *−

*y*∗

*M *belonging to a given

*Sl*
where

*y*∗

*(s) * *a j*
*(s) * *a*
*l *+1 and

*y *∗

*kl *+1. By construction

*all *+1 −

*akl *+1

*M*∗ , and so for

*M*∗ → 0 we have

*(s)* → 0, it follows that

*y*∗

*(s) *−

*y*∗

*(s)* → 0 for all

*s *∈

*S*,

*l *+1 =

*akl *+1 . Considering that max

*y *∗

*(s ) *−

*y *∗
which leads also to max

*δl *→ 0. As the absolute value of the difference

*y*∗

*(s) *−

*y*∗

*(s)* is bounded and
continuous, we have that if

*S * *y*∗

*(s) *−

*y*∗

*(s)* d

*s *goes to zero, then lim sup

*(s) *−

*y*∗

*(s)* , and hence

*S * *y*∗0
also

*M*∗, goes to zero as well.

*Nonparametric Bayes count processes*
The fact that

*ζ(*· ;

*y*0

*) *is nondecreasing follows directly from its definition.

By Lemma with suitable

∗ we have

*)*}

*> *0

*.*
*Proof of Theorem *Since

*y*0

*(si ) *is equal to the observed

*yi *for all

*i*, we can rewrite the posterior

* y *∈

*η*C

* (y*0

*) y*1

*, . . , yn*
*i ) *d

*(y) *+

*i ) *d

*(y)*
*i *=1

*yi*
*i ) *d

*(y)*
*(*1 −

*i ) *d

*(y) *+

*i ) *d

*(y)*
*i *=1

*yi*
*i ) *d

*(y)*
=

*n *+

*I*1

*,n(y*1

*, . . , yn) *+

*I*2

*,n(y*1

*, . . , yn),*
*I*3

*,n(y*1

*, . . , yn)*
where

*δa *is a delta mass at

*a*,

*n *is a test function, and

*Cn *is a sieve that grows eventually to become thewhole space

*C*. It suffices to show that

*n *→ 0

*,*
exp

*(β*1

*n)I*1

*,n(y*1

*, . . , yn) *→ 0

*,*
exp

*(β*2

*n)I*2

*,n(y*1

*, . . , yn) *→ 0

*,*
exp

*(βn)I*3

*,n(y*1

*, . . , yn) *→ ∞

*,*
with

*β < *min{

*β*1

*, β*2}.

*a* the integer part of

*a *and let

*S *=

*j , *with each

*G j *being an

*L *∞-ball of size

*n*1

*/p**)*−1 and centre

*s* , where the centres are chosen on a grid so that

*n*1

*/p**p *balls cover

*S *and each

*Gj *contains at least one element of

*(s*1

*, . . , sn)*T under Assumption Define

*Xi *= 1{

*y(si) *=

*y*0

*(s* *)*}, with

*s* being the centroid of the

*G*
*j *which contains

*si *. Let

*n *= 1{

*i < n*} be the test on the set

*Cn *= {

*y *:

*y *is constant in

*Gj *for all

*j *= 1

*, . . , *
*n*1

*/p**p, **y*∞

*< Mn*}

*,*
where

*Mn *=

*O(nα) *and 1

*/*2

*< α < *1. The first condition on the sieve governs the regularity of the process,while the second gives an upper bound for the infinity norm as in The true

*y*0belongs to

*Cn *for a given

*n*; hence, for

*n *sufficiently large, the test functions have exactly zero Type I andType II probability. From this is directly verified. We proceed to proving By Fubini's theoremwe have

*Ey *{

*I*
*(*1 −

1

*,n (y*1

*, . . , yn )*} =

*E y*0

*i ) *d

*(y)*
*Ey*{

*(*1 −

*n)*} = 0

*,*
where the final equality is directly verified by the test construction. Next, we prove Again by Fubini'stheorem, we have

*Ey *{

*I*
*(C*C

*) * *c*
2

*,n (y*1

*, . . , yn )*} =

*E y*0

*i ) *d

*(y)*
A. CANALE AND D. B. DUNSON
hence, for

*β*2

*< c*2, exp

*(β*2

*n)I*2

*,n(y*1

*, . . , yn) *→ 0. Finally, the prior positivity of

* *makes

*I*3

*,n(y*1

*, . . , yn)*positive. This also establishes and thus completes the proof.

BANERJEE, A., DUNSON, D. B. & TOKDAR, S. T. (2013). Efficient Gaussian process regression for large datasets.

*Biometrika ***100**, 75–89.

BANERJEE, S., GELFAND, A. E., FINLEY, A. O. & SANG, H. (2008). Gaussian predictive process models for large spatial
data sets.

*J. R. Statist. Soc. *B

**70**, 825–48.

BEHSETA, S., KASS, R. E. & WALLSTROM, G. L. (2005). Hierarchical models for assessing variability among functions.

*Biometrika ***92**, 419–34.

CANALE, A. & DUNSON, D. B. (2011). Bayesian kernel mixtures for counts.

*J. Am. Statist. Assoc. ***106**, 1528–39.

CHOI, T. & SCHERVISH, M. J. (2007). On posterior consistency in nonparametric regression problems.

*J. Mult. Anal.*
CHU, W. & GHAHRAMANI, Z. (2005). Gaussian process for ordinal regression.

*J. Mach. Learn. Res. ***6**, 1019–41.

DUNSON, D. B. & HERRING, A. H. (2005). Bayesian latent variable models for mixed discrete outcomes.

*Biostatistics*
**6**, 11–25.

ERHARD, V. & CZADO, C. (2009). Sampling count variables with specified Pearson correlation: A comparison between
a naive and a C-vine sampling approach. In

*Dependence Modeling: Vine Copula Handbook*, Ed. D. Kurowicka &H. Joe, pp. 73–87. Singapore: World Scientific.

FADDY, M. J. (1997). Extended Poisson process modeling and analysis of count data.

*Biomet. J. ***39**, 431–40.

FERGUSON, T. S. (1973). A Bayesian analysis of some nonparametric problems.

*Ann. Statist. ***1**, 209–30.

FR ¨UHWIRTH-SCHNATTER, S. & WAGNER, H. (2006). Auxiliary mixture sampling for parameter-driven models of time
series of counts with applications to state space modelling.

*Biometrika ***93**, 827–41.

GENEST, C. & NESLEHOVA, J. (2007). A primer on copulas for count data.

*Astin Bull. ***37**, 475–515.

GHOSAL, S. & ROY, A. (2006). Posterior consistency of Gaussian process prior for nonparametric binary regression.

*Ann. Statist. ***34**, 2413–29.

GRUNWALD, G. K., BRUCE, S. L., JIANG, L., STRAND, M. & RABINOVITCH, N. (2011). A statistical model for under-
or overdispersed clustered and longitudinal count data.

*Biomet. J. ***53**, 578–94.

KACHOUR, M. & YAO, J. F. (2009). First-order rounded integer-valued autoregressive (RINAR(1)) process.

*J. Time*
*Ser. Anal. ***30**, 417–48.

LIECHTY, M. W. (2010). Multivariate normal slice sampling.

*J. Comp. Graph. Statist. ***19**, 281–94.

MORRIS, J. S. & CARROLL, R. J. (2006). Wavelet-based functional mixed models.

*J. R. Statist. Soc. *B

**68**, 179–99.

MURRAY, I. & ADAMS, R. P. (2010). Slice sampling covariance hyperparameters of latent Gaussian models. In

*Advances in Neural Information Processing Systems 23*, Ed. J. Lafferty, C. K. I. Williams, R. Zemel, J. Shawe-Taylor & A. Culotta, pp. 1723–31. New York: Curran Associates.

NEELON, B. & DUNSON, D. B. (2004). Bayesian isotonic regression and trend analysis.

*Biometrics ***60**, 398–406.

NIKOLOULOPOULOS, A. & KARLIS, D. (2010). Regression in a copula model for bivariate count data.

*J. Appl. Statist.*
RAMSAY, J. O. (1998). Estimating smooth monotone functions.

*J. R. Statist. Soc. *B

**60**, 365–75.

RODR´IGUEZ, A., DUNSON, D. B. & GELFAND, A. E. (2010). Latent stick-breaking processes.

*J. Am. Statist. Assoc. ***105**,

RUE, H., MARTINO, S. & CHOPIN, N. (2009). Approximate Bayesian inference for latent Gaussian models by using
integrated nested Laplace approximations.

*J. R. Statist. Soc. *B

**71**, 319–92.

SHIVELY, T. S., SAGER, T. W. & WALKER, S. G. (2009). A Bayesian approach to non-parametric monotone function
estimation.

*J. R. Statist. Soc. *B

**71**, 159–75.

SHMUELI, G., MINKA, T. P., KADANE, J. B., BORLE, S. & BOATWRIGHT, P. (2005). A useful distribution for fitting
discrete data: Revival of the Conway–Maxwell–Poisson distribution.

*J. R. Statist. Soc. *C

**54**, 127–42.

VAN DER VAART, A. W. & VAN ZANTEN, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field
with inverse gamma bandwidth.

*Ann. Statist. ***37**, 2655–75.

WILSON, A. & GHAHRAMANI, Z. (2010). Copula processes. In

*Advances in Neural Information Processing Systems*
*23*, Ed. J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel & A. Culotta, pp. 2460–68. New York: CurranAssociates.

[

*Received April *2012

*. Revised July *2013]

Source: http://www.carloalberto.it/assets/working-papers/no.318.pdf

LITHIUM - ION BATTERY TELECOM APPLICATIONS Arun Golas DDG (T&A), Ram Krishna DDG (FLA), R. K. Siddhartha Director (FLA) and Naveen Kumar AD (FLA) Abstract We present various aspects for use of Lithium-Ion Battery in various Telecom Applications in present as well as future scenario. The uses of Lithium-ion (Li-ion)

COPERTINA ABSTRACT 300x210:Layout 1 18-11-2009 19:04 Pagina 1 THIRD INTERNATIONAL CONGRESS ON International Institute University of Mekelle National Institute Health Sector Development International Society International Centre for Health Migrants Forlanini Hospital for Migration and Health Anthropological Sciences Dermatological Care for All: Common Diseases for Neglected People