# Parameter estimates of mixed generalized Gaussian distribution for modelling the increments of electroencephalogram data

Electroencephalogram (EEG) is used to monitor child's brain during coma by recording data on electrical neural activity of the brain. Signals are captured by multiple electrodes called channels located over the scalp. Statistical analyses of EEG data includes classification and prediction using arrays of EEG features, but few models for the underlying stochastic processes have been proposed. For this purpose, a new strictly stationary strong mixing diffusion model with marginal multimodal (three-peak) distribution (MixGGDiff) and exponentially decaying autocorrelation function for modeling of increments of EEG data was proposed. The increments were treated as discrete-time observations and a diffusion process where the stationary distribution is viewed as a mixture of three non-central generalized Gaussian distributions (MixGGD) was constructed.

Probability density function of a mixed generalized Gaussian distribution (MixGGD) consists of three components and is described using a total of 12 parameters:*\mu _{k}*, location parameter of each of the components,

*s*, shape parameter of each of the components,

_{k}*\sigma*, parameter related to the scale of each of the components and

^{2}_{k}*w*, weight of each of the components,

_{k}where

*k*,

*k*={1,2,3} refers to theindex of the component of a MixGGD.

The parameters of this distribution were estimated using the expectation-maximization algorithm, where the added shape parameter is estimated using the higher order statistics approach based on an analytical relationship between the shape parameter and kurtosis.

To illustrate an application of the MixGGDiff to real data, analysis of EEG data collected in Uganda between 2008 and 2015 from 78 children within age-range of 18 months to 12 years who were in coma due to cerebral malaria was performed. EEG were recorded using the International 10–20 system with the sampling rate of 500 Hz and the average record duration of 30 min. EEG signal for every child was the result of a recording from 19 channels. MixGGD was fitted to each channel of every child's recording separately, hence for each channel a total of 12 parameter estimates were obtained.

The data is presented in a matrix form (dimension 79*228) in a .csv format and consists of 79 rows where the first row is a header row which contains the names of the variables and the subsequent 78 rows represent parameter estimates of one instance (i.e. one child, without identifiers that could be related back to a specific child). There are a total of 228 columns (19 channels times 12 parameter estimates) where each column represents one parameter estimate of one component of MixGGD in the order of the channels, thus columns 1 to 12 refer to parameter estimates on the first channel, columns 13 to 24 refer to parameter estimates on the second channel and so on. Each variable name starts with "ch*i*" where "ch" is an abbreviation of "channel" and *i* refers to the order of the channel from EEG recording. The rest of the characters in variable names refer to the parameter estimate names of the components of a MixGGD, thus for example "ch3sigmasq1" refers to the parameter estimate of* \sigma ^{2}* of the first component of MixGGD obtained from EEG increments on the third channel. Parameter estimates contained in the .csv file are all real numbers within a range of -671.11 and 259326.96.

Research results based upon these data are published at https://doi.org/10.1007/s00477-023-02524-y