1 Introduction
Recent technological advances allow the collection of detailed data on ratings, attitudes, and choices during behavioral tasks. Unlike standard surveys and questionnaires, these tools provide a rich source of data as they adopt tracking devices that collect subjectbased information about the dynamics involved during the data collection task [freeman+al:2010, schultemecklenbeck+al:2019]. Examples of such devices include eyetracking, body movementtracking, computer mousetracking, and electrodermal activity. Among these, computer mousetracking has become an important and widely used tool in behavioral sciences, as it provides a valid and costeffective way to measure the usually unknown processes underlying human ratings and decisions [freeman:2017]. Mousetracking data are often collected by means of standard computer mouse, wii instruments, and joystick devices and consist of collections of realtime trajectories recorded during the behavioral task. In a typical mousetracking task, individuals are presented with a computerbased interface showing the stimulus at the bottom of the screen (e.g., the image of a “dolphin”) and two labels on the left and right top corners (e.g., the labels “mammal” vs. “fish”). They are asked to decide which of the two labels is appropriate given the task instruction and stimulus (e.g., to decide whether dolphin is mammal or fish). In the meanwhile, the xy trajectories of the computer device are instantaneously recorded. The realtime trajectories offer an effective way to study the decision process underlying the hand movement behavior by revealing, for instance, the presence of some levels of decisional uncertainty. The applications of mousetracking tools spread across many research area, including cognitive sciences [coco+duran:2016], neuroscience [stolier+freeman:2017], neurology [ruitenberg+al:2016], and forensic studies [monaro+al:2017].
Several tools for running mousetracking analyses are available in opensource specialized software like MouseTracker [freeman+al:2010], EMOT [calcagni+al:2017], and MouseTrap [kieslich+henninger:2017]. In the R environment, only the packages mousetrack [Coco+al:2015] and mousetrap [Kieslich+al:2018] are devoted to mousetracking data. More generally, there are other packages developed to handle with tracking data such as trajectories [pebesma+al:2015], trackeR [frick+al:2017], adehabtatLT [calenge2006package], and move [kranstauber+al:2017]. Similarly, there are many packages developed for statespace models like, for instance, KFAS [helske:2017] and bssm [helske+vihola:2018].
In this paper we present ssMousetrack
, a novel R package to analyse computerized tracking data as they emerge from typical mousetracking data recording. The package implements a nonlinear state space model to handle with the dynamics of mousetracking data. The model is estimated using approximated Kalman filter coupled with MCMC algorithms via the
rstan package [rstan:2018]. The package includes functions for data preprocessing, data generation, and model assessment. It also provides functionalities to setup designs for mousetracking data recording. Despite other R packages are available in this context, ssMousetrack differs in some aspects. For instance, mousetrack and mousetrap focus on descriptive evaluation of mousetracking data static measures (e.g., minima, maxima, flips, curvature). By contrast, our package (i) implements a dynamic evaluation of the trajectory data without resorting the use of summary measures and (ii) evaluates the observed data variability in terms of latent dynamics and external covariates (e.g., experimental variables), which are usually of relevance in mousetracking data analysis. With respect to KFAS and bssm, our package offers a more focused implementation for mousetracking data. Moreover, although complete and useful in many cases, these packages implement a general class of statespace representations which may not be widely applicable to computerized tracking data. Finally, ssMousetrack differs also from trajectories, trackeR, adehabtatLT, and move as they focus on animal tracking and related problems, such as estimation of habitat choices. This makes them not directly suitable for analysing the various aspects of mousetracking studies. In this respect, the advantages of ssMousetrack are as follows: (i) it is easy to use as it requires typing a single function to run the entire procedure, (ii) it takes advantages of rstan package to estimate model parameters via MCMC, (iii) it allows modeling and analysing computerized tracking data as they are usually recorded in typical mousetracking tasks, (iv) it provides a userfriendly workflow for all processing steps which can be easily understood by nonexpert users, (v) it offers users a way to simulate mousetracking designs and data as well. In addition, ssMousetrack can be combined with other R packages, including shinystan [shinystan:2018] and ggmcmc [ggmcmc:2016] to produce further statistical and graphical representation of the output. The package is available from the Comprehensive R Archive Network at https://CRAN.Rproject.org/package=ssMousetrack.This paper is organized as follows. Section 2 gives a statistical overview of the model implemented in the package, the methods of estimation and inference, and the assessment of the model. Section 3 describes the package’s structure and its utilities. Section 4 illustrates the functioning of the package by means of an illustrative case study. Finally, Section 5 concludes the manuscript with a discussion and future directions.
2 Model
Computerized mousetracking data typically consist of arrays containing the streaming of xy Cartesian coordinates of the computermouse pointer, for subjects, stimuli, and time steps. To simplify data analysis, raw trajectories are usually preprocessed according to the following steps [hehman+al:2014, calcagni+al:2017]. First, the trajectories are normalized on a common sampling scale such that is the same over subjects and stimuli. Next, the arrays are rotated and translated into the quadrant with and by convention. Finally, normalized data are projected onto a (lower) 1dimension space via atan2 function. In this way, the final ordered data lie on the arc defined by the union of two disjoint sets, i.e. the set which represents the rightside section of the screen (usually called target, T) and the set which instead represents the leftside section (usually called distractor, D). The final data are arranged as an columnwise stacked matrix.
The statespace model implemented in ssMousetrack is as follows:
(1)  
(2)  
(3)  
(4)  
(5) 
where Equation (1) is a von Mises measurement equation with being the mean for the th time step and the concentrations around the
th mean vector, Equation (
2) represents the locations on the arc defined in from which the data vector is sampled from and it behaves according to the a real function , which maps reals into radians. In the current version of ssMousetrack, can be defined as:
scaled logistic function:
with representing the contribution of the stimuli on .

scaled Gompertz function:
where has the same meaning as before.
Although they represent two cases of the general family of shaped functions, logistic and Gompertz models differ in some respects. For instance, unlike the logistic model, the Gompertz function is not symmetric around its inflection point, with the consequence that its growth rises rapidly to its maximum rate occurring before the fixed inflection point [mcneish+dumas:2017]. Moreover, the parameters of the Gompertz function are always positive, a constrain which is often required by applications where the covariates of the model cannot take negative values (e.g., reaction times). These two implementations allow users to choose the type of function on the basis of the experimental designs they have used in their studies.
In Equation (3) is a Normal states equation
which represents a lag1 autoregressive process with timefixed variance parameter
. In the current version of ssMousetrack, the covariance matrix of the latent processes is set to an identity matrix
without loss of generality (). Equation (4) is the linear term modeling the contribution of the experimental design (e.g., twobytwo design) and variables involved (e.g., categorical variables, continuous covariates). Note that
is a design (dummy) matrix of main and highorder effects defined by adopting the dummy coding (e.g., treatment contrasts, sum contrasts) whereas is the associated vector of parameters for the columns of , with being the usual baseline term for the contrasts. Finally, Equation (5) defines the concentrations around the th location by using the transformed data:with being the exponential function scaled in the natural range of the concentration parameter (e.g., , ). In the current implementation of the package, the parameter is fixed to unity.
The interpretation of Equations (2)(4) is as follows. The th mean vector is expressed as function of the stimulirelated component and subjectbased component , which are integrated together to form the conditional sampling through the function . As a result, Equation (3) can be interpreted as the individual latent dynamics that are unaffected by the experimental stimuli whereas Equation (4) represents the experimental effect regardless to individual dynamics. More generally, Equation (3) conveys information about the hand movement process underlying the tracking device and as such it can be used to analyse how much individuals differ in executing the task. By contrast, Equation (4) collects information on how a certain experimental manipulation has an effect or not on the movement responses. Interestingly, when normalized into ,
can be interpreted as the probability of the
th individual at the th stimulus to navigate close the distractor cue in the leftside section of the arc. Finally, Equation (5) follows from the fact that hand movements underlying computerized tracking data tend to be smooth over the experimental task, with small changes being more likely close to left (distractor) or right (target) endpoints [brockwell+al:2004].2.1 Estimation and inference
The statespace model in Equations (1)(5) requires estimating the array of latent trajectories together with the array of parameters , with and (logistic case) or (Gompertz case). The array of unknown quantities can be estimated in various way, by adopting both a frequentist and Bayesian perspectives [shumway+stoffer:2006]. In the ssMousetrack package, the parameters are recovered in Bayesian way by means of a marginal MCMC algorithm through which and are alternately recovered [andrieu+al:2010, sarkka:2013]. The reason is twofold: (i) MCMC algorithms, as those implemented in rstan
package, provide a more efficient and complete solution for sampling from the probability distribution of the parameters. (ii) The Bayesian approach offers an elegant solution for data analysis and inference
[gelman+al:2014] by means of which the model is adequately assessed by the analysis of (marginal) posterior distributions of the parameters [kruschke:2014].More in details, the posterior density after factorization of the joint density , is as follows [andrieu+al:2010]:
(6) 
where is the (marginal) likelihood function, is the filtering density, whereas is the prior ascribed on the model parameters. In the current version of ssMousetrack, is approximated via Kalman filtering/smoothing, with being computed as a byproduct of the Kalman theory (see Appendix A).
2.2 Model assessment
In the Bayesian context of data analysis, ssMousetrack provides a simulationbased procedure to evaluate the adequacy of the model to reproduce the observed data [gelman+al:2014]. More technically, given the posteriors of parameters and latent states , new (simulated) datasets are generated according to the estimated model structure and, for each new dataset, two discrepancy measures are considered [kiers1997techniques]:
(7)  
(8)  
which measure the total amount of data reconstruction based on the overall observed array (Equation 7) and the amount of data reconstruction based on the observed matrix for each subject (Equation 8). Both the indices are in the range 0100%, with 100% indicating optimal fit. Note that the measure allows for evaluating the adequacy of the model to reconstruct the individualbased set of data. In addition, the dynamic time warp distance (dtw), as implemented in dtw package, is also computed between and . Unlike the index, the dtw distance measures the similarity among time series by considering their different dynamics [giorgino2009computing].
3 The ssMousetrack package
The ssMousetrack is distributed via the Comprehensive R Archive Network (CRAN). It is based on rstan [rstan:2018], the R interface to the probabilistic programming language Stan [carpenter+al:2017]. The current version of the package allows for (i) simulating data according to a given experimental design, (ii) analysing mousetracking data via statespace modeling, and (iii) evaluating the adequacy of model results. The package consists of five main function (generate_data(), run_ssm(), check_prior(), prepare_data(), evaluate_ssm()), two datasets (language, congruency), and three subfunctions (compute_D(), generate_Z(), generate_design()). The main functions generate_data() and run_ssm() are wrappers to previouslycompiled Stan codes which implement the model described in Section 2. Table 1 provides an overview of the functions and datasets provided in the ssMousetrack package whereas a description of the usage of the functions is reported in the next subsections.
function  type  description 

generate_data()  main  simulate data according to a userdefined experimental design. 
run_ssm()  main  run statespace model on a given mousetracking dataset. 
check_prior()  main  allows users to define a list of priors for prior running run_ssm(). 
prepare_data()  main  preprocess raw tracking data prior running run_ssm(). 
evaluate_ssm()  main  run model evaluation given an output of run_ssm(). The function can plot results if requested by users. 
compute_D()  internal  compute the matrix of distances given the observed data (see Equation 5). 
generate_Z()  internal  generate the Boolean trialbyvariable (design) matrix (see Equation 4). 
generate_design()  internal  allows users to specify an experimental design in terms individuals, trials, variables, and design matrix . 
congruency  dataset  subset of data from [coco+duran:2016]. 
language  dataset  subset of data from [barca+pezzullo:2012]. 
3.1 Generate artificial data
To simulate artificial data we use the function generate_data(), which requires as input the experimental template for the data generation process. More generally, the function works by first sampling the parameters from the prior density and then generates the latent states from Equation (3), computes the matrix from Equation (2) and the matrix , drawns the matrix of data by simply applying Equation (1). For instance, an experiment with one categorical independent variable and two levels, each with three trials, can be generated via the following syntax:
Ψprior_list < list("normal(0.25,0.5)","normal(2.7,1)") Ψdatagen1_ssm < generate_data(N = 61, M = 100,I = 2,J = 6, Ψ+ K = 2,Z.formula = "~Z1", priors = prior_list)
where M = 100 is the number of data to be generated, N = 61 is the number of time step for the mousetracking trajectories, K = 2 means that we have just one variable with two levels, J = 6 indicates the total number of trials such that J/K is the number of trials for each level of the variable, I = 2 is the number of subject, Z.formula indicates the formula for the contrast matrix with standard R syntax. Note that selective priors are specified for each level of the experimental design using the Stan syntax (see the help of check_prior() for further details).
The output is a list containing three sublists, as follows:

params, which contains the model parameters generated for the datasets:
ΨΨ## List of 4 ΨΨ## $ sigmax: num [1:2] 1 1 ΨΨ## $ lambda: num [1:12] 1 1 1 1 1 ... ΨΨ## $ gamma : num [1:75, 1:2] 0.228 0.378 ... ΨΨ## $ beta : num [1:75, 1:12] 0.228 0.378 ... Ψ

data, which contains the matrices of latent states and trajectories , together with , , and :
ΨΨ## List of 5 ΨΨ## $ Y : num [1:75, 1:61, 1:12] 1.54 1.53 ... ΨΨ## $ X : num [1:75, 1:61, 1:2] 1e04 1e04 1e04 1e04 1e04 ... ΨΨ## $ MU: num [1:75, 1:61, 1:12] 1.57 1.57 ... ΨΨ## $ D : num [1:75, 1:61, 1:12] 0.785 0.785 ... ΨΨ## $ Z : num [1:12, 1:2] 1 1 1 1 1 ... ΨΨ

design, which contains the experimental design used as template to generate the data:
ΨΨ## sbj trial Z1 ΨΨ## 1 1 1 100 ΨΨ## 2 1 2 100 ΨΨ## 3 1 3 100 ΨΨ
Similarly, artificial datasets can be generated using more complex designs. For instance, a bivariate design with two variables is produced by typing:
ΨΨdatagen2_ssm < generate_data(I = 2,J = 8,K = c(2,4),Z.formula = "~Z1*Z2", ΨΨ+ Z.type=c("symmetric","random")) ΨΨ
where K = c(2,4) codifies two variables each with two and four levels, Z.formula = " Z1*Z2" indicates that the variables interact whereas Z.type=c("symmetric","random") indicates that trials must be assigned to the first variable using the symmetric method and to the second variable using the random method (see the help of generate_Z() for further details).
Figure 1 shows a sample of mousetracking data generated in the univariate design case with I = 2, K = 2 and J = 6. We report the univariate case only for the sake of simplicity but the same graphical representations can be done for the more complex designs as well.
3.2 Run statespace analysis
Statespace analysis can be run on both simulated and real data. In the first case, after the datageneration process, the statespace model implemented in the ssMousetrack package can be fit using run_ssm(). For instance, the syntax:
ΨΨdatagen2_ssm < generate_data(I = 2,J = 8,K = c(2,4),Z.formula = "~Z1*Z2", ΨΨ+ Z.type=c("symmetric","random")) ΨΨiid < 2 ΨΨdatagen2_fit < run_ssm(N = datagen2_ssm$N,I = datagen2_ssm$I, ΨΨ+ J = datagen2_ssm$J,Y = datagen2_ssm$data$Y[iid,,], ΨΨ+ D = datagen2_ssm$data$D[iid,,],Z = datagen2_ssm$data$Z, ΨΨ+ niter = 5000,nwarmup = 2000,nchains = 2) Ψ
runs the statespace analysis on the iid = 2 artificial data datagen2_ssm. Note that niter indicates the number of total samples to be drawn, nwarmup the number of warmup/burnin iterations per chain, and nchains the number of chains to be executed in parallel. The function run_ssm() allows for parallel computing via the parallel package when nchains > 1. In this case, since ncores="AUTO" (default), the function will run two parallel chains using two cores.
Unlike for the case of artificial data, the analysis of real datasets requires preparing raw data in a proper format via prepare_data(), the function that implements the steps described in Section 2. Generally, raw datasets need to be organized using the longformat, with information being organized as nested. The dataset language is an example of a typical data structure required by prepare_data():
ΨΨdata("language") ΨΨstr(language,vec.len=2) ΨΨ## ’data.frame’:Ψ6060 obs. of 6 variables: ΨΨ## $ sbj : int 1 1 1 1 1 ... ΨΨ## $ trial : int 1 1 1 1 1 ... ΨΨ## $ condition: Factor w/ 4 levels "HF","LF","PW",..: 1 1 1 1 1 ... ΨΨ## $ timestep : int 1 2 3 4 5 ... ΨΨ## $ x : num 0 0.0098 0.0098 0.0098 0.0098 ... ΨΨ## $ y : num 0 0.0025 0.0025 0.0025 0.0025 ... Ψ
where condition is the categorical variable involved in the study. The preprocessing of raw data is performed by the call:
ΨΨlanguage_proc < prepare_data(X = language,N = 61,Z.formula = "~condition") Ψ
where the output language_proc is a data frame containing the preprocessed dataset together with the columnwise stacked matrix of angles, the contrast matrix , and the matrix of distances .
Once raw data have been preprocessed, the statespace analysis is performed as for the case of artificial data:
ΨΨlanguage_fit < run_ssm(N = language_proc$N,I = language_proc$I, ΨΨ+ J = language_proc$J,Y = language_proc$Y, ΨΨ+ D = language_proc$D,Z = language_proc$Z, ΨΨ+ niter = 5000,nwarmup = 2000,nchains = 2) Ψ
The function returns as output a list composed of three sublists, as follows:

params, which contains the posterior samples for the free parameters and :
ΨΨΨ## List of 6 ΨΨΨ## $ sigmax : num 1 ΨΨΨ## $ lambda : num 1 ΨΨΨ## $ kappa_bnds: num [1:2] 5 300 ΨΨΨ## $ gamma :’data.frame’:Ψ4000 obs. of 4 variables: ΨΨΨ## $ beta : num [1:4000, 1:60] 0.26 0.146 ... ΨΨΨ## $ :function (z, ...) ΨΨ

data, which contains the posterior samples for the latent states and the moving means :
ΨΨΨ## List of 6 ΨΨΨ## $ Y : num [1:101, 1:60] 1.56 1.7 ... ΨΨΨ## $ X : num [1:4000, 1:101, 1:5] 1e04 1e04 1e04 1e04 1e04 ... ΨΨΨ## $ MU : num [1:4000, 1:101, 1:60] 1.76 1.68 ... ΨΨΨ## $ D : num [1:101, 1:60] 0.592 0.474 ... ΨΨΨ## $ Z : num [1:60, 1:4] 1 1 1 1 1 ... ΨΨΨ## $ X_smooth: num [1:4000, 1:101, 1:5] 0.0878 0.0635 ... ΨΨ

stan_table
, containing the typical Stan output (i.e., point estimates, credibility intervals, and GelmanRubin index) for the
sampling() method as implemented in the rstan package:ΨΨΨ## mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat ΨΨΨ## gamma[1] 0.05 0 0.19 0.43 0.18 0.05 0.08 0.33 3047 1 ΨΨΨ## gamma[2] 0.02 0 0.06 0.13 0.06 0.02 0.02 0.09 2764 1 ΨΨΨ## gamma[3] 0.16 0 0.06 0.04 0.12 0.16 0.20 0.28 2782 1 ΨΨ
Note that users can also export the stanfit object with all the Stan results by specifying stan_object=TRUE in run_ssm().
The function run_ssm() allows for different priors specification. In particular, users can specify different priors for the model parameters as follows:
ΨΨpriors_list < list("lognormal(1,0.5)","normal(1,2)","chi_square(2)","normal(0,1)") ΨΨlanguage_fit < run_ssm(..., priors = priors_list) Ψ
which means that , , , . The list of probability distributions accepted by run_ssm() is described in the help of the function check_prior(). Specification of priors for single parameters is also allowed, by using NULL attributes:
ΨΨpriors_list < list(NULL,"normal(1,2)","chi_square(2)",NULL) ΨΨlanguage_fit < run_ssm(..., priors = priors_list) Ψ
where predefined priors are used for parameters and . Further examples about run_ssm() are illustrated in the manual of the package.
3.3 Evaluate the model results
The methods described in Section 2.2 for the model evaluation are implemented by the function evaluate_ssm(), which requires as input the output of run_ssm(). For instance, considering the fitted object language_fit, the model evaluation can simply be run via the command:
ΨΨlanguage_eval < evaluate_ssm(ssmfit = language_fit, M = 1000, plotx = FALSE) Ψ
where M = 1000 is the number of replications to compute the indices. The function returns as ouput a list containing the mean values of the indices , , and dtw, as well as the distributions obtained over the M replications. Note that, users can also ask for a graphical representation of the indices by setting plotx = TRUE.
4 An Illustrative example
In this section we provide a full example about the way ssMousetrack can be used for statespace analysis of real computerized tracking data. Note that the application reported here has an illustrative purpose only. To this end, we will make use of the dataset language, a subset of data originally presented in [barca+pezzullo:2012]. In this typical computerized tracking task, participants saw a printed stimulus on the screen (e.g., the word water
) and were requested to perform a dichotomous choice task where stimuli need to be classified as word or nonword. The experimental variable
condition was a categorical variables with four levels (HF: Highfrequency word; LF: Lowfrequency word; PW: Pseudoword; NW: Nonword). Participants had to classify each stimulus as word vs. nonword by using a computermouse tracking device. The dataset contains participants, trials, one categorical variable with levels, each with trials. From the dataanalysis viewpoint, we evaluat the extent to which the parameters of the statespace model reflect eventual differences associated with the levels of condition.The raw computerized tracking trajectories in the dataset consist of Cartesian coordinates with (; ). The dataset is partially preprocessed as raw trajectories have the same length (). However, we need to run prepare_data() in order to rotate/translate the raw data into the quadrant and compute the atan2 projections. The preprocessing step is called by the command:
Ψdata("language") Ψlanguage_proc < prepare_data(X = language, N = 101, Z.formula = "~condition")
Figure 2 shows the trajectories associated with the task for all participants and trials.
Next, the statespace model is fit to the preprocessed data by the following call:
Ψpriors_list < list("normal(0,1)","normal(1,1)","normal(2,1)","normal(2,1)") Ψlanguage_fit < run_ssm(N = language_proc$N,I = language_proc$I, Ψ+ J = language_proc$J,Y = language_proc$Y, Ψ+ D = language_proc$D,Z = language_proc$Z, Ψ+ niter = 6000,nwarmup = 2000,nchains=4, Ψ+ priors = priors_list, Ψ+ gfunction = "logistic")
where, in this case, the prior for have been choosen to codify a priori expectations about the effect of the variable condition [barca+pezzullo:2012]. Figure 3 shows some MCMC graphical diagnostics for the model parameters computed using bayesplot [bayesplot:2018] whereas Table 2 reports the posterior quantities for the model parameters. In the Bayesian context of dataanalysis, we evaluate the effects of the variable condition by computing the degree of overlapping among marginal posterior densities for each level of the experimental variable (i.e., the more the overlapping, the weaker the evidence supporting the experimental manipulation). Figure 4 shows the results graphically. Overall, the variable condition showed no strong effect, as the densities of the levels are overlapped. In particular, stimuli in HF, LF, and NW conditions showed no activation of the distractor section of the tracking space as , , and approach zero. By contrast, stimuli in PW condition showed a small effect on activating the target section (), possibly due to the fact that PW stimuli require less cognitive workload [barca+pezzullo:2012].
mean  sd  25%  50%  75%  n_eff  Rhat  

gamma1  0.05  0.19  0.18  0.05  0.08  3047.00  1.00 
gamma2  0.02  0.06  0.06  0.02  0.02  2764.00  1.00 
gamma3  0.16  0.06  0.12  0.16  0.20  2782.00  1.00 
gamma4  0.05  0.06  0.01  0.05  0.09  2680.00  1.00 
Figure 5A reports the filtered latent states for the subjects in the dataset. To further investigate how individual dynamics differ over the levels of condition, we can make use of and ask whether HF, LF, NW, and PW stimuli differ in terms of evidence of mousetracking competition. The idea is that the higher the evidence, the larger the difficulty in categorizing stimuli as word (target) or nonword (distractor).
To do this, we follow the findings of [barca+pezzullo:2012] and divide the entire respose process into three disjoint windows , , and . Usually it is expected that a higher competition would be observed in and rather than . More formally, let be the sequence of filtered states for the th subject and the generic time window , with being equals to the cardinality of . Next, the probability to select nonword (distractor) responses are computed by normalizing the function into the domain , as follows:
(9) 
where is the array of posterior means of the model parameters. Note that in this example we use the logistic function because we set gfunction="logistic" in run_ssm()
. Finally, the evidence measures can be defined in terms of logodd ratio using the probability matrix
:(10) 
where is the profile probability for HF, LF, NW, and PW. The interpretation of is as follows. For there is a higher competition in categorizing the stimulus as word (target) vs. nonword (distractor). By contrast, for there is a lower competition in the response process, as stimuli are easily categorized as word (target). Finally, the case indicates that there is no difference in terms of evidence between word (target) and nonword (distractor) responses. Figure 5B shows the results for the four levels of condition. As expected, the competition in the third phase of the response process is low, as the probability to select the target is higher. The same applies to . On the contrary, in the first stage of the process the competition is higher although the evidence ratio for all the levels of condition approximate zero. Interestingly, the second phase shows a higher whithinsubject variability of competition, which probably indicates that subjects differ in the categorization process just in the middle phase of the response process.
Finally, we assess the adequacy of the model with regards to the observed data by means of evaluate_ssm(), as follows:
Ψlanguage_fit_eval < evaluate_ssm(ssmfit = language_fit,M = 500,plotx = FALSE)
where language_fit is the fitted object returned by run_ssm() whereas M = 500 is the number of replications used to compute the three fit indices. The output of the function consists of a list containing means and distributions of the fit indices:
Ψ## List of 2 Ψ## $ dist :List of 3 Ψ## ..$ PA_ov : num [1:500] 0.944 0.938 ... Ψ## ..$ PA_sbj: num [1:500, 1:5] 0.991 0.991 ... Ψ## ..$ DTW : num [1:500, 1:60] 0.0945 0.1105 ... Ψ## $ indices:List of 3 Ψ## ..$ PA_ov : num 0.936 Ψ## ..$ PA_sbj: num 0.99 Ψ## ..$ DTW : num 0.119
Overall, in this example the fitted model is adequate to reproduce the observed trajectory data as supported by high values of the indices , , and dtw. Figure 6 shows the results graphically.
5 Conclusions
In this paper we introduced the R package ssMousetrack
that analyses computerizedtracking data using Bayesian statespace modeling. The package provides a set of functions to facilitate the preparation and analysis of tracking data and offers a simple way to assess model fit. The package can be of particular interest to researchers needing tools to analyse computerizedtracking experiments using a complete statistical modeling environment instead of descriptive statistics only. In addition, the package
ssMousetrack allows for individualbased analysis of trajectories where latent dynamics are used to obtain richer information which can pave the way to further analyses (e.g., profile analysis). The current version of the package can be extended in several ways. For instance, the inclusion of other statespace representations beyond the simple Gaussian AR(1) model can be a further generalization of the package. Still, model parameters like and can be free to allow for multigroup analysis. Similarly, more comprehensive model diagnostics could also be considered in future releases of the package.Finally, we believe our package may be a useful tool supporting researchers and practitioners who want to make analysis of computerizedtracking experiments using a statistical modeling environment. This will surely help them to improve the interpretability of data analysis as well as the reliability of conclusions they can draw from their studies.
Appendix A Appendix
Given a candidate sample , the mean and variance of the density are approximated via the following recursion:
where is the Kronecker product, the (elementwise) Hadamard product, the (elementwise) Hadamard division, whereas is a scaling matrix with . As a byproduct of the Kalman filter, the marginal likelihood is multivariate Gaussian with mean and variance , with being the linear operator that transforms a vector into a diagonal matrix. Finally, the array contains the filtered latent states implied by the model whereas is the array of variances associated with the filtered states. The smoothing part of the algorithm is implemented using the fixedinterval Kalman smoother [sarkka:2013] where the filtered arrays and are used as input of the backward recursion.
Comments
There are no comments yet.