UNIGE document: Scientific Article
Title

Analyzing state sequences with probabilistic suffix trees: the PST R package

Authors
Alexis Gabadinho, Gilbert Ritschard
Published in Journal of Statistical Software. 2016, vol. 72, no. 3, p. 1-39
Abstract
This article presents the PST R package for categorical sequence analysis with probabilistic suffix trees (PSTs), i.e., structures that store variable-length Markov chains (VLMCs). VLMCs make it possible to model high-order dependencies in categorical sequences with parsimonious models based on simple estimation procedures. The package is specifically adapted to the social sciences: it allows VLMC models to be learned from sets of individual sequences that may contain missing values, and it is extended to account for case weights. This article describes how a VLMC model is learned from one or more categorical sequences and stored in a PST. The PST can then be used for sequence prediction, i.e., to assign a probability to whole observed or artificial sequences. This feature supports data mining applications such as the extraction of typical patterns and outliers. This article also introduces original visualization tools for both the model and the outcomes of sequence prediction. Other features, such as functions for pattern mining and artificial sequence generation, are described as well. The PST package also allows for the computation of the probabilistic divergence between two models and the fitting of segmented VLMCs, where sub-models fitted to distinct strata of the learning sample are stored in a single PST.
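To make the learn-then-predict workflow in the abstract concrete, the following is a minimal Python sketch, not the PST package's implementation (the package itself is written in R). It assumes a fixed maximum context length `max_depth`, stores every observed context (up to that length) as a node of an implicit suffix tree, and predicts each state from the longest stored suffix of its past. The class name `SimplePST`, the additive smoothing constant `alpha`, and the handling of missing states (`None`) and case weights are illustrative assumptions. In particular, the tree pruning that makes a VLMC parsimonious is deliberately omitted here.

```python
import math
from collections import defaultdict


class SimplePST:
    """Hypothetical, unpruned variable-length Markov model.

    Each key of `counts` is a context (tuple of states, oldest first), i.e. a
    node of an implicit suffix tree; its value holds weighted next-state counts.
    """

    def __init__(self, alphabet, max_depth=2, alpha=0.0):
        self.alphabet = list(alphabet)
        self.max_depth = max_depth
        self.alpha = alpha  # additive smoothing (assumption, not from the article)
        self.counts = defaultdict(lambda: defaultdict(float))

    def fit(self, sequences, weights=None):
        """Count next states after every context of length 0..max_depth."""
        if weights is None:
            weights = [1.0] * len(sequences)
        for seq, w in zip(sequences, weights):
            for i, sym in enumerate(seq):
                if sym is None:  # missing state: nothing to predict
                    continue
                for k in range(min(self.max_depth, i) + 1):
                    ctx = tuple(seq[i - k:i])
                    if None in ctx:  # longer contexts would contain it too
                        break
                    self.counts[ctx][sym] += w  # case weight
        return self

    def next_probs(self, context):
        """P(next state | longest stored suffix of `context`)."""
        ctx = tuple(context)
        for k in range(min(self.max_depth, len(ctx)), -1, -1):
            suffix = ctx[len(ctx) - k:]
            if suffix in self.counts:
                c = self.counts[suffix]
                total = sum(c.values()) + self.alpha * len(self.alphabet)
                return {a: (c.get(a, 0.0) + self.alpha) / total
                        for a in self.alphabet}
        raise ValueError("model has not been fitted")

    def log_likelihood(self, seq):
        """Log-probability of a complete sequence (no missing states)."""
        ll = 0.0
        for i, sym in enumerate(seq):
            p = self.next_probs(seq[:i])[sym]
            ll += math.log(p) if p > 0 else float("-inf")
        return ll


pst = SimplePST("ab", max_depth=2).fit(["aab", "abb", "aab"])
print(pst.next_probs("a"))               # {'a': 0.4, 'b': 0.6}
print(math.exp(pst.log_likelihood("aab")))  # P(aab) = 5/9 * 0.4 * 1.0
```

Scoring whole sequences this way, with the product of the per-position probabilities, is what lets a fitted model rank observed sequences as typical (high probability) or as outliers (low probability). A real VLMC would additionally prune contexts whose next-state distribution does not differ enough from that of their parent node.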
Keywords
State sequences, Categorical sequences, Sequence visualization, Sequence mining, Variable-length Markov chains, Probabilistic suffix trees, R
Full text
Article (published version, 792 KB): public document, free access
Other version: http://www.jstatsoft.org/v72/i03/
Structures
Research group NCCR LIVES
Citation (ISO format)
GABADINHO, Alexis, RITSCHARD, Gilbert. Analyzing state sequences with probabilistic suffix trees: the PST R package. In: Journal of Statistical Software, 2016, vol. 72, n° 3, p. 1-39. https://archive-ouverte.unige.ch/unige:87526

Deposited on: 2016-09-20