(For theoretical details of PRISM and our recent research,
see prismintro.pdf [411KB],
which is a compilation of past slides.)
PRISM is a general programming language intended for symbolicstatistical
modeling. It is a new and unprecedented programming language with
learning ability for statistical parameters embedded in programs.
Its programming system, shortly called “PRISM system” here,
is a powerful tool for building complex statistical models.
The theoretical background of PRISM system is
distribution semantics for parameterized logic programs
and EM learning of their parameters from observations
[Sato 95, Sato 01c].
The PRISM system is comprised of two subsystems, one for learning and
the other for execution, just like a human being has an artery and a
vein. The execution subsystem supports various probabilistic
builtin predicates conforming to the distribution semantics and makes
it possible to use a program as a random sampler. The learning
subsystem on the other hand supports abductive reasoning, i.e. search
for logical explanations of observations through a program and learning
parameters associated with them for the adaptive behavior of the
program.
Speaking mathematically, a PRISM program is a logic program in which
facts have a parameterized probability distribution so that the
program can be seen as a parameterized statistical model. The
program defines a probability distribution (probability measure) over
the set of possible Herbrand interpretations. In execution,
probabilities or samples of various probabilistic constructs in the
program will be calculated according to the defined distribution. In
learning, we perform ML (maximum likelihood) estimation of the
program parameters from incomplete data by the EM
(ExpectationMaximization) algorithm so that the defined distribution
is closer to the observed distribution. Because PRISM programs can
be arbitrary complex, our model can be arbitrary complex as well,
which means we have opportunities of building large yet understandable
symbolicstatistical models that go beyond traditional statistical
models. [Sato 01c] and [Sato 08a]
provide a comprehensive view of the theoretical background of PRISM.
At this point, we have to say, for fairness, that understanding PRISM
programming might require some learning efforts on the user side and
you have to take care not to violate certain statistical conditions
when writing codes. However PRISM programs just look like a Prolog
program with a couple of new builtin predicates. Accordingly, PRISM
programming is expected not to be much difficult for anyone who is
already familiar with Prolog and statistics.
We here list the (expected) merits of PRISM programming;

We can enjoy the expressive power of firstorder
rules. We can compactly express complex notions
in terms of predicates, logical connectives and logical variables
at abstract level, not at C code level. Also plenty of standard
Prolog builtin predicates are available.

PRISM has declarative semantics called distribution
semantics. This semantics enables us to understand
and verify programs declaratively with mathematical rigor. Besides,
we can write procedural subroutines at our own risk.

Probabilistic inference routines are builtin.
Marginal probability computation, Viterbi computation,
hindsight computation, forward and MCMC sampling, EM learning,
and Viterbi training are all builtin with a couple of variations.
So we need not invent or implement these probabilistic inference
algorithms for a new statistical model. This would be a
significant saving, considering time and efforts wasted by the
trialanderror phase of modeling.
Relaxing modeling assumptions
The current programming system makes several modeling assumptions in PRISM
programs for efficient probabilistic inferences. On the other hand, these
assumptions sometimes turn to be a serious restriction to users.
We have already shown a way to relax the uniqueness condition (all observable
goal patterns are exclusive to each other and their probabilities sum up
to unity) by considering failures in the generation process of possible
observations. Along with this line, now we are considering to explore
the ways to relax other assumptions to enhance the expressivity and
flexibility of the PRISM language.
PRISM is built on top of BProlog
and runs on Windows, Linux with x86 processors and Mac OS X.
Since version 2.0, the source code of the PRISMspecific part is open to public.
Visit release notes
for details, and here for previous versions.
The latest stable package is version 2.1 as of September 1, 2012.
The package contains executable binaries, source code of the PRISMspecific
part and program examples.
Based on the license agreement for version 2.x,
the users can use this package free of charge for academic use, and the
source code under the
modified
BSD license.
Highlights of 2.1:
 New builtins for MCMC sampling and Viterbi training
Notes on platforms:

The Linux package contains both binaries for 32bit and 64bit
systems. The startup command (
prism
, etc.) automatically
chooses the binary for your environment. Please note that we have only
tested these binaries with glibc 2.3 or higher.

In 32bit environments, it is empirically known that there is
difficulty in using more memory space than about 1GB.

We have not tested the Mac OS X package well, since our test
environment for Mac OS X is rather limited.

The binaries have been tested most intensively on 64bit Linux systems.
Comprehensive descriptions on PRISM
First, [Sato 01c] includes a full description of
the distribution semantics and the efficient architecture for probabilistic
inferences. On the other hand, [Sato 08a] adds
a couple of recent results. That is, [Sato 08a]
discusses on bridging a gap between discrete Bayesian networks and
probabilistic contextfree grammars, with some performance report.
It also presents a detailed description of our approach to deal with failures in
generative models. [Sato 08b] gives another
overview of PRISM, exploring probabilistic contextfree graph grammars.
[Kameya 07] is a guide on PRISM
for Japanese users.
 [Sato 09c]
Sato, T.:
Generative modeling by PRISM.
Proceedings of the 25th International Conference on Logic Programming
(ICLP2009), LNCS 5649, Springer, pp.24–35, 2009.
PDF (Springer),
PDF (Draft),
Slides
 [Sato 09b]
Sato, T.:
Logicbased probabilistic modeling.
Proceedings of the 16th Workshop on Logic, Language, Information and Computation
(WoLLIC2009), LNAI 5514, pp.61–71, 2009.
PDF (Draft),
PDF (Springer)
 [Sato 08b]
Sato, T.:
A glimpse of symbolicstatistical modeling by PRISM.
Journal of Intelligent Information Systems, Vol.31, No.2, pp.161–176, 2008.
PDF (Springer)
 [Sato 08a]
Sato, T. and Kameya, Y.:
New advances in logicbased probabilistic modeling by PRISM.
In Probabilistic Inductive Logic Programming,
LNCS 4911, Springer, pp.118–155, 2008.
PDF (Draft),
PDF (Springer)
 [Kameya 07]
Kameya, Y., Sato, T., Zhou, N.F., Izumi, Y. and Iwasaki, T.:
PRISM: A logic programming language and system for probabilistic modeling.
Computer Software,
Vol.24, No.4, pp.2–22, 2007 (in Japanese).
PDF
(JSSST JSTAGE)
 [Sato 01c]
Sato, T. and Kameya, Y.:
Parameter learning of logic programs for symbolicstatistical modeling.
Journal of Artificial Intelligence Research
(JAIR),
Vol.15,
pp.391–454, 2001.
PDF (JAIR site),
PDF (Draft)
Distribution semantics and PRISM as a modeling language
[Sato 95] introduced distribution semantics,
which is the theoretical (semantic) basis of PRISM programs, and also
presents an algorithm for parameter learning of a certain class
of logic programs, called BS programs.
[Sato 97] describes the design of PRISM language
and PRISM system, and shows the examples of writing hidden Markov models
(HMMs), Bayesian networks (BNs), and so on.
In [Sato 98], we model the
complicated interaction between geneinheritance and a tribal social
system discovered in the Kariera tribe.
[Sato 03] and
[Sato 05b] show how our generic approach
could reduce the modeler's effort, and [Sato 05b]
gives PRISM programs representing a probabilistic version of
two different types of contextfree graph grammars.
 [Sato 05b]
Sato, T.:
A generic approach to EM Learning for symbolicstatistical models.
Proceedings of the 4th Learning Language in Logic Workshop
(LLL05), 2005.
PDF
 [Sato 03]
Sato, T. and Zhou, N.F.:
A new perspective of PRISM relational modeling.
Proceedings of IJCAI03 workshop on Learning Statistical Models
from Relational Data
(SRL2003),
pp.133–139,
2003.
PS,
PS + gz,
PDF
 [Sato 98]
Sato, T.:
Modeling scientific theories as PRISM programs.
ECAI98 Workshop on Machine Discovery, pp.37–45, 1998.
PS,
PS + gz,
PDF
 [Sato 97]
Sato, T. and Kameya, Y.:
PRISM: A symbolicstatistical modeling language.
Proceedings of the 15th International Joint Conference on
Artificial Intelligence (IJCAI97), pp.1330–1335, 1997.
PS,
PS + gz,
PDF
 [Sato 95]
Sato, T.:
A statistical learning method for logic programs with distribution
semantics. Proceedings of the 12th International Conference
on Logic Programming (ICLP95), Tokyo, pp.715–729, 1995.
Extended version:
PS,
PS + gz,
PDF
Efficient probabilistic inferences for PRISM
After proposing PRISM, we have tackled the problem of efficiency in
EM learning for PRISM programs. While [Kameya 99]
could only state the necessity of the uniqueness condition (in which all observable
goal patterns are exclusive, and their probabilities sum to unity) and
some compact data structure, [Kameya 00] and
[Sato 00] proposed to use OLDT search
and probability computation based on dynamic programming for
efficient EM learning and finding the most likely explanation.
[Zhou 02], [Zhou 03a],
and [Kameya 04]
describe the further developments of the system.
Most recently we presented an insight that bridges a gap between BNs and PCFGs
from a viewpoint of probabilistic inference algorithms
[Sato 07], and proposed a dataparallel version
of our EM algorithm [Izumi 06].
 [Sato 07]
Sato, T.:
Insideoutside probability computation for belief propagation.
Proceedings of the 20th International Joint Conference on
Artificial Intelligence
(IJCAI2007),
pp.2605–2610, 2007.
PDF (Draft),
PDF (Online proceedings)
 [Izumi 06]
Izumi, Y., Kameya, Y. and Sato, T.:
Parallel EM Learning for SymbolicStatistical Models.
Proceedings of the International Workshop on DataMining and Statistical Science
(DMSS2006),
pp.133–140, 2006.
PDF
 [Kameya 04]
Kameya, Y., Sato, T. and Zhou, N.F.:
Yet more efficient EM learning for parameterized logic programs
by intergoal sharing.
Proceedings of the 16th European Conference on Artificial
Intelligence
(ECAI2004),
pp.490–494, 2004.
PDF
 [Zhou 03a]
Zhou, N.F., Sato, T., and Hashida, K.:
Toward a highperformance system for symbolic and statistical modeling.
Proceedings of IJCAI03 workshop on Learning Statistical Models
from Relational Data
(SRL2003),
pp.153–159,
2003.
PDF
 [Zhou 02]
Zhou, N.F. and Sato, T.:
Toward a Highperformance System for Symbolic and Statistical
Modeling,
Technical Report (Computer Science) TR200212,
City University of New York, 2002.
PDF
 [Sato 00]
Sato, T. and Kameya Y.:
A Viterbilike algorithm and EM learning for statistical abduction.
Proceedings of UAI2000 Workshop on Fusion of Domain
Knowledge with Data for Decision Support, 2000.
PS,
PS + gz,
PDF
 [Kameya 00]
Kameya, Y. and Sato, T.:
Efficient EM learning with tabulation for parameterized logic programs.
Proceedings of the 1st International Conference on Computational
Logic (CL2000),
LNAI
Vol.1861, pp.269–294, 2000.
PS,
PS + gz,
PDF
 [Kameya 99]
Kameya, Y., Ueda, N., and Sato, T.:
A graphical method for parameter learning of symbolicstatistical models.
Proceedings of the 2nd International Conference on Discovery
Science (DS99),
LNAI Vol.1721, pp.264–276, 1999.
PS,
PS + gz,
PDF
Learning through failure
On the basis of Cussens's FAM (failureadjusted maximization)
algorithm [Cussens 01] and
program transformation techniques [Sato 89],
we proposed a new learning scheme in which the users can add
constraints (which cause failures) to PRISM programs, with keeping
the efficency in learning.
 [Sato 06]
Sato, T. and Kameya, Y.:
Learning through failure.
Dagstuhl Seminar Proceedings on Probabilistic, Logical and
Relational Learning  Towards a Synthesis, 2006.
 [Sato 05a]
Sato, T., Kameya, Y. and Zhou, N.F.:
Generative modeling with failure in PRISM.
Proceedings of the 19th International Joint Conference on
Artificial Intelligence
(IJCAI2005), pp.847–852, 2005.
PDF (Draft),
PDF (Online proceedings)
 [Sato 04a]
Sato, T. and Kameya, Y.:
A dynamic programming approach to parameter learning of
generative models with failure.
Proceedings of ICML Workshop on Statistical Relational Learning
and its Connection to the Other Fields
(SRL2004), 2004.
PS,
PDF
 [Sato 04b]
Sato,T. and Kameya,Y.:
Negation elimination for finite PCFGs.
Proceedings of the International Symposium on
Logicbased Program Synthesis and Transformation 2004
(LOPSTR04), 2004.
Later selectively published as
Springer LNCS 3573,
S. Etalle (Ed.), pp.117–132, 2005.
PDF
Bayesian learning
In an engineering context, Bayesian learning gives us a systematic way to robust
prediction and model selection. Variational Bayes (VB) is known as an efficient
approximation method for Bayesian learning, and goes together well with our
dynamic programming approach via tabled search. In [Sato 09a],
we proposed a VBEM algorithm for PRISM programs, and confirmed its usefulness
with the examples from natural language processing and bioinformatics.
Furthermore, in [Sato 11], M. Johnson's MCMC algorithm for PCFGs
was generalized into the one for PRISM programs.
 [Sato 11]
Sato, T.:
A general MCMC method for Bayesian inference in logicbased probabilistic modeling.
Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI2011), pp.1472–1477, 2011.
PDF
(Online proceedings)
 [Sato 09a]
Sato, T., Kameya, Y., Kurihara, K.:
Variational Bayes via propositionalized probability computation in PRISM.
Annals of Mathematics and Artificial Intelligence, Vol.54, No.1–3, pp.135–158, 2009.
PDF
(Springer)
Relaxing modeling assumptions
As described above, currently we are seeking the way to relax modeling
assumptions that can be an obstacle in building useful models.
Following [De Raedt 07] and
[Minato 07], in
[Ishihata 08a] and
[Ishihata 08b], we proposed the use of
(zerosuppressed) binary decision diagrams ((Z)BDDs) as a new data structure for
the EM algorithm. This would be a clue to remove the exclusiveness condition
assumed in PRISM. In [Ishihata 10],
multivalued random variables are efficiently handled in EM learning on BDDs.
Recently [Sato 12a] proposed Viterbi training,
a parameter learning framerwork based on Viterbi computation. Viterbi training
is interesting since it does not necessarily require the exclusiveness condition and it seems
appropriate for the applications where we eventually need just one explanation for every goal
(for example, statistical parsing). Another recent work is [Sato 12b],
in which a generalized algorithm for prefix probability computation is proposed.
This inference algorithm can lead to a relaxation of the acyclicity condition.
 [Sato 12b]
Sato, T. and Meyer, P.:
Tabling for infinite probability computation.
The 28th International Conference on Logic Programming,
(ICLP2012),
to appear as Technical Communications, 2012.
PDF (Draft)
 [Sato 12a]
Sato, T. and Kubota, K.:
Viterbi training in PRISM.
ICML Workshop on Statistical Relational Learning,
(SRL2012),
2012.
PDF (Draft)
 [Ishihata 10]
Ishihata M., Kameya, Y., Sato, T. and Minato, S.:
An EM algorithm on BDDs with order encoding for logicbased
probabilistic models.
Proceedings of the 2nd Asian Conference on Machine Learning
(ACML2010),
pp.161–176, 2010.
PDF (Online proceedings)
 [Ishihata 08b]
Ishihata, M., Kameya, Y., Sato, T. and Minato, S.:
Propositionalizing the EM algorithm by BDDs.
Late breaking papers at the 18th International Conference on
Inductive Logic Programming (ILP2008), pp.44–49, 2008.
PDF
 [Ishihata 08a]
Ishihata, M., Kameya, Y., Sato, T. and Minato, S.:
Propositionalizing the EM algorithm by BDDs,
Technical Report TR080004, Dept. of Computer Science,
Tokyo Institute of Technology, June, 2008.
PDF
Applications
So far, PRISM has been applied to biological sequence analysis (thanks to
Henning Christiansen and the members of the LoSt project),
analysis of music (thanks to Jon Sneyers and his colleagues), probabilistic planning,
and so on. [Christiansen 09] experiments with optimization
of PRISM models inspired from genomic sequence analysis (e.g. gene finding
in E. Coli) in biology.
The authors introduce hidden Markov models to discriminate the
coding/noncoding sections, and stochastic contextfree grammars
to capture the nested hairpin structures in the secondary structure
of RNA. [Christiansen 10] integrates
sideconstraints with HMMs written in PRISM, and applies such constrained HMMs to
global pairwise alignment, a problem of biological
sequence analysis. [Mørk 12] fully utilizes PRISM system's
learning facilities for statistical evaluation of potential HMM structures
in a bacterial genefinding task. [Sneyers 06] proposes a
probabilisticlogical model of music, into which the synthetic aspect and the analytic aspect
of music are incorporated simultaneously.
 [Mørk 12]
Mørk, S. and Holmes, I.:
Evaluating bacterial genefinding HMM structures as probabilistic logic programs.
Bioinformatics, Vol.28, No.5, pp.636–642,
doi:10.1093/bioinformatics/btr698, 2012.
 [Christiansen 10]
Christiansen, H., Have, C. T., Lassen, O. T. and Petit, M.:
Inference with constrained hidden Markov models in PRISM.
Theory and Practice of Logic Programming, Vol.10, No.4–6, p. 449–464, 2010.
 [Christiansen 09]
Christiansen, H. and Lassen, O. T.:
Preprocessing for optimization of probabilisticlogic models for
sequence analysis.
Proceedings of the 25th International Conference on Logic Programming
(ICLP2009), LNCS 5649, Springer, pp.70–83, 2009.
 [Sneyers 06]
Sneyers, J., Vennekens, J. and De Schreye, D.:
Probabilisticlogical modeling of music.
Proceedings of the 8th International Symposium on Practical
Aspects of Declarative Languages (PADL06), LNCS 3819, Springer, pp.60–72, 2006.
Extensions
We believe that PRISM can still be extended further.
[Sneyers 10] built a system for probabilistic constraint handling rules (CHRs) on top of the PRISM system.
 [Sneyers 10]
Sneyers, J., Meert, W., Vennekens, J., Kameya, Y. and Sato, T.:
CHR(PRISM)based probabilistic logic learning.
Theory and Practice of Logic Programming, Vol.10, No.4–6, pp.433–447, 2010.
Advanced logic programming techniques
As described above,
[Tamaki and Sato 86],
[Zhou 03b], [Zhou 04]
and [Zhou 08]
proposed tabled search techniques for logic programs, which play
a central role in efficient EM learning of PRISM programs.
Also, the learning algorithm for generative models with failure is built
based on a program transformation proposed in [Sato 89].
 [Zhou 08]
Zhou, N.F., Sato, T. and Shen, Y.D.:
Linear tabling strategies and optimization.
Theory and Practice of Logic Programming,
Vol.8, No.1, pp.81–109, 2008.
PDF (arXiv.org)
 [Zhou 04]
Zhou, N.F., Shen, Y.D. and Sato, T.:
Seminaive evaluation in linear tabling.
Proceedings of the 6th ACMSIGPLAN International Conference on
Principles and Practice of Declarative Programming
(PPDP04), pp.90–97,
2004.
PDF
 [Zhou 03b]
Zhou, N.F. and Sato, T.:
Efficient fixpoint computation in linear tabling.
Proc. of the 5th ACMSIGPLAN International Conference on
Principles and Practice of Declarative Programming
(PPDP 03), pp.275–283, 2003.
PDF
 [Sato 89]
Sato, T.:
First order compiler: A deterministic logic program synthesis algorithm.
Journal of Symbolic Computation, Vol.8, pp.605–627, 1989.
PDF
 [Tamaki and Sato 86]
Tamaki, H. and Sato, T.:
OLD resolution with tabulation.
Proc. of the 3rd International Conference on Logic Programming
(ICLP 86), LNCS 225, pp.84–98, 1986.
PDF,
PDF (Springer)
Advanced statistical learning techniques
[Cussens 01] proposed the FAM
(failureadjusted maximization) algorithm, which is implemented in the programming system
(since version 1.8) to deal with failures in generative models.
[Kurihara 06] provides a variational Bayesian
(VB) learning method for probabilistic contextfree grammars, which forms a basis of the VB learning routines of the programming system (since version 1.11).
The deterministic annealing EM algorithm, which is utilized for avoiding undesirable
local maxima in EM learning, and its extension to variational Bayesian learning were
proposed in [Ueda 98] and
[Katahira 08], respectively.
 [Katahira 08]
Katahira, K., Watanabe, K. and Okada, M.:
Deterministic annealing variant of variational Bayes method.
Journal of Physics, Conference Series, Vol.95, 012015, 2008.
 [Kurihara 06]
Kurihara, K. and Sato, T.:
Variational Bayesian grammar induction for natural language.
Proceedings of the 8th International Colloquium on Grammatical Inference
(ICGI2006),
pp.84–95, 2006.
PDF
(copyright SpringerVerlag)
 [Cussens 01]
Cussens, J.:
Parameter estimation in stochastic logic programs.
Machine Learning, Vol.44, Issue 3, pp.245–271, 2001.
 [Ueda 98]
Ueda, N. and Nakano, R.:
Deterministic annealing EM algorithm.
Neural Networks, Vol.11, No.2, pp.271–282, 1998.
Integration with BDDs
Binary decision diagrams (BDDs) are known as a compact representation of
Boolean (propositional) formulas, and utilized in the recent research
on statistical relational learning (SRL) or probabilistic logic learning
(PLL). ProbLog [De Raedt 07] is the first
probabilisticlogical framework (to our knowledge) that introduces BDDs as
a basic data structure for probability computation.
[Minato 07] proposes an efficient
probabilistic inference on Bayesian networks by compiling them into
Zerosuppressed BDDs.
 [De Raedt 07]
De Raedt, L., Kimmig, A. and Toivonen, H.:
ProbLog: A probabilistic Prolog and its application in link discovery.
Proceedings of the 20th International Joint Conference on
Artificial Intelligence
(IJCAI2007),
pp.2462–2467, 2007.
 [Minato 07]
Minato, S., Satoh, K., and T. Sato:
Compiling Bayesian networks by symbolic probability calculation
based on Zerosuppressed BDDs.
Proceedings of the 20th International Joint Conference on
Artificial Intelligence
(IJCAI2007),
pp.2550–2555, 2007.
PDF (Draft),
PDF (Online proceedings)
Fast EM learning for probabilistic CFG
As an instance of the learning framework for PRISM, in
[Kameya 01], we focused on parametric (EM) learning
of probabilistic CFGs, or CFGs with contextsensitive probabilities.
For these model classes, efficient CFG parsers (e.g. CYK, Earley, Tomita parser,
and so on) can be used instead of the OLDT interpreter. Using two actual corpora and
handcrafted grammars of moderate size with different characters,
we experimentally confirmed in [Sato 01a]
that the speedup by our approach of EM learning per iteration amounts
to from hundreds times to over a thousand times compared to the
traditional approach using the InsideOutside algorithm.
[Sato 01b] deals with the probabilistic
Earley parsing by our approach.
 [Sato 01b]
Sato, T., Abe, S., Kameya, Y., and Shirai, K.:
A separateandlearn approach to EM learning of PCFGs.
Proceedings of the Sixth Natural Language Processing Pacific Rim Symposium
(NLPRS2001),
pp.255–262, 2001.
Revised version:
PS,
PS + gz,
PDF
 [Sato 01a]
Sato, T., Kameya, Y., Abe, S., and Shirai, K.:
Fast EM learning of a family of PCFGs.
Technical Report
TR010006,
Tokyo Institute of Technology, May, 2001.
PS,
PS + gz,
PDF
 [Kameya 01]
Kameya, Y., Mori, T. and Sato, T.:
Efficient EM learning of probabilistic CFGs and their extensions
by using WFSTs.
Journal of Natural Language Processing,
Vol.8, No.1, pp.49–84, 2001 (in Japanese).
PS,
PS + gz,
PDF
The system is under development, and we keep the latest version
downloadable from this page. Please send requests on license, questions
or bug reports to
(please replace [at]
with @
):
prismquery [at] mi.cs.titech.ac.jp
Last update: September 1, 2012