This section explains the data format used by CRFsuite for training and tagging. A data set consists of item sequences, each of which is represented by consecutive lines and terminated by an empty line. An item sequence consists of a series of items whose characteristics (labels and attributes) are described one item per line. An item line begins with its label, followed by its attributes, separated by TAB ('\t') characters.
This is an example of training data (obtained from the CoNLL 2000 chunking shared task).
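A minimal sketch of the format, showing only the first item line of the example as described below (fields are TAB-separated; the remaining lines of the four sequences are omitted here):

B-NP	w[0]=An	w[1]=A.P.	pos[0]=DT	pos[1]=NNP	__BOS__
...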
This example contains four item sequences (the last one is partially shown). The first item of the first sequence is annotated with the label "B-NP" and characterized by five attributes, "w[0]=An", "w[1]=A.P.", "pos[0]=DT", "pos[1]=NNP", and "__BOS__". The labels and attributes in this example follow a certain naming convention (feature design); for example, "B-NP" indicates that the current token begins a noun phrase, "w[0]=An" indicates that the surface form of the current item is "An", "pos[1]=NNP" indicates that the next token is a proper noun, and "__BOS__" indicates that the current item is the first item in the sequence. However, CRFsuite does not care about the naming convention or the feature design of labels and attributes; it treats them as mere strings. CRFsuite learns the weights of associations (feature weights) between attributes and labels (e.g., if the current item has the attribute "pos[0]=DT", it is likely to have the label "B-NP") without knowing the meaning of the labels and attributes. In other words, one can design and use arbitrary features just by writing label and attribute names in the data sets.
An attribute can have a scaling value separated by a colon character (':'). Formally, the amount of influence of a feature is determined by the scaling value of the corresponding attribute multiplied by the feature weight. Roughly speaking, the scaling value of an attribute has a similar effect to the frequency of occurrences of the attribute, but it can be decimal or negative. Note that a large scaling value may cause an overflow (range error) in training. Because the colon character plays a special role in data sets, CRFsuite employs escape sequences in attribute names: "\:" and "\\" represent ':' and '\', respectively. If the scaling value of an attribute is omitted (no colon character), CRFsuite assumes it to be one. For example, these three items are identical in terms of attributes and their scaling values:
B-NP	w[1..4]=a:2	w[1..4]=man	w[1..4]=eats
B-NP	w[1..4]=a	w[1..4]=a	w[1..4]=man	w[1..4]=eats
B-NP	w[1..4]=a:2.0	w[1..4]=man:1.0	w[1..4]=eats:1.0
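As a further hypothetical example, the escape sequences allow an attribute name itself to contain a colon; the last unescaped colon, if present, starts the scaling value. Both attributes below have the name "w[0]=10:30"; the first has the default scaling value 1, the second 2.0:

B-NP	w[0]=10\:30	w[0]=10\:30:2.0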
The data format for tagging is exactly the same as that for training, except that labels in the tagging data can be empty (but cannot be omitted). When tagging, CRFsuite ignores labels in the input data or uses them for measuring the performance of predictions.
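For instance, an item line in tagging data may consist of an empty label field followed by the attributes (the line below begins with a TAB because the label field is empty; the attributes are hypothetical):

	w[0]=An	w[1]=A.P.	pos[0]=DT	pos[1]=NNP	__BOS__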
This is the BNF notation representing the data format.
<line>      ::= <item> | <eos>
<item>      ::= <label> ('\t' <attribute>)+ <br>
<eos>       ::= <br>
<label>     ::= <string>
<attribute> ::= <name> | <name> ':' <scaling>
<name>      ::= (<letter> | "\:" | "\\")+
<scaling>   ::= <numeric>
<br>        ::= '\n'
The easiest way to install CRFsuite is to use a binary distribution. Currently, binaries for Win32 and Linux (Intel 32-bit and 64-bit architectures) are distributed.
As of CRFsuite 0.5, the source package no longer includes a copy of libLBFGS. In order to build CRFsuite, you need to download and build libLBFGS first.
In Windows environments, open the Visual Studio solution file (lbfgs.sln) of libLBFGS and build it. The solution file builds a static-link library, lbfgs.lib (release build) or lbfgs_debug.lib (debug build), in the Release or Debug directory.
Because the solution file (crfsuite.sln) of CRFsuite assumes that the header and library files of libLBFGS exist in the win32/lbfgs directory, create this directory, and copy lbfgs.h, lbfgs.lib, and/or lbfgs_debug.lib into it. Then open the solution file (crfsuite.sln) and build it.
In Linux environments, download the source package of libLBFGS and build it. If you do not want to install libLBFGS into your operating system, specify the "--prefix" option to the configure script. This example installs libLBFGS into the directory local under the home directory ($HOME):
$ ./configure --prefix=$HOME/local
$ make
$ make install
Now you are ready to build CRFsuite. If you have libLBFGS installed in a different directory, specify that directory with the "--with-liblbfgs" option:
$ ./configure --prefix=$HOME/local --with-liblbfgs=$HOME/local
$ make
$ make install
The CRFsuite utility expects the first command-line argument to be a command name:
 learn
 Train a CRF model from a training set.
 tag
 Tag sequences using a CRF model.
 dump
 Dump a CRF model in plain-text format.
To see the command-line syntax, use the -h (--help) option.
$ crfsuite -h
CRFSuite 0.12  Copyright (c) 2007-2011 Naoaki Okazaki

USAGE: crfsuite <COMMAND> [OPTIONS]
  COMMAND     Command name to specify the processing
  OPTIONS     Arguments for the command (optional; command-specific)

COMMAND:
  learn       Obtain a model from a training set of instances
  tag         Assign suitable labels to given instances by using a model
  dump        Output a model in a plain-text format

For the usage of each command, specify the -h option in the command argument.
To train a CRF model from a training set, enter the following command:
$ crfsuite learn [OPTIONS] [DATA]
If the argument DATA is omitted or '-', this utility reads the training data from STDIN. To see the usage of the learn command, specify the -h (--help) option.
$ crfsuite learn -h
CRFSuite 0.12  Copyright (c) 2007-2011 Naoaki Okazaki

USAGE: crfsuite learn [OPTIONS] [DATA1] [DATA2] ...
Trains a model using training data set(s).
  DATA    file(s) corresponding to data set(s) for training; if multiple
          N files are specified, this utility assigns a group number
          (1...N) to the instances in each file; if a file name is '-',
          the utility reads a data set from STDIN
OPTIONS:
  -t, --type=TYPE         specify a graphical model (DEFAULT='1d'):
                          (this option is reserved for the future use)
      1d                  1st-order Markov CRF with state and transition
                          features; transition features are not
                          conditioned on observations
  -a, --algorithm=NAME    specify a training algorithm (DEFAULT='lbfgs')
      lbfgs               L-BFGS with L1/L2 regularization
      l2sgd               SGD with L2 regularization
      ap                  Averaged Perceptron
      pa                  Passive Aggressive
      arow                Adaptive Regularization of Weights (AROW)
  -p, --set=NAME=VALUE    set the algorithm-specific parameter NAME to
                          VALUE; use '-H' or '--help-parameters' with the
                          algorithm name specified by '-a' or
                          '--algorithm' and the graphical model specified
                          by '-t' or '--type' to see the list of
                          algorithm-specific parameters
  -m, --model=FILE        store the model to FILE (DEFAULT=''); if the
                          value is empty, this utility does not store
                          the model
  -g, --split=N           split the instances into N groups; this option
                          is useful for holdout evaluation and cross
                          validation
  -e, --holdout=M         use the M-th data for holdout evaluation and
                          the rest for training
  -x, --cross-validate    repeat holdout evaluations for #i in
                          {1, ..., N} groups (N-fold cross validation)
  -l, --log-to-file       write the training log to a file instead of to
                          STDOUT; the filename is determined
                          automatically by the training algorithm,
                          parameters, and source files
  -L, --logbase=BASE      set the base name for a log file (used with
                          the -l option)
  -h, --help              show the usage of this command and exit
  -H, --help-parameters   show the help message of algorithm-specific
                          parameters; specify an algorithm with the '-a'
                          or '--algorithm' option, and specify a
                          graphical model with the '-t' or '--type'
                          option
The following options are available for training.
 -t, --type=TYPE
 Specify a graphical model used for feature generation. The default value is "1d".
 1d
 The 1st-order Markov CRF with state and transition features (dyad features). State features are conditioned on combinations of attributes and labels, and transition features are conditioned on label bigrams.
 -a, --algorithm=NAME
 Specify a training algorithm. The default value is "lbfgs".
 lbfgs
 Gradient descent using the L-BFGS method
 l2sgd
 Stochastic Gradient Descent with an L2 regularization term
 ap
 Averaged Perceptron
 pa
 Passive Aggressive (PA)
 arow
 Adaptive Regularization Of Weight Vector (AROW)
 -p, --param=NAME=VALUE
 Configure a parameter for the training. CRFsuite sets the parameter (NAME) to VALUE. The available parameters depend on the graphical model and training algorithm selected. To see the help message for the available parameters, use '-H' or '--help-parameters' with the algorithm specified by '-a' or '--algorithm' and the graphical model specified by '-t' or '--type'.
 -m, --model=MODEL
 Store the trained model in the file MODEL. The default value is "" (empty). CRFsuite does not store the model to a file when MODEL is empty.
 -g, --split=N
 Split the instances into N groups, and assign a number in {1, ..., N} to each group. This option is mostly used to perform N-fold cross validation (with the -x option). By default, CRFsuite does not split the input data into groups.
 -e, --holdout=M
 Use the instances of group number M for holdout evaluation. CRFsuite does not use the instances of group number M for training. By default, CRFsuite does not perform holdout evaluation.
 -x, --cross-validate
 Perform N-fold cross validation. Specify the number of splits with the -g option. By default, CRFsuite does not perform cross validation.
 -l, --log-to-file
 Write the log messages of training to a file. The file name is determined automatically from the command-line arguments (e.g., training algorithm, graphical model, parameters, source files). By default, CRFsuite writes the log messages to STDOUT.
 -L, --logbase=BASE
 Specify the base name for the log file (used with the -l option). By default, the base name is "log.crfsuite".
 -h, --help
 Show the usage of this command and exit.
 -H, --help-parameters
 Show the list of parameters and their descriptions. Specify the graphical model and training algorithm with the -t and -a options, respectively.
Here are some examples of CRFsuite command lines for training.

- Train a CRF model from train.txt with the default parameters, and store the model in CRF.model:

$ crfsuite learn -m CRF.model train.txt

- Train a CRF model from STDIN with the default parameters:

$ cat train.txt | crfsuite learn -

- Train a CRF model from train.txt (group #1). During the training, test the model with the holdout data test.txt (group #2):

$ crfsuite learn -e2 train.txt test.txt

- Perform 10-fold cross validation on the training data train.txt. The log output will be stored in log.crfsuite_lbfgs (the name of the log file may vary depending on the training parameters):

$ crfsuite learn -g10 -x -l train.txt
The 1st-order Markov CRF with state and transition features (dyad features). State features are conditioned on combinations of attributes and labels, and transition features are conditioned on label bigrams. The following parameters control feature generation for this model.
 feature.minfreq=VALUE
 The cut-off threshold for the occurrence frequency of a feature. CRFsuite ignores features whose frequency of occurrence in the training data is less than VALUE. The default value is 0 (i.e., no cut-off).
 feature.possible_states=BOOL
 Specify whether CRFsuite generates state features that do not occur in the training data (i.e., negative state features). When BOOL is 1, CRFsuite generates state features that associate all possible combinations of attributes and labels. If the numbers of attributes and labels are A and L respectively, this function generates (A * L) state features; for example, 100,000 attributes and 20 labels yield 2,000,000 state features. Enabling this function may improve labeling accuracy because the CRF model can learn situations in which an item should not receive a given label. However, this function may also increase the number of features and slow down training drastically. This function is disabled by default.
 feature.possible_transitions=BOOL
 Specify whether CRFsuite generates transition features that do not occur in the training data (i.e., negative transition features). When BOOL is 1, CRFsuite generates transition features that associate all possible label pairs. If the number of labels in the training data is L, this function generates (L * L) transition features. This function is disabled by default.
Here are some examples of CRFsuite command lines.

- Ignore features occurring fewer than two times in the training data:

$ crfsuite learn -m CRF.model -p feature.minfreq=2 train.txt

- Generate negative state and transition features (i.e., a dense feature set):

$ crfsuite learn -m CRF.model -p feature.possible_states=1 -p feature.possible_transitions=1 train.txt
Maximize the logarithm of the likelihood of the training data with L1 and/or L2 regularization term(s) using the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) method. When a non-zero coefficient for the L1 regularization term is specified, the algorithm switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method. In practice, this algorithm improves the feature weights very slowly at the beginning of the training process, but converges to the optimal feature weights quickly in the end.
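For reference, the objective maximized by this algorithm can be sketched in standard notation as follows, with c1 and c2 the coefficients described below (the exact scaling of the regularization terms inside CRFsuite may differ by constant factors):

\hat{w} = \arg\max_{w} \sum_{i=1}^{N} \log p(y^{(i)} \mid x^{(i)}; w) - c_1 \lVert w \rVert_1 - c_2 \lVert w \rVert_2^2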
 c1=VALUE
 The coefficient for L1 regularization. If a non-zero value is specified, CRFsuite switches to the Orthant-Wise Limited-memory Quasi-Newton (OWL-QN) method. The default value is zero (no L1 regularization).
 c2=VALUE
 The coefficient for L2 regularization. The default value is 1.
 max_iterations=NUM
 The maximum number of iterations for the L-BFGS optimization. The L-BFGS routine terminates if the iteration count exceeds this value. The default value is the maximum integer value on the machine (INT_MAX).
 num_memories=NUM
 The number of limited memories that L-BFGS uses for approximating the inverse Hessian matrix. The default value is 6.
 epsilon=VALUE
 The epsilon parameter that determines the condition of convergence. The default value is 1e-5.
 stop=NUM
 The duration of iterations over which to test the stopping criterion. The default value is 10.
 delta=VALUE
 The threshold for the stopping criterion; an L-BFGS iteration stops when the improvement of the log-likelihood over the last ${stop} iterations is no greater than this threshold. The default value is 1e-5.
 linesearch=STRING
 The line search method used in the L-BFGS algorithm. Available methods are: "MoreThuente" (the method proposed by Moré and Thuente), "Backtracking" (backtracking with the regular Wolfe condition), and "StrongBacktracking" (backtracking with the strong Wolfe condition). The default method is "MoreThuente".
 max_linesearch=NUM
 The maximum number of trials for the line search algorithm. The default value is 20.
Here are some examples of command lines for L-BFGS training.

- Train a model with L2 regularization (c1=0, c2=1.0):

$ crfsuite learn -m CRF.model -a lbfgs -p c2=1 train.txt

- Train a model with L1 regularization (c1=1.0, c2=0):

$ crfsuite learn -m CRF.model -a lbfgs -p c1=1 -p c2=0 train.txt

- Train a model with both L1 and L2 regularization (c1=1.0, c2=1.0):

$ crfsuite learn -m CRF.model -a lbfgs -p c1=1 -p c2=1 train.txt
Maximize the logarithm of the likelihood of the training data with an L2 regularization term using Stochastic Gradient Descent (SGD) with batch size 1. This algorithm usually approaches the optimal feature weights quite rapidly, but converges slowly at the end.
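Schematically, each training instance triggers an update of the following form (a sketch in standard notation, not CRFsuite's exact internal scaling), where \eta_t is the learning rate chosen by the calibration routine described below and \lambda is an L2 rate derived from the coefficient c2 and the number of instances:

w \leftarrow (1 - \eta_t \lambda)\, w + \eta_t \nabla_w \log p(y^{(i)} \mid x^{(i)}; w)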
 c2=VALUE
 The coefficient for L2 regularization. The default value is 1.
 max_iterations=NUM
 The maximum number of iterations (epochs) for SGD optimization. The optimization routine terminates if the iteration count exceeds this value. The default value is 1000.
 period=NUM
 The duration of iterations to test the stopping criterion. The default value is 10.
 delta=VALUE
 The threshold for the stopping criterion; an optimization process stops when the improvement of the log-likelihood over the last ${period} iterations is no greater than this threshold. The default value is 1e-5.
 calibration.eta=VALUE
 The initial value of learning rate (eta) used for calibration. The default value is 0.1.
 calibration.rate=VALUE
 The rate of increase/decrease of learning rate for calibration. The default value is 2.
 calibration.samples=NUM
 The number of instances used for calibration. The calibration routine randomly chooses at most NUM instances from the training data. The default value is 1000.
 calibration.candidates=NUM
 The number of candidate learning rates. The calibration routine terminates after finding NUM candidate learning rates that increase the log-likelihood. The default value is 10.
 calibration.max_trials=NUM
 The maximum number of trials of learning rates for calibration. The calibration routine terminates after trying NUM candidate values of learning rates. The default value is 20.
Here is an example command line for SGD training.

- Train a model with L2 regularization (c2=1.0):

$ crfsuite learn -m CRF.model -a l2sgd -p c2=1 train.txt
When the current model cannot predict an item sequence correctly, this algorithm applies a perceptron update to the model. The algorithm takes the average of the feature weights over all updates in the training process. This algorithm is the fastest in terms of training speed. Even though the algorithm is very simple, it exhibits high prediction performance. In practice, it is necessary to stop the training process by specifying the maximum number of iterations, which should be tuned on a development set.
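A sketch of a single update in standard structured-perceptron notation, where \Phi denotes the global feature vector, y the reference label sequence, and y' the Viterbi label sequence predicted by the current model:

w \leftarrow w + \Phi(x, y) - \Phi(x, y')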
 max_iterations=NUM
 The maximum number of iterations (epochs). The optimization routine terminates if the iteration count exceeds this value. The default value is 100.
 epsilon=VALUE
 The epsilon parameter that determines the condition of convergence. The optimization routine terminates if the ratio of incorrect labels predicted by the model is no greater than VALUE. The default value is 1e-5.
Here is an example command line for Averaged Perceptron training.

- Train a model with at most 10 iterations:

$ crfsuite learn -m CRF.model -a ap -p max_iterations=10 train.txt
Given an item sequence (x, y) in the training data, the algorithm computes the loss: s(x, y') - s(x, y) + sqrt(d(y', y)), where s(x, y') is the score of the Viterbi label sequence, s(x, y) is the score of the label sequence of the training data, and d(y', y) measures the distance between the Viterbi label sequence (y') and the reference label sequence (y). If the item sequence suffers a positive loss, the algorithm updates the model based on the loss.
 type=NUM
 The strategy for updating feature weights: PA without slack variables (0), PA type I (1), or PA type II (2). The default value is 1.
 c=VALUE
 The aggressiveness parameter (used only for PA-I and PA-II). This parameter controls the influence of the slack term on the objective function. The default value is 1.
 error_sensitive=BOOL
 If this parameter is true (nonzero), the optimization routine includes into the objective function the square root of the number of incorrect labels predicted by the model. The default value is 1 (true).
 averaging=BOOL
 If this parameter is true (nonzero), the optimization routine computes the average of feature weights at all updates in the training process (similarly to Averaged Perceptron). The default value is 1 (true).
 max_iterations=NUM
 The maximum number of iterations (epochs). The optimization routine terminates if the iteration count exceeds this value. The default value is 100.
 epsilon=VALUE
 The epsilon parameter that determines the condition of convergence. The optimization routine terminates if the mean loss is no greater than VALUE. The default value is 1e-5.
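No example command line is given above for Passive Aggressive; the following illustrative invocation simply combines the documented options (the parameter values are arbitrary):

$ crfsuite learn -m CRF.model -a pa -p type=2 -p c=0.5 train.txt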
Given an item sequence (x, y) in the training data, the algorithm computes the loss: s(x, y') - s(x, y), where s(x, y') is the score of the Viterbi label sequence, and s(x, y) is the score of the label sequence of the training data.
 variance=VALUE
 The initial variance of every feature weight. The algorithm initializes the vector of feature weights as a multivariate Gaussian distribution with mean 0 and variance VALUE. The default value is 1.
 gamma=VALUE
 The trade-off between the loss function and changes to the feature weights. The default value is 1.
 max_iterations=NUM
 The maximum number of iterations (epochs). The optimization routine terminates if the iteration count exceeds this value. The default value is 100.
 epsilon=VALUE
 The epsilon parameter that determines the condition of convergence. The optimization routine terminates if the mean loss is no greater than VALUE. The default value is 1e-5.
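Similarly, an illustrative command line for AROW training using the documented parameters (the values are arbitrary):

$ crfsuite learn -m CRF.model -a arow -p variance=1 -p gamma=1 train.txt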
To tag data using a CRF model, enter the following command:
$ crfsuite tag [OPTIONS] [DATA]
If the argument DATA is omitted or '-', CRFsuite reads the data from STDIN. To see the usage of the tag command, specify the -h (--help) option.
$ crfsuite tag -h
CRFSuite 0.12  Copyright (c) 2007-2011 Naoaki Okazaki

USAGE: crfsuite tag [OPTIONS] [DATA]
Assign suitable labels to the instances in the data set given by a file
(DATA). If the argument DATA is omitted or '-', this utility reads a data
from STDIN. Evaluate the performance of the model on labeled instances
(with the -t option).

OPTIONS:
  -m, --model=MODEL     Read a model from a file (MODEL)
  -t, --test            Report the performance of the model on the data
  -r, --reference       Output the reference labels in the input data
  -p, --probability     Output the probability of the label sequences
  -i, --marginal        Output the marginal probabilities of items
  -q, --quiet           Suppress tagging results (useful for test mode)
  -h, --help            Show the usage of this command and exit
The following options are available for tagging.
 -m, --model=MODEL
 The file name from which CRFsuite reads a CRF model.
 -t, --test
 Evaluate the performance (accuracy, precision, recall, F1 measure) of the CRF model, assuming that the input data is labeled. This function is disabled by default.
 -r, --reference
 Output the reference labels in parallel with the predicted labels, assuming that the input data is labeled. This function is disabled by default.
 -p, --probability
 Output the probabilities of the label sequences predicted by the model. When this function is enabled, a label sequence begins with a line "@probability\tx.xxxx", where "x.xxxx" gives the probability of the sequence and "\t" denotes a TAB character. This function is disabled by default.
 -i, --marginal
 Output the marginal probabilities of labels. When this function is enabled, each predicted label is followed by ":x.xxxx", where "x.xxxx" gives the probability of the label (see the illustration after this list). This function is disabled by default.
 -q, --quiet
 Suppress the output of tagged labels. This function is useful for evaluating a CRF model with the -t option.
 -h, --help
 Show the usage of this command and exit.
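For illustration, the output for one tagged sequence with both the -p and -i options enabled might look like the following, per the formats described above (the labels and probability values are hypothetical):

@probability	0.8123
B-NP:0.9876
I-NP:0.9543
O:0.8765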
Here are some examples of CRFsuite command lines for tagging.

- Tag the data test.txt using the CRF model CRF.model:

$ crfsuite tag -m CRF.model test.txt

- Evaluate the CRF model CRF.model on the labeled data test.txt:

$ crfsuite tag -m CRF.model -qt test.txt