Given a word, the goal is to predict its label as B, I, L, U, or O. Because whether a phrase is a keyword is inherently ambiguous, the classifier 1512 does not make a hard prediction; instead, it predicts how likely it is that a candidate has each particular label.
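As a minimal illustration of such a soft prediction, the following sketch converts one raw score per label into a probability for each of the labels B, I, L, U, and O. The use of a softmax, the function name `label_probabilities`, and the example scores are assumptions made for illustration only; the description above does not specify how the classifier 1512 computes its likelihoods.

```python
import math

# The five labels named in the description.
LABELS = ("B", "I", "L", "U", "O")


def label_probabilities(scores):
    """Map one raw score per label to a probability per label.

    Assumption: a softmax is used here purely for illustration; any
    function yielding a distribution over the labels would serve.
    """
    if len(scores) != len(LABELS):
        raise ValueError("expected exactly one score per label")
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {label: e / total for label, e in zip(LABELS, exps)}


# Hypothetical raw scores for a single word, in the order B, I, L, U, O.
probs = label_probabilities([2.0, 0.5, 0.1, 1.0, -1.0])
for label, p in probs.items():
    print(f"{label}: {p:.3f}")
```

The result is a likelihood for every label rather than a single chosen label, so a downstream step can still weigh ambiguous candidates instead of discarding them outright.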