In voice-activated human-appliance interface systems for smart homes, named entity recognition (NER) is an essential tool for extracting execution targets from natural language commands. Most previous NER systems rely on supervised machine-learning methods that require a large human-annotated training corpus. In the smart home environment, named entity categories should be defined according to the voice-activated device (e.g., food names for refrigerators and song titles for music players). With supervised methods, changing the named entity categories is difficult because a large part of the training corpus must be newly constructed by hand. To address this problem, we present a semi-supervised NER system that minimizes the time-consuming and labor-intensive task of constructing the training corpus. Our system uses distant supervision with two kinds of auto-labeling processes: auto-labeling based on heuristic rules, which generates a single-class named entity corpus, and auto-labeling based on a pre-trained single-class NER model, which generates a multi-class named entity corpus. The system then improves NER accuracy with a bagging-based active learning method. In experiments on a generic domain with 11 named entity classes and a context-specific baseball domain with 21 named entity classes, our system achieved F1-measures of 0.777 and 0.958, respectively. Because our system was built from a relatively small human-annotated training corpus, we believe it is a viable alternative to current NER systems in smart home environments.
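To make the two main ideas concrete, the following is a minimal Python sketch, not the system described here. It assumes that the heuristic rule is a longest-match lookup in an entity dictionary (gazetteer) and that the bagging-based active learning step ranks tokens by vote entropy across models trained on bootstrap samples; the abstract specifies neither choice. The names `auto_label`, `vote_entropy`, and the `SONG` label are hypothetical.

```python
from collections import Counter
from math import log


def auto_label(tokens, gazetteer, label):
    """Assign BIO tags by longest-match lookup in a gazetteer (assumed heuristic).

    `gazetteer` is a set of token tuples, e.g. {("let", "it", "be")}.
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(entry) for entry in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in gazetteer:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


def vote_entropy(votes):
    """Disagreement among bagged models for one token (assumed criterion).

    `votes` holds one predicted tag per model; higher entropy means the
    models disagree more, so the example is a better candidate for
    human annotation.
    """
    total = len(votes)
    counts = Counter(votes)
    return sum(-(c / total) * log(c / total) for c in counts.values())


songs = {("let", "it", "be"), ("yesterday",)}
tags = auto_label("play let it be now".split(), songs, "SONG")
```

Under these assumptions, tokens on which the bagged models vote unanimously have zero entropy and need no manual review, while split votes mark the examples worth sending to an annotator.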
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.