Abstract: Numerous initiatives have allowed users to share knowledge or opinions using collaborative platforms. In most cases, the users provide a textual description of their knowledge, following very limited or no constraints. Here, we tackle the classification of documents written in such an environment. As a use case, our study uses material from a text mining evaluation campaign on the classification of cooking recipes tagged by users of a collaborative website. This context makes some of the corpus specificities difficult to model, both for machine-learning-based systems and for keyword-based or lexical-based systems. In particular, different authors might have different opinions on how to classify a given document. The systems presented hereafter were submitted to the Défi Fouille de Textes 2013 evaluation campaign, where they obtained the best overall results, ranking first on task 1 and second on task 2. In this paper, we explain our approach for building relevant and effective systems dealing with such a corpus.
Abstract: We address the problem of automatically processing collocations—a subclass of multi-word expressions characterized by a high degree of morphosyntactic flexibility—in the context of two major applications, namely, syntactic parsing and machine translation. We show that parsing and collocation identification are processes that are interrelated and that benefit from each other, inasmuch as syntactic information is crucial for acquiring collocations from corpora and, vice versa, collocational information can be used to improve parsing performance. Similarly, we focus on the interrelation between collocations and machine translation, highlighting the use of translation information for multilingual collocation identification, as well as the use of collocational knowledge for improving translation. We give a panorama of the existing relevant work, and we parallel the literature surveys with our own experiments involving a symbolic parser and a rule-based translation system. The results show a significant improvement over approaches in which the corresponding tasks are decoupled.
Abstract: On being promoted to a personal chair in 1993 I chose the title of Professor of Informatics, specifically acknowledging Donna Haraway’s definition of the term as the “technologies of information [and communication] as well as the biological, social, linguistic and cultural changes that initiate, accompany and complicate their development”. This neatly encapsulated the plethora of issues emanating from these new technologies, inviting contributions and analyses from a wide variety of disciplines and practices. (In my later work Thinking Informatically I added the phrase “and communication”.) In the intervening time the word informatics itself has been appropriated by those more focused on computer science, although why an alternative term is needed for a well-understood area is not entirely clear. Indeed the term is used both as an alternative term and as an additional one, i.e. “computer science and informatics”.