Machine Learning and Natural Language Processing (ML & NLP)

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: 30 June 2025 | Viewed by 7530

Special Issue Editor


Prof. Dr. Stéphane Mussard
Guest Editor
CHROME, University of Nîmes, Avenue du Dr. Georges Salan, 30000 Nîmes, France
Interests: mathematical statistics; econometrics (Gini regressions); data analysis (on l1 norm and Gini metrics); machine learning; neural networks; stochastic dominance; inequality measurement; social choice; game theory

Special Issue Information

Dear Colleagues,

The Special Issue “Machine Learning and Natural Language Processing (ML & NLP)” is dedicated to the modeling of prediction and classification tasks on different types of text.

The Special Issue covers text mining and natural language processing, which include, among other topics:

- Ontologies, static embeddings, transformers (contextual embeddings), named entity recognition (NER), text classification, text generation (e.g., GANs), text and document summarization, word clustering, sentiment analysis, pattern recognition, tagging/annotation, and entity extraction.

The Special Issue welcomes statistical modeling of text, including supervised and unsupervised models, and algorithms that help to identify patterns in texts.

A special focus could be placed on explainable artificial intelligence (XAI), i.e., methods that determine which words and sentences contribute to the prediction or classification made by a neural network applied to text.

Applications to all kinds of texts are welcome, and in this respect many fields may be covered: health, biomedical research, law, jurimetrics, economics, finance, etc.

This research is funded by the French National Research Agency: ANR LAWBOT ANR-20-CE38-0013.

Prof. Dr. Stéphane Mussard
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • text mining
  • embeddings
  • named entities
  • sentiment analysis
  • XAI

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.


Published Papers (3 papers)

Research

16 pages, 1499 KiB  
Communication
Article 700 Identification in Judicial Judgments: Comparing Transformers and Machine Learning Models
by Sid Ali Mahmoudi, Charles Condevaux, Guillaume Zambrano and Stéphane Mussard
Stats 2024, 7(4), 1421-1436; https://doi.org/10.3390/stats7040083 - 26 Nov 2024
Viewed by 733
Abstract
Predictive justice, which involves forecasting trial outcomes, presents significant challenges due to the complex structure of legal judgments. To address this, it is essential to first identify all claims across different categories before attempting to predict any result. This paper focuses on a classification task based on the detection of Article 700 in judgments, which is a rule indicating whether the plaintiff or defendant is entitled to reimbursement of their legal costs. Our experiments show that conventional machine learning models trained on word and document frequencies can be competitive. However, using transformer models specialized in legal language, such as Judicial CamemBERT, also achieves high accuracies.
(This article belongs to the Special Issue Machine Learning and Natural Language Processing (ML & NLP))

13 pages, 419 KiB  
Article
Investigating Self-Rationalizing Models for Commonsense Reasoning
by Fanny Rancourt, Paula Vondrlik, Diego Maupomé and Marie-Jean Meurs
Stats 2023, 6(3), 907-919; https://doi.org/10.3390/stats6030056 - 29 Aug 2023
Cited by 2 | Viewed by 1851
Abstract
The rise of explainable natural language processing spurred a bulk of work on datasets augmented with human explanations, as well as technical approaches to leverage them. Notably, generative large language models offer new possibilities, as they can output a prediction as well as an explanation in natural language. This work investigates the capabilities of fine-tuned text-to-text transfer Transformer (T5) models for commonsense reasoning and explanation generation. Our experiments suggest that while self-rationalizing models achieve interesting results, a significant gap remains: classifiers consistently outperformed self-rationalizing models, and a substantial fraction of model-generated explanations are not valid. Furthermore, training with expressive free-text explanations substantially altered the inner representation of the model, suggesting that they supplied additional information and may bridge the knowledge gap. Our code is publicly available, and the experiments were run on open-access datasets, hence allowing full reproducibility.

16 pages, 706 KiB  
Article
Extracting Proceedings Data from Court Cases with Machine Learning
by Bruno Mathis
Stats 2022, 5(4), 1305-1320; https://doi.org/10.3390/stats5040079 - 13 Dec 2022
Cited by 4 | Viewed by 4108
Abstract
France is rolling out an open data program for all court cases, but with few metadata attached. Reusers will have to use named-entity recognition (NER) within the text body of the case to extract any value from it. Any court case may include up to 26 variables, or labels, that are related to the proceeding, regardless of the case substance. These labels are from different syntactic types: some of them are rare; others are ubiquitous. This experiment compares different algorithms, namely CRF, SpaCy, Flair and DeLFT, to extract proceedings data and uses the learning model assessment capabilities of Kairntech, an NLP platform. It shows that an NER model can apply to this large and diverse set of labels and extract data of high quality. We achieved an 87.5% F1 measure with Flair trained on more than 27,000 manual annotations. Quality may yet be improved by combining NER models by data type.