# Bayesian Constitutionalization: Twitter Sentiment Analysis of the Chilean Constitutional Process through Bayesian Network Classifiers

## Abstract

## 1. Introduction

## 2. Background

#### 2.1. Bayesian Networks

#### 2.2. Tree Augmented Naive Bayes Classifiers

- Build a fully connected weighted graph in which the nodes are the variables (attributes) of the problem and each edge weight is the mutual information between the pair of nodes it connects.
- Apply a maximum spanning tree algorithm to obtain a tree over the variables whose total edge weight is maximal. Kruskal’s algorithm can be used for this purpose.
- Transform the undirected tree into a directed one by choosing a root variable and directing all edges outward from it.
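
The three steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it assumes discrete variables stored as equal-length columns, estimates mutual information from empirical counts, and uses hypothetical names (`mutual_information`, `maximum_spanning_tree`, `orient_from_root`, `chow_liu_tree`) together with made-up toy data.

```python
from collections import Counter, deque
from itertools import combinations
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm, visiting edges in order of decreasing weight."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = []
    for _, u, v in sorted(weighted_edges, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so no cycle is created
            parent[ru] = rv
            tree.append((u, v))
    return tree


def orient_from_root(tree_edges, root):
    """Direct every edge away from the chosen root (breadth-first traversal)."""
    neighbours = {}
    for u, v in tree_edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    directed, seen, queue = [], {root}, deque([root])
    while queue:
        u = queue.popleft()
        for v in neighbours.get(u, []):
            if v not in seen:
                seen.add(v)
                directed.append((u, v))
                queue.append(v)
    return directed


def chow_liu_tree(data, root):
    """data maps each variable name to its column of discrete values."""
    edges = [
        (mutual_information(data[a], data[b]), a, b)
        for a, b in combinations(sorted(data), 2)
    ]
    return orient_from_root(maximum_spanning_tree(list(data), edges), root)


# Toy data (made up): "b" copies "a", while "c" is independent of both.
toy = {
    "a": [0, 0, 1, 1, 0, 0, 1, 1],
    "b": [0, 0, 1, 1, 0, 0, 1, 1],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
directed_edges = chow_liu_tree(toy, root="a")
print(directed_edges)
```

The choice of root changes the edge directions but not the underlying undirected tree; ties between equal weights are broken arbitrarily in this sketch.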

- Build a fully connected weighted graph in which the nodes are the variables (attributes) of the problem and each edge weight is the conditional mutual information between the pair of nodes it connects, given the class variable $y$.
- Apply a maximum spanning tree algorithm to obtain a tree over the variables whose total edge weight is maximal. Kruskal’s algorithm can be used for this purpose.
- Transform the undirected tree into a directed one by choosing a root variable and directing all edges outward from it.
- Construct the TAN model by adding a vertex labelled $y$ and adding an edge from $y$ to each $x_i$.
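
Two parts of this procedure are specific to TAN: the edge weight and the final step that attaches the class vertex. The sketch below illustrates both under stated assumptions: the attributes and the class are discrete, the conditional mutual information is estimated from empirical counts, and the directed attribute tree is taken as given. The function names and toy data are hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xi, xj, y):
    """Empirical I(X_i; X_j | Y) in nats for discrete sequences."""
    n = len(y)
    n_ijy = Counter(zip(xi, xj, y))
    n_iy = Counter(zip(xi, y))
    n_jy = Counter(zip(xj, y))
    n_y = Counter(y)
    total = 0.0
    for (a, b, c), count in n_ijy.items():
        # p(a, b | c) / (p(a | c) p(b | c)) = n_abc * n_c / (n_ac * n_bc)
        ratio = count * n_y[c] / (n_iy[(a, c)] * n_jy[(b, c)])
        total += (count / n) * log(ratio)
    return total


def tan_parents(directed_tree, attributes, class_name="y"):
    """Parent sets of a TAN: the class is a parent of every attribute,
    and each non-root attribute also keeps its single tree parent."""
    parents = {class_name: []}
    for x in attributes:
        parents[x] = [class_name]
    for u, v in directed_tree:
        parents[v].append(u)
    return parents


# Toy data (made up): within each class, x2 copies x1 and x3 is independent of x1.
y = [0, 0, 0, 0, 1, 1, 1, 1]
x1 = [0, 1, 0, 1, 0, 1, 0, 1]
x2 = list(x1)
x3 = [0, 0, 1, 1, 0, 0, 1, 1]

print(conditional_mutual_information(x1, x2, y))  # close to log(2)
print(conditional_mutual_information(x1, x3, y))  # 0.0

structure = tan_parents([("x1", "x2"), ("x1", "x3")], ["x1", "x2", "x3"])
print(structure)
# {'y': [], 'x1': ['y'], 'x2': ['y', 'x1'], 'x3': ['y', 'x1']}
```

Each attribute therefore has at most two parents, the class and one other attribute, which is what distinguishes TAN from naive Bayes.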

#### 2.3. Recent Sentiment Analysis Approaches

## 3. Data and Methods

#### 3.1. Twitter Data

#### 3.2. Pre-Processing

- Removing all URLs (e.g., www.xyz.com), hashtags (e.g., #topic), and targets (e.g., @username);
- Removing all punctuation, symbols, and numbers;
- Correcting spelling and handling sequences of repeated characters;
- Removing stop words;
- Removing non-Spanish tweets.
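
A minimal sketch of the rule-based steps above, using only Python's `re` module. Several choices here are assumptions rather than details taken from the paper: text is lower-cased, runs of three or more identical characters are collapsed to one, and `STOP_WORDS` is a tiny illustrative list. Spelling correction and language filtering need external resources and are left out.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real pipeline would use
# a complete Spanish list.
STOP_WORDS = {"de", "la", "el", "en", "y", "a", "que", "los", "las", "un", "una"}

URL_RE = re.compile(r"(?:https?://|www\.)\S+")   # URLs
TAG_RE = re.compile(r"[#@]\w+")                  # hashtags and @targets
NON_LETTER_RE = re.compile(r"[^\w\s]|[\d_]")     # punctuation, symbols, numbers
REPEAT_RE = re.compile(r"(.)\1{2,}")             # 3+ repeats of one character


def clean_tweet(text):
    """Return the list of cleaned tokens for one tweet."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = TAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)  # "nooo" -> "no"
    return [tok for tok in text.split() if tok not in STOP_WORDS]


example = "Nooo!!! Visita www.xyz.com #topic @username: la nueva Constitución 2022"
print(clean_tweet(example))
# ['no', 'visita', 'nueva', 'constitución']
```

Collapsing only runs of three or more characters leaves legitimate Spanish double letters such as "ll" and "rr" untouched.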

#### 3.3. Term Frequency-Inverse Document Frequency (TF-IDF)
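
For orientation, one common TF-IDF formulation can be sketched as follows. It assumes relative term frequency and an unsmoothed logarithmic inverse document frequency, $\mathrm{tf}(t,d)\cdot \log (N/\mathrm{df}(t))$; the exact variant used in the study is not shown here, and the function name and toy documents are hypothetical.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(documents)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenised tweets (made up).
docs = [["nueva", "constitución"], ["constitución", "rechazo"]]
weights = tf_idf(docs)
print(weights)
```

A term that occurs in every document, such as "constitución" in the toy example, receives weight zero under this variant.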

#### 3.4. Sentiment Analysis

- Cluster 1: 12,298 negative tweets (−1) and 4894 positive tweets (1);
- Cluster 2: 9761 negative tweets (−1) and 4741 positive tweets (1);
- Cluster 3: 4518 negative tweets (−1) and 1602 positive tweets (1).

#### 3.5. Model Performance Evaluation
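
The four measures reported in Tables 2 to 5 (accuracy, precision, recall, and $F_1$-score) have standard definitions for binary classification, sketched below. Which label is treated as the positive class, and how scores are aggregated over runs, are assumptions of this sketch; the `positive` parameter and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (made up), using the -1 / 1 coding of the sentiment classes.
y_true = [1, 1, -1, -1, -1, 1]
y_pred = [1, -1, -1, 1, -1, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Returning 0.0 when a denominator is zero is one convention among several; it avoids a division error when a class is never predicted.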

#### 3.6. An Evolution Strategy for Learning TAN Classifiers

- The parent population contains $\mu = 10$ individuals.
- Each iteration generates $\lambda = 20$ offspring.
- Individuals die out after one iteration (we use 1000 iterations), so only the offspring (the youngest individuals) survive to the next generation: environmental selection chooses the $\mu$ parents of the next generation from the $\lambda$ offspring.
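
The selection scheme described above can be sketched as a generic $(\mu ,\lambda )$ loop. The defaults $\mu = 10$, $\lambda = 20$, and 1000 iterations come from the text; everything else is an assumption. In particular, how a TAN structure is encoded, how it is mutated, and how its fitness is scored are not specified in this list, so `init`, `mutate`, and `fitness` are hypothetical callables, and the demonstration uses a toy bit-string problem rather than classifier structures.

```python
import random


def mu_comma_lambda(init, mutate, fitness, mu=10, lam=20, iterations=1000, seed=0):
    """Generic (mu, lambda) evolution strategy; returns the best individual seen."""
    rng = random.Random(seed)
    parents = [init(rng) for _ in range(mu)]
    best = max(parents, key=fitness)
    for _ in range(iterations):
        # Each offspring is a mutated copy of a randomly chosen parent.
        offspring = [mutate(rng.choice(parents), rng) for _ in range(lam)]
        # Comma selection: all parents are discarded and the mu fittest
        # offspring become the next parent population.
        offspring.sort(key=fitness, reverse=True)
        parents = offspring[:mu]
        # Comma selection can lose the best solution, so it is recorded
        # separately (an addition of this sketch).
        if fitness(parents[0]) > fitness(best):
            best = parents[0]
    return best


# Toy problem (made up): maximise the number of ones in a 12-bit string.
def init(rng):
    return [rng.randint(0, 1) for _ in range(12)]


def mutate(individual, rng):
    child = list(individual)
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]  # flip one randomly chosen bit
    return child


best = mu_comma_lambda(init, mutate, sum, iterations=50)
print(best, sum(best))
```

Because parents never survive, $\lambda$ must be at least $\mu$; with $\lambda = 20$ and $\mu = 10$, half of the offspring are selected in every generation.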

#### 3.7. Experimental Setup

## 4. Results and Analysis

#### Interpreting the Networks

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

Clusters | Twitter Accounts | Number of Tweets |
---|---|---|
Cluster 1 | @MillaburAdolfo @AmpueroAdriana @afc073 @AlondraCVidal @AlvinSM15 @BSepulvedaHales @Bastianlabbed20 @CamilaZarateZ @CaroDistrito1 @CaroVilchesF @CesarUribeA @CotaSanJuan @criordor @dbravosilva @DayyanaGonzalez @ElisaGiustinia1 @elisaloncon @ElsaLabrana @ErickaPortillaB @fersalinas333 @FranciscaArauna @MachiFrancisca1 @frandistrito14 @TiaPikachu @gloconstituyent @oasishernan @Hugo_Gutierrez_ @isabelgodoym @MamaniIsabella @ivannaolivares5 @JanisMeneses_D6 @lidiayagan @lisettevergarar @loreto_vallejos @LuisJimenezC @manuconstituye @MarcoArellano29 @marcosbarrazag @CKawesqar @mariariveramit @MEQChile @medicanatyh @NatividadLlanq3 @NicolasFernand @RenatoGarinG @robertoceledon @rkatrileo @FloresMadriaga @TiareHey @valemirandacc @vanessahoppe21 @BacianWilfredo | 17,192 |
Cluster 2 | @SquellaAgustin @amaya_alvez @AuroraDelgadoV @labeasanchez @BenitoBaranda @bessygallardoP @carolinasepud19 @CESARVALENZ @christianpviera @cgomezcas @ConySchon @damabarca @danielstingo @felipeharboe @fernando_atria @fchahin @gdominguez_ @giovannaroa @GuillermoNamor @IgnacioAchurra @Jaime_Bassa @JAVIERFUCHSLOC1 @jeniffermella @Jorgeabarcaxv @baradit @JuanjoMartinb @jjlalvarez @LorenaCesp_D23 @barcelobiobio21 @maluchayallego @Ma_joseOyarzun @MarielaSerey @MarioVa25830274 @Mati_Orellana_ @mdaza_abogado @MaxHurtadoR @BottoConstituy1 @patriciapolitz @PatoFdez @tia_paulina_vr @PedroMunozLeiva @Rmontero_ @rodrigo_logan @T_Pustilnick @tatiurru @tomaslaibe @YarelaAysen | 14,502 |
Cluster 3 | @amorenoe @AlvaroJofre @angelica_tepper @arturozunigaj @bdelamaza @berfontaine @CarolCBown @clau_castrog @conihube @cmonckeberg @cretton15 @felipemena_ @geoconda_aysen @HarryJurgensen @HernanLarrain @arancibialmte @Ktymontealegre @LucianoErnest15 @LMayolB @ossandon_d12 @mcubillossigall @margaritaleteli @CeciliaUbilla @martinarrau @pablotoloza @PatyLabraB @PaulinaVelosoM1 @PollyanaConsti1 @raulcelism @raneumannb @rvega_c @rocicantuarias @RodrigoAlvarez_ @ruggero_cozzi @ruth_uas @tere_marinovic | 6120 |
Total | 135 | 37,814 |

**Table 2.** Performance comparison of NB, TAN, ATAN, HC-TAN, HC-SP-TAN, BSEJ, FSSJ, and $(\mu,\lambda)$-TAN in terms of accuracy (%) in the test set.

Datasets | NB | TAN | ATAN | HC-TAN | HC-SP-TAN | BSEJ | FSSJ | $(\mu,\lambda)$-TAN |
---|---|---|---|---|---|---|---|---|
Cluster 1 | 73.48 ± 0.642 | 76.46 ± 0.577 | 76.91 ± 0.399 | 76.81 ± 0.307 | 76.69 ± 0.426 | 76.94 ± 0.498 | 77.01 ± 0.278 | 81.64 ± 0.372 |
Cluster 2 | 74.32 ± 0.312 | 77.08 ± 0.579 | 76.91 ± 0.419 | 76.16 ± 0.733 | 76.46 ± 0.457 | 76.52 ± 0.491 | 76.88 ± 0.401 | 80.62 ± 0.737 |
Cluster 3 | 71.15 ± 1.271 | 73.63 ± 0.977 | 74.63 ± 0.841 | 74.15 ± 0.743 | 74.61 ± 0.854 | 74.18 ± 0.843 | 75.12 ± 0.783 | 79.63 ± 0.537 |

**Table 3.** Performance comparison of NB, TAN, ATAN, HC-TAN, HC-SP-TAN, BSEJ, FSSJ, and $(\mu,\lambda)$-TAN in terms of precision (%) in the test set.

Datasets | NB | TAN | ATAN | HC-TAN | HC-SP-TAN | BSEJ | FSSJ | $(\mu,\lambda)$-TAN |
---|---|---|---|---|---|---|---|---|
Cluster 1 | 84.38 ± 0.609 | 88.44 ± 0.361 | 89.79 ± 0.411 | 89.87 ± 0.591 | 89.73 ± 0.396 | 89.91 ± 0.571 | 90.12 ± 0.578 | 94.52 ± 0.422 |
Cluster 2 | 85.31 ± 0.782 | 87.77 ± 0.590 | 87.34 ± 0.504 | 89.53 ± 0.570 | 89.52 ± 0.519 | 89.69 ± 0.574 | 89.72 ± 0.549 | 92.58 ± 0.667 |
Cluster 3 | 81.53 ± 1.568 | 85.49 ± 1.302 | 86.01 ± 1.034 | 91.34 ± 0.684 | 90.99 ± 1.110 | 90.57 ± 0.815 | 91.89 ± 0.456 | 93.81 ± 0.795 |

**Table 4.** Performance comparison of NB, TAN, ATAN, HC-TAN, HC-SP-TAN, BSEJ, FSSJ, and $(\mu,\lambda)$-TAN in terms of recall (%) in the test set.

Datasets | NB | TAN | ATAN | HC-TAN | HC-SP-TAN | BSEJ | FSSJ | $(\mu,\lambda)$-TAN |
---|---|---|---|---|---|---|---|---|
Cluster 1 | 79.68 ± 0.787 | 80.66 ± 0.622 | 80.81 ± 0.654 | 80.12 ± 0.311 | 80.03 ± 0.497 | 80.17 ± 0.299 | 80.65 ± 0.289 | 82.50 ± 0.205 |
Cluster 2 | 78.51 ± 0.539 | 80.13 ± 0.684 | 80.03 ± 0.578 | 78.11 ± 0.752 | 78.56 ± 0.568 | 78.50 ± 0.427 | 78.59 ± 0.389 | 81.03 ± 0.708 |
Cluster 3 | 79.88 ± 1.483 | 80.01 ± 1.021 | 79.92 ± 1.011 | 77.51 ± 0.867 | 78.23 ± 1.019 | 78.02 ± 0.961 | 79.01 ± 0.912 | 81.23 ± 0.608 |

**Table 5.** Performance comparison of NB, TAN, ATAN, HC-TAN, HC-SP-TAN, BSEJ, FSSJ, and $(\mu,\lambda)$-TAN in terms of $F_1$-score (%) in the test set.

Datasets | NB | TAN | ATAN | HC-TAN | HC-SP-TAN | BSEJ | FSSJ | $(\mu,\lambda)$-TAN |
---|---|---|---|---|---|---|---|---|
Cluster 1 | 78.78 ± 0.491 | 84.37 ± 0.414 | 84.48 ± 0.409 | 84.71 ± 0.218 | 84.61 ± 0.281 | 84.88 ± 0.389 | 84.98 ± 0.501 | 88.10 ± 0.225 |
Cluster 2 | 81.76 ± 0.308 | 83.77 ± 0.438 | 83.88 ± 0.439 | 83.43 ± 0.538 | 83.67 ± 0.335 | 83.72 ± 0.373 | 83.88 ± 0.343 | 86.42 ± 0.591 |
Cluster 3 | 80.69 ± 1.073 | 82.69 ± 0.727 | 82.56 ± 0.801 | 83.86 ± 0.515 | 84.12 ± 0.662 | 83.82 ± 0.615 | 85.12 ± 0.577 | 87.07 ± 0.356 |

