# An Optimized Abstractive Text Summarization Model Using Peephole Convolutional LSTM

## Abstract

## 1. Introduction

## 2. Related Works

## 3. Abstractive Text Summarization Models

#### 3.1. Traditional LSTM Unit

#### 3.2. Peephole Convolutional LSTM Unit

#### 3.3. MAPCoL Model

#### 3.4. Design of Experiment (DoE)

## 4. Experiment

#### 4.1. Experimental Setup

#### 4.2. Experimental Data Set

#### 4.3. Evaluation Method

## 5. Results and Discussion

#### 5.1. Summary Generation by MAPCoL

#### 5.2. Model Optimization by DoE

## 6. Conclusions

## Author Contributions

## Acknowledgments

## Conflicts of Interest

## Abbreviations

Abbreviation | Definition |
---|---|
LSTM | Long Short-Term Memory |
PCLSTM | Peephole Convolutional Long Short-Term Memory |
DoE | Design of Experiment |
CCD | Central Composite Design |
RSM | Response Surface Method |

**Figure 1.** A traditional LSTM (long short-term memory) unit with three gates: input, forget, and output. These three gates control the content of the memory block. Here, ${c}_{t-1}$ and ${c}_{t}$ are the contents of the previous and current memory cells, ${h}_{t-1}$ and ${h}_{t}$ are the outputs of the previous and current states, ${x}_{t}$ is the input vector, X denotes element-wise multiplication, + denotes element-wise summation, $\tanh$ is the hyperbolic tangent function, and $\sigma$ is the sigmoid function. ${b}_{f}$, ${b}_{i}$, ${b}_{c}$, and ${b}_{o}$ are the biases of the respective gates.
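For reference, the gate structure described in this caption corresponds to the standard LSTM update, which can be written as below. The weight matrices $W_f$, $W_i$, $W_c$, $W_o$, the gate activations $f_t$, $i_t$, $o_t$, the candidate content $\tilde{c}_t$, and the symbol $\odot$ for the element-wise product are notation assumed here for illustration; only the states, the input, and the biases are named in the caption.

```latex
\begin{align}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \\
\tilde{c}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh(c_t)
\end{align}
```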

**Figure 2.** A peephole convolutional LSTM unit. Here, each gate is connected to the content of the previous memory cell ${c}_{t-1}$ through a peephole connection. The content of the previous memory cell, along with ${h}_{t-1}$, ${x}_{t}$, and the bias, is provided as input to each gate. This allows the gates to access the content of the previous memory cell even when the output gate is closed.
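The peephole idea in this caption can be sketched in a few lines. The example below is a deliberately simplified scalar version, assuming that every gate receives ${c}_{t-1}$ as an extra input, as the caption states; the convolution used in the full unit is replaced by plain multiplication, and the parameter names (`w_xf`, `w_cf`, and so on) are hypothetical, not taken from the paper.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def peephole_step(x_t: float, h_prev: float, c_prev: float, p: dict) -> tuple:
    """One scalar peephole LSTM step.

    Assumption: each gate sees c_prev through its own peephole weight
    (w_cf, w_ci, w_co). Convolutions are replaced by scalar products.
    """
    f = sigmoid(p["w_xf"] * x_t + p["w_hf"] * h_prev + p["w_cf"] * c_prev + p["b_f"])
    i = sigmoid(p["w_xi"] * x_t + p["w_hi"] * h_prev + p["w_ci"] * c_prev + p["b_i"])
    o = sigmoid(p["w_xo"] * x_t + p["w_ho"] * h_prev + p["w_co"] * c_prev + p["b_o"])
    # Candidate content; no peephole term here.
    g = math.tanh(p["w_xc"] * x_t + p["w_hc"] * h_prev + p["b_c"])
    c_t = f * c_prev + i * g          # updated memory cell
    h_t = o * math.tanh(c_t)          # gated output
    return h_t, c_t


PARAM_NAMES = (
    "w_xf", "w_hf", "w_cf", "b_f",
    "w_xi", "w_hi", "w_ci", "b_i",
    "w_xo", "w_ho", "w_co", "b_o",
    "w_xc", "w_hc", "b_c",
)

# Hypothetical parameters: everything zero except the output-gate peephole.
params = {name: 0.0 for name in PARAM_NAMES}
params["w_co"] = 2.0
h_t, c_t = peephole_step(x_t=0.0, h_prev=0.0, c_prev=1.0, p=params)
```

With `x_t` and `h_prev` both zero, the output gate in this sketch still responds to `c_prev` through `w_co`, which is the behaviour the caption attributes to the peephole connection.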

**Figure 3.** The entire workflow of the model, which starts by taking an input text and finishes by generating a summary.

Text | System Summary | Reference Summary |
---|---|---|
Natural language processing is a subfield of computer science, information engineering, and artificial intelligence concerned with the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data | A subfield of artificial intelligence named natural language processing deal computer and human language interactions | Natural language processing is a subfield of artificial intelligence that works with computer and human language interactions |

**Table 2.** Performance comparison between MAPCoL and other models on the CNN/DailyMail dataset. ROUGE stands for Recall-Oriented Understudy for Gisting Evaluation.

Model | ROUGE-1 (%) | ROUGE-2 (%) | ROUGE-L (%) |
---|---|---|---|
Bottom-Up Sum | 41.22 | 18.68 | 38.34 |
SummaRuNNer | 39.60 | 16.20 | 35.30 |
C2F + ALTERNATE | 31.10 | 15.40 | 28.80 |
MAPCoL | 39.61 | 20.87 | 39.33 |
Optimized MAPCoL | 41.21 | 21.30 | 39.42 |
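To make the ROUGE-1 column concrete, the sketch below counts clipped unigram overlap for the sample system and reference summaries shown in the example table above. It is a minimal illustration that assumes lowercased whitespace tokenization with no stemming; the scores reported in the paper were presumably produced with a full ROUGE toolkit, so this function is not expected to reproduce them, and the name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(system: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 with clipped counts.

    Assumption: tokens are lowercased whitespace-separated words.
    """
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((sys_counts & ref_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(sys_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"overlap": overlap, "recall": recall, "precision": precision, "f1": f1}


SYSTEM = (
    "A subfield of artificial intelligence named natural language processing "
    "deal computer and human language interactions"
)
REFERENCE = (
    "Natural language processing is a subfield of artificial intelligence "
    "that works with computer and human language interactions"
)

scores = rouge_1(SYSTEM, REFERENCE)
print(scores)
```

For this pair the overlap is 13 unigrams out of 17 reference tokens and 15 system tokens. ROUGE-2 follows the same pattern with bigrams, and ROUGE-L uses the longest common subsequence instead of n-gram counts.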

Feature | Number |
---|---|
Number of summaries | 11,490 |
Average number of sentences per summary | 6.32 |
Maximum number of sentences per summary | 11 |
Average word length (AWL) in summary | 1.42 |
Average sentence length (ASL) in summary | 13.31 |

Score | Optimized | Non-Optimized |
---|---|---|
ARI Score | 8.09 | 8.15 |
Flesch Score | 73.16 | 73.68 |

ARI Score | Flesch Score | Grade Level |
---|---|---|
5 | 90–100 | 5th |
6 | 80–89 | 6th |
7 | 70–79 | 7th |
8 | 60–69 | 8th or 9th |
10–12 | 50–60 | 10th to 12th |
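The two readability scores can be sketched with their standard published formulas, followed by a lookup against the Flesch bands in the grade-level table above. This is an illustration only: the function names are hypothetical, and reading the AWL figure from the dataset feature table as syllables per word is an assumption, since the section does not define it.

```python
def flesch_reading_ease(words_per_sentence: float, syllables_per_word: float) -> float:
    """Standard Flesch reading-ease formula."""
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def automated_readability_index(chars_per_word: float, words_per_sentence: float) -> float:
    """Standard ARI formula."""
    return 4.71 * chars_per_word + 0.5 * words_per_sentence - 21.43


# Lower bounds of the Flesch bands, copied from the grade-level table.
GRADE_BANDS = [
    (90, "5th"),
    (80, "6th"),
    (70, "7th"),
    (60, "8th or 9th"),
    (50, "10th to 12th"),
]


def grade_from_flesch(score: float) -> str:
    """Return the grade level whose Flesch band contains the score."""
    for lower_bound, grade in GRADE_BANDS:
        if score >= lower_bound:
            return grade
    return "below the tabulated range"


# Reported Flesch scores for the optimized and non-optimized models.
print(grade_from_flesch(73.16), grade_from_flesch(73.68))

# Assumption: ASL = 13.31 words per sentence, AWL = 1.42 syllables per word.
print(round(flesch_reading_ease(13.31, 1.42), 2))
```

Both reported Flesch scores fall in the 70–79 band, that is, the 7th-grade level. Under the stated reading of AWL, the formula gives roughly 73.2, which is close to the reported scores but should be treated as a plausibility check rather than a reproduction.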

Experiment No. | Coded ${X}_{1}$ | Coded ${X}_{2}$ | Coded ${X}_{3}$ | Coded ${X}_{4}$ | epochNumber (${X}_{1}$) | batchSize (${X}_{2}$) | learningRate (${X}_{3}$) | hiddenUnits (${X}_{4}$) | ROUGE-1 (%) | ROUGE-2 (%) | ROUGE-L (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | −1 | 0 | 0 | 0 | 80 | 64 | 0.01 | 100 | 13.26 | 5.86 | 12.62 |
2 | −1 | 0 | +1 | +1 | 80 | 64 | 0.02 | 150 | 14.10 | 6.24 | 14.05 |
3 | −1 | +1 | +1 | −1 | 80 | 128 | 0.02 | 50 | 15.06 | 6.39 | 14.63 |
4 | −1 | −1 | −1 | +1 | 80 | 32 | 0.001 | 150 | 11.23 | 5.05 | 10.85 |
5 | −1 | −1 | 0 | −1 | 80 | 32 | 0.01 | 50 | 14.32 | 6.98 | 13.67 |
6 | −1 | +1 | −1 | −1 | 80 | 128 | 0.001 | 50 | 14.29 | 6.07 | 13.75 |
7 | −1 | +1 | 0 | 0 | 80 | 128 | 0.01 | 100 | 5.32 | 3.21 | 5.06 |
8 | −1 | +1 | −1 | −1 | 80 | 128 | 0.001 | 50 | 14.29 | 7.06 | 13.98 |
9 | −1 | +1 | 0 | 0 | 80 | 128 | 0.01 | 100 | 5.34 | 3.46 | 4.68 |
10 | −1 | −1 | +1 | −1 | 80 | 32 | 0.02 | 50 | 10.25 | 7.65 | 9.87 |
11 | 0 | −1 | −1 | +1 | 150 | 32 | 0.001 | 150 | 24.32 | 8.67 | 23.98 |
12 | 0 | −1 | 0 | −1 | 150 | 32 | 0.01 | 50 | 24.31 | 9.45 | 23.85 |
13 | 0 | −1 | 0 | 0 | 150 | 32 | 0.001 | 100 | 25.31 | 11.34 | 24.32 |
14 | 0 | −1 | +1 | −1 | 150 | 32 | 0.02 | 50 | 19.35 | 10.48 | 19.02 |
15 | 0 | 0 | 0 | −1 | 150 | 64 | 0.01 | 50 | 26.34 | 13.27 | 26.32 |
16 | 0 | 0 | 0 | 0 | 150 | 64 | 0.01 | 100 | 29.31 | 15.71 | 28.75 |
17 | 0 | 0 | +1 | +1 | 150 | 64 | 0.02 | 150 | 24.32 | 13.07 | 23.58 |
18 | 0 | +1 | +1 | −1 | 150 | 128 | 0.02 | 50 | 21.30 | 12.01 | 20.69 |
19 | 0 | +1 | −1 | 0 | 150 | 128 | 0.001 | 100 | 23.10 | 12.20 | 22.79 |
20 | 0 | +1 | 0 | 0 | 150 | 128 | 0.01 | 100 | 22.32 | 11.23 | 23.06 |
21 | +1 | −1 | 0 | +1 | 200 | 32 | 0.01 | 150 | 31.81 | 17.31 | 30.76 |
22 | +1 | −1 | +1 | −1 | 200 | 32 | 0.02 | 50 | 28.65 | 14.27 | 29.07 |
23 | +1 | −1 | +1 | 0 | 200 | 32 | 0.02 | 100 | 34.10 | 17.36 | 34.06 |
24 | +1 | 0 | −1 | +1 | 200 | 64 | 0.001 | 150 | 35.14 | 18.41 | 33.21 |
25 | +1 | 0 | −1 | −1 | 200 | 64 | 0.001 | 50 | 32.32 | 16.21 | 33.01 |
26 | +1 | 0 | −1 | 0 | 200 | 64 | 0.001 | 100 | 39.61 | 20.87 | 39.33 |
27 | +1 | 0 | +1 | +1 | 200 | 64 | 0.02 | 150 | 27.36 | 14.09 | 26.45 |
28 | +1 | +1 | +1 | −1 | 200 | 128 | 0.02 | 50 | 28.21 | 14.78 | 29.31 |
29 | +1 | +1 | −1 | 0 | 200 | 128 | 0.001 | 100 | 33.95 | 16.93 | 31.89 |
30 | +1 | +1 | −1 | +1 | 200 | 128 | 0.001 | 150 | 31.30 | 15.86 | 30.75 |
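The relation between the coded and real columns of the design table can be captured as a small lookup. The sketch below is illustrative: the level values are read off the table rows, while the names `LEVELS` and `decode` are hypothetical.

```python
# Coded level -> real value for each factor, as read from the design table.
LEVELS = {
    "epochNumber": {-1: 80, 0: 150, 1: 200},
    "batchSize": {-1: 32, 0: 64, 1: 128},
    "learningRate": {-1: 0.001, 0: 0.01, 1: 0.02},
    "hiddenUnits": {-1: 50, 0: 100, 1: 150},
}


def decode(coded: tuple) -> dict:
    """Map a coded design point (X1, X2, X3, X4) to real hyperparameter values."""
    if len(coded) != len(LEVELS):
        raise ValueError("expected one coded level per factor")
    return {name: LEVELS[name][level] for name, level in zip(LEVELS, coded)}


# Experiment 26, the best non-optimized run in the table.
print(decode((1, 0, -1, 0)))
```

Experiment 13 lists a learning rate of 0.001 against a coded level of 0, so the mapping here follows the pattern shared by the other rows.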

**Table 7.** Results of the analysis of variance (ANOVA) of the response surface model for optimizing ROUGE-1 (predicted stages).

Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | $p$-Value |
---|---|---|---|---|---|
Model | 14 | 3236.60 | 231.19 | 16.90 | 0.00 |
Linear | 4 | 1749.08 | 437.27 | 31.97 | 0.00 |
${X}_{1}$ | 1 | 1624.01 | 1624.01 | 118.72 | 0.00 |
${X}_{2}$ | 1 | 1201.2 | 1201.2 | 37.3 | 0.02 |
${X}_{3}$ | 1 | 678.3 | 678.3 | 45.2 | 0.01 |
${X}_{4}$ | 1 | 14.29 | 14.29 | 1.04 | 0.31 |
Square | 4 | 279.04 | 69.76 | 5.10 | 0.00 |
${X}_{1}^{2}$ | 1 | 0.12 | 0.12 | 0.01 | 0.92 |
${X}_{2}^{2}$ | 1 | 229.09 | 229.09 | 16.75 | 0.00 |
${X}_{3}^{2}$ | 1 | 6.19 | 6.19 | 0.45 | 0.50 |
${X}_{4}^{2}$ | 1 | 87.11 | 87.11 | 6.37 | 0.01 |
2-way interaction | 6 | 80.99 | 13.50 | 0.99 | 0.44 |
${X}_{1}{X}_{2}$ | 1 | 0.31 | 0.31 | 0.02 | 0.88 |
${X}_{1}{X}_{3}$ | 1 | 17.77 | 17.77 | 1.30 | 0.26 |
${X}_{1}{X}_{4}$ | 1 | 22.35 | 22.35 | 1.63 | 0.210 |
${X}_{2}{X}_{3}$ | 1 | 1.91 | 1.91 | 0.14 | 0.71 |
${X}_{2}{X}_{4}$ | 1 | 1.00 | 1.00 | 0.07 | 0.78 |
${X}_{3}{X}_{4}$ | 1 | 17.30 | 17.30 | 1.26 | 0.26 |
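The columns of this ANOVA table are linked by two standard relations: a mean square is the sum of squares divided by its degrees of freedom, and an F-value is a term's mean square divided by the residual mean square. The table does not list a residual row, so the sketch below only infers the residual mean square from reported values; that inference, and the function names, are assumptions.

```python
def mean_square(sum_of_squares: float, degrees_of_freedom: int) -> float:
    """Mean square = sum of squares / degrees of freedom."""
    return sum_of_squares / degrees_of_freedom


def implied_error_mean_square(term_mean_square: float, f_value: float) -> float:
    """Residual mean square implied by F = MS_term / MS_error (inferred, not reported)."""
    return term_mean_square / f_value


# (source, degrees of freedom, sum of squares) copied from the ROUGE-1 table.
ROWS = [
    ("Model", 14, 3236.60),
    ("Linear", 4, 1749.08),
    ("Square", 4, 279.04),
    ("2-way interaction", 6, 80.99),
]

for source, dof, ss in ROWS:
    print(source, round(mean_square(ss, dof), 2))

# The Model and X1 rows imply nearly the same residual mean square.
print(round(implied_error_mean_square(231.19, 16.90), 2))
print(round(implied_error_mean_square(1624.01, 118.72), 2))
```

Not every row gives the same implied residual (the ${X}_{2}$ row, for example, yields a different ratio), so this is best read as a consistency check on the table rather than a reconstruction of the analysis.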

**Table 8.** Results of the analysis of variance (ANOVA) of the response surface model for optimizing ROUGE-2 (predicted stages).

Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | $p$-Value |
---|---|---|---|---|---|
Model | 14 | 644.91 | 46.05 | 19.67 | 0.00 |
Linear | 4 | 459.44 | 114.86 | 49.05 | 0.00 |
${X}_{1}$ | 1 | 444.36 | 444.36 | 189.77 | 0.00 |
${X}_{2}$ | 1 | 8.40 | 8.40 | 3.59 | 0.04 |
${X}_{3}$ | 1 | 0.37 | 0.37 | 44.32 | 0.02 |
${X}_{4}$ | 1 | 18.56 | 18.56 | 7.93 | 0.01 |
Square | 4 | 26.14 | 6.53 | 4.65 | 0.02 |
${X}_{1}^{2}$ | 1 | 0.33 | 0.33 | 0.14 | 0.71 |
${X}_{2}^{2}$ | 1 | 17.84 | 17.84 | 7.62 | 0.01 |
${X}_{3}^{2}$ | 1 | 0.07 | 0.07 | 0.03 | 0.86 |
${X}_{4}^{2}$ | 1 | 6.10 | 6.10 | 2.61 | 0.12 |
2-way interaction | 6 | 30.89 | 5.14 | 2.20 | 0.10 |
${X}_{1}{X}_{2}$ | 1 | 5.86 | 5.86 | 2.51 | 0.13 |
${X}_{1}{X}_{3}$ | 1 | 5.70 | 5.70 | 2.44 | 0.13 |
${X}_{1}{X}_{4}$ | 1 | 6.52 | 6.52 | 2.79 | 0.11 |
${X}_{2}{X}_{3}$ | 1 | 5.71 | 5.71 | 2.44 | 0.13 |
${X}_{2}{X}_{4}$ | 1 | 12.11 | 12.11 | 5.17 | 0.03 |
${X}_{3}{X}_{4}$ | 1 | 1.68 | 1.68 | 0.72 | 0.40 |

**Table 9.** Results of the analysis of variance (ANOVA) of the response surface model for optimizing ROUGE-L (predicted stages).

Source | Degrees of Freedom | Sum of Squares | Mean Square | F-Value | $p$-Value |
---|---|---|---|---|---|
Model | 14 | 2289.54 | 163.54 | 18.43 | 0.00 |
Linear | 4 | 1721.53 | 430.38 | 48.50 | 0.00 |
${X}_{1}$ | 1 | 1587.28 | 1587.28 | 178.86 | 0.00 |
${X}_{2}$ | 1 | 49.25 | 49.25 | 5.55 | 0.03 |
${X}_{3}$ | 1 | 23.42 | 23.42 | 5.62 | 0.02 |
${X}_{4}$ | 1 | 55.16 | 55.16 | 6.22 | 0.02 |
Square | 4 | 33.48 | 33.48 | 0.94 | 0.46 |
${X}_{1}^{2}$ | 1 | 1.93 | 1.93 | 0.22 | 0.64 |
${X}_{2}^{2}$ | 1 | 19.33 | 19.33 | 8.34 | 0.01 |
${X}_{3}^{2}$ | 1 | 0.95 | 0.95 | 0.11 | 0.74 |
${X}_{4}^{2}$ | 1 | 10.77 | 10.77 | 4.65 | 0.04 |
2-way interaction | 6 | 52.11 | 8.69 | 0.98 | 0.47 |
${X}_{1}{X}_{2}$ | 1 | 6.35 | 6.35 | 0.71 | 0.41 |
${X}_{1}{X}_{3}$ | 1 | 16.44 | 16.44 | 7.34 | 0.04 |
${X}_{1}{X}_{4}$ | 1 | 5.32 | 5.32 | 0.60 | 0.45 |
${X}_{2}{X}_{3}$ | 1 | 1.64 | 1.64 | 0.18 | 0.67 |
${X}_{2}{X}_{4}$ | 1 | 23.98 | 23.98 | 9.65 | 0.03 |
${X}_{3}{X}_{4}$ | 1 | 0.57 | 0.57 | 0.06 | 0.83 |
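The Source column of the ANOVA tables groups 4 linear, 4 square, and 6 two-way interaction terms, 14 model degrees of freedom in total, which is the structure of a full second-order polynomial in four factors. The sketch below enumerates those terms and expands one coded design point into the corresponding regression row. The function names are hypothetical, and the fitted coefficients are not given in this section, so no prediction is attempted.

```python
from itertools import combinations

FACTORS = ("X1", "X2", "X3", "X4")


def quadratic_terms(factors: tuple) -> dict:
    """Group the terms of a full second-order model the way the ANOVA tables do."""
    return {
        "Linear": list(factors),
        "Square": [f"{f}^2" for f in factors],
        "2-way interaction": [f"{a}*{b}" for a, b in combinations(factors, 2)],
    }


def design_row(x: tuple) -> list:
    """Expand a coded point into [intercept, linear, square, interaction] columns."""
    linear = list(x)
    square = [v * v for v in x]
    interaction = [a * b for a, b in combinations(x, 2)]
    return [1] + linear + square + interaction


terms = quadratic_terms(FACTORS)
for group, names in terms.items():
    print(group, len(names), names)

# Coded levels of experiment 26 in the design table.
print(design_row((1, 0, -1, 0)))
```

Stacking one such row per experiment gives the design matrix to which a least-squares fit of the response surface would be applied.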

**Table 10.** Statistical parameters obtained from the analysis of variance (ANOVA) for the ROUGE-1 optimization.

R-sq (%) | R-sq(adj) (%) | R-sq(pred) (%) |
---|---|---|
97.11 | 93.96 | 90.90 |

**Table 11.** Statistical parameters obtained from the analysis of variance (ANOVA) for the ROUGE-2 optimization.

R-sq (%) | R-sq(adj) (%) | R-sq(pred) (%) |
---|---|---|
96.01 | 94.21 | 91.14 |

**Table 12.** Statistical parameters obtained from the analysis of variance (ANOVA) for the ROUGE-L optimization.

R-sq (%) | R-sq(adj) (%) | R-sq(pred) (%) |
---|---|---|
97.13 | 92.96 | 92.01 |

**Table 13.** Predicted and experimental ROUGE scores under the optimal processing condition. In each cell, the value marked with a star (*) is the ROUGE score obtained experimentally with the optimal values of the four parameters, and the unmarked value is the ROUGE score predicted by the optimization model.

epochNumber | batchSize | learningRate | hiddenUnits | ROUGE-1 (%) | ROUGE-2 (%) | ROUGE-L (%) |
---|---|---|---|---|---|---|
200 | 75.63 | 0.001 | 111 | 41.98 / 41.21 * | 21.67 / 21.30 * | 39.84 / 39.42 * |
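A quick way to read this table is to compute the gap between each predicted score and its experimental counterpart. The sketch below does only that arithmetic on the six values in the table; the dictionary and function names are hypothetical.

```python
# Values copied from the table: predicted by the model vs. measured (*).
PREDICTED = {"ROUGE-1": 41.98, "ROUGE-2": 21.67, "ROUGE-L": 39.84}
EXPERIMENTAL = {"ROUGE-1": 41.21, "ROUGE-2": 21.30, "ROUGE-L": 39.42}


def prediction_gaps(predicted: dict, experimental: dict) -> dict:
    """Absolute gap (percentage points) and gap relative to the predicted score."""
    gaps = {}
    for metric, pred in predicted.items():
        diff = pred - experimental[metric]
        gaps[metric] = {
            "absolute": round(diff, 2),
            "relative_pct": round(100 * diff / pred, 2),
        }
    return gaps


gaps = prediction_gaps(PREDICTED, EXPERIMENTAL)
for metric, gap in gaps.items():
    print(metric, gap)
```

For all three metrics the experimental score falls below the prediction by less than one percentage point, which corresponds to a relative gap under 2% of the predicted value.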

