# A Technology-Based Classification of Firms: Can We Learn Something Looking Beyond Industry Classifications?

## Abstract

## 1. Introduction

#### 1.1. Data

#### 1.2. The Clustering Exercise

Here, $i_k$ and $j_k$ are the technology-related coordinates of firms $i$ and $j$ in the $n$-dimensional technology space. Calculations have been performed using different data representations: (i) patent shares of firm $i$ in technology $k$, where $i_k$ is a number between 0 and 1; (ii) a binary representation, where $i_k = 1$ if company $i$ owns patents in technology $k$; and (iii) the relative technology advantage representation, where $i_k = 1$ if firm $i$ is specialised in technology $k$. In this last case, we first define the relative technology advantage (RTA) of firm $i$ in technology $k$ as the share of patents of firm $i$ in technology $k$ over the average share of patents in technology $k$ in the sample. An RTA > 1 means that firm $i$ is specialised in technology $k$.
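The three representations described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `technology_representations` and the toy count matrix are hypothetical, every firm and every technology is assumed to have at least one patent, and the "average share in the sample" is assumed to be the unweighted mean of the firms' shares (the text would also be compatible with a pooled share).

```python
import numpy as np


def technology_representations(patent_counts):
    """Return share, binary, RTA and specialisation matrices (firms x technologies)."""
    counts = np.asarray(patent_counts, dtype=float)
    firm_totals = counts.sum(axis=1, keepdims=True)

    # (i) Share of each firm's patents in technology k; every row sums to 1.
    shares = counts / firm_totals

    # (ii) Binary representation: 1 if the firm owns any patent in technology k.
    binary = (counts > 0).astype(int)

    # (iii) RTA: firm share over the average share of technology k in the sample
    # (assumption: unweighted mean across firms).
    mean_share = shares.mean(axis=0, keepdims=True)
    rta = shares / mean_share
    specialised = (rta > 1).astype(int)

    return shares, binary, rta, specialised


# Toy data: 3 firms, 3 technologies (illustrative only).
toy_counts = [[8, 2, 0], [1, 1, 2], [0, 5, 5]]
shares, binary, rta, specialised = technology_representations(toy_counts)
```

In this toy case the second firm holds patents in all three technologies but is specialised only in the third, which shows how the binary and RTA representations can differ for the same portfolio.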

Here, $n_A$ and $n_B$ correspond to the number of observations within clusters $A$ and $B$, respectively. This form of the objective function reveals another characteristic of the Ward method. Given that the increase in the total sum of squares depends both on the size of the two clusters $A$ and $B$ and on the geometric separation of their centres, the method favours merging smaller pairs of clusters when their centres are as far apart as the centres of other pairs of larger clusters.

The clustering algorithm used in this work is the linkage algorithm implemented in the Python SciPy package. The clustering exercise was performed using patent data distributed over different technological fields as defined at the IPC4 level. Given our interest in investigating the links between our purely technology-based classification scheme and the ICB3 one, we decided to use the results of the grouping at specific cut-off levels: namely at 5 cluster families, and at 10, 20, 30 and finally 38 clusters. The latter is equal to the number of distinct groups into which our firms are classified according to the ICB3 classification. In what follows, results are based on the binary representation of the data, which performs better in the validation exercise presented later on than the other two representations (results are available upon request).
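As a rough illustration of this procedure, the sketch below builds a Ward tree with SciPy and cuts it at the five levels mentioned in the text. It is not the authors' script: the helper names, the random binary matrix standing in for the firm-by-IPC4 data, and the use of `fcluster` with `criterion="maxclust"` to obtain the cuts are all assumptions. The function `ward_merge_cost` uses the standard textbook form of Ward's merging cost, $\frac{n_A n_B}{n_A + n_B}\lVert c_A - c_B \rVert^2$, to show why smaller pairs are merged first at equal centre distance.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_merge_cost(n_a, n_b, centre_distance):
    """Increase in the total within-cluster sum of squares when merging A and B."""
    return n_a * n_b / (n_a + n_b) * centre_distance ** 2


def cluster_at_levels(binary_matrix, levels=(5, 10, 20, 30, 38)):
    """Build one Ward tree and cut it at each requested number of clusters."""
    data = np.asarray(binary_matrix, dtype=float)
    tree = linkage(data, method="ward")  # Ward linkage on Euclidean distances
    # "maxclust" returns at most k flat clusters for each level k.
    return {k: fcluster(tree, t=k, criterion="maxclust") for k in levels}


# Synthetic stand-in for the binary firm-by-technology matrix (not the paper's data).
rng = np.random.default_rng(0)
demo = (rng.random((200, 40)) < 0.2).astype(int)
labels = cluster_at_levels(demo)

# Equal centre distance, different sizes: the smaller pair is cheaper to merge.
small_pair = ward_merge_cost(2, 2, 1.0)
large_pair = ward_merge_cost(20, 20, 1.0)
```

Because all cuts come from the same tree, the coarser partitions are obtained by merging clusters of the finer ones, which is what allows 38 clusters to be grouped into 5 families.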

## 2. Results

#### 2.1. Results of the Clustering

#### 2.2. Significance of the Clustering

Here, $n_x$ is the number of elements in cluster $x$ and $n_{x_1,x_2}$ the number of elements that are in cluster $x_1$ according to clustering $X_1$ and in cluster $x_2$ according to clustering $X_2$.
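The counts defined above are all that is needed to compute a normalised mutual information (NMI) between two clusterings. The sketch below is a minimal illustration under an assumption: it uses a common normalisation, the mutual information divided by the geometric mean of the two entropies, which may differ from the exact expression used in the paper. The function name is hypothetical, and returning 0.0 when one clustering has a single cluster is a convention chosen here.

```python
from collections import Counter
from math import log, sqrt


def normalised_mutual_information(labels_1, labels_2):
    """NMI between two clusterings given as equal-length label sequences."""
    if len(labels_1) != len(labels_2):
        raise ValueError("both clusterings must label the same elements")
    n = len(labels_1)

    n_x1 = Counter(labels_1)                   # cluster sizes in clustering X1
    n_x2 = Counter(labels_2)                   # cluster sizes in clustering X2
    n_joint = Counter(zip(labels_1, labels_2))  # joint counts n_{x1,x2}

    mutual_info = sum(
        c / n * log(n * c / (n_x1[a] * n_x2[b]))
        for (a, b), c in n_joint.items()
    )
    entropy_1 = -sum(c / n * log(c / n) for c in n_x1.values())
    entropy_2 = -sum(c / n * log(c / n) for c in n_x2.values())

    # NMI is undefined if a clustering has one cluster; return 0.0 by convention.
    if entropy_1 == 0 or entropy_2 == 0:
        return 0.0
    return mutual_info / sqrt(entropy_1 * entropy_2)


same_up_to_renaming = normalised_mutual_information([0, 0, 1, 1], ["b", "b", "a", "a"])
unrelated = normalised_mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

The measure depends only on how elements are grouped, not on the cluster names, so two identical partitions with different labels score 1 and two independent partitions score 0.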

#### 2.3. A Technological Classification of Firms

## 3. Conclusions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Results of K-Means Clustering

**Figure A1.** Share of the variance of the patent-to-R&D ratio explained by the K-means clustering as a function of the number of clusters (blue line), compared with the share of variance explained by the ICB3 classification (green dot).

## Appendix B. Validation of the TDC Clustering Using the Nearest Neighbour

**Figure A2.** Share of successful predictions based on the nearest neighbour model (red solid line) and expected share of successful random predictions (blue dashed line).

**Figure 1.** Representation of the clustering up to 38 clusters. The 5 cluster families are highlighted in the dendrogram.

**Figure 2.** NMI with respect to the number of clusters. The blue line compares the clusterings obtained using the IPC4 and the IPC3 codes as input. The green dot represents the NMI between the results of our clustering using the IPC4 codes and the ICB classification.

**Figure 3.** Share of the variance of different variables explained by the clustering for different numbers of clusters (blue line) versus the share of variance explained by the ICB3 classification (green dot).
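The quantity plotted in Figure 3 can be illustrated with a short sketch. The text shown here does not state how the share of explained variance is computed, so the version below is an assumption: the between-group sum of squares divided by the total sum of squares, as in a one-way ANOVA, with no adjustment for the number of clusters. The function name and the toy values are hypothetical.

```python
from collections import defaultdict


def explained_variance_share(values, labels):
    """Between-group sum of squares over total sum of squares for one variable."""
    n = len(values)
    grand_mean = sum(values) / n
    total_ss = sum((v - grand_mean) ** 2 for v in values)

    # Collect the values of the variable within each cluster.
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    # Each cluster contributes its size times the squared deviation of its mean.
    between_ss = sum(
        len(members) * (sum(members) / len(members) - grand_mean) ** 2
        for members in groups.values()
    )
    return between_ss / total_ss


# Toy example: four firms, one variable, two alternative groupings.
toy_values = [1.0, 3.0, 5.0, 7.0]
informative = explained_variance_share(toy_values, [0, 0, 1, 1])
uninformative = explained_variance_share(toy_values, ["a", "b", "b", "a"])
```

An unadjusted share of this kind cannot decrease when clusters are split further, which is one reason to compare a clustering and the ICB3 classification at the same number of groups.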

**Figure 4.** The two subpanels highlight the main characteristics of the 38 clusters. The size of the circles is proportional to the number of firms in the cluster. The left panel shows the share of patents in each of the 8 main sections of the IPC classification (IPC1): A = Human Necessities; B = Performing Operations, Transporting; C = Chemistry, Metallurgy; D = Textiles, Paper; E = Fixed Constructions; F = Mechanical Engineering, Lighting, Heating, Weapons, Blasting; G = Physics; H = Electricity. Sectors are divided into two groups related to the two main patenting industrial sectors: Health (sections A and C) and ICT (sections G and H). The right panel shows the distribution of firms’ technological diversification (# of IPC classes) in the clusters, on a log scale. Firms in the clusters of family 1 are not very diversified (around 10 fields), while the diversification of our sample is generally very high. Firms in family 5 in particular are often characterized by a diversification above 100 different subclasses (out of 629 total subclasses in the IPC classification). The right panel also shows how firms are divided into clusters with a similar breadth of technological diversification (number of IPC4 codes in their patent portfolio). In particular, the clusters in family 1 show a low degree of diversification; in other words, these clusters contain firms active in the development of a relatively low number of technologies compared to the average Scoreboard firm.

Macro Cluster | Macro Cluster Summary | Cluster | Patent Propensity | Diversification (# IPC4) | # of Firms | R&D | R&D Intensity | Top 3 Subclasses | % Top 3 IPC4 |
---|---|---|---|---|---|---|---|---|---|
5 | Patent Propensity: 0.91; # IPC4: 182; # of Firms: 134; R&D: 1692; R&D Intensity: 4.6% | 5D | 7.6 | 280 | 2 | 629 | 1.2% | G06F,H05K,H01R | 36% |
5 | | 5O | 2.0 | 198 | 13 | 662 | 5.5% | G03G,H04N,G06F | 30% |
5 | | 5J | 1.4 | 124 | 16 | 415 | 1.3% | H01L,E21B,H01M | 44% |
5 | | 5M | 0.9 | 136 | 18 | 773 | 4.3% | H01L,G06F,A61B | 39% |
5 | | 5N | 0.7 | 157 | 12 | 4140 | 9.6% | G06F,H04W,H04L | 53% |
5 | | 5C | 0.7 | 289 | 7 | 3180 | 6.2% | H01L,G06F,H04N | 33% |
5 | | 5A | 0.6 | 178 | 12 | 650 | 3.6% | F01D,E02F,F02C | 15% |
5 | | 5G | 0.5 | 141 | 11 | 1715 | 10.3% | H01L,G02B,A61K | 23% |
5 | | 5K | 0.5 | 136 | 6 | 366 | 0.5% | C22C,F25J,F17C | 15% |
5 | | 5F | 0.4 | 285 | 7 | 3628 | 5.0% | H01M,B60W,B60R | 14% |
5 | | 5B | 0.4 | 321 | 8 | 3209 | 4.0% | F01D,F02C,G06F | 13% |
5 | | 5E | 0.3 | 205 | 8 | 3099 | 5.2% | H01L,H01M,B60W | 12% |
5 | | 5H | 0.3 | 140 | 8 | 922 | 4.5% | C08G,C07C,C08L | 18% |
5 | | 5I | 0.2 | 221 | 4 | 2582 | 5.7% | A61B,A61F,A61K | 20% |
5 | | 5L | 0.03 | 134 | 2 | 937 | 2.5% | G06Q,G06F,B01D | 24% |
4 | Patent Propensity: 0.54; # IPC4: 49; # of Firms: 278; R&D: 273; R&D Intensity: 5.0% | 4F | 1.2 | 93 | 19 | 273 | 6.3% | B41J,H01L,G06F | 17% |
4 | | 4D | 0.8 | 57 | 42 | 124 | 4.6% | H01L,G02F,G06F | 34% |
4 | | 4E | 0.5 | 66 | 44 | 606 | 7.5% | G06F,H04W,H04L | 39% |
4 | | 4C | 0.4 | 36 | 75 | 331 | 4.1% | G06F,H04L,H04W | 46% |
4 | | 4A | 0.4 | 47 | 57 | 127 | 2.3% | H01R,G01V,H01L | 16% |
4 | | 4B | 0.3 | 27 | 41 | 165 | 13.9% | G06F,H01L,G11C | 44% |
3 | Patent Propensity: 0.38; # IPC4: 62; # of Firms: 172; R&D: 281; R&D Intensity: 2.4% | 3F | 0.6 | 99 | 19 | 226 | 3.3% | F16C,H02K,B62D | 21% |
3 | | 3A | 0.5 | 67 | 37 | 150 | 2.7% | D06F,A47L,B29C | 9% |
3 | | 3C | 0.4 | 49 | 24 | 98 | 1.0% | C22C,F01D,C23C | 19% |
3 | | 3D | 0.4 | 37 | 33 | 95 | 2.6% | A01D,F16D,B60T | 21% |
3 | | 3B | 0.2 | 34 | 34 | 136 | 0.7% | E21B,F24H,F16K | 20% |
3 | | 3E | 0.2 | 113 | 25 | 1132 | 4.3% | B60R,F16H,B62D | 16% |
2 | Patent Propensity: 0.37; # IPC4: 60; # of Firms: 142; R&D: 360; R&D Intensity: 1.9% | 2C | 0.7 | 86 | 11 | 228 | 2.8% | B60C,A63B,C08L | 41% |
2 | | 2B | 0.5 | 71 | 40 | 177 | 1.2% | H01L,C08L,C08G | 18% |
2 | | 2A | 0.3 | 42 | 53 | 218 | 0.8% | H01M,B01J,C08F | 17% |
2 | | 2D | 0.2 | 61 | 23 | 385 | 3.3% | A61K,C11D,A61Q | 30% |
2 | | 2E | 0.2 | 78 | 15 | 1402 | 6.5% | A61M,A61K,A61F | 36% |
1 | Patent Propensity: 0.11; # IPC4: 13; # of Firms: 950; R&D: 146; R&D Intensity: 3.0% | 1E | 0.4 | 42 | 32 | 154 | 1.3% | A61F,B65D,A47J | 25% |
1 | | 1B | 0.2 | 21 | 137 | 73 | 1.5% | H01L,G02B,G06F | 30% |
1 | | 1D | 0.1 | 23 | 105 | 441 | 10.7% | A61K,A61B,A61M | 37% |
1 | | 1C | 0.1 | 10 | 215 | 155 | 3.9% | G06F,H04L,G06Q | 46% |
1 | | 1F | 0.1 | 9 | 173 | 107 | 4.3% | A61K,C07D,A61P | 56% |
1 | | 1A | 0.05 | 6 | 288 | 89 | 1.3% | H01L,B60N,H01R | 14% |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Gkotsis, P.; Pugliese, E.; Vezzani, A. A Technology-Based Classification of Firms: Can We Learn Something Looking Beyond Industry Classifications? *Entropy* **2018**, *20*, 887.
https://doi.org/10.3390/e20110887
