Ex Machina: Analytical Platforms, Law and the Challenges of Computational Legal Science

Over the years, computation has become a fundamental part of scientific practice in several research fields that go far beyond the boundaries of the natural sciences. Data mining, machine learning, simulations and other computational methods today lie at the heart of the scientific endeavour in a growing number of social research areas, from anthropology to economics. In this scenario, an increasingly important role is played by analytical platforms: integrated environments that allow researchers to experiment with cutting-edge, data-driven and computation-intensive analyses. This paper discusses the appearance of such tools in the emerging field of computational legal science. After a general introduction to the impact of computational methods on both the natural and the social sciences, we describe the concept and the features of an analytical platform that explores innovative cross-methodological approaches to the academic and investigative study of crime. Stemming from an ongoing project involving researchers from law, computer science and bioinformatics, the initiative is presented and discussed as an opportunity to raise a debate about the future of legal scholarship and, within it, about the challenges of computational legal science.


Introduction
The history of science has always been marked by a close connection between the development of new research tools and the understanding of reality. Our ability to advance scientific knowledge depends strongly on our capacity to conceive and develop new tools, new ways to explore the world: instruments enable science and, at the same time, advances in science lead to the creation of new tools, in a sort of never-ending cycle.
Albeit in peculiar ways, this process involves law as well. For various reasons, the design of new scientific tools is looming on the horizon of today's debate about the aims and methods of legal studies. Two emerging research fields, Empirical legal studies (ELS) [1][2][3][4] and Computational legal science (CLS) [5][6][7][8][9], are indeed pushing forward discussions involving not only scientific issues, such as the definition of the scope and object of legal science, but also methodological questions concerning how, and with which instruments, law can be studied. The call for a closer integration of empirical analyses into legal scholarship and practice that characterises ELS (the latest in a series of empirically flavoured schools of thought ranging from Law and Economics to Sociology of law and Legal Realism) inevitably results in the quest for tools enabling a deeper understanding of the factual dimension of the legal universe [1][10][11][12]. Computational legal scholars go even further, trying to figure out how to exploit the impact that digitisation, Big Data and findings from computational social science can have on legal science and practice.
This paper lies at the intersection of the topics and research fields mentioned above, dwelling on how computational tools and methods can foster the emergence of new perspectives in legal research and practice. We focus, in particular, on the role played by analytical platforms, here defined as integrated hardware and software infrastructures designed to manage data and support scientific analyses. To do so, we draw on the experience and results of a research project exploring innovative and cross-methodological computational approaches to the academic and investigative study of crime.
Our goal is twofold. The first is to present the most recent developments of an ongoing research effort in which different studies at the border between law and computational social science (from network analysis [13,14] to visualisation [15,16] and social simulation [17]) converge and interact. The second is to stimulate debate about the new frontier of computational legal studies, a still poorly explored research field that will probably make important contributions to legal theory and practice in the coming years.
The work is structured as follows: Section 2 sketches the background of our analysis by means of a brief overview of the computational and data-driven turn of science. Sections 3 and 4 focus on the spread of analytical platforms in science and then, more specifically, in the legal field. Section 5 presents the concept, the features and the development prospects of CrimeMiner, an analytical platform for the computational analysis of criminal networks. Finally, Section 6 is devoted to some brief considerations about the implications that our experience suggests for the future of legal scholarship and the emerging field of Computational Legal Science.

The Computational and Data-Driven Turn of Science
Together with the spread of information technologies, the appearance of digital computers has had a deep impact on the perspectives and methods of scientific enquiry [18][19][20]. Since the early 1950s, theories and experiments, the two basic paradigms brought to the heart of science by the Galilean revolution, have been joined by computer simulations, a qualitatively new research method "lying somewhere intermediate between traditional theoretical science and its empirical methods of experimentation and observation" [21]. Often defined as the "third paradigm" of science, computer simulations have gradually enabled the "in silico" exploration of otherwise inaccessible phenomena, sparking a huge debate in many areas of scientific research (see, merely as examples, the body of literature spanning plasma physics, biology, linguistics and the cognitive and social sciences [22][23][24][25][26][27]) and, understandably, in the philosophy of science [28,29].
In more recent years, the advent of Big Data has brought a further evolution. Planetary-scale computational infrastructures produce and disseminate information in unprecedented volume and variety: experimental and observational data generated in scientific labs merge with data coming from online transactions, from quantified users interacting in networks and from myriads of sensors spread across the world. When coupled with the ever-growing computational power available today, these data become the keystone of a fourth paradigm [30,31], engendering scientific and methodological shifts across multiple disciplines (see Table 1). In this new landscape, the observation of patterns in nature followed by testable theories about their causes is increasingly backed up by heterogeneous computational heuristics that make it possible to recognise previously hidden patterns and correlations. In the realm of data-driven science [32,33], rather than being formulated first and then tested on purposefully collected data, hypotheses are built after identifying relationships in huge datasets. In this way, data and computation work as "epistemic enhancers" [19], artefacts that extend our natural observational, cognitive and computational abilities, reframing fundamental issues about the research process and the ways we deal with information [31][34][35][36].
Albeit with considerable differences among disciplines, the methodological apparatus of researchers is gradually expanding thanks to a steadily growing set of research methods. Data mining, machine learning, natural language processing, network analysis and simulation are just the tip of a wide range of computational methodologies that often complement each other. Considered as a whole, computational science can be seen as the result of the convergence of three different components (Figure 1). In the computational science setting, domain-specific research questions are addressed by means of algorithms (numerical and non-numerical), computational models and simulation software, sometimes experimental, designed to explore specific issues. The research activity is supported by computational infrastructures, consisting of hardware, software and networking systems, and by the data management components needed to solve computationally demanding problems. Today, scientific computing plays a central role in nearly every area of the natural sciences, from astronomy to physics, chemistry and climate studies, where simulations and other computational methods have become a standard part of scientific practice.
While crucial in the natural sciences, the adoption of computational approaches and methodologies is potentially even more disruptive in the social sciences, where, for historical reasons, formal and data-driven methods are used less frequently. As highlighted by Kitchin [31], the emerging field of computational social science [38][39][40] provides an important opportunity to develop more sophisticated models of social life, allowing scientists to shift "from data-scarce to data-rich studies of societies from static to dynamic unfoldings; from coarse aggregations to high resolutions; from relatively simple models to more complex, sophisticated simulations".

From Digital Tools to Analytical Platforms: The Advent of Augmented Science
Recent years have witnessed a rapid growth in the development of instruments implementing the computational science paradigm. The project of a computer-augmented science is becoming the current credo in all research areas, from biology to the digital humanities, fostering an ever closer integration between researchers and machines. Thanks also to the speed and frequency of technological advances, we now face a wide variety of tools conceived to help scientists in the different phases of their work.
Often available online, these tools support activities ranging from the exploration of the scientific literature to data and code sharing. A comprehensive and useful overview is proposed by Crouzier [41], from the Royal Institute of Technology in Stockholm, who identifies different categories based on the tools' main goals.

•
Literature Analysis. A first group of tools is designed to help researchers explore the ever-growing number of papers available online. Literature analysis systems provide users with ad hoc search engines that help scientists quickly find the articles they are interested in, with visualisation features that ease navigation within the material and, sometimes, with social bookmarking and publication-sharing systems. This category includes tools such as BibSonomy [42], CiteULike [43], Google Scholar [44], Mendeley [45], ReadCube [46], Biohunter [47] (biology) and PubChase [48] (life sciences).

•
Data and Code Sharing. A second category of tools supports the management of large sets of data and programming code, allowing researchers to efficiently store, share, cite and reuse materials. GitHub [49] and CodeOcean [50] are two examples of platforms for software sharing and development. The latter, in particular, focuses on facilitating code reuse by creating connections between coders, researchers and students. Other sharing platforms are more focused on data: Socialsci [51], for instance, helps researchers collect data for their surveys and social experiments. GenBank [52] makes a gene sequence database available online, while tools such as DelveHealth [53] or BioLINCC [54] specialise in the sharing of clinical data.

•
Collaboration. A heterogeneous set of instruments is conceived to help researchers develop collaborations. Platforms such as Academia [55], ResearchGate [56] and Loop aim to help scientists reach out to other researchers and find expertise for scientific cooperation. Tools such as Kudos [57] and AcaWiki [58] help communicate research activities and results to the general public. Other environments, instead, gather tools that help researchers directly involve the general public in research efforts, for instance by sharing CPU time or classifying pictures. Variously referred to as "crowd science" or "citizen science" [59], these tools attract growing attention from the scientific community. They can draw on the effort and knowledge provided by a large and diverse base of contributors, potentially expanding the range of scientific problems that can be addressed at relatively low cost while also increasing the speed at which they can be solved.

•
Experiments and Everyday Research Tasks. Research is a demanding task, particularly when it involves experiments: researchers have to deal with equipment and data management, the scheduling of activities, research protocols, coding and data analysis. A huge collection of tools has been developed to help researchers with these everyday tasks. Tools such as Asana, LabGuru [60] and Quartzy [61] support daily activities from planning to execution, often offering web-based laboratory inventory management systems. Some tools (Tetrascience [62] and Transcriptic [63]) are used to outsource experiments, others (Dexy [64] and GitLab [65]) are conceived to ease coding activity, and still others (Wolfram Alpha [66], Sweave [67], VisTrails [68] and Tableau [69]) allow users to generate and analyse data and visualise results.

•
Writing. In recent years, several tools have been developed to support paper drafting with the specific needs of researchers in mind. Some tools, such as EndNote [70], Zotero [71] and Citavi [72], allow users to store, manage and share bibliographies, citations and references with colleagues. Others, such as Authorea [73] and the ShareLaTeX [74] workspace, are collaborative writing tools that help researchers write papers with other people while keeping track of the activities and modifications made by each author on the document.

•
Publish. A series of platforms has been designed to ease the publication and discussion of scientific papers, aiming at the same time to accelerate scientific communication and discovery. Platforms such as eLife [75], GigaScience [76] and Cureus [77] offer an alternative publishing model, allowing anyone to access published works for free according to open access principles. Paper repositories such as arXiv [78] allow authors to increase the exposure of their work (even if in progress), offering at the same time new opportunities for scientific interaction. Other tools, such as Exec&Share [79] and RunMyCode [80], allow authors to connect papers with additional functionalities such as executable code.

•
Research Evaluation. Finally, an entire category of platforms deals with research evaluation, both in terms of paper review and of analysis of the impact of scientific publications. Tools such as PubPeer [81], Publons [82] and Academic Karma [83] are conceived to change the peer-review system by means of an open and anonymous review process that bypasses journals and editors. Platforms such as Altmetric [84], PLOS Article-Level Metrics [85] and ImpactStory [86] offer a set of new tools that analyse the impact of scientific papers by means other than impact factor and citation counts.
The scenario just described is constantly changing due to the continuous development of new tools. In general, we can observe a trend towards the convergence of different features into advanced environments that carry out ever more substantial parts of the research. "Analytical platforms" is the technical term describing the result of such integration: suites somewhat similar to IDEs (the integrated development environments used by software developers), assembling different functionalities into a single workflow.
It is possible to cite interesting experiences in which data-driven analytical platforms become the cornerstone of the path towards answering complex research questions. Foldit [87], for example, is a large-scale collaborative gamification project involving thousands of participants, aiming to shed light on the structure of proteins, a key factor in advancing the understanding of protein folding and in targeting proteins with drugs. The number of different ways in which even a small protein can fold is astronomical, because there are many degrees of freedom. Figuring out which of the many possible structures is the best is one of the hardest problems in biology today, and current methods take a lot of money and time, even for computers. Foldit attempts to predict the structure of a protein by taking advantage of humans' puzzle-solving intuitions and having people compete to fold the best proteins. To do this, Foldit integrates the features of many of the tools cited above: data analysis, visualisation and collaboration.
A similar initiative is Galaxy Zoo [88], a project exploiting an online platform to share astronomical data and involve volunteers in classifying galaxies according to their shape. The morphological classification of these celestial bodies is not only essential to understanding how galaxies formed but also a tough task for pattern recognition software. Thus far, over 200,000 volunteers have provided more than 50 million classifications. The automated analysis of these classifications, carried out by the Galaxy Zoo platform, has contributed to the discovery of new classes of galaxies and, in this way, to a deeper understanding of the universe.
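To make the idea of aggregating volunteer classifications concrete, the sketch below computes, for each galaxy, the share of volunteers who chose each label and keeps only galaxies with a clear majority. It is a minimal illustration under stated assumptions: the labels, the `vote_fractions` and `consensus` helpers and the 0.8 agreement threshold are hypothetical and do not reproduce Galaxy Zoo's actual pipeline, which is considerably more elaborate.

```python
from collections import Counter, defaultdict

def vote_fractions(classifications):
    """Share of volunteers choosing each label, per galaxy.

    classifications: iterable of (galaxy_id, label) pairs (hypothetical format).
    """
    counts = defaultdict(Counter)
    for galaxy_id, label in classifications:
        counts[galaxy_id][label] += 1
    return {
        galaxy_id: {label: n / sum(c.values()) for label, n in c.items()}
        for galaxy_id, c in counts.items()
    }

def consensus(fractions, threshold=0.8):
    """Keep galaxies whose most popular label reaches the agreement threshold."""
    result = {}
    for galaxy_id, dist in fractions.items():
        label, share = max(dist.items(), key=lambda item: item[1])
        if share >= threshold:
            result[galaxy_id] = label
    return result

votes = [  # hypothetical volunteer votes
    ("G1", "spiral"), ("G1", "spiral"), ("G1", "spiral"),
    ("G1", "spiral"), ("G1", "elliptical"),
    ("G2", "elliptical"), ("G2", "spiral"),
]
fractions = vote_fractions(votes)
print(fractions["G1"])       # {'spiral': 0.8, 'elliptical': 0.2}
print(consensus(fractions))  # {'G1': 'spiral'}
```

Galaxy G2 is left out because its volunteers split evenly; in a real project such ambiguous objects are exactly the ones worth sending back for more votes or expert review.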
In other experiences, analytical platforms are used to run online social experiments conceived to answer specific research questions in innovative ways. "The Collaborative Image of the City" [89], for example, is a project from MIT that, using thousands of geo-tagged images, measured the perception of safety, class and uniqueness in cities all over the world; many participants were involved in "labelling" the images and, as an example, a significant correlation was found between the perceptions of safety and class and the number of homicides. Finally, the SETI project [90] allows participants to share part of their devices' computing power to analyse astronomical data in the search for extraterrestrial life. SETI is based on the BOINC [91] platform, a project developed at the University of California, Berkeley, through which people can share their unused CPU and GPU power to address many challenges in areas such as astronomy (the SETI case), medicine and biology.
Moving on to the life sciences, it is worth mentioning Cytoscape [92], an open source platform for visualising molecular interaction networks and biological pathways and for integrating these networks with annotations, gene expression profiles and other state data. Although originally designed for biological research, Cytoscape is now a general platform for complex network analysis and visualisation, and anyone can extend it through its open, Java-based API. In this scenario, KNIME [93] deserves a mention too. Its development started in 2004 at the University of Konstanz, where a team of developers from a Silicon Valley software company specialising in pharmaceutical applications began working on a new open source platform as a collaboration and research tool. In 2006 some pharmaceutical companies began using it, and many other software companies now build KNIME-based tools for all purposes: science, finance, retail and so on. KNIME is an open source analytical platform that integrates various components for machine learning and data mining through its modular data pipelining concept, and provides a graphical user interface that allows users to assemble nodes for data preprocessing, modelling, data analysis and visualisation. It is thus designed to uncover the potential hidden in data, mine it for fresh insights and predict future trends.
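The modular pipelining concept can be illustrated with a toy sketch in which each node transforms a table and passes its output to the next node. The `Node` class and `run_pipeline` function are hypothetical and are not KNIME's API; KNIME nodes are assembled graphically and workflows can branch, whereas this sketch only shows a linear chain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Table = List[Dict[str, object]]  # a data table as a list of rows

@dataclass
class Node:
    """One step of a pipeline (hypothetical, not KNIME's API)."""
    name: str
    run: Callable[[Table], Table]

def run_pipeline(nodes: List[Node], table: Table) -> Table:
    """Feed the output table of each node into the next one."""
    for node in nodes:
        table = node.run(table)
    return table

# Preprocessing node: drop rows with missing values.
drop_missing = Node("drop missing", lambda t: [r for r in t if None not in r.values()])
# Transformation node: add a derived column.
add_ratio = Node("add ratio", lambda t: [{**r, "ratio": r["a"] / r["b"]} for r in t])
# Analysis node: keep rows above a threshold.
keep_high = Node("keep high", lambda t: [r for r in t if r["ratio"] > 1])

data = [{"a": 4, "b": 2}, {"a": 1, "b": 3}, {"a": None, "b": 1}]
result = run_pipeline([drop_missing, add_ratio, keep_high], data)
print(result)  # [{'a': 4, 'b': 2, 'ratio': 2.0}]
```

Because every node shares the same table-in, table-out contract, nodes can be reordered, swapped or reused across workflows, which is the practical appeal of the pipelining approach.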

Law, Computation and the Machines
Compared to the scenario described above, it can be said that the legal world has lagged behind. Apart from still quite isolated experiences, the use of computational and data-driven approaches in legal research is relatively occasional and, above all, does not reach the level of complexity and sophistication that can be observed in other areas. There are different reasons for this situation, which depend on the way legal scholars conceive the subject matter of their investigation (e.g., only norms and legal doctrines, or also empirical facts) and, consequently, their own research methods. In more general terms, law belongs to the social sciences, an area that came later to a quantitative reading of reality and to the use of the tools, including computational ones, that make such a reading possible.
Despite this, the intuition of a possible contiguity between law, understood in the broad sense, and the aid of calculation and analysis tools is not new at all. The idea of introducing quantitative and formal approaches borrowed from mathematics and the exact sciences into legal scholarship is the result of a line of thought that dates back at least to the Enlightenment. Since Leibniz explicitly proposed a computational stance to solve controversies (the famous "Calculemus!") and to join probability and legal reasoning [94], the evolution of legal thought has been marked by repeated and heterogeneous attempts to merge natural-science measurement, computation and law.
The call to arms made about a hundred years ago by Oliver Wendell Holmes [95] ("For the rational study of law the blackletter man may be the man of the present, but the man of the future is the man of statistics and the master of economics") and Cardozo [96] ("They do things better with logarithms") can be seen as the beginning of a recurring theme, a leitmotif periodically appearing in the legal debate. Very heterogeneous experiences can be traced back to this process, from Jurimetrics [97-100] to Lawtomation [101], and from the hypothesis of contamination between Cybernetics and Law [102][103][104] to Legal Informatics [105][106][107] and Computational law [108]. All of them share the same basic goal, i.e., rethinking more or less fundamental aspects of the legal experience (categories, research methods, tools and objectives, working practice) in light of a series of factors: the exact-science paradigm, developments in other areas of science and the possibilities offered by information and computation technologies.
The area, today often referred to as Computational legal science, is the most recent development of this process, a development taking place at a historical moment in which the perimeter of research areas potentially of interest to jurists has expanded to include the complexity sciences [109][110][111], computational social science [112] and a considerable number of computational techniques that have emerged in the most disparate disciplinary fields.
Some authors focus on how methods and approaches from computer and information science can turn into new services or tools for the legal professions. Daniel Katz, from Michigan State University, for instance, characterises what he calls "computational legal studies" [113] through a list of research methods (including "out of equilibrium models", "machine learning", "social epidemiology", "information theory", "computational game theory" and "information visualisation"), conceived as new opportunities to achieve mainly practical objectives such as decision prediction, a goal pursued since the early 1960s [99,100,114,115]. Along this line, see [116][117][118][119][120]. Others, instead, focus on the impact that scientific and technological evolution could or should have on the scope and methods of legal science and, more generally, on the way in which law is conceptualised, studied and designed [5,121,122], much as happened in other areas of science such as physics, which has deeply revised itself, its own vision of the world and its methods in response to scientific and technological developments.
In general terms, due to scientific and technological evolution, law is going through a period of change, and it is not yet completely clear what science and technology can offer to legal theory and practice. Whatever the perspective adopted, the creation of new computational tools for law will be increasingly important for both applied and scientific purposes. Against this backdrop, analytical platforms certainly represent an important part of the future of law and legal science, and one worth examining.
Here, we propose an overview of the current scenario of legal analytical platforms. Given the novelty of the topic, our analysis, as well as the proposed classification, is a first attempt. It is nevertheless already possible to identify two trends in the area: on the one hand, the design of environments for professional use (see Section 4.1) and, on the other hand, the development of platforms for purely scientific purposes (see Section 4.2).

Professional Platforms
Today, clients are increasingly interested in receiving information on, for example, the odds of winning a case, and they ask their lawyers increasingly focused questions about this. The vast majority of lawyers still rely solely on their own experience, knowledge of case precedents and intuition to predict what the courts will actually do [123]. However, they are inherently limited in their capacity to retain and process the information necessary to make well-informed judgments; moreover, each lawyer has a limited range of personal experience. Computers, on the other hand, are far better at storing, processing and summarising large amounts of information. By leveraging the power of computers, lawyers can therefore forecast more accurately how events will play out in litigation, and those who embrace data-driven decision-making will gain a clear advantage over counterparts who still cling to outdated instruments.
In the last few years, many sciences have moved into the Big Data era, completely changing their perspective, while in the legal sciences this change has been slow to happen. "Law is horribly inefficient", says Mark Lemley, a professor at Stanford Law School and director of the Stanford Program in Law, Science and Technology. "And in some ways, it is inefficient by design". After all, lawyers get paid by the hour, so inefficiency is rewarded, says Lemley. Indeed, some are rewarded richly: top lawyers charge north of US$1000 per hour [124]. Big Data changes the rules of the game. It invites lawyers to make a fundamental change in their approach to the law itself by looking at statistical patterns, predictors and correlations, in addition to the legal rules that purportedly control outcomes: case law, statutory law, procedural rules and administrative regulations. Traditional lawyering required knowledge of the pertinent legal rules and the ability to apply them to a given set of facts, whether in litigation or in transactional work. The question was whether a feature of the client's current situation would trigger a rule and its mandatory result. Analogies, comparisons and normative judgments figure into this assessment, and Big Data can inform the lawyer's decision-making. For example, historical litigation data in the aggregate might reveal a judge's tendency to grant or deny certain types of pretrial motions, allowing a lawyer to act accordingly.
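As a rough illustration of the judge-level example above, the sketch below estimates how often each judge grants a given type of motion from historical records. The data, motion labels and `grant_rates` helper are hypothetical; additive (Laplace) smoothing is one simple assumption for avoiding extreme estimates when a judge has ruled only a few times, not a method attributed to any of the services cited here.

```python
from collections import defaultdict

def grant_rates(motions, alpha=1.0):
    """Estimate each judge's probability of granting a motion type.

    motions: iterable of (judge, motion_type, granted) tuples.
    alpha: additive (Laplace) smoothing, so a judge with few rulings is
    not reported as always or never granting a motion.
    """
    granted = defaultdict(int)
    total = defaultdict(int)
    for judge, motion_type, was_granted in motions:
        key = (judge, motion_type)
        total[key] += 1
        granted[key] += int(was_granted)
    return {
        key: (granted[key] + alpha) / (total[key] + 2 * alpha)
        for key in total
    }

history = [  # hypothetical rulings
    ("Judge A", "summary_judgment", True),
    ("Judge A", "summary_judgment", True),
    ("Judge A", "summary_judgment", False),
    ("Judge B", "summary_judgment", False),
]
rates = grant_rates(history)
print(rates[("Judge A", "summary_judgment")])  # 0.6, i.e. (2 + 1) / (3 + 2)
print(rates[("Judge B", "summary_judgment")])  # (0 + 1) / (1 + 2), about 0.333
```

With `alpha=0` the function returns raw grant frequencies; a positive `alpha` pulls estimates based on very few rulings towards 0.5, which is a reasonable default when the data are sparse.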
Fortunately, in recent years legal practice has begun to move beyond its traditional methods towards data, analytics and other data-driven technological instruments. Hundreds of new legal tech companies are emerging [125], mostly to support lawyers in their work or clients in tackling legal issues (e.g., the business-oriented Premonition [126], which aims to predict the chances of success of lawyers and lawsuits), but also in the research area, even if the number of experiences there is still limited. Leading the way in this emerging renaissance of legal research is a web start-up named Lex Machina [127] (Latin for "law machine"). Lex Machina [128] was founded in 2006 as an interdisciplinary project between Stanford Law School and Stanford University's computer science department to bring transparency to intellectual-property litigation. Today, Lex Machina's web-based analytics service is used to uncover trends and patterns in historical patent litigation, in order to forecast costs more accurately and evaluate case strategies more effectively. The service scrapes PACER [129] nightly for new cases involving intellectual property [127,130]. Lex Machina uses a combination of human reviewers and a proprietary algorithm that automatically parses and classifies the outcome of each case the service extracts from PACER. After extracting, processing and scrubbing the data, Lex Machina assembles and presents aggregated case data for a particular judge, party, attorney or law firm, with analytics that allow users to quickly discern trends and patterns that may affect the cost or outcome of their case.
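Lex Machina's classification algorithm is proprietary, so the following is only a toy sketch of the general idea of labelling case outcomes and aggregating them per entity. The keyword rules, the `classify_outcome` and `outcomes_by` helpers and the sample docket entries are all hypothetical; unmatched entries are labelled "unknown", loosely mirroring the role of human reviewers mentioned above.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword rules, checked in order; real docket text is far messier.
OUTCOME_RULES = [
    ("settled", re.compile(r"\bsettle(d|ment)\b", re.IGNORECASE)),
    ("dismissed", re.compile(r"\bdismiss(ed|al)\b", re.IGNORECASE)),
    ("judgment", re.compile(r"\bjudgment\b", re.IGNORECASE)),
]

def classify_outcome(docket_text):
    """Return the first matching outcome label, or 'unknown' for manual review."""
    for label, pattern in OUTCOME_RULES:
        if pattern.search(docket_text):
            return label
    return "unknown"

def outcomes_by(cases, field):
    """Count outcome labels for each value of `field` (judge, party, firm...)."""
    table = defaultdict(Counter)
    for case in cases:
        table[case[field]][classify_outcome(case["docket_text"])] += 1
    return table

cases = [  # hypothetical docket entries
    {"firm": "Firm X", "docket_text": "Joint notice of settlement"},
    {"firm": "Firm X", "docket_text": "Stipulated dismissal with prejudice"},
    {"firm": "Firm Y", "docket_text": "Final judgment entered for plaintiff"},
    {"firm": "Firm Y", "docket_text": "Order on claim construction"},
]
summary = outcomes_by(cases, "firm")
print(dict(summary["Firm X"]))  # {'settled': 1, 'dismissed': 1}
print(dict(summary["Firm Y"]))  # {'judgment': 1, 'unknown': 1}
```

Passing a different `field` name (for example a hypothetical "judge" key) produces the same kind of per-entity summary, which is the shape of aggregate the text describes.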

Platforms for Legal Research
Big Data, Natural Language Processing and Artificial Intelligence are becoming popular in legal science. In [131], the authors combine Network Analysis, Natural Language Processing and Artificial Intelligence to build a collaborative platform (of the kind discussed in Section 3), a Knowledge Management System through which law firms can browse legal and legislative documents searching for relevant changes concerning the following topics: (i) compliance; (ii) taxes; (iii) general legal conditions; and (iv) disclosure. The authors collected 20,000 legal documents crawled from a government website and, by exploiting AlchemyAPI [132] (by IBM), extracted their most important keywords. These keywords are used as features to build a Document Network in which two documents are connected if they share one or more features. Furthermore, the authors use the SimRank [133] similarity algorithm to link documents that do not share any feature but are connected through a path of features. The platform has a user interface through which users can "tag" the documents they are interested in; by doing so, the system (the AI) learns each user's preferences in order to give better answers in the future. Ravel Law [134] is a visual commercial search engine that exploits natural language processing, machine learning and graph visualisation to help lawyers sort through legal information. Whereas traditional legal databases present results in a column, often hiding important cases pages back in the search results, Ravel Law visually represents the most important cases on a particular topic as the nodes of a network, with numerous edges pointing to subsequent cases that have cited them. The size of each node reflects the relative number of cases that cite it. The frequency with which courts cite a particular case often signals the influence of the case over a given area of law or its relevance to a particular legal concept.
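The document-network construction described for [131] can be sketched as follows, assuming each document has already been reduced to a set of keywords: two documents are linked when they share a keyword, and SimRank scores each pair by the similarity of their neighbours. This is a naive, unoptimised implementation written for illustration; the decay factor `c = 0.8`, the iteration count and the sample keywords are assumptions, and the original system may have applied SimRank to a different graph (for instance, a bipartite document-keyword graph).

```python
from itertools import combinations

def build_document_network(doc_keywords):
    """Link two documents when their keyword sets intersect."""
    neighbors = {doc: set() for doc in doc_keywords}
    for a, b in combinations(doc_keywords, 2):
        if doc_keywords[a] & doc_keywords[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

def simrank(neighbors, c=0.8, iterations=10):
    """Naive iterative SimRank on an undirected graph.

    s(a, a) = 1; for a != b, s(a, b) = c / (|N(a)| |N(b)|) times the sum of
    s(i, j) over i in N(a) and j in N(b), or 0 if either node has no neighbours.
    """
    nodes = list(neighbors)
    sim = {(a, b): float(a == b) for a in nodes for b in nodes}
    for _ in range(iterations):
        new = {}
        for a in nodes:
            for b in nodes:
                na, nb = neighbors[a], neighbors[b]
                if a == b:
                    new[(a, b)] = 1.0
                elif not na or not nb:
                    new[(a, b)] = 0.0
                else:
                    total = sum(sim[(i, j)] for i in na for j in nb)
                    new[(a, b)] = c * total / (len(na) * len(nb))
        sim = new
    return sim

docs = {  # hypothetical keyword sets
    "d1": {"tax"},
    "d2": {"tax", "vat"},
    "d3": {"vat"},
    "d4": {"tax", "vat"},
}
network = build_document_network(docs)
sim = simrank(network)
print(sorted(network["d1"]))  # ['d2', 'd4']: d1 is not linked to d3
print(round(sim[("d1", "d3")], 3))
```

Here d1 and d3 share no keyword, yet they receive a positive score because they have the same neighbours, which is the kind of indirect link the platform relies on. For larger corpora, NetworkX provides a `simrank_similarity` function that avoids hand-rolling the iteration.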
In the past, AI has also been used by legal experts to fight crime. In [135], the authors exploit criminal databases containing information on crimes, offenders, victims and vehicles involved in different crimes to identify serial criminals. Among those records lie groups of crimes that can be attributed to serial criminals, who are responsible for multiple criminal offences and usually exhibit patterns in their operations, specialising in a particular crime category (e.g., rape, murder, robbery) and applying a specific method to commit their crimes. The authors discovered serial criminal patterns in crime databases through a clustering task carried out by a neural network trained with unsupervised learning. In recent years, AI has also been used "to predict" crime. An example is PredPol [136], a research project of the University of California, Los Angeles (UCLA) and the Los Angeles Police Department, whose aim was to find a way to use CompStat [137] data for more than just historical purposes. The goal was to understand whether these data could provide forward-looking recommendations as to where and when additional crimes could occur. Being able to anticipate these crime locations and times could allow officers to deploy pre-emptively and help prevent these crimes. Working with mathematicians and behavioural scientists from UCLA and Santa Clara University, the team evaluated a wide variety of data types and behavioural and forecasting models. The models were further refined with crime analysts and officers from the Los Angeles Police Department and the Santa Cruz Police Department. The team ultimately determined that the three most objective data points collected by police departments provided the best input data for forecasting: (i) crime type; (ii) crime location; and (iii) crime date and time.
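PredPol's published approach is reportedly based on self-exciting point-process models, which are beyond the scope of this section; the sketch below is a much simpler stand-in that uses only the three inputs listed above. It assigns events to grid cells by location, filters them by crime type and weights each event by how recent it is, then ranks the cells. The `hotspot_scores` helper, the cell size and the 14-day half-life are illustrative assumptions, not PredPol parameters.

```python
import math
from collections import defaultdict
from datetime import datetime

def hotspot_scores(events, now, crime_type=None, cell_size=0.01, half_life_days=14.0):
    """Rank grid cells by recency-weighted crime counts (illustrative only).

    events: iterable of (crime_type, latitude, longitude, timestamp) tuples,
    i.e. the three inputs named in the text: type, location, date and time.
    """
    decay = math.log(2) / half_life_days  # an event's weight halves every half-life
    scores = defaultdict(float)
    for kind, lat, lon, when in events:
        if crime_type is not None and kind != crime_type:
            continue
        age_days = (now - when).total_seconds() / 86400
        if age_days < 0:
            continue  # skip events recorded after the reference time
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        scores[cell] += math.exp(-decay * age_days)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

now = datetime(2024, 1, 31)
events = [  # hypothetical records
    ("burglary", 34.0521, -118.2431, datetime(2024, 1, 30)),
    ("burglary", 34.0525, -118.2439, datetime(2024, 1, 29)),
    ("burglary", 34.0702, -118.2901, datetime(2023, 11, 1)),
    ("assault", 34.0521, -118.2431, datetime(2024, 1, 30)),
]
top = hotspot_scores(events, now, crime_type="burglary")
for cell, score in top:
    print(cell, round(score, 3))
```

The two recent burglaries fall in the same cell and dominate the ranking, while the older event in another cell contributes almost nothing; the half-life controls how quickly past crimes stop influencing the forecast.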
Additionally, as mentioned above, Network Analysis (NA) has also become part of several legal research tools. In this area we find EuCasenet [14], an online toolkit that allows legal scholars to apply NA and visual analytics techniques to the entire corpus of EU case law. The tool makes it possible to visualise the links among all the judgments given by the Court, from its foundation until 2014, in order to study phenomena relevant from the viewpoint of legal theory. In detail, judgments are treated as a network and mapped onto a simple graph $G = \langle V, E \rangle$, where $V = \{v_1, v_2, \dots, v_n\}$ with each $v_i$ a judgment, and $(v_i, v_j) \in E \iff v_i$ cites $v_j$. By leveraging NA and visualisation techniques, one can then analyse the network and explore the data more efficiently and enjoyably. Another example is Knowlex [138], a Web application designed for the visualisation, exploration and analysis of legal documents coming from different sources. Using Knowlex, a legal professional or a citizen can understand how a given phenomenon is regulated by exploiting data visualisation through interactive maps. Starting from a legislative measure given as input by the user, the application implements two visual analytics functionalities aimed at offering new insights on the legal corpus under investigation. The first is an interactive node graph depicting the relations and properties of the documents. The second is a zoomable treemap showing the topics, the evolution and the size of the legal literature that has accumulated over the years around the target norm.
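The graph definition used by EuCasenet translates directly into code. The sketch below builds $V$ and $E$ from (citing, cited) pairs and computes how often each judgment is cited (its in-degree), the kind of measure that also drives node size in citation-based views. The judgment identifiers are hypothetical placeholders, not real case numbers.

```python
def citation_graph(citations):
    """Build G = <V, E> from (citing, cited) judgment pairs.

    V is the set of judgments; (v_i, v_j) is in E iff v_i cites v_j.
    Self-citations and duplicate pairs are dropped so that G stays simple.
    """
    V, E = set(), set()
    for citing, cited in citations:
        V.update((citing, cited))
        if citing != cited:
            E.add((citing, cited))
    return V, E


def times_cited(V, E):
    """In-degree of every judgment, i.e. how often it is cited."""
    indeg = {v: 0 for v in V}
    for _, cited in E:
        indeg[cited] += 1
    return indeg


# Hypothetical judgments J1..J4; the repeated pair is counted once.
citations = [("J2", "J1"), ("J3", "J1"), ("J3", "J2"), ("J4", "J1"), ("J4", "J1")]
V, E = citation_graph(citations)
cited = times_cited(V, E)
most_cited = sorted(cited.items(), key=lambda kv: (-kv[1], kv[0]))
print(most_cited)
```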

An Analytical Platform for Computational Crime Analysis
CrimeMiner is an ongoing project involving experts from many research areas, ranging from computational social science to computer science and from legal studies to bioinformatics. Resulting from a reflection at the boundary between law and computer science, the project shares numerous points of contact with computational criminal analysis [139], a heterogeneous research field in which applied mathematics and statistical physics provide innovative research methodologies (spanning from differential equations [140,141] to agent-based modeling [142,143], evolutionary games [144] and network science; for an overview see [145]) to illuminate the mechanisms underlying the emergence, diffusion and evolution of crime.
The tool combines several techniques, namely data mining, Social Network Analysis (SNA) and data visualisation, to gain a deeper understanding of the structural and functional features of criminal organisations, starting from the analysis of even simple relational and investigative data. The holistic approach underlying the project led to the creation of a computational framework that can be used in investigative and research settings to gather, mark up, visualise and analyse all the information needed to apply SNA techniques to criminal organisations.
The first version of the framework [13], developed in Java, offered a set of functionalities that were validated in a case study based on data from real criminal investigations (telephone and environmental tapping used to map the social structure of a criminal organisation belonging to the Italian Camorra).
Here, we present an evolution of the project, fully re-engineered, developed as a Web application and enriched with new data mining and visualisation functionalities. In more detail, the new features are the following:

• A machine learning module for the assessment of the criminal dangerousness of individuals belonging to the network under investigation;
• Bipartite and tripartite graphs to enable new network-based inferences; and
• New graph analysis metrics.
In the following section, we first describe the overall architecture and the main functionalities it provides. Figure 2 offers a general overview of the project through a schema that links together, from top to bottom, the domain expert's main research goals (Layer 1), the domain expert's relevant objects of investigation (Layer 2), the activities needed to achieve the specified goals (Layer 3) and the system functionalities on which the platform relies (Layer 4). A detailed description of the system functionalities can be found in [13]. In general terms, the platform allows:
• Extracting from processual data all the information needed to build graphs;

Architecture and Workflow: Overview
After a first version of the platform relying on a relational database [13], CrimeMiner is now built upon the Java EE Spring Data Neo4j framework, whose architecture is structured in four layers, as shown in Figure 3. The first three relate to the back-end (server-side), while the last layer is the front-end (presentation or client-side). We describe each of them in the following. [...] [146], 2D and 3D graphics (Highcharts [147]) and, finally, tables with rich functionalities (Datatables [148]).

Research Goals/System Features
CrimeMiner is a modular platform offering several investigation and/or research features/tools. In order to provide a more "semantic" reading of the modules so far implemented, the following functionalities are described referring to the research goals underlying them. For each of these functionalities, a different module has been envisioned and deployed in the CrimeMiner framework.

Document Enhancement
This module aims to overcome the limitations of the classical data entry process: incompleteness, misinterpretation, inconsistency and the time it consumes. CrimeMiner provides an advanced yet user-friendly WYSIWYG editor that allows the user to enrich documents with the structured and semantic metadata needed to implement the criminal network analysis and visualisation features [149].
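Once a transcript has been annotated, the metadata must be turned into structured records from which graphs can be built. The sketch below assumes a purely hypothetical inline markup of the form `[[key:value]]`; CrimeMiner's actual annotation format is not described here, so both the syntax and the field names (`date`, `speaker`, `listener`, `place`) are illustrative only.

```python
import re

# Hypothetical inline annotation syntax: [[key:value]]
TAG = re.compile(r"\[\[(\w+):([^\]]+)\]\]")


def extract_metadata(annotated_text):
    """Collect [[key:value]] annotations into a dict of lists."""
    meta = {}
    for key, value in TAG.findall(annotated_text):
        meta.setdefault(key, []).append(value.strip())
    return meta


def plain_text(annotated_text):
    """Strip the markup, keeping only the annotated values."""
    return TAG.sub(lambda m: m.group(2).strip(), annotated_text)


text = ("On [[date:2010-03-02]] [[speaker:Person A]] calls "
        "[[listener:Person B]] about the [[place:warehouse]].")
meta = extract_metadata(text)
plain = plain_text(text)
# A speaker/listener pair becomes one edge of the tapping network.
edge = (meta["speaker"][0], meta["listener"][0])
print(meta, plain, edge, sep="\n")
```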

Criminal Network Exploration (CNE)
Data currently handled by CrimeMiner consist of records of people and of phone/environmental tappings between two or more of them. The CNE module offers several different views on the processual data. Unlike the previous version [13], we borrow visualisation/handling techniques from [150] to show different types of graphs, which we describe in the following.

• Individual-telephone tapping

On each graph, the user can apply force-directed graph drawing algorithms to obtain a clearer visualisation and a more meaningful arrangement of nodes. To reduce the number of nodes shown at the same time, several filter criteria can be applied (for example, a node degree threshold or the node entity type).
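A common way to derive a person-to-person view from tapping records is to project a bipartite graph onto one of its node sets and then filter by degree. The sketch below assumes that the individual-telephone tapping view is a bipartite graph linking each tapping to the individuals it involves; the tapping ids and names are hypothetical, and the filter mirrors the degree-threshold criterion mentioned above.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_individuals(tappings):
    """Project a bipartite individual-tapping graph onto individuals.

    tappings: dict mapping a tapping id to the set of individuals it involves.
    Two individuals are linked when they appear in a common tapping; the
    edge weight counts how many tappings they share.
    """
    weights = defaultdict(int)
    for people in tappings.values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


def filter_by_degree(weights, min_degree):
    """Keep only edges whose endpoints both have degree >= min_degree."""
    degree = defaultdict(int)
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return {e: w for e, w in weights.items()
            if degree[e[0]] >= min_degree and degree[e[1]] >= min_degree}


tappings = {"T1": {"A", "B"}, "T2": {"A", "B", "C"}, "T3": {"C", "D"}}
projected = project_onto_individuals(tappings)
filtered = filter_by_degree(projected, min_degree=2)
print(projected)
print(filtered)
```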

Network Analysis (NA)
The NA module supports the application of SNA metrics. CrimeMiner implements several SNA metrics (namely PageRank, betweenness, degree, in-degree and out-degree) to study and extract new insights from the different data representations. SNA metrics allow the user to assess features such as the dominance, subordination, influence or prestige of social actors [151]; moreover, they help to identify subcommunities in the network. An example of application is shown in Figure 5.
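As an illustration of two of the listed metrics, the following sketch computes in/out-degree and PageRank (by power iteration) on a small directed call graph where an edge u → v means "u called v". It is a generic textbook formulation, not CrimeMiner's code; betweenness is omitted for brevity, and the damping factor 0.85 is the conventional default, assumed here.

```python
def degrees(nodes, edges):
    """In-degree and out-degree of every node of a directed graph."""
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg


def pagerank(nodes, edges, damping=0.85, tol=1e-10, max_iter=200):
    """PageRank by power iteration; scores sum to 1."""
    nodes = list(nodes)
    n = len(nodes)
    out = {v: [] for v in nodes}
    for u, v in edges:
        out[u].append(v)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling nodes (no outgoing edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            if out[u]:
                share = damping * rank[u] / len(out[u])
                for v in out[u]:
                    new[v] += share
        if sum(abs(new[v] - rank[v]) for v in nodes) < tol:
            return new
        rank = new
    return rank


# Hypothetical call graph: B, C and D all call A; A calls B.
nodes = ["A", "B", "C", "D"]
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("A", "B")]
indeg, outdeg = degrees(nodes, edges)
pr = pagerank(nodes, edges)
print(indeg, outdeg, pr)
```

A, the individual everybody calls, gets the highest PageRank; B ranks second because A's only outgoing call goes to B.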

Similarity Measures
This module implements SimRank [133], a similarity measure based on a graph-theoretical model. It is defined as follows: "Two objects are considered to be similar if they are referenced by similar objects". CrimeMiner exploits SimRank to show similarities among individuals in the crime network. To make the similarity measures applied to the various types of graphs easier to understand, we create 2D and 3D graphics with the Highcharts JavaScript library. Figure 6 shows a 3D plot in which the x and z axes list individuals' names and the y axis shows the similarity percentage of each pair of individuals. Each pair is represented as a point, so the highest points correspond to the most similar pairs. On mouse-over, CrimeMiner shows the names and similarity percentage of each pair. Using the "Data Settings" option, a user can modify the SimRank threshold to visualise the levels of similarity of interest (default: from 50% to 100%) and can also select a single individual to get their SimRank with all the others (see Figure 7).
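The recursive definition quoted above is usually computed iteratively, as in Jeh and Widom's formulation: $s(a,a) = 1$ and, for $a \neq b$, $s(a,b) = \frac{C}{|I(a)|\,|I(b)|} \sum_{i \in I(a)} \sum_{j \in I(b)} s(i,j)$, where $I(\cdot)$ denotes in-neighbours and $s(a,b) = 0$ if either set is empty. The naive sketch below follows that formula; the decay $C = 0.8$, the iteration count and the sample graph are assumptions, and the final filter mimics the default 50% to 100% threshold.

```python
from itertools import product


def simrank(nodes, edges, C=0.8, iterations=10):
    """Naive iterative SimRank on a directed graph (O(n^2 d^2) per iteration)."""
    nodes = list(nodes)
    in_nb = {v: [] for v in nodes}
    for u, v in edges:
        in_nb[v].append(u)
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new[(a, b)] = 1.0
            elif not in_nb[a] or not in_nb[b]:
                new[(a, b)] = 0.0  # no in-neighbours: similarity is zero
            else:
                total = sum(sim[(i, j)] for i in in_nb[a] for j in in_nb[b])
                new[(a, b)] = C * total / (len(in_nb[a]) * len(in_nb[b]))
        sim = new
    return sim


# Hypothetical graph: P1 is in contact with both P2 and P3.
sim = simrank(["P1", "P2", "P3"], [("P1", "P2"), ("P1", "P3")])
similar = sorted((a, b) for (a, b), s in sim.items() if a < b and 0.5 <= s <= 1.0)
print(similar, round(100 * sim[("P2", "P3")], 1), "%")
```

For an undirected tapping graph, each link would be added in both directions before calling the function.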

Tabular Data
The whole list of individuals and telephone/environmental tappings can also be browsed in the form of interactive tables with an advanced filtering feature (boolean operations and range filters). The tables in this module are built with the Datatables library. They list individuals and pairs together with their similarity percentages. To make the module easier and faster to use, a user can choose the type of graph to analyse among those previously described (see Section 5.2.2). In this way, the user can see which individuals are most similar with respect to the selected relation. Moreover, data can be exported in PDF, CSV and Excel formats. An example of a similarity table is shown in Figure 8.
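The combination of a range filter, a boolean condition and CSV export can be illustrated server-side with the standard library. This is a hypothetical stand-in for what the Datatables front-end does in the browser; the row layout (`a`, `b`, `similarity`) and the values are assumptions.

```python
import csv
import io


def filter_pairs(rows, low=50.0, high=100.0, person=None):
    """Range filter on similarity percentage, ANDed with an optional person filter."""
    return [r for r in rows
            if low <= r["similarity"] <= high
            and (person is None or person in (r["a"], r["b"]))]


def to_csv(rows):
    """Serialise the filtered similarity table as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["a", "b", "similarity"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    {"a": "P1", "b": "P2", "similarity": 80.0},
    {"a": "P1", "b": "P3", "similarity": 35.0},
    {"a": "P2", "b": "P3", "similarity": 62.5},
]
print(to_csv(filter_pairs(rows, person="P1")))
```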

Simulation
CrimeMiner provides a module for experimenting with Agent-Based Modeling (ABM) [17]: the individuals and their various relationships can be exported in CSV or XML and imported into an ABM environment such as NetLogo [152]. A user can then perform experiments that combine real network data with ABM models. It is possible to recreate the real-world network topology in a simulation, assigning behaviours to agents on the basis of real data (such as personal information, including criminal records) and positioning agents in real space using the geographical information (e.g., GIS data) available for each individual or environmental tapping. Likewise, the nature of social interactions (e.g., the frequency of communications between agents) can be modelled. The goal is to explore how the synergy between ABM and SNA can support a deeper and more empirically grounded understanding of the complex dynamics taking place within criminal organisations at both the individual and the structural level.
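The idea of running agents on the real network topology can be sketched without NetLogo. The toy model below (written in Python, not NetLogo code) lets each agent, at every step, call a random neighbour with an agent-specific probability; the `activity` values stand in for a hypothetical behaviour rule that might be derived from real attributes such as criminal records.

```python
import random
from collections import Counter


def simulate_contacts(adjacency, activity, steps=100, seed=0):
    """Toy agent-based simulation on a fixed (real) network topology.

    adjacency: dict agent -> list of neighbours (e.g. from the tapping graph).
    activity: dict agent -> probability of initiating one call per step
              (hypothetical behaviour rule).
    Returns a Counter of simulated call frequencies per unordered pair.
    """
    rng = random.Random(seed)  # seeded for reproducible experiments
    calls = Counter()
    agents = sorted(adjacency)
    for _ in range(steps):
        for agent in agents:
            neighbours = adjacency[agent]
            if neighbours and rng.random() < activity.get(agent, 0.0):
                other = rng.choice(neighbours)
                calls[tuple(sorted((agent, other)))] += 1
    return calls


adjacency = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": []}
activity = {"A": 0.9, "B": 0.2, "C": 0.2, "D": 1.0}
calls = simulate_contacts(adjacency, activity)
print(calls.most_common())
```

Comparing simulated call frequencies with the observed ones is one way in which ABM and SNA could be confronted on the same data.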

Machine Learning
Machine Learning is nowadays used for very different purposes in many scientific fields [153,154] and, as anticipated in Section 4.1, is now becoming popular in the legal field. We are implementing Machine Learning techniques in CrimeMiner to enrich the information a user can find, analyse and study and, therefore, to provide a better overview of the criminal phenomena under investigation. The basic idea is to support public prosecutors and researchers in investigating a criminal network by flagging dangerous behaviours that would otherwise go unnoticed. In particular, our current effort is directed towards a human-machine cooperative module relying on neural networks or decision-tree algorithms such as the Random Forest classifier. We are now training and testing several classifiers to understand which best fits our problem.
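The train-and-test workflow can be illustrated with a dependency-free classifier. The sketch swaps in plain logistic regression trained by gradient descent (simpler than the Random Forest or neural networks mentioned above), on a tiny hypothetical dataset whose two features, degree in the tapping graph and a prior-record flag, are assumptions chosen for illustration, not CrimeMiner's actual features or labels.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.05, epochs=5000):
    """Fit logistic regression (weights + bias) by batch gradient descent."""
    n_features = len(X[0])
    w, b, n = [0.0] * n_features, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical training data: [degree, prior_record]; label 1 = flagged as dangerous.
X_train = [[8, 1], [9, 1], [7, 1], [10, 1], [1, 0], [2, 0], [1, 1], [3, 0]]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
model = train_logistic(X_train, y_train)

# Held-out individuals, evaluated as in a train/test comparison of classifiers.
for x in ([9, 1], [2, 0]):
    print(x, round(predict_proba(model, x), 3))
```

Returning a probability rather than a hard label fits the human-machine cooperative setting: the score can be shown to a prosecutor as a prompt for review rather than as a verdict.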

Preliminary Results and Future Developments
CrimeMiner has proved to be a challenging opportunity to link substantive issues of the legal world with the development of information processing tools and with practices from other research fields. The project is yielding interesting results, which we have already started presenting [13] in journals from areas ranging from Legal Informatics to computational criminology and information visualisation.
The prospects for development appear promising, especially considering that legal science and practice have so far derived relatively limited advantages from information science and from the acceleration of scientific and technological advancements.
Discussions with jurists and public prosecutors involved in the fight against organised crime have shown the innovative potential of analytical platforms. The use of data mining and network analysis techniques offers investigators new and more advanced means to identify details of criminal phenomena that would otherwise remain hidden in large corpora of unstructured documents collected and analysed using traditional methods. The "platform-based" approach can not only prove decisive for the discovery of phenomena relevant from the investigative point of view (suspicious activities, presence of subgroups, and fluctuations in criminal activity), but can also support law enforcement, policy design and courtroom debate with scientifically grounded evidence. More generally, what sounds promising is the very logic underlying the platform. The possibility to analyse the same data with different methodologies integrated in one workflow allows different kinds of knowledge to be extracted from the same raw material, similar to what happens in medical testing, where applying different diagnostic techniques to the same biological sample yields different relevant information.
The future developments we will work on (continuing to exploit the same data) are varied and involve all the modules of the architecture described above, in addition to the new modules we intend to build. We will pay particular attention to carrying out in parallel both the design of new functionalities and the reflection on the scientific and methodological implications of our work, so as to preserve the necessary complementarity between the perspective of "instrument-enabled science" and that of "science-enabled instruments".
In this vein, a guiding principle of our future work will be to share the results of our effort with the largest and most heterogeneous possible audience of researchers and domain experts, for two reasons. On the scientific level, the relevance of sharing is clear. As highlighted by Freeman Dyson [155], errors made by scientists "are not an impediment to the progress of science [...] they are a central part of the struggle". Sharing the results (not only the code but also the theoretical assumptions underlying the algorithms and metrics used to analyse the data) will help us correct and expand our theories and refine the techniques implementing them. On the legal level (envisaging a future use of CrimeMiner in real investigative environments), the availability of the algorithms used to support the assessment of criminal dangerousness will also be fundamental to uncover biases and to shed light on the choices made by public prosecutors and law enforcement agencies. In the era of Big Data mining (see, among others, the caveats raised by Cathy O'Neil [156]), transparency in the use of data analytics techniques is essential to preserving the democratic nature of the activities of public institutions. In brief, the expected developments are as follows.

Network-Based Inference
Future work will first be devoted to improving network-based inference techniques. In particular, drawing inspiration from recent developments in other research fields such as bioinformatics [150], we will try to increase the semantic accuracy of the analyses so as to improve the understanding of the criminal and social features of the network under investigation. Among the solutions envisaged is the use of weighted metrics that assign nodes and edges different scores based on context data such as the gravity of the charges or the criminal records of individuals. Another part of the work will focus on the use of projections and multipartite graphs to extract further knowledge from the data (e.g., the level of criminal significance of sub-communities or meetings taking place between network components).
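One possible shape for such a context-weighted metric is a weighted degree in which every link is scaled by a context score of the neighbour at its other end: $\mathrm{score}(v) = \sum_{u \in N(v)} w(u,v)\, g(u)$. The formula, the gravity scores and the edge weights below are hypothetical illustrations of the idea, not a metric defined by the project.

```python
def context_weighted_strength(edge_weights, gravity, default_gravity=1.0):
    """Hypothetical weighted metric: score(v) = sum over neighbours u of w(u, v) * g(u).

    edge_weights: dict mapping an undirected pair (a, b) to a weight,
                  e.g. the number of shared tappings.
    gravity: dict mapping an individual to a context score,
             e.g. derived from the gravity of the charges.
    """
    score = {}
    for (a, b), w in edge_weights.items():
        score[a] = score.get(a, 0.0) + w * gravity.get(b, default_gravity)
        score[b] = score.get(b, 0.0) + w * gravity.get(a, default_gravity)
    return score


edge_weights = {("A", "B"): 2, ("A", "C"): 1, ("B", "C"): 1}
gravity = {"A": 1.0, "B": 3.0, "C": 0.5}
print(context_weighted_strength(edge_weights, gravity))
```

With uniform gravity the score reduces to ordinary weighted degree; here A ranks first because it is strongly tied to B, the individual with the gravest charges.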

Machine Learning
Another part of the work will focus on enhancing the machine learning module which, as we have seen, combines pattern recognition and network analysis to point CrimeMiner users to potentially dangerous individuals within the organisation under scrutiny. The enhancement will be useful both to public prosecutors and investigators, in deciding where to steer their investigations, and to social scientists, in identifying new empirical and measurable indices (patterns, values of particular metrics, etc.) of relevant phenomena. Thus far, we have focused on the dangerousness of individuals. In the next steps, we will implement different artificial intelligence techniques so as to detect and highlight other features of the network (the presence, dangerousness and level of activity of sub-communities; suspicious events and activities) or to identify emerging behavioural trends and predict the creation of new social ties.
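Predicting the creation of new social ties is a link-prediction problem. A standard neighbourhood-based baseline is the Adamic-Adar index, $AA(x,y) = \sum_{z \in N(x) \cap N(y)} 1 / \log |N(z)|$, which rewards pairs sharing many low-degree contacts. The sketch below ranks non-adjacent pairs with it; the choice of Adamic-Adar and the sample network are assumptions, not the technique the project has committed to.

```python
import math
from itertools import combinations


def predict_links(adjacency, top_k=3):
    """Rank non-adjacent pairs by Adamic-Adar score.

    adjacency: dict node -> set of neighbours (undirected, symmetric).
    A common neighbour is adjacent to both endpoints, so its degree is
    at least 2 and log(degree) > 0.
    """
    scores = []
    for x, y in combinations(sorted(adjacency), 2):
        if y in adjacency[x]:
            continue  # already linked
        common = adjacency[x] & adjacency[y]
        score = sum(1.0 / math.log(len(adjacency[z])) for z in common)
        if score > 0:
            scores.append(((x, y), score))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


edge_list = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adjacency = {}
for u, v in edge_list:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)
top = predict_links(adjacency)
print(top)
```

A and D, who are not yet in contact but share two intermediaries, come out as the most likely new tie.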

Collaboration and Advanced Visualisation
Other developments will concern the implementation of collaboration features allowing both researchers and investigators to engage in cooperative activities starting from the data. Moreover, we will experiment with both more advanced visualisation techniques (including 3D immersive graph navigation) and natural language processing to extract semantic data from telephone and environmental tappings.

Closing Remarks: Issues and Challenges of Computational Legal Science
The creation of IT platforms that collect and generate information has proved to be the starting point for scientific developments of different kinds. If, on the one hand, the need to analyse data raises new challenges for research areas directly related to information extraction techniques (data mining, network science, information visualisation), on the other hand the process of "datafication" [36] enabled by platforms has given new impetus to numerous areas in the social sciences as well. Over the last 15 years, the quantity, heterogeneity, granularity and frequency of data made available by platforms of social interaction have become the "focal point" [157] not only of existing research fields (sociology, social psychology, anthropology, epidemiology) but also of new areas such as sentiment analysis, which finds its raw material in Twitter and Facebook data [158,159]. In fact, platforms are contributing to the reconfiguration of the infrastructure of practices, technologies and scientific perspectives that underlies the scientific endeavour [160], and this process is going to involve legal science. From this viewpoint, the experience of developing the CrimeMiner platform has become an interesting opportunity to reflect on the future of the law. A few remarks can be made in this regard from both a scientific and a methodological standpoint.

Legal Computational Empiricism
The combination of data and computational heuristics that characterises analytical platforms seems to offer new stimuli and support for the growth, in the legal world as well, of a sort of "computational empiricism" [19,161], an approach that is taking its place alongside more traditional empirical social science methods [162]. The last few years have been characterised by growing attention towards the study of different dimensions of the legal universe (law, legal procedure and legal theory) through empirical research. Even when focused on purely legal questions, empirical legal researchers tend to be more narrowly quantitative. It appears that analytical platforms can spur a greater propensity to think about the legal phenomenon in empirical (i.e., experimental, quantitative and, in a wide sense, scientific) terms, contributing to the broader process through which legal science and practice are facing the often disorienting challenge of rethinking their goals, methods and scope. Obviously, what is being proposed here is not simply an a priori adherence to a scientific and methodological perspective that is not yet completely defined. As highlighted in [7], "the more empirical legal research is a 'growth industry', the more important it is to understand and discuss epistemological problems of this field of study". A great deal of work will be needed to tackle fundamental issues, including how to operationalise legal concepts, where to find data (stored data, but also Big Data) and, above all, how to link empirical (including causal) evidence to the normativity of legal arrangements and legal scholarship. Epistemological and theoretical aspects aside, fostering a more empirical stance towards the legal world is a worthwhile effort for practical reasons too. The factual investigation of legal phenomena increasingly appears to be an indispensable condition for more effective legal solutions able to cope with the complexity of the real world.
We will never be able to meet this need unless we learn to devise new methods and tools.

Legal Science as an Instrument-Enabled Science
In light of what has been said about the relationship between technology and science, our feeling is that legal science, too, is destined to become more and more an instrument-enabled science. Due also to the digital evolution of the raw material lawyers have to deal with (not only data but also legal relations and assets), the design of new tools, including platforms, is a necessary step to advance the understanding of legal phenomena and of the social phenomena relevant to the law. On closer inspection, this change carries theoretical implications that it is important to be aware of. Platforms are far from being tools in a merely mechanical sense: the choices concerning the data to be processed and the functionalities and analytical methods to be implemented incorporate basic scientific visions, questions and options, just as the choice to use agent-based simulation models underpins a generative and micro-foundational approach to social phenomena. Research tools are theoretically connoted tools in which scientific perspectives, knowledge and methods coevolve, influencing each other. It will be up to lawyers to engage in the design of new computational tools, gradually acquiring skills (still quite rare today) to rethink their research questions and, together with them, their conceptual categories, methods of study and work, and relationships with other sciences.
The theory is in the artefact, not in the brain of the scientist, and computer-based artefacts can contain enormous quantities of information and can easily and rapidly compute what happens when the different pieces of information interact together.
The data- and computation-driven development of scientific research offers insights that also involve the methods of legal science, as well as those of social science in general. The availability of tools that allow new aspects of reality to be investigated will gradually lead to the emergence of new research methods intended to complement existing ones. From this point of view, the platform-based development of science calls for a reflection on the methods by which lawyers deal with their own research questions.

Methodological Eclecticism in Legal Science
From a methodological point of view, the spread of analytical platforms seems an enabling factor for the integration of different research approaches to the investigation of social phenomena. The idea of overcoming what has been called the "war of paradigms" [163] has gradually led to the emergence of a pluralist perspective [164,165], according to which social research is increasingly understood as the integration of different scientific traditions. Faced with the impossibility of identifying common epistemic foundations for all the social sciences, researchers often adopt an eclectic [166] stance, geared towards discovering the complementarities across concepts and visions belonging to different research communities. This choice also involves the methodological dimension, as the sharing of different methods [167] is proving crucial in enhancing the scientific investigation of social phenomena. This holds not only in the more traditional areas of social research, but also in the emerging field of computational social science [38,39], where the integration of heterogeneous research perspectives and methods, spanning from data mining to social simulation or network analysis, is essential to gain a deeper understanding of foundational issues of the social sciences. Along the same lines, to answer their research questions and cope with their practical problems, lawyers will have to gradually learn to combine perspectives and methods from different disciplinary sectors eclectically, devising new methodological solutions. The experience of CrimeMiner, where (albeit in an experimental setting) various activities of legal interest (investigations, legal classification of criminal activities, law enforcement strategies, criminal analysis, etc.) exploit an integrated workflow in which different research and data analysis methods converge, suggests that analytical platforms are a useful place to practise the eclecticism we have just described.

A (Less) Disciplinary Approach to Legal Research and Practice
Scientific eclecticism allows us to make a final claim about the need for a more interdisciplinary approach to legal research. In general terms, interdisciplinarity has been given increasing attention in recent years [168]. In 2015, Nature [169] devoted an entire special issue to non-disciplinary research, encouraging researchers to "break down barriers between fields to build common ground". Interdisciplinarity should allow researchers to address problems that have resisted conventional approaches, as "many complex issues and pressing questions cannot be adequately addressed by people from just one discipline". The statement fits our case very well: answering many questions of legal science and practice (assessing the impact of legal norms, understanding the deep nature of legal systems, predicting the evolution of law enforcement strategies) is a complex task involving scientific knowledge that transcends the boundaries of traditional legal scholarship. Against this backdrop, our ability to integrate different knowledge and disciplines in new ways becomes crucial. Analytical platforms, like computational artefacts in general, can play a relevant role in promoting cross-fertilisation between disciplines. As suggestively stated in [170], computer-based artefacts offer new unified theoretical and methodological frameworks that facilitate the dialogue among scientists involved in the study of different aspects of the same complex phenomenon. In any case, the complexity of research questions at the boundaries between law and the social sciences is not the only reason for encouraging a less disciplinary mindset in legal scholars. An interdisciplinary effort is specifically needed to conceive and develop computational platforms.
As highlighted in a report [171] from the US President's Advisory Committee discussing the prospects of computational science, "it takes scientific contributions across many disciplines to successfully fit software, systems, networks, and other IT components together to perform research tasks [...] And it takes teams of skilled personnel representing those disciplines to manage computing system capabilities and apply them to complicated real-world challenges". Without such an attitude, legal scientists and practitioners will probably have more difficulty in enhancing their work and dealing with what is going to be the most common mode of science and engineering discovery throughout the 21st century.
Thirty years ago, in an inspiring speculation about the future of law [172] building on the claim that legal scholarship lacked scientificity (in a modern sense), the American jurist and economist Richard Posner advocated a new approach to legal research: a study of the law not as a means of acquiring conventional professional competence but as a study from the outside, using the methods of scientific inquiry to enlarge knowledge of the legal system. Drawing inspiration from the "continuing rise in the prestige and authority of scientific and other exact modes of inquiry in general" (among which he explicitly mentions computer science, together with other fields from economics to biology), Posner calls for a more prominent role of science in the legal world and hopes for the growth of interdisciplinary legal analysis, seen as an essential part of the evolution allowing researchers to all "contribute creatively to the understanding and improvement of the legal system of the twenty-first century". Analytical platforms are certainly not the only means to make this transition possible, a change that will require time and a gradual process of cross-fertilisation modifying theoretical constructs entrenched over time, but they will surely play a role. It will take the ability to creatively sew together different methods and perspectives, as well as the daring to seek new scientific and methodological paths. One of the challenges of a mature computational legal science, perhaps one of the most important, will be exactly this.