Methodological Proposal for the Study of Temporal and Spatial Dynamics during the Late Period between the Middle Ebro and the Pyrenees

Abstract: This article presents the methodological approach of my PhD, in which I analyzed the spatial and temporal dynamics of late rural settlements over five centuries in the southern Pyrenees area, using geographic information systems, spatial databases, and descriptive statistics to establish models of space occupation and to determine how these vary over the different centuries.


Introduction
This article is part of my doctoral thesis, which focuses on settlement dynamics during Late Antiquity in the area between the Middle Ebro and the Pyrenees (i.e., Huesca, Navarra, and part of Zaragoza). When I started my thesis, one of the most important problems was the homogenization of the data. Responsibility for archaeological heritage lies with each of the autonomous communities into which the Spanish territory is divided. Because of this, and although there is a common state framework, each community follows different rules for recording reports, archaeological inventories, and archaeological charts. This makes it difficult to work with data from different autonomous communities. In my case, two administrations are involved: the Foral Community of Navarra and Aragon. In addition to this difficulty, which is intrinsic to my area of study, there are the difficulties inherent in the recording of archaeological interventions. The strong growth of preventive archaeological interventions from the 1980s, and above all during the 1990s, has produced a large amount of grey literature that is often consulted only for other interventions or when updating the regional archaeological chart.
On the other hand, the selected area, located on the border with France in the northeast of the peninsula, has been little studied for the selected chronological range (3rd-7th centuries AD). Archaeological interventions that mention late Roman findings are widely dispersed in the literature and come from various sources: reports, archaeological inventories, syntheses, theses, etc. This made it even more difficult to achieve the final objective: to analyze the dynamics of late Roman settlement. Starting new interventions without knowing the type and quality of the available data would have been a mistake. The works published for the study area only allude to the large late Roman settlements, leaving aside many smaller ones that have been discovered in preventive archaeology interventions. Historiographically, archaeology and history have established a series of general conclusions based mainly on the study of the most significant cases (large villages, best-known sites, etc.), leaving aside minor sites or those of little archaeological relevance. This approach is useful for understanding the general dynamics at an imperial level, but it does not allow me to verify empirically the particularities of each region. Moreover, an analysis based only on the most important examples can lead to hasty conclusions. Therefore, I proposed to "dig the archives" of the administration in order to extract from the different reports and the regional archaeological inventory the data needed to understand the structures of late settlement. In fact, I considered that, before launching new archaeological interventions in the study area, it was necessary to start from a "clean slate", unifying the information available in the administration. For this reason, the present research is based on secondary and tertiary sources and not on my own data. My analysis aims to draw conclusions from the overall analysis of all the archaeological data available for a given region.
The aim is to test whether these general historiographic trends can be adapted to my area, whether there are nuances, but also to open new lines of research.
With the objective of proposing a new methodology for the study of late settlements in my area of study, I present here the proposed process. My research revolves around three methodological premises. Firstly, starting from systemic thinking, all the entities that make up my corpus are part of the same system. Secondly, in order to carry out an adequate analysis of the individual entities and of the settlement models, it is necessary to take into account and evaluate the quality of the available data (based on three criteria: uncertainty, imprecision, and inaccuracy). Thirdly, none of this would be possible without a multidisciplinary methodology. This requires a constant effort of analysis and epistemological reflection on the discipline itself, because my research lies at the boundary between geography, computer science, and archaeology.
After a first section on the data and its life cycle, I will move on to a breakdown of the methodology. I will deal with the collection of data, to continue with the database and the criteria followed in the classification of the sites, detailing the explanation of the structure of the G.I.S. (Geographic Information System).

The Documentary Base of the Corpus
The documentary base of my study has been published in a partial and very dispersed way, as it happens for other areas of the Peninsula. The archaeological inventory of the different administrations, that is, Navarre and Aragon, has been the main corpus. However, the data collected there are mainly oriented to the management of archaeological heritage, finding much administrative information and little archaeological information. Below, I describe the specific characteristics of my documentary base.
To the inventories must be added the works of the scholars of the 19th and early 20th centuries [1][2][3][4][5], focused mainly on Middle Navarre and the area around Fraga [6] and Monte Cillas (Huesca), as well as the archaeological charts and the compilation work carried out in the 1980s and 1990s, both on the peninsula and in the region. The volume of the Archaeological Chart of Spain dedicated to the province of Huesca was published in 1984; it allowed me to add an important number of sites to the corpus and gave me an extensive set of bibliographical references for the archaeological sites, extending the information given in the chart.
On the other hand, the Tabvla Imperii Romani, specifically Sheets K-30 and K/J-31 [7,8], although it contains some mistakes, mainly in coordinates and in the duplication of sites, allowed me to complete the site cards provided by the administration and to incorporate some new sites. Two of the most important works regarding the inventory of late deposits date from the end of the 1990s: that of A. Cepas Palanca [9] and that of J.M. Tudanca [10]. Both works have been among the fundamental foundations of the corpus, since they analyze the sites in considerable detail and include both material sheets and planimetries. They usually make a critical evaluation of each site, mainly in terms of the chronology proposed by other authors.
In addition to these works of synthesis, for the beginning of the 21st century I have the work of J.A. Paz Peralta [11], although this is a general approach and does not provide specific references for many of the sites, and that of A. Castiella, who focuses mainly on the Roman roads [12]. The latter offers an interesting compilation of Navarre's villae that allowed me to complete the data on some of the sites, mainly in terms of chronology. It provided chronologies, even if these are not very precise (usually indicated as High Empire or Low Empire, and in other cases as "undetermined Roman period").
I also have some studies focused on archaeological prospecting within the framework of research projects. For Navarra (the Pamplona basin), I highlight the synthesis of the prospection project carried out by A. Castiella's team [13], which takes a diachronic approach. The work of the Santa Criz team, published in full this year [14] but preceded by articles on certain campaigns [15], has made it possible to add more precise information to my corpus, especially with regard to the location and dating of the sites (the 2019 work [14] could not be included in this research, because the database was already closed when I was informed of its publication; instead, I have consulted the previous works [15,16]).
In the Huesca area, I highlight the work carried out in 2000 and 2001 by L. Chasseigne [17,18], focused on the area around Labitolosa, mainly in the Cinca River Valley. Both the reports deposited with the Administration and the published articles contain detailed information on the different sites. In addition, the author incorporates an interesting number of bibliographical references in the master's report prepared from these data (University of Bordeaux). In addition to these interventions, the team of Los Bañales has been working in the city since 2008 [19,20], and the team of Cabeza Ladrero has been working since 2016 [21]. Príncipe de Viana and Trabajos de Arqueología Navarra, which have compiled the main archaeological interventions carried out in the Community of Navarra since 1940 and 1979, respectively, together with the journal Cuadernos de Arqueología de la Universidad de Navarra, are the most significant periodicals for Navarra. For the Aragonese area, the reference periodicals are Bolskan, the journal of the Institute of High Aragonese Studies, focused mainly on the publication of interventions in the province of Huesca; Saldvie (published since 2000), which includes a significant number of interventions in the province of Zaragoza; and the Boletín del Museo de Zaragoza, which since 1982 has mainly published studies of materials deposited in its collections.
In these works, the articles are very varied, from the analysis of a specific ceramic piece to the presentation of a site or a diachronic study of a certain space. This implies having data at very different scales, and the information may be repeated in the various publications. That is to say, I can have a ceramic piece published in an isolated way in an article, five years later a complete study of the excavated site, and a decade later a global analysis of a river basin. This makes it very difficult to standardize the data, one of the biggest challenges of my research.
In addition, there is a set of data from prospecting reports that are largely carried out in the context of administrative interventions and which, for the most part, do not provide much archaeological data, but rather urban and/or administrative data. Furthermore, the information provided by some reports varies considerably. In this sense, an important change in documentation can be seen over the years. For example, in the documents of old interventions, the administrative part (expenditure, personnel, transport) accounts for almost 80% of the report, while in current interventions this part has been greatly simplified. Moreover, surveys carried out for administrative reasons are limited to the area concerned and are often not very homogeneous areas where intensive and systematic surveys are not carried out. There is also a very clear difference between the reports carried out by environmental companies and those in which archaeology is the main activity. In the former, the archaeological part is usually limited to the coordinates and a general cultural assignment of the site, while in the latter, detailed inventories of the materials found are usually incorporated, as well as a complementary interpretation.
On the other hand, there is a problem of accessibility to information. The interventions carried out between 1983 and 2005 in Aragon are included in the issues of the publication Arqueología Aragonesa [22][23][24][25][26][27][28], the journal Trabajos de Arqueología Navarra being the one that brings together since 1979 the archaeological operations carried out in the Foral Community. These publications also have a scientific character, so the administrative aspect is minimised. However, the so-called "grey literature" is not always as accessible as in these cases and can only be consulted on specific days and under not always ideal conditions (No photocopies can be made, in some cases no photographs can be taken, and there is a maximum number of searchable reports per day). Thus, most archaeological interventions, especially prospecting, are added to the archives of the administrations. In the case of Navarre, it was Doctor J. Sesma Sesma who helped me with this matter, while in Aragon, J. Rey Lanaspa, I. Rojas, S. Gracia and M. Gómez de Valenzuela provided me with access to the different reports, guiding me at all times in the administration's archives. These interventions, in which the administrative character usually prevails over the archaeological one, also have other limitations. In the first place, prospecting is not usually carried out following the latest techniques, that is, trying to establish the areas of dispersion of the material in a chronological manner in order to later fix the phases of the settlements [29]. Secondly, they are usually carried out considering the limits of protection of the site and not of dispersion of the material, providing surfaces that do not coincide with the archaeological reality. Because of this, the information is very disseminated, being also very heterogeneous. However, in addition to these difficulties of dispersion of documentation, there are the disadvantages of data. 
On the one hand, most of the available information comes, as I have already pointed out, from unpublished archaeological reports resulting from prospecting inserted in urban planning projects.
On the other hand, the data come from archaeological excavations, from prospecting (systematic or not), and from fortuitous findings. When considering the type of intervention carried out at my sites, I can see that more than 77% have been the object of archaeological prospecting alone, while those excavated in extension amount to only 3.99% (Cf. Figure 1). It is also significant that 10 sites were found by chance, without any further intervention, which is close to the 14 sites that have been excavated in extension (Cf. Figure 1). As we can see, the data come from very varied interventions, survey being the most common. Furthermore, if we look at the years in which the interventions were carried out, the result is very significant (I do not have data for 43 sites, or 12.25% of the total; the sample is nevertheless large enough to establish some hypotheses and ideas).
The largest number of sites is concentrated in the period between 1981 and 2000, with 48.43% of first interventions, far ahead of the other periods (Cf. Figure 2). This is interesting because, if I compare this result with the evolution of the foundation of archaeological companies in Spain, the dynamics are parallel. The greatest creation of companies took place between 1986 and 2000, when the sector was born and consolidated, while between 2001 and 2007 there was a great boom in Spanish archaeology, with 50% of companies being created (Cf. Figure 3). From 2008 the sector entered a crisis and the foundation of companies became anecdotal. Moreover, the near absence of companies founded before 1986 is due to the fact that the Spanish Historical Heritage Law (Law 16/1985 of 25 June), which regulated archaeological activity, dates from 1985; very few companies were present in the territory before that date (Cf. Figure 3).
These data therefore correspond to what I have been discussing so far. Since prospecting, and specifically that carried out within emergency archaeology, is the most common type of intervention in my area of study, it is logical that the location of my sites is related to this.
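The intervention figures quoted above can be cross-checked against one another. The following is a minimal sketch, assuming that every percentage refers to the same corpus total; that total is not stated in this section and is inferred here from the statement "43 sites, or 12.25% of the total". The function names are illustrative only.

```python
# Cross-check of the intervention figures quoted in the text.
# Assumption: every percentage refers to the same corpus total, which is
# inferred here from "43 sites, or 12.25% of the total".

def infer_total(count: int, percent: float) -> int:
    """Return the integer total implied by a count and its percentage."""
    return round(count * 100 / percent)

def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to two decimals."""
    return round(count * 100 / total, 2)

total_sites = infer_total(43, 12.25)     # sites without an intervention date
excavated_pct = share(14, total_sites)   # sites excavated in extension
chance_pct = share(10, total_sites)      # chance finds without further work

print(total_sites, excavated_pct, chance_pct)
```

Under that assumption the inferred total is 351 sites, and the 14 sites excavated in extension amount to 3.99%, which agrees with the percentage given in the text; the 10 chance finds would then represent about 2.85%.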

Between the Roman and Visigoth Periods: A Disparity in Documentation
The study includes two very different cultural periods from the point of view of the archaeological work carried out, so the documentation is uneven. Although the Roman period, especially up to the 4th century AD, is well studied, with ceramic typologies that allow the chronology of the sites to be adjusted, in the case of the Visigoths the situation is practically the opposite.
It is precisely from the fifth century onwards that a change in the area's ceramic production can be observed [31], and this can also be seen in the Iberian Peninsula and other European countries [32]. Production after this century has been little studied. Thus, J. Monnier indicates that, for his region of study, Switzerland, the results for the 4th and 5th centuries A.D. are distorted by the problems of dating archaeological material from the 4th and 5th centuries A.D. [33], the same problem that C. Gandini encounters for the early medieval period [34]. As is evident from the latest publications of the period the systematic study of ceramics is a major informant in understanding the complex peninsular world between the 5th and 8th centuries AD [35], so the few sequences that allow me to have typologies of the different facies of the period would allow me to adjust the chronologies of a large number of sites (An interesting work in this sense is the thesis dissertation of S. Chabert, defended in 2016 under the title of "La céramique en territoire arverne et sur ses marges de l'Antiquité tardive au haut Moyen Âge (fin IIIe -milieu VIIIe siècle). Approche chrono-typologique, économique et culturelle"; under the direction of F. Trément, has made it possible to establish a ceramic typology for Late Antiquity in the Auvergne that will serve as a basis for the study of the rural population of the region).
For my area of study, C. Zuza's (Gabinete TRAMA S.L.) thesis on the late ceramics of Pamplona, currently in progress, will be a very interesting starting point for Navarre. It will allow for the best dating of many of the sites in my area of study, as it will provide a basic typology based on the complete stratigraphy of the archaeological interventions carried out by TRAMA S.L. in Pamplona.

For Salamanca, E. Ariño's work, based on a ceramic corpus from six sites, establishes the different typologies present in the late rural sites of the province [36]. Finally, I highlight the recent work published in 2018 under the title Cerámicas altomedievales en Hispania y su entorno (siglos V-VIII d.C.) (Early medieval ceramics in Hispania and its surroundings (5th-8th centuries AD)) [32], which includes a significant number of articles on ceramic studies in various regions of the Peninsula and which seems set to become a reference work.
Currently, there are practically no works on ceramics for the period in my study area. The most significant to date are those by J.A. Paz Peralta [37] and those by C. Zuza for Pamplona [38,39], but I do not yet have a typological synthesis for late ceramics. However, I do have specific studies of late materials, such as those of San Blas (Olite, Navarra) [40] or El Moro (Huesca) [41]. For the Roman period, the most recent reports usually provide a detailed explanation of the ceramic material, including forms and chronologies; in the case of the Visigoths, ceramic descriptions are usually limited to "Hispano-Visigothic pottery", "late-medieval grey pottery" or "Visigothic pottery". As a result, I do not have precise chronologies for a large part of the sites of this period, because such vague ceramic labels make it difficult to assign them to specific centuries. Therefore, in order to check the chronological resolution of the data in my study, I have made a graph in which the number of sites per century is presented as percentages according to the chronological resolution of their data.
In the graph (Figure 4), I represent the evolution of available chronologies by century according to their resolution (one century, two centuries, more than two centuries). Towards the end of the period, the resolution of the chronologies is lower, with a resolution of more than two centuries for almost 60% of the sites from the 6th century AD. On the other hand, the lines for "one century" and "more than two centuries" follow opposite dynamics: at the beginning of the period the first is at almost 60% and the second below 30%, and the period closes with the situation reversed. Finally, the "two centuries" line remains roughly constant, with small fluctuations between the fourth and fifth centuries AD. The graph therefore shows that I am faced with two opposite kinds of chronological resolution: data of acceptable accuracy and data with a very low resolution (Cf. Figure 4).
[Figure 4: chronological resolution of the sites by century. In percentages with respect to the total number of phases in each chronology. Own elaboration.]
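The tabulation behind a graph of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used for Figure 4: each phase is assumed to be recorded as a pair of start and end centuries, a phase is counted in every century that its range covers, and the example phases are invented rather than taken from the corpus.

```python
from collections import Counter, defaultdict

def resolution_class(start_century: int, end_century: int) -> str:
    """Classify a phase by the number of centuries its date range spans."""
    span = end_century - start_century + 1
    if span <= 1:
        return "one century"
    if span == 2:
        return "two centuries"
    return "more than two centuries"

def resolution_by_century(phases):
    """For each century, return the percentage of phases in each class.

    `phases` is an iterable of (start_century, end_century) pairs; a phase
    is counted in every century that its date range covers.
    """
    counts = defaultdict(Counter)
    for start, end in phases:
        label = resolution_class(start, end)
        for century in range(start, end + 1):
            counts[century][label] += 1
    result = {}
    for century, counter in sorted(counts.items()):
        total = sum(counter.values())
        result[century] = {
            label: round(100 * n / total, 1) for label, n in counter.items()
        }
    return result

# Hypothetical phases, invented for illustration (not data from the corpus).
example_phases = [(3, 3), (3, 4), (4, 4), (4, 7), (5, 7), (6, 7)]
table = resolution_by_century(example_phases)

for century, shares in table.items():
    print(century, shares)
```

Computing the percentages per century, rather than over the whole corpus, follows the caption's wording ("with respect to the total number of phases in each chronology").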

General Characteristics of the Documentation
After this brief review of the general characteristics of my documentation, I can say that, in general, the data are scattered, partial, and incomplete. The main problems can be summarized as follows:

• Great variety of work scales, ranging from the isolated study of a set of materials found by chance (for example, San Blas in Olite) to the well-documented excavation of others (the case of El Mandalor).
• Scarce data on some sites due to the lack of publication of the results, compared to others that have unpublished memoirs, specific articles, and even monographs.
• Differences in the quality and quantity of data in the administrative reports, in some cases with only the coordinates and in others with hundreds of detailed pages with the materials collected, plans, and photographs.
• Vague and imprecise chronologies, especially for the Visigothic period, with resolutions ranging from half a century to more than two centuries.
• Great heterogeneity in the dates of data capture, from discoveries in the 19th century without later mentions to sites recently found through preventive archaeology.
• Different techniques for collecting the data. In most cases, I have survey data; the areas and sites for which I have both prospecting and excavation data are minimal.
• Different coordinate systems. In general, this is a minimal problem; most sites have WGS84 coordinates on the administration's archaeological maps or inventories, which are compatible with the ETRS89 system of web applications such as Iberpix (a cartographic viewer of the Government of Spain that aims to facilitate the location and visualization of places in the national territory).

Database Lifecycle Management: Applying a Business Data Model to Archaeology
When dealing with the management and processing of a large volume of data, it is necessary to take into account the management of the data life cycle, known as data lifecycle management (DLM) (Cf. Figure 5), which is used mainly in business management but which I adapt here to archaeology. The DLM [42] puts the emphasis on aspects such as the design of the architecture, the development of the database, the processes that data undergo in a given company, their security measures, and their storage. Good management of the data life cycle makes organizational processes easier to plan and execute and optimizes resources. The DLM consists of four phases: data creation and capture; transmission, storage and security; analysis; and finally publication. Studying settlement through third-party data is not an easy task, so it requires a clear starting methodology to manage and treat all the information while considering its shortcomings. Based on this model, in my methodological process I set four stages: (1) The "collection of archaeological data"; in my case they come mainly from the bibliography, archaeological reports, or doctoral theses, but in this phase I would also include archaeological data produced by me (surveys and/or material studies); (2) The "storage and management of the db data", that is, once the data have been collected or created, I have to organize, store, and manage them in order to work with them, while establishing criteria to evaluate their quality. In addition, they must be kept in a system that is secure and that does not allow the loss of information; in my case I have chosen the ESRI geodatabase and Access as the DBMS (database management system); (3) The analysis and exploitation of the data in order to transform them into new archaeological knowledge; this implies statistics and spatial and alphanumeric analysis; (4) The dissemination of the data and the results of their treatment, that is, the future publication of the doctoral thesis.

The Collection of Archaeological Data
As I have already indicated, the main basis of my study is the corpus of archaeological sites. From it I carry out all the statistical and spatial analyses. Because of this, data collection is one of the fundamental aspects of the methodological process of my doctoral thesis. Although exhaustiveness is my objective, in some cases the lack of data has not allowed me to reach the desired degree of it.
First, I carried out a complete review of the administrations' archaeological inventories: the Cultural Assets Report in the case of Aragon and the Archaeological Inventory of Navarre in the case of Navarre. The aim of this first exploration was to have a "base corpus" from which to locate the sites of interest. Two of the main problems with the administrations' inventories are that they are not continually updated and that they provide a limited amount of data, often restricted to the coordinates, the code on the archaeological map and the name of the site. It should also be noted that the administrations' data are not fully computerized. In the case of Navarre, the "site sheets" are consulted on paper (The Administration's technicians do have a Geographic Information System that allows them to locate the sites in view of future preventive archaeology interventions, but it is not open to the public. Moreover, the site cards cannot be photocopied or photographed, so the data of interest have to be copied by hand from each one of them. The reports are also on paper; depending on the case they cannot be photocopied or photographed, and only a maximum number can be consulted per day. The technicians did not apply this last restriction in my case, since I had neither doctoral funding nor enough time to travel continuously to the archives while I was working. In spite of the difficulties encountered, the work of the Archaeology technicians, who gave me their support and help in both administrations, should be acknowledged), while in Aragon the inventory is consulted through a PDF provided by the administration. Once this first review was completed, the next step was a review of the monographs on settlement, which I have already mentioned in previous pages. This review was completed with an exhaustive examination of the indexes and articles of the specialised journals.
Once this third phase was completed, I reviewed a large number of archaeological reports, both from surveys and excavations, from both autonomous communities, with the aim of clarifying the information provided by the monographs, which is sometimes limited. The enormous volume of documentation kept in the administrations' archives made it impossible to consult it all, so I decided to consult the available archaeological survey reports from the last twenty years in order to fill possible gaps in the documentation. In total, more than 250 reports were consulted in both administrations (listed at the end of the thesis). In addition to the collection of data from the bibliography, the collection of data in the field is also included. Given my economic limitations, I chose to carry out only one survey campaign, in caves, financed within the POEM project of the University of Pau, during 2016. This first stage of data collection helped me to make an initial assessment of the problems the data presented, with the aim of establishing the criteria of certainty, precision, and accuracy to assess their quality.

Data Management: My Database Management System (DBMS)
When I talk about data management, I refer to the way in which data processes are organized and maintained, ensuring the data life cycle. In the first place, it is rarely possible to integrate data directly in the form in which it is available: most often it has to be transformed or supplemented or, in some cases, even created when it does not exist [43]. In my case, I have georeferenced geographical and archaeological data, but also non-spatial data. In other words, my data management must consider two components: the graphic or geometric part (base maps or planimetries), and the semantic or thematic part (alphanumeric tables).
The most common approach is to use the DBMS to store the thematic information and the GIS for the geometric and topological information. One of the functions of this model is to link both types of information, which are stored in completely different ways. This is the geo-relational data model. The spatial database involves the orderly structuring of the data so that spatial and statistical analyses can be carried out later. By database I mean a data set stored in a structured manner [44]. A spatial database is, therefore, one that integrates the data with the spatial dimension. This has been a factor in favour of using geographic information, as it allows the management of high volumes of data [44] and also work with both spatial and non-spatial data.
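The link between the two components of the geo-relational model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the `site_id` key, and the sample values are hypothetical and do not come from the thesis database; the point is simply that geometry and thematic attributes are stored separately and joined through a shared identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteGeometry:
    """Geometric component: where the site is (hypothetical fields)."""
    site_id: int
    x: float
    y: float


@dataclass(frozen=True)
class SiteAttributes:
    """Thematic component: what the site is (hypothetical fields)."""
    site_id: int
    name: str
    typology: str


def link_layers(geometries, attributes):
    """Join both components on the shared site_id key.

    Returns a dict mapping site_id to a (geometry, attributes) pair.
    Geometries without a matching attribute record are left out.
    """
    attrs_by_id = {a.site_id: a for a in attributes}
    return {
        g.site_id: (g, attrs_by_id[g.site_id])
        for g in geometries
        if g.site_id in attrs_by_id
    }


# Illustrative records only, not real sites.
geometries = [SiteGeometry(1, 0.0, 0.0), SiteGeometry(2, 1.0, 1.0)]
attributes = [SiteAttributes(1, "Site A", "A2")]
linked = link_layers(geometries, attributes)
```

In a geodatabase this join is handled by the software itself; the sketch only makes explicit what "linking" the two kinds of information means.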
After considering several options, I decided to create a geodatabase (GDB), because it offers a complete information model to represent and manage geographic information. This comprehensive information model is implemented as a series of tables that store feature classes, raster datasets and attributes. The geodatabase is the native data structure of ArcGIS, one of the most widespread data storage systems.
There are several types of geodatabase; for this work I chose the personal geodatabase. Compatible with Microsoft Access, it supports the storage, querying and management of both spatial and non-spatial data, with a maximum capacity of 2 GB, and the inclusion of many data types. I chose this system for several reasons. Firstly, it supports the crossing of spatial data (vector or raster) and non-spatial data (Excel tables), and entering alphanumeric data through Access makes the work much faster, as well as allowing the creation of reports in the form of more visual site cards. Secondly, I can carry out analyses in ArcGIS without the need to set up an OLE link between the database and the GIS program. Finally, it contains all the layers of the project in a single file, which can be transferred without the need to copy the shapefiles and the tables one by one. I am aware of the interoperability problems that my geodatabase has. However, since this was a personal work and a first attempt, I preferred to use a system that I had mastered and that allowed me to move the database easily from one computer to another. My priority was to find a solution adapted to my problems and only then to evaluate an interoperable technical solution. In the future, I plan to build a recording system that allows multiple users to enter data. Without a doubt, the greatest advantage is the ease of creation within ArcGIS, together with the possibility of making explicit the relationships between the entities in the data model, which leads to a more agile consultation of the information. Among the disadvantages of this system are that only one person per connection can edit the data and that, from 250 MB onwards, the file can begin to give some problems.

Vagueness Management
The progressive increase in the amount of data produced makes it necessary to evaluate its quality in order to generate knowledge from its subsequent processing [45,46]. In the case of archaeological data, evaluating the quality of information is necessary in order to measure the reliability of the data and to interpret the results of analyses carried out with these same data about past spatial structures and dynamics [47]. Most commonly, only partial information is available on many remains that have not been systematically analysed or subjected to a regulated and prolonged excavation [48]. Assessing the quality of archaeological data means looking for a way to measure that quality. After analysing my data set, I concluded that the most appropriate approach was to manage vagueness as a solution, at least a partial one, for assessing its quality. Intrinsically, data, archaeological or otherwise, have varying degrees of vagueness: of precision (how detailed they are), of accuracy (how well they represent what they are trying to describe), and of certainty (how confident I am about them). Being able to work with such a varied, partial, and imprecise body of data is one of the challenges of today's archaeologist. This vagueness makes it difficult to carry out global analyses of sites over a wide area, as it does not allow all the archaeological sites to be treated jointly. I have therefore made a complete assessment of the data through vagueness management. This has allowed me to estimate, at least partially, the quality of the data, and it is a possible solution to the evaluation of archaeological data. It is a good starting point that needs to be refined and perfected, but it allows me to offer the reader more honest results. This leads me to combine databases with GIS and vagueness management techniques for the study of settlement dynamics during late antiquity.
To measure the quality of my data, I have therefore based myself on the management of vagueness through three variables: (in)accuracy, (in)exactitude, and (un)certainty. These variables have been applied to three issues: the dating, the location, and the function of the settlement. The uncertainty (C) refers to situations where our knowledge about something is unclear or incomplete. For example, when we state that "the villa of Liédena develops during the IV A.D.", we are considering the whole century, but this phenomenon occurred in a much more precise time frame which, due to the lack of data, we cannot specify. Thus, a fact is of better quality the less uncertain it is [49]. The inaccuracy (A) is the degree of variability of a measure taken repeatedly on an entity, or of its result. The inexactitude (E) is any statement about an entity of this category remains true even when the quality expressed for such characteristics [50]. Exactitude (E) measures the distance between the observed value and a reference value that we think is right. Sometimes inexactitude is injected into an attribute so that it is less uncertain. Applied to the function, this introduction of inexactitude manifests itself, for example, when we talk about "an agricultural-residential structure": we are not clear about its typological ascription, so we reduce the epistemic vagueness by injecting inexactitude.
Based on these definitions and considering my data set, I have applied them in a practical way to the study. For the location I have applied all three variables, while for the dating and the function, only two. For each of the issues and variables, I have assigned a number from 1 to 3 (Cf. Figure 6), with 1 being the best grade and 3 the worst. To establish these grades, I have set a series of criteria considering my data set. Thus, to a site with a precise location I will assign a value of 1, while to a site with great uncertainty about its dating, I will assign a 3. These values have allowed me to establish a series of formulas with which to assess the quality. These formulas give a greater or lesser weight to the mentioned variables depending on my data. Thus, for each of the issues (location, dating, typology), I created a simple formula that measures the quality of the data on a scale from 1 to 3. Let us illustrate this with an example applied to location (Figure 6). For a site with the values E = 3, C = 2, A = 2, the quality of location value will be 2.4 (result of the operation: (3 × 0.40) + (2 × 0.40) + (2 × 0.20)). The values closest to 3 are therefore those with the worst quality (Cf. Figure 6), while those closest to 1 are the best, a value of exactly 1 indicating the highest quality. These criteria and their application in the different analyses make it possible to show what the shortcomings of the data are.
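The worked example for location can be reproduced with a short Python sketch. The weights (0.40, 0.40, 0.20) are read off the operation quoted in the text, in the order E, C, A; the function name and the validation rules are assumptions made for illustration, and the weights used for dating and function are not given in the text, so they are not included here.

```python
# Weights for the location score, taken from the worked example in the text
# in the order E, C, A. The mapping of each weight to a variable is inferred
# from that order and should be treated as an assumption.
LOCATION_WEIGHTS = {"E": 0.40, "C": 0.40, "A": 0.20}


def quality_score(grades, weights=LOCATION_WEIGHTS):
    """Weighted quality score on the 1 (best) to 3 (worst) scale.

    grades maps each variable name to an integer grade of 1, 2 or 3.
    """
    if set(grades) != set(weights):
        raise ValueError("grades and weights must cover the same variables")
    for name, grade in grades.items():
        if grade not in (1, 2, 3):
            raise ValueError(f"grade for {name} must be 1, 2 or 3")
    total = sum(grades[name] * weights[name] for name in weights)
    # Round to two decimals to hide floating-point noise.
    return round(total, 2)


# The example from the text: E = 3, C = 2, A = 2.
example = quality_score({"E": 3, "C": 2, "A": 2})
print(example)  # 2.4
```

Because the weights sum to 1, the score always stays within the same 1 to 3 range as the individual grades.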

Time Management
One of the most frequently raised problems in the literature is how to introduce time into GIS [49,51,52]. Indeed, the issue of time in databases is a much-debated topic [49,51,53–56], and there are different proposals to try to solve the problem.
I have opted for numerical rather than textual ranges. In my opinion, this is the best way to record the different phases of the sites. The figures may seem very accurate and precise, but it is a matter of fixing chronological limits for each period in advance, thus introducing a TPQ (terminus post quem) date and a TAQ (terminus ante quem) date (Cf. Figure 7). This solves the problem of including text in the processing of the chronology. If I use different terms, I will not be able to locate the data later. For example, if for one site I indicate that its second phase has a chronology of the "first quarter of the third century AD" and for another site I indicate "first 25 years of the third century A.D.", there is a problem in searching for a chronology between the years 200 and 225: what terms should I include in my query to the database? LIKE "first 25 years_"? LIKE "first quarter_"? The possible questions are infinite and, in many cases, I will not manage to recover all the possible data for that question. However, if I start from numerical time limits, I can make a numerical query to the database. For example, if I want to know the sites that have a chronology of the 4th century AD, I will indicate "TPQ >= 300 AND TAQ <= 399" (Cf. Figure 7). Considering my data, I have established ranges by periods. The reader of the thesis can consult the corresponding table to know which criteria I consider and select the data according to their own criteria. The same criteria apply to the rest of the centuries (own elaboration).
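The advantage of numerical limits can be sketched with Python's built-in `sqlite3` module. This is not the Access/ArcGIS geodatabase used in the thesis: the table name `phases`, its columns, and the three sample rows are hypothetical, and the bounds are treated as inclusive, which is an assumption.

```python
import sqlite3


def phases_within(conn, start, end):
    """Names of phases whose TPQ/TAQ interval lies inside [start, end]."""
    rows = conn.execute(
        "SELECT name FROM phases WHERE tpq >= ? AND taq <= ? ORDER BY name",
        (start, end),
    ).fetchall()
    return [row[0] for row in rows]


# In-memory database with illustrative rows only (not real sites).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phases (name TEXT, tpq INTEGER, taq INTEGER)")
conn.executemany(
    "INSERT INTO phases VALUES (?, ?, ?)",
    [
        ("Site A", 200, 225),  # 'first quarter of the third century'
        ("Site B", 300, 399),  # the whole 4th century
        ("Site C", 350, 450),  # starts in the 4th century, ends in the 5th
    ],
)

fourth_century = phases_within(conn, 300, 399)
first_quarter_third = phases_within(conn, 200, 225)
```

A single numerical condition retrieves the phase however its chronology was worded in the source. Note that this condition only returns phases wholly contained in the requested range; retrieving phases that merely overlap it (such as the hypothetical Site C) would need a different condition.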

Multiple Categories for the Same Settlements
If I review the work on late rural settlement in both France and Spain, the site typologies are almost as varied as the case studies [29,57–60]. The main categories, such as villa or farm, are very recurrent; others such as villula, farmhouse, village, and a long etcetera pepper the pages of the specialized bibliography [10,61–65]. In general, it is possible to group the typologies of rural settlements into several groups.
The first hierarchies for the rural world in Roman times had a limited number of categories. In Spain, the one established by M. Ponsich for the lower Guadalquivir stands out. Based on the materials and the surface occupied by the sites, it was one of the pioneering works. This author distinguishes between big agglomerations, villae, farms, and shelters. At around the same time, C. García Merino [66], another of the precursors in the study of rural settlement, proposed a typology for the Meseta and Alto Ebro quite similar to the previous one, based on four categories for rural settlements: castra, vici, villae and dispersed settlements. As can be seen, these first proposals are quite limited in the number of categories, with little variation between the studies of the moment. From the eighties and nineties until today, the typologies can be arranged in four groups: (1) those that take into account the geographical location; (2) those that mainly use Latin terms from the sources; (3) those that combine the previous two, and (4) those that dispense with Latin terms and opt for other expressions.
Of the first group, the one made by I. García Camino for the late Basque Country stands out. The author sets out three categories: sites located on the plain, those located on high ground and those located near the sea [58]. The typology of P. Aparicio follows this same line, grouping the settlements according to their geographical location; it forms sets of sites according to their proximity to rivers or their geographical location [67].
In the second group, which consists of categories from Latin sources, I include the works of J. Andreu on the settlement of the area around Los Bañales [20] and of Cabeza Ladrero [21], where the Latin terms take precedence over others of a more geographical nature. For late times they include terms such as villula which, in my opinion, are very complicated to characterize in survey. Within the third group (mixed categories) there is a large body of work. T. Cordero, for the territory of Emerita Augusta, uses Latin terms combined with other more generic ones: villa, villa/vicus, religious centre, funerary space, rural dam, undefined, and villula constitute his typology, to which he adds an extra class that gathers the decontextualised materials [68]. In the Portuguese case, I emphasize the classification of A. Carneiro. This author proposes four classes: villa, vicus e aldeias, sítios de funções viárias and casais agrícolas [69]. These categories take into account the terminology of the sources but have been reconsidered on the basis of the survey data. For his part, R. Barbosa established six types of habitat for the western part of the Serra d'Ossa: villa, farm, casal, little site, vicus and mansio, including a fairly detailed description that takes into account the surface area, materials, and location of the sites. In addition to these habitat categories, there are also necropolises and "other sites", which include fortifications, bridges, quarries, undetermined sites and fortified settlements [70]. In the French case, for the territory of the city of Elusa, a distinction is made between villa, farm, and tile site (This last category is quite recurrent in French literature, and refers to sites that, in survey, yielded a fairly large amount of tegulae, but for which we have no other indication [71]). Another interesting typology, for southwestern France, is that of C. Balmelle, C. Petit-Aupert and P. Vergain, who propose the terms aristocratic villa, hilltop habitat, cave habitat, farm and "other sites" [72].
Finally, the fourth group includes the work of B. Ode and T. Odiot [73], who differentiate between "presumed temporary habitat", which includes the caves; the "clustered habitat", sites on the plain and on high ground, such as secondary agglomerations; "scattered settlements" ("long-term habitat", that is, the traditional villae, sites that have been reoccupied or where there has been a break in the habitat); and, finally, the foundations.
It can be seen how, in general, the villa keeps its Latin name in practically all the typologies of rural settlement, both in France and in Spain or Portugal. However, others, such as vicus, vary. Some studies use this term, but others prefer more neutral expressions such as caserío, poblado or aldea. In the case of the mansiones, something similar occurs, with expressions such as "passing habitat" or "sites of road functions" sometimes being preferred.
A common feature among all the papers cited so far is that their authors preestablish different criteria for each of the proposed categories. However, there are other researchers who have wanted to go one step further and opt for automatic typologies obtained through multi-criteria factor analysis [34]. Instead of presenting criteria and including sites that meet these characteristics, the categories are created retrospectively from the data entered in the database. This is the case of C. Gandini's doctoral thesis, who calls their site types "level 1", "level 2", etc., in line with the research carried out by the Archeomedes project [74,75]. A similar attempt at semi-automatic categorization is the typology of P. Aparicio. This author forms groups of sites based on the particularities of their location [67,76].
As can be seen, there are many possibilities when it comes to establishing a typology for rural settlement, and these vary according to the area of study and, above all, the origin of the data. It is not the same to work with one's own data, as in the case of C. Gandini or P. Aparicio, as it is to work with data obtained by third parties, as in my case.
Indeed, one of the factors that I believe has the most influence when it comes to developing a typology is the origin of the data. As has been shown in some works, and as could be verified at the study day "Traiter les données archéologiques tardo-antiques: approches, méthodes, et traitements de l'information" (Organized by L. Tobalina Pulido, A. Campo and S. Cabes at the UPPA and held on April 12, 2019. The complete program can be found on the page of the laboratory ITEM EA 3002: https://item.hypotheses.org/5372?fbclid=IwAR1MI2M9O98xwK6PGGX4RDg0WySQmK0ryRPuqXWxltQ9P4oLRBOLX7LcEy8 [Consulted on 12/09/2019]), excavation data provide greater insight into the phases of the sites. Survey data, poorer in vertical stratigraphy, yield interesting results horizontally and over wider time spans.
In the case of my doctoral thesis, I discarded from the outset the idea of using the names given in classical sources, because of the complications this entails with the material record I have. For example, in the case of the vici [77–80], if we are purists, I should have an inscription stating that the place is one, which is quite rare. In recent years, several studies have been carried out on the vici, but there are still few that characterise this type of minor settlement as an important rural enclave ranking below the civitates. On the other hand, since I work with data from third parties, establishing an automatic typology following the steps of the Archeomedes project [34] involved a few problems.
First, my study area has not been studied systematically; second, the surveys carried out are very varied, with quite partial and heterogeneous data; finally, access to administrative data is more difficult than in France, where there is greater standardisation of survey and excavation reports, and also of archaeological inventories, unified through the SRA and INRAP. This is not the case in Spain, where archaeological documentation varies greatly from one community to another. Therefore, I have chosen a typology that takes into account the large traditional categories of rural settlement (such as the villae), while incorporating others that use more neutral expressions such as "Farming", "Habitat of passage" or, simply, "rural" for the undetermined sites of non-urban character. In addition, each of the classes has an assigned code, consisting of a letter or a letter and a number, that allows me to determine whether the function is precise or not. Thus, for example, a site with the letter A (rural site) will be less precise than one in category A21 (large villae).
Of the 351 rural sites reviewed, I was unable to categorize 152, or 43.30% of the total. In other works with similar characteristics the percentages are comparable: C. Gandini classified 46.36% of the sites, against 53.64% not categorized [34], while J.D. Laffite categorized 69.5% and left 30.5% uncategorized [81].
Taking all this into account, I will now break down the typological classification that I have established. The letters indicate the general category, while the numbers establish the types within it. Thus, for example, the letter A indicates that I am dealing with a rural site, while A2 indicates that I am dealing with a villa, that is, a type of rural site of an aristocratic and agricultural nature.
Categories A, B, and C correspond to the habitat, while the successive categories are complementary types such as necropolis, mausoleum, surveillance sites or religious enclaves. Finally, category H lies between habitat, funerary and other uses that I do not know. For the cave sites I have two ways of registering them in the database because of their particularities. Firstly, in the table "sites" I indicate the location of the archaeological entities on the ground, indicating whether they are on a plain, hill, slope or in a cave. Secondly, I have created a specific category for sites in caves that are only frequented in the late Roman period and for which I do not have data that allow me to establish their function.
In the cases where I had doubts between two of the subclasses, I have opted for one of them, adding in the sections on certainty and precision that the information is of dubious quality. This is the case, for example, with El Mandalor. Qualified by its discoverers as a villa, if I analyze the materials and structures located, it could be a non-villa agricultural exploitation, the pars rustica or part of a farmhouse (The authors of the intervention themselves commented that in the latest investigations they are more inclined to say that it was a farm or part of a farm. I have opted for the first option; the other three hypotheses are also feasible with the archaeological data available [82]). Below, I explain the criteria established for each of the classes of my typology.
A. Rural: Sites that I know to be of a rural nature.
A1. Rural Ind: I cannot determine more about its function.
A2. Villa: The site presents luxury elements such as glass, tesserae, mosaics, hypocaust remains, as well as storage and table ceramics, or I have the planimetry that indicates a thermal area or a living area with a patio.
A21. Big villa: It has the elements typical of a villa (indicated in A2) in large quantity (more than 100 fr. of materials), or its plan has dimensions over 1.5 ha.
A22. Little Villa: It has the elements typical of a villa (indicated in A2), but in small quantities (less than 100 fr. of materials), or its plan is small (less than 1.5 ha).
A3. Agricultural Exploitation No Villa: Rural site of lesser entity than the villa and without luxury elements, but with enough elements to identify it as an agricultural exploitation, such as presses or a great amount of dolia, construction materials such as tegulae and imbrices, and remains of buildings. It presents terra sigillata, common pottery, kitchen pottery. Extension less than 2 ha.
A4. Transit Habitat: Mansio/statio type habitat. The site is located close to an important Roman road, with remains of compartmentalization in rooms and services.
B. Cottage: Habitat spaces in which the presence of construction materials (tegulae, imbrices, stone blocks) and interesting ceramic assemblages (terra sigillata, common pottery, kitchen pottery, storage pottery) can be distinguished. They sometimes present remains of house structures. They do not present elements of luxury decoration (mosaics, marble) or aristocratic comfort (heating elements). It can range from one or several houses to a group of houses. I dispense with the use of "village" because with the archaeological data I have I cannot determine the level of entity of the sites of this type.
C. Urban: Site that corresponds to an administrative center of a territorial district, presenting diverse public services and acting as an axis in the articulation of the settlement.
D. Funeral: Site dedicated to funeral commemorations, of whatever type; it can be a tomb or grave, a mausoleum or a necropolis.
D1. Funeral Without Apparently Associated Habitat: Site of funerary character, without an associated habitat nearby.
D2. Funeral Associated with Habitat: Site of a funerary nature, with a habitat in the vicinity with which it seems to be associated.
F. Religious
F11. Church: Temple intended for public religious worship, whatever its chronology.
F12. Chapel: A small temple for public religious worship, located in the countryside, with a medieval or later chronology.
F13. Monastery: Religious centre where a residence of monks is located, usually far from a town.
G. Military: Site that is related to some function of surveillance or control of roads, tracks and/or passages, without it being possible to determine its exact function.
H. Cave Frequency: Frequentation of a cave with undetermined function. I cannot determine the type of frequentation.
I am aware that there may be gaps in some categories, but I believe that this typology is the one that best fits my data set and that, in turn, it allows the results to be compared, at least in a general way, with other studies.
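The way the codes of the typology encode precision (a site coded A is less precise than one coded A21) can be sketched in Python. The dictionary reproduces the codes and labels listed above; the helper functions and the idea of using the code length as a precision level are assumptions made for illustration, not the implementation used in the geodatabase.

```python
# Codes and labels as listed in the typology above.
TYPOLOGY = {
    "A": "Rural",
    "A1": "Rural Ind",
    "A2": "Villa",
    "A21": "Big villa",
    "A22": "Little Villa",
    "A3": "Agricultural Exploitation No Villa",
    "A4": "Transit Habitat",
    "B": "Cottage",
    "C": "Urban",
    "D": "Funeral",
    "D1": "Funeral Without Apparently Associated Habitat",
    "D2": "Funeral Associated with Habitat",
    "F": "Religious",
    "F11": "Church",
    "F12": "Chapel",
    "F13": "Monastery",
    "G": "Military",
    "H": "Cave Frequency",
}


def precision_level(code):
    """Length of the code: 1 for a general class, up to 3 for a subtype."""
    if code not in TYPOLOGY:
        raise KeyError(f"unknown typology code: {code}")
    return len(code)


def ancestors(code):
    """More general classes a code belongs to, nearest first.

    Intermediate codes that are not defined in the typology
    (for example 'F1') are skipped.
    """
    if code not in TYPOLOGY:
        raise KeyError(f"unknown typology code: {code}")
    found = []
    prefix = code[:-1]
    while prefix:
        if prefix in TYPOLOGY:
            found.append(prefix)
        prefix = prefix[:-1]
    return found
```

With this sketch, `precision_level("A")` is lower than `precision_level("A21")`, and `ancestors("A21")` yields the villa class followed by the rural class, so a query for all rural sites could also collect their more precise subtypes.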

The GDB Data Model: From the Archaeological Entity to the Archaeological Functional Unit
Before creating the GDB, it is necessary to carry out a process of theoretical reflection about the characteristics of my data and how to carry out the data model. This is what is known as "conceptual data modelling" [83]. It allows me to sort the data according to levels and the relationships between them in order to respond to the issues raised in the most appropriate way. However, I must be clear that however accurate and precise the spatial database may be, it does not represent the real world and will be an abstraction [84] or simplification [85] of the reality.
The design of a database is broken down into the conceptual, logical, and physical models, steps that I must follow to develop a good GIS. Given the importance I attach to the theoretical conception of the data model, I believe it is appropriate to set out the entire process of reflection that I followed up to the development of the final geodatabase, since the research maturity that I have acquired since I began the research in 2014 made me modify the initial approach on several occasions (One of the main problems has been the lack of continuity throughout the thesis, due to periods of work outside the doctoral thesis in order to finance my research. Because the research underwent continual pauses and could not receive my full dedication, the reflection process was interrupted on numerous occasions and, with this, the theoretical maturation was slower). Following the suggestions of some professors, mainly K. C. Bruhn (Professor of geo-informatics at the University of Mainz (Germany), specialist in digital humanities. He was one of the speakers and mentor teachers who participated in the Archaeology workshop at Casa de Velázquez in 2017), I will develop my approach trying to show the theoretical evolution that my research has undergone from its beginning until today.
The first step is to answer the three basic questions of the problem to be dealt with: "what?", "where?" and "when?", that is, the topic to be analysed and its spatial and temporal dimensions (Cf. Figure 8). To these questions I should add "how?" (the tools and methodology I am going to use) and "what if?" (situations that may come later, for example, adding more data, incorporating other information, etc.).
In my case, the answer to those questions was a priori easy. "What?": the rural habitat during Late Antiquity; "where?": between the Ebro and the Pyrenees chain; "when?": between the 3rd and 7th centuries AD; "how?": by connecting spatial data with non-spatial data in order to manage, process and analyze the data. The question "what if?" first arose with the isolated findings (should I incorporate them or not?) and, later, with the structures and materials (what if I wanted to include the inventories of each site in the future?).
The second issue was to be aware of the characteristics of the data set (mainly based on previous data collection). After a first assessment, the situation was as follows: around 350 very varied elements, markedly heterogeneous and of very diverse origin (administration, reports, monographs, etc.), at different scales, very incomplete and coming from both survey and excavation interventions.
Based on these questions and their corresponding answers, I started to sketch the different possibilities, checking which elements could help to answer my main problem: to study the evolution of the late population while evaluating the quality of the data. There are three fundamental aspects to reflect upon: the sites, the chronology and the levels of certainty, exactitude, and accuracy. From the first drafts I started to elaborate different models. There are three steps in the development of a data model (Cf. Figure 9): firstly, the specification of the content, that is to say, identifying the real-world information that I need and that I believe to be sufficient to understand the phenomenon under study; secondly, establishing the logical model, transcribing the elements evoked during the previous phase into a database representation language (classes, attributes and relationships between the elements); finally, the physical model, that is to say, implementing the logical model in a geographic information system (the creation of the tables themselves, with all their fields, types and relationships).
The first physical model I developed consisted of three alphanumeric tables and one spatial table ("sites"). It was a fairly simple construction in which I took into account four elements: the "sites", their "phases", and the documented "materials" and "structures". The relationships are 1-N, running from "sites" to "phases" and from "phases" to "structures" and "materials". After testing this design, I soon realized that it was not feasible for two reasons. On the one hand, the large amount of data, but above all its partiality, made it unfeasible, in the time available for the thesis, to review the material of all the sites and include a precise inventory for each of their phases. On the other hand, this model could not accommodate isolated findings: the site table was too complex, with many variables that could not be completed for most isolated discoveries.
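The 1-N chain of this first model can be sketched as follows. This is only an illustrative Python sketch, not the implementation used in the thesis (which relied on ArcGIS and Access); all class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Material:
    kind: str       # hypothetical material category
    quantity: int   # a precise inventory requires a count per phase


@dataclass
class Structure:
    description: str


@dataclass
class Phase:
    label: str
    # 1-N: one phase holds several structures and several materials
    structures: List[Structure] = field(default_factory=list)
    materials: List[Material] = field(default_factory=list)


@dataclass
class Site:
    name: str
    x: float  # placeholder coordinates; the only spatial table is "sites"
    y: float
    # 1-N: one site holds several phases
    phases: List[Phase] = field(default_factory=list)


def count_inventory_rows(site: Site) -> int:
    """Number of material records that would have to be reviewed for a site."""
    return sum(len(phase.materials) for phase in site.phases)


# Hypothetical example record
site = Site(name="Example villa", x=0.0, y=0.0)
phase = Phase(label="4th century")
phase.structures.append(Structure("wall"))
phase.materials.append(Material("fine ware", 12))
site.phases.append(phase)
```

Even in this toy form, every phase of every site needs its own material inventory, which is the workload that made the design unfeasible, and an isolated finding would have to be forced into the full `Site` record.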
The first problem was solved in a relatively simple way: I chose to dispense with the "materials" table (second model), simply incorporating some "yes/no" boxes for the different types of materials found, without taking their quantity into account. This means losing a level of information, but it allows me to concentrate on a better analysis of the archaeological sites. For the problem of isolated finds, I did not have such a quick solution, which led me to rethink the structure of the database. Meanwhile, I started to think about the "structures" table. On the one hand, the non-spatialization of the structures perhaps meant losing information of interest for tracing the evolution of the sites, but spatializing them was not feasible in the time available. On the other hand, including them without spatialization (as they were up to that moment) would not provide me with the information I was looking for. In addition, the bias of the data for the late antique structures, regarding not only their chronology but also their function, led me to seek a new approach to this part of the thesis.
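The trade-off of the "yes/no" boxes can be illustrated with a short sketch. It is written in Python for illustration only; the material categories and the counts are hypothetical and are not taken from the GDB.

```python
from typing import Dict

# Hypothetical material categories, one "yes/no" box per category
MATERIAL_TYPES = ("pottery", "glass", "metal", "coins")


def to_presence_flags(counts: Dict[str, int]) -> Dict[str, bool]:
    """Reduce an inventory of counts to presence/absence flags."""
    return {kind: counts.get(kind, 0) > 0 for kind in MATERIAL_TYPES}


# Two hypothetical sites with very different quantities of material
flags_a = to_presence_flags({"pottery": 240, "coins": 3})
flags_b = to_presence_flags({"pottery": 1, "coins": 1})

# Both sites end up with identical records: quantity is no longer visible
same_record = flags_a == flags_b
```

Here `same_record` is true, which is exactly the level of information that is given up in exchange for a table that can be completed for every site.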
The model had been significantly reduced, but the problem of structures and isolated findings remained. Together with the directors, I asked myself several questions:

• Is it feasible to record all the structures for each phase or do I not have enough data? Do I have the material time to do so? Will entering them in the database provide me with significant [...]
The structure of a database requires time, reflection and, above all, trial and error. The conception took me more than 12 months and its elaboration 3-4 months. This difference shows that almost 80% of the creation of a database lies in its design. Good prior reflection can avoid later changes that slow down the work, as well as speed up the entry of information. A correctly designed database gives access to up-to-date and accurate information.
It was precisely the last two questions that gave me the clues to follow for this part of the model. The situation at this point was as follows: I had removed the materials table, reflected on the structures and their treatment in the thesis, and was looking for a way to dissociate the isolated findings from the sites while still being able to analyse both together. This led me to rethink the database and data model. I decided to reflect on the terms used, but also on the structure and relationship between the different elements. From this process emerged the idea of spatialising not the sites but "spatial entities". That is, until that moment I had been spatialising the sites table, something necessary for the subsequent analyses, but I could establish an intermediate step that deals with "sites" and "isolated findings" together at a spatial level and dissociates them at the next stage of the model in specific alphanumeric tables. I needed to find my own concepts for this, so I decided to turn not only to archaeology, but also to geography and computer science in search of the most appropriate terms.
Finally, I chose to maintain "archaeological site" for a coherent set of remains forming a functional and/or chronological unit (villa, medieval abbey, Neolithic dolmen, ...) [87] and "isolated finding" for minimal archaeological evidence without association with other materials and outside of an archaeological context [87], while "archaeological entity" groups the two. This allowed me to georeference all the elements, including the milestones, in the same spatial file, while recording the specific particularities of each type of entity in a separate table. I had solved one of the problems, but I still had to articulate this new model with the "phases" and "structures" (which I left under reflection in the previous lines). In the end, I opted to consider phases only for the "sites", including a specific chronology in the table of "isolated findings" which, normally consisting of one or two elements, present a more concrete chronology without phases.
For the structures, I chose to stay at a higher level, which I called the "functional archaeological unit" and defined as a set of data which, within a site, presents the same functionality (funerary, residential, artisanal, etc.) on a macro scale. I am aware of the arbitrariness of this option, but it gives me a more detailed level of information without having to descend to the level of individual structures, which would require me to document both isolated walls and unidentified buildings.
Once the model was created and tested, I configured my final GDB (cf. simplified physical data model). It is composed of one point-geometry vector file, or shapefile (This is a macro study of the population, so I chose to work with points and not polygons, but it is possible to evolve this table into polygons if in the future I want to use the database, for example, for administrative issues of delimitation of protection areas), called "Archaeological entities (EA)", in the ETRS89 reference system (a reference system linked to the stable part of the European continental plate; it is the official coordinate system in Spain), WGS84 compatible, and two main alphanumeric tables, "sites (complex AE)" and "isolated findings (simple AE)". The relationship between the first and the other two is 1-1 through the field Num_EA, the primary key. That is, the number assigned to each entity is unique, so that the relationships between tables can be established without data duplication. For example, if an archaeological entity has the number 025 in the table "archaeological entities", this identifier will be identical in the table "sites", and no other record with that number will exist in "sites", "isolated findings" or "archaeological entities". The code is a correlative Arabic number that does not consider the location of the element and is assigned according to the order of data entry; the numbering is therefore random and discontinuous.
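The 1-1 link through Num_EA can be illustrated with a minimal sketch. It uses SQLite through Python's sqlite3 module purely for illustration, whereas the GDB itself is a shapefile with Access tables; apart from Num_EA, all table and column names, the placeholder coordinates and the example records are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE archaeological_entities (
    Num_EA INTEGER PRIMARY KEY,   -- unique entity number
    x REAL NOT NULL,              -- placeholder point coordinates
    y REAL NOT NULL
);
-- 1-1: the primary key of each child table is also a foreign key
CREATE TABLE sites (
    Num_EA INTEGER PRIMARY KEY REFERENCES archaeological_entities(Num_EA),
    name TEXT
);
CREATE TABLE isolated_findings (
    Num_EA INTEGER PRIMARY KEY REFERENCES archaeological_entities(Num_EA),
    description TEXT
);
""")

# Hypothetical entity 025, recorded as a site
conn.execute("INSERT INTO archaeological_entities VALUES (25, 0.0, 0.0)")
conn.execute("INSERT INTO sites VALUES (25, 'hypothetical site')")


def try_insert(sql: str) -> bool:
    """Return True if the insert is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# A second "sites" record with the same number is rejected (primary key)
duplicate_ok = try_insert("INSERT INTO sites VALUES (25, 'second record')")
# A site whose number does not exist among the entities is rejected (foreign key)
orphan_ok = try_insert("INSERT INTO sites VALUES (99, 'no entity')")
```

Both inserts fail in this sketch. Note that this simple schema does not by itself prevent the same number from appearing in both "sites" and "isolated_findings"; that rule would need an additional check.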
On the other hand, I have two secondary tables. The "Archaeological Functional Units" table is linked to the "sites" table in a 1-N relationship, that is, a site can have one or several functional units. Likewise, the "phases" table is linked to the "sites" table in a 1-N relationship, because a site can have several phases. This is the general logical data model, i.e., the main GDB tables. Other tables are added to them: the "isolated findings" table is related 1-1 to the "milestones" and "other findings" tables. This distinction allows me to include specific fields for milestones, an epigraphic element that is fundamental to the articulation of the territory and the roads. In addition, a series of tables is included that allows me to manage vagueness (Cf. Figure 10).
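The 1-N relationships of the secondary tables can be sketched in the same way. Again, this is an illustrative SQLite example in Python rather than the Access implementation; apart from Num_EA, the table and column names and all records are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE sites (Num_EA INTEGER PRIMARY KEY, name TEXT);
-- 1-N: many phases may point to the same site
CREATE TABLE phases (
    phase_id INTEGER PRIMARY KEY,
    Num_EA INTEGER NOT NULL REFERENCES sites(Num_EA),
    label TEXT
);
-- 1-N: many functional units may point to the same site
CREATE TABLE functional_units (
    unit_id INTEGER PRIMARY KEY,
    Num_EA INTEGER NOT NULL REFERENCES sites(Num_EA),
    unit_function TEXT
);
""")

conn.execute("INSERT INTO sites VALUES (25, 'hypothetical site')")
conn.executemany(
    "INSERT INTO phases (Num_EA, label) VALUES (?, ?)",
    [(25, "phase A"), (25, "phase B")],
)
conn.executemany(
    "INSERT INTO functional_units (Num_EA, unit_function) VALUES (?, ?)",
    [(25, "residential"), (25, "funerary"), (25, "artisanal")],
)

# Count phases and functional units per site
rows = conn.execute("""
SELECT s.Num_EA,
       (SELECT COUNT(*) FROM phases p WHERE p.Num_EA = s.Num_EA),
       (SELECT COUNT(*) FROM functional_units u WHERE u.Num_EA = s.Num_EA)
FROM sites s
""").fetchall()
```

The query returns one row per site with its number of phases and of functional units, the kind of summary on which descriptive statistics by site can then be built.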
By contrast, once the database is built, users can only see one level, while the rest (structure, management, etc.) remains at an abstract level that is not visible, showing only the friendliest part of the database. The interface of the GDB was built with Access, aiming to make it "friendly" and to facilitate both searching and the printing of results through reports. It consists of an input screen with several buttons that facilitate data entry. From this first screen I can access the registers of sites, isolated findings and milestones. Archaeological entities can only be modified from ArcMap, but I can consult the data from the Access interface. Finally, from this screen I can access three subforms, linked to the previous table (Entities): "sites", "isolated findings" and "milestones", which I show below.

Conclusions
The methodology used was a challenge for research like the one proposed, as it tried to objectify, homogenize, and evaluate information derived from other authors' interpretation of the data. It is not possible to cover all aspects, but it is possible to reach an acceptable degree of standardization and homogenization that allows me to work with the archaeological information as a whole. In the methodological approach of my thesis, and after having applied it to all my data, I can distinguish several strengths and several aspects to be improved in the future.
There are several weak points. The standardization of data leads to a loss of information: to be able to work at the same level, I necessarily have to renounce certain things, such as managing several scales of work (in my case I have used the macro scale) or having all the detailed stratigraphic information for each site (I have had to simplify the phases and evolution of the enclaves). On the other hand, I have evaluated the quality of the archaeological data, but I have not carried out this process for the geographical data. A future challenge is to be able to manage the vagueness of both together. Another aspect that I need to improve is the treatment of duplicates in the recording system, since working with such a large amount of data usually leads to the repetition of certain information. Better automated identification of the data would allow me to reduce these errors.
Furthermore, the software I have used (Access and ArcGIS) allows for individual work but does not support simultaneous collective work (for example, a collaborative project to improve the method among several researchers from different universities). A more agile registration of information through a web interface would allow lighter data entry and would avoid the duplication of data in the related tables. Nevertheless, the database can be operated by third parties in both ArcGIS and QGIS. In addition, for our postdoctoral project we will try to develop a more interoperable database.
Finally, I have only performed simple descriptive statistics, which has sometimes limited the analyses and results. Performing relational factor analyses will open new avenues of work. Moreover, my proposal admits a great number of possible combinations, using either the site data alone or the site data together with their quality. In my work I have carried out some of them, but the variations are almost unlimited.
As for the most remarkable aspects, I list the following. The management of vagueness as a method for assessing data quality is a line of research that I must continue. Little explored in the archaeological field, it has allowed me to make a prior evaluation of my data from three perspectives: location, dating and function. Although for the moment it has only been applied to my case study, and I have not been able to verify and evaluate how it works in other projects, I believe that it is a good method for carrying out global settlement studies. Likewise, the procedure allows me to standardize the information of a substantial corpus that contains data from many different sources. Finally, combining the evaluation of data quality with spatial analysis allows me to make a more complete and honest study of the archaeological reality, since the calculations can be made under different scenarios.