A Mexican Enhanced Dataset of Pollutant Releases and Transfers (2004 to 2022) with IARC Cancer Classifications
Abstract
1. Introduction
2. Materials and Methods
2.1. Data Acquisition
- Facilities dataset: The obtained facilities dataset () comprises 19 CSV files in raw format that require data preparation, imputation, and consolidation into a single, homogenized dataset. A facility is an organization (public or private) with some economic activity (commercial, industrial, services, etc.) that produces pollutant releases (to air, water, or soil) or transfers (to sewerage, co-processing, final disposal, incineration, recycling, reuse, treatment, and other destinations) of substances. For this work, we employed a six-stage methodology to address the inconsistencies identified in the dataset (duplicated data, different values for the same field, or unaccepted symbols). Figure 2 shows how the dataset passes through the six processing stages () and produces a new version of the dataset (called ) for each stage until the consolidated dataset is obtained (). In this pipeline, i denotes the stage of the data curation life cycle.
- Pollutant Releases and Transfers dataset: The obtained pollutant releases and transfers dataset () reports the amounts of pollutant releases and transfers by facilities from 2004 to 2022. It contains 19 CSV files in raw format that require data preparation, imputation, and consolidation into a single, integrated, and homogenized dataset. It consists of 20 fields, as described in Table A2. For this curation, we applied a five-stage methodology to address the inconsistencies identified in this dataset. Figure 3 shows how the dataset passes through the five processing stages () and produces a new version of the dataset (called ) for each stage until the consolidated dataset is obtained (). The i index corresponds to the number of the processing stage.
2.2. Data Curation
2.2.1. Field Homogenization
- Facilities Stage : Analyzing the number and names of the fields in the CSV files revealed differences among the annual reports: some have missing fields, a different number of fields, or different names for the same field. By comparing these differences, we identified three distinct formats and created three corresponding groups (, , and ). Each group contains the annual files that share the same format.
- Pollutant Releases and Transfers Stage : We also analyzed the fields of the 19 Pollutant Releases and Transfers annual reports and found the same kinds of discrepancies as in the facilities dataset. By comparing these discrepancies, we identified three distinct formats and created three corresponding groups (, , and ). Each group contains the annual files that share the same format. By comparing these groups, we detected missing fields. Table 2 shows the number of missing fields per group and which fields were incorporated with a default value. Inserting one missing field of into the dataset represented a modification of 82.4% (141,629 records updated) of the 171,860 total records, with representing a modification of 9.11% (15,667 records updated) and representing a modification of 8.47% (14,564 records updated).
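The field homogenization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the group labels, the raw and target field names, the rename maps, and the default value are all hypothetical.

```python
# Hypothetical target schema and per-group rename maps (not the real RETC field names).
TARGET_FIELDS = ["nra", "substance", "air", "unit"]
RENAME_BY_GROUP = {
    "group_a": {"NRA": "nra", "Sustancia": "substance", "Aire": "air", "Unidad": "unit"},
    "group_b": {"nra": "nra", "sustancia": "substance", "aire": "air"},  # lacks the unit field
}
DEFAULT_VALUE = ""  # assumed placeholder for a field that a group does not report


def homogenize(record, group):
    """Rename a raw record's fields and add missing target fields with a default value."""
    rename = RENAME_BY_GROUP[group]
    renamed = {rename[k]: v for k, v in record.items() if k in rename}
    return {field: renamed.get(field, DEFAULT_VALUE) for field in TARGET_FIELDS}


row_a = homogenize({"NRA": "X1", "Sustancia": "Benceno", "Aire": "1.5", "Unidad": "kg"}, "group_a")
row_b = homogenize({"nra": "X2", "sustancia": "Plomo", "aire": "0.2"}, "group_b")
```

After this step every record exposes the same fields in the same order, which is the precondition for concatenating the annual files.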
2.2.2. File Concatenation
- Facilities dataset Stage : We read the 19 CSV files (), assuming they have the same field names and order, and concatenated them. The 2004 file served as the base, and the 2005 to 2022 files were appended to it in turn. As a result, we obtained a single file () containing 50,908 records, which could be used as a dataset.
- PRT dataset Stage : Once the 19 dataset files () had the same field names and order, we concatenated them as we did for the facilities dataset. We obtained the dataset, which can serve as an independent dataset and contains 171,860 records.
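The concatenation step can be illustrated with the sketch below. It uses two invented in-memory files in place of the 19 annual CSV files, and it records the reporting year on each row; whether the year field is added at this stage is an assumption, as are the field names.

```python
import csv
import io

# Two tiny in-memory stand-ins for the annual CSV files (contents are invented).
ANNUAL_FILES = {
    2004: "nra,facilityname\nA1,Plant one\n",
    2005: "nra,facilityname\nA1,Plant one\nB2,Plant two\n",
}


def concatenate(files_by_year):
    """Stack annual files in year order, checking that every header matches the base file."""
    combined, expected_header = [], None
    for year in sorted(files_by_year):
        reader = csv.DictReader(io.StringIO(files_by_year[year]))
        if expected_header is None:
            expected_header = reader.fieldnames  # the earliest year acts as the base
        elif reader.fieldnames != expected_header:
            raise ValueError(f"{year}: header differs from the base file")
        for row in reader:
            row["year"] = year  # assumed: keep track of the reporting year
            combined.append(row)
    return combined


records = concatenate(ANNUAL_FILES)
```

Failing loudly on a header mismatch makes the "same field names and order" assumption explicit instead of silently misaligning columns.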
2.2.3. Category Homogenization
- State name: There are 32 states in Mexico. We analyzed the state name field in the and datasets. However, the dataset contains 64 values for the state name field because each state name appeared in two formats: uppercase and title case (e.g., DURANGO and Durango). In some cases, the state names were longer (e.g., COAHUILA as Coahuila de Zaragoza) or had been updated (e.g., Distrito Federal as Ciudad de Mexico). We defined a consistent, official catalog of states using the state names published by the National Institute of Statistics and Geography (INEGI, by its Spanish acronym)4. During , we updated the values of the state name field in following the process described previously. In this process, we updated 164,451 records (95.68%) of the 171,860 total records for this field. Considering the total of 21 fields, this update accounted for 4.55% of the dataset.
- Municipality name: In Mexico, each state is divided into municipalities (political and legal jurisdictions). The state with the fewest municipalities is Baja California Sur (5), and that with the most is Oaxaca (570)5. In the dataset (), the values of the municipality name field appear in multiple versions: uppercase, title case, abbreviations, extra blank spaces, and names with or without accents. To tackle this inconsistency, we created a comprehensive, official municipality name catalog for each state based on INEGI data. We then matched this catalog against the municipality name field in after removing blank spaces and special characters and converting the content to lowercase, and updated the values in with the corresponding catalog entry. However, some municipality names did not match and required a manual update. In this homogenization process, we updated 16,580 records (32.56%) of the original 50,908 records. As in the state name field, the field now contains content similar to the original values but in a homogenized, usable format for analysis.
- Sector name: We detected several typos and extra blank spaces between characters in the values in the sector name field. To solve these inconsistencies, we created a catalog with the 11 official sectors’ names of the Unique Environmental License (LUA, by its Spanish acronym)6 [39] managed by Semarnat, as shown in Table 3.
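The catalog matching used for state and municipality names can be sketched as below. This is an illustrative approximation of the normalization described above (lowercasing, removing blank spaces, accents, and special characters); the three-entry catalog and the manual override table are hypothetical stand-ins for the INEGI catalog and the manual updates.

```python
import unicodedata

# Small illustrative catalog; the actual one would come from INEGI.
CATALOG = ["Durango", "Coahuila de Zaragoza", "Ciudad de México"]
# Hypothetical manual overrides for values that cannot be matched automatically.
MANUAL = {"distritofederal": "Ciudad de México", "coahuila": "Coahuila de Zaragoza"}


def key(name):
    """Lowercase, strip accents, and keep only letters and digits."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return "".join(c for c in no_accents.lower() if c.isalnum())


LOOKUP = {key(name): name for name in CATALOG}


def homogenize_name(raw):
    """Return the catalog spelling, or None when the value needs manual review."""
    k = key(raw)
    return LOOKUP.get(k) or MANUAL.get(k)
```

Values that return `None` correspond to the names that did not match and required a manual update.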
2.2.4. Incorporation of Identification Codes
- Facilities dataset Stage : We added the state and municipality geospatial codes as fields ( and , respectively), producing the dataset (described in Table A4). This addition was also crucial for the entire integration process because the cve_ent and cve_mun fields became common across the facilities and pollutant release and transfer datasets; thus, they relate these datasets to the accurate geolocation of state and municipality names. At this point, the facilities dataset contains 27 fields. In this manner, the dataset produced by this process can be integrated with other public datasets by intersecting the geospatial codes.
- PRT dataset Stage : We incorporated a new field with the state geolocation code called (described in Table A4) by using the geolocation codes of INEGI to identify regions in the Mexican territory. As a result, the dataset () now contains 22 fields.
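As a rough sketch of how the geospatial codes enable integration, the example below attaches `cve_ent` to a record by state name and then links a second table through the same code. The three state codes shown follow the INEGI catalog, but the external table, its `region_label` attribute, and the record contents are invented for illustration.

```python
# Illustrative subset of the INEGI state codes (the full catalog has 32 entries).
STATE_CODES = {"Aguascalientes": "01", "Durango": "10", "Oaxaca": "20"}

# Hypothetical external table keyed by the same code; the values are invented.
EXTERNAL = {"10": {"region_label": "example-north"}}


def add_state_code(record):
    """Return a copy of the record with a cve_ent field (None if the state is unknown)."""
    return {**record, "cve_ent": STATE_CODES.get(record["statename"])}


def link_external(record, external):
    """Attach external attributes that share the record's cve_ent, if any."""
    return {**record, **external.get(record["cve_ent"], {})}


row = link_external(add_state_code({"statename": "Durango", "air": 1.0}), EXTERNAL)
```

Keeping the codes as zero-padded strings preserves values such as "01", which would lose the leading zero if stored as integers.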
2.2.5. Facility Location
- Coordinate Transformation ()
- Geolocation Reassignment ()
- Measurement Unit Homogenization ()
- To convert grams (categories ‘g’ and ‘g/año’) to ‘Kg/año’, we divided the fields’ value by 1000 and updated the category value to ‘Kg/año’. In this manner, we updated the values of 315 records (0.18%) and applied the corresponding conversion to the pollutant release fields affecting the following records: 313 for and 2 for .
- To convert tons (categories ‘ton’ and ‘ton/año’) to ‘Kg/año’, we multiplied the fields’ value by 1000 and updated the category value to ‘Kg/año’. We updated the values of 56,122 records (32.65%). We applied the corresponding conversion to the pollutant release fields affecting the following records: water (31,724), air (22,026), soil (591), sewerage (1675), finaldisposition (1356), incineration (15), others (384), recycling (713), sewageforreuse (104), and treatment (368).
- We updated the category values of ‘Kg’, ‘kg’, and ‘kg/año’ to ‘Kg/año’ (Kg/year). This affected 115,423 records (67.16%); because these values were already in kilograms, no numeric conversion was applied to the other fields.
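The three unit rules above can be expressed compactly as a lookup of scale factors. The sketch below is illustrative only: the quantity field names are hypothetical, and it assumes that missing quantities are stored as `None`.

```python
TARGET_UNIT = "Kg/año"
# (multiplier, divisor) pairs that express each reported unit in kilograms.
SCALE = {
    "g": (1, 1000), "g/año": (1, 1000),
    "ton": (1000, 1), "ton/año": (1000, 1),
    "Kg": (1, 1), "kg": (1, 1), "kg/año": (1, 1), "Kg/año": (1, 1),
}
QUANTITY_FIELDS = ["air", "water", "soil"]  # hypothetical field names


def to_kg_per_year(record):
    """Convert the quantity fields of one record to kilograms per year."""
    multiplier, divisor = SCALE[record["unit"]]  # a KeyError flags an unexpected unit
    converted = dict(record)
    for field in QUANTITY_FIELDS:
        if converted.get(field) is not None:
            converted[field] = converted[field] * multiplier / divisor
    converted["unit"] = TARGET_UNIT
    return converted


grams = to_kg_per_year({"unit": "g/año", "air": 500.0, "water": None, "soil": 0.0})
tons = to_kg_per_year({"unit": "ton", "air": 2.5, "water": 1.0, "soil": None})
```

As a consistency check on the reported counts, the three rules cover 315 + 56,122 + 115,423 = 171,860 records, which is the full dataset.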
2.2.6. Incorporated New Fields
2.2.7. Inconsistency Detection
- The NRA code must not include blank spaces. Given this restriction, we corrected 16 records by removing excess blank spaces in the dataset. Additionally, we detected blank spaces and removed them from .
- The original RETC dataset must have one record per facility per year. However, by grouping records by the and fields in , we identified facilities with duplicate records per year. These duplicated records may have been inserted in error instead of the real facility record, so they were removed from the dataset to prevent future conflicts during integration. As a result, at this point, the dataset has 50,906 records. The 36 fields of the resultant facilities dataset () are described in Table A5.
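Both checks can be sketched in a few lines. The example assumes that the grouping key is the pair (`nra`, `year`) and that every record in a duplicated group is dropped; keeping one record per group would be an equally plausible reading of the text.

```python
from collections import Counter


def clean_nra(nra):
    """Remove every whitespace character from an NRA code."""
    return "".join(nra.split())


def drop_duplicated_facility_years(records):
    """Drop all records whose (nra, year) pair occurs more than once."""
    cleaned = [{**r, "nra": clean_nra(r["nra"])} for r in records]
    counts = Counter((r["nra"], r["year"]) for r in cleaned)
    return [r for r in cleaned if counts[(r["nra"], r["year"])] == 1]


# Invented sample: the first two rows collide once the blank space is removed.
sample = [
    {"nra": "ABC 123", "year": 2010},
    {"nra": "ABC123", "year": 2010},
    {"nra": "ABC123", "year": 2011},
]
kept = drop_duplicated_facility_years(sample)
```

Cleaning the codes before counting matters, because two records that differ only by a stray space would otherwise not be recognized as duplicates.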
2.2.8. IARC Substances Dataset
- 1: Carcinogenic to humans (129 agents);
- 2A: Probably carcinogenic to humans (96 agents);
- 2B: Possibly carcinogenic to humans (321 agents);
- 3: Not classifiable as to its carcinogenicity to humans (499 agents).
2.3. Dataset Integration
- Integration of pollutant releases and transfers and IARC substances: This process joins the last version of the pollutant releases and transfers dataset () with the IARC substances dataset () to produce a new dataset ( in Figure 1).
- Integration of the RETC dataset with cancer classification groups: This process joins the dataset obtained in the previous integration process () with the last version of the facilities dataset () to produce the final product ( in Figure 1): the enhanced and augmented Mexican RETC dataset with cancer classification groups.
- Integration of Pollutant Releases and Transfers and IARC Substances
- Integration of RETC Datasets with Cancer Classification Groups
Algorithm 1. Incorporation of the IARC group into the pollutant releases and transfers dataset.

Algorithm 2. Integration process for the pollutant releases and transfers with IARC cancer group classification and the facilities datasets.
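The two integration steps can be approximated by the sketch below. It is not a transcription of Algorithms 1 and 2: it assumes that the IARC group is looked up by CAS number and that releases are joined to facilities on the pair (`nra`, `year`). The field names `casnumber` and `iarc_group`, and all sample rows, are hypothetical; the only real-world fact used is that benzene (CAS 71-43-2) is in IARC Group 1.

```python
# Invented sample rows; "000-00-0" is a placeholder for a substance absent from the IARC list.
IARC = [{"cas": "71-43-2", "agent": "Benzene", "group": "1"}]
PRT = [
    {"nra": "A1", "year": 2004, "casnumber": "71-43-2", "air": 12.0},
    {"nra": "A1", "year": 2004, "casnumber": "000-00-0", "air": 3.0},
]
FACILITIES = [{"nra": "A1", "year": 2004, "statename": "Durango"}]


def add_iarc_group(prt, iarc):
    """Step 1: attach the IARC group to each release record (None if unclassified)."""
    group_by_cas = {row["cas"]: row["group"] for row in iarc}
    return [{**r, "iarc_group": group_by_cas.get(r["casnumber"])} for r in prt]


def join_facilities(prt, facilities):
    """Step 2: enrich each release record with the facility reported for the same year."""
    by_key = {(f["nra"], f["year"]): f for f in facilities}
    joined = []
    for r in prt:
        facility = by_key.get((r["nra"], r["year"]))
        if facility is not None:
            joined.append({**facility, **r})
    return joined


enhanced = join_facilities(add_iarc_group(PRT, IARC), FACILITIES)
```

The dictionary lookup in the second step relies on each facility appearing at most once per year, which is why duplicate facility records need to be resolved before integration.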
3. Data Records
- —Contains all values for all fields in clear (non-anonymized) form; however, the latitude and longitude values of each facility were replaced with the geolocation of its corresponding municipality to avoid disclosing the exact location of the facilities. Access to this version with the sensitive data can be requested by email to the corresponding author.
- —Contains all the fields but anonymizes the sensitive fields of facilities. Data from facilities were anonymized using hashing, preserving the consistency of the entire dataset. The anonymized fields (16) are nra, facility name, street name, colony name, external number, internal number, between street 1, between street 2, locality name, postal code, industrial park name, computed latitude, computed longitude, north latitude, west longitude, UTMX coordinate, and UTMY coordinate. The values for the remaining fields are presented in a clear format. This dataset is publicly available as follows:
- Dataset identifier: https://doi.org/10.5281/zenodo.17100697
- Dataset correspondence: hugogreyesa@gmail.com
- Title: RETC20042022-IARC136
- Publisher: Zenodo
- Publication year: 2025
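The hashing-based anonymization of the public version can be illustrated as follows. The choice of SHA-256, the salt, and the two-field subset are assumptions made for this sketch; the source states only that hashing was used and that consistency across the dataset was preserved.

```python
import hashlib

SENSITIVE_FIELDS = ["nra", "facilityname"]  # subset of the anonymized fields, for brevity
SALT = "example-salt"  # hypothetical; a real release would keep its salt private


def anonymize(record):
    """Replace sensitive values with a deterministic hexadecimal digest."""
    out = dict(record)
    for field in SENSITIVE_FIELDS:
        value = out.get(field)
        if value:
            digest = hashlib.sha256((SALT + str(value)).encode("utf-8")).hexdigest()
            out[field] = digest
    return out


a = anonymize({"nra": "A1", "facilityname": "Plant one", "statename": "Durango", "year": 2004})
b = anonymize({"nra": "A1", "facilityname": "Plant one", "statename": "Durango", "year": 2005})
```

Because the digest is deterministic, the same facility receives the same pseudonym in every year, so grouping and joining on the anonymized `nra` still work.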
4. Technical Validation
4.1. Exploratory Results
4.2. Carcinogenic Substances (IARC 1) Released into the Air in Mexico
5. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Datasets Description
Appendix A.1. Dataset Field Descriptors
| # | Field Name | Description |
|---|---|---|
| 1 | Main Activity | Category that identifies the activity carried out by the facility |
| 2 | Semarnat Activity | Category assigned by Semarnat to describe the activity carried out by the facility |
| 3 1 | Street name | The proper noun with which the road is identified |
| 4 | Environmental code | Environmental code assigned by Semarnat: (https://dsiappsdev.semarnat.gob.mx/formatos/DGGCARETC/FF-SEMARNAT-033-Licencia-Ambiental-Unica.docx accessed on 10 September 2025) that represents the sector or subsector required for an application for a Single Environmental License |
| 5 1 | Postal code | The five-digit code defined by the Mexican Postal Service that is used to identify and locate geographical areas of the country and the post office that is responsible for the reception and distribution of mail (postal matter) in said area |
| 6 | NAICS Code | North American Industry Classification System (NAICS) code to which the facility belongs (region) |
| 7 1 | Colony name | Proper name with which the human settlement is identified. A locality is almost attached to a city or town, which functions as part of it, which is why it generally has some of its services, equipment, and authority |
| 8 1 | UTMY Coordinate | Latitude coordinate value using the Universal Transverse Mercator (UTM) coordinate system |
| 9 1 | UTMX Coordinate | Longitudinal coordinate value using the UTM system |
| 10 | NAICS description | Description of the code found in the NAICS code field |
| 11 1 | Between Street 1 | A street that, being in front of the facility, is located on the left side |
| 12 1 | Between Street 2 | A street that, being in front of the facility, is located on the right side |
| 13 | State name | Name of the United Mexican State where the facility that released the substance is located |
| 14 1 | Latitude | The distance that exists between the facility and the equator, measured on the meridian that passes through the location point |
| 15 1 | Locality name | Name assigned to a locality by law or custom |
| 16 1 | Longitude | The distance that exists between the facility and the Greenwich meridian, measured on the parallel that passes through the location point |
| 17 | Municipality name | Name of the municipality where the facility that released the substance is located. |
| 18 1 | Facility name | Name of the facility that reports the pollutant released or transferred |
| 19 1 | Environmental registration number | Identifier that Semarnat generates for the “facility” of the Physical or Legal Person (applicant) obliged to carry out some procedure in environmental matters, which relates it to the RFC of the applicant and the municipality where it is located, unique, and non-transferable. |
| 20 1 | Internal identification number | The numerical value that identifies an address within a property |
| 21 1 | External identification number | The numerical value that identifies one or more properties on a road |
| 22 | Industrial park name | The geographically delimited surface designed especially for the settlement of the industrial plant in adequate conditions of location, infrastructure, equipment, and services, with permanent administration for its operation |
| 23 | Industrial sector name | Name of the industrial sector: (https://biblioteca.semarnat.gob.mx/janium/Documentos/Ciga/libros2009/CG009816.pdf accessed on 10 September 2025) to which the facility belongs. |
| 24 | Industrial subsector name | Name of the industrial subsector to which the facility belongs |

| # | Field Name | Description |
|---|---|---|
| 1 | Emission of the substance to water | Substance in any physical state released, directly or indirectly, to water |
| 2 | Emission of the substance to air | Substance in any physical state released, directly or indirectly, to air |
| 3 | Emission of the substance to soil | Substance in any physical state released, directly or indirectly, to soil |
| 4 | State name | Name of the United Mexican State where the facility that released the substance is located |
| 5 | Substance group | Group that identifies the substance |
| 6 | Municipality name | Name of the municipality where the facility that released the substance is located |
| 7 | Substance name | Name of the substance emitted or transferred by the facilities subject to reporting. This substance must be found within the list of 200 substances of interest for the NOM-165-Semarnat-2013 |
| 8 2 | Facility name | Name of the facility that reports the pollutant release |
| 9 | Name of the industrial sector | Name of the industrial sector (https://biblioteca.semarnat.gob.mx/janium/Documentos/Ciga/libros2009/CG009816.pdf accessed on 10 September 2025) to which the facility belongs |
| 10 | CAS Number | A unique number assigned by the Chemical Abstracts Service (CAS), a division of the American Chemical Society, to every uniquely identifiable substance |
| 11 2 | Environmental registration number | Identifier that Semarnat generates for the “facilities” of the Physical or Legal Person (applicant) obliged to carry out some procedure in environmental matters, which relates it to the RFC of the applicant and the municipality where it is located, unique, and non-transferable |
| 12 | Substance transfer to final disposal | Transfer of substances to a site that is physically separated from the facility that generated them to deposit or permanently confine waste in sites and facilities whose characteristics allow for the prevention of their release into the environment and the consequent effects on the health of the population and the ecosystems and their elements |
| 13 | Substance transfer to incineration | Transfer to reduce the volume and decompose or change the physical, chemical, or biological composition of solid, liquid, or gaseous waste through thermal oxidation in which all combustion factors, such as temperature, retention time, and turbulence, can be controlled to achieve efficiency and effectiveness and meet pre-established environmental parameters. This definition includes pyrolysis, gasification, and plasma but only when the combustible byproducts generated in these processes are subjected to combustion in an oxygen-rich environment |
| 14 | Substance transfer to others | Transfer labeled as other |
| 15 | Substance transfer to recycling | Transfer of substances to transform waste through processes that allow its value to be restored, thereby avoiding its final disposal if this restitution favors energy and raw material savings without harm to health, ecosystems, or their elements |
| 16 | Substance transfer to treatment | Transfer of substances to carry out physical, chemical, biological, or thermal procedures, through which the characteristics of the waste are changed and its volume or danger is reduced |
| 17 | Substance transfer to sewage | Transfer of substances to discharge, infiltrate, deposit, or inject wastewater into a receiving body or sewer |
| 18 | Substance transfer to reuse | Substances are transferred for using previously used material or waste without a transformation process |
| 19 | Substance transfer to co-processing | Transfer of substances for their environmentally safe integration of waste generated by an industry or known source as an input to another production process |
| 20 | Measure unit | Measurement unit used to report the release of a pollutant substance |

| # | Field | Description |
|---|---|---|
| 1 | CAS No. | A unique number assigned by the Chemical Abstracts Service (CAS), a division of the American Chemical Society, to every uniquely identifiable substance |
| 2 | Agent | Name of the substance |
| 3 | Group | Represents the group assigned to the substance by the IARC based on the existing scientific evidence for carcinogenicity |
| 4 | Volume | Edition of the monograph where the substance was included |
| 5 | Volume publication year | Represents the publication year of the IARC monographs where the substance was included |
| 6 | Evaluation year | Year when the IARC collected information about the substance to evaluate it and determine its group |
| 7 | Additional Information | Contains additional information to justify the assigned group or comments to clarify updates of the IARC group value |
Appendix A.2. Fields in the Original Datasets
| Field | Field Name in Dataset | Description |
|---|---|---|
| Year | year | Represents the year of publication of the RETC dataset |
| State code | cve_ent | A two-digit code that identifies the geostatistical Mexican state in which the facility is located (values from 01 to 32). This code was incorporated based on the state’s name and the key assigned by the INEGI to identify each Mexican state uniquely |
| Municipality code | cve_mun | A three-digit code that identifies the geostatistical municipality in which the facility is located (values from 001 to 553; not all 570 municipalities have RETC registers). This code was incorporated based on the municipality’s name and the key assigned by the INEGI to identify each municipality in Mexico. |
| Are the location values in DMS format? | indmsformat | A boolean value that is true if the latitude and longitude values were captured in the Degree Minutes and Seconds (DMS) format and false otherwise, used to identify which records require the Decimal Degrees (DD) transformation process |
| Do the location values have a default value? | dmsdefaultvalue | A boolean value that is true if the latitude and longitude contain invalid values or a default value and false otherwise, used to identify which records require the Decimal Degrees (DD) transformation process |
| Are the location values in Mexico? | inmexico | A boolean value that is true if the latitude and longitude are located in the Mexican territory () and false otherwise. It is used to identify which records require the set default location value process. |
| Are the location values in their corresponding state? | instate | A boolean value that is true if the latitude and longitude are located in the Mexican state (), as determined using the state code and the INEGI Mexican states shape/polygon, and false otherwise. It is used to identify which records require the set default location value process. |
| Are the location values in their corresponding municipality? | inmunicipality | A boolean value that is true if the latitude and longitude are located in the municipality () of the Mexican state, as determined using the municipality code and the INEGI Mexican municipality shape/polygon, and false otherwise. It is used to locate which records require the set default location value process. |
| Computed latitude | computedlatitude | The result of applying decimal degree transformation to the latitude if the value of the field is true; otherwise, it contains the original latitude registered in the RETC dataset. |
| Computed longitude | computedlongitude | The result of applying decimal degree transformation to the longitude if the value of the field is true; otherwise, it contains the original longitude registered in the RETC dataset. |
| Obtained municipality latitude | municipalitylatitude | The latitude value obtained by querying the municipality and state names to the Google Maps API if the value of is true; otherwise, the valid latitude is captured or transformed |
| Obtained municipality longitude | municipalitylongitude | The longitude value obtained by querying the municipality and state name to the Google Maps API if the value of is true; otherwise, the valid longitude is captured or transformed |
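The Decimal Degrees (DD) transformation mentioned for the `indmsformat`, `computedlatitude`, and `computedlongitude` fields follows the standard relation DD = degrees + minutes/60 + seconds/3600. The sketch below assumes that the DMS components have already been parsed into numbers and that the hemisphere is known; how the raw RETC values encode them is not specified here.

```python
def dms_to_dd(degrees, minutes, seconds, hemisphere="N"):
    """Convert degrees, minutes, and seconds to signed decimal degrees."""
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in [0, 60)")
    total_seconds = abs(degrees) * 3600 + minutes * 60 + seconds
    value = total_seconds / 3600
    # South latitudes and west longitudes are negative in decimal degrees.
    return -value if hemisphere.upper() in ("S", "W") else value


latitude = dms_to_dd(19, 30, 0, "N")
longitude = dms_to_dd(99, 7, 30, "W")
```

Since Mexico lies west of the Greenwich meridian, longitudes recorded as positive west values become negative after conversion.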
Appendix A.3. Field Status for the Updated Datasets
| # | Field | Field Name in Dataset | Status | # | Field Name | Field Name in Dataset | Status |
|---|---|---|---|---|---|---|---|
| 1 | Main Activity | mainactivity | Not updated | 19 | UTMY Coordinate | utmycoordinate | Partially updated |
| 2 | Street name | street | Not updated | 20 | UTMX Coordinate | utmxcoordinate | Partially updated |
| 3 | Postal code | postalcode | Not updated | 21 | SCIAN description | sciandescription | Partially updated |
| 4 | Colony name | colony | Not updated | 22 | Between Street 1 | betweenstreet1 | Partially updated |
| 5 | State name | statename | Not updated | 23 | Between Street 2 | betweenstreet2 | Partially updated |
| 6 | Latitude | northlatitude | Not updated | 24 | Industrial subsector name | industrialsubsectorname | Partially updated |
| 7 | Locality name | localityname | Not updated | 25 | Year | year | New |
| 8 | Longitude | westlongitude | Not updated | 26 | State code | cve_ent | New |
| 9 | Municipality name | municipalityname | Not updated | 27 | Municipality Code | cve_mun | New |
| 10 | Facility name | facilityname | Not updated | 28 | Are the location values in DMS format? | indmsformat | New |
| 11 | Environmental registration number | nra | Not updated | 29 | Do the location values have a default value? | dmsdefaultvalue | New |
| 12 | Internal identification number | internalnumber | Not updated | 30 | Are the location values in Mexico? | inmexico | New |
| 13 | External identification number | externalnumber | Not updated | 31 | Are the location values in their corresponding state? | instate | New |
| 14 | Industrial park name | industrialparkname | Not updated | 32 | Are the location values in their corresponding municipality? | inmunicipality | New |
| 15 | Industrial sector name | industrialsectorname | Not updated | 33 | Computed latitude | computedlatitude | New |
| 16 | Semarnat activity | semarnatactivity | Partially updated | 34 | Computed longitude | computedlongitude | New |
| 17 | Environmental code | enviromentalcode | Partially updated | 35 | Obtained municipality latitude | municipalitylatitude | New |
| 18 | NAICS Code | naicscode | Partially updated | 36 | Obtained municipality longitude | municipalitylongitude | New |

| Field Name | Status | % of Updated Records in the Field | % of the Updated Content of the Dataset | Field Name | Status | % of Updated Records in the Field | % of the Updated Content of the Dataset |
|---|---|---|---|---|---|---|---|
| Main Activity | Not Updated | 0 | 0 | Latitude | Not Updated | 0 | 0 |
| Semarnat Activity | Partially updated | 91.81 | 3.67 | Locality name | Not Updated | 0 | 0 |
| Street name | Not Updated | 0 | 0 | Longitude | Not Updated | 0 | 0 |
| Environmental code | Partially updated | 8.18 | 0.32 | Municipality name | Not Updated | 0 | 0 |
| Postal code | Not Updated | 0 | 0 | Facility name | Not Updated | 0 | 0 |
| NAICS Code | Partially updated | 8.18 | 0.32 | Environmental registration number | Not Updated | 0 | 0 |
| Colony name | Not Updated | 0 | 0 | Internal identification number | Not Updated | 0 | 0 |
| UTMY Coordinate | Partially updated | 6.2 | 0.24 | External identification number | Not Updated | 0 | 0 |
| UTMX Coordinate | Partially updated | 6.2 | 0.24 | Industrial park name | Not Updated | 0 | 0 |
| NAICS description | Partially updated | 8.18 | 0.32 | Industrial sector name | Not Updated | 0 | 0 |
| Between Street 1 | Partially updated | 85.6 | 3.42 | Industrial subsector name | Partially updated | 14.39 | 0.57 |
| Between Street 2 | Partially updated | 85.6 | 3.42 | Year | New | 100 | 4 |
| State name | Not Updated | 0 | 0 |

| Field Name | Status | % of Updated Records in the Field | % of the Updated Content of the Dataset | Field Name | Status | % of Updated Records in the Field | % of the Updated Content of the Dataset |
|---|---|---|---|---|---|---|---|
| Emission of the substance to water | Not Updated | 0 | 0 | Substance transfer to final disposal | Not Updated | 0 | 0 |
| Emission of the substance to air | Not Updated | 0 | 0 | Substance transfer to final incineration | Not Updated | 0 | 0 |
| Emission of the substance to soil | Not Updated | 0 | 0 | Substance transfer to others | Not Updated | 0 | 0 |
| State name | Not Updated | 0 | 0 | Substance transfer to recycle | Not Updated | 0 | 0 |
| Substance group | Partially updated | 91.52 | 4.35 | Substance transfer to treatment | Not Updated | 0 | 0 |
| Municipality name | Partially updated | 9.11 | 0.43 | Substance transfer to sewerage | Not Updated | 0 | 0 |
| Substance name | Partially updated | 0 | 0 | Substance transfer to reuse | Not Updated | 0 | 0 |
| Facility name | Not Updated | 0 | 0 | Substance transfer to co-processing | Not Updated | 0 | 0 |
| Name of the industrial sector | Not Updated | 0 | 0 | Measure unit | Not Updated | 0 | 0 |
| CAS Number | Not Updated | 0 | 0 | year | New | 100 | |
| Environmental registration number | Not Updated | 0 | 0 |
| 1 | The cancer agency of the World Health Organization. |
| 2 | These links correspond to Mexican government institutions whose sites have outdated SSL certificates. Web browsers automatically attempt to access them using the https protocol, but the links only respond correctly when accessed via http (non-secure protocol). Users must therefore manually select the http protocol and accept access to these pages without an SSL certificate. |
| 3 | Categorical values of the field can be used for filtering and grouping. |
| 4 | The INEGI is an autonomous public agency responsible for regulating and coordinating the National System of Statistical and Geographical Information, as well as for collecting and disseminating information about Mexico in terms of territory, resources, population, and economy; Available online at this link: https://www.inegi.org.mx/rnm/index.php/catalog/78/download/3300 accessed on 10 September 2025. |
| 5 | Data available from the INEGI: https://cuentame.inegi.org.mx/territorio/division/default.aspx?tema=T accessed on 10 September 2025. |
| 6 | The LUA is an official authorization based on the regulation for the operation and functioning of fixed sources under federal jurisdiction regarding the atmosphere. |
| 7 | https://www.gob.mx/conagua accessed on 15 September 2025. |
| 8 | Available for download: http://geoportal.conabio.gob.mx/metadatos/doc/html/dest2018gw.html accessed on 10 September 2025; http://geoportal.conabio.gob.mx/metadatos/doc/html/muni_2018gw.html accessed on 10 September 2025. These links must be requested via the http protocol rather than https; see Note 2. |
References
- García Arrazola, R.; Rojas, O.; Rodarte Ramón, H.; Martínez Sandoval, P.; Ramiréz, R. RETC (Registro de Emisiones y Transferencia de Contaminantes) como un instrumento para elevar la competitividad de las empresas en México. Investig. Ambient. Cienc. Política Pública 2010, 2, 41–44.
- Pacheco-Vega, R. Non-State Actors and Environmental Policy Change in North America: A Case Study of the “Registro de Emisiones y Transferencia de Contaminantes” (RETC) in Mexico. Pik Rep. 2001, 352.
- Gobierno del Estado de Jalisco. Reetc: Registro Estatal de Emisiones y Transferencia de Contaminantes. 2008. Available online: https://semadet.jalisco.gob.mx/medio-ambiente/calidad-del-aire/registro-estatal-de-emisiones-y-transferencia-de-contaminantes (accessed on 10 September 2025).
- Gobierno del Estado de Tabasco. Retc: Registro de Emisiones y Transferencia de Contaminantes. 2008. Available online: https://tabasco.gob.mx/retc (accessed on 10 September 2025).
- Gobierno del Estado de Guanajuato. RETC: Registro de Emisiones y Transferencia de Contaminantes Guanajuato. 2024. Available online: https://smaot.guanajuato.gob.mx/sitio/informacion-sobre-tramites/98/Registro-de-Emisiones-y-Transferencia-de-Contaminantes-(RETC)-de-Guanajuato (accessed on 10 September 2025).
- Government of Canada. Pollutant Release and Transfer Register: Organization for Economic Co-Operation and Development. 2024. Available online: https://www.canada.ca/en/environment-climate-change/corporate/international-affairs/partnerships-organizations/pollutant-release-transfer-registers.html (accessed on 20 September 2025).
- Wine, O.; Hackett, C.; Campbell, S.; Cabrera-Rivera, O.; Buka, I.; Zaiane, O.; DeVito, S.; Osornio-Vargas, A. Using pollutant release and transfer register data in human health research: A scoping review. Environ. Rev. 2014, 22, 51–65.
- Berthiaume, A. Unveiling under-utilized public data on Canadian industrial pollutant transfers and disposals. J. Air Waste Manag. Assoc. 2024, 74, 664–684.
- Johnston Edwards, S.; Walker, T.R. An overview of Canada’s National Pollutant Release Inventory program as a pollution control policy tool. J. Environ. Plan. Manag. 2020, 63, 1097–1113.
- Berthiaume, A. Use of the National Pollutant Release Inventory in environmental research: A scoping review. Environ. Rev. 2021, 29, 329–339.
- United States Environmental Protection Agency. Toxics Release Inventory (TRI) Around the World. Available online: https://www.epa.gov/toxics-release-inventory-tri-program/tri-around-world (accessed on 20 September 2025).
- Ingwersen, W.W.; Li, M.; Young, B.; Vendries, J.; Birney, C. USEEIO v2.0, the US environmentally-extended input-output model v2.0. Sci. Data 2022, 9, 194.
- Varady, R.G.; Colnic, D.; Merideth, R.; Sprouse, T. The US-Mexican Border Environment Cooperation Commission: Collected perspectives on the First Two Years. J. Borderl. Stud. 1996, 11, 89–119.
- Villareal, M.; Fergusson, I.F. The North American Free Trade Agreement (NAFTA). 2017. Available online: https://ecommons.cornell.edu/server/api/core/bitstreams/223a1bce-953b-428f-a271-c806405361de/content (accessed on 20 September 2025).
- Jacott, M.; Reed, C.; Winfield, M. The Generation and Management of Hazardous Wastes and Transboundary Hazardous Waste Shipments Between Mexico, Canada and the United States Since NAFTA: A 2004 Update; Texas Center for Policy Studies: Austin, TX, USA, 2004.
- Semarnat. Secretaría de Medio Ambiente y Recursos Naturales. 2024. Available online: https://www.gob.mx/semarnat (accessed on 15 September 2025).
- Semarnat. Registro de Emisiones y Transferencia de Contaminantes (RETC). 2024. Available online: http://sinat.semarnat.gob.mx/retc/retc/index.php (accessed on 10 September 2025). (Request this link over HTTP, not HTTPS; see Note 2.)
- Diario Oficial de la Federación. Norma Oficial Mexicana NOM-165-Semarnat-2013: Que Establece la Lista de Sustancias Sujetas a Reporte Para el Registro de Emisiones y Transferencia de Contaminantes. 2014. Available online: https://biblioteca.semarnat.gob.mx/janium/Documentos/Ciga/agenda/DOFsr/DO3231.pdf (accessed on 10 September 2025).
- Qian, H.; Ren, F.; Gong, Y.; Ma, R.; Wei, W.; Wu, L. China Industrial Environmental Database 1998–2015. Sci. Data 2022, 9, 259.
- Tian, X.; Liu, Y.; Xu, M.; Liang, S.; Liu, Y. Chinese environmentally extended input-output database for 2017 and 2018. Sci. Data 2021, 8, 256.
- Chakraborti, L.; Shimshack, J. Environmental disparities in urban Mexico: Evidence from toxic water pollution. Resour. Energy Econ. 2022, 67, 101281.
- Aguilera, A.; Bautista, F.; Gutiérrez-Ruiz, M.; Ceniceros-Gómez, A.; Cejudo, R.; Goguitchaichvili, A. Heavy metal pollution of street dust in the largest city of Mexico, sources and health risk assessment. Environ. Monit. Assess. 2021, 193, 193.
- Geissen, V.; Ramos, F.; de J. Bastidas-Bastidas, P.; Díaz-González, G.; Bello-Mendoza, R.; Huerta-Lwanga, E.; Ruiz-Suárez, L. Soil and water pollution in a banana production region in tropical Mexico. Bull. Environ. Contam. Toxicol. 2010, 85, 407–413.
- Orta-García, S.; Ochoa-Martinez, A.; Carrizalez-Yáñez, L.; Varela-Silva, J.; Pérez-Vázquez, F.; Pruneda-Álvarez, L.; Torres-Dosal, A.; Guzmán-Mar, J.; Pérez-Maldonado, I. Persistent organic pollutants and heavy metal concentrations in soil from the Metropolitan Area of Monterrey, Nuevo Leon, Mexico. Arch. Environ. Contam. Toxicol. 2016, 70, 452–463.
- Briseño-Bugarín, J.; Araujo-Padilla, X.; Escot-Espinoza, V.; Cardoso-Ortiz, J.; Torre, J.; López-Luna, A. Lead (Pb) Pollution in Soil: A Systematic Review and Meta-Analysis of Contamination Grade and Health Risk in Mexico. Environments 2024, 11, 43.
- Castrezana Campos, M. Geografía del cáncer de mama en México. Investig. Geogr. 2017, 93.
- Wang, S.; Mulligan, C. Occurrence of arsenic contamination in Canada: Sources, behavior and distribution. Sci. Total Environ. 2006, 366, 701–721.
- Razo, I.; Carrizales, L.; Castro, J.; Díaz-Barriga, F.; Monroy, M. Arsenic and heavy metal pollution of soil, water and sediments in a semi-arid climate mining area in Mexico. Water Air Soil Pollut. 2004, 152, 129–152.
- Osuna-Martínez, C.; Armienta, M.; Bergés-Tiznado, M.; Páez-Osuna, F. Arsenic in waters, soils, sediments, and biota from Mexico: An environmental review. Sci. Total Environ. 2021, 752, 142062.
- Bhattacharya, P.; Welch, A.; Stollenwerk, K.; McLaughlin, M.; Bundschuh, J.; Panaullah, G. Arsenic in the environment: Biology and chemistry. Sci. Total Environ. 2007, 379, 109–120.
- Hilpert, M.; Mora, B.; Ni, J.; Rule, A.; Nachman, K. Hydrocarbon release during fuel storage and transfer at gas stations: Environmental and health effects. Curr. Environ. Health Rep. 2015, 2, 412–422.
- Weisel, C. Benzene exposure: An overview of monitoring methods and their findings. Chem.-Biol. Interact. 2010, 184, 58–66.
- Hsieh, P.; Shearston, J.; Hilpert, M. Benzene emissions from gas station clusters: A new framework for estimating lifetime cancer risk. J. Environ. Health Sci. Eng. 2021, 19, 273–283.
- INEGI. National Statistical Directory of Economic Units. 2024. Available online: https://en.www.inegi.org.mx/app/descarga/?ti=6 (accessed on 11 November 2024).
- Sohrabizadeh, Z.; Sodaeizadeh, H.; Hakimzadeh, M.; Taghizadeh-Mehrjardi, R.; Ghanei Bafghi, M. A statistical approach to study the spatial heavy metal distribution in soils in the Kushk Mine, Iran. Geosci. Data J. 2023, 10, 315–327.
- CONAGUA. National Water Commission (CONAGUA): Water Quality in Mexico. 2024. Available online: https://www.gob.mx/conagua/articulos/calidad-del-agua (accessed on 20 September 2024).
- Ali, H.; Khan, E.; Ilahi, I. Environmental chemistry and ecotoxicology of hazardous heavy metals: Environmental persistence, toxicity, and bioaccumulation. J. Chem. 2019, 2019, 6730305.
- Samet, J.; Chiu, W.; Cogliano, V.; Jinot, J.; Kriebel, D.; Lunn, R.; Beland, F.; Bero, L.; Browne, P.; Fritschi, L.; et al. The IARC monographs: Updated procedures for modern and transparent evidence synthesis in cancer hazard identification. JNCI J. Natl. Cancer Inst. 2020, 112, 30–37.
- Semarnat. Licencia Ambiental Única (LAU), Actualización de LAU y Licencia de Funcionamiento en la Zona Metropolitana del Valle de México y Sectores de Tratamiento de Residuos Peligrosos, Petróleo y Petroquímica a Nivel Nacional. 2020. Available online: https://dsiappsdev.semarnat.gob.mx/datos/portal/publicaciones/proc/715-DGGCARETC/09._Actualizacion_de_lau_y_lf.pdf (accessed on 10 November 2024).
- IARC. Monographs on the Evaluation of Carcinogenic Risks to Humans; World Health Organization, International Agency for Research on Cancer: Lyon, France, 2014.

| Group Name | Number of Original Fields | Number of Incorporated Fields | Names of Incorporated Fields | Years Considered for Group |
|---|---|---|---|---|
| | 21 | 3 | Semarnat activity, Between Street 1, and Between Street 2 | 2006 to 2011, and 2013 to 2022 |
| | 20 | 4 | NAICS code, NAICS description, Industrial subsector name, Environmental code | 2004 and 2005 |
| | 20 | 4 | UTMX coordinate, UTMY coordinate, Semarnat activity, Industrial subsector name | 2012 |

| Group Name | Number of Original Fields | Number of Incorporated Fields | Names of Incorporated Fields | Years Considered for Group |
|---|---|---|---|---|
| | 19 | 1 | Substance group | 2006 to 2011, and 2013 to 2022 |
| | 18 | 2 | Substance group, Municipality name | 2004 and 2005 |
| | 20 | 0 | | 2012 |
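
The field homogenization summarized in the two tables above amounts to giving every yearly file the same set of fields. A minimal sketch follows, assuming each CSV row is read as a dictionary and that fields absent from a group are added with an empty placeholder. The function name `homogenize_rows`, the placeholder value, and the sample field names are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, Iterable, List

MISSING = ""  # assumed placeholder for fields a yearly file does not report


def homogenize_rows(rows: Iterable[Dict[str, str]],
                    target_fields: List[str]) -> List[Dict[str, str]]:
    """Return rows that all share target_fields, in the same order."""
    homogenized = []
    for row in rows:
        # Keep reported values, fill incorporated fields with the placeholder,
        # and drop any field that is not part of the target schema.
        homogenized.append({name: row.get(name, MISSING) for name in target_fields})
    return homogenized


# Illustrative schema only: not the dataset's real 20-field layout.
target = ["Year", "Facility ID", "Substance", "Substance group", "Municipality name"]
rows_2004 = [{"Year": "2004", "Facility ID": "F1", "Substance": "Benzene"}]
print(homogenize_rows(rows_2004, target))
```

After this step, files from different groups can be concatenated directly, because every row exposes the same fields in the same order.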

| | |
|---|---|
| 1. Petroleum and petrochemicals | 7. Cement and lime |
| 2. Chemistry | 8. Asbestos |
| 3. Paints and inks | 9. Glass |
| 4. Metallurgy (includes steel) | 10. Electrical power generation |
| 5. Automotive | 11. Hazardous waste treatment |
| 6. Pulp and paper | |

| | |
|---|---|
| 12. Food and human consumption | 18. Wood-based products |
| 13. Articles and products composed of different materials | 19. Others |
| 14. Metal articles and products | 20. Business support services, hazardous waste management (does not include treatment), and remediation services |
| 15. Plastic articles and products | 21. Health and care services |
| 16. Beverages and tobacco | 22. Textiles, fibers, and threads |
| 17. Electronic, electrical, and household equipment and items | |

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Reyes-Anastacio, H.G.; Lopez-Arevalo, I.; Gonzalez-Compean, J.L.; Crespo-Sanchez, M.; Calderon, J.; Aguirre-Meneses, H. A Mexican Enhanced Dataset of Pollutant Releases and Transfers (2004 to 2022) with IARC Cancer Classifications. Data 2025, 10, 191. https://doi.org/10.3390/data10110191

