1. Introduction
Cities have become central spaces for improving living conditions and economic development, yet rapid urbanization has also exposed deep disparities in access to essential resources, environmental quality, and social well-being [1]. As population, infrastructure, services, and economic activity increasingly concentrate in urban areas, cities have emerged as key sites where environmental, social, and health-related challenges unfold [2,3]. At the same time, the spatial organization of cities plays a critical role in determining access to the benefits of urban life, including opportunities, services, and well-being [4]. Current sustainable urban development guidance increasingly emphasizes innovation, inclusiveness, and adaptability, but this raises questions about how such principles are operationalized in local contexts [5].
In this context, international organizations increasingly promote the identification and documentation of “best practices” (BPs) as a way to catalog lessons learned and support learning across countries in the implementation of sustainable urban development agendas [6]. BPs are commonly understood as initiatives that demonstrate effective responses to urban challenges and whose experiences may inform policy development and offer transferable knowledge for other contexts [7]. In the literature, however, the significance of BPs lies not only in the replication of specific interventions, but also in the circulation of broader forms of policy knowledge, including ideas, principles, governance arrangements, and institutional designs [8,9]. From this perspective, BPs are closely related to processes of policy transfer and policy mobility, through which urban policy knowledge travels across places and is reinterpreted according to local needs, resources, and institutional contexts [10,11]. Within this literature, Dolowitz and Marsh [12] distinguish degrees of transfer along a spectrum that runs from direct copying through emulation and hybridization to broad inspiration. The typology is more than descriptive. Each transfer mode places different demands on the knowledge format that supports it. Copying requires highly codified procedural and material specification, emulation depends on transferable institutional logics, hybridization presupposes context-sensitive adaptation, and inspiration tolerates the greatest level of abstraction. A reporting format therefore does not merely document what was done. It conditions which transfer modes a given practice can plausibly support elsewhere. The value of BPs thus lies not in providing ready-made solutions but in offering documented experiences that can support policy learning, inform decision-making, and encourage the diffusion of potentially effective strategies in urban development [1,13,14]. Nevertheless, the diffusion of BPs depends on active processes of promotion, framing, and legitimization carried out by networks of actors and institutions, including international organizations, consultants, and professional communities [15].
Recognizing BPs as a standardized reporting genre rather than as a neutral collection of cases opens a complementary analytical line. In genre theory, recurrent textual forms are not simply templates but socially situated actions that stabilize how communities recognize, classify, and circulate knowledge [16,17,18]. A reporting genre licenses certain claims, authorizes certain content, and filters what counts as a legitimate account of practice. The UN-Habitat BP report is therefore not only a record of an intervention but a vehicle that translates heterogeneous local experiences into an institutionally legible form, with standardized headings (Results, Lessons Learned, Transferability) that operate as interpretive scaffolding shaping what authors include, omit, and emphasize when narrating implementation.
This genre logic acquires sharper definition when analyzed in relation to Polanyi’s [19] distinction between explicit knowledge, which can be codified and transmitted in propositional form, and tacit knowledge, which resides in skilled practice, situated judgment, and embodied familiarity with local conditions. A standardized reporting genre is structurally suited to carry the explicit pole because codified knowledge maps directly onto categorical headings and indicator-style claims. The tacit pole resists such codification, since the situated judgment that allowed a project to navigate a specific institutional environment, the material know-how embedded in physical design choices, and the experiential calibration through which community engagement becomes effective in a particular setting are exactly the kinds of content that a standardized format struggles to preserve. Genre theory and the tacit-explicit distinction together predict that a curated reporting format will systematically privilege procedural and codifiable content over situated and embodied content. The empirical question this raises, and the question this study addresses, is which thematic families the BP genre carries forward and which it filters out across the Results, Lessons Learned, and Transferability sequence.
These framings sit within a broader epistemological literature on how institutions render heterogeneous local experiences comparable for action at a distance. The sociology of quantification has documented how commensuration, the transformation of qualitative differences into a shared metric of comparison, makes diverse cases legible, comparable, and aggregable while compressing the contextual texture that distinguishes them [20,21]. Adjacent work on knowledge translation in international development and public policy shows that policy ideas do not move in unmediated form. They are translated, in the strong sense of being reconstituted as they cross institutional, linguistic, and contextual borders, and translation is necessarily selective [22,23]. Read together, these studies clarify why the relationship between implementation gap research and policy mobilities scholarship is not coincidental but structural. Implementation gap research foregrounds the situated, technical, and relational specificity that interventions require to succeed in place. Policy mobilities scholarship foregrounds the institutional and discursive infrastructures that select and reformat such specificity for circulation across places. The empirical question of what survives the reformatting, and what is filtered out, is exactly the question that a curated reporting genre such as UN-Habitat’s BP database stages.
This study takes that question as its analytical pivot through the working distinction between institutional legibility and operational specificity. Institutional legibility refers to the capacity of a reporting format to render local interventions classifiable, comparable, and searchable within a shared administrative vocabulary, in line with what Scott [24] theorized as the precondition for governance at a distance. Operational specificity refers, by contrast, to the capacity of the same format to carry forward the situated technical, evidentiary, financial, and relational content that a practitioner would need to reproduce or adapt the intervention elsewhere. The two are not opposed, but they are not equivalent. A reporting genre that succeeds at the first does not automatically succeed at the second. This distinction yields a working proposition that this study tests empirically. The standardizing logic of a curated genre will tend to preserve content amenable to commensuration and procedural codification more reliably than content that is tacit, materially specific, or contextually embedded. The portability paradox advanced in this paper is the empirical pattern that emerges when a reporting format optimized for legibility encounters implementation knowledge that is not uniformly codifiable.
These dynamics situate BPs within the broader challenge of implementation in urban sustainability and planning, where the translation of policy goals into action on the ground is far from automatic [7]. The persistent disconnection between policy ambitions and their effective realization is widely framed as an implementation gap [25].
Research on this gap has advanced along two fronts. The first identifies the conditions under which implementation succeeds or fails, including institutional, financial, and contextual barriers that reduce the likelihood that plans are carried out and sustained over time [25,26,27]. The second concerns the instruments designed to support implementation, including the growing infrastructure of indicators, rankings, observatories, and diagnostic tools that exist to assess and compare urban interventions, although these often remain fragmented, methodologically inconsistent, and difficult to translate into concrete action [28,29,30,31]. Taken together, these two bodies of work clarify why implementation is difficult and how its quality can be measured, but they share a common boundary. They say relatively little about how implementation knowledge is generated, formatted, and made transferable across contexts [28,32].
What remains largely unexamined is the reporting layer that sits between a successful local intervention and its uptake elsewhere. Prior scholarship on best practices has examined how individual BPs are constructed and politically mobilized [8,13,14], but it has done so primarily through interpretive case studies of selected practices, leaving the systemic properties of the BP genre as a whole largely uncharacterized. This study makes three departures from that prior work. First, it scales the analysis to 250 reports drawn from a region-balanced sample of the most globally visible BP repository, allowing genre-level patterns to be characterized rather than inferred from individual cases. Second, it treats the report itself, rather than the underlying intervention, as the analytical object, asking which thematic content survives the transition from situated practice to portable example and which is filtered out. Third, it operationalizes that question through systematic thematic coding and co-occurrence networks, making it possible to identify, at the level of the corpus, where in the reporting sequence translation loss concentrates and around which themes.
Four research questions guide the analysis, organized as a nested progression from description to synthesis. RQ1 to RQ3 are descriptive and structural, examining theme prevalence, within-domain association, and cross-domain linkage in turn. RQ4 is integrative and synthesizes the three preceding patterns into a single claim about how the reporting genre mediates between institutional legibility and operational specificity. First, (RQ1) what recurrent axial thematic categories and prevalence patterns structure how UN-Habitat Best Practices report Results, Lessons Learned, and Transferability across the full corpus? This question establishes the baseline vocabulary of the reporting grammar and documents whether and how the thematic repertoire narrows across the three domains. Second, (RQ2) what within-domain bundles and selective thematic pairings recur within the Results, Lessons Learned, and Transferability sections, and what do these associations reveal about the reporting logic of implementation? This question moves from individual theme prevalence to pairwise association, testing whether the corpus produces tightly coupled implementation packages or a more loosely organized co-occurrence structure. Third, (RQ3) what cross-domain linkages and disconnections emerge across the Results-Lessons-Transferability sequence, and what do they reveal about how reported outcomes are translated into portable implementation knowledge? This question traces the epistemic pipeline across its full length, identifying where selective translation occurs and where thematic content is lost between sections. Fourth, (RQ4) what do the prevalence, association, and disconnection patterns collectively reveal about how the BP reporting format mediates between the implementation gap and the portability of implementation knowledge? 
This integrative question synthesizes the preceding findings to characterize the relationship between institutional legibility and operational specificity in the genre as a whole.
The remainder of the paper proceeds as follows.
Section 2 describes the methods, including the construction of a region-balanced corpus of 250 BP reports, the two-level thematic coding strategy that translates narrative accounts into binary presence/absence indicators, and the quantitative framework of prevalence profiles, Jaccard similarity, and Lift used to estimate thematic co-occurrence, bundling, and disconnection.
Section 3 reports the results in two stages, first documenting within-domain prevalence and association patterns for Results, Lessons, and Transferability separately, and then tracing cross-domain linkages and disconnections across the full reporting sequence.
Section 4 discusses the implications of these patterns for how the BP format mediates implementation learning, identifies the thematic families most vulnerable to translation loss, and considers what the findings mean for the design of practice-sharing repositories that aim to support not only the visibility of successful interventions but also their reproducibility in new contexts.
Section 5 states the conclusions and outlines the scope of a subsequent regional analysis that will exploit the stratified design of the corpus.
2. Materials and Methods
This study draws on the UN-Habitat Best Practices Database, described by UN-Habitat as a free public access repository with approximately 4000 proven solutions to common social, economic, and environmental problems. From this database, we constructed a corpus of 250 BP reports using a region-balanced sampling strategy based on the database’s standardized world regions: North America (NA), Europe (EU), Asia and Pacific (AP), Africa and Arab States (AA), and Latin America and the Caribbean (LC) (Table S1). Reports were eligible for inclusion only if they were classified under the database’s highest recognition tiers, “Award Winner” or “Best Practice.” Other tiers were excluded to focus the analysis on the reporting conventions the platform most explicitly promotes as exemplary. To ensure comparable region sizes, two Europe-related categories were merged into a single EU stratum. This consolidation was necessary because one of the two categories, “European Union,” contained only five reports in total classified as “Award Winner” or “Best Practice.”
Within each region, we selected the 50 most recent eligible reports according to the date displayed in the database (5 regions × 50 cases = 250 reports). The database does not explicitly define this date field; we nonetheless treated it as a chronological reference. All reports in the final corpus were available in English and used a standardized format with clearly labeled headings for “Results,” “Lessons Learned,” and “Transferability,” enabling consistent section-based extraction and coding.
Epistemologically, we do not evaluate empirical effectiveness, validate the database’s “proven” designation, or attempt causal inference about urban change. Instead, BP reports were analyzed as a standardized mode of implementation reporting.
To translate narrative implementation accounts into analyzable data, we used a two-level thematic coding hierarchy that moves from inductive description to standardized comparison. Open coding was conducted inductively by two coders, who read the reports and tagged distinct themes, claims, and implementation-relevant elements as they appeared in the text. Coders worked from a shared codebook that was iteratively refined. We assessed inter-coder agreement on a stratified subset of reports and resolved disagreements through consensus.
The consolidation of Open Codes into Axial Codes followed a negotiated consensus process. Both coders independently proposed candidate axial groupings based on shared semantic and functional properties among Open Codes. The two proposals were then compared: convergent groupings were retained, while divergent ones were discussed and reconciled over multiple iterative rounds. The resulting Axial Codes were reviewed against the coded extracts to verify internal semantic consistency, and domain-specific definitions were formulated so that each thematic family could be expressed in terms appropriate to the narrative function of Results, Lessons, and Transferability. The final axial framework (Table S2) is therefore a product of structured deliberation between coders rather than a predefined schema imposed on the data.
To illustrate how axial codes are anchored in the underlying narrative, three brief exemplars from the corpus are indicative. In a Results section from the Africa and Arab States stratum, the “SEPP Program” report states that the project established a “solid waste center that serves 56 villages (approximately 300,000 people) with a capacity to process 150 tons per day,” which the open code “Waste Management” captured and consolidated under the axial code ESRO (Environmental Sustainability and Resilience Outcomes). In a Lessons section from the Asia and Pacific stratum, the “Making land-use climate-sensitive” report observes that “coordination among local governments is essential in harmonizing land-use and development planning,” which the open code “Land Use Planning” captured and consolidated under SPMX (Strategic Planning and Management). In a Transferability section from the Latin America and the Caribbean stratum, the “Monitoring urban prosperity and sustainability in 153 municipalities in Mexico” report notes that “the success of this project is verified in first instance by the replicability of the operation in a second phase that includes a number of 152 municipalities,” which the open code “Local Replication” captured and consolidated under RSPX (Replication and Scaling Pathways). The full mapping of axial codes to their constituent open codes (frequency at least 20) is reported in Table S2.
This approach is grounded in established qualitative methodological literature on codebook development for team-based analysis. When categories emerge inductively from the data rather than being imposed a priori, formal reliability statistics partly measure shared codebook conventions rather than independent recognition of pre-given categories, and consensus-based deliberation has been argued to be epistemologically more appropriate for interpretive coding work [33,34,35,36]. We did not compute a formal reliability statistic such as Cohen’s kappa or Krippendorff’s alpha for the application of the final axial framework, and we acknowledge this as a methodological limitation (Section 4.5). Three design features of the analytical pipeline mitigate the sensitivity of downstream results to individual coding decisions. First, binary presence/absence coding at the BP-by-domain level reduces interpretive ambiguity relative to intensity or scaled coding by removing judgments about narrative emphasis. Second, the frequency threshold of at least 20 occurrences buffers against marginal coding decisions, since codes whose retention depends on a small number of borderline calls do not enter the analysis. Third, the structural findings of interest rest on aggregate patterns across 250 cases and 861 code pairs, not on the specific coding of any individual report. These features do not substitute for a reliability statistic, but they constrain the influence of any single coding decision on the patterns this study reports.
This two-stage process (inductive open coding followed by negotiated axial consolidation) produced a high-granularity vocabulary of 906 Open Codes. While this breadth captured the diversity of reporting language, it also complicated deliberation around consolidating themes into Axial Codes in a way that would remain stable and interpretable across cases. For this reason, Open Codes were first mapped to a standardized set of Axial Codes (a controlled analytic vocabulary), and then frequency-thresholded at ≥20 occurrences, reducing the Open Code set from 906 to 313. This “map first, cut later” approach preserves the conceptual structure generated by inductive coding while explicitly controlling analytic complexity. No additional selection was applied to the Axial Codes themselves. The different domain sizes follow entirely from the frequency distribution of their constituent Open Codes.
Analysis is organized around the reports’ Results, Lessons, and Transferability narrative structure, treated as three analytically distinct domains. All Axial Codes were recorded as binary indicators (presence/absence) at the level of BP-by-domain. For each BP and each domain, an Axial Code was coded as present (1) if at least one of its mapped (and retained) Open Codes appeared in that domain’s text, and absent (0) otherwise. This presence/absence design is intentional: it reduces sensitivity to report length and repetition, and aligns the dataset with association measures that estimate thematic coupling based on shared occurrence rather than narrative intensity.
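The coding rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: the function names, the tuple layout of the input, and the reading of "occurrences" as a corpus-wide count of coded hits are all assumptions, and the open code "Rare Code" with axial label "HYPO" is hypothetical.

```python
from collections import Counter


def build_binary_table(open_code_hits, open_to_axial, min_freq=20):
    """Return {(bp_id, domain): set of axial codes present}.

    open_code_hits: list of (bp_id, domain, open_code), one per coded occurrence.
    open_to_axial:  dict mapping every open code to its axial code (map first).
    min_freq:       open codes below this corpus-wide count are dropped (cut later).
    """
    # Assumption: frequency is counted over all coded occurrences in the corpus.
    freq = Counter(code for _, _, code in open_code_hits)
    retained = {code for code, n in freq.items() if n >= min_freq}

    table = {}
    for bp_id, domain, code in open_code_hits:
        present = table.setdefault((bp_id, domain), set())
        if code in retained:
            present.add(open_to_axial[code])
    return table


def indicator(table, bp_id, domain, axial_code):
    """Binary presence (1) or absence (0) of an axial code in one BP-by-domain cell."""
    return 1 if axial_code in table.get((bp_id, domain), set()) else 0


# Toy data with a lowered threshold so the cut is visible.
hits = [
    ("bp1", "Results", "Waste Management"),
    ("bp2", "Results", "Waste Management"),
    ("bp1", "Results", "Rare Code"),          # hypothetical open code
    ("bp1", "Lessons", "Land Use Planning"),
    ("bp2", "Lessons", "Land Use Planning"),
]
mapping = {
    "Waste Management": "ESRO",
    "Land Use Planning": "SPMX",
    "Rare Code": "HYPO",                      # hypothetical axial label
}
table = build_binary_table(hits, mapping, min_freq=2)
print(indicator(table, "bp1", "Results", "ESRO"))  # 1
print(indicator(table, "bp1", "Results", "HYPO"))  # 0, open code fell below the cut
```

Because each cell stores a set, repeated mentions of the same theme within a section do not change the indicator, which is the insensitivity to report length and repetition that the presence/absence design is meant to provide.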
Co-occurrence matrices were constructed directly from the binary BP-by-domain coding tables. For each domain, a BP contributes 1 to a code pair (i, j) if both Axial Codes i and j are present in that BP’s domain text; otherwise, it contributes 0. Summing across BPs yields a symmetric code-by-code co-occurrence matrix for each domain, which then serves as the basis for estimating thematic clustering and disconnection patterns within Results, Lessons, and Transfers.
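As a hedged illustration of the summation just described, the following Python sketch builds a symmetric code-by-code matrix for one domain. The function name and the list-of-sets input are assumptions, and storing each code's own frequency on the diagonal is a convenience added here rather than something the text specifies.

```python
from itertools import combinations


def cooccurrence(rows, codes):
    """Symmetric co-occurrence counts for one domain.

    rows:  list of sets, the axial codes present in each BP's domain text.
    codes: ordered list of axial codes defining the matrix rows and columns.
    """
    index = {code: k for k, code in enumerate(codes)}
    matrix = [[0] * len(codes) for _ in codes]
    for present in rows:
        for code in present:
            # Diagonal (added convenience): number of BPs containing the code.
            matrix[index[code]][index[code]] += 1
        for a, b in combinations(sorted(present), 2):
            # Each BP contributes 1 to every pair of codes it contains.
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


rows = [{"ESRO", "GPIO"}, {"ESRO"}, {"ESRO", "GPIO", "ICMP"}]
codes = ["ESRO", "GPIO", "ICMP"]
for line in cooccurrence(rows, codes):
    print(line)
# [3, 2, 1]
# [2, 2, 1]
# [1, 1, 1]
```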
2.1. Quantitative and Network Metrics
Quantitative analysis proceeded in three steps. First, we computed BP-level prevalence statistics to describe domain-specific emphasis. Second, we estimated baseline thematic overlap using the Jaccard Index to identify dominant core bundles of reporting. Third, we estimated statistical dependence using Lift to distinguish genuine thematic attraction or repulsion from co-occurrence driven by base-rate frequency.
For each narrative domain (Results, Lessons, Transfers), we computed BP-level prevalence for each Axial Code as

$$p_{i,d} = \frac{f_{i,d}}{N},$$

where $f_{i,d}$ is the number of BPs in which code $i$ is present in domain $d$, and $N = 250$ is the number of BPs in the corpus. Prevalence profiles support cross-sectional comparisons of which themes are most widely mobilized in each domain. To operationalize translation loss as measurable narrowing from Results to Transfers, we used three complementary indicators: (1) domain breadth, defined as the number of distinct Axial Codes appearing at least once in a domain; (2) per-BP thematic richness, defined as the mean number of Axial Codes present per BP within each domain; and (3) thematic concentration, defined using normalized HHI, which represents whether a domain’s prevalence mass becomes dominated by fewer codes.
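To compute concentration, we first converted prevalences into a distribution across codes by normalizing them, with $p_{i,d}$ denoting the prevalence of code $i$ in domain $d$:

$$s_{i,d} = \frac{p_{i,d}}{\sum_{k=1}^{K_d} p_{k,d}}.$$

We then computed the Herfindahl-Hirschman Index (HHI) for each domain as $\mathrm{HHI}_d = \sum_{i=1}^{K_d} s_{i,d}^{2}$. Because the number of Axial Codes differs by domain, we used normalized HHI to enable direct comparison:

$$\mathrm{HHI}^{*}_d = \frac{\mathrm{HHI}_d - 1/K_d}{1 - 1/K_d},$$

where $K_d$ is the number of Axial Codes in domain $d$. Higher $\mathrm{HHI}^{*}_d$ indicates greater thematic concentration and therefore stronger narrowing of the reporting repertoire, consistent with translation loss.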
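The three narrowing indicators can be computed from a domain's prevalence profile alone. The sketch below is an illustration under stated assumptions: the function name and dictionary input are hypothetical, shares are obtained by dividing each prevalence by the domain total, and the normalization is the standard form (HHI minus 1/K, divided by 1 minus 1/K), which is assumed here to be the one the study uses.

```python
def concentration_profile(prevalence):
    """Breadth, richness, and concentration for one domain.

    prevalence: dict mapping axial code -> share of BPs in which it is present.
    """
    k = len(prevalence)                               # number of axial codes in the domain
    breadth = sum(1 for p in prevalence.values() if p > 0)
    # The sum of prevalences equals the mean number of codes present per BP.
    richness = sum(prevalence.values())
    shares = [p / richness for p in prevalence.values()]
    hhi = sum(s * s for s in shares)
    hhi_norm = (hhi - 1 / k) / (1 - 1 / k)            # 0 = evenly spread, 1 = one code only
    top3_share = sum(sorted(shares, reverse=True)[:3])
    return {
        "breadth": breadth,
        "richness": richness,
        "hhi": hhi,
        "hhi_norm": hhi_norm,
        "top3_share": top3_share,
    }


# Toy profiles (hypothetical codes): an even domain and a fully concentrated one.
even = concentration_profile({"A": 0.5, "B": 0.5, "C": 0.5, "D": 0.5})
peaked = concentration_profile({"A": 1.0, "B": 0.0, "C": 0.0})
print(even["hhi_norm"], peaked["hhi_norm"])
```

Normalizing by the number of codes is what allows a 12-code domain to be compared with a 15-code domain, since the raw HHI floor of 1/K differs between them.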
To examine how themes are bundled rather than merely how often they appear, we computed within-domain pairwise overlap using the Jaccard Index. For Axial Codes $i$ and $j$,

$$J_d(i,j) = \frac{n_{11}}{n_{11} + n_{10} + n_{01}},$$

where $n_{11}$ is the number of BPs in which both codes are present in domain $d$, and $n_{10}$ and $n_{01}$ are the numbers of BPs in which only one code is present. Jaccard measures baseline similarity in binary presence.
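A minimal Python sketch of the Jaccard computation follows. The function name and the example counts are illustrative assumptions: two codes each present in 65 BPs with 25 BPs in common gives 40 BPs on each "only one code" side, counts chosen because they are consistent with the MEEU-SRTM values reported in Section 3.3 (prevalence 0.260 each, Jaccard 23.81%) rather than taken from the supplementary tables.

```python
def jaccard(n11, n10, n01):
    """Jaccard index from pair counts.

    n11: BPs where both codes are present.
    n10: BPs where only the first code is present.
    n01: BPs where only the second code is present.
    """
    denom = n11 + n10 + n01
    # If neither code ever appears, overlap is undefined; 0.0 is returned by convention here.
    return n11 / denom if denom else 0.0


# Illustrative counts: 65 BPs per code, 25 shared, so 40 on each exclusive side.
print(round(jaccard(25, 40, 40), 4))  # 0.2381
```

BPs in which neither code appears do not enter the denominator, which is why Jaccard tracks visible co-reporting and is pulled upward for pairs of highly prevalent codes.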
Because overlap can be inflated by highly prevalent codes, we additionally estimated thematic dependence using Lift, which compares observed co-occurrence to the level expected under independence. For codes $i$ and $j$ in domain $d$,

$$\mathrm{Lift}_d(i,j) = \frac{P(i \cap j)}{P(i)\,P(j)},$$

where $P(i)$ and $P(j)$ are marginal probabilities of code presence and $P(i \cap j)$ is the joint probability that both codes are present, all computed over BPs within the domain. Lift values greater than 1 indicate thematic attraction beyond base rates, while values less than 1 indicate under-co-occurrence relative to independence.
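The Lift ratio can be sketched from raw counts as follows. The function name is hypothetical and the counts are illustrative: 65 BPs per code corresponds to a prevalence of 0.260 at N = 250, and a joint count of 25 is chosen because it is consistent with the MEEU-SRTM Lift of 1.48 reported in Section 3.3, not because it is read from the supplementary tables.

```python
def lift(n11, n_i, n_j, n_total):
    """Lift = P(i and j) / (P(i) * P(j)), estimated from counts over n_total BPs.

    n11: BPs where both codes are present.
    n_i: BPs where code i is present.
    n_j: BPs where code j is present.
    """
    if n_i == 0 or n_j == 0:
        return float("nan")  # undefined when a code never appears
    p_i = n_i / n_total
    p_j = n_j / n_total
    p_ij = n11 / n_total
    return p_ij / (p_i * p_j)


# Two low-prevalence codes that share more BPs than chance predicts.
print(round(lift(25, 65, 65, 250), 2))   # 1.48
# Exactly the joint count expected under independence (0.2 * 0.5 * 250 = 25).
print(round(lift(25, 50, 125, 250), 2))  # 1.0
```

The first example shows why low-prevalence codes can post the largest Lift values while keeping modest Jaccard overlap: the expected joint count under independence is only about 17 BPs, so 25 shared BPs is a large relative excess even though it is small in absolute terms.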
For cross-domain matrices (Results × Lessons, Results × Transfers, Lessons × Transfers), the same formulas apply with a modified co-occurrence logic. A BP contributes 1 to a cross-domain code pair (i, j) if Axial Code i is present in the BP’s domain d1 text and Axial Code j is present in the BP’s domain d2 text. Marginal probabilities are computed from each code’s own domain: P(i) is the prevalence of code i in d1 and P(j) is the prevalence of code j in d2, both over N = 250. The joint probability P(i ∩ j) is the proportion of BPs in which both codes are simultaneously present in their respective domains. Jaccard is computed analogously, with n11 counting BPs where both codes are present in their respective domains, and n10 and n01 counting BPs where only one code is present in its domain. This cross-domain extension preserves the BP as the unit of observation and treats the three narrative sections as analytically distinct but structurally linked layers of the same report.
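The cross-domain logic can be sketched in Python by aligning the two domain tables on the BP. The function name and the parallel-list input are assumptions made for illustration; the toy data pair ESRO in Results with SRTV in Transfers purely as an example and do not reproduce corpus values.

```python
def cross_domain_stats(rows_d1, rows_d2, i, j):
    """Jaccard and Lift for code i in domain d1 and code j in domain d2.

    rows_d1[k] and rows_d2[k] are the axial-code sets of the same BP k
    in the two domains, so the BP remains the unit of observation.
    """
    n = len(rows_d1)
    n11 = sum(1 for a, b in zip(rows_d1, rows_d2) if i in a and j in b)
    n_i = sum(1 for a in rows_d1 if i in a)   # marginal count from d1 only
    n_j = sum(1 for b in rows_d2 if j in b)   # marginal count from d2 only
    n10 = n_i - n11                           # i present in d1, j absent in d2
    n01 = n_j - n11                           # j present in d2, i absent in d1
    denom = n11 + n10 + n01
    jac = n11 / denom if denom else 0.0
    if n_i == 0 or n_j == 0:
        return jac, float("nan")
    return jac, (n11 / n) / ((n_i / n) * (n_j / n))


# Four toy BPs: Results sets on the left, Transfers sets on the right.
results = [{"ESRO"}, {"ESRO"}, set(), {"ESRO"}]
transfers = [{"SRTV"}, set(), {"SRTV"}, {"SRTV"}]
jac, lft = cross_domain_stats(results, transfers, "ESRO", "SRTV")
print(jac, round(lft, 4))  # 0.5 0.8889
```

In this toy case the Lift below 1 signals that the Results theme is carried into Transfers less often than the two marginal prevalences alone would predict.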
2.2. Inferential Scope and Operational Definitions
Before the operational definitions, two methodological framings constrain what the downstream classifications can support. First, all Axial Codes are recorded as binary presence/absence indicators at the BP-by-domain level. This design supports inference about aggregate co-occurrence patterns, relative prevalence across domains, and structural disconnection between thematic families. It does not support claims about narrative emphasis, intensity of treatment, or the causal mechanisms by which themes co-occur. The findings that follow are accordingly structural and descriptive rather than weighted or causal. Second, Axial Codes derived inductively from densely coded implementation narratives share substantial topical context, and pairwise Lift values in this corpus consequently fall within a compressed range, with most values between 0.85 and 1.25. This compression is a structural feature of inductively coded thematic data rather than a sampling or coding artifact. To prevent the compressed magnitude from rendering classification arbitrary, we use matrix-specific percentile thresholds applied to the within-distribution of Jaccard and Lift values rather than fixed numeric cutoffs, identifying relative positions within a tightly distributed system rather than departures from independence in absolute terms. Even modest distributional Lift variation, when it concentrates around recurring themes across multiple matrices, carries diagnostic value for identifying systemic features of the reporting genre. All percentiles are computed separately within each co-occurrence matrix from the distributions of Jaccard similarities and Lift values across all unique Axial Code pairs.
The operational definitions that follow apply dual thresholds combining prevalence- or visibility-based criteria with statistical-attraction criteria. This is intentionally conservative. A pair classified as a bundle, for example, must be both highly co-present (high Jaccard) and selectively attracted (high Lift) within its matrix, rather than satisfying only one criterion. The corresponding implication, anticipated here so that it can be read prospectively in Section 3, is that the absence of pairs simultaneously crossing both thresholds indicates that the genre lacks tightly coupled implementation packages within its own distribution, rather than indicating that no thematic associations exist. A more permissive single-threshold scheme would surface candidate pairs that are visible without being attracted, or attracted without being visible, but neither pattern would correspond to the conventional understanding of an integrated bundle.
Bundles. We define bundles as pairs or clusters of Axial Codes that are simultaneously highly visible in co-reporting and selectively associated beyond base rates. Operationally, a code pair is classified as a bundle when it satisfies both criteria:
The pair’s Jaccard value $J_d(i,j)$ is in the upper decile (≥90th percentile) of the domain’s Jaccard distribution.
The pair’s Lift value $\mathrm{Lift}_d(i,j)$ is in the upper decile (≥90th percentile) of the domain’s Lift distribution.
Holes. We define holes as systematic thematic disconnections evidenced by under-co-occurrence relative to independence. Operationally, a code pair constitutes a hole when:
The pair’s Lift value $\mathrm{Lift}_d(i,j)$ falls in the lower decile (≤10th percentile) of the domain’s Lift distribution.
The pair’s Lift value $\mathrm{Lift}_d(i,j)$ is below 1, ensuring the relationship reflects under-co-occurrence rather than low-magnitude variation around independence.
Silos. We define silos as themes that are highly prevalent yet weakly integrated with other themes. To avoid over-labeling mid-prevalence themes as silos, we use a conservative node-level rule (reducing false positives). Operationally, a code i is classified as silo when it meets both:
High prevalence: the code’s prevalence $p_{i,d}$ is at or above the domain’s 75th percentile prevalence.
Low integration: the code’s median Lift across all its pairwise relations in domain $d$, $\operatorname{median}_{j \neq i} \mathrm{Lift}_d(i,j)$, falls in the lowest quartile (≤25th percentile) of the distribution of codes’ median Lift values within that domain.
When applied to cross-domain matrices (Results × Lessons, Results × Transfers, Lessons × Transfers), percentile thresholds are computed from each matrix’s own distribution of Jaccard and Lift values, following the same within-distribution logic used for the three within-domain tables.
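The three classification rules can be sketched for a single matrix as follows. Several details are assumptions rather than the study's specification: the function names, the linear-interpolation percentile (the text does not state which percentile method was used), and the second hole criterion, which is taken here to be Lift below 1. The codes A, B, and C are hypothetical.

```python
from statistics import median


def percentile(values, q):
    """Percentile with linear interpolation between order statistics (assumed method)."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def classify(pairs, prevalence):
    """Return (bundles, holes, silos) for one co-occurrence matrix.

    pairs:      dict {(code_i, code_j): (jaccard, lift)} over unique pairs.
    prevalence: dict {code: prevalence} for the same domain.
    """
    jaccards = [jac for jac, _ in pairs.values()]
    lifts = [lft for _, lft in pairs.values()]
    j90, l90, l10 = percentile(jaccards, 90), percentile(lifts, 90), percentile(lifts, 10)

    # Bundle: upper decile on both Jaccard and Lift within this matrix.
    bundles = [k for k, (jac, lft) in pairs.items() if jac >= j90 and lft >= l90]
    # Hole: lower decile on Lift and (assumed cutoff) Lift below 1.
    holes = [k for k, (_, lft) in pairs.items() if lft <= l10 and lft < 1.0]

    # Silo: node-level rule combining high prevalence with low median Lift.
    med = {
        c: median(lft for (a, b), (_, lft) in pairs.items() if c in (a, b))
        for c in prevalence
    }
    p75 = percentile(list(prevalence.values()), 75)
    m25 = percentile(list(med.values()), 25)
    silos = [c for c in prevalence if prevalence[c] >= p75 and med[c] <= m25]
    return bundles, holes, silos


pairs = {("A", "B"): (0.70, 1.0), ("A", "C"): (0.20, 0.8), ("B", "C"): (0.30, 1.5)}
prevalence = {"A": 0.9, "B": 0.8, "C": 0.3}
print(classify(pairs, prevalence))  # ([], [('A', 'C')], ['A'])
```

The toy output shows how the dual threshold can leave the bundle list empty: the most visible pair (A, B) is not selectively attracted, and the most attracted pair (B, C) is not highly visible.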
3. Results
Tables S3–S8 report the full pairwise co-occurrence statistics. The narrative below highlights the patterns most relevant to the paper’s research questions.
3.1. Prevalence, Richness, and Concentration
Table 1 and Table 2 show a progressive narrowing of the reporting repertoire across the Results, Lessons, and Transfers sequence. Results and Lessons each contain 15 axial codes, whereas Transfers contains 12. Mean per-BP thematic richness declines from 8.696 codes in Results to 7.960 in Lessons and 6.248 in Transfers, a reduction of about 8.5% at the first interface and a further 21.5% at the second, for a total decline of approximately 28.2% across the full sequence. Normalized HHI rises from 0.00419 in Results to 0.00613 in Lessons and 0.01743 in Transfers, making Transfers approximately 4.2 times more concentrated than Results. The top three codes account for about 26.8% of prevalence mass in Results, 27.9% in Lessons, and 39.5% in Transfers.
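The percentages in this paragraph follow directly from the reported richness and concentration values, as the short Python check below illustrates. The variable and function names are hypothetical; the numbers are those stated in the text.

```python
richness = {"Results": 8.696, "Lessons": 7.960, "Transfers": 6.248}
hhi_norm = {"Results": 0.00419, "Lessons": 0.00613, "Transfers": 0.01743}


def pct_decline(start, end):
    """Percentage decline from start to end, relative to start."""
    return 100 * (start - end) / start


first = pct_decline(richness["Results"], richness["Lessons"])
second = pct_decline(richness["Lessons"], richness["Transfers"])
total = pct_decline(richness["Results"], richness["Transfers"])
ratio = hhi_norm["Transfers"] / hhi_norm["Results"]

print(round(first, 1), round(second, 1), round(total, 1), round(ratio, 1))
# 8.5 21.5 28.2 4.2
```

The two stage-wise declines do not sum to the total because each is expressed relative to its own starting value; they compound, so the retained share is roughly 0.915 × 0.785, or about 72% of the Results richness.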
The distribution of prevalence differs in shape across domains. In Results, five codes cluster at roughly 0.69 or above without a clear single leader: ESRO (0.816), GPIO (0.764), ICMP (0.752), EPCC (0.692), and SIAO (0.688). In Lessons, a single code separates from the rest: SPMX (0.808) stands 0.10 prevalence points above its nearest neighbors, CBLX and PMCX (both 0.708). In Transfers, this separation widens further: RSPX (0.924) exceeds KTCB (0.816) by more than 0.10 points and exceeds the remaining codes by wider margins still.
Across comparable thematic families, prevalence trajectories diverge (Figure 1). Governance-related prevalence declines from GPIO (0.764) to GPIA (0.624) to GIPA (0.352). Sustainability-related prevalence shows the steepest decline, from ESRO (0.816) to SRPX (0.612) to SRTV (0.220). Evidence-related coding is moderate in Results (MEEI at 0.588) but lower in both Lessons (MEEU at 0.260) and Transfers (EMPT at 0.336). Built environment and design-related coding declines from IHBE (0.516) and TIIA (0.540) to IDPT (0.164), the lowest-prevalence code in the Transfers domain. Equity-related prevalence declines from SIAO (0.688) to EIEP (0.548) to EICR (0.468). The reverse trajectory belongs to scaling: SRDO (0.280) and SRTM (0.260) sit near the bottom of their respective domains, but RSPX reaches 0.924 in Transfers, the highest single-code prevalence in the entire corpus.
Figure 1 summarizes these patterns by showing the progressive narrowing of the reporting repertoire across the three domains.
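The family-level trajectories can be compared by tabulating the reported prevalences and computing the net change from Results to Transfers. This is a small illustrative sketch using only the values quoted in the text; the family labels and the `net_change` helper are naming assumptions made here, not part of the coding scheme.

```python
# Prevalence per domain (Results, Lessons, Transfers) as reported in the text.
trajectories = {
    "governance": [0.764, 0.624, 0.352],         # GPIO, GPIA, GIPA
    "sustainability": [0.816, 0.612, 0.220],     # ESRO, SRPX, SRTV
    "evidence": [0.588, 0.260, 0.336],           # MEEI, MEEU, EMPT
    "built_environment": [0.516, 0.540, 0.164],  # IHBE, TIIA, IDPT
    "equity": [0.688, 0.548, 0.468],             # SIAO, EIEP, EICR
    "scaling": [0.280, 0.260, 0.924],            # SRDO, SRTM, RSPX
}


def net_change(series):
    """Change in prevalence from the first to the last domain."""
    return round(series[-1] - series[0], 3)


changes = {family: net_change(values) for family, values in trajectories.items()}

# Family with the largest net decline, and families that rise overall.
steepest = min(changes, key=changes.get)
rising = [family for family, change in changes.items() if change > 0]

for family, change in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{family:18s} {change:+.3f}")
```

On these figures, sustainability shows the largest net decline and scaling is the only family whose prevalence rises across the sequence.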
3.2. Results vs. Results
The Results-by-Results matrix (Table S3) has a more evenly distributed overlap profile than either of the other two within-domain tables. Five codes (ESRO, GPIO, ICMP, EPCC, SIAO) form a high-overlap cluster in which no single code commands the network. The largest Jaccard values, ESRO-ICMP (69.70%), ESRO-GPIO (65.97%), GPIO-ICMP (64.78%), and GPIO-SIAO (60.62%), are all concentrated within this group, and no code outside it exceeds 50% overlap with more than two partners. Lift values within the cluster remain close to 1.0, indicating that these pairings reflect shared high prevalence rather than selective co-occurrence.
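To make the distinction between overlap and selectivity concrete, the two measures can be computed from report counts. The sketch below assumes the standard definitions (Jaccard as intersection over union, Lift as observed joint frequency over the product of marginal frequencies). The marginal counts follow from the reported prevalences and the corpus size of 250; the joint count of 161 is not given in the text and is back-calculated here from the reported Jaccard, so it should be read as an assumption.

```python
def jaccard(n_a, n_b, n_ab):
    """Share of reports containing both codes among those containing either."""
    return n_ab / (n_a + n_b - n_ab)


def lift(n_a, n_b, n_ab, n_total):
    """Observed joint frequency divided by the frequency expected under independence."""
    return (n_ab / n_total) / ((n_a / n_total) * (n_b / n_total))


N_REPORTS = 250
n_esro = round(0.816 * N_REPORTS)  # 204 reports coded ESRO
n_icmp = round(0.752 * N_REPORTS)  # 188 reports coded ICMP
n_both = 161  # ASSUMPTION: inferred from the reported Jaccard, not stated in the text

jaccard_value = jaccard(n_esro, n_icmp, n_both)
lift_value = lift(n_esro, n_icmp, n_both, N_REPORTS)

print(f"Jaccard = {jaccard_value:.2%}, Lift = {lift_value:.2f}")
```

Under these assumptions the pair combines a high Jaccard with a Lift only slightly above 1.0, which is the signature of two very prevalent codes overlapping largely because both appear in most reports.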
KPIT and SRDO, the two lowest-prevalence codes in the domain, record the highest Lift value in the entire table (KPIT-SRDO, 32.28%, Lift 1.49). Their Jaccard values with most other Results codes remain below 30%. These cases are examined further in Section 3.8 alongside the formal hole classification.
3.3. Lessons vs. Lessons
The Lessons-by-Lessons matrix (Table S4) differs from the Results matrix in its overlap structure. SPMX is the single most connected code in any within-domain matrix, with Jaccard above 50% in six of its fourteen pairings: SPMX-PMCX (67.70%), SPMX-CBLX (59.92%), SPMX-GPIA (59.11%), SPMX-CEPP (56.77%), SPMX-SRPX (55.70%), and SPMX-TIIA (51.12%). Where the Results matrix distributes overlap across a five-code plateau, the Lessons matrix concentrates it around a single code from which other high-overlap ties extend. CBLX, PMCX, and GPIA function as secondary connectors within this group, but none approaches SPMX’s breadth. Lift values within this core remain close to 1.0.
MEEU and SRTM, the two lowest-prevalence codes in the domain (both at 0.260), show Jaccard values generally below 30% across the matrix but account for the strongest Lift values in the table. MEEU-SRTM records the highest Lift (23.81%, Lift 1.48), followed by OWSQ-SRTM (24.19%, Lift 1.30) and TIIA-MEEU (27.39%, Lift 1.23). The codes involved in these pairings form an internally selective grouping that is weakly connected to the core in overlap terms. MEEU also records some of the lowest Lift values in the matrix, notably with EIEP (0.93) and FVED (0.93). The clearest below-independence pairing involving SRTM is CEPP-SRTM (18.72%, Lift 0.86).
3.4. Transfers vs. Transfers
The Transfers-by-Transfers matrix (Table S5) completes a structural progression visible across the three within-domain tables. Where Results distributes overlap across a five-code plateau and Lessons organizes it around a single hub, Transfers produces the most polarized architecture in the corpus: a tight four-code core separated from a sparse periphery by a wider gap than in either preceding domain.
RSPX anchors this core with the strongest hub profile of any code in any within-domain matrix. Its Jaccard values with KTCB (78.28%), NPCT (69.96%), and MTTA (67.90%) are the highest within-domain overlaps in the study, and it maintains Jaccard above 47% with every code except SRTV and IDPT. KTCB and NPCT further reinforce the core through their mutual overlap (67.83%). Lift values within this group remain near 1.0, consistent with the pattern observed in Results and Lessons.
The periphery is sparser and more internally selective than in either previous domain. EMPT, SRTV, and IDPT form a loose triangle that accounts for the strongest Lift values in the matrix: SRTV-IDPT (14.29%, Lift 1.33), EMPT-IDPT (16.82%, Lift 1.31), and EMPT-SRTV (20.87%, Lift 1.30). All three codes sit below 0.340 prevalence, and all show stronger affinity with each other than with the four-code core. IDPT has the weakest Jaccard profile of any code in the matrix, but its Lift values with SRTV, EMPT, and EICR (1.15) indicate selective co-occurrence with other low-prevalence transfer themes rather than isolation.
At the core-periphery boundary, several pairings fall below independence. DCVX-IDPT (11.76%, Lift 0.82) and KTCB-IDPT (13.95%, Lift 0.90) record under-co-occurrence between the highest-prevalence transfer codes and the lowest-prevalence one. Another notable below-independence pairing is EICR-CALX (25.41%, Lift 0.89), where equity safeguards and context adaptation under-co-occur despite their apparent conceptual proximity. These polarization patterns are examined through the formal hole and silo classification in Section 3.8.
Figure 2 summarizes the structural progression across the three within-domain matrices. Results distributes overlap across a five-code plateau, Lessons concentrates it around a single strategic-planning hub, and Transfers produces the most polarized core-periphery architecture in the corpus.
3.5. Results vs. Lessons
The Results-by-Lessons matrix (Table S6) is the first of three cross-domain tables. Its defining feature is breadth: many themes co-occur across the two domains, but few do so with elevated Lift. The large majority of Lift values fall between 0.90 and 1.15, producing a flatter profile than in either of the two corresponding within-domain matrices.
SPMX is the clearest Lessons-side connector, with Jaccard values of 68.46% (ESRO), 67.38% (ICMP), and 65.82% (GPIO), confirming that the hub identified in the within-Lessons matrix (Section 3.3) also functions as the primary receiving code for Results-domain content. PMCX and CBLX serve as secondary connectors, with Jaccard above 60% in their pairings with the top Results codes.
A small number of pairings show above-baseline selectivity along thematic-family lines. SIAO-CEPP (56.67%, Lift 1.10) links equity outcomes to community engagement lessons. IHBE-TIIA (40.43%, Lift 1.09) links built-environment outcomes to technical design lessons. These are the only pairings where a Results code co-occurs preferentially with its thematic counterpart on the Lessons side, and even here, the Lift magnitudes remain modest.
The most distinctive pattern in this matrix involves wellbeing outcomes. HWQO records the lowest Lift values of any Results code at this interface: HWQO-SRTM (15.48%, Lift 0.81) and HWQO-MEEU (16.23%, Lift 0.84). No other Results code shows comparably low Lift values with either SRTM or MEEU. Combined with the scarcity of high-overlap, high-Lift pairings anywhere in the matrix, these patterns are carried forward into the cross-domain analysis in Section 3.6 and Section 3.7.
3.6. Results vs. Transfers
The Results-by-Transfers matrix (Table S7) has the flattest Lift profile of any of the six co-occurrence tables. The contrast between the strongest and weakest Lift values is smaller here than in any other matrix, and the large majority of pairings fall within a narrow band around 1.0.
RSPX is the clearest Transfers-side connector, with Jaccard values of 76.83% (ESRO), 72.43% (ICMP), and 72.24% (GPIO). KTCB, NPCT, and MTTA also maintain Jaccard above 55% in their pairings with the top Results codes, all with Lift near 1.0. This broad connectivity parallels the pattern observed in the Results-to-Lessons matrix (Section 3.5), but with even less variation in Lift.
A small number of pairings show above-baseline selectivity. The highest Lift value in the matrix is IHBE-IDPT (22.30%, Lift 1.47), linking built-environment outcomes to infrastructure and design package transfer. Governance themes record elevated Lift through GPIO-GIPA (39.50%, Lift 1.18) and SIAO-GIPA (37.57%, Lift 1.17). Evidence outcomes connect to evidence for transfer through MEEI-EMPT (33.53%, Lift 1.17). These selective pairings are notable for their scarcity: no pair in this matrix reaches the P80/P80 joint threshold tested in Section 3.8, making it the only matrix where that is the case.
Below-independence pairings concentrate around three codes on the Transfers side. SRDO-IDPT records the lowest Lift value in the entire corpus (7.77%, Lift 0.70). HWQO shows a consistent below-independence pattern with multiple Transfers codes, including HWQO-GIPA (Lift 0.87) and HWQO-IMRX (Lift 0.92). No other Results code records as many below-independence Lift values at this interface.
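A Lift below 1.0 means that two codes appear together less often than their individual prevalences would predict. As a hedged illustration for SRDO-IDPT, the sketch below compares the joint count expected under independence with an observed joint count. The marginal counts follow from the reported prevalences (0.280 and 0.164) and the corpus size of 250; the observed joint count of 8 is not reported in the text and is back-calculated from the published Jaccard and Lift, so it is an assumption.

```python
N_REPORTS = 250
n_srdo = round(0.280 * N_REPORTS)  # 70 reports coded SRDO
n_idpt = round(0.164 * N_REPORTS)  # 41 reports coded IDPT

# Joint count expected if the two codes were statistically independent.
expected_joint = n_srdo * n_idpt / N_REPORTS

# ASSUMPTION: inferred from the reported values, not stated in the text.
observed_joint = 8

lift_value = observed_joint / expected_joint
jaccard_value = observed_joint / (n_srdo + n_idpt - observed_joint)

print(f"expected = {expected_joint:.2f}, observed = {observed_joint}")
print(f"Lift = {lift_value:.2f}, Jaccard = {jaccard_value:.2%}")
```

Read this way, the pairing falls a few reports short of the roughly eleven co-occurrences that independence would imply, which is a small absolute gap on two low-prevalence codes.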
3.7. Lessons vs. Transfers
The Lessons-by-Transfers matrix (Table S8) has the widest Lift range of the three cross-domain tables, distinguishing it from the Results-to-Lessons matrix (broad but flat) and the Results-to-Transfers matrix (broad and flatter still).
Outside the core, LTLC stands out as the Lessons-side code with the most differentiated Lift profile. It records elevated Lift with GIPA (31.97%, Lift 1.26), EMPT (29.25%, Lift 1.21), and SRTV (21.05%, Lift 1.20), while also recording one of the lowest Lift values in the table with IDPT (9.70%, Lift 0.75). Other above-baseline pairings include SPMX-IMRX (52.34%, Lift 1.12) and EIEP-EICR (41.90%, Lift 1.17).
SRTV and IDPT again concentrate the extreme Lift values at this interface, as they do in the within-Transfers matrix (Section 3.4). The highest-Lift edges are CSAP-IDPT (20.95%, Lift 1.37) and CAVX-IDPT (19.70%, Lift 1.36). The lowest-Lift edges are LTLC-IDPT (9.70%, Lift 0.75) and MEEU-IDPT (8.16%, Lift 0.75). On the SRTV side, below-independence pairings include OWSQ-SRTV (11.63%, Lift 0.77).
3.8. Bundles, Holes, and Silos
This section applies the formal detection rules defined in Section 2.2 across all six matrices.
Applying the bundle rule (Jaccard ≥ P90 and Lift ≥ P90) to all six matrices yields zero qualifying pairs. Pairs in the top Jaccard decile cluster near the Lift median, while pairs in the top Lift decile fall well below the Jaccard median. This distributional separation between overlap and selectivity holds in every matrix. The zero-bundle finding is robust to threshold relaxation. Even at P80/P80, only 12 of 861 code pairs (1.4%) meet the joint threshold (Table 3). The Results-to-Transfers matrix produces zero qualifying pairs at this level, making it the only matrix with no pair simultaneously in the top quintile on both measures. The pairs that do emerge at P80/P80 cluster around social infrastructure themes in the within-Results matrix (SIAO-EPCC, CBEO-PMCO, SIAO-CBEO, EPCC-CBEO) and around operational or institutional pairings in the cross-domain matrices: SIAO-CEPP (Lift 1.10), MEEI-SRPX (Lift 1.11), MEEI-GPIA (Lift 1.09), and SIAO-CSAP (Lift 1.08) in Results-to-Lessons, and SPMX-IMRX (Lift 1.12) and GPIA-DCVX (Lift 1.10) in Lessons-to-Transfers.
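The total of 861 code pairs is consistent with the code counts reported for each domain (15, 15, and 12). The following sketch reproduces that count under the assumption that within-domain matrices contribute unordered pairs of distinct codes and cross-domain matrices contribute every cell; the variable names are illustrative.

```python
from math import comb

# Number of axial codes per domain, as reported in the text.
codes = {"Results": 15, "Lessons": 15, "Transfers": 12}

# Within-domain matrices: unordered pairs of distinct codes.
within = {domain: comb(n, 2) for domain, n in codes.items()}

# Cross-domain matrices: every row-by-column combination.
cross = {
    ("Results", "Lessons"): codes["Results"] * codes["Lessons"],
    ("Results", "Transfers"): codes["Results"] * codes["Transfers"],
    ("Lessons", "Transfers"): codes["Lessons"] * codes["Transfers"],
}

total_pairs = sum(within.values()) + sum(cross.values())
share_p80 = 12 / total_pairs  # 12 pairs meet the P80/P80 joint threshold

print(total_pairs, f"{share_p80:.1%}")
```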
Holes (Lift ≤ P10 and Lift < 1) are identified in every matrix, totaling 89 holes across the six co-occurrence tables. Table 4 aggregates these at the node level. IDPT is the most recurrent hole node in the corpus (11 holes across 3 matrices), followed by SRTV (9 holes, 2 matrices). Both sit exclusively on the Transfers side. The next tier includes HWQO (8 holes, 3 matrices), MEEU (8 holes, 3 matrices), and SRDO (8 holes, 2 matrices). HWQO and MEEU appear in both within-domain and cross-domain matrices, while SRDO and SRTV appear only in cross-domain matrices. The five most recurrent hole nodes group into four thematic areas: built environment and design (IDPT), sustainability framing (SRTV), wellbeing and evidence (HWQO, MEEU), and scaling outcomes (SRDO). Additional hole nodes are reported in Table 4.
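The bundle and hole rules stated above can be expressed as a short pair-level classifier. The sketch below is illustrative only: it assumes thresholds are computed separately within each matrix, uses a linear-interpolation percentile (the interpolation rule is an assumption), and runs on a toy matrix of hypothetical codes A to D rather than on the study data.

```python
def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100] (assumed rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def classify_pairs(pairs, bundle_q=90, hole_q=10):
    """Classify the pairs of ONE matrix.

    pairs maps (code_a, code_b) -> (jaccard, lift).
    Bundle: Jaccard >= P90 and Lift >= P90.
    Hole:   Lift <= P10 and Lift < 1.
    """
    jaccards = [j for j, _ in pairs.values()]
    lifts = [l for _, l in pairs.values()]
    j_hi = percentile(jaccards, bundle_q)
    l_hi = percentile(lifts, bundle_q)
    l_lo = percentile(lifts, hole_q)
    bundles = [p for p, (j, l) in pairs.items() if j >= j_hi and l >= l_hi]
    holes = [p for p, (j, l) in pairs.items() if l <= l_lo and l < 1]
    return bundles, holes


# Hypothetical toy matrix: high overlap with flat Lift, low overlap with extreme Lift.
toy = {
    ("A", "B"): (0.70, 1.02),
    ("A", "C"): (0.65, 1.00),
    ("B", "C"): (0.60, 1.01),
    ("A", "D"): (0.20, 0.80),
    ("B", "D"): (0.25, 0.95),
    ("C", "D"): (0.15, 1.40),
}

bundles, holes = classify_pairs(toy)
print("bundles:", bundles)
print("holes:", holes)
```

In this toy case the highest-overlap pair and the highest-Lift pair are different pairs, so no bundle qualifies, while the lowest-Lift pair is flagged as a hole. Passing `bundle_q=80` gives the relaxed P80/P80 variant of the bundle rule.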
Silo detection (prevalence ≥ P75 and median Lift ≤ Q25) identifies four codes across the three within-domain matrices (Table 5). In each domain, the single most prevalent code also qualifies as a silo: ESRO (prevalence 0.816, median Lift 1.011) in Results, SPMX (0.808, 1.020) in Lessons, and RSPX (0.924, 1.012) in Transfers. A fourth silo, CEPP (0.628, 1.022), enters in Lessons at the fourth prevalence rank, just clearing the 75th percentile threshold. All four codes show Lift profiles that are flat near 1.0 across their pairwise relations, co-occurring with nearly everything in their domain without elevated selectivity toward any particular partner.
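The silo rule operates on codes rather than pairs. A minimal sketch follows, under two assumptions that the text does not spell out: that Q25 is taken over the distribution of per-code median Lift values within a domain, and that percentiles use linear interpolation. The codes A to D and their values are hypothetical.

```python
from statistics import median


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100] (assumed rule)."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def detect_silos(prevalence, pair_lift):
    """Return codes with prevalence >= P75 and median Lift <= Q25.

    prevalence maps code -> prevalence in one domain.
    pair_lift maps frozenset({a, b}) -> Lift in the within-domain matrix.
    """
    median_lift = {
        code: median(l for pair, l in pair_lift.items() if code in pair)
        for code in prevalence
    }
    prev_cut = percentile(list(prevalence.values()), 75)
    lift_cut = percentile(list(median_lift.values()), 25)  # ASSUMPTION: Q25 of code medians
    return sorted(
        code
        for code in prevalence
        if prevalence[code] >= prev_cut and median_lift[code] <= lift_cut
    )


# Hypothetical domain: code A is very prevalent and co-occurs flatly with everything.
toy_prevalence = {"A": 0.9, "B": 0.6, "C": 0.5, "D": 0.2}
toy_lift = {
    frozenset({"A", "B"}): 1.00,
    frozenset({"A", "C"}): 1.01,
    frozenset({"A", "D"}): 0.99,
    frozenset({"B", "C"}): 1.10,
    frozenset({"B", "D"}): 1.20,
    frozenset({"C", "D"}): 1.30,
}

silos = detect_silos(toy_prevalence, toy_lift)
print(silos)
```

The toy result mirrors the pattern described above: the most prevalent code is flagged because its Lift profile sits flat near 1.0 relative to the rest of its domain.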
5. Conclusions
This study analyzed 250 UN-Habitat Best Practice reports as a standardized genre of implementation reporting, revealing a portability paradox. The BP format bridges the implementation gap at the level of institutional legibility, making practices recognizable and comparable within a shared vocabulary, but not at the level of operational specificity. The physical, evidentiary, and context-sensitive content that operational reproduction would require is progressively filtered out across the reporting sequence. Four thematic families, namely infrastructure and design, evidence and monitoring, sustainability as a transferable value, and wellbeing as experienced impact, proved the most systematically disconnected from the portable transfer narrative, while the procedural machinery of circulation (replication pathways, knowledge transfer, partnerships, toolkits) survived with minimal attenuation. The portability paradox is not a critique of the database nor a claim that the UN-Habitat template is deficient; it is a property of the genre itself, in which the reformatting of local experience for broader circulation follows the standardizing logic of the institutional format [37].
Three implications follow. First, the analytical framework developed here (prevalence profiles, Jaccard/Lift co-occurrence networks, bundle/hole/silo detection) provides a replicable audit methodology applicable to other curated reporting systems, from SDG Voluntary National Reviews to ICLEI, C40, and OECD urban policy compendia, to test whether analogous filtration patterns appear across reporting genres. More broadly, urban and policy studies could engage with text-as-data and network-based methodologies capable of surfacing the systematic biases and silences that conventional case-based evaluation leaves unexamined. Second, the findings surface a design tension that merits empirical investigation: whether reporting templates can retain more situated, evidentiary, and material content without sacrificing the cross-context comparability that makes them institutionally useful. The legibility the current format achieves is a nontrivial accomplishment; whether structured prompts for design specifications, evidence chains, or operationalized sustainability could preserve more content without overloading the format is an open question beyond the scope of genre analysis. Future template design could incorporate structured fields that preserve local material and contextual specificity alongside the standardized categories that enable cross-case legibility. Third, the progressive shedding of content most resistant to formalization suggests that the implementation gap is not only a gap between policy and practice but also a gap between what is known locally and what survives the reporting formats through which that knowledge circulates.
Finally, because sustainability priorities and implementation conditions vary across regions [61], the global patterns documented here may compress meaningful regional variation. Whether the portability paradox operates uniformly or is itself regionally differentiated is the focus of a subsequent study exploiting the stratified design of this corpus.