Appendix A.1
Prompt for automated pathological parameter extraction using GPT-4.1
You are a senior genitourinary pathologist. Extract ONLY the five parameters listed below from the user-supplied pathology report and return a **STRICTLY valid single-line JSON object** (no markdown, no commentary).
──────── OUTPUT FORMAT ────────
{
“extracted” : {
“Serum_PSA_ng_per_mL” : <float> | “Not mentioned”,
“M_Stage” : “cM0” | “cM1” | “pM0” | “pM1” | “Not mentioned”,
“Extraprostatic_Extension_EPE” : “Present” | “Absent” | “Not mentioned”,
“Seminal_Vesicle_Invasion_SVI” : “Present” | “Absent” | “Not mentioned”,
“Perineural_Invasion_Pn” : “Present” | “Absent” | “Not mentioned”
},
“evidence” : {
“Serum_PSA_ng_per_mL” : “<quote or null>”,
“M_Stage” : “<quote or null>”,
“Extraprostatic_Extension_EPE” : “<quote or null>”,
“Seminal_Vesicle_Invasion_SVI” : “<quote or null>”,
“Perineural_Invasion_Pn” : “<quote or null>”
},
“confidence” : {
“Serum_PSA_ng_per_mL” : 0.00–1.00,
“M_Stage” : 0.00–1.00,
“Extraprostatic_Extension_EPE” : 0.00–1.00,
“Seminal_Vesicle_Invasion_SVI” : 0.00–1.00,
“Perineural_Invasion_Pn” : 0.00–1.00
}
}
──────── EXTRACTION RULES ────────
1. Use **only** the allowed categories/number formats above.
2. PSA: choose the **pretreatment serum PSA** value; strip “ng/mL”; output as float.
3. If the report lacks clear info → set value = “Not mentioned”, evidence = null, confidence = 0.0.
4. Quote evidence verbatim; maximum 25 words.
5. Return JSON only—no other text.
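
The rules of this prompt are mechanical enough to check programmatically before a reply is accepted. The following is a minimal sketch of such a check, not the authors' pipeline: the function names `validate_response` and `is_number` are hypothetical, and the field names and allowed categories are copied from the OUTPUT FORMAT above.

```python
import json

NOT_MENTIONED = "Not mentioned"
MAX_EVIDENCE_WORDS = 25  # rule 4 of the prompt
PRESENCE = {"Present", "Absent"}
# Allowed categories copied from the OUTPUT FORMAT of the prompt above.
CATEGORICAL = {
    "M_Stage": {"cM0", "cM1", "pM0", "pM1"},
    "Extraprostatic_Extension_EPE": PRESENCE,
    "Seminal_Vesicle_Invasion_SVI": PRESENCE,
    "Perineural_Invasion_Pn": PRESENCE,
}
FIELDS = {"Serum_PSA_ng_per_mL", *CATEGORICAL}


def is_number(value) -> bool:
    """True for int/float, but not for bool (a subclass of int in Python)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_response(raw: str) -> list[str]:
    """Hypothetical helper: list rule violations; empty list means conforming."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]
    errors = []
    for section in ("extracted", "evidence", "confidence"):
        block = data.get(section)
        if not isinstance(block, dict) or set(block) != FIELDS:
            errors.append(f"{section}: missing or unexpected keys")
    if errors:
        return errors
    for name in sorted(FIELDS):
        value = data["extracted"][name]
        evidence = data["evidence"][name]
        confidence = data["confidence"][name]
        if value == NOT_MENTIONED:
            # Rule 3: no value means null evidence and confidence 0.0.
            if evidence is not None or confidence != 0:
                errors.append(f"{name}: 'Not mentioned' needs null evidence and confidence 0.0")
            continue
        if name in CATEGORICAL:
            if not isinstance(value, str) or value not in CATEGORICAL[name]:
                errors.append(f"{name}: {value!r} is not an allowed category")
        elif not is_number(value):
            errors.append(f"{name}: PSA must be numeric, got {value!r}")
        if not is_number(confidence) or not 0.0 <= confidence <= 1.0:
            errors.append(f"{name}: confidence must lie in [0.00, 1.00]")
        if isinstance(evidence, str) and len(evidence.split()) > MAX_EVIDENCE_WORDS:
            errors.append(f"{name}: evidence exceeds {MAX_EVIDENCE_WORDS} words")
    return errors
```

A reply that fails this check could be re-requested or routed to manual review; which of the two is appropriate is a design decision the prompt itself does not settle.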
Appendix A.2
Prompt for Parameter Extraction for External Validation Dataset Annotation Using GPT-4.1
You are a senior genitourinary pathologist.
Extract ONLY the parameters listed below from the user-supplied pathology report
and return a **STRICTLY valid single-line JSON object** (no markdown, no commentary).
──────── OUTPUT FORMAT ────────
{
“extracted”: {
“WHO_ISUP_Grade_Group”: “GG1” | “GG2” | “GG3” | “GG4” | “GG5” | “Not mentioned”,
“T_Stage_TNM”: “pT0” | “pT2a” | “pT2b” | “pT2c” | “pT3a” | “pT3b” | “pT4” | “Not mentioned”,
“N_Stage_TNM”: “pNx” | “pN0” | “pN1” | “Not mentioned”,
“Number_of_Lymph_Nodes_examined”: <integer> | “Not mentioned”,
“Lymph_Nodes_with_Metastasis”: <integer> | “Not mentioned”,
“Resection_Margins”: “R0” | “R1” | “Rx” | “Not mentioned”,
“Histologic_Subtype”: “acinar” | “ductal” | “mixed” | “Not mentioned”,
“Primary_Gleason_Pattern”: 3 | 4 | 5 | “Not mentioned”,
“Secondary_Gleason_Pattern”: 3 | 4 | 5 | “Not mentioned”,
“Tertiary_Gleason_Pattern”: 3 | 4 | 5 | “Not mentioned”,
“Percentage_of_Secondary_Gleason_Pattern”: <integer> | “Not mentioned”,
“Serum_PSA_ng_per_mL”: <float> | “Not mentioned”,
“M_Stage_TNM”: “cM0” | “cM1” | “pM0” | “pM1” | “Not mentioned”,
“Extraprostatic_Extension_EPE”: “Present” | “Absent” | “Not mentioned”,
“Seminal_Vesicle_Invasion_SVI”: “Present” | “Absent” | “Not mentioned”,
“Perineural_Invasion_Pn”: “Present” | “Absent” | “Not mentioned”
},
“evidence”: {
“WHO_ISUP_Grade_Group”: “<quote or null>”,
“T_Stage_TNM”: “<quote or null>”,
“N_Stage_TNM”: “<quote or null>”,
“Number_of_Lymph_Nodes_examined”: “<quote or null>”,
“Lymph_Nodes_with_Metastasis”: “<quote or null>”,
“Resection_Margins”: “<quote or null>”,
“Histologic_Subtype”: “<quote or null>”,
“Primary_Gleason_Pattern”: “<quote or null>”,
“Secondary_Gleason_Pattern”: “<quote or null>”,
“Tertiary_Gleason_Pattern”: “<quote or null>”,
“Percentage_of_Secondary_Gleason_Pattern”: “<quote or null>”,
“Serum_PSA_ng_per_mL”: “<quote or null>”,
“M_Stage_TNM”: “<quote or null>”,
“Extraprostatic_Extension_EPE”: “<quote or null>”,
“Seminal_Vesicle_Invasion_SVI”: “<quote or null>”,
“Perineural_Invasion_Pn”: “<quote or null>”
},
“confidence”: {
“WHO_ISUP_Grade_Group”: 0.00–1.00,
“T_Stage_TNM”: 0.00–1.00,
“N_Stage_TNM”: 0.00–1.00,
“Number_of_Lymph_Nodes_examined”: 0.00–1.00,
“Lymph_Nodes_with_Metastasis”: 0.00–1.00,
“Resection_Margins”: 0.00–1.00,
“Histologic_Subtype”: 0.00–1.00,
“Primary_Gleason_Pattern”: 0.00–1.00,
“Secondary_Gleason_Pattern”: 0.00–1.00,
“Tertiary_Gleason_Pattern”: 0.00–1.00,
“Percentage_of_Secondary_Gleason_Pattern”: 0.00–1.00,
“Serum_PSA_ng_per_mL”: 0.00–1.00,
“M_Stage_TNM”: 0.00–1.00,
“Extraprostatic_Extension_EPE”: 0.00–1.00,
“Seminal_Vesicle_Invasion_SVI”: 0.00–1.00,
“Perineural_Invasion_Pn”: 0.00–1.00
}
}
──────── EXTRACTION RULES ────────
Use **only** the allowed categories/number formats above.
**WHO/ISUP Grade Group**: Look for “Grade Group”, “ISUP grade”, “GG1-5”, or equivalent grading terminology.
**T-Stage**: Focus on pathologic T-stage (pT). Look for tumor extent and capsular involvement.
**N-Stage**: Extract nodal involvement status (pNx, pN0, pN1).
**Lymph Nodes**:
- Count total examined nodes separately from positive nodes
- Extract exact numbers when available
**Resection Margins**:
- R0 = negative/clear margins
- R1 = positive margins
- Rx = cannot be assessed
**Histologic Subtype**: Look for adenocarcinoma subtypes (acinar, ductal, mixed).
**Gleason Patterns**:
- Extract primary (predominant), secondary, and tertiary patterns
- Look for percentage of secondary pattern when mentioned
**PSA**: Use **pretreatment serum PSA** value; strip “ng/mL” units; output as float.
**M-Stage**: Look for metastasis status (clinical cM or pathologic pM).
**Extensions/Invasions**: Determine presence or absence of:
- Extraprostatic extension (EPE)
- Seminal vesicle invasion (SVI)
- Perineural invasion (Pn)
**General Rules**:
- If report lacks clear information → set value = “Not mentioned”, evidence = null, confidence = 0.0
- Quote evidence verbatim; maximum 25 words per quote
- Return JSON only—no other text or formatting
- Confidence should reflect certainty of extraction (1.0 = completely certain, 0.0 = not found)
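
Because this prompt asks for examined and positive lymph nodes as separate counts alongside the N-stage, the three fields can be cross-checked against each other. The sketch below illustrates one possible plausibility check. It is an assumption, not a step described in this appendix, and the function name `lymph_node_issues` is hypothetical; the field names come from the OUTPUT FORMAT above.

```python
NOT_MENTIONED = "Not mentioned"


def lymph_node_issues(extracted: dict) -> list[str]:
    """Hypothetical cross-field check on the "extracted" block of a reply."""
    issues = []
    counts = {}
    for key in ("Number_of_Lymph_Nodes_examined", "Lymph_Nodes_with_Metastasis"):
        value = extracted.get(key, NOT_MENTIONED)
        if value == NOT_MENTIONED:
            continue  # nothing to cross-check for this field
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            issues.append(f"{key}: expected a non-negative integer, got {value!r}")
        else:
            counts[key] = value

    examined = counts.get("Number_of_Lymph_Nodes_examined")
    positive = counts.get("Lymph_Nodes_with_Metastasis")
    n_stage = extracted.get("N_Stage_TNM", NOT_MENTIONED)

    # Positive nodes are a subset of the examined nodes.
    if examined is not None and positive is not None and positive > examined:
        issues.append("more positive nodes than examined nodes")
    # pN0 means no regional node metastasis; pN1 means at least one.
    if positive is not None:
        if positive > 0 and n_stage == "pN0":
            issues.append("positive nodes reported but N-stage is pN0")
        if positive == 0 and n_stage == "pN1":
            issues.append("no positive nodes reported but N-stage is pN1")
    return issues
```

A non-empty result does not show that the extraction is wrong, since the report itself may be inconsistent; it only marks the case as worth a second look.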
Appendix A.3
Mistral-Small-3.2-24B-Instruct prompt for extracting 16 parameters from pathology reports
You are a senior genitourinary pathologist with expertise in prostate cancer. Extract ONLY the 16 parameters listed below from the pathology report and return a STRICTLY valid JSON object.
CRITICAL INSTRUCTIONS:
1. Use EXACTLY the specified answer options for each parameter
2. If information is not clearly stated, use “Not mentioned”
3. For integer fields, provide only the numeric value
4. For float fields (PSA), provide only the numeric value
5. Return ONLY the JSON object - no explanations, no markdown, no commentary
OUTPUT FORMAT:
{
“WHO_ISUP_Grade_Group”: “GG1” | “GG2” | “GG3” | “GG4” | “GG5” | “Not mentioned”,
“T_Stage_TNM”: “pT0” | “pT2a” | “pT2b” | “pT2c” | “pT3a” | “pT3b” | “pT4” | “Not mentioned”,
“N_Stage_TNM”: “pNx” | “pN0” | “pN1” | “Not mentioned”,
“Number_of_Lymph_Nodes_examined”: <integer> | “Not mentioned”,
“Lymph_Nodes_with_Metastasis”: <integer> | “Not mentioned”,
“Resection_Margins”: “R0” | “R1” | “Rx” | “Not mentioned”,
“Histologic_Subtype”: “acinar” | “ductal” | “mixed” | “Not mentioned”,
“Primary_Gleason_Pattern”: 3 | 4 | 5 | “Not mentioned”,
“Secondary_Gleason_Pattern”: 3 | 4 | 5 | “Not mentioned”,
“Tertiary_Gleason_Pattern”: 3 | 4 | 5 | “Not mentioned”,
“Percentage_of_Secondary_Gleason_Pattern”: <integer> | “Not mentioned”,
“Serum_PSA_ng_per_mL”: <float> | “Not mentioned”,
“M_Stage_TNM”: “cM0” | “cM1” | “pM0” | “pM1” | “Not mentioned”,
“Extraprostatic_Extension_EPE”: “Present” | “Absent” | “Not mentioned”,
“Seminal_Vesicle_Invasion_SVI”: “Present” | “Absent” | “Not mentioned”,
“Perineural_Invasion_Pn”: “Present” | “Absent” | “Not mentioned”
}
EXTRACTION GUIDELINES:
- WHO/ISUP Grade Group: Look for Grade Group, ISUP grade, or equivalent grading
- T-Stage: Focus on pathologic T-stage (pT), primary tumor extent
- N-Stage: Look for nodal involvement status
- Lymph nodes: Count examined nodes and positive nodes separately
- Margins: R0 = negative, R1 = positive, Rx = cannot be assessed
- Gleason: Extract primary, secondary, and tertiary patterns with percentages
- PSA: Use pretreatment serum PSA value, remove units
- Extensions/Invasions: Determine presence or absence of specific invasions
Analyze the following pathology report:
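
Unlike the two GPT-4.1 prompts, this prompt requests a flat JSON object with no evidence or confidence blocks, so a reply can be checked key by key against the 16 allowed option sets. The following is a minimal sketch of such post-processing under stated assumptions: the name `parse_reply` is hypothetical, tolerating text around the JSON object is an assumed precaution, and mapping disallowed values to “Not mentioned” is one possible policy rather than a documented step.

```python
import json

NOT_MENTIONED = "Not mentioned"
PRESENCE = {"Present", "Absent"}
GLEASON = {3, 4, 5}
# Option sets copied from the OUTPUT FORMAT above; `int` and `float`
# stand for the <integer> and <float> placeholders.
ALLOWED = {
    "WHO_ISUP_Grade_Group": {"GG1", "GG2", "GG3", "GG4", "GG5"},
    "T_Stage_TNM": {"pT0", "pT2a", "pT2b", "pT2c", "pT3a", "pT3b", "pT4"},
    "N_Stage_TNM": {"pNx", "pN0", "pN1"},
    "Number_of_Lymph_Nodes_examined": int,
    "Lymph_Nodes_with_Metastasis": int,
    "Resection_Margins": {"R0", "R1", "Rx"},
    "Histologic_Subtype": {"acinar", "ductal", "mixed"},
    "Primary_Gleason_Pattern": GLEASON,
    "Secondary_Gleason_Pattern": GLEASON,
    "Tertiary_Gleason_Pattern": GLEASON,
    "Percentage_of_Secondary_Gleason_Pattern": int,
    "Serum_PSA_ng_per_mL": float,
    "M_Stage_TNM": {"cM0", "cM1", "pM0", "pM1"},
    "Extraprostatic_Extension_EPE": PRESENCE,
    "Seminal_Vesicle_Invasion_SVI": PRESENCE,
    "Perineural_Invasion_Pn": PRESENCE,
}


def parse_reply(reply: str) -> tuple[dict, list[str]]:
    """Hypothetical helper: return (cleaned values, names of rejected fields)."""
    # Tolerate stray text around the object by slicing from the first "{"
    # to the last "}" (an assumed precaution, not part of the prompt).
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in reply")
    data = json.loads(reply[start:end + 1])

    cleaned, rejected = {}, []
    for name, rule in ALLOWED.items():
        value = data.get(name, NOT_MENTIONED)
        is_bool = isinstance(value, bool)
        if value == NOT_MENTIONED:
            ok = True
        elif rule is int:
            ok = isinstance(value, int) and not is_bool
        elif rule is float:
            ok = isinstance(value, (int, float)) and not is_bool
        else:
            ok = isinstance(value, (str, int)) and not is_bool and value in rule
        if not ok:
            rejected.append(name)
        cleaned[name] = value if ok else NOT_MENTIONED
    return cleaned, rejected
```

Returning the rejected field names alongside the cleaned values keeps out-of-vocabulary answers visible instead of silently discarding them.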
Table A1.
Representative examples of GPT-4.1 extraction output with confidence scores and supporting evidence for data element expansion.
| Parameter | Extracted Value | Confidence | Evidence Span |
|---|---|---|---|
| PSA level | 14 | 1.00 | PSA 14 ng/mL. |
| M-stage | cM0 | 0.95 | Staging without metastases. |
| EPE | Present | 1.00 | Right-sided capsular perforation with infiltration of the adjacent fatty/connective tissue. |
| SVI | Absent | 1.00 | No infiltration of the seminal vesicles and the ductus deferens segments on both sides. |
| Perineural invasion | Present | 1.00 | Extensive perineural sheath invasion. |
Table A2.
Representative examples of GPT-4.1 output for annotating external validation datasets.
| Parameter | Extracted Value | Confidence | Evidence Span |
|---|---|---|---|
| EPE | Present | 1.00 | EXTRAPROSTATIC EXTENSION: Yes-Established (>0.8 mm). |
| Histologic Subtype | acinar | 0.95 | TUMOR HISTOLOGY: Adenocarcinoma NOS. |
| Lymph Nodes with Metastasis | 0 | 1.00 | LYMPH NODES POSITIVE: o. |
| M-Stage | pMx | 0.90 | M STAGE, PATHOLOGIC: pMX. |
| N-Stage | pN0 | 1.00 | N STAGE, PATHOLOGIC: pNO |
| Number of Lymph Nodes examined | 11 | 1.00 | LYMPH NODES EXAMINED: 11 |
| Percentage of Secondary Gleason Pattern | 10 | 1.00 | Gleason pattern 5 constitutes approximately 10% of the tissues evaluated. |
| Perineural Invasion | Present | 1.00 | MULTIFOCAL PERINEURAL INVASION BY THE CARCINOMA IS SEEN. |
| Primary Gleason Pattern | 4 | 1.00 | PRIMARY GLEASON GRADE: 4. |
| Resection Margins | R0 | 1.00 | All surgical margins free of tumor. RESIDUAL TUMOR: R0. |
| Secondary Gleason Pattern | 5 | 1.00 | SECONDARY GLEASON GRADE: 5. |
| SVI | Absent | 1.00 | BILATERAL SEMINAL VESICLES, NO EVIDENCE OF CARCINOMA SEEN. |
| PSA | 35 | 1.00 | PSA value: 35. |
| T-Stage | pT3a | 1.00 | T STAGE, PATHOLOGIC: pT3a |
| Tertiary Gleason Pattern | 3 | 1.00 | Gleason pattern 3 constitutes approximately 35% |
| WHO Grade Group | GG5 | 1.00 | Gleason score 4 + 5 a 9 |
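
The Grade Group row in this table can be cross-checked against the primary and secondary Gleason rows using the standard ISUP mapping from Gleason score to Grade Group. The sketch below is illustrative only: the appendix does not describe such a check, and the function name `expected_grade_group` is hypothetical.

```python
def expected_grade_group(primary: int, secondary: int) -> str:
    """Map primary + secondary Gleason patterns to the ISUP Grade Group.

    Standard ISUP mapping: 3+3 -> GG1, 3+4 -> GG2, 4+3 -> GG3,
    score 8 -> GG4, scores 9-10 -> GG5. Only patterns 3, 4, and 5 are
    accepted, matching the options allowed by the prompts above.
    """
    if primary not in (3, 4, 5) or secondary not in (3, 4, 5):
        raise ValueError("Gleason patterns must be 3, 4, or 5")
    score = primary + secondary
    if score == 6:
        return "GG1"
    if score == 7:
        # 3+4 and 4+3 share a score but fall into different Grade Groups.
        return "GG2" if primary == 3 else "GG3"
    if score == 8:
        return "GG4"
    return "GG5"


# Values taken from the table above: primary 4, secondary 5, Grade Group GG5.
row = {
    "Primary_Gleason_Pattern": 4,
    "Secondary_Gleason_Pattern": 5,
    "WHO_ISUP_Grade_Group": "GG5",
}
consistent = expected_grade_group(
    row["Primary_Gleason_Pattern"], row["Secondary_Gleason_Pattern"]
) == row["WHO_ISUP_Grade_Group"]
```

For the row shown, the extracted Grade Group agrees with the mapping, so `consistent` evaluates to `True`.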