1. Introduction
Automatic face detection and recognition critically rely on the accurate localization of facial components such as the eyes, mouth, and nose. While the eyes and mouth are often prioritized due to their distinct appearance, the nose plays an equally vital role in constraining the geometric configuration of the face and improving overall detection reliability. However, nose detection remains challenging due to low contrast boundaries, lighting variations, and interference from adjacent features such as lips and eyes.
Researchers have utilized gradient-based methods, deformable templates, and thresholding schemes to detect nose tips or nostril regions [1,2]. Nevertheless, these methods often fail under non-uniform illumination, occlusions, or complex backgrounds. To address these limitations, we propose a rule-driven nose detection framework that integrates quadratic curve fitting and a calibrated scoring mechanism, unifying geometric, photometric, and structural cues. This framework enhances robustness against confounding features while improving the reliability of nasal base and wing localization.
In this study, we explore an ROI refinement that prioritizes eye/mouth hints (EyeMap/MouthMap) with a Cr channel fallback. A multi-threshold Canny plus binned histogram projection scheme generates robust rectangular nose candidates. A quadratic fitting-based contour taxonomy (nasal base/nostrils/nasal wings) stabilizes base–wing–nostril parsing, and a calibrated geometric–photometric–structural (GPS) scoring with red/black penalties suppresses lip/eye confounders.
The developed method targets the under-explored setting where eye/mouth cues are unreliable or partially missing. Different from landmark-heavy CNN pipelines requiring large annotated datasets, our rule-driven design explicitly enforces nasal geometry under illumination changes with lightweight computation. In addition to methodological novelty, the motivation for robust nose localization arises from a wide range of applications.
In surveillance and security systems, accurate nasal localization improves the reliability of face verification in unconstrained environments. In human–computer interaction, the nose provides a stable anchor point for gesture control and augmented reality alignment when the eyes and mouth are occluded or highly expressive. In medical and diagnostic contexts, nasal contour analysis can assist in non-invasive respiratory monitoring and craniofacial assessment. Compared with the eyes and mouth, the nose remains relatively invariant across different facial expressions, making it an especially useful feature for consistent face alignment. These considerations highlight the importance of a dedicated framework for nose detection.
2. Related Work
Previous studies have explored various approaches to nose detection as part of facial feature localization.
Gradient and thresholding methods
Sankaran et al. proposed gradient-based thresholding for localizing nose tips, using histogram analysis to refine regions below the eyes [1]. While effective in constrained settings, these approaches are sensitive to lighting variations and shadows. Similar gradient-driven nasal localization has also been exploited for pose estimation, underscoring the stabilizing role of nose geometry in face analysis [1].
Template-based approaches
Yin and Basu introduced deformable templates for nostril and nose-side detection [2]. Leaf-shaped and parabola-like templates modeled nostrils and nose wings, combined with energy minimization. Despite high accuracy in controlled environments, the method requires careful initialization and is computationally demanding. Classic deformable templates for the nose (nostrils and wings) further support the importance of explicit shape priors in robust localization [2].
Color and region-growing techniques
Early works employed region growing in the luminance–blue-chrominance–red-chrominance (YCbCr) color space to restrict feature areas [3]. Such methods leverage skin tone uniformity but struggle when the background or lip color resembles skin. Comparative studies indicate that YCbCr often yields more reliable skin separation than hue–saturation–value (HSV) under illumination changes, making it a common choice for facial component preprocessing [4,5]. The developed framework follows this convention and leverages YCbCr to confine the search region before structural parsing [3].
In contrast to these methods, our framework fuses edge-based, curve-based, and scoring-driven modules. By integrating quadratic curve fitting with multi-feature scoring, we achieve stable localization across lighting conditions, occlusions, and diverse facial morphologies.
Beyond traditional rule-based methods, recent research has increasingly turned to machine learning and deep learning for nose detection. Convolutional Neural Networks (CNNs) and Graph Neural Networks have been applied to learn hierarchical representations of facial landmarks. These approaches generally achieve high accuracy when large annotated datasets are available, but they often suffer from high computational cost and reduced robustness in unconstrained environments with varying illumination, occlusion, or partial facial views. Furthermore, most deep models treat the nose merely as one among many landmarks without explicitly enforcing geometric or photometric consistency. This limits their stability in cases where eye or mouth cues are unreliable. In contrast, our framework adopts a rule-driven approach enhanced with curve fitting and calibrated scoring, thereby providing a lightweight yet robust alternative that explicitly exploits nasal geometry and structure.
Beyond these classical pipelines that combine color, edges, and histogram projections, recent works on nostril and key marker detection reaffirm the value of local intensity/structure cues for 2D nasal analysis [6,7]. In contrast to methods that require extensive landmark supervision, our approach emphasizes explicit geometric consistency via curve fitting while preserving the lightweight, rule-driven spirit.
3. Methodology
The framework consists of four main stages. The overall pipeline is illustrated in Figure 1.
3.1. Face Candidate Preprocessing
First, we remove background via skin filtering in the YCbCr color space, consistent with prior findings that chrominance provides robust face/feature isolation [3,4,5]. Then, ellipse validation is performed to eliminate non-facial regions. An ellipse match is accepted only if it passes three threshold checks, the last of which constrains the axis ratio.
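The skin-filtering step can be illustrated with a minimal NumPy sketch. It assumes full-range BT.601 YCbCr conversion and a commonly quoted Cb/Cr skin range; the bounds `CB_RANGE` and `CR_RANGE` and the function names are hypothetical and are not the settings used in this work. The ellipse validation is left out because its thresholds are not stated here.

```python
import numpy as np

# Hypothetical chrominance bounds (a commonly quoted skin range);
# they are NOT the thresholds used in the paper.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)

def rgb_to_ycbcr(rgb):
    """Convert an (H, W, 3) RGB array (0-255) to full-range BT.601 YCbCr."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.stack([y, cb, cr], axis=-1)

def skin_mask(rgb):
    """Boolean mask of pixels whose Cb and Cr fall inside the assumed skin range."""
    ycbcr = rgb_to_ycbcr(rgb)
    cb, cr = ycbcr[..., 1], ycbcr[..., 2]
    return ((cb >= CB_RANGE[0]) & (cb <= CB_RANGE[1]) &
            (cr >= CR_RANGE[0]) & (cr <= CR_RANGE[1]))

# One skin-like pixel and one saturated blue pixel.
img = np.array([[[220, 170, 140], [0, 0, 255]]], dtype=np.uint8)
mask = skin_mask(img)
```

In practice the mask would typically be cleaned with morphological filtering before the ellipse check; that step is omitted here.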
3.2. ROI Extraction
A mid-lower facial ROI is framed between the eyes and above the mouth. If the mouth is detected, this refinement overrides Cr channel trimming. Otherwise, the Cr channel horizontal projection is applied to detect mouth peaks and remove the lower-lip band. We refined the ROI by clamping the vertical range and the horizontal range to the interocular distance as follows.
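where $p$ is a small safety padding to avoid over-cropping the nasal boundary, $y_{\text{mouth}}^{\min}$ and $y_{\text{eye}}^{\max}$ are the minimal and maximal $y$ coordinates of the mouth and eye regions, respectively, and $x_{\text{eye}}^{L}$ and $x_{\text{eye}}^{R}$ are the leftmost and rightmost $x$ coordinates of the eye regions, respectively. With $y$ increasing downward, the ROI height is set to $y_{\text{mouth}}^{\min} - y_{\text{eye}}^{\max} + 2p$ and the ROI width to $x_{\text{eye}}^{R} - x_{\text{eye}}^{L} + 2p$.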
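The clamping can be sketched in a few lines of Python. This is an illustration under stated assumptions: image coordinates with $y$ increasing downward, a padding of $p$ on every side, and clipping to the image bounds. The function name `refine_roi` and its argument names are hypothetical.

```python
def refine_roi(y_eye_max, y_mouth_min, x_eye_l, x_eye_r, p, img_w, img_h):
    """Return (x, y, width, height) of the padded eye-to-mouth ROI, or None."""
    top = max(0, y_eye_max - p)            # just above the lower eye boundary
    bottom = min(img_h, y_mouth_min + p)   # just below the upper mouth boundary
    left = max(0, x_eye_l - p)
    right = min(img_w, x_eye_r + p)
    if bottom <= top or right <= left:
        return None                        # degenerate ROI (inconsistent hints)
    return left, top, right - left, bottom - top

roi = refine_roi(y_eye_max=40, y_mouth_min=90, x_eye_l=30, x_eye_r=80,
                 p=5, img_w=120, img_h=120)
```

Away from the image border, the returned width and height equal the eye span and the eye-to-mouth span plus $2p$.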
When mouth cues are unavailable, we fall back to Cr channel horizontal projection to locate the dominant mouth band and remove it from the lower part of the ROI (see Figure 2). This eye/mouth-guided ROI with a Cr channel projection fallback is consistent with classic EyeMap/MouthMap-style constraints used to stabilize mid-lower facial analysis [3]. Regarding Cr channel projection, we compute a horizontal projection of the Cr channel over the ROI, identify the dominant peak (lip band), and clamp the ROI bottom to a position slightly above that peak, which removes the lower-lip band while keeping a small safety margin (Figure 2).
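A minimal sketch of the Cr projection fallback is given below. Two choices are assumptions made only for illustration: the peak search is restricted to the lower half of the ROI, and the safety margin is `margin=2` rows. The function name is hypothetical.

```python
import numpy as np

def trim_roi_bottom(cr_roi, margin=2):
    """Locate the lip band in a Cr-channel ROI and return (peak_row, new_bottom)."""
    cr = np.asarray(cr_roi, dtype=np.float64)
    profile = cr.mean(axis=1)        # horizontal projection: one value per row
    start = cr.shape[0] // 2         # assumption: lips lie in the lower half
    peak_row = start + int(np.argmax(profile[start:]))
    new_bottom = max(1, peak_row - margin)   # clamp slightly above the peak
    return peak_row, new_bottom

# Synthetic ROI: uniform skin-like Cr with a redder band at rows 15-17.
cr = np.full((20, 10), 120.0)
cr[15:18, :] = 170.0
peak_row, new_bottom = trim_roi_bottom(cr)
```

The ROI would then be cropped to rows `[0, new_bottom)`, which drops the lip band while keeping the margin.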
3.3. Nose-Rectangle Candidate Generation
Given a refined ROI, we generate multiple nose-rectangle candidates through five stages, and then aggregate candidates across threshold scales for downstream contour-based fitting (Figure 3).
Morphological enhancement: We enhance the ridge–valley contrast by applying morphological black-hat and top-hat operations on the Y channel within the ROI, using an elliptical structuring element sized proportionally to the ROI. We then form a feature map from the two responses. Top-hat and black-hat operations are widely used morphological contrast operators for enhancing ridge–valley structures in facial regions, which we adapt here to accentuate the nasal wings and base [7].
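For illustration, the two operators can be written with plain NumPy. This sketch assumes a small disc-like footprint of fixed size `k` (the text sizes it proportionally to the ROI) and edge padding at the borders. Because the exact combination of the two responses is not given here, the sketch returns both separately. All names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def elliptical_footprint(k):
    """Boolean k x k disc-like footprint (k must be odd)."""
    r = k // 2
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    return (xx ** 2 + yy ** 2) <= r ** 2

def _rank_filter(img, footprint, reducer):
    """Apply min or max over the footprint around each pixel (edge-padded)."""
    r = footprint.shape[0] // 2
    padded = np.pad(img, r, mode="edge")
    windows = sliding_window_view(padded, footprint.shape)  # (H, W, k, k)
    return reducer(windows[..., footprint], axis=-1)

def top_and_black_hat(y_channel, k=3):
    """Return (top_hat, black_hat) responses of a grayscale image."""
    img = np.asarray(y_channel, dtype=np.float64)
    fp = elliptical_footprint(k)
    erode = lambda a: _rank_filter(a, fp, np.min)
    dilate = lambda a: _rank_filter(a, fp, np.max)
    opening = dilate(erode(img))
    closing = erode(dilate(img))
    return img - opening, closing - img   # bright details, dark details

img = np.full((9, 9), 10.0)
img[4, 4] = 50.0   # small bright ridge point
img[2, 6] = 0.0    # small dark pit
top_hat, black_hat = top_and_black_hat(img, k=3)
```

The top-hat response isolates small bright structures and the black-hat response isolates small dark ones, which is why the pair is useful for ridge–valley contrast.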
Multi-threshold Canny (adaptive thresholds): We normalize the feature map, estimate the Canny thresholds via multi-level Otsu on the gradient magnitude, and scale them by three scale factors to produce three edge maps using L2-gradient Canny. Each scale is processed independently. Canny remains a principled edge detector under noise, while multi-threshold aggregation improves the recall of weak nasal boundaries without sacrificing specificity [8]. The Otsu-derived thresholds yield data-adaptive operating points under heterogeneous lighting.
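The threshold derivation can be sketched as follows. For brevity, this example swaps in single-level Otsu for the multi-level variant named above, and both the scale factors in `SCALES` and the low-to-high ratio `LOW_TO_HIGH` are hypothetical values chosen only for illustration. The edge detector itself is not reimplemented here.

```python
import numpy as np

SCALES = (0.5, 1.0, 1.5)   # hypothetical scale factors, not the paper's values
LOW_TO_HIGH = 0.5          # hypothetical ratio between low and high thresholds

def otsu_threshold(values, bins=256):
    """Single-level Otsu threshold for values in [0, 255]."""
    hist, edges = np.histogram(values, bins=bins, range=(0, 255))
    prob = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(prob)               # weight of the class at or below each bin
    w1 = 1.0 - w0
    mu = np.cumsum(prob * centers)     # cumulative first moment
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros_like(w0)
    between[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return float(centers[np.argmax(between)])

def canny_threshold_pairs(grad_mag):
    """Return one (low, high) threshold pair per scale."""
    high = otsu_threshold(np.ravel(grad_mag))
    return [(s * LOW_TO_HIGH * high, s * high) for s in SCALES]

# Bimodal gradient magnitudes: weak texture versus strong edges.
grad = np.array([20.0] * 50 + [200.0] * 50)
pairs = canny_threshold_pairs(grad)
```

Each pair could then be passed to an edge detector such as OpenCV's `cv2.Canny(image, low, high, L2gradient=True)`, one call per scale.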
Binned histogram projection: From each edge map, we compute vertical and horizontal binned projections (see Figure 3a,b) using fixed bin widths in pixels. The vertical projection sums edge pixels column-wise within each vertical bin; the horizontal projection sums row-wise within each horizontal bin.
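The binned projections can be sketched with NumPy as below. The bin sizes `bin_w` and `bin_h` are free parameters in this sketch (the values used in this work are not reproduced here), and the function name is hypothetical.

```python
import numpy as np

def binned_projections(edge_map, bin_w, bin_h):
    """Return (vertical, horizontal) binned edge counts of a binary edge map."""
    edges = (np.asarray(edge_map) > 0).astype(np.int64)
    h, w = edges.shape
    col_sums = edges.sum(axis=0)   # edge pixels per column
    row_sums = edges.sum(axis=1)   # edge pixels per row
    # np.add.reduceat sums each run of entries starting at the given indices.
    vertical = np.add.reduceat(col_sums, np.arange(0, w, bin_w))
    horizontal = np.add.reduceat(row_sums, np.arange(0, h, bin_h))
    return vertical, horizontal

# Toy edge map: one vertical stroke (column 1) and one horizontal stroke (row 3).
edge_map = np.zeros((4, 6), dtype=np.uint8)
edge_map[:, 1] = 255
edge_map[3, :] = 255
vertical, horizontal = binned_projections(edge_map, bin_w=2, bin_h=2)
```

Binning trades positional precision for robustness: a slightly slanted or broken edge still accumulates in one bin rather than spreading across several columns.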
Adaptive peak picking (with guarantees): We select peaks on both projections using a descending percentile threshold, enforcing a minimum inter-peak distance (in bins) along the vertical and horizontal directions. This suppresses spurious responses while preserving the dominant structures in each projection. If peaks are still insufficient, we apply a top-$k$ fallback to guarantee at least two vertical peaks and one horizontal peak (endpoints may be admitted when necessary).
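One way to realize this selection is sketched below. The percentile schedule `(90, 75, 50)` and the default `min_dist` are hypothetical, and for simplicity every sufficiently strong bin counts as a peak candidate rather than only strict local maxima.

```python
import numpy as np

def pick_peaks(proj, min_peaks, min_dist=1, percentiles=(90, 75, 50)):
    """Pick bins above a descending percentile threshold, spaced by min_dist."""
    proj = np.asarray(proj, dtype=np.float64)
    order = np.argsort(-proj, kind="stable")       # strongest bins first
    for q in percentiles:
        thr = np.percentile(proj, q)
        peaks = []
        for i in order:
            if proj[i] < thr or proj[i] <= 0:
                continue
            if all(abs(int(i) - j) >= min_dist for j in peaks):
                peaks.append(int(i))
        if len(peaks) >= min_peaks:
            return sorted(peaks)
    # Top-k fallback: guarantee min_peaks bins even if they are weak or endpoints.
    return sorted(int(i) for i in order[:min_peaks])

strong = pick_peaks([0, 5, 1, 0, 9, 0, 2, 8, 0, 0], min_peaks=2, min_dist=2)
sparse = pick_peaks([0, 0, 3, 0], min_peaks=2)
```

In the first call the 90th percentile admits only one bin, so the threshold is relaxed to the 75th percentile; in the second call no percentile yields two peaks and the fallback admits an endpoint.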
Peak-paired rectangles proposal: Keep the top-4 vertical and top-2 horizontal peaks (by strength). For every vertical pair and each horizontal peak, form a rectangle with left/right edges padded one vertical bin beyond the two peaks, vertically centered on the horizontal peak, and with a fixed height of ~60% of the ROI; clamp to ROI bounds. Discard any degenerate rectangles.
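The pairing step can be sketched in plain Python. The sketch assumes peaks are given as `(bin_index, strength)` tuples and that bin `i` covers pixels `[i * bin, (i + 1) * bin)`; these conventions and the function name are assumptions made for illustration.

```python
from itertools import combinations

def propose_rectangles(v_peaks, h_peaks, bin_w, bin_h, roi_w, roi_h,
                       height_ratio=0.6):
    """Return (x, y, width, height) rectangles from vertical/horizontal peaks."""
    top_v = sorted(v_peaks, key=lambda p: p[1], reverse=True)[:4]
    top_h = sorted(h_peaks, key=lambda p: p[1], reverse=True)[:2]
    rect_h = height_ratio * roi_h          # fixed height, ~60% of the ROI
    rects = []
    for (i, _), (j, _) in combinations(top_v, 2):
        lo, hi = min(i, j), max(i, j)
        left = max(0, (lo - 1) * bin_w)        # one bin beyond the left peak
        right = min(roi_w, (hi + 2) * bin_w)   # one bin beyond the right peak
        for k, _ in top_h:
            cy = (k + 0.5) * bin_h             # centre of the horizontal peak bin
            top = max(0.0, cy - rect_h / 2)
            bottom = min(float(roi_h), cy + rect_h / 2)
            if right - left > 0 and bottom - top > 0:   # drop degenerate boxes
                rects.append((left, top, right - left, bottom - top))
    return rects

rects = propose_rectangles([(2, 9.0), (6, 7.0)], [(5, 4.0)],
                           bin_w=10, bin_h=10, roi_w=100, roi_h=100)
```

With four vertical and two horizontal peaks, this yields at most 6 × 2 = 12 rectangles per edge map.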
3.4. Quadratic Curve Fitting and Rule-Based Classification
Regarding the fitting form and quality, for each connected edge set inside a rectangle, we choose the fitting form by its dominant orientation: we use $y = ax^2 + bx + c$ if the contour is predominantly horizontal; otherwise, we use $x = ay^2 + by + c$. Quadratic parameters are estimated by least squares. We accept a fit only if its root-mean-square error (RMSE) is below a threshold and its span coverage exceeds a minimum ratio; very short contours are rejected.
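A minimal sketch of the fit and its quality gates is shown below. Several details are assumptions: the orientation test compares the x and y extents of the contour, span coverage is taken as the contour extent divided by the rectangle extent along the fitting axis, and the gate values `max_rmse`, `min_coverage`, and `min_points` are hypothetical.

```python
import numpy as np

def fit_quadratic(points, rect_w, rect_h,
                  max_rmse=1.5, min_coverage=0.3, min_points=5):
    """Least-squares quadratic fit with RMSE and span-coverage gates."""
    pts = np.asarray(points, dtype=np.float64)
    if len(pts) < min_points:
        return None                               # very short contour
    x, y = pts[:, 0], pts[:, 1]
    horizontal = np.ptp(x) >= np.ptp(y)           # dominant orientation
    t, v, extent = (x, y, rect_w) if horizontal else (y, x, rect_h)
    a, b, c = np.polyfit(t, v, 2)
    residual = np.polyval([a, b, c], t) - v
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    coverage = float(np.ptp(t) / extent)
    if rmse > max_rmse or coverage < min_coverage:
        return None
    return {"form": "y(x)" if horizontal else "x(y)",
            "coef": (float(a), float(b), float(c)),
            "rmse": rmse, "coverage": coverage}

xs = np.arange(-5, 6)
contour = [(float(xv), 0.1 * xv ** 2 + 1.0) for xv in xs]
fit = fit_quadratic(contour, rect_w=20, rect_h=20)
too_short = fit_quadratic(contour[:3], rect_w=20, rect_h=20)
narrow = fit_quadratic(contour, rect_w=100, rect_h=20)
```

The sign of the leading coefficient `a` is what sign-based classification rules would inspect to separate upward-opening from downward-opening contours.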
Regarding shape classification rules, we apply the following quadratic sign rules.
3.5. Nose Feature Extraction and Tip Point Proposal
Regarding contour selection and feature extraction, we first extract significant contours within each nose rectangle.
Nasal base (U): Choose the longest horizontally oriented U-shape in the lower half of the rectangle; record its vertex (symmetry anchor).
Nostrils (N): Collect up to two non-overlapping N-shapes above the base. If only one N-shape is found, assign its side by comparing the left/right endpoint heights (left end higher → left nostril).
Nasal wings (C): Search from the side boundaries inward and accept one left and one right C-shape, above the base and lateral to the midline.
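From the selected N-shapes, compute the nostril midpoint: take one representative point for the left nostril and one for the right nostril; if both exist, the nostril midpoint is the midpoint of the two points; if only one exists, it equals that point.
Given the available anchors (the vertex from the base and the midpoint from the nostrils), the tip is proposed by the following decision path (per rectangle) (Figure 4).
Both the base vertex and the nostril midpoint present: Draw the vertical line through the base vertex (symmetry axis) and horizontally project the nostril midpoint onto this line; the intersection is the tip candidate.
Only the base vertex present: Set the tip's horizontal coordinate to that of the vertex and its vertical coordinate to the highest point on the base.
Only the nostril midpoint present: Set the tip equal to the nostril midpoint.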
Neither present: Apply brightness fallback—if both nasal wings are available, take the brightest pixel between the nasal wings. Otherwise, take the brightest pixel in the rectangle.
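The decision path above can be sketched as a small function. Points are `(x, y)` tuples; the base-only branch reflects one reading of the rule (tip x from the vertex, tip y from the highest base point), and the brightest-pixel location is assumed to be computed elsewhere. All names are hypothetical.

```python
def nostril_midpoint(left=None, right=None):
    """Midpoint of the available nostril points, or None if neither exists."""
    pts = [p for p in (left, right) if p is not None]
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def propose_tip(base_vertex=None, nostril_mid=None, base_top_y=None,
                brightest=None):
    """Return (tip, route) following the per-rectangle decision path."""
    if base_vertex is not None and nostril_mid is not None:
        # Project the nostril midpoint onto the vertical axis through the vertex.
        return (base_vertex[0], nostril_mid[1]), "base+nostrils"
    if base_vertex is not None:
        y = base_top_y if base_top_y is not None else base_vertex[1]
        return (base_vertex[0], y), "base-only"
    if nostril_mid is not None:
        return nostril_mid, "nostrils-only"
    return brightest, "brightness-fallback"   # between wings or whole rectangle

mid = nostril_midpoint(left=(10, 20), right=(30, 24))
tip, route = propose_tip(base_vertex=(21, 40), nostril_mid=mid)
```

Returning the route alongside the tip makes it straightforward for a later scoring stage to down-weight candidates that came from a fallback branch.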
As illustrated in Figure 5, we visualize the fitted contours together with the selected anchors (the base vertex and the nostril midpoint) and the resulting tip candidate; the right panel shows the corresponding facial image with contour and tip annotations, where both nasal wings, the nasal base, and the nostrils are correctly identified, yielding an accurate tip localization. For each rectangle, we output the nasal base and its vertex, the nostrils and their midpoint, the nasal wings, and the proposed tip. These are forwarded to the scoring/decision stage, where all candidates compete, and the distance to the local brightness peak is incorporated as a score term.
3.6. Geometric–Photometric–Structural Scoring
Scoring function: A calibrated scoring function fuses multiple cues for each rectangle-level tip candidate:
Geometric: left–right symmetry about the face/ROI midline; plausible tip height relative to the base; consistency between nostrils and wings (nostrils above the base and inside the wings; base spanning the two wings).
Photometric: local Y/Cb/Cr contrasts that favor a bright tip ridge with darker surroundings; wing inner–outer contrast; “dark-inside/lighter-outside” pattern around nostrils.
Structural: edge density and continuity in nasal-wing regions; stability and coverage of fitted parts (base/wing/nostril) when available.
Brightness proximity term: The distance between the proposed tip and the local brightest pixel inside the rectangle (or between the wings) is converted to a soft bonus, encouraging tips that coincide with the nasal highlight while still allowing geometry to dominate when specularities are misleading.
Fallback awareness: Candidates derived from fallback routes are retained but down-weighted via confidence factors; they can still win if supported by strong photometric/structural evidence.
Penalties: Red/black penalties reduce false positives from lips (high Cr) and from eye/nostril dark patches outside the plausible nasal zone. Penalties are applied softly so they do not overrule consistent geometry.
Aggregation and selection: All candidates from multiple thresholds and rectangles are jointly ranked by the final normalized score. The top-scoring tip is selected; ties are broken by better symmetry and shorter brightness distance. The chosen tip and its associated parts are then passed to the output stage.
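The fusion and selection logic can be sketched as follows. The numeric weights, the penalty weight, and the form of the brightness bonus are hypothetical; only the ordering (geometry weighted above photometric and structural cues), the soft penalties, the confidence down-weighting, and the tie-break order are taken from the description. Each cue is assumed to be pre-normalized to [0, 1].

```python
from dataclasses import dataclass

# Hypothetical weights; the description only implies geometry > photometric > structural.
W_GEO, W_PHOTO, W_STRUCT, W_BRIGHT = 0.4, 0.3, 0.2, 0.1

@dataclass
class Candidate:
    name: str
    geometric: float          # symmetry/height/consistency cue in [0, 1]
    photometric: float        # contrast cue in [0, 1]
    structural: float         # edge density/fit stability cue in [0, 1]
    symmetry: float           # used only for tie-breaking
    brightness_dist: float    # pixels from the tip to the local brightest pixel
    red_penalty: float = 0.0  # lip-like (high Cr) evidence
    black_penalty: float = 0.0
    confidence: float = 1.0   # < 1 for candidates from fallback routes

def gps_score(c, dist_scale=10.0, penalty_weight=0.2):
    bonus = 1.0 / (1.0 + c.brightness_dist / dist_scale)   # soft proximity bonus
    raw = (W_GEO * c.geometric + W_PHOTO * c.photometric
           + W_STRUCT * c.structural + W_BRIGHT * bonus)
    raw -= penalty_weight * (c.red_penalty + c.black_penalty)  # soft penalties
    return c.confidence * max(raw, 0.0)

def select_tip(cands):
    # Ties are broken by better symmetry, then by shorter brightness distance.
    return max(cands, key=lambda c: (gps_score(c), c.symmetry, -c.brightness_dist))

a = Candidate("a", 0.9, 0.7, 0.6, symmetry=0.90, brightness_dist=2.0)
b = Candidate("b", 0.9, 0.8, 0.7, symmetry=0.90, brightness_dist=2.0, red_penalty=1.0)
d = Candidate("d", 0.9, 0.7, 0.6, symmetry=0.95, brightness_dist=2.0)
f = Candidate("f", 0.9, 0.7, 0.6, symmetry=0.90, brightness_dist=2.0, confidence=0.5)
```

Here candidate `b` has slightly stronger photometric and structural cues than `a`, but its lip-like red penalty lowers its score, and `d` beats `a` only through the symmetry tie-break.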
3.7. Robustness-Oriented Engineering Choices
To improve the robustness of the pipeline, we add a few engineering choices that do not change the algorithm’s flow.
Across-scale aggregation: For the multi-threshold Canny, we run three threshold scales, collect rectangles/contours from each scale, and then merge them. For every item, we compute a simple support score: the number of scales in which a roughly overlapping instance of the same item appears. We rank by this support: high-support items stay on top, while low-support ones are kept but given less weight instead of being dropped.
Curve-fit quality gates (lightweight): We fit quadratics by least squares and keep the same two easy checks of whether the RMSE is below a threshold and whether there is enough span coverage. We do not add new thresholds. We also cache each fit’s residual as a confidence value that later stages can reuse.
Scoring weights and tie-breaks: In the GPS score, geometric symmetry receives the highest weight, followed by photometric contrasts and structural continuity; ties are broken by better symmetry and shorter brightness distance.
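The across-scale support score can be sketched as below. "Roughly overlapping" is interpreted here as an intersection-over-union (IoU) of at least `iou_thr=0.5`; both the overlap measure and the threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, width, height) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw, bx + bw) - max(ax, bx)
    ih = min(ay + ah, by + bh) - max(ay, by)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    return inter / (aw * ah + bw * bh - inter)

def support_ranking(rects_per_scale, iou_thr=0.5):
    """Rank every rectangle by the number of scales containing an overlapping one."""
    items = [(s, r) for s, rects in enumerate(rects_per_scale) for r in rects]
    ranked = []
    for _, r in items:
        scales = {s2 for s2, r2 in items if iou(r, r2) >= iou_thr}
        ranked.append((len(scales), r))
    ranked.sort(key=lambda t: -t[0])   # stable sort: low-support items are kept
    return ranked

rects_per_scale = [
    [(10, 10, 20, 20)],
    [(11, 10, 20, 20), (60, 60, 10, 10)],
    [(10, 11, 20, 20)],
]
ranked = support_ranking(rects_per_scale)
```

The isolated rectangle at `(60, 60)` is found at one scale only, so it sinks to the bottom of the ranking but is not discarded.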
4. Experiments
We evaluate the proposed nose tip localization pipeline on a test set of 77 face images spanning diverse illumination and head poses. To assess module contributions, we use single-factor ablation experiments targeting four key components. All methods are evaluated on the same 77 images. Unless otherwise stated, we keep the exact same pre-/post-processing, random seeds, and ordering. Each ablation toggles one component while keeping the rest identical to the full model to ensure fair comparison. We calculate the following metrics: normalized error as a fraction of the image diagonal, mean, RMSE, median, 25th/75th/90th percentiles, and PCK at 1, 2, 3, and 5%. We focus on effect sizes rather than statistical significance.
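The metrics can be computed as in the following sketch. It assumes predicted and ground-truth tips are given as `(x, y)` pixel coordinates, that image sizes are `(width, height)`, and that PCK counts a prediction as correct when its normalized error is at most the threshold. The function name and dictionary keys are hypothetical.

```python
import numpy as np

def localization_metrics(pred, gt, img_sizes,
                         pck_levels=(0.01, 0.02, 0.03, 0.05)):
    """Normalized-error statistics and PCK for nose tip predictions."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    sizes = np.asarray(img_sizes, dtype=np.float64)
    diag = np.hypot(sizes[:, 0], sizes[:, 1])            # image diagonal
    err = np.linalg.norm(pred - gt, axis=1) / diag       # normalized error
    out = {"mean": float(err.mean()),
           "rmse": float(np.sqrt(np.mean(err ** 2))),
           "median": float(np.median(err))}
    for q in (25, 75, 90):
        out[f"p{q}"] = float(np.percentile(err, q))
    for t in pck_levels:
        out[f"pck@{t:.0%}"] = float(np.mean(err <= t))   # fraction within t
    return out

metrics = localization_metrics(pred=[(3, 4), (60, 80)],
                               gt=[(0, 0), (0, 0)],
                               img_sizes=[(600, 800), (600, 800)])
```

In this toy case the two normalized errors are 0.005 and 0.1, so half of the predictions fall within every PCK threshold up to 5%.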
4.1. Quantitative Results
Results for the full algorithm, including the ROI with eye/mouth hints (with Cr channel fallback), morphology-enhanced edges, multi-threshold Canny, multi-rectangle pairing, quadratic fits + proposal, and GPS scoring, are shown in Table 1 and Table 2.
In the ablation study, removing the Eye/Mouth Hints (ROI guidance) produces the most severe deterioration, notably widening the error tails and causing a double-digit drop at coarse PCK thresholds. Typical failures include ROIs drifting into lip or eye regions, which propagates noise to subsequent stages (Figure 6).
Single-scale Canny denotes the use of a single threshold scale (th_scale = 1.0) instead of multi-threshold aggregation. CurveFit (Centroid) refers to the removal of quadratic fitting and proposal rules, where the nose tip within each candidate rectangle is defined as the centroid of Canny edges (falling back to the brightest pixel only if no edges are present). GPS (Center+Brightness) replaces GPS (geometric–photometric–structural scoring) with a weaker score combining a Gaussian center prior and tip brightness, i.e., 0.7 × center + 0.3 × brightness.
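The weaker Center+Brightness score can be sketched as follows. The 0.7/0.3 weights come from the description above; the width of the Gaussian prior (`sigma_frac`, a fraction of the rectangle size) and the assumption that brightness is pre-normalized to [0, 1] are hypothetical.

```python
import math

def center_brightness_score(tip, rect, brightness, sigma_frac=0.25):
    """0.7 x Gaussian center prior + 0.3 x normalized tip brightness."""
    x, y, w, h = rect
    cx, cy = x + w / 2.0, y + h / 2.0
    dx = (tip[0] - cx) / (sigma_frac * w)
    dy = (tip[1] - cy) / (sigma_frac * h)
    center = math.exp(-0.5 * (dx * dx + dy * dy))   # 1.0 at the rectangle center
    return 0.7 * center + 0.3 * brightness

centered = center_brightness_score((20, 20), (0, 0, 40, 40), brightness=1.0)
corner = center_brightness_score((0, 0), (0, 0, 40, 40), brightness=1.0)
```

Because this score ignores symmetry and part consistency, a bright, centrally located distractor can outrank a geometrically well-supported tip.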
4.2. Discussion
The ablations are interpreted along the pipeline: upstream modules define a clean search space, while downstream modules refine and select candidates. The observed shifts in distribution percentiles and PCK align with this cascade. Using a single threshold reduces candidate recall under illumination and texture variation. This degrades the middle and tail of the distribution and lowers the PCK at moderate thresholds, since a single sensitivity discards numerous legitimate edges. Replacing fitted geometry with edge centroids weakens PCK in the midrange. Centroids tend to bias toward dense wing edges, whereas quadratic fits enforce base/wing/nostril structure, yielding inherently stronger tip proposals. Employing the weaker prior primarily affects the final ranking when multiple plausible tips exist. Although central tendency changes are small, selection becomes less consistent in challenging scenes (e.g., specular highlights or nearby bright distractors).
A recurring failure mode arises when skin filtering does not precisely isolate the face, expanding the ROI beyond the mid-lower facial zone. As shown in Figure 6b, this admits lip, cheek, and background textures; histogram peaks then align with non-nasal structures, curve fitting latches onto spurious contours, and final scoring must arbitrate among poorer candidates. Percentile trends (especially P75 and P90) and PCK consistently favor the full pipeline. The RMSE exhibits small fluctuations driven by a few difficult cases; therefore, interpretation relies primarily on percentiles (distribution shape) and PCK (task accuracy).
4.3. Summary
The full pipeline yields the most accurate and stable nose tip localization across the benchmark. Single-factor ablations confirm the upstream-to-downstream cascade. Removing Eye/Mouth Hints produces the greatest degradation, as precise ROI guidance is essential for all subsequent stages. Collapsing to a Single-Scale Canny configuration reduces robustness under varying illumination and shrinks the set of valid candidates. Disabling Quadratic Curve Fitting eliminates structured geometric constraints and replaces them with raw edge centroids, while substituting GPS with a center-plus-brightness prior primarily affects the final ranking. Overall, the upstream modules establish a clean search space, and the downstream modules consolidate and select among the most reliable candidates.
5. Conclusions
We introduced a lightweight, interpretable framework for nose tip localization that couples quadratic curve fitting with a calibrated geometric–photometric–structural (GPS) scoring. The pipeline integrates YCbCr-based preprocessing, eye/mouth hints for ROI guidance, multi-threshold Canny with binned projections for candidate generation, and rule-driven contour parsing to propose anatomically consistent tips. The ablation results substantiate the design: precise ROI guidance and multi-threshold edge evidence deliver the largest accuracy gains, while curve fitting and GPS scoring further stabilize the geometry and resolve close candidates. The full model consistently tightens the error dispersion (lower P75/P90) and improves the PCK at practical thresholds, supporting deployment in illumination- and pose-diverse settings where landmark detectors may be unreliable or unavailable. This rule-driven approach is data-efficient and complementary to learning-based systems. Future work will explore self-calibrated thresholds and adaptive priors for different sensors and skin tones.
Author Contributions
Conceptualization, Y.-C.C.; methodology, Y.-C.C. and S.-C.K.; software, Y.-C.C. and S.-C.K.; validation, Y.-C.C.; formal analysis, Y.-C.C. and S.-C.K.; investigation, Y.-C.C. and S.-C.K.; resources, Y.-C.C.; data curation, Y.-C.C.; writing—original draft preparation, Y.-C.C. and S.-C.K.; writing—review and editing, Y.-C.C., S.-C.K. and J.-J.D.; visualization, Y.-C.C. and S.-C.K.; supervision, J.-J.D.; project administration, J.-J.D.; funding acquisition, J.-J.D. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Science and Technology Council, Taiwan, under the contract of NSTC 114-2221-E-002-122-MY2.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
No new data were created.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Sankaran, P.; Gundimada, S.; Tompkins, R.C.; Asari, V.K. Pose angle determination by face, eyes and nose localization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, San Diego, CA, USA, 21–23 September 2005; p. 161.
2. Yin, L.; Basu, A. Nose shape estimation and tracking for model-based coding. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA, 7–11 May 2001; Volume 3, pp. 1477–1480.
3. Hsu, R.L.; Abdel-Mottaleb, M.; Jain, A.K. Face detection in color images. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 696–706.
4. Shaik, K.B.; Ganesan, P.; Kalist, V.; Sathish, B.S.; Jenitha, J.M.M. Comparative study of skin color detection and segmentation in HSV and YCbCr color space. Procedia Comput. Sci. 2015, 57, 41–48.
5. Liu, Q.; Peng, G.Z. A robust skin color based face detection algorithm. In Proceedings of the International Asia Conference on Informatics in Control, Automation and Robotics, Wuhan, China, 6–7 March 2010; pp. 525–528.
6. Charoenjai, K.; Kusakunniran, W.; Thaipisutikul, T.; Yodrabum, N.; Chaikangwan, I. Automatic detection of nostril and key markers in images. Intell. Syst. Appl. 2024, 21, 200327.
7. Beigzadeh, M.; Vafadoost, M. Detection of face and facial features in digital images and video frames. In Proceedings of the Cairo International Biomedical Engineering Conference, Cairo, Egypt, 18–20 December 2008; pp. 1–4.
8. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, 8, 679–698.