1. Introduction
In a typical research process, conducting a sample survey is often the initial step because it is a fundamental method for gathering data. Surveys may be conducted virtually or in person, arranged by individuals or groups, and administered through personal, telephone, postal, or web-based methods, which are the most common modes. A questionnaire is one of the research instruments used across various disciplines, including education, medicine, economics, labor studies, industry studies, business studies, animal studies, and environmental studies. Surveys may be ineffective if flawed data collection methods are used; the methods of data collection must therefore be accurate to yield valid conclusions. Consequently, developing an action plan based on strategic data collection approaches can significantly increase the quality of research and the accuracy of results. Utilizing auxiliary variables closely related to the primary research variable can enhance the model and yield more valid and stable results [1,2,3].
Sampling theory is distinct from other statistical theories in that it relies on supplementary information to maximize estimation accuracy. Auxiliary information is typically used at different stages of a sampling study, including stratification and the determination of selection probabilities, to improve the estimation of population parameters. Generally, the objective of a survey is to estimate the population total or mean. Researchers have suggested several techniques that use auxiliary information to improve the accuracy of population mean estimation; see [4,5,6,7]. When the supplementary and primary variables are directly related, regression and ratio estimators are particularly notable for estimating the population mean.
Nevertheless, classical techniques tend to become less accurate when the data set contains extreme values or outliers. To address these shortcomings, more robust ratio and regression-based techniques have been suggested for mean estimation, and many sophisticated methods that specifically account for extreme values have been designed. For example, Oral and Oral [8] address the deficiencies of the classical ratio estimator and propose a new one built on modified maximum likelihood (MML) and order statistics. Their estimator is robust and efficient in non-normal cases. Abid et al. [9] explored estimating the population mean by employing unconventional location parameters. Zaman [10,11] and Bulut and Zaman [12] introduced a class of robust ratio estimators by synthesizing existing ratio estimators. Ref. [13] and Ibrahim et al. [14] consider the problem of outliers in ratio estimators and employ robust regression methods such as LTS, LMS, and Huber estimation. Their results indicate that, under data contamination, estimation accuracy improves substantially compared to conventional approaches. In addition, Zaman and Bulut [15] proposed a new double-sampling estimator based on robust regression methods. Ref. [16] suggested ratio-based estimators that employ quantile regression in place of conventional regression estimators. Ali et al. [17] obtained estimators through a robust regression technique that is particularly appropriate for sensitive surveys conducted under simple random sampling. More recently, Koc [18] proposed using the Poisson regression coefficient to estimate means in the specific context of count data. That study is distinctive because it focuses on count data and uses Poisson regression to construct ratio-based mean estimators.
Count data modeling and mean estimation are basic components of applied statistical analysis, with direct applications in domains such as epidemiology, ecology, economics, and industrial engineering. The Poisson regression model has historically been the statistical workhorse for count outcomes because it is mathematically and analytically simple and because it links the mean of the response variable to its variance in a straightforward way (Cameron and Trivedi [19]). That is one of the reasons why Koc [18] proposed using the Poisson regression coefficient in mean estimation. Many subsequent studies extended this work by defining mean estimators based on the Poisson regression coefficient. For example, Koc et al. [20] extended it to double sampling. Shahzad et al. [21] developed a Poisson regression-based estimator of the mean suited to count data under simple random sampling. Their approach highlights the importance of matching the estimation method to the distribution of the underlying data. Wani et al. [22] proposed an enhanced version of the Koc [18] estimator by incorporating power components from Shahzad et al. [21]. Raghav et al. [23] and Alomair and Shahzad [24] also extended this idea to Poisson and Tweedie–Poisson regression-based mean estimators under probability sampling designs, with the aim of improving estimation accuracy for count data.
However, the assumptions of the Poisson distribution do not always hold in the real world: the model requires the variance to equal the mean, whereas real data sets are often overdispersed, with a variance that exceeds the mean. This violation may cause the standard errors to be underestimated and lead to inaccurate inference. To overcome these weaknesses, it is necessary to apply a regression model that can accommodate the extra dispersion. The Poisson-inverse Gaussian (P-IG) regression model, a mixture of the Poisson and inverse Gaussian distributions, is a useful alternative. The model retains the simple interpretation of the classical Poisson model and introduces a dispersion parameter to account for extra-Poisson variability; hence, it is particularly effective in fitting overdispersed counts (Sellers and Shmueli [25]; Farghali et al. [26]). The principal goal of the current research is to formulate mean estimators on the basis of the P-IG regression coefficient, which are more suitable than the classical Poisson-based estimators under overdispersion.
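The mixture structure described above can be illustrated with a small simulation. The following is only a sketch under assumed values: the latent mean `mean_rate` and the inverse Gaussian shape `shape` are hypothetical choices, and the parameterization is NumPy's (`Generator.wald(mean, scale)`), which need not match the one used for the P-IG regression model in this paper. Under this parameterization the latent rate has variance `mean_rate**3 / shape`, so the mixed counts have variance `mean_rate + mean_rate**3 / shape`, which exceeds the mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
mean_rate = 3.0   # hypothetical mean of the latent Poisson rate
shape = 2.0       # hypothetical inverse Gaussian shape (NumPy calls it "scale")

# Step 1: draw latent rates from an inverse Gaussian (Wald) distribution.
rates = rng.wald(mean_rate, shape, size=n)
# Step 2: draw counts from a Poisson distribution with those rates (the mixture).
pig_counts = rng.poisson(rates)
# Reference: plain Poisson counts with the same mean.
pois_counts = rng.poisson(mean_rate, size=n)

def dispersion_index(counts):
    """Sample variance divided by sample mean (close to 1 for Poisson data)."""
    return counts.var(ddof=1) / counts.mean()

print("Poisson dispersion index:", dispersion_index(pois_counts))
print("P-IG mixture dispersion index:", dispersion_index(pig_counts))
print("Theoretical mixture variance:", mean_rate + mean_rate**3 / shape)
```

A dispersion index well above 1 for the mixed counts, against a value near 1 for the plain Poisson counts, is the overdispersion that motivates the P-IG model.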
As far as we are aware, no prior research has investigated mean estimation in sampling theory using the P-IG regression coefficient. The present paper addresses this gap: mean estimators based on P-IG regression are derived and tested, and their performance is compared with that of Poisson regression-based mean estimators. These estimators rely on auxiliary information, i.e., extra covariates or known quantities that enhance the efficiency and accuracy of statistical estimators (Särndal et al. [27]; Rao [28]). The strategic use of auxiliary variables is a well-established procedure in survey sampling and small-area estimation; however, the incorporation of auxiliary variables into more sophisticated regression models for count data has received relatively little attention. This aspect is also explored in the current article.
The remainder of the paper is organized into four sections that cover the development and application of robust mean estimators based on the P-IG regression coefficient. In Section 2, the background of count regression is reviewed, and the existing estimators developed by Koc [18] based on Poisson regression coefficients are explained. Section 3 presents the theory of P-IG regression, its advantages over its classical counterparts, and the construction of the new family of mean estimators, which gain efficiency by utilizing auxiliary variable information. Section 4 compares the proposed and classical estimators in both simulation and empirical settings, using three real-world populations of mussel data and three artificial ones generated from the P-IG distribution. Section 4.3 summarizes the findings, which favor the proposed estimators. Section 5 concludes the study.
2. Poisson Regression for Count Data
To appreciate the usefulness of traditional models and the importance of more adaptive ones, it is necessary to understand the main principles of count data regression. Count data are non-negative integer observations that arise in a wide variety of applied situations, such as counts of hospital visits, insurance claims, and ecological events. The Poisson regression model has traditionally been the preferred choice for modelling such data because it is simple and has a clear interpretive framework. It assumes that the response variable is Poisson distributed, with a mean that is linked to the explanatory variables in a log-linear fashion [19,29].
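One of the most popular statistical methods for analyzing count data is the Poisson regression model. The method can be used when the dependent variable $y_i$, for $i = 1, 2, \ldots, n$, represents the frequency of events observed within a fixed duration or spatial domain. Here, $y_i$ follows a Poisson distribution with

$$P(y_i) = \frac{e^{-\mu_i} \mu_i^{y_i}}{y_i!}, \quad y_i = 0, 1, 2, \ldots,$$

where $\mu_i > 0$ is the mean of $y_i$. The corresponding log-likelihood of the set of observations is developed in the following way:

$$\ell(\beta) = \sum_{i=1}^{n} \left[ y_i \log \mu_i - \mu_i - \log (y_i!) \right].$$

Suppose that $X$ is a predictor matrix of dimension $n \times (p+1)$, and $x_i$ is the $i$th row of $X$. The model relates $\mu_i$ to the predictors through the logarithmic link function:

$$\log(\mu_i) = x_i^{\top} \beta,$$

where $\beta$ signifies the vector of regression coefficients.
To estimate $\beta$, the log-likelihood is differentiated with respect to the coefficient vector and equated to zero. Maximizing the likelihood leads to the following:

$$\frac{\partial \ell(\beta)}{\partial \beta} = \sum_{i=1}^{n} \left( y_i - e^{x_i^{\top} \beta} \right) x_i = 0.$$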
Such a system of nonlinear equations is usually solved using an iterative numerical algorithm such as the Newton–Raphson or Fisher scoring algorithm [19,29].
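The iterative solution can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the paper: it assumes the log link $\log \mu_i = x_i^{\top}\beta$, so that the score is $X^{\top}(y - \mu)$ and the information matrix is $X^{\top}\mathrm{diag}(\mu)X$. The function name `fit_poisson_newton`, the starting value, the tolerance, and the simulated data are all hypothetical choices made for the example. For this canonical link, the Newton–Raphson and Fisher scoring updates coincide.

```python
import numpy as np

def fit_poisson_newton(X, y, tol=1e-10, max_iter=100):
    """Fit a log-link Poisson regression by Newton-Raphson.

    X : (n, k) design matrix (include a column of ones for an intercept).
    y : (n,) vector of non-negative counts.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Starting value (a common heuristic): least squares on log(y + 0.5).
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(max_iter):
        mu = np.exp(X @ beta)               # fitted means under the log link
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        info = X.T @ (X * mu[:, None])      # X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:      # stop once the update is negligible
            break
    return beta

# Simulated example with assumed coefficients (intercept 0.5, slope 0.8).
rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fit_poisson_newton(X, y)
score_at_fit = X.T @ (y - np.exp(X @ beta_hat))
print("Estimated coefficients:", beta_hat)
print("Score at the estimate:", score_at_fit)
```

At convergence the score vector is numerically zero, which is exactly the likelihood-equation condition described above.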
Mean Estimators Using Poisson Regression
Linear regression models are generally inappropriate for count data because they can produce negative predicted values and because count data violate their assumption of constant variance. Such issues can also adversely affect mean estimation. Koc [18] attempted to address this problem by proposing ratio estimators for mean estimation that incorporate Poisson regression coefficients.
Ratio-type estimation methods are beneficial in survey sampling when a strong positive correlation exists between the supplementary variable and the research variable. The method was initially introduced by Cochran in the middle of the 1900s and has since become an essential part of statistical inference. Although it was first used extensively in agricultural research, it has proved helpful in a variety of other scientific fields. Interested readers can consult [30,31,32] for more on ratio estimators and their variations. Koc’s [18] class of ratio-type mean estimators is given below:
The class has an MSE, given in Equation (3), that depends on the finite population correction (fpc) factor, which involves the population and sample sizes, together with the population averages, sample averages, variances, coefficient of variation (CV), kurtosis, and covariance.
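For orientation, the classical ratio estimator that underlies this family, $\bar{y}_R = \bar{y}\,(\bar{X}/\bar{x})$, can be sketched as follows. This is the textbook estimator only, not Koc's Poisson regression-based class. The function name `ratio_estimate`, the artificial population, and the sample sizes are hypothetical choices for illustration. The sketch compares the ratio estimator with the plain sample mean under simple random sampling without replacement.

```python
import numpy as np

def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean of y."""
    y_sample = np.asarray(y_sample, dtype=float)
    x_sample = np.asarray(x_sample, dtype=float)
    return y_sample.mean() * x_pop_mean / x_sample.mean()

# Hypothetical finite population of counts with a positively correlated auxiliary x.
rng = np.random.default_rng(1)
N, n, reps = 5000, 100, 2000
x_pop = 1 + rng.poisson(10, size=N)   # auxiliary variable, always positive
y_pop = rng.poisson(2 * x_pop)        # study variable, roughly proportional to x
x_bar_pop, y_bar_pop = x_pop.mean(), y_pop.mean()

err_ratio, err_mean = [], []
for _ in range(reps):
    idx = rng.choice(N, size=n, replace=False)   # simple random sample
    err_ratio.append(ratio_estimate(y_pop[idx], x_pop[idx], x_bar_pop) - y_bar_pop)
    err_mean.append(y_pop[idx].mean() - y_bar_pop)

mse_ratio = np.mean(np.square(err_ratio))
mse_mean = np.mean(np.square(err_mean))
print("Empirical MSE, ratio estimator:", mse_ratio)
print("Empirical MSE, sample mean:", mse_mean)
```

In this artificial setting the study variable is nearly proportional to the auxiliary variable, which is the situation in which ratio-type estimators are expected to outperform the sample mean.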