Exploring the Patterns of Job Satisfaction for Individuals Aged 50 and over from Three Historical Regions of Romania. An Inductive Approach with Respect to Triangulation, Cross-Validation and Support for Replication of Results

In this paper, we explore the determinants of job satisfaction using the SHARE-ERIC dataset (Wave 7), including the responses collected in Romania. This is a large dataset with a very high number of dimensions. To discover reliable predictors in it, we applied the principle of triangulation, using many different approaches, techniques and applications to study such a complex phenomenon. To merge and clean the data and to derive new variables, we comparatively used spreadsheets (with their easy-to-use functions, custom filters and auto-fill options), DAX and OpenRefine expressions, traditional SQL queries and 1:1 merge statements in Stata. For data mining, we used three consecutive rounds:
(1) Microsoft SQL Server Analysis Services and DMX queries on models built with both decision tree and naive Bayes algorithms, applied to raw, memory-consuming text data;
(2) three LASSO variable selection techniques in Stata on recoded variables, followed by logistic and Poisson regressions with average marginal effects and the generation of corresponding prediction nomograms that operate directly in probabilistic terms;
(3) the WEKA tool, for additional validation.
We obtained three Romanian regional models with excellent classification accuracy (AUROC > 0.9) and found several peculiarities in them. Moreover, we discovered that a good atmosphere in the workplace and receiving deserved recognition for the work done are the two most reliable predictors (the dual-core) of career satisfaction. Many robustness checks confirmed them in this order of importance. This type of meritocratic recognition has a stronger influence on job satisfaction for male than for female respondents, and for married than for unmarried individuals.
When we tested the dual-core on respondents aged 50 and over from most European countries (more than 75,000 observations), it clearly held. This confirmed most of our hypotheses, as well as the working principles of support for replication of results, triangulation and the golden rule of robustness through cross-validation.


Introduction
This study offers an in-depth analysis of a broad sample of data on job satisfaction. Job satisfaction is defined as how an individual feels about his or her job and its various aspects, usually in terms of how favourable (how positive or negative) those feelings are [1]. Put briefly, job satisfaction is the extent to which people like their jobs [2].
In general, theoretical approaches to the influences on job satisfaction distinguish two facets: environmental factors and personal characteristics [3]. The dichotomy between physical and psychological factors [4] has received little empirical confirmation [5]. A more intuitive and realistic approach is based on the theory proposed by Locke. According to Locke [6], job satisfaction is a pleasurable emotional state. It results from perceiving one's job as fulfilling, or allowing the fulfilment of, one's important job values, provided that these values are compatible with one's needs. Here, values refer to what one considers beneficial, whereas needs are the conditions actually required for one's well-being. Hence, the smaller the perceived discrepancy between the real and the ideal job, the higher an employee's job satisfaction. It is also widely accepted that career satisfaction and life satisfaction are, in general, reciprocally related [7]. Lower job satisfaction usually has negative consequences for physical and mental health and increases the level of grievances [6].
In business terms, career satisfaction is worth analysing because of its impact on employee turnover and performance [8,9] and on a firm's productivity, competitiveness and long-term efficiency [10][11][12]. Moreover, Romanians spend more time at work than other European citizens [13]. Job satisfaction therefore deserves attention, since it is also part of life satisfaction and subjective well-being [14]. A recent study [15] analysed 18,247 responses from people in the Romanian business, engineering and pharmaceutical sectors, focusing on the 100 most desirable employers of 2019, and revealed interesting facts about job satisfaction. Most of these respondents are satisfied with, and choose, a certain employer if:
• they are well paid;
• the company is large and important enough on the market;
• the job brings new professional opportunities;
• there are role models within the corporation they can learn from;
• last but not least, the job is secure enough.
In other words, career satisfaction is a heterogeneous phenomenon, since it may reflect a dual satisfaction based on both financial and non-financial rewards. Furthermore, different people may evaluate their satisfaction with their job from different, unknown perspectives [16,17]. Therefore, any investigation based on such a subjective variable seems to involve major risks, although previous research has shown the opposite [18].
We start from a SHARE-ERIC survey dataset corresponding to a sample of Romanian people aged 50 and over in 2017. The main question we seek to answer in this paper is: which determinants of job satisfaction are common to the three major historical regions of Romania, and which are particular to each of them?
The paper is structured as follows. The next section reviews the literature on this topic and derives several hypotheses to be tested. Next, we present the data and explain the methods in detail, followed by the section presenting the main results. The paper ends with a discussion of the results and brief conclusions.

Literature Review
Cook, Hepworth, Wall and Warr (1981) [19] and Cranny, Smith and Stone (1992) [20] are among those who consider job satisfaction to be an individual's affective reaction to his or her job. This reaction is based on a subjective comparison between reality, on the one hand, and personal expectations and the desire for personal fulfilment, on the other. An individual who is not satisfied with his or her job, and who has other desires and needs, may eventually wish to withdraw and choose another job [21,22]. Professional support, development and opportunities are considered the most important factors that may influence job satisfaction [23][24][25]. Relationships with colleagues and managers, job security and recognition are other determinants of job satisfaction [26]. It is well known that financial incentives are not the most important influence on job satisfaction; appreciation and recognition matter more [27][28][29].
A large body of literature stresses the role of wages in job satisfaction [30], but little evidence has been provided on the impact of intrinsic factors on the same outcome. In certain domains of activity, such as nursing or social work, intrinsic elements of job satisfaction are considered to exert a stronger influence than extrinsic ones such as monetary payments or rewards [31][32][33]. For instance, for nurses, the idea of helping people in need sometimes outweighs the size of financial rewards [34]. The same holds for the desire to gain non-material rewards, such as appreciation, recognition, opportunities or good relations with co-workers [35]. Westover and Taylor (2010) [36] even found that intrinsic rewards are the most powerful predictors of respondents' job satisfaction. Moreover, satisfied employees have been shown to be more productive [37], more attached to their job and therefore less often absent [38], less stressed [39] and less inclined to leave [40]. They also report higher levels of life satisfaction [41,42].
The effects of personality on job satisfaction have recently received increased attention, since job satisfaction is considered to have a mostly affective nature, linked to employees' personality characteristics [43]. Other authors agree that there is an important relationship between the Big Five personality traits and job satisfaction [44,45]. Judge et al. (2002) [46] found that extraversion, neuroticism and conscientiousness are the most important predictors of job satisfaction. Triandis and Suh (2002) [47] considered agreeableness a personality trait that may function only in a very particular environment full of interpersonal harmony. Templer (2011) [48] and Berglund et al. (2015) [49] found that openness is not significant when tested as a predictor of job satisfaction. On the other hand, recent research by Törnroos et al. (2019) [50] showed that, when average occupational openness is low, individuals with high openness to experience have lower job satisfaction than those with low openness. The most probable explanation is that such individuals are unhappy with the occupation's relatively low level of novelty, cognitive challenge and opportunities for personal innovation. This may be analogous to a mismatch between a person's educational level and the occupation's skill requirements: over-educated and over-skilled individuals are less likely to be happy with their jobs [51]. The role of job changes in career satisfaction is also worth mentioning [52,53], although there is no definitive view on the role of such changes [54]. Occupational mobility can have different motives, ranging from earnings inequality [55] to the desire for fulfilment, individual development, opportunities or a good atmosphere at work [56].
Choudhury and Gupta (2011) [57] highlighted the influence of low job satisfaction on behaviours that may increase costs for employers, such as high absenteeism, laziness at work and poor organizational performance. In the same direction, Dalal (2005) [58] found that high levels of job satisfaction could generate negative actions (e.g., laziness) with counterproductive effects for institutions.
The relationship between job satisfaction and retirement age is well documented in the literature, but the approaches differ greatly. Bidewell et al. (2006) [59] showed that extrinsic job satisfaction was significantly correlated with a later retirement age. Kalokerinos et al. (2015) [60] offered a more nuanced view: they found a negative relationship between phased retirement and job satisfaction, emphasizing the positive influence of career continuity for satisfied workers.
Bratt et al. (2012) [61] found differences in job satisfaction and stress between nurse residents from rural and urban centres. At the end of the residency programme, rural residents reported higher job satisfaction and lower job stress than their urban counterparts.
Few researchers have studied the relationship between job satisfaction and loneliness, and mainly in the case of migration abroad (Anderson, 1999) [62]. Ernst and Cacioppo (1999) [63] defined loneliness as a condition associated with a variety of individual differences, including depression, hostility, pessimism, social withdrawal, alienation, shyness and low positive affect. According to other views, loneliness is a discrepancy between one's desired and achieved level of social contact [64]. Bhagat (1982) [65], Aytac (2015) [66] and Wright (2005) [67] stressed that loneliness and isolation are the best predictors of job dissatisfaction. In contrast, Chan and Qiu (2011) [68] found that what matters for job satisfaction is not the level of loneliness but management's attitude towards migrant employees, including the regulations that preserve their rights. Landry (2000) [69] and Petrovski and Gleeson (2007) [70] also found a negative correlation between loneliness and job satisfaction and a positive one between job satisfaction and life satisfaction. Other empirical studies confirmed that workplace loneliness and poor interpersonal relationships have a negative influence on job satisfaction and happiness [71]. For a sample of Brazilian teachers, Neto (2015) [72] showed that the relationship between loneliness and job satisfaction becomes clear once one considers that loneliness is linked with life satisfaction. Neto [72], Dussault and Deaudelin (2001) [73] and Wei, Russell and Zakalik (2005) [74] also found a negative relationship between loneliness and self-efficacy, itself a predictor of job satisfaction. Moreover, emotional tensions have been shown to lower performance in both public and private institutions [75].
The number of jobs previously held is an interesting variable that may indicate whether a person is likely to be satisfied at work. Someone who changed jobs many times during his or her lifetime did so for many reasons, one of which could be that he or she was usually dissatisfied. Few studies have examined this correlation, and they disagree about whether it exists. For example, Cardina and Wicks (2004) [76] found no such relationship among academic reference librarians in a study conducted from 1991 to 2001 in six different regions of the United States. Gaszynska et al. (2014) [77] also found no evidence of a relationship between job satisfaction and the number of previous jobs among Polish anaesthesiologists. In contrast, other studies did find a relationship between the two. Bacopanos and Edgar (2016) [78] found that the number of jobs held since graduation influenced job satisfaction: Australian graduate physiotherapists who had changed five or more jobs since graduation reported lower satisfaction. However, they could not establish which variable caused the other.
Few specific studies have explored whether a life full of opportunities predicts the level of job satisfaction. Kinicki et al. (2002) [79] emphasized that existing promotion opportunities, among other relevant factors, influence job satisfaction, mostly in a positive manner [80]. Aiken et al. (2012) [81] found that, among nurses from various European countries, the lack of educational and advancement opportunities was considered a major negative influence on career satisfaction.
Measures taken by the authorities to protect employees from various hazards at work may also help explain the level of job satisfaction. Some studies examine work-related health hazards, personal hygiene and the use of personal protective equipment together with job satisfaction [82]. Others explicitly assess the impact of safety and health on job satisfaction [83]. Moreover, other authors [84], who investigated the relationship between perceived job security and labour market institutions, noted that job security has diverse meanings. They therefore called for the concept to be explained clearly when respondents are asked about it.
Job satisfaction has been shown to be strongly influenced by the team climate [85]. When employees create a positive team climate in the workplace, job satisfaction increases [86,87], and some authors even consider this type of climate an antecedent of career satisfaction [88].
Job satisfaction can also be stimulated when employees receive the recognition and rewards they deserve for their work within an organization. Flynn (1998) [89] argued that recognition programmes are highly motivating for employees. They boost morale and performance, create a rational link between pecuniary rewards and productivity and, finally, increase job satisfaction. Other scholars found that the rewards employees received from their organizations made them more satisfied with their jobs [90][91][92].
Based on the existing literature, we expect that:
Hypothesis 1 (H1). The recognition of one's own efforts is the strongest predictor of job satisfaction.
Hypothesis 2 (H2). Individuals from rural areas have higher job satisfaction than those from urban areas.
Hypothesis 3 (H3). The more jobs a person has had, the greater his or her job satisfaction.
Hypothesis 4 (H4). Exogenous influences on job satisfaction count much more than endogenous ones.
Hypothesis 5 (H5). The strongest predictors of job satisfaction correspond to the upper levels of Maslow's hierarchy of needs.

Data and Methods
This research is based on the questionnaire of SHARE-ERIC (Survey of Health, Ageing and Retirement in Europe, www.share-project.org/home0.html). SHARE-ERIC is located at MEA (Munich Center for the Economics of Aging), Max Planck Institute for Social Law and Social Policy, Munich, Germany, and is considered the first ever (2006) European Research Infrastructure Consortium (ERIC). In 2017, SHARE-ERIC collected 76,520 observations for this project, including more than 2000 for Romanian citizens aged 50 or over. Out of a total of 7700 questions, we selected almost 1700, originally grouped in eight categories:
• AC (Activities);
• CC (Childhood Circumstances);
• DN (Demographics);
• GV_BIG5 (Personality traits);
• RA (Retrospective Accommodation);
• RE (Retrospective Employment);
• WQ (Work Quality);
• GV_ISCED (ISCED standards for classifying education).
The source is doi:10.6103/SHARE.w7.700, Stata version, the sharew7_rel7-0-0_ALL_datasets_stata.zip file of 69.82 MB, uploaded by SHARE-ERIC on 3 April 2019. Finding reliable predictors in such a large amount of data was therefore challenging, mostly because of the high number of dimensions involved.
We followed several scientific principles to foster the robustness, and hence the credibility and trustworthiness, of our research. One of them, the triangulation principle, means relying on many different but convergent approaches, techniques and tools to properly investigate a complex phenomenon.
The methodology is summarized below:
• SHARE-ERIC Wave 7 dataset: only 8 categories of variables, stored in 8 different files, were merged and used, and only for respondents aged 50 or over, from Romania, who declared the region of their residence;
• both decision trees and naive Bayes (NB) were used in Microsoft SQL Server Analysis Services; only the primary outcome was pre-processed (wq727_ became wq727bin34, Table A1, Appendix A), and NB's dependency network in this first mining round supported the pre-selection of variables;
• the variables selected in the previous step, together with some others of interest (mostly individual traits), were processed (Table A1, Appendix A) using a balanced treatment of missing responses;
• several regional splits were made, along with additional splits for robustness checks;
• the second mining round (LASSOPACK package in Stata) was applied to the processed variables;
• logistic and Poisson regressions with robust standard errors were estimated, and the dual-core was further checked for robustness and additional variables were selected for the regional models, based on p, VIF, R-squared, ROC and GOF test values;
• average marginal effects were computed and Zlotnik probability prediction nomograms were generated for the regional models;
• the third mining round, using WEKA, was performed only for additional validation;
• following an inductive approach, additional regressions confirmed the already identified dual-core at a higher level (almost all European countries).
To merge the divided sources (the eight aforementioned categories), we comparatively used four tools:
• SQL queries with the INNER JOIN clause in Microsoft Access (Redmond, Washington, USA), after importing the .csv exports of the Stata .dta files;
• merge statements in Stata 16 64-bit Multiprocessing/Parallel Edition (MP);
• the OpenRefine tool with cell.cross GREL (Google Refine Expression Language; Google, Mountain View, California, USA) expressions;
• the DAX (Data Analysis Expressions) function extension for spreadsheets, available after installing Microsoft's Power Pivot add-in (more precisely, the RELATED function, which is simpler and easier to use than the traditional, time-consuming VLOOKUPs).
We also relied on all of these tools to clean the data and derive new variables. One of the most important derivations extracted, for each respondent, the last non-blank entry from the list of declared residences. Based on it, we divided the dataset into three large Romanian historical regions:
• MB (Moldova and Bucovina, the northeast region);
• TBCM (Transylvania, Banat, Crisana and Maramures, the central part together with the northwest and west regions);
• OMD (Oltenia, Muntenia and Dobrogea, the south, southeast and southwest regions).
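The 1:1 merge described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions. The three tiny tables, the respondent key `mergeid` and the column `n_jobs` are hypothetical stand-ins for the SHARE category files, not the authors' queries. Note that an INNER JOIN silently drops respondents who are missing from any file. Stata's `merge 1:1`, by contrast, keeps them by default and flags them in `_merge`.

```python
import sqlite3

# Hypothetical miniature extracts of three SHARE category files.
# The key name "mergeid" and the column "n_jobs" are illustrative assumptions.
dn_rows = [("RO-001", 1950), ("RO-002", 1962), ("RO-003", 1970), ("RO-004", 1955)]
wq_rows = [("RO-001", "Agree"), ("RO-002", "Strongly agree"), ("RO-003", "Disagree")]
re_rows = [("RO-001", 3), ("RO-002", 1), ("RO-003", 2)]

con = sqlite3.connect(":memory:")
cur = con.cursor()
# PRIMARY KEY rejects duplicate respondents within each table, which is what
# makes the join below a 1:1 merge.
cur.execute("CREATE TABLE dn (mergeid TEXT PRIMARY KEY, dn003_ INTEGER)")
cur.execute("CREATE TABLE wq (mergeid TEXT PRIMARY KEY, wq727_ TEXT)")
cur.execute("CREATE TABLE re (mergeid TEXT PRIMARY KEY, n_jobs INTEGER)")
cur.executemany("INSERT INTO dn VALUES (?, ?)", dn_rows)
cur.executemany("INSERT INTO wq VALUES (?, ?)", wq_rows)
cur.executemany("INSERT INTO re VALUES (?, ?)", re_rows)

# INNER JOIN keeps only respondents present in every file (RO-004 drops out).
merged = cur.execute(
    """
    SELECT dn.mergeid, dn.dn003_, wq.wq727_, re.n_jobs
    FROM dn
    INNER JOIN wq ON wq.mergeid = dn.mergeid
    INNER JOIN re ON re.mergeid = dn.mergeid
    ORDER BY dn.mergeid
    """
).fetchall()
con.close()
print(merged)
```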
To ensure server-side persistence of all the tests in the first data mining (DM) round, we used the 64-bit version of Microsoft SQL Server 2016 Analysis Services (SSAS). It ran on a Windows 10 x64 virtual machine in Oracle VM VirtualBox Manager 5.0.14 (Oracle, Redwood City, California, USA), with 6 GB of virtual RAM and 2 physical processing cores (half of the total) of an Intel Core i7-4710HQ central processing unit. To create the first mining structures for the whole of Romania, deploy them on the server side (SSAS) and explore the results, we comparatively used three tools:
• the designer from Microsoft Visual Studio 2017 (SQL Server Data Tools for Business Intelligence, which must be installed as a separate component);
• the Data Mining add-in for Office Excel 2013;
• DMX (Data Mining Extensions) queries, an extension of the traditional Structured Query Language (SQL).
In these first DM tests, we used the Microsoft decision trees and naive Bayes algorithms, both traditional classifiers commonly used for classification problems [93]. Naive Bayes also served as a field pre-selection method before the next data processing steps (renaming, derivation of numerical scales and treatment of missing values). These steps preceded the automatic variable selection procedures used in the next DM rounds and the statistical analysis.
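The naive Bayes step can be made concrete with a short sketch. It fits a generic categorical naive Bayes classifier with Laplace smoothing and returns the posterior probability of the binary outcome given one evidence attribute. This is not Microsoft's SSAS implementation. The toy records, the smoothing constant and the string coding of wq009_ and wq727bin34 are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_nb(rows, target, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (alpha).

    rows: list of dicts mapping attribute name -> category label.
    """
    class_counts = Counter(r[target] for r in rows)
    # value_counts[attr][cls][value] = number of rows with that combination
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct observed values per attribute
    for r in rows:
        for attr, val in r.items():
            if attr == target:
                continue
            value_counts[attr][r[target]][val] += 1
            values[attr].add(val)
    return class_counts, value_counts, values, alpha


def predict_proba(model, evidence):
    """Posterior P(class | evidence), assuming conditional independence."""
    class_counts, value_counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / total  # prior P(class)
        for attr, val in evidence.items():
            k = len(values[attr])
            # smoothed likelihood P(attr = val | class)
            score *= (value_counts[attr][cls][val] + alpha) / (n_cls + alpha * k)
        scores[cls] = score
    norm = sum(scores.values())
    return {cls: s / norm for cls, s in scores.items()}


# Hypothetical toy data: recognition item (wq009_) vs. binarized satisfaction.
rows = (
    [{"wq009_": "Strongly agree", "wq727bin34": "1"}] * 3
    + [{"wq009_": "Disagree", "wq727bin34": "0"}] * 2
    + [{"wq009_": "Disagree", "wq727bin34": "1"}]
)
model = train_nb(rows, target="wq727bin34")
probs = predict_proba(model, {"wq009_": "Strongly agree"})
print(probs)
```

The evidence dictionary plays the same role as the input row of a singleton prediction query. It fixes the state of the predictor and asks for the distribution of the outcome.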
We processed the variables both for this particular region (Romania, using all 8 question categories above) and for a larger one (most European countries, using only the WQ category, since only the dual-core was finally tested there by induction). In both cases, we aimed for clear and trustworthy answers. We were aware of the traditional procedures for treating missing values and of their effect on classifier accuracy [94]. Still, we did not merge missing values, undecided responses or refusals to answer into any existing level of the original scale. Instead, we generated an extra level (usually a middle one). This added considerable artificial variance. However, it supported a balanced approach, with more realistic values of the coefficient of determination (R-squared), of the accuracy tests and of the ratios between the magnitudes of the strongest influences. The entire set of treated fields/variables resulting from the first DM round (Table A1, Appendix A) then served as input for variable selection procedures and regression analysis in Stata 16.
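The missing-value treatment can be illustrated with a hypothetical recoding. It assumes a four-level agreement scale with textual labels. Substantive answers are spread over levels 1, 2, 4 and 5, and non-substantive answers (missing, "Don't know", "Refusal") receive their own middle level 3 instead of being merged into an existing category. The binarization of the outcome follows the rule stated for wq727bin34 ("Agree" or "Strongly agree" = 1). Coding every other answer as 0 is an assumption of this sketch.

```python
# Hypothetical mapping: the 4-level agreement scale is stretched to five
# levels so that non-substantive answers get their own middle level (3).
SCALE = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Agree": 4,
    "Strongly agree": 5,
}
NON_SUBSTANTIVE = {"Don't know", "Refusal", ""}  # assumed labels


def recode_with_middle(answer):
    """Map an answer to 1..5, sending non-substantive answers to level 3."""
    if answer in SCALE:
        return SCALE[answer]
    if answer is None or answer in NON_SUBSTANTIVE:
        return 3
    raise ValueError(f"unexpected answer: {answer!r}")


def binarize_satisfaction(answer):
    """1 for 'Agree' or 'Strongly agree' (cf. wq727bin34), otherwise 0 (assumed)."""
    return 1 if answer in ("Agree", "Strongly agree") else 0


answers = ["Strongly agree", "Don't know", None, "Disagree"]
print([recode_with_middle(a) for a in answers])     # levels 5, 3, 3, 2
print([binarize_satisfaction(a) for a in answers])  # 1, 0, 0, 0
```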
To further explore the two most powerful common influences (the models' core) and the particular ones, we used a general variable selection technique, LASSO (Least Absolute Shrinkage and Selection Operator). It was originally documented in geophysics as an L1-regularization approach to a problem called sparse spike deconvolution [95][96][97]. Three types of LASSO were tested in Stata (the LASSOPACK package):
• CV LASSO (cvlasso), a time-consuming approach based on cross-validation. We used it with 10 folds and the elastic net parameter alpha = 1 (i.e., the pure LASSO penalty), with two post-estimation options: lopt, the lambda that minimizes the mean squared prediction error (MSPE), and lse, the largest lambda for which the MSPE is within one standard error of the minimal MSPE;
• LASSO 2 (lasso2), a faster approach centred on information criteria. We used it with alpha = 1, followed by post-estimation selecting the best value of lambda according to the Extended Bayesian Information Criterion (EBIC);
• R LASSO (rlasso), an even faster but more rigorous, more penalizing approach focused on controlling over-fitting. We used it with default settings.
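The two CV LASSO post-estimation rules, lopt and lse, can be written directly from their definitions above. The sketch assumes that cross-validation has already produced, for each candidate lambda, the mean MSPE across folds and its standard error. The numbers in the example are invented. The code mirrors the definitions only, not LASSOPACK's internal code.

```python
def select_lambdas(lambdas, mspe, se):
    """Return (lopt, lse) from cross-validation results.

    lambdas, mspe, se: parallel sequences. mspe[i] is the mean CV MSPE at
    lambdas[i] and se[i] is its standard error across folds.
    """
    i_min = min(range(len(lambdas)), key=lambda i: mspe[i])
    lopt = lambdas[i_min]  # lambda with the minimal MSPE
    threshold = mspe[i_min] + se[i_min]
    # Largest lambda (strongest penalty, sparsest model) still within 1 SE.
    lse = max(lam for lam, m in zip(lambdas, mspe) if m <= threshold)
    return lopt, lse


# Invented CV results for five candidate penalties.
lambdas = [0.5, 0.2, 0.1, 0.05, 0.01]
mspe = [0.30, 0.22, 0.20, 0.19, 0.25]
se = [0.02] * 5
lopt, lse = select_lambdas(lambdas, mspe, se)
print(lopt, lse)  # 0.05 0.1
```

Choosing lse instead of lopt trades a slightly higher prediction error for a sparser set of selected variables.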
To analyze the determinants of the probability of being satisfied with one's job, we started from the sets of variables obtained in the second DM round of selection (based on LASSO). The input to this round was the set resulting from the first round (based on naive Bayes in SSAS), after the processing steps described in Table A1, Appendix A (questionnaire item and coding columns). Next, we used a well-known generic econometric model (Equation (1)), the binary logistic regression model [98], a particular case of multinomial logistic regression [99]:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \sum_{i=1}^{n} \beta_i X_i + \varepsilon \qquad (1)$$

where:
• $p$ is the probability of being satisfied with the job;
• $(1 - p)$ is the probability of not being satisfied with it;
• $p/(1 - p)$ represents the odds of being satisfied with the job;
• $i = 1, \ldots, n$, where $n$ is the total number of independent variables;
• $\beta_0$ is the bias (intercept) term;
• $\beta_i$ measures the effect of a change in variable $X_i$ on the log-odds of being satisfied with the job;
• $X_i$ is an explanatory variable from the array of features selected after using LASSO;
• $\varepsilon$ represents the error term.
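Equation (1) yields a predicted probability through the inverse logit. The average marginal effect (AME) of a binary regressor can then be computed as the sample mean of the change in predicted probability when that regressor is switched from 0 to 1. The coefficients and data in this sketch are hypothetical. The error term is omitted when predicting, as is usual for logistic models. This is a discrete-change AME for 0/1 predictors, not a reproduction of the authors' Stata output.

```python
import math


def logistic(z):
    """Inverse logit: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_p(beta0, betas, x):
    """p = logistic(beta0 + sum_i beta_i * x_i), error term omitted."""
    return logistic(beta0 + sum(b * xi for b, xi in zip(betas, x)))


def ame_binary(beta0, betas, X, j):
    """Mean over observations of p(x_j = 1) - p(x_j = 0) for binary regressor j."""
    diffs = []
    for x in X:
        x1 = list(x)
        x1[j] = 1
        x0 = list(x)
        x0[j] = 0
        diffs.append(predict_p(beta0, betas, x1) - predict_p(beta0, betas, x0))
    return sum(diffs) / len(diffs)


# Hypothetical model: intercept -1.0, recognition dummy 2.0, atmosphere dummy 1.5.
beta0, betas = -1.0, [2.0, 1.5]
X = [[0, 0], [1, 0], [0, 1], [1, 1]]
print(round(ame_binary(beta0, betas, X, j=0), 4))
```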
The entire statistical analysis, including all regressions and post-estimations, was performed in Stata 16.0 MP (64-bit).
The binary logistic regressions supported both the robustness checks of the dual-core and the confirmation of particular regional influences. After these regressions, we filtered the influences remaining after LASSO using five criteria:
• p values, obtained by comparing coefficients with their standard errors;
• VIF (Variance Inflation Factor) values, following the rule "the lower, the better", with values larger than 10 indicating collinearity [100];
• results of goodness-of-fit (GOF) tests [101], preferring higher p values, which mean that the null hypothesis of adequate fit is not rejected (equivalently, for given degrees of freedom, lower chi-square statistics);
• values of the area under the receiver operating characteristic curve, known as AUCROC, AUROC or, shortly, ROC [102], following the rule "the higher, the better" in terms of classification accuracy for a scenario/model;
• R-squared values, following the same simple rule as for AUCROC.
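Two of these filtering criteria can be sketched briefly. AUROC equals the probability that a randomly chosen satisfied respondent receives a higher predicted score than a randomly chosen unsatisfied one, with ties counting one half. In the special case of a model with exactly two regressors, the VIF reduces to 1 / (1 − r²), where r is their Pearson correlation; with more regressors, each r² is replaced by the R-squared of an auxiliary regression. The scores and variables in the example are hypothetical.

```python
def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - r^2), valid only when the model has exactly two regressors."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r2 = cov * cov / (v1 * v2)
    if r2 >= 1.0:
        raise ValueError("perfect collinearity: VIF is infinite")
    return 1.0 / (1.0 - r2)


print(auroc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))           # 0.75
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))  # 1.0
```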
For the three regions, we also generated Zlotnik probability prediction nomograms [103], based on the results of the binary logistic regressions (Equation (1)). Moreover, we tested the regional results using modified Poisson regressions for binary data [104] with robust error variance. Such regressions can be interpreted directly in terms of relative risks. This avoids treating odds ratios as relative risks, which, except for rare phenomena, may lead to undesirable exaggerations [105].
Additionally (third DM round), we confirmed the results obtained with Microsoft SSAS (the core influences) and Stata (the core influences and the additional ones for all three regional models) using several techniques in WEKA (Waikato Environment for Knowledge Analysis), version 3.8.3 x64.
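The warning about odds ratios can be made concrete with a hypothetical 2×2 table. When the outcome is common, the odds ratio can be far larger than the relative risk. A model that reports relative risks directly, such as the modified Poisson regression, is then easier to interpret. The counts below are invented for illustration.

```python
def risk_ratio(a, b, c, d):
    """a, b: outcome yes/no among the exposed; c, d: outcome yes/no among the unexposed."""
    return (a * (c + d)) / (c * (a + b))


def odds_ratio(a, b, c, d):
    """Cross-product ratio (a * d) / (b * c) of the same 2x2 table."""
    return (a * d) / (b * c)


# Invented counts: 90 of 100 respondents receiving recognition are satisfied,
# versus 60 of 100 respondents not receiving it.
rr = risk_ratio(90, 10, 60, 40)
or_ = odds_ratio(90, 10, 60, 40)
print(rr, or_)  # 1.5 6.0
```

With 90% satisfied among those receiving recognition and 60% among the others, the relative risk is 1.5, but the odds ratio is 6.0. Reading the latter as a risk ratio would overstate the effect fourfold.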
Table A2, Appendix A presents the descriptive statistics for the variables selected for this study, and Table A1, Appendix A gives further details and explanations about them. From the outset, the three study sites (MB, TBCM and OMD, Table A2, Appendix A) show noticeable differences in the average level of the primary outcome and of several possible predictors assumed to be most closely related to it. For brevity, we group the variables below by how the three regions rank on their mean values.
• Highest in TBCM, followed by OMD and MB: job satisfaction, receiving recognition for work done, a good atmosphere in the workplace, proper treatment of the employee, appropriate support received in difficult situations at work, the opportunity to develop new skills, the number of jobs held, retirement after the first job, stress resistance, interpersonal trust, sociability and the disinclination to activity (laziness).
• Highest in OMD, followed by TBCM and MB: the view that salary should be in line with one's own efforts and achievements, the protective measures ensured by legal authorities, and urbanity.
• Highest in OMD, followed by MB and TBCM: life satisfaction, marital status and the variable related to high school graduation.
• Highest in MB, followed by OMD and TBCM: the financial well-being of the family and the feeling of loneliness.

Results and Discussion
For merging the original, vertically divided data subsets, the first method (SQL queries followed by exports) was apparently the handiest. Its drawback was the limit of 255 columns per table in Access, which is why we finally chose 1:1 merge statements in Stata 16. For cleaning the data and performing additional derivations, spreadsheets were sufficient for the manual cleaning tasks in most cases. They offer immediate visual feedback and insight, along with:
• powerful built-in functions, e.g., UNIQUE in Google Sheets to check the uniqueness of a field's values, nested IF calls to find the region of the last residence, and COUNTA to count the non-blank job entries;
• user-defined functions;
• customisable filters with the CONTAINS option for identifying particular text patterns;
• fast auto-fill and split-into-columns (text to columns) facilities.
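The spreadsheet checks mentioned above have simple programmatic counterparts. The sketch below is a rough analogue, not an exact reimplementation. `count_non_blank` counts non-empty cells but, unlike COUNTA, also treats whitespace-only cells as blank. `duplicated_ids` plays the role of comparing a column with its UNIQUE values. The job labels and identifiers are hypothetical.

```python
def count_non_blank(cells):
    """Rough COUNTA analogue: count cells that are neither None nor blank text."""
    return sum(1 for c in cells if c is not None and str(c).strip() != "")


def duplicated_ids(ids):
    """Return identifiers that occur more than once (should be empty for a key)."""
    seen, dups = set(), set()
    for i in ids:
        if i in seen:
            dups.add(i)
        seen.add(i)
    return dups


# Hypothetical job-history fields for one respondent and a respondent-ID column.
job_fields = ["Teacher", "", None, "Clerk", "   "]
print(count_non_blank(job_fields))                    # 2 jobs
print(duplicated_ids(["RO-001", "RO-002", "RO-001"]))  # {'RO-001'}
```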
First, we filtered the resulting Wave 7 (2017) dataset by setting the original field W7-ac_country to "Romania", which left 2144 unique records. We then kept only the records with dn003_ (birth year) less than or equal to 1967, i.e., the responses of Romanian people aged fifty or over (2056 records).
Next, we identified the subsets corresponding to the three large regions by last declared residence. Starting from the 2056 filtered records, we used nested IFs to obtain the last non-blank residence. Only the first 17 of the 30 related fields were considered (from ra025c_1 to ra025c_17); the last 13 were excluded because they contained only blank values. For example: =IF(SH2="",IF(SG2="",..,IF(RS2="",IF(RR2="","",RR2),RS2),..,SG2),SH2). We thus obtained 2052 records: 330 observations for MB, 793 for TBCM and 929 for OMD. The remaining four of the 2056 records had an unspecified last residence and were therefore dropped.
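The filtering and region assignment steps can be sketched as follows. `last_non_blank` scans the residence fields from the end, just as the nested IF formula checks the rightmost cell first. `REGION_GROUP` is a hypothetical mapping from historical region names to the three study sites, since the actual coding of the ra025c_ fields is not given here. The field names `id`, `country` and `residences` are also assumptions.

```python
# Hypothetical mapping from historical region names to the three study sites.
REGION_GROUP = {
    "Moldova": "MB", "Bucovina": "MB",
    "Transylvania": "TBCM", "Banat": "TBCM", "Crisana": "TBCM", "Maramures": "TBCM",
    "Oltenia": "OMD", "Muntenia": "OMD", "Dobrogea": "OMD",
}


def last_non_blank(residences):
    """Last non-empty entry among the residence fields (mirrors the nested IFs)."""
    for value in reversed(residences):
        if value is not None and value.strip():
            return value
    return ""


def assign_regions(records):
    """Keep Romanians born in 1967 or earlier and group them by last residence."""
    groups = {"MB": [], "TBCM": [], "OMD": []}
    dropped = 0
    for rec in records:
        if rec["country"] != "Romania" or rec["dn003_"] > 1967:
            continue  # outside the study population
        group = REGION_GROUP.get(last_non_blank(rec["residences"]))
        if group is None:
            dropped += 1  # unspecified last residence
            continue
        groups[group].append(rec["id"])
    return groups, dropped


records = [
    {"id": "R1", "country": "Romania", "dn003_": 1950, "residences": ["Oltenia", "Moldova", ""]},
    {"id": "R2", "country": "Romania", "dn003_": 1960, "residences": ["Banat", "", ""]},
    {"id": "R3", "country": "Romania", "dn003_": 1970, "residences": ["Dobrogea"]},
    {"id": "R4", "country": "Bulgaria", "dn003_": 1950, "residences": ["Muntenia"]},
    {"id": "R5", "country": "Romania", "dn003_": 1945, "residences": ["", ""]},
    {"id": "R6", "country": "Romania", "dn003_": 1967, "residences": ["Muntenia"]},
]
groups, dropped = assign_regions(records)
print(groups, dropped)
```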
After the first round of data mining (DM) in SSAS, we extracted 18 variables (Table A1, Appendix A and Figure 1) as possible important predictors of our chosen outcome, namely job satisfaction. Figure 1 also captures the slider used to adjust the level of the strongest links displayed, together with the corresponding results. As shown, the naive Bayes algorithm automatically identified mainly variables from the same category as the outcome (questions on work quality, wq) as the most important predictors of job satisfaction. Moreover, we tested the variable related to our first hypothesis (H1) using DMX queries on persistent DM models trained on different shares of the data (100%, as in Figure 1, 85% and 75%). The result of the DMX query in Figure A1, Appendix B indicates a powerful influence of the variable associated with the question on the recognition of work done (Table A1, Appendix A). Individuals who have received recognition they consider deserved for their effort are more satisfied with their job. This variable's overwhelming influence on job satisfaction can be explained by the holistic nature of the underlying question. Still, we could not consider our first hypothesis, concerning deserved recognition for the work done, validated before performing further statistical analysis.
Appl. Sci. 2020, Special Issue, x FOR PEER REVIEW 9 of 33
As shown in Figure A1, Appendix B, the predicted probability of job satisfaction (99.075% for 'Agree' or 'Strongly agree', equivalent to wq727bin34 = 1) indicates that a respondent is very likely to be satisfied with his/her job when he/she has already received full recognition for the work done. This corresponds to the condition 'Strongly agree' for the variable wq009_, introduced through the NATURAL PREDICTION JOIN clause of the DMX query in Figure A1.
When checking the source fields involved in this query, we found sufficient support: 362 records (17.64% of the 2052 observations considered) satisfy the input condition ('Strongly agree' for wq009_). Of these 362 records, 359 also satisfy the outcome condition of being satisfied with one's job (wq727bin34 = 1).
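The support check above is simple arithmetic. A short Python sketch using the counts reported in the text makes it explicit. The function name is ours.

```python
def support_and_confidence(n_total, n_condition, n_both):
    """Share of records meeting the input condition, and the share of those
    that also meet the outcome condition (empirical conditional probability)."""
    return n_condition / n_total, n_both / n_condition


support, confidence = support_and_confidence(n_total=2052, n_condition=362, n_both=359)
print(f"support: {support:.2%}")                          # share with wq009_ = 'Strongly agree'
print(f"P(wq727bin34 = 1 | condition): {confidence:.2%}")
```

This gives a support of 17.64% and an empirical conditional probability of about 99.17%. That value is close to the 99.075% returned by the naive Bayes model; a model-based prediction need not coincide exactly with the raw conditional frequency.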
Next, we proceeded similarly in SSAS to identify the most important influences on the other outcome assumed to be related to job satisfaction, namely life satisfaction [77]. This time, we selected 16 other distinct variables (Table A1, Appendix A) as possible significant predictors of life satisfaction. We thus extended the initial list to 52 items, including an additional set of 18 items of interest that are related to, but did not directly result from, the first round of DM, and excluding 5 items used for filtering purposes. These items were then processed and coded, and served as input for the subsequent variable selection procedures and statistical analysis (Tables 1 and 2).
Even with this pre-selection, renaming, coding and treating the values of the variables identified as possible predictors in the previous step (e.g., converting text to numerical scales and dummies) took a lot of time. Therefore, we ruled out from the very beginning the option of processing the entire initial dataset (more than 1700 fields) without the pre-selection provided by the naive Bayes classification.
Next, in Stata 16, we applied three types of LASSO, namely CV LASSO, R LASSO and LASSO 2, in order to identify the dual-core together with other specific and powerful influences.
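For readers less familiar with LASSO, the following Python sketch solves the penalised least-squares problem (1/(2n)) * ||y - X b||^2 + lambda * ||b||_1 by cyclic coordinate descent with soft-thresholding. It is a generic illustration under stated assumptions (centred, comparably scaled predictors and a fixed, user-chosen lambda), not the Stata implementation. The three variants named above differ largely in how the penalty level is chosen, and the sketch leaves that step out.

```python
def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-10):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows; columns of X and y are assumed centred (no intercept).
    Predictors whose coefficient ends at exactly 0.0 are the ones LASSO drops.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0 initially
    col_scale = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # constant column: nothing to estimate
            # correlation of x_j with the partial residual that excludes x_j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_scale[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

On a toy design where y depends strongly on the first predictor and only weakly on the second, a penalty of 0.5 shrinks the first coefficient and sets the second to exactly zero. This is the variable-selection behaviour exploited above.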
After that, we ran binary logistic regressions, retaining only those influences that satisfied selection rules based on significance (p), VIF, GOF, ROC and pseudo R-squared values. We reported only the marginal effects, so that magnitudes can be compared both within and across scenarios and models. A substantial part of these regressions served as robustness checks. These considered both forms of the outcome (full job satisfaction: compl_satisf_with_my_job; full and partial job satisfaction: satisf_with_my_job), three criteria regarding the respondent type (Table 1, scenarios a to f) and one regarding residence as a geographical dimension (Table 2, scenarios a to d).
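The average marginal effects (AMEs) reported throughout can be illustrated with the Python sketch below, for both the logistic and the Poisson mean functions. Column 0 of X is the intercept, and all coefficient values in the tests are hypothetical. In Stata, `margins, dydx()` after `logit` or `poisson` reports AMEs by default, using derivatives for continuous regressors and discrete 0-to-1 changes for factor variables. The two functions mirror those two cases.

```python
import math


def _linear_predictor(row, beta):
    return sum(x * b for x, b in zip(row, beta))


def _mean_response(xb, model):
    if model == "logit":
        return 1.0 / (1.0 + math.exp(-xb))
    if model == "poisson":
        return math.exp(xb)
    raise ValueError("model must be 'logit' or 'poisson'")


def ame_continuous(X, beta, j, model="logit"):
    """Average over observations of dE[y|x]/dx_j (derivative-based AME)."""
    total = 0.0
    for row in X:
        mu = _mean_response(_linear_predictor(row, beta), model)
        # logit: dP/dx_j = beta_j * p * (1 - p); poisson: dmu/dx_j = beta_j * mu
        total += beta[j] * (mu * (1.0 - mu) if model == "logit" else mu)
    return total / len(X)


def ame_discrete(X, beta, j, model="logit"):
    """Average change in E[y|x] when dummy x_j switches from 0 to 1."""
    total = 0.0
    for row in X:
        on, off = list(row), list(row)
        on[j], off[j] = 1.0, 0.0
        total += (_mean_response(_linear_predictor(on, beta), model)
                  - _mean_response(_linear_predictor(off, beta), model))
    return total / len(X)
```

Because AMEs are expressed on the probability (or expected-count) scale, they are directly comparable across logit and Poisson models, unlike raw coefficients.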
These robustness checks were meant to confirm not only the strength of the dual-core of the models (in terms of both significance and magnitude), but also the order of importance of its two components: first colleag_good_atmosph, then dsv_recog4my_work.
These two influences are strong and reliable in all the scenarios we tested (Table 1, a to f). This holds for both outcome derivations from the original scale: full job satisfaction (the peak value of 4 on the original 0-4 scale) and full and partial job satisfaction (the values 3 and 4 on the same scale). The same procedure was applied to the data subsets defined by gender, marital status, high school education and membership in the three Romanian historical regions, as well as to the overall model for the entire dataset. For both full job satisfaction (compl_satisf_with_my_job) and full and partial job satisfaction (satisf_with_my_job), the two core variables (Table 1) rank in this order by the magnitude of their influence on the outcome: a good atmosphere in the workplace first, then receiving the recognition deserved for one's own work. When we analyzed satisf_with_my_job, the results confirmed the robustness of the dual-core (colleag_good_atmosph and dsv_recog4my_work) even more clearly, with the same 1% significance level but better ROC and pseudo R-squared values. Moreover, for male and for married respondents, the influence of dsv_recog4my_work on both forms of the outcome (compl_satisf_with_my_job and satisf_with_my_job) is stronger (Table 1, scenarios a, b, e and f). This means that receiving recognition for work done has a greater impact on job satisfaction for married than for unmarried people, and a much greater one for males than for females.
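The two outcome forms are simple derivations from the recoded 0-4 scale. A Python sketch of the rule described above follows; the function name is ours.

```python
def derive_outcomes(score):
    """Return (compl_satisf_with_my_job, satisf_with_my_job) from a 0-4 score."""
    if score not in (0, 1, 2, 3, 4):
        raise ValueError(f"score must be on the 0-4 scale, got {score!r}")
    compl_satisf_with_my_job = int(score == 4)  # full satisfaction: peak value only
    satisf_with_my_job = int(score >= 3)         # full or partial satisfaction: 3 or 4
    return compl_satisf_with_my_job, satisf_with_my_job
```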
The overall scenario (all 2052 distinct responses, Table 2) also confirms the structure of this powerful and significant dual-core and the order of importance already identified for the regional subsets (MB, TBCM and OMD). The same holds for the six subsets obtained by filtering on different types of respondents (Table 1). When only the dual-core is considered, the accuracy of classification is remarkably high in all ten scenarios in Tables 1 and 2 (excellent or good to excellent, with ROC values above or near the 0.9 threshold). The maximum VIF is well below 10 (less than 6.75), which leaves room for identifying other specific influences. The pseudo R-squared values also indicate good explanatory power, even with only the dual-core in all ten cases.
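As a reminder of what the VIF measures: for predictor j, VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. With only the two dual-core predictors, R_j^2 is their squared correlation, so both VIFs equal 1/(1 - r^2). The Python sketch below applies this textbook formula to made-up data. It is not meant to reproduce the Stata figures above.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation of two equally long numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(a, b):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)
```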
These very promising results encouraged us to explore further and to identify specific patterns for each of the three aforementioned historical regions.
Therefore, another substantial part of the binary logistic regressions served to validate the common and specific influences indicated by the LASSO variable selection procedure for the regional subsets (Table 3, scenarios a to c). For better comparability within and across the regional models built with both binary logistic and Poisson regressions, we did not report the raw coefficients, only the average marginal effects for both.
Source: Own calculations in Stata 16. Note: Standard errors in parentheses. *, **, ***, **** indicate significance at 10%, 5%, 1%, and 1%, respectively.
The results above (Table 3) clearly indicate that, for the respondents from the second region (TBCM), receiving deserved recognition for one's own work counts the most when full job satisfaction is the outcome (almost double in magnitude compared with the other two regions). We expected this in light of our previous research. That research emphasized that people from this region, which largely overlaps geographically with Transylvania under the former Habsburg occupation, seem to have different patterns, at least regarding migration intentions [106][107][108] and attitudes toward immoral issues among young people. The present study adds job satisfaction among those with considerable work experience.
The additional results in Table 3 (the most comprehensive scenarios of our regional models) clearly indicate another common point, namely the level of financial well-being of the family (fam_fin). We ran additional robustness checks for fam_fin together with the dual-core (all three criteria in Table 1) using the binary logistic model. With full and partial job satisfaction (satisf_with_my_job, 3 or 4 on the 0-4 scale) as the outcome, fam_fin failed in all six resulting scenarios. With the other form of the primary outcome, full job satisfaction (compl_satisf_with_my_job, value 4 on the 0-4 scale), which is also used for the regional models in Table 3, fam_fin passed all checks (significant at least at the ** level) when considered together with the dual-core. Moreover, the results in Table 3 highlight some peculiarities of the three regional models:
- for MB, a highly significant negative influence of laziness and a significant positive influence of the total number of jobs;
- for OMD, positive influences of optimism and a perspective with many opportunities, and of the urbanity level (the Romanian capital, Bucharest, belongs to this region);
- for TBCM, positive influences of openness and of a stable career (being retired after the first job, ret_work_aft_1st_job).

In addition, laziness has opposite effects on full job satisfaction in TBCM (positive) and MB (negative). Moreover, TBCM is the only region in which loneliness influences the outcome, with a negative sign.
Zlotnik probability prediction nomograms provided additional visual support for comparability and allowed interpretation directly in probabilistic terms. They also compute automatically the maximum probability threshold for the most advantageous combination of predictor values. We drew imaginary perpendiculars to the first X axis (Score) and summed the scores from the nomogram in Figure 2 (the regional model of Moldova and Bucovina) for the seven identified influences in their most optimistic combination. This gave a total score (total score axis) of approximately 26.2, as the sum of 2.8, 10, 2, 1.7, 3.3, 1.4 and 5. This total corresponds to a high probability of full job satisfaction: more than 99%, as indicated automatically by the maximum percentage threshold on the right of Figure 2. It is derived from a scenario with very good accuracy of classification (area under the ROC curve = 0.9173) and explanatory power (pseudo R-squared = 0.45, Table 3, scenario a).
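Reading a nomogram amounts to adding the points of each predictor and converting the total into a probability. The Python sketch below reproduces the MB addition. The conversion function assumes, as in a typical logistic nomogram, that total points are a linear rescaling of the linear predictor. Its `offset` and `scale` are hypothetical parameters that would have to be derived from the fitted model.

```python
import math


def nomogram_total(points):
    """Add the points read off each predictor's axis (rounded to 2 decimals)."""
    return round(sum(points), 2)


def probability_from_total(total, offset, scale):
    """Map total points to a probability, assuming lp = offset + scale * total
    (offset and scale are hypothetical) followed by the logistic link."""
    return 1.0 / (1.0 + math.exp(-(offset + scale * total)))


mb_points = [2.8, 10, 2, 1.7, 3.3, 1.4, 5]  # MB readings from the text (Figure 2)
print(nomogram_total(mb_points))             # 26.2
```

Since the logistic link is monotonic, the most optimistic combination of predictor values, which yields the largest total, also yields the maximum predicted probability.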

Figure 2.
Full job satisfaction prediction nomogram for a regional model (Table 3, scenario a), corresponding to the first Romanian historical region being analyzed (MB).

Similarly, the maximum resulting probability of full job satisfaction (Figure 3) for the regional model of Transylvania, Banat, Crisana and Maramures was still high. It was well above 95%, corresponding to a total score of approximately 22, the sum of 4.1, 10, 1.2, 2.7, 1.7, 1.2 and 1.1 for its seven influences. Moreover, this probability is derived from a scenario with very good accuracy of classification (ROC = 0.905) and explanatory power (pseudo R-squared = 0.4327, Table 3, scenario b).

Figure 3.
Full job satisfaction prediction nomogram for a regional model (Table 3, scenario b), corresponding to the second Romanian historical region being analyzed (TBCM).

Figure 4.
Full job satisfaction prediction nomogram for a regional model (Table 3, scenario c), corresponding to the third Romanian historical region being analyzed (OMD).
After summing the scores from the nomogram (Figure 4, the regional model of Oltenia, Muntenia and Dobrogea) for the six significant influences, we obtained a total score of approximately 17.5 (the sum of 2.25, 10, 2, 1.25, 1.2 and 0.8). This score corresponds to a high probability of full job satisfaction (more than 95%). It is derived from a scenario with only six influences and very good accuracy of classification (ROC = 0.9157) and explanatory power (pseudo R-squared = 0.4553, Table 3, scenario c).
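The same addition applied to the two other regional readings (values taken from the text) confirms the totals. It also shows that the OMD reading has six addends, matching the six influences of scenario c.

```python
regional_points = {
    "TBCM": [4.1, 10, 1.2, 2.7, 1.7, 1.2, 1.1],  # Figure 3 readings
    "OMD": [2.25, 10, 2, 1.25, 1.2, 0.8],         # Figure 4 readings
}
totals = {region: round(sum(pts), 2) for region, pts in regional_points.items()}
for region, pts in regional_points.items():
    print(f"{region}: {len(pts)} influences, total score {totals[region]}")
```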
We consider that the family's financial well-being could be an interesting predictor of job satisfaction. We believe that a better-off family economic situation may exert a positive influence on career satisfaction through increased psychological comfort, greater confidence in personal abilities and aptitudes, and less fear about career uncertainties and prospects.
All three nomograms (Figures 2-4) and the influences expressed as marginal effects (Tables 1-3) indicate the prevalence of colleag_good_atmosph when full job satisfaction is analyzed. Among the stable influences, receiving the deserved recognition for one's own work (dsv_recog4my_work) came second in most scenarios. This means that the first hypothesis (H1) is only partially validated, because the recognition of work done is one part of the dual-core, but not the most important one. Furthermore, urbanity, which is positively associated with job satisfaction (Figure 4 and Table 3, scenarios c and f), explicitly contradicts the second hypothesis (H2) for the OMD region. In addition, the positive influence of the number of jobs held by the respondent (Figure 2 and Table 3, scenarios a and d) partially confirms the third hypothesis (H3) for the MB region.
For all three historical regions, neither the deviance nor the Pearson goodness-of-fit (GOF) test rejected the null hypothesis that the model fits the data. The p values were above 0.05 (namely between 0.98 and 1) in all the most comprehensive scenarios (Table 3), so there is no evidence of lack of fit. We have not yet performed skewness tests on the variables of these regional datasets. Still, asymmetry was not a concern for these models, since both binary logistic regression [109] and Poisson regression with robust standard errors, as a generalized linear model [110], have been shown to be robust to non-normal data.
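For reference, the observation-level Pearson and deviance statistics for a fitted binary model can be sketched in Python as below. The helper is hypothetical, and the fitted probabilities in the test are made up. Under the null hypothesis that the model fits, these statistics are compared with a chi-squared distribution. Stata's `estat gof` after `logit` first groups observations by covariate pattern; both that grouping and the p-value computation are omitted here.

```python
import math


def pearson_and_deviance(y, p):
    """Observation-level Pearson X^2 and deviance for binary outcomes y (0/1)
    with fitted probabilities p strictly between 0 and 1."""
    pearson, deviance = 0.0, 0.0
    for yi, pi in zip(y, p):
        if not 0.0 < pi < 1.0:
            raise ValueError("fitted probabilities must lie strictly in (0, 1)")
        pearson += (yi - pi) ** 2 / (pi * (1.0 - pi))
        # binary deviance contribution: -2 * log-likelihood of observation i
        deviance += -2.0 * (yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
    return pearson, deviance
```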
In the third round of DM, most of the techniques available in WEKA confirmed both the dual-core variables and the specific influences of the three regional models. Here, we started from the same data source with recoded fields used in Stata. We used the Correlation and ReliefF attribute evaluators with the Ranker search method, the CFS subset evaluator with the Greedy Stepwise search method, and the Wrapper subset evaluator with the Best First search method. Each was run both on the full training set and with cross-validation, in which case we kept the default options (ten folds, seed 1).
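Ten-fold cross-validation with a fixed seed partitions the shuffled records into ten disjoint folds, and each fold is used once for evaluation. The Python sketch below shows the idea. It uses Python's own random generator and no stratification, so its folds will not coincide with the ones WEKA builds internally.

```python
import random


def k_fold_indices(n, k=10, seed=1):
    """Shuffle record indices 0..n-1 with a fixed seed and deal them into k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # same seed -> same folds on every run
    return [indices[f::k] for f in range(k)]


def train_test_splits(n, k=10, seed=1):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    folds = k_fold_indices(n, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, test
```

Fixing the seed makes the folds reproducible across runs, which supports the replication goal stated in this paper.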
We also performed further tests (Tables 4 and 5) on the two most powerful influences identified above (the dual-core). We believe these influences are so strong because they correspond to questions of a holistic nature.
Table 4. Testing the strongest influences (dual-core) for a European model of predicting job satisfaction in terms of average marginal effects, following the same balanced approach of treating missing values (0 to 4 scale, where 2 is an additional distinct level for missing, undecided answers and unwillingness to respond).
Source: Own calculations in Stata 16. Note: Standard errors in parentheses. **** indicates significance at 1%.
Table 5. Testing the strongest influences (dual-core) for a European model of predicting job satisfaction in terms of average marginal effects, following a pessimistic and unbalanced approach of treating missing values (0 to 3 scale, where 0, which indicates strong disagreement, is also used for missing, undecided answers and unwillingness to respond).
For these tests, we considered all respondents from most of the European countries included in Wave 7, as provided by SHARE-ERIC (75,674 European individuals aged 50 and over, out of the entire dataset of 76,520 records). At this level, we were very pleasantly surprised that these two influences still hold. They correspond mainly to the needs for belonging and esteem in Maslow's hierarchy (its third and fourth levels), and we identified them in this paper by triangulation [111] and cross-validation, starting from a much smaller dataset (Romania, more than 2000 records). Under these circumstances, the top two predictors suggested by the three nomograms (Figures 2-4) additionally confirmed the first hypothesis (H1) and the last two (H4 and H5) at this higher level. They also confirmed their previously identified order of importance: first, the atmosphere at the workplace; second, receiving recognition for work done.
All these findings were obtained using the same approach of assigning missing, undecided and unwilling-to-respond answers to an extra middle level (the value 2 on the 0-4 scale) of the resulting scale (Table 4). We analysed both full and partial job satisfaction, and full job satisfaction only. The maximum VIF of 8.25 still leaves room for exploring and adding new specific influences on this dual-core foundation.
At this macro level, we switched to a pessimistic and unbalanced treatment of missing data and unclear responses and recomputed the results for the dual-core (Table 5). We then compared them with those obtained using the balanced approach applied in Table 4 and throughout the paper. Even under this improper treatment (Table 5), the dual-core persists, but the ratio between the magnitudes of its components visibly shifts in favour of the variable corresponding to receiving recognition for work done. Moreover, the ROC and R-squared values increase and, as expected, the maximum VIF values decrease in all scenarios involving the first, the second and both parts of the dual-core. These additional tests (Table 5) highlight the general impact that the pre-processing and treatment of missing data have on the resulting predictive models. They also underline the strength of the dual-core as the common set of the strongest influences on job satisfaction at the European level and for this well-defined target group.
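The two treatments of missing answers compared in Tables 4 and 5 can be sketched as two recoding schemes in Python. The answer labels below, and the placement of the remaining categories (1 and 3 in the balanced scale, 1 to 3 in the pessimistic one), are our assumptions. They are inferred from the table notes rather than taken from the SHARE codebook.

```python
# Assumed answer labels; any other value is treated as missing/undecided/refusal.
BALANCED = {"Strongly disagree": 0, "Disagree": 1, "Agree": 3, "Strongly agree": 4}
PESSIMISTIC = {"Strongly disagree": 0, "Disagree": 1, "Agree": 2, "Strongly agree": 3}


def recode(answer, scheme="balanced"):
    """Recode a Likert answer under one of the two missing-value treatments."""
    if scheme == "balanced":
        # 0-4 scale: missing, undecided or refused answers get the extra middle level 2
        return BALANCED.get(answer, 2)
    if scheme == "pessimistic":
        # 0-3 scale: missing, undecided or refused answers are merged with 0 (strongly disagree)
        return PESSIMISTIC.get(answer, 0)
    raise ValueError("scheme must be 'balanced' or 'pessimistic'")
```

The pessimistic scheme merges genuine strong disagreement with non-response. This is why it distorts magnitudes and collinearity in the way Table 5 shows.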
In our future research, we intend to explore the peculiarities of other European countries using the same approaches, methods and tools described in this paper. We consider the datasets provided by SHARE-ERIC very promising for exploring many other patterns and for identifying reliable predictors of outcomes beyond job and life satisfaction.

Conclusions
Several principles allowed us to better explore this complex phenomenon. These were the careful treatment of data; support for replication of the results through transparent approaches and methods; the triangulation principle, applied through the many approaches, techniques and tools used; and the golden rule of cross-validation, as a reliable way of testing on data divided into subsets both randomly and non-randomly. We also extended the conclusions regarding the two most powerful predictors (which form the dual-core of the Romanian regional models) to a higher level, namely to individuals aged 50 and over from most European countries. In addition, in this large-scale example, we have shown that different ways of treating missing data may lead to major differences in the magnitude of influences, collinearity, accuracy of classification and explanatory power of the resulting models.
We found that a good atmosphere in the workplace, owing to healthy relationships with colleagues, together with receiving the recognition deserved for work done, are the most important predictors of job satisfaction for Romanians aged 50 and over. These findings hold for both full job satisfaction and full or partial job satisfaction. We also identified some peculiarities of the three Romanian historical regions analyzed and found that, in all cases, the atmosphere in the workplace seems decisive and counts more than the deserved recognition for the work done there. Moreover, this type of recognition has a more powerful influence on job satisfaction for individuals who graduated at least high school, for married persons and for male respondents.

Appendix A
Table A1. Questionnaire items mostly selected after the first round of mining, then renamed and processed (the extended set).

Appendix B
Figure A1. Example of a SQL DMX query to explore a naive Bayes based DM model in the first round mining.