Child Safety Assessment: Do Instrument-Based Decisions Concur with Decisions of Expert Panels?

Abstract: To make decisions on children's immediate safety, child welfare agencies have been using safety assessment instruments for decades. However, very little research on the quality of these instruments has been conducted. This study is the first to examine the concurrent validity of a child safety assessment instrument by comparing its outcomes with a different measure of immediate child safety. We examined to what extent decisions of practitioners using a safety assessment instrument concur with decisions of child maltreatment expert panels. A total of 26 experts on immediate child safety participated in 7 expert panels, in which the safety of the children described in 24 vignettes was discussed. Additionally, 74 practitioners rated the same vignettes using the ARIJ safety assessment instrument. The instrument-based safety decisions of practitioners concurred with the safety decisions reached by the expert panels for a small majority of the vignettes (58% agreement). Expert panels often identified more types of immediate safety threats than practitioners using the instrument; however, the practitioners using the instrument more often deemed the child to be in immediate danger than the expert panels did. These findings indicate how the instrument can be improved and give insight into how immediate safety decisions are made.


Introduction
Child welfare professionals frequently make crucial decisions on the safety of children in the families they supervise. For example, a professional needs to determine whether or not a child needs to be protected immediately, and if so, how the child can be protected. If a child is in immediate danger, it can be safeguarded in different ways, for example, by an in-home safety intervention, an out-of-home safety intervention, or placement in residential care. To make these safety decisions, child welfare agencies have been using safety assessment instruments for over three decades (DePanfilis and Scannapieco 1994). However, very little research on the quality of these instruments has been conducted, and most instruments are based on practice only (Vial et al. 2020).
In an attempt to fill the gap in research on child safety assessment instruments and to develop an instrument that is evidence-based as well as practice-based, we extensively examined the quality of a widely used Dutch safety assessment instrument (the ARIJ safety assessment instrument; Actuarial Risk Assessment Instrument Youth Protection; Van der Put et al. 2016). Studies on the reliability, content validity, and usability of the ARIJ safety assessment instrument have already been conducted (Vial et al. 2019a, 2019b). Complementary to these studies, the current study is the first to thoroughly examine the concurrent validity of a safety assessment instrument by comparing its outcomes with a different measure of immediate child safety.
Child welfare decision-making tools often comprise a safety assessment instrument and a risk assessment instrument. Safety assessment instruments help professionals to determine the child's immediate safety. In other words, these instruments help professionals to determine whether a child has recently been harmed, is being harmed right now, or may be harmed in the immediate future (Hughes and Rycus 2006; Knoke and Trocme 2005). Immediate is often defined as within 24 to 72 h of the assessment (Vial et al. 2020). If a child is deemed to be in immediate danger, immediate measures need to be taken to safeguard the child. Risk assessment instruments help professionals to assess the risk of future child maltreatment, so that children and families with a substantial risk of child maltreatment can be identified and this risk can be lowered by offering the caregivers treatment for the identified risk factors. These two assessment types are often mixed up and sometimes used interchangeably (Hughes and Rycus 2006). However, distinguishing safety assessment from risk assessment is important, since they serve different purposes that require different approaches.
In a recent literature review, the immediate safety aspects measured in internationally used safety assessment instruments were compared (Vial et al. 2020). This review revealed several immediate safety threats that are generally measured with these instruments, such as sexual abuse, neglect, physical abuse, domestic violence, refusing access to the child by caregivers, a caregiver's substance abuse, and behaving toward the child in a predominantly negative way. These aspects are measured with the majority of the instruments, which supports their content validity. However, the quality of most of the included instruments has not been studied and should be examined first before we can draw inferences on the validity of these immediate safety threats.
Several studies on safety assessment instruments have been conducted. A focus group study examined the usability of a South African safety assessment instrument and reported positive first experiences of practitioners working with this particular instrument (Spies et al. 2015). The participants indicated that the instrument supported their decision making, gave direction to the substantiation of their child welfare decisions, empowered them as professionals, and enhanced their report writing. Another qualitative study examined the usability and content validity of the ARIJ safety assessment instrument (Vial et al. 2019a). Professionals generally considered the instrument to be useful, but they also provided recommendations for improvement. For instance, the wording of the (potential) outcomes of the instrument could be clarified. The professionals also indicated that several immediate safety threats were missing from the instrument and specifically mentioned emotional abuse, harm to the child inflicted by individuals from whom caregivers are unable or unwilling to protect the child, a caregiver's psychiatric disorder that poses an immediate threat to the child, and a child's psychiatric problems that pose an immediate threat to themselves. It was concluded that the content validity of the safety assessment instrument could be improved by adding these immediate safety threats to the instrument.
Three other studies have focused on the reliability of different safety assessment instruments. A Dutch safety assessment instrument (LIRIK) showed low to fair interrater reliability for the individual items and moderate interrater reliability for the overall safety outcome (Bartelink et al. 2017). Additionally, Orsi et al. (2014) studied the interrater reliability of the items of multiple American safety assessment instruments, which varied widely, from low to substantial. Further, the reliability of the ARIJ safety assessment instrument has been studied and was found to be moderate to high (Vial et al. 2019b).
Other studies have focused on the criterion validity of safety assessment instruments, in particular their predictive validity (Bartelink et al. 2017; Wells 1998, 2003; Fuller et al. 2001; Wells and Correia 2012) and concurrent validity (Baird and Rycus 2004; Johnson 2004). However, these studies did not provide the information that is needed to draw conclusions on the quality of these instruments, as safety assessments were compared to measures of child safety in the future, such as child maltreatment recurrence, re-entry into out-of-home care, and risk assessments. Although these studies gave some indication that safety assessment outcomes predicted (future) child safety, they did not provide information on how well these instruments assessed immediate child safety.
Safety assessment instruments assess immediate child safety and should therefore be compared with other measures of a child's immediate safety. Because no thoroughly studied safety assessment instrument is available to serve as such a comparison measure, we studied the concurrent validity of the ARIJ safety assessment instrument by comparing its outcomes with safety assessment outcomes produced by expert panels. In such panels, experts are presented with a vignette describing a child safety situation and are asked to reach consensus on the immediate safety of the child described in the vignette.
Three reasons can be put forward as to why safety assessments performed by expert panels can be an appropriate measure with which to compare ARIJ safety assessments. First, individual professionals are often advised to make decisions on a child's safety in collaboration with a colleague, supervisor, or their (multidisciplinary) team rather than on their own. It is therefore not uncommon to discuss a child's immediate safety with other professionals, which resembles experts reaching consensus in a panel. Second, the experts in these panels were expected to thoroughly discuss each vignette, which should result in a comprehensive argument as to why the child is considered to be safe or in immediate danger. All experts in a panel have to agree on the final decision, which encourages discussion between the experts. Third, researchers in different fields also use group decision methods to reach better decisions (e.g., Grofman et al. 1983; Schulz-Hardt et al. 2006).
The current study is an important contribution to research on safety assessment instruments, as it is the first to study the concurrent validity of a safety assessment instrument by comparing it with another measure of immediate child safety. Additionally, it provides information on the quality of the ARIJ safety assessment instrument and the decisions made with this instrument, which is essential given the great impact these decisions have on the lives of children. In studying the concurrent validity, we examined not only the validity of the immediate safety outcomes, but also the validity of the individual immediate safety threats that are measured with the ARIJ safety assessment instrument. Thus, the aim of this study was to examine the extent to which decisions on immediate child safety made by individual practitioners using the ARIJ safety assessment instrument concur with those made by child maltreatment expert panels, which do not use an instrument. First, we compared the final safety decisions reached by practitioners using the ARIJ safety assessment instrument with the safety decisions reached by the expert panels. Second, the immediate safety threats identified by the practitioners using the ARIJ were compared with the immediate safety threats identified by the expert panels. As the expert panels did not identify the immediate safety threats in a structured manner, we used qualitative analyses to determine which immediate safety threats were identified by the expert panels.

Method

Participants
Twenty-six experts on immediate child safety (21 women, 5 men; mean age = 41 years, SD = 10) participated in seven expert panels. They were (child) psychologists or (child) social workers who worked at different agencies that provide child protection services, child and family support services, hotline (i.e., crisis) services, and community outreach services. On average, they conducted 7.6 child safety assessments each week (SD = 8.4, range: 0-35) and had 16 years of experience in youth services (SD = 9.4, range: 1.5-40).
Additionally, a total of 74 practitioners rated the vignettes using the ARIJ safety assessment instrument. These practitioners worked at a child and family support agency or a child protection agency. A description of these participants as well as more information on the ARIJ safety assessments can be found in Vial et al. (2019b).

Procedure
We used expert sampling, a purposive sampling method, to recruit participants for the expert panels (Etikan et al. 2016). Participants were recruited by contacting both child welfare services and professionals in the social network of the authors of this study (for example, through social media). Professionals could only participate if assessing immediate child safety was an important aspect of their daily work, either because they conduct safety assessments themselves or because they supervise others conducting safety assessments. Our goal was to include four participants in each of six expert panels and to recruit professionals who work at child protection services, child and family support services, hotline/crisis services, and community outreach services, so that each panel would provide an interdisciplinary assessment. Each panel assessed four different vignettes. During the study, a few experts cancelled their participation in the panel meetings; therefore, one panel was split into two panels of two experts each, which held separate meetings.
In total, 26 experts on immediate child safety participated in the expert panels. Of the 24 vignettes, 20 vignettes were assessed by 1 expert panel and 4 vignettes were assessed by 2 expert panels (the split panel). All the experts were asked to assess 4 vignettes individually, and to return their assessments before the expert panels were formed. Of the experts, 73% indicated that they normally use an instrument or a structured method to assess a child's immediate safety. However, they indicated that they did not actively apply this method to assess the vignettes in this study.
Each panel had one meeting. In these meetings, the vignettes were discussed one by one, and after the discussion of each vignette, the panel had to decide on the immediate safety of the child. Each panel meeting lasted no longer than 1.5 h and took place in a meeting room at the university where this study was conducted. All meetings were led by the first author of this manuscript. Audio recordings were made with the experts' informed consent, and the experts received reimbursement as compensation for the time they spent participating in this study.

The ARIJ Safety Assessment Instrument
The ARIJ safety assessment instrument was developed to help professionals determine whether a child is in immediate danger (Van der Put et al. 2016). The instrument consists of eight items, each describing a different immediate safety threat. A short description of the items of the ARIJ safety assessment instrument can be found in Appendix A. When an immediate safety threat is considered to be present, the child needs to be safeguarded immediately to prevent harm. Each item can be answered with one of three response categories: "Yes" (the threat described in the item is present), "No" (the threat described in the item is not present), and "Unknown" (there is insufficient information available at the time of the assessment for a proper response). When at least one of the items is answered "Yes", the instrument concludes that the child is in immediate danger. For the purpose of this study, the response categories "No" and "Unknown" were combined into a single category. In practice, professionals are often required to make a decision on a child's immediate safety, and if a professional decides that it is unknown whether a safety threat is present, no immediate safety measures will be taken at that time. Therefore, the safety conclusion of the instrument was "Safe" whenever no immediate safety threat was deemed to be present in a vignette, that is, when all eight items were answered "No" or "Unknown".
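The decision rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the official ARIJ software: the eight items are represented only by their position (the item wording in Appendix A is not reproduced), and the names `Response` and `safety_decision` are hypothetical.

```python
from enum import Enum


class Response(Enum):
    """Possible answers to a single ARIJ safety item (hypothetical labels)."""
    YES = "Yes"
    NO = "No"
    UNKNOWN = "Unknown"


NUMBER_OF_ITEMS = 8  # the ARIJ safety assessment instrument has eight items


def safety_decision(responses: list[Response]) -> str:
    """Apply the decision rule used in this study (hypothetical helper).

    "No" and "Unknown" are treated as one category, so the child is
    judged to be in immediate danger only if at least one item is "Yes".
    """
    if len(responses) != NUMBER_OF_ITEMS:
        raise ValueError(
            f"expected {NUMBER_OF_ITEMS} item responses, got {len(responses)}"
        )
    if any(response is Response.YES for response in responses):
        return "Immediate danger"
    return "Safe"


if __name__ == "__main__":
    # Hypothetical assessment: seven items "No", one item "Unknown".
    print(safety_decision([Response.NO] * 7 + [Response.UNKNOWN]))  # Safe
    # Hypothetical assessment: a single "Yes" is enough for immediate danger.
    print(safety_decision([Response.YES] + [Response.NO] * 7))  # Immediate danger
```

Because "No" and "Unknown" are collapsed, an assessment with only "Unknown" answers also yields "Safe", mirroring the practical reasoning that no immediate measures are taken when a threat cannot be established.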
Research has shown that the items and outcome of the ARIJ safety assessment instrument have a moderate to high interrater and intrarater reliability (Vial et al. 2019b).
The ARIJ safety assessment instrument is used for families and children that are already assigned to an agency, and both the ARIJ safety and risk assessments are performed as a part of the intake process. The ARIJ is not used as a gatekeeping assessment by agencies. The professionals who perform the assessment will also discuss the safety measures with the family, develop a safety plan, and perform further assessments to determine risk and family needs.

The Individual Expert Questionnaire
The questionnaire filled out by the experts in the panels started with a short explanation of the study procedures, followed by eight questions on characteristics of the expert, such as their work experience. Next, a definition of immediate child safety was given, followed by four vignettes. For each vignette, experts were asked whether they thought the child described in the vignette was safe or in immediate danger, and to provide an explanation for their decision.

The Expert Panel
In the expert panels, the vignettes were discussed one by one. The experts had to agree on their final group decision on the child's immediate safety.

Vignettes
A total of 24 vignettes were assessed, of which half were based on real cases of child and family support services (Vignettes 1-12), and the other half were based on real cases of child protection services (Vignettes 13-24). A fictional English vignette, which is similar to the vignettes used in this study, can be found in Appendix B. All the Dutch vignettes are available upon request. The child and family support vignettes had been created and used in a previous study by Bartelink et al. (2017). These vignettes described a variety of family compositions, social backgrounds, cultural backgrounds, child maltreatment forms (physical, sexual, and emotional abuse, and neglect), and maltreatment severity levels. More information on these vignettes can be found in Bartelink et al. (2017). The child protection services vignettes were also created for a previous study (Vial et al. 2019b) and describe a variety of immediate safety problems in families. These vignettes were reviewed by practitioners of the child protection agency to ensure that they were representative of cases in their daily practice. Since the child protection agency usually handles more cases of children in immediate danger than the child and family support services, the vignettes designed for the former have a higher prevalence of possible immediate safety threats. An example of a vignette similar to the vignettes that were used in this study, as well as more information on these vignettes, can be found in Vial et al. (2019b).

Data Analyses
First, we compared the safety decisions reached by practitioners using the ARIJ safety assessment instrument with the safety decisions reached by the expert panels by calculating the percentage agreement. All three measures of immediate safety were compared with each other (i.e., ARIJ assessments vs. expert panels; ARIJ assessments vs. individual expert assessments; individual expert assessments vs. expert panels). The calculated percentages show how often two measures reached the same safety decision. For the ARIJ assessments and the individual expert assessments, we used the decision reached by the majority of the ARIJ or individual expert assessments of a vignette. For example, if the child was deemed to be safe in 80% of the ARIJ assessments, the overall ARIJ safety decision for that vignette was set at "safe"; if the assessments were evenly split, the decision was considered inconclusive.
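The majority rule and the percentage agreement described above can be sketched as follows. This is an illustrative Python sketch, not the authors' analysis code; the function names are hypothetical, and two details are assumptions: an even split of assessments is treated as "inconclusive" (as such vignettes are reported in the results), and a vignette only counts as an agreement when both measures reach the same conclusive decision ("safe" or "danger").

```python
from collections import Counter


def majority_decision(decisions: list[str]) -> str:
    """Collapse several assessments of one vignette into one decision.

    Returns "safe" or "danger" for a strict majority; an even split
    is returned as "inconclusive" (assumed tie rule).
    """
    counts = Counter(decisions)
    if counts["safe"] > counts["danger"]:
        return "safe"
    if counts["danger"] > counts["safe"]:
        return "danger"
    return "inconclusive"


def percent_agreement(measure_a: list[str], measure_b: list[str]) -> float:
    """Percentage of vignettes on which two measures reach the same conclusive decision."""
    if len(measure_a) != len(measure_b):
        raise ValueError("both measures must cover the same vignettes")
    if not measure_a:
        raise ValueError("at least one vignette is required")
    agreements = sum(
        a == b and a != "inconclusive" for a, b in zip(measure_a, measure_b)
    )
    return 100 * agreements / len(measure_a)


if __name__ == "__main__":
    # Invented ratings for three hypothetical vignettes (not study data).
    arij_ratings = [
        ["danger", "danger", "safe"],          # majority: danger
        ["safe", "safe", "safe", "danger"],    # majority: safe
        ["danger", "safe"],                    # even split: inconclusive
    ]
    panel = ["danger", "danger", "safe"]
    arij = [majority_decision(ratings) for ratings in arij_ratings]
    print(arij)  # ['danger', 'safe', 'inconclusive']
    print(round(percent_agreement(arij, panel), 1))  # 33.3
```

Whether two inconclusive outcomes should count as agreement is a design choice; the sketch excludes them so that agreement means both measures reached the same safe-or-danger conclusion.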
Second, the immediate safety threats identified by the practitioners using the ARIJ were compared with the immediate safety threats identified by the expert panels. For the ARIJ assessments, the prevalence of the response category "Yes" for each item showed which immediate safety threats were identified as present in a vignette. For the expert panels, the transcripts of the panel discussions were analyzed qualitatively to determine which immediate safety threats the panels identified as present in each vignette. The transcripts of all vignettes were coded by two research assistants who had been carefully instructed to identify the immediate safety threats (i.e., the reasons the experts decided that the child was in immediate danger). Next, the first author of this manuscript merged the codes made by the assistants, identifying the immediate safety threats for each vignette. The same procedure was followed to identify the panels' reasons for deeming a child to be safe.
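Counting the prevalence of "Yes" per item for one vignette, as described above, is a simple tally. The sketch below assumes that each practitioner's ARIJ assessment is stored as a mapping from item label to answer; the function name is hypothetical, and the item labels and answers in the example are invented for illustration and are not study data.

```python
from collections import Counter


def yes_prevalence(assessments: list[dict[str, str]]) -> dict[str, float]:
    """Percentage of assessments answering "Yes" to each item for one vignette."""
    if not assessments:
        raise ValueError("at least one assessment is required")
    # Count, per item, how many assessments answered "Yes".
    yes_counts = Counter(
        item
        for assessment in assessments
        for item, answer in assessment.items()
        if answer == "Yes"
    )
    items = sorted({item for assessment in assessments for item in assessment})
    return {item: 100 * yes_counts[item] / len(assessments) for item in items}


if __name__ == "__main__":
    # Invented answers from four practitioners to two items (not study data).
    ratings = [
        {"Domestic violence": "Yes", "Physical abuse": "No"},
        {"Domestic violence": "Yes", "Physical abuse": "Yes"},
        {"Domestic violence": "Yes", "Physical abuse": "Unknown"},
        {"Domestic violence": "No", "Physical abuse": "No"},
    ]
    print(yes_prevalence(ratings))
    # {'Domestic violence': 75.0, 'Physical abuse': 25.0}
```

Only "Yes" counts toward prevalence, so "Unknown" answers lower an item's percentage exactly as "No" answers do, consistent with the combined category used in this study.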
Last, the immediate safety threats identified in the individual expert assessments were compared with the threats identified in the other two immediate safety measures. To do this, all questionnaires were coded by two research assistants, and the codes were subsequently merged by the first author of this manuscript to identify the immediate safety threats mentioned for each vignette. The software program ATLAS.ti version 8 (ATLAS.ti Scientific Software Development GmbH, Berlin, Germany) was used for all qualitative analyses.

Results
Table 1 shows the prevalence of the safety decisions for (1) the ARIJ assessments, (2) the individual expert assessments, and (3) the expert panel assessments. The children described in the vignettes were more often judged to be in immediate danger in the ARIJ assessments (69%) than in the expert panel assessments (52%) and the individual expert assessments (56%). For 58% of the vignettes (n = 14), the majority of the ARIJ safety decisions concurred with the safety decision of the expert panels. Both assessment types concluded that the child was in immediate danger for 10 vignettes and that the child was safe for 4 vignettes.
Table 1. Percentage of assessments that judged the child to be in immediate danger or safe, for the ARIJ assessments, individual expert assessments, and expert panel assessments. SD = standard deviation. ¹For practical reasons, this expert panel was split into two different meetings with other experts.

Comparison of the Safety Decisions
In 29% of the vignettes (n = 7), the ARIJ safety decisions differed from the safety decision of the expert panels. For six of these vignettes, the majority of the ARIJ safety assessments judged the child to be in immediate danger, whereas the expert panels deemed the child to be safe; for one vignette, the expert panel deemed the child to be in immediate danger, whereas the majority of the ARIJ assessments deemed the child to be safe. In 13% of the vignettes (n = 3), either the ARIJ safety assessments (n = 2) or the expert panels (n = 1) were inconclusive on the child's immediate safety (for practical reasons, one of the expert panels was split into two meetings, and for one vignette, the safety decision differed between these two meetings).
For 83% of the vignettes (n = 20), the majority of the individual expert decisions concurred with the final expert panel's decision. For the remaining 17% of the vignettes (n = 4), the individual expert decisions were inconclusive, because half of the individual experts judged the child to be safe and the other half judged the child to be in immediate danger.
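All percentages in this section are shares of the 24 vignettes. The short sketch below recomputes them from the vignette counts stated in the text, as an arithmetic check. It rounds halves up with integer arithmetic, because Python's built-in `round()` rounds halves to even and would report 3/24 (12.5%) as 12% rather than 13%. The helper name `percent_half_up` is hypothetical.

```python
TOTAL_VIGNETTES = 24


def percent_half_up(count: int, total: int) -> int:
    """Whole-number percentage of count/total, rounding .5 upward.

    Integer arithmetic avoids both floating-point error and the
    round-half-to-even behaviour of Python's built-in round().
    """
    return (200 * count + total) // (2 * total)


# Vignette counts as reported in the text.
arij_vs_panel = {"agree": 14, "differ": 7, "inconclusive": 3}
experts_vs_panel = {"agree": 20, "inconclusive": 4}

for label, counts in (("ARIJ vs. panel", arij_vs_panel),
                      ("Individual experts vs. panel", experts_vs_panel)):
    # Every vignette falls into exactly one outcome category.
    assert sum(counts.values()) == TOTAL_VIGNETTES
    for outcome, n in counts.items():
        print(f"{label}, {outcome}: {n}/{TOTAL_VIGNETTES} = "
              f"{percent_half_up(n, TOTAL_VIGNETTES)}%")

# Differing plus inconclusive vignettes, i.e. those without concurrence.
no_concurrence = arij_vs_panel["differ"] + arij_vs_panel["inconclusive"]
print(f"No concurrence: {percent_half_up(no_concurrence, TOTAL_VIGNETTES)}%")  # 42%
```

The printed values are 58%, 29%, and 13% for the ARIJ comparison and 83% and 17% for the individual experts; the differing and inconclusive vignettes together (7 + 3 = 10) account for 42%.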

Comparison of Immediate Safety Threats
Appendix A presents the identified immediate safety threats and the reasons a child was identified as safe, separately for the expert panel, individual experts, and ARIJ assessments. If the child was deemed to be in immediate danger by the expert panel, the immediate safety threats identified by the experts are presented. If the child was deemed safe by the expert panel, the experts' explanations of why the child was deemed to be safe are presented.
First, we looked into the immediate safety threats identified for the vignettes in which both the expert panel and the majority of the ARIJ safety assessments identified the child as being in immediate danger (Vignettes 1, 4, 6, 12, 13, 14, 15, 17, 19, and 21). In these vignettes, the safety threats identified in the ARIJ assessments and by the expert panels were similar. However, for most vignettes, the expert panels described more types of safety threats than the ARIJ assessments. These additional threats were often related to the child's behavior, the child's vulnerability, mental health problems of the caregivers, the availability of the caregivers, and other family members (e.g., a brother).
Only in one vignette (Vignette 18) did the expert panel identify the child as being in immediate danger, whereas the majority of the ARIJ safety assessment decisions indicated that the child was safe. The expert panel mainly identified safety threats related to the child: "Child makes and shares her own nude pictures", "Child runs away multiple nights at a time", "Child does not want help", "Child has contact with multiple men/boys", "Child has money and expensive clothes/objects", "Child uses substances", and "Parents are not able to protect her". The safety threats identified by a minority of the ARIJ safety assessments were "Physical abuse" (14%), "Sexual abuse" (14%), and "Parental availability" (14%).
In six vignettes (Vignettes 3, 9, 16, 20, 22, and 23), the majority of the ARIJ assessment decisions indicated that the child was in immediate danger, whereas the expert panels decided that the child was safe. For these vignettes, we briefly describe the experts' explanations of why they considered the children to be safe.
For Vignette 3, no single safety threat was identified by a majority of the ARIJ safety assessments; the most prevalent immediate safety threat in this vignette was "Parental availability" (46%). The expert panel mostly argued that the child was not in immediate danger due to factors related to the child's father: "Father wants to learn and seems able to learn", "Father asks for help", and "Father knows that change is necessary". Additionally, they explained that the child's grandfather was able to help the family, and that the child goes to school and a sports club.
For Vignette 9, the majority of the ARIJ safety assessments described "Domestic violence" (73%) as an immediate safety threat, whereas the expert panel reasoned that "Parents seem to manage fairly", and that "The child danger is chronic but not immediate".
Notably, four immediate safety threats were identified by the majority of the ARIJ assessments for Vignette 16: "Child abduction and honor-related violence" (100%), "Domestic violence" (100%), "Physical abuse" (67%), and "Parental availability" (67%). However, the expert panel described the child as being safe. Most of the expert panel's reasons for considering the child safe were related to the mother (e.g., "Mother can reflect on her own behavior" and "Mother recognizes her shortcomings, which caused danger to her child"). Additionally, they described the current living situation as protective: "Mother and child currently stay in a safety house".
The ARIJ assessments identified "Physical abuse" (89%) as an immediate safety threat for Vignette 20. In contrast, the expert panel reasoned that the child was not in immediate danger, because "The child has no injuries", "The child is 16 years old", and "The incident was not recent".
For Vignette 22, "Psychiatric problems" (75%) was identified as an immediate safety threat by the majority of the ARIJ assessments. The expert panel argued that the child was not in immediate danger because, "The problems are chronic, and not immediate", "The parents recognize the brother's disorder (which is harmful to the child)", "Their social network is involved", and "The parents want help for their own problems".
Half of the ARIJ assessments identified "Parental availability" (50%) as an immediate safety threat in Vignette 23. However, the expert panel considered the child to be safe, because "There is a social network available", "The unofficial foster parent indicated that the child is doing fine at her place", "Child still has a place to live", "Father is involved with the child", and because of "The child's age".
Finally, another interesting vignette is Vignette 24, as half of the ARIJ safety assessments indicated that the child was in immediate danger, whereas the other half of the ARIJ assessments identified the child as safe. The most prevalent identified immediate safety threat in the ARIJ assessments was "Parental availability" (40%), followed by "Physical abuse" (30%). The expert panel decided that the child was in immediate danger and identified the following immediate safety threats: "The child's grandfather hits mother and child", "Grandfather is unpredictable", "Grandfather has Alzheimer's disease", "The child has behavioral problems", "Child's behavioral problems increase the chance that grandfather hits him", and "Child assaulted someone".

Discussion
The safety decisions reached by practitioners with the ARIJ safety assessment instrument concur with the safety decisions reached by the expert panels for a small majority of the vignettes (58%). For the remaining 42%, the ARIJ decisions either differed from the expert panel decisions (29%) or one of the two measures was inconclusive (13%). When the decisions differed, the ARIJ safety assessments mostly deemed the child to be in immediate danger, whereas the expert panels deemed the child to be safe (six of seven vignettes). The immediate safety threats identified by the two assessment types were often comparable. However, the expert panels often identified more types of immediate safety threats than the practitioners using the assessment instrument. In general, the panels added threats related to the child's behavior, the child's vulnerability, other family members (e.g., a brother), and mental health problems of the caregivers to the threats covered by the ARIJ. These additional safety aspects are also measured in most internationally used safety assessment instruments (Vial et al. 2020). Moreover, a previous study on the content validity of the ARIJ safety assessment instrument indicated that these threats should be included in the instrument (Vial et al. 2019a). As these immediate safety threats are not measured in the ARIJ safety assessment instrument, it is important to improve the instrument by adding them.
Interestingly, the expert panels also mentioned immediate safety aspects that could often be classified as risk factors. Safety and risk assessment instruments often assess factors that describe very similar problematic behaviors of caregivers, but these factors need to be assessed differently in the two assessment types. This applies, for instance, to substance abuse by caregivers. In a risk assessment, this factor should be assessed as present if a caregiver uses substances problematically. In a safety assessment, however, it should only be assessed as present if the caregiver's substance abuse poses an immediate safety threat to the child. The experts sometimes mentioned factors without explaining how they posed an immediate threat to the child. For example, for Vignette 1, the experts mentioned two factors as safety threats ("Mother suffered from child maltreatment as a child" and "Mother's boyfriend went to prison"), although they seemed to use these factors as indicators of the severity of the problems in the family rather than as safety threats.
It is also noticeable that the experts weighed child characteristics, such as the child's age or how well the child was functioning, in their assessments of the child's immediate safety. In some cases (e.g., Vignette 23), the experts reasoned that a child was not in immediate danger because the child was relatively old (e.g., 16 years old) or seemed to function normally. This type of reasoning can be problematic, as studies on incident reports in the Netherlands and the United Kingdom have shown that practitioners tend to underestimate immediate safety threats if the child does not have any (behavioral) problems or does not show any signs of abuse (Trench and Griffiths 2014; Health and Youth Care Inspectorate 2016). Further, the experts often mentioned aspects related to the capacities of caregivers as reasons why the child was not in immediate danger; caregivers' willingness to change their behavior was mentioned particularly often (e.g., Vignette 16). This may also be problematic, as risk assessment research has shown that risk factors have a larger impact on child outcomes than protective factors (Luthar and Goldstein 2004; Miller et al. 1999; Shaw 2008a, 2008b; Van der Put et al. 2016). Protective capacities of caregivers may not always be able to mitigate immediate safety threats. Thus, even though aspects related to the capacities of the child and caregivers are often measured with safety assessment instruments (Vial et al. 2020), it is debatable whether these aspects should be assessed in this manner in safety assessments. Future research should specifically examine the impact of child characteristics and caregiver capacities on the quality of safety assessments.
An underlying assumption of this study is that a group decision is better than an individual decision, an assumption that is open to criticism. In this study, the final decision of the expert panel was very often the same as the decision of the majority of the individual experts. However, some panels included experts who held strong opinions, which had a large impact on the panel's final decision. In Vignette 20, for example, three experts decided in their individual assessments that the child was in immediate danger. However, the expert panel's final decision was that the child was safe, which was in line with the individual decision of only one expert. Notably, the experts who worked at the domestic violence and child maltreatment hotline crisis services had a particularly large impact on the final decisions of the expert panels. In the discussion of some vignettes, the other experts even seemed to avoid debate, as the crisis services experts were seen as an authority on the subject, even though all panel members deal with the safety of children on a daily basis. As a result, not all vignettes were discussed as extensively as would have been desirable. The perception of crisis services professionals as an authority could also negatively influence decision making in practice, as their authority could undermine the views of other professionals working on a case. This is especially problematic because the crisis services also provide consultation to anyone who is worried about a child. It is important to note that the experts in the panels worked at different agencies, which do not use exactly the same definitions of immediate child safety. This was most apparent for the experts working at the crisis services. There, the time that has passed since the last incident has a large impact on decisions, as this period is also an important element of the assessment instrument that experts at the crisis services normally use.
Additionally, the crisis services instrument distinguishes between immediate safety problems and chronic safety problems, a distinction that became apparent in the explanations of the experts working at the crisis services (e.g., "Child danger is chronic but not immediate"). By contrast, the time that had passed since the last incident and the chronicity of the safety problems were much less relevant for experts working at other agencies. Not every panel included an expert working at the crisis services, which may have caused differences between the final panel decisions. In future research, it would be interesting to use more homogeneous expert panels and to compare how professionals with different backgrounds assess child safety.
It is important to mention several limitations of this study. As this is a vignette study, the professionals did not need to act on the decisions they reached. In practice, stating that a child is in immediate danger means that the professional must take immediate action to safeguard the child. Given the large impact this decision has on a child, a professional could in reality be more reluctant to state that a child is in immediate danger. This effect is supported by the finding that the children described in the vignettes used in this study were often deemed to be in immediate danger, whereas in practice these same children were less often deemed to be in immediate danger, as we varied the severity of the cases. Additionally, for many vignettes, the practitioners identified multiple immediate safety threats, whereas in practice it is rare for multiple immediate safety threats to be deemed present in a single family. The practitioners who assessed the vignettes in this study may not have taken into account that, in reality, a child needs to be safeguarded immediately whenever a safety threat is assessed as present, even though this was described in the questionnaire.
Another limitation of a vignette study is its rather low ecological validity. For example, participants could not obtain more information when they felt they needed it to decide on the child's safety. Future research should try to study the concurrent validity of an instrument for cases that are actually being handled in practice. Additionally, future research should use as its measure of immediate safety an extensive investigation of the child's immediate safety by a multidisciplinary team of experts (e.g., a pediatrician, a psychologist, and a social worker), drawing on multiple sources of information on the child and its living environment. For this type of research, ethical limitations should be taken into account, as a comprehensive investigation would be needed not only for children who are in immediate danger but also for children who are not in danger.
A final limitation is that the majority of the experts use an instrument or a structured method on a daily basis to assess children's immediate safety. Even though the experts did not use these familiar instruments in the current study, their conclusions could have been influenced by them. One expert working at the hotline services even explicitly listed all criteria described in the instrument that this expert was very familiar with. In further research, it would be interesting to compare the outcomes of these different safety assessment methods for the same cases, even though these methods have not yet been validated.
Despite these limitations, this study gives important indications of how the ARIJ safety assessment instrument needs to be improved. Some of the immediate safety threats identified by the experts should be added to the instrument, namely threats caused by the child's behavior and threats caused by other family members (e.g., a brother). Adding these threats to the instrument will most likely improve its validity and help to prevent cases in which professionals overlook these immediate safety threats in their assessments.
This study also shows that there is still much room for improvement in the assessments performed in practice. For instance, professionals could be better informed about how safety assessment differs from risk assessment. The results showed that even very experienced professionals struggled to keep these two assessment types apart. Additionally, it is important to align the definitions and operationalizations of child safety that different agency types use in daily practice. There were large discrepancies between expert panel members in their definitions of immediate child safety. Much more work is required to achieve more unity in child safety decisions.
Finally, the decision on a child's safety is only the first of many decisions that practitioners need to make in order to safeguard a child. Reliable and valid decisions on children's safety need to be followed by effective and appropriate safety measures described in a safety plan. Further research should study whether using an instrument contributes to the development of an effective safety plan.

Conclusions
The current study was the first to examine the concurrent validity of a safety assessment instrument by comparing its outcomes to another measure of immediate child safety. This type of research is essential to determine the quality of safety assessment instruments and the accuracy of decisions that are made with such an instrument. The decisions made with the ARIJ safety assessment instrument concurred for a small majority with the expert panel decisions. The results provide important indications of how the instrument can be improved, so that the likelihood of professionals missing relevant threats in their assessments is reduced. However, deciding on the presence of immediate safety threats remains a clinical decision that is susceptible to bias. The ARIJ safety assessment instrument helps to structure this decision, but merely implementing an instrument such as the ARIJ is not sufficient. Many steps need to be taken to achieve more consensus in safety decision making. Training and educating professionals on how to perform a safety assessment thoroughly and properly, and how to conduct an interview, is highly needed. Adequate supervision and safe work environments are also important.

Acknowledgments:
We thank all the professionals who participated in this study. Additionally, we would like to thank Franziska Yasrebi-de Kom and Isidora Stolwijk for coding all the safety assessments.

Conflicts of Interest:
Annemiek Vial declares that she has no conflict of interest. Mark Assink, Geert Jan Stams, and Claudia van der Put were involved in the development of the ARIJ safety assessment instrument. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A
Overall safety decision: Immediate danger (100%)
The child is being hit
Domestic violence
Father has substance abuse problems
Father is absent
Mother is incapable of protecting the child
Psychological violence towards the children
The child has behavioral problems
The child runs away without anyone knowing where she is
The child has no connection to peers
The child is burdened with adult problems
The child is wary and aggressive
The child stabbed a peer with scissors
The child does not accept authority
The child is young
Parents ask too much from the child
Chronic unsafety

[…] important to remain calm and not to raise your voice during one of these tantrums. Taking C. outside for a walk may sometimes help to calm him down. According to the parents, timing is important, because C. may run away. F. (C.'s sister) suffers a lot from her brother's disruptive behavior. It makes her sad, and she regularly expresses her sadness. According to F., she fights a lot with C., and she is regularly confronted with C.'s temper tantrums. Because C.'s disruptive behavior also happens at night, she regularly sleeps in the hall at her father's place.
Over the years, the parents have started several parenting programs to improve their skills in handling C.'s behavior. However, they have repeatedly decided to drop out of these programs. For example, both parents terminated parental guidance and psycho-education on their own initiative, without discussing their reasons for dropping out with professionals. The parents felt that the program was too intense and that they were too busy to follow it. Consequently, interventions have not been successful, and C.'s behavior remains problematic.
Mother says that she is worn out. Both parents indicate that they are suffering from C.'s behavior and that they have multiple parenting questions. They don't know how to successfully cope with C. and feel helpless. Father says he is experiencing depressed feelings.
Since the divorce two years ago, communication between the parents has been difficult. They strongly distrust each other. The parents hold onto old grudges against each other, for instance regarding the (ex-)in-laws, which causes heated fights. Recently, neighbors reported verbal domestic violence to the emergency hotline. The parents seem to lose sight of C.'s and F.'s needs because of these conflicts. In addition, the parents do not agree on C.'s upbringing, and they seem to be negatively influenced by their son's behavior. Because of C.'s disruptive behavior, both parents pay relatively little attention to F.
Mother has a sister and only one good friend. The maternal grandparents live close to the mother of C. and F., and both try to be supportive. The father of C. and F. receives some support from his parents and two sisters. C. and F. sometimes stay with their paternal aunt, and C. sometimes plays with his cousins. F. has two best friends, and she sometimes stays over at their homes.